Hi. I have a VCF file that was created by PLINK (derived from SNP chip data), and I want to use GATK's ValidateVariants to confirm that the REF alleles agree with the genome reference they are supposed to match. Since PLINK doesn't know about any specific reference when it builds the VCF, it just uses the chromosome IDs '1', '2', etc. as the contig names (that's fine), but the contig lengths appear to be set simply from the maximum position it has for any call on that chromosome. Needless to say, those lengths disagree with the actual contig lengths in the reference genome, so ValidateVariants stops before it gets going ("Found contigs with the same name but different lengths"). Is there some way to bypass this check, possibly via "--validation-type-to-exclude"? I don't see any option in the ValidateVariants documentation that would bypass it.
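In case it helps frame the question: one alternative to bypassing the check would be to patch the `##contig` header lines so that their lengths match the reference. The following is only a minimal sketch of that idea in plain Python, not a GATK or PLINK feature. It assumes the reference has a samtools-style `.fai` index (contig name and length in the first two tab-separated columns) and that the PLINK header lines look like `##contig=<ID=1,length=...>`. The function names and the toy lengths are hypothetical.

```python
import re

# Patterns for the ID and length fields of a VCF "##contig=<...>" header line.
_CONTIG_ID = re.compile(r"ID=([^,>]+)")
_CONTIG_LEN = re.compile(r"length=\d+")


def read_fai_lengths(fai_text):
    """Parse .fai index text (name<TAB>length<TAB>...) into {name: length}."""
    lengths = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def fix_contig_line(line, lengths):
    """Return the line with its contig length replaced by the reference length.

    Lines that are not ##contig lines, that name a contig missing from
    `lengths`, or that have no length field are returned unchanged.
    """
    if not line.startswith("##contig="):
        return line
    match = _CONTIG_ID.search(line)
    if match is None or match.group(1) not in lengths:
        return line
    new_length = "length=%d" % lengths[match.group(1)]
    return _CONTIG_LEN.sub(new_length, line, count=1)


def fix_header(vcf_lines, lengths):
    """Apply fix_contig_line to every line; records pass through untouched."""
    return [fix_contig_line(line, lengths) for line in vcf_lines]


if __name__ == "__main__":
    # Toy values for illustration only, not real chromosome lengths.
    fai = "1\t1000\t3\t60\t61\n2\t2000\t1024\t60\t61\n"
    vcf = [
        "##fileformat=VCFv4.2\n",
        "##contig=<ID=1,length=512>\n",
        "##contig=<ID=2,length=777>\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    ]
    for out_line in fix_header(vcf, read_fai_lengths(fai)):
        print(out_line, end="")
```

This only rewrites the header; it does not check or change any variant records, so validating the REF alleles themselves would still be left to ValidateVariants.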