gatk-4.1.8.1/ValidateVariants - bypass contig length check?
Hi. I have a VCF file that was created by PLINK (derived from SNP chip data), and I want to use GATK ValidateVariants to confirm that the REF alleles agree with the genome reference they are supposed to match. When building the VCF, PLINK doesn't know about any specific reference, so it just uses the chromosome IDs '1', '2', etc. as the contig names (that's fine). The contig lengths, however, appear to be set from the maximum position it has for any call on that chromosome. That disagrees with the actual length in the real genome, so ValidateVariants stops before it gets going ("Found contigs with the same name but different lengths"). Is there some way to bypass this check, possibly via "--validation-type-to-exclude"? I don't see any option in the ValidateVariants documentation that does so.
Thanks.
-
Matthew Maher, here is a tutorial for ValidateSamFile that also applies to ValidateVariants. Look there for more info on how to remove those checks.
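If no option turns the check off, one possible workaround is to make the header agree with the reference instead of bypassing the comparison. The sketch below is only an illustration under assumptions, not a documented GATK feature: it assumes the VCF is uncompressed text, that the header has `##contig=<ID=...,length=...>` lines, and that the reference has a samtools-style `.fai` index whose first two tab-separated columns are contig name and length. The function names `load_fai_lengths` and `fix_contig_lines` are made up for this example.

```python
import re

# Matches the contig ID at the start of a VCF ##contig header line.
CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def load_fai_lengths(fai_text):
    """Parse the text of a .fai index into a {contig_name: length} dict."""
    lengths = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


def fix_contig_lines(vcf_lines, lengths):
    """Yield VCF lines, rewriting ##contig lengths from the reference index.

    Contigs missing from `lengths` are passed through unchanged. Any extra
    attributes on a rewritten ##contig line (e.g. assembly=) are dropped.
    """
    for line in vcf_lines:
        match = CONTIG_ID.match(line)
        if match and match.group(1) in lengths:
            contig = match.group(1)
            newline = "\n" if line.endswith("\n") else ""
            yield f"##contig=<ID={contig},length={lengths[contig]}>{newline}"
        else:
            yield line
```

To use it, read the `.fai` file into a string for `load_fai_lengths`, then open the VCF, pass the file object to `fix_contig_lines`, and write the yielded lines to a new file. This only works if the contig names already match the reference ('1' vs 'chr1'); it does not rename contigs.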