SelectVariants and Hard Filtering order changes final number of variants?
I have some doubts about the pipeline I use for variant calling with GATK. Usually, after running GATK HaplotypeCaller on different samples of the same population, I combine all the GVCFs in a database created with GATK GenomicsDBImport and then extract the multi-sample VCF with GATK GenotypeGVCFs.
My question concerns the filtering step: after applying the hard filters you suggest, I usually extract a sample of interest from that filtered multi-sample VCF with GATK SelectVariants.
When I tested the reverse procedure (first SelectVariants, then filtering only that sample), I noticed that the final number of variants in that VCF was different (smaller than in the other one). To get the same values, I had to filter the single-sample VCF from the first attempt again, and then I got the same number of variants.
In an issue I've seen that you suggest keeping the order 1. Filtering, 2. Extraction. But I did not expect that difference in the number of variants.
Any guess about it? What would you recommend?
Did you observe any variant sites being lost in the reversed order? A multi-sample VCF normally also contains sites that are non-variant for your sample of interest, so if your filtering or selection step removes non-variant sites, different counts would be understandable. Can you check whether a particular variant site is lost in one method or the other, and post it here?
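If it helps with that check, here is a minimal Python sketch for listing the sites that appear in only one of the two final VCFs. It is not a GATK tool, just plain text parsing of the standard VCF columns, and it assumes each file is either uncompressed or gzip/bgzip-compressed. The function names (`open_vcf`, `load_sites`, `compare_sites`) are made up for this illustration. The FILTER value is kept for each site so you can also see whether a record is marked PASS or carries a filter name.

```python
import gzip
from pathlib import Path


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def load_sites(path):
    """Map (CHROM, POS, REF, ALT) -> FILTER for every record in a VCF."""
    sites = {}
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 7:
                continue  # skip blank or malformed lines
            chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
            sites[(chrom, int(pos), ref, alt)] = flt
    return sites


def compare_sites(path_a, path_b):
    """Return the sorted site keys found only in A and only in B."""
    sites_a = load_sites(path_a)
    sites_b = load_sites(path_b)
    only_a = sorted(set(sites_a) - set(sites_b))
    only_b = sorted(set(sites_b) - set(sites_a))
    return only_a, only_b
```

Calling `compare_sites` on the two outputs (for example, hypothetical files named `filter_then_select.vcf.gz` and `select_then_filter.vcf.gz`) returns the sites unique to each file. Looking up a few of those keys in `load_sites` for either file shows their FILTER status, and those are the records worth posting here.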