SelectVariants and Hard Filtering order changes final number of variants?
Hi,
I have some doubts about the pipeline I use for variant calling with GATK. Usually, after running GATK HaplotypeCaller on different samples of the same population, I combine all the GVCFs in a database created with GATK GenomicsDBImport and then extract the multi-sample VCF with GATK GenotypeGVCFs.
My question concerns the filtering step: after applying the hard filters you suggest in https://gatk.broadinstitute.org/hc/en-us/articles/360037499012?id=3225, I usually extract a sample of interest from that filtered multi-sample VCF with GATK SelectVariants.
When I tested the reversed procedure (first SelectVariants, then filtering only that sample), I noticed that the final number of variants in that VCF was different (smaller than in the other one). To get the same values, I had to filter the single-sample VCF from the first attempt again, and then I got the same number of variants.
In the post https://gatk.broadinstitute.org/hc/en-us/community/posts/18816995666459-Splitting-multiple-sample-VCF-to-single-sample-VCF-before-VariantFilteration?input_string=Filtering%20before%20or%20after%20extraction%20of%20a%20single%20VCF I've seen that you suggest keeping the order 1. Filtering, 2. Extraction. But I did not expect that difference in the number of variants.
Any guess about it? What would you recommend?
-
Hi
Did you observe any variant sites being lost in the reversed order? Normally a multi-sample VCF also contains sites that are non-variant for your sample of interest, so if your filtration or selection involves removing non-variant sites, observing different numbers would be understandable. Can you check whether a particular variant site is lost in one method or the other, and post it here?
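One way to run that check is a small script that lists the sites present in only one of the two final VCFs, together with the FILTER value and the genotype of the sample of interest. The following is a minimal sketch using only the Python standard library, not a GATK tool. It assumes both files are standard tab-delimited VCFs, that records can be matched on (CHROM, POS) alone, and that a record counts as kept when FILTER is PASS or ".". The file names and the sample name in the usage note are hypothetical.

```python
import gzip
import sys


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def load_sites(path, sample=None, passing_only=True):
    """Map (CHROM, POS) to (FILTER, GT of the chosen sample).

    Assumption: one record per position; a later record at the same
    position overwrites an earlier one. If `sample` is not found in the
    header, the first sample column is used.
    """
    sites = {}
    sample_col = None
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("##"):
                continue
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                names = fields[9:]
                if names:
                    offset = names.index(sample) if sample in names else 0
                    sample_col = 9 + offset
                continue
            flt = fields[6]
            if passing_only and flt not in ("PASS", "."):
                continue
            gt = "."
            if sample_col is not None and len(fields) > sample_col:
                fmt = fields[8].split(":")
                values = fields[sample_col].split(":")
                if "GT" in fmt:
                    gt = values[fmt.index("GT")]
            sites[(fields[0], int(fields[1]))] = (flt, gt)
    return sites


def is_non_variant(gt):
    """True if the genotype carries no ALT allele (hom-ref or no-call)."""
    alleles = gt.replace("|", "/").split("/")
    return all(a in ("0", ".") for a in alleles)


def compare(path_a, path_b, sample=None):
    """Return the sites found only in A and only in B, with (FILTER, GT)."""
    a = load_sites(path_a, sample)
    b = load_sites(path_b, sample)
    only_a = {site: a[site] for site in sorted(set(a) - set(b))}
    only_b = {site: b[site] for site in sorted(set(b) - set(a))}
    return only_a, only_b


if __name__ == "__main__" and len(sys.argv) >= 3:
    sample_name = sys.argv[3] if len(sys.argv) > 3 else None
    only_a, only_b = compare(sys.argv[1], sys.argv[2], sample_name)
    for label, group in (("only in A", only_a), ("only in B", only_b)):
        for (chrom, pos), (flt, gt) in group.items():
            note = "non-variant in sample" if is_non_variant(gt) else "variant"
            print(label, chrom, pos, flt, gt, note, sep="\t")
```

Saved under a hypothetical name such as compare_sites.py, it could be run as `python compare_sites.py filter_then_select.vcf.gz select_then_filter.vcf.gz SAMPLE1`. If most of the sites reported for one file are marked "non-variant in sample", that would be consistent with the difference coming from sites where the sample of interest is homozygous reference or uncalled.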
Regards.