Too many variants after WGS workflow for a pig line
I used GATK for the very first time on whole genome sequencing data from a pig line. I am seeing way too many SNPs and INDELs.
I used HaplotypeCaller to generate gVCF files and CombineGVCFs to merge all the gVCFs. Then I ran joint genotyping and got my final VCF.
I did try the VariantFiltration on my files and it didn't give me an error but the issue is still there.
Also, with joint genotype calling, shouldn't it only be calling a SNP if it is present in something like 30% of the samples?
If anyone has figured out a better way to handle non-human genome data, please do let me know.
Hi Alia Parveen,
After joint calling your variants, your final VCF needs to be filtered. GATK calls many possible variants and without filtering you can get many false positives.
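Since VariantFiltration ran without an error but the variant count did not change, it is worth checking the FILTER column of the output. As far as I know, VariantFiltration only labels failing records there and leaves them in the file, so the total number of records stays the same until the filtered ones are excluded in a later step. Here is a minimal Python sketch for tallying FILTER values. The example records and the filter name `QD2` are made up for illustration, and it assumes an uncompressed, tab-separated VCF:

```python
from collections import Counter


def tally_filter_column(vcf_lines):
    """Count how many records carry each FILTER value (7th VCF column)."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines (both '##' and '#CHROM').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[6]] += 1
    return counts


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t50.0\tPASS\tQD=12.0",
    "1\t2000\t.\tC\tT\t31.0\tQD2\tQD=1.1",
    "1\t3000\t.\tG\tA\t45.0\t.\tQD=9.5",
]

for value, n in sorted(tally_filter_column(example).items()):
    print(value, n)
```

If nearly every record shows `PASS` or `.`, the filter expressions probably did not match anything. If many records carry a filter name, they are flagged but still counted when you simply count lines in the VCF.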
Hopefully this tutorial will help you out: (How to) Filter variants either with VQSR or by hard-filtering
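To give a rough idea of what hard-filtering does, here is a small Python sketch that checks a record's INFO annotations against the commonly cited GATK starting thresholds for SNPs (QD < 2.0, FS > 60.0, MQ < 40.0, SOR > 3.0, MQRankSum < -12.5, ReadPosRankSum < -8.0). This is only an illustration of the logic and not a replacement for VariantFiltration. The filter names and helper functions are made up, and the thresholds come from human data, so they may well need adjusting for a pig line after looking at the annotation distributions:

```python
# (filter name, INFO key, predicate that returns True when the record FAILS)
# Thresholds are the usual human-data starting points; treat them as assumptions.
SNP_HARD_FILTERS = [
    ("QD2", "QD", lambda v: v < 2.0),
    ("FS60", "FS", lambda v: v > 60.0),
    ("MQ40", "MQ", lambda v: v < 40.0),
    ("SOR3", "SOR", lambda v: v > 3.0),
    ("MQRankSum-12.5", "MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum-8", "ReadPosRankSum", lambda v: v < -8.0),
]


def parse_info(info):
    """Turn an INFO string such as 'QD=1.5;FS=70.2' into a dict of strings."""
    annotations = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            annotations[key] = value
    return annotations


def failed_filters(info):
    """Return the names of the hard filters that this INFO string fails."""
    annotations = parse_info(info)
    failed = []
    for name, key, fails in SNP_HARD_FILTERS:
        if key not in annotations:
            continue  # a missing annotation is not treated as a failure here
        try:
            value = float(annotations[key])
        except ValueError:
            continue  # skip non-numeric values
        if fails(value):
            failed.append(name)
    return failed


print(failed_filters("QD=1.5;FS=70.2;MQ=58.0;SOR=1.0"))
print(failed_filters("QD=15.0;FS=2.1;MQ=60.0"))
```

The first example record fails the QD and FS checks, while the second passes everything it has annotations for.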
You can also check out this troubleshooting document for looking into HaplotypeCaller variant calls.
However, a lot of our recommendations are based on human data, so researchers with non-human data might be able to help you out more. I'm going to move your post into the Community Discussions section of our forum to the topic Special GATK use cases. You can read more about our forum guidelines here.
Hope this helps, and hopefully other researchers can help out as well.