Variant filtered from GVCFAnswered
I have a known variant that appears in the GVCF but is filtered out in the VCF in two different samples. They look ok to my naive eye - does anyone have any thoughts based on the GVCF calls below?
chr20 4699605 rs1799990 A G,<NON_REF> 177.64 . BaseQRankSum=-1.817;DB;DP=30;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=108000,30;ReadPosRankSum=1.009 GT:AD:DP:GQ:PL:SB 0/1:19,9,0:28:99:185,0,568,242,594,836:10,9,4,5
GATK version used: 18.104.22.168
Command used: nfcore/sarek haplotypecaller
Thank you for writing into the GATK forum! To troubleshoot this issue you are seeing, I would first recommend working through this troubleshooting doc, When HaplotypeCaller and Mutect2 do not call an expected variant.
See what you can find out from that article and then let me know what further questions you have.
Thank you for your help! I have looked at the trouble shooting doc.
The expected variant was called at the site of interest in the GCVF (see above), but excluded from the VCF by GenotypGVCFs.
I didn't find any evidence of repeats in this region.
The quality and depth look ok.. the only thing that looks unusual to me is the BaseQRankSum - I believe that indicates that the quality of the ALT reads is lower? Also, the ratio of the REF and ALT alleles deviates from the expected 1:1.
LP could you share what this site looks like in the HaplotypeCaller bamout? Make sure to highlight this region so we can see the read counts.
I would recommend upgrading to a newer GATK version to see if that helps and also use the parameter --linked-de-bruijn-graph.
One other thing I would like to see is if you force call the allele with GenotypeGVCFs, what the VCF line looks like. Also, if this variant gets filtered during variant filtering.
I have attached a screenshot from IGV for the HaplotypeCaller bam, with the variant highlighted (from Sample A).
How do you force the variant to be called by GenotypeGVCFs?
Many thanks again
LP you can try using the argument -all-sites! Let me know what that looks like.
I ran this:
gatk GenotypeGVCFs --output allsites.vcf --reference Homo_sapiens_assembly38.fasta --variant Sample_A.g.vcf.gz --include-non-variant-sites true --intervals chr20:4690000-4700000
on GATK 22.214.171.124 (to keep consistent with the Sarek pipeline) and seem to have the same result for our variant of interest:
chr20 4699605 . A G 177.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.817e+00;DP=30;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=6.34;ReadPosRankSum=1.01;SOR=0.818 GT:AD:DP:GQ:PL 0/1:19,9:28:99:185,0,568
I did restrict the interval for time's sake - would that make any difference?
LP thanks for this information! Could you also provide the output with a newer version of GATK and --linked-de-bruijn-graph enabled?
We repeated the analysis with GATK 4.2.1 with --linked-de-bruijn-graph enabled and the variant now survives to the VCF!
We are not sure yet if this is because the new version or because of --linked-de-bruijn-graph so will repeat without this option enabled.
Is there any reason why --linked-de-bruijn-graph is not enabled as default? Are there any risks or drawbacks with using it?
Thanks again for your help with this
I'm glad you are seeing improvement when using the --linked-de-bruijn-graph option! The -linked-de-bruijn-graph option is relatively new so our developers are still determining if they want to make it default. It has been helping many users recover sites of interest, so I would recommend continuing to use this argument!
Here are some related forum posts:
Let me know if you have any other questions.
Please sign in to leave a comment.