Why do I see no evidence of alt allele in GVCF?
Hello GATK team,
I’m trying to understand GATK’s behavior in several cases where I’m expecting a 1/1 call but am getting ./. or 0/0. More specifically it's about what I'm seeing in the GVCF files.
Here’s an example: there’s a known dbSNP variant that is homozygous in NA12878 at position chr1:1318756. In the attached IGV screenshot you can see 4 reads carrying the A allele, and no reference alleles. Coverage is low but mapping and base quality are high (>30 in all cases). Because of the low coverage and depending on what happens in assembly, I understand that the position can't necessarily be called, but what I'm confused about is its representation in the GVCF file.
Running HaplotypeCaller once in -ERC BP_RESOLUTION mode and then in -ERC GVCF mode, I get the following output.
Variant at position chr1:1318756 and surrounding GVCF rows from -ERC BP_RESOLUTION output:
chr1 1318755 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:4,0:4:9:0,9,135
chr1 1318756 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:0,4:4:0:0,0,0
chr1 1318757 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:15:0,15,225
The same position (plus surrounding) in -ERC GVCF output:
chr1 1318754 . T <NON_REF> . . END=1318755 GT:DP:GQ:MIN_DP:PL 0/0:4:9:4:0,9,135
chr1 1318756 . G <NON_REF> . . END=1318756 GT:DP:GQ:MIN_DP:PL 0/0:4:0:4:0,0,0
chr1 1318757 . T <NON_REF> . . END=1318764 GT:DP:GQ:MIN_DP:PL 0/0:6:15:6:0,15,225
In both cases, there seems to be no trace of the ALT allele A which I assume is because assembly of the region wasn't successful (the -bamout output is empty too), but what I do not understand is why the output in BP_RESOLUTION mode correctly records the AD info as 0,4 (thus giving at least a hint that there was some non-ref allele) whereas the GVCF mode output lacks any information of there being anything but the reference allele. Is this supposed to be this way? I understand the final 0/0 call with low GQ, but expected to see some form of evidence of the possible alt allele in the GVCF. To me, it looks like the GVCF mode output could be misinterpreted as 4 reads supporting the reference allele at position chr1:1318756 (albeit with quality 0).
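To make the discrepancy concrete, here is a minimal Python sketch that parses the rows quoted above and reports, per site, the genotype, the GQ, and the number of non-reference reads recorded in AD (when AD is present). The helper names (`parse_sample`, `describe`, `zero_confidence_sites`) are hypothetical and only for illustration. The rows are split on whitespace because that is how they appear in this post, whereas a real VCF is tab-separated.

```python
# Rows copied verbatim from the post (whitespace-separated here, not tabs).
BP_RESOLUTION_ROWS = """\
chr1 1318755 . A <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:4,0:4:9:0,9,135
chr1 1318756 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:0,4:4:0:0,0,0
chr1 1318757 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:6,0:6:15:0,15,225
"""

GVCF_ROWS = """\
chr1 1318754 . T <NON_REF> . . END=1318755 GT:DP:GQ:MIN_DP:PL 0/0:4:9:4:0,9,135
chr1 1318756 . G <NON_REF> . . END=1318756 GT:DP:GQ:MIN_DP:PL 0/0:4:0:4:0,0,0
chr1 1318757 . T <NON_REF> . . END=1318764 GT:DP:GQ:MIN_DP:PL 0/0:6:15:6:0,15,225
"""


def parse_sample(row):
    """Return (position, {FORMAT key: sample value}) for one single-sample row."""
    cols = row.split()
    pos = int(cols[1])
    # Column 9 holds the FORMAT keys, column 10 the sample values.
    fmt = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return pos, fmt


def describe(rows):
    """Return (pos, GT, GQ, non-ref read count or None if AD is absent) per row."""
    summary = []
    for row in rows.strip().splitlines():
        pos, fmt = parse_sample(row)
        nonref = None
        if "AD" in fmt:
            # AD lists the REF depth first, then one depth per ALT allele.
            depths = [int(x) for x in fmt["AD"].split(",")]
            nonref = sum(depths[1:])
        summary.append((pos, fmt["GT"], int(fmt["GQ"]), nonref))
    return summary


def zero_confidence_sites(rows):
    """Positions whose genotype call has GQ 0, i.e. carries no confidence."""
    return [pos for pos, _gt, gq, _nonref in describe(rows) if gq == 0]


if __name__ == "__main__":
    for label, rows in (("BP_RESOLUTION", BP_RESOLUTION_ROWS), ("GVCF", GVCF_ROWS)):
        for pos, gt, gq, nonref in describe(rows):
            print(label, pos, gt, "GQ=%d" % gq, "nonref_AD=%s" % nonref)
```

On these rows, both outputs mark chr1:1318756 as a GQ 0 site, but only the BP_RESOLUTION row exposes the 4 non-reference reads through AD. The GVCF reference block has no AD field, so the only remaining hint is the GQ of 0 with PL 0,0,0.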
Thanks in advance!
Anne-Katrin
-
I'm seeing the same problem with low-pass data. Any word on what is causing this behavior? Without more information it definitely seems like a bug.
-
Hello akemde, we have some troubleshooting documents that can help you figure out why HaplotypeCaller might not call an expected variant and why a variant at a certain site is not called. Please read through those to get more information about your results and why they are occurring.
One more note: HaplotypeCaller is not meant to be run without filtering, so you may need to filter these results to get the final calls and annotations.