Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Haplotypecaller : variant calling not seen in bamout alignments

Answered
0

14 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi fabienne JABOT-HANIN, what evidence are you seeing for the high number of alternative alleles? The AC at this location is 1. Here is our documentation on VCF format for more information on the annotations in the VCF output: https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format

    0
    Comment actions Permalink
  • Avatar
    Fabienne

    Hi Genevieve, Thank you for your answer.

    I was meaning that AD=106/86, so there are a high number of reads carrying the alternative allele whereas there are only 4 alt alleles in the bam alignment.

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Fabienne,

    Do you mean that there were 4 reads carrying the alt allele in the input bam? HaplotypeCaller does local re-assembly at active regions and so the site can look different from the input bam. You can run HaplotypeCaller with the -bamout option to see what the site looks like after realignment.

    Also I would recommend reading these articles:

    Hope this helps!

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    fabienne JABOT-HANIN

    Hi Genevieve,

    That's what I did, and in the bamout file, there were only 4 reads supporting the alternative allele, whereas in the gvcf and vcf AD mentioned 106 REF / 86 ALT.

    Do you want me to send you the files ?

    Best

    Fabienne

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    I see, could you share the screenshot of this location in the bamout?

    Could you also check and see what happens at this site without the EMIT_ALL_CONFIDENT_SITES mode and do the default? If you are looking to get a GVCF the common option is -ERC GVCF.

    0
    Comment actions Permalink
  • Avatar
    fabienne JABOT-HANIN

    Here is attached the screenshot of the location in IGV in the bamout :

    The variant is called at :

    chr17 78064001 . G A 469.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.082;DP=368;ExcessHet=3.0103;FS=81.951;MLEAC=1;MLEAF=0.500;MQ=55.41;MQRankSum=1.060;QD=2.45;ReadPosRankSum=8.442;SOR=3.353 GT:AD:DP:GQ:PL 0/1:106,86:192:99:477,0,2190

     

    When I use HaplotypeCaller without EMIT_ALL_CONFIDENT_SITES mode and without gvcf output, I get exactly the same variant in the vcf file.

    chr17 78064001 . G A 469.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.082;DP=368;ExcessHet=3.0103;FS=81.951;MLEAC=1;MLEAF=0.500;MQ=55.41;MQRankSum=1.060;QD=2.45;ReadPosRankSum=8.442;SOR=3.353 GT:AD:DP:GQ:PL 0/1:106,86:192:99:477,0,2190

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi fabienne JABOT-HANIN,

    Thanks for sharing that. Could you also share an IGV screenshot of the input bam? And then could you highlight the reads in the bamout that are in the read group “ArtificialHaplotypeRG”?

    One other thing you could do is try GATK 4.2.0.0 because there have been multiple bug fixes with HaplotypeCaller since 4.1.3.0 and one of them could have solved this.

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    fabienne JABOT-HANIN

    Hi Genevieve,

    Here you could find the bamout with the ArtificialHaplotypeRG :

     

    And the view of the original bam :

    I'm going to install and try the GATK4.2.0.0 and teel you if it's always the same or not.

    Best

    Fabienne

    0
    Comment actions Permalink
  • Avatar
    fabienne JABOT-HANIN

    Genevieve,

    I try GATK 4.2.0.0 and I have the same variant in the vcf after HaplotypeCaller :

    chr17 78064001 . G A 469.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.082;DP=384;ExcessHet=3.0103;FS=65.619;MLEAC=1;MLEAF=0.500;MQ=54.99;MQRankSum=-0.595;QD=2.34;ReadPosRankSum=8.442;SOR=2.738 GT:AD:DP:GQ:PL 0/1:106,95:201:99:477,0,2190
    chr17 78064015 . G C 2668.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=6.029;DP=419;ExcessHet=3.0103;FS=26.614;MLEAC=1;MLEAF=0.500;MQ=50.99;MQRankSum=0.050;QD=11.31;ReadPosRankSum=-5.021;SOR=0.871 GT:AD:DP:GQ:PL 0/1:38,198:236:45:2676,0,45

    The bamout is very similar to the previous one.

    So the new version doesn't seem to change anything. There may be a problem in this region.

    Best

    Fabienne

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Fabienne,

    Thanks for sharing that information. Bummer, I was hoping the new version of GATK would be better! :)

    There is an option in HaplotypeCaller that we recommend for improving calling when certain sites are not as expected: --linked-de-bruijn-graph. You can try running HaplotypeCaller with that option and see if it helps. More information is available at this troubleshooting document: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant

    Otherwise, could you re-run HaplotypeCaller with --debug-graph-transformations on this region. The option will output a bunch of .dot files that will give us insight to what is happening at this site. So, you'll want to make sure that you restrict the analysis to a small region or the output will be huge. Please put them in one folder and upload them to our ftp site as a bug report. The instructions are here: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    fabienne JABOT-HANIN

    Hi Genevieve,

    I've uploaded the files to your ftp server. The name of the file is :

    Fabienne_JABOT-HANIN_bug_report.zip

    I hope you will find what's happening there.

    Best

    Fabienne

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Fabienne,

    We took a look at the graph from the files you provided. This seems to be an edge case because there is repetition in the reference at this site. We think there are a couple things you can do to fix this:

    1. Run HaplotypeCaller with GATK 4.2.0.0 using the --linked-de-brujin-graph option.
    2. If that doesn't work, run 4.2.0.0 with --error-correction-log-odds 3.0 

    Please let me know if either of these help with the issue.

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    fabienne JABOT-HANIN

    Hi Genevieve,

    Thank you for your suggestions.

    I've tried with the --linked-de-bruijn-graph option and the variant doesn't appear anymore. This seems to fix the problem ! Great !

    So, does its mean that we should always call the bam with this new option ? Is the calling more reliable with this option ?

    Best

    Fabienne

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Fabienne,

    So glad to hear that it worked! The --linked-de-bruijn graph is better in many cases and is frequently faster as well. It is a relatively new option and the team hasn't officially decided to change the default behavior. However, since you are seeing improvement with this option, I would definitely recommend that you use it.

    Best,

    Genevieve

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk