Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Variant not called by HaplotypeCaller (not called in father, but present in child)

3 comments

  • James Emery

    Hello Rolf Schröder. I'm sorry to hear about this issue with HaplotypeCaller. It seems you have already found the advice we usually give in these situations. Looking at the father's output with LinkedDeBruijnGraph, the 32-base insertion that was called looks like some sort of failure or pathological behavior of the more complicated graph. It is difficult to tell exactly what happened without closer inspection and debugging; it's possible that this is a pathological site where assembly has indeed failed and where LinkedDeBruijnGraph also fails.

    To answer your question about LinkedDeBruijnGraph: it is a more complicated graph algorithm that fixes many faults in the existing graph but also introduces some faults of its own. While we think it's better (it is now part of the standard practice for Mutect2), we are conservative about changing defaults in ways that would introduce unexpected behavior to HaplotypeCaller.

    There are a number of other possible ways to recover this event. The most likely issue is repetitiveness in the reference at that site that causes both assembly graphs to struggle (though the fact that it worked for the child somewhat complicates this story). First, you could try providing a VCF file for that site with the `--alleles` argument to force HaplotypeCaller to assemble and call that variant. This will bypass any issues with assembly and catch the variant. Along similar lines, in GATK 4.5.0.0 and later we have introduced a new `--pileup-detection` argument, with associated arguments, that supplements the assembly engine by looking at the pileups for SNPs and indels that assembly might have missed. This is a use-at-your-own-risk mode, however, as it increases the number of false positives called and has been optimized for use in the DRAGEN-GATK pipeline.

    You could also try calling with the DRAGEN-GATK caller mode `--dragen-378-concordance-mode`, which changes a lot of settings and adds some new genotyping arguments to emulate the DRAGEN algorithms for better calling. Among the changes in that mode are our recommended and tested pileup-caller arguments.
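
The three suggestions above can be laid out side by side as HaplotypeCaller invocations. The following is only a sketch in Python that assembles the argument lists: the file names and the interval are hypothetical placeholders (none of them come from this thread), and only `--alleles`, `--pileup-detection` and `--dragen-378-concordance-mode` are taken from the reply itself. The flags `-R`, `-I`, `-L` and `-O` are the usual reference, input, interval and output arguments.

```python
import shlex

# Hypothetical inputs; none of these paths come from the thread.
REFERENCE = "reference.fasta"
FATHER_BAM = "father.bam"
CHILD_VCF = "child.vcf.gz"          # VCF holding the child's C>T call
INTERVAL = "21:34923404-34924404"   # illustrative window around 21:34923904


def build_command(output_vcf, extra_args):
    """Return a HaplotypeCaller argument list plus the mode-specific extras."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", REFERENCE,
        "-I", FATHER_BAM,
        "-L", INTERVAL,
        "-O", output_vcf,
        *extra_args,
    ]


# One run per suggestion; output names are placeholders as well.
RUNS = {
    "force_alleles": build_command("father.alleles.vcf.gz", ["--alleles", CHILD_VCF]),
    "pileup_detection": build_command("father.pileup.vcf.gz", ["--pileup-detection"]),
    "dragen_mode": build_command("father.dragen.vcf.gz", ["--dragen-378-concordance-mode"]),
}

for name, command in RUNS.items():
    # Print a shell-safe command line instead of executing anything.
    print(f"{name}: {shlex.join(command)}")
```

Each list can be handed to `subprocess.run(command, check=True)` once the placeholders point at real files. Keeping the three runs in separate output files makes it easy to compare the resulting records for the site afterwards.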

  • Rolf Schröder

    Hey James,

    thank you very much for that answer. In the meantime, I had also looked for repetitive sequences and realized that this might be the culprit. Here are my results so far. I will still test your proposals for completeness, though.

    # These sequences occur in that genomic region
    # Here is the expected C>T change
    #                          |
    CTCCATGGACTCCCAGATGTTAGCAACTAGC (2x)
    CTCCATGGACTCCCAGATGTTAGCAAC     (5x)
    CTCCATGGACTCCCAGATGTTAGCAACCAGC (3x)
    #                          |
    # C-or-T

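
A repeat count like the one above can be reproduced with a few lines of Python. This is a minimal sketch, not the method used in the post: `count_occurrences` is a hypothetical helper, and `toy_region` is a synthetic string built from the three motifs with `N` spacers, standing in for the real reference window (which is not shown in the thread).

```python
def count_occurrences(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping matches are also found.
        start = sequence.find(motif, start + 1)
    return count


# Motifs copied from the post; the two 31-mers differ only at their 28th base.
MOTIFS = {
    "ends_TAGC": "CTCCATGGACTCCCAGATGTTAGCAACTAGC",
    "shared_27mer": "CTCCATGGACTCCCAGATGTTAGCAAC",
    "ends_CAGC": "CTCCATGGACTCCCAGATGTTAGCAACCAGC",
}

# Synthetic stand-in for the reference window (an assumption, not real data).
toy_region = "NNNN".join(
    [MOTIFS["ends_TAGC"], MOTIFS["shared_27mer"], MOTIFS["ends_CAGC"]]
)

for name, motif in MOTIFS.items():
    print(name, count_occurrences(toy_region, motif))
```

Because the 27-mer is a prefix of both 31-mers, a plain substring count of the 27-mer also includes every occurrence of the longer motifs, which is worth keeping in mind when comparing counts.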
     

  • Rolf Schröder

    Hi James,

    I tested 4.5.0.0 once again as you suggested. When running with `--alleles` using the VCF from the child, the variant is called:

    21  34923904    .   C   T   1457.64 .   AC=1;AF=0.500;AN=2;BaseQRankSum=-1.341;DP=137;ExcessHet=0.0000;FS=3.117;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=10.88;ReadPosRankSum=0.458;SOR=0.443    GT:AD:DP:GQ:PL  0/1:67,67:134:99:1465,0,1562

    Using `--pileup-detection` (without `--alleles`) also reveals it (with exactly the same results). Finally, using `--dragen-378-concordance-mode` yields the following result:

    21  34923904    .   C   T   39.23   .   AC=1;AF=0.500;AN=2;BaseQRankSum=-1.085;DP=144;ExcessHet=0.0000;FS=2.967;MLEAC=1;MLEAF=0.500;MQ=59.62;MQRankSum=1.380;QD=0.27;ReadPosRankSum=0.535;SOR=0.484 GT:AD:DP:GP:GQ:PG:PL    0/1:74,69:143:39.23,0,70:39:0,34.77,64.77:74,0,40

    Thank you very much again for your detailed answers. I understand that progress is ongoing!
    Best,
    Rolf
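
The two records in this comment can be compared field by field with a short Python sketch. The `summarize` helper is hypothetical and assumes a single-sample record with the standard ten VCF columns. The records are copied from the comment with whitespace separators; real VCF files are tab-delimited, and `str.split()` handles both.

```python
ALLELES_RECORD = (
    "21 34923904 . C T 1457.64 . "
    "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.341;DP=137;ExcessHet=0.0000;FS=3.117;"
    "MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=10.88;ReadPosRankSum=0.458;SOR=0.443 "
    "GT:AD:DP:GQ:PL 0/1:67,67:134:99:1465,0,1562"
)
DRAGEN_RECORD = (
    "21 34923904 . C T 39.23 . "
    "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.085;DP=144;ExcessHet=0.0000;FS=2.967;"
    "MLEAC=1;MLEAF=0.500;MQ=59.62;MQRankSum=1.380;QD=0.27;ReadPosRankSum=0.535;SOR=0.484 "
    "GT:AD:DP:GP:GQ:PG:PL 0/1:74,69:143:39.23,0,70:39:0,34.77,64.77:74,0,40"
)


def summarize(record_line):
    """Extract QUAL, genotype, allele depths and alt fraction from one record."""
    fields = record_line.split()
    qual, fmt, sample = fields[5], fields[8], fields[9]
    # Pair each FORMAT key with the matching per-sample value.
    values = dict(zip(fmt.split(":"), sample.split(":")))
    ref_depth, alt_depth = (int(depth) for depth in values["AD"].split(","))
    return {
        "qual": float(qual),
        "gt": values["GT"],
        "gq": int(values["GQ"]),
        "ad": (ref_depth, alt_depth),
        "alt_fraction": alt_depth / (ref_depth + alt_depth),
    }


first = summarize(ALLELES_RECORD)
second = summarize(DRAGEN_RECORD)
for label, summary in (("--alleles", first), ("dragen mode", second)):
    print(label, summary)
```

Both records report a heterozygous genotype with a near-even allele balance (67/134 and 69/143 alternate reads), while QUAL and GQ are much lower in the DRAGEN-mode record (39.23 and 39, versus 1457.64 and 99).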
