Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Mutect2 output False Negative SNV result

0

6 comments

  • Official comment
    Avatar
    Gökalp Çelik

    Hi Danyue Wang

    The parameter that you are referring to does not directly mean the base quality score at the variant site in bam but it refers to the base quality value within the kmer where assembly occurs. It may be possible that your variant site is close to a site where base quality is lower than your threshold although your variant base is of higher quality, assembly engine may remove your variant base due to kmer containing other lower quality bases. Therefore our team recommends using default parameters first before changing them for your own needs. 

    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Danyue Wang

    Mutect2 and HaplotypeCaller are known to be affected (under some conditions) by the interval sizes and padding when it comes to calling variants. It may be useful if you can increase interval padding even further like 200 bp from the edges of the bed file. You may also remove the interval file to unlimit Mutect2 for haplotype generation. 

    Other than this option you may try the

    --linked-de-bruijin-graph true

    option to test if Mutect2 can recover better haplotypes in that region. 

    If you need any further assistance you may refer to the document in the following link

    https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant 

    I hope these help you solve your problem. 

    0
    Comment actions Permalink
  • Avatar
    Danyue Wang

    Hi, Gökalp Çelik

    I tried the methods metioned by the documentation and remove the target region bed, but I couldn't find the right solution, and some of the information by '-debug' mode is as following:

    12:06:56.273 INFO  EventMap - > Cigar = 500M
    12:06:56.273 INFO  EventMap - >> Events = EventMap{}
    12:06:56.352 INFO  Mutect2Engine - Assembling 3:178916864-178916892 with 2821 reads:    (with overlap region = 3:178916764-178916992)
    12:06:56.568 INFO  ReadThreadingAssembler - Using kmer size of 10 in read threading assembler
    12:06:56.714 INFO  ReadThreadingAssembler - Adding haplotype 229M from graph with kmer 10
    12:06:56.736 INFO  ReadThreadingAssembler - Adding haplotype 229M from graph with kmer 10
    12:06:56.738 INFO  ReadThreadingAssembler - Found 2 candidate haplotypes of 2 possible combinations to evaluate every read against.
    12:06:56.738 INFO  ReadThreadingAssembler - AAAGAAGCAAGAAAATACCCCCTCCATCAACTTCTTCAAGATGAATCTTCTTACATTTTCGTAAGTGTTACTCAAGAAGCAGAAAGGGAAGAATTTTTTGATGAAACAAGACGACTTTGTGACCTTCGGCTTTTTCAACCCTTTTTAAAAGTAATTGAACCAGTAGTCAACCGTGAAGAAAAGATCCTCAAT
    CGAGAAATTGGTATGATACAATATCCTATTCTAAAAT
    12:06:56.739 INFO  ReadThreadingAssembler - > Cigar = 229M : 229 score -1.4035242344113994 ref false
    12:06:56.739 INFO  ReadThreadingAssembler - AAAGAAGCAAGAAAATACCCCCTCCATCAACTTCTTCAAGATGAATCTTCTTACATTTTCGTAAGTGTTACTCAAGAAGCAGAAAGGGAAGAATTTTTTGATGAAACAAGACGACTTTGTGACCTTCGGCTTTTTCAACCCTTTTTAAAAGTAATTGAACCAGTAGGCAACCGTGAAGAAAAGATCCTCAATCGAGAAATTGGTATGATACAATATCCTATTCTAAAAT
    12:06:56.739 INFO  ReadThreadingAssembler - > Cigar = 229M : 229 score NaN ref true
    12:06:56.739 INFO  EventMap - === Best Haplotypes ===
    12:06:56.739 INFO  EventMap - AAAGAAGCAAGAAAATACCCCCTCCATCAACTTCTTCAAGATGAATCTTCTTACATTTTCGTAAGTGTTACTCAAGAAGCAGAAAGGGAAGAATTTTTTGATGAAACAAGACGACTTTGTGACCTTCGGCTTTTTCAACCCTTTTTAAAAGTAATTGAACCAGTAGGCAACCGTGAAGAAAAGATCCTCAATCGAGAAATTGGTAT
    GATACAATATCCTATTCTAAAAT
    12:06:56.739 INFO  EventMap - > Cigar = 229M
    12:06:56.740 INFO  EventMap - >> Events = EventMap{}
    12:06:56.740 INFO  EventMap - AAAGAAGCAAGAAAATACCCCCTCCATCAACTTCTTCAAGATGAATCTTCTTACATTTTCGTAAGTGTTACTCAAGAAGCAGAAAGGGAAGAATTTTTTGATGAAACAAGACGACTTTGTGACCTTCGGCTTTTTCAACCCTTTTTAAAAGTAATTGAACCAGTAGTCAACCGTGAAGAAAAGATCCTCAATCGAGAAATTGGTAT
    GATACAATATCCTATTCTAAAAT
    12:06:56.740 INFO  EventMap - > Cigar = 229M
    12:06:56.740 INFO  EventMap - >> Events = EventMap{3:178916930-178916930 [G*, T],}
    12:06:56.778 INFO  Mutect2Engine - Assembling 3:178916893-178917088 with 2824 reads:    (with overlap region = 3:178916793-178917188)

    can you give some advices,  whether I missed some key point? And the link below is the sub-extracted bam file:

    https://drive.google.com/file/d/1528JaKuDK9eAvAuqJHReAywol89QwSly/view?usp=drive_link 

    Thanks!

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Danyue Wang

    Thank you for sending us a slice of your bam file. Interestingly when I removed all the parameters that you added to Mutect2 and tried calling the region by getting wider range for interval and also without any intervals I was able to call the variant that you are looking for with AF around 4%. It is possible that some of the parameters were causing this issue of FN call. Can you try your sample with the following command without any additional parameters?

    gatk Mutect2 -I bamfile.bam -R genome.fa -O vcfout.vcf 

    Here is my output from both trials for the position that you are interested. 

    3    178916930    .    G    T    .    .    AS_SB_TABLE=700,886|29,36;DP=1714;ECNT=2;MBQ=20,20;MFRL=143,139;MMQ=60,60;MPOS=29;POPAF=7.30;TLOD=93.68    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:1586,65:0.039:1651:383,15:538,24:960,39:700,886,29,36

     

    3    178916930    .    G    T    .    .    AS_SB_TABLE=700,886|29,36;DP=1714;ECNT=2;MBQ=20,20;MFRL=143,139;MMQ=60,60;MPOS=29;POPAF=7.30;TLOD=93.68    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:1586,65:0.039:1651:383,15:538,24:960,39:700,886,29,36

    I hope this helps. 

    0
    Comment actions Permalink
  • Avatar
    Danyue Wang

    Dear Gökalp Çelik:

    Thanks a lot for following up! Sometimes, more is less. I had tested the command and validated the results. And the reason is the parameter

    --min-base-quality-score 20 

    I found that the default cutoff is 10 and if I set it to 14 or larger, the variant is not called. My confusion is that the base quality of the location is rather higher than this, why it's missed? or does the argument something else than I would expect?

    Best,

    Danyue

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Danyue Wang

    I will check with the team and let you know soon. 

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk