Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

GATK Mutect 2 can't call a true positive variant

0

10 comments

  • Avatar
    Gökalp Çelik

    Hi Ruben Ladeira

    To understand this issue we need further details about this problem. What kind of data are you using to call variants from? Is it amplicon data or is it a hybrid capture approach? Can you provide IGV views to make it more clear to us? 

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    Ruben Ladeira

    Hi Gökalp Çelik,

    Thank you for reaching out to me.

    This data is from a hybrid capture approach (Illumina MiSeq).

    The variant is detected in IGV :

    Best regards.

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi again.

    It is hard to tell anything from the IGV snapshot since mismatching nucleotides are not in the focus. 

    We do have a few suggestions for those that are needing to capture a particular variant. I am not sure if any of them would help directly but they are worth a try. Mutect2 is particularly sensitive to capture any variant information from reads however certain reads could have been diluted out in the realignment step due to similarities or low base quality values. 

    Here are a few articles that we suggest you to check. If none of them works for you, you can still submit a snipped bam file if you can and we can try to replicate the issue. 

    https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant 

    If you would like to submit a bam snippet you can use the below article and let us know about the file you submitted.

    https://gatk.broadinstitute.org/hc/en-us/articles/360035889671-How-do-I-submit-a-detailed-bug-report 

    I hope this helps.

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    Ruben Ladeira

    Hi,

    Thank you for your link.

    I can send you a snippet of the bam file (containing only chromosome 12 for instance).

    I have explored the bam with the IGV tool, and I observed in the bamout file (generated by Mutect2), that there is a whole region cut in the bamout file around the locus of interest (~ 25 227 300 bp)

     

    IGV view of the original bam :

    IGV view of the bamout file (generated by Mutect2) :

    It explains the variant located at 25 227 342 that is not detected.

     

    I don't know why this whole region has been cut however ? Can this come from quality filtering ?

    Thank you for your help,

    Best regards,

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Ruben Ladeira

    Bamout only shows those active regions with a call in its default configuration and it is possible that Mutect2 could not find a proper active region in that area. It may be better if you can also remove downsampling and use all the reads available to Mutect2 using the parameter 

    --max-reads-per-alignment-start 0

    Also pay attention to those parameters suggested in the document that I posted in the above response. Let us know which of them brings the variant back. If none works we would like to see a bam snippet to test ourselves. 

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    Ruben Ladeira

    Hi Gökalp Çelik

    After several tries, this option allowed to retain the variant of interestest:

    --force-active \

    After using

    --max-reads-per-alignment-start 0

    The region was still somewhat being cut.

    What is an active region according to Mutect2 ?

    Best regards,

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi again.

    Active regions are those that include more mismatches and indel signals compared to other regions. It is possible that the variant of yours have a very low allele fraction and combined base qualities therefore it gets filtered out during the reassembly and realignment steps. Can you provide us a snippet of this bam so that we can find out why it is not included in the assembly?

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    Ruben Ladeira

    Hi,

    I just transfered the zip file through your ftp server.

    Let me know if you received it or not,

    Best regards

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Ruben Ladeira

    Thank you for the snippet you provided. We will take a look at it and let you know if there is a solution for this issue. 

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Ruben Ladeira

    Looks like we found the exact issue you are having here. The site you are trying to genotype seems to fall below the active region detection threshold or in Mutect2 terms it is much higher than the initial tumor lod. We certainly do not recommend using --force-active parameter as it is there for the use of desperate situations for debugging and testing. One particular parameter which can help you to capture variants with very low allele fraction is 

    --initial-tumor-lod 0

    We also recommend you to turn off downsampling by setting

    --max-reads-per-alignment-start 0

    These 2 parameters will make ultra low fraction variant calling possible through Mutect2 without resorting to --force-active parameter. 

    When we tested your sample with these 2 parameters we noticed that your site was called. 

    I hope this helps.

    Regards. 

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk