Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Mutect2 bamout reports fewer reference reads than original bam, causing false positive call.

0

7 comments

  • Avatar
    Pamela Bretscher

    Hi CYWang,

    Seeing fewer reads in your bamout file is somewhat expected because some reads will inevitably be thrown out through the filtering that Mutect2 performs. Can you take a look at the following algorithm and troubleshooting documents and let me know if you still have questions about the output?

    https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant

    https://gatk.broadinstitute.org/hc/en-us/articles/360050722212-FAQ-for-Mutect2

    One other consideration is that you are using a very outdated version of GATK and there have been many updates to Mutect2 since this version. I would suggest that you update GATK to the newest version and see if you get the results you are expecting.

    Kind regards,

    Pamela

    0
    Comment actions Permalink
  • Avatar
    CYWang

    Dear Pamela,

    Thank you for your reply!

    I can understand that some reads will be thrown out. However, I can't puzzle out why 37% of reference reads (reference allele=G) were thrown out, but only 4.7% of alternative reads (alternative allele=A) were thrown out.

    On the other hand, we used the GATK-4.2.2.0 and the Bamout reported that 19% of reference reads (reference allele=G) were thrown out, but only 2.7% of alternative reads (alternative allele=A) were thrown out.

    Therefore, my issue is if there are some reasons will cause reads to be unequally thrown out.

    Look forward to your reply. Thank you!

    Respectfully yours,
    CYWang

     

    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi CYWang,

    I spoke with one of the developers for Mutect2 on the GATK team and he looked at your results and gave the following response:

    My best guess is that some reference reads, after having their ends clipped to fit the active region, no longer have a unique mapping within the padded assembly region.  For example, let’s say that starting from one end of the assembly window a reference read is ACGGGAA and that this same sequence appears 200 bases downstream.  Then when we realign the read could be assigned to a location that does not exist in the bamout since it’s in the assembly padding.  Now suppose this read actually has some leading sequence before the active region and so it’s really [TTCCATAG]ACGGGAA, and that this sequence is unique.  Then if we didn’t clip the reads so dramatically the problem would vanish.

    So it appears that the issue with the filtering of the reference reads may be due to clipping of the reads during Metect's processing. Could you test this by enlarging the active region with the arguments: -padding-around-snps 50 -min-assembly-region-size 100 and let me know if the number of thrown-out reads changes?

    Kind regards,

    Pamela

    0
    Comment actions Permalink
  • Avatar
    CYWang

    Dear Pamela,

    Thank you for your reply!

    We had tested by the command (-padding-around-snps 50  -min-assembly-region-size 100 or --padding-around-snps 50  --min-assembly-region-size 100), and both Bamout reported that 37.7% of reference reads (reference allele, G=16012) were thrown out, but only 5.7% of alternative reads (alternative allele, A=498) were thrown out.

    Look forward to your reply. Thank you!

    Respectfully yours,
    CYWang

     

    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi CYWang,

    Thank you for providing this update and for trying my suggestion. I've shared your results with the Mutect2 developers and they are currently looking into other possible explanations for the results you're seeing. I will keep you updated with any findings or suggestions.

    Kind regards,

    Pamela

    0
    Comment actions Permalink
  • Avatar
    CYWang

    Dear Pamela,

    Thank you for your help and we are looking forward your finding and suggestions!

    Respectfully yours,
    CYWang

    0
    Comment actions Permalink
  • Avatar
    Pamela Bretscher

    Hi CYWang,

    Your issue was brought up with some other GATK team members today and I have a few suggestions/findings:

    It seems that your reads are likely being dropped as part of the final step in likelihood calculation. These reads could be actual artifacts which is why they are being thrown out. This article may help explain this. It is often difficult to troubleshoot why certain reads are dropped, so we are going to be working on a dropped read tracker for Mutect2 in the future. 

    You could try to use the --enable-dynamic-read-disqualification option in the read filtering after the likelihoods calculation. This can induce some false positives but may help keep some of your reads that are being thrown out. 

    Could you read through this Mutect2 information and possibly try this option and let me know if it helps? Additionally, could you provide any information on why you think your results are particularly unexpected? There could be something else going on at this site accounting for the thrown-out reads. 

    Kind regards,

    Pamela

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk