Genome Analysis Toolkit

High number of transversions after running Mutect2 and FilterMutectCalls on WES data

3 comments

  • Genevieve Brandt (she/her)

    Hi Paul-Arthur Meslin, thanks for this thorough description. Our GATK support team is focused on resolving bugs, and we are adding all other questions to our backlog to work through when we have the capacity. For context, check out our support policy.

    For this question, it would be great to hear from other users who have seen the same thing, and whether they were able to change their usage to better filter out possible OxoG artifacts.

    I cannot guarantee a solution for this, but please continue to post your questions so we can continue to improve our documentation, resources, and tools.

    It looks like the main solution we have in the Mutect2 workflow is the --orientation-bias-artifact-priors option of FilterMutectCalls. However, as you said, these sites can be hard to distinguish if they have low coverage and multiple reads supporting a "true" variant. I also found this tool in our index, OrientationBiasReadCounts, if you want to check it out, as well as our documentation of this issue, which links to a resource in Nucleic Acids Research summarizing it.
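
For readers who have not set up the orientation bias priors before, here is a rough sketch of how the three steps are usually chained. It only builds and prints the command lines; it does not run GATK. The file names and the `orientation_bias_commands` helper are made up for illustration, and arguments that a real run would likely need (matched normal, germline resource, panel of normals, intervals) are left out.

```python
import shlex

# Hypothetical inputs, chosen only for illustration.
REF = "reference.fasta"
TUMOR_BAM = "tumor.bam"


def orientation_bias_commands(ref, tumor_bam, prefix="sample"):
    """Build (not run) the three GATK commands that feed orientation
    bias priors into FilterMutectCalls. Output names are assumptions."""
    f1r2 = f"{prefix}.f1r2.tar.gz"
    priors = f"{prefix}.read-orientation-model.tar.gz"
    unfiltered = f"{prefix}.unfiltered.vcf.gz"
    filtered = f"{prefix}.filtered.vcf.gz"
    return [
        # 1. Call variants and collect F1R2 counts at the same time.
        ["gatk", "Mutect2", "-R", ref, "-I", tumor_bam,
         "--f1r2-tar-gz", f1r2, "-O", unfiltered],
        # 2. Learn the read orientation model from the F1R2 counts.
        ["gatk", "LearnReadOrientationModel", "-I", f1r2, "-O", priors],
        # 3. Filter, passing the learned priors to the orientation filter.
        ["gatk", "FilterMutectCalls", "-R", ref, "-V", unfiltered,
         "--orientation-bias-artifact-priors", priors, "-O", filtered],
    ]


commands = orientation_bias_commands(REF, TUMOR_BAM)
for cmd in commands:
    print(shlex.join(cmd))
```

The point of the sketch is the data flow: the priors file written in step 2 is the same file passed to FilterMutectCalls in step 3. If the priors are never supplied, the orientation filter has no learned model to work from.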

  • David Benjamin

    Paul-Arthur Meslin Could you post the contents of the filtering stats file output by FilterMutectCalls?

  • Paul-Arthur Meslin

    Hi Genevieve Brandt (she/her), hi David Benjamin,

    First of all, thank you Genevieve for your response and these resources. They helped me understand this problem and gave me some ideas to test to try and limit the damage!

    Thank you David for taking an interest in this issue. Below are the contents of the FilterMutectCalls filtering stats file for a sample with a high number of C>A/G>T transversions after filtering (GATK version 4.1.7.0):

    #<METADATA>Ln prior of deletion of length 10=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 9=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 8=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 7=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 6=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 5=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 4=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 3=-18.701186049006115
    #<METADATA>Ln prior of deletion of length 2=-20.72326583694641
    #<METADATA>Ln prior of deletion of length 1=-18.370680497641864
    #<METADATA>Ln prior of SNV=-13.0853694301389
    #<METADATA>Ln prior of insertion of length 1=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 2=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 3=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 4=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 5=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 6=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 7=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 8=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 9=-20.72326583694641
    #<METADATA>Ln prior of insertion of length 10=-20.72326583694641
    #<METADATA>Background beta-binomial cluster=weight = 0.1484, alpha = 1.55, beta = 3.59
    #<METADATA>High-AF beta-binomial cluster=weight = 0.0120, alpha = 9.99, beta = 1.03
    #<METADATA>Binomial cluster=weight = 0.8397, mean = 0.056
    #<METADATA>threshold=0.712
    #<METADATA>fdr=0.397
    #<METADATA>sensitivity=0.542
    filter           FP     FDR   FN     FNR
    weak_evidence    7.84   0.1   13.52  0.16
    strand_bias      18.42  0.24  15.81  0.18
    contamination    0.04   0.0   0.0    0.0
    normal_artifact  1.99   0.03  0.72   0.01
    orientation      15.04  0.19  13.49  0.16
    slippage         0.0    0.0   0.0    0.0
    haplotype        2.73   0.03  0.29   0.0
    germline         0.05   0.0   0.0    0.0
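
In case it helps anyone comparing several samples, here is a small sketch for reading a stats file like the one above and ranking the filters by their estimated false positives. The `parse_filtering_stats` function is a hypothetical helper, not part of GATK. It assumes the layout shown in this post: `#<METADATA>key=value` lines followed by a whitespace-separated table. The embedded sample is an excerpt of the values posted here.

```python
# Excerpt of the filtering stats posted above (metadata trimmed for brevity).
SAMPLE = """\
#<METADATA>Ln prior of SNV=-13.0853694301389
#<METADATA>threshold=0.712
#<METADATA>fdr=0.397
#<METADATA>sensitivity=0.542
filter FP FDR FN FNR
weak_evidence 7.84 0.1 13.52 0.16
strand_bias 18.42 0.24 15.81 0.18
contamination 0.04 0.0 0.0 0.0
normal_artifact 1.99 0.03 0.72 0.01
orientation 15.04 0.19 13.49 0.16
slippage 0.0 0.0 0.0 0.0
haplotype 2.73 0.03 0.29 0.0
germline 0.05 0.0 0.0 0.0
"""

PREFIX = "#<METADATA>"


def parse_filtering_stats(text):
    """Return (metadata dict, list of per-filter rows) from a stats dump.

    Assumes the format shown in this thread; columns may be separated
    by tabs or spaces.
    """
    metadata, rows, header = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(PREFIX):
            # Split on the first '=' only; some values contain '=' too.
            key, _, value = line[len(PREFIX):].partition("=")
            metadata[key] = value
        elif header is None:
            header = line.split()
        else:
            fields = line.split()
            row = {"filter": fields[0]}
            for name, value in zip(header[1:], fields[1:]):
                row[name] = float(value)
            rows.append(row)
    return metadata, rows


metadata, rows = parse_filtering_stats(SAMPLE)
ranked = sorted(rows, key=lambda r: r["FP"], reverse=True)

print("threshold:", metadata["threshold"], "fdr:", metadata["fdr"],
      "sensitivity:", metadata["sensitivity"])
for row in ranked:
    print(f'{row["filter"]:<16} FP={row["FP"]:<6} FN={row["FN"]}')
```

For this sample, strand_bias and orientation have the largest FP estimates, followed by weak_evidence.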

     

     

     

