Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Mutect2 MISSING Low Frequency key (TP53 and KRAS) mutations

1

15 comments

  • Avatar
    David Benjamin

    Saba As a first step, could you run Mutect2 and FilterMutectCalls around the variants in question with -force-active true and paste the corresponding VCF lines from FilterMutectCalls?

    Also, could you report whether the variants are still missed with the -dont-use-soft-clipped-bases argument?

    0
    Comment actions Permalink
  • Avatar
    Saba

    David Benjamin

    Many thanks for your suggestions. 

    I have tried 2 things: 

    1.  Ran the samples with the option
    •     -dont-use-soft-clipped-bases true \

    Switching this option on didn't help to pick up the missing mutations.

            2. Ran mutect2 around these particular genes with -force-active true 

    This has picked the missing mutations and here are the VCF lines from FilterMutectCalls. 

    KRAS with 3% alternate bases: 40/1182 Alternate allele,

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 181171_415_g1 181172_415_c1
    12 25398284 . C T . PASS CONTQ=93;DP=1704;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=175,166;MMQ=60,60;MPOS=53;NALOD=2.70;NLOD=147.41;POPAF=6.00;ROQ=80;SEQQ=93;STRANDQ=79;TLOD=39.00 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:732,0:2.014e-03:732:323,0:404,0:367,365,0,0 0/1:921,28:0.029:949:414,16:505,12:443,478,14,14

     

    TP53 with 4% alternate bases: 21/556

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 190868_434_g1 190869_434_c1
    17 7578402 . G C . PASS CONTQ=93;DP=1334;ECNT=2;GERMQ=93;MBQ=20,20;MFRL=180,128;MMQ=60,60;MPOS=22;NALOD=2.72;NLOD=155.59;POPAF=6.00;ROQ=59;SEQQ=93;STRANDQ=80;TLOD=29.01 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:727,0:1.891e-03:727:342,0:377,0:305,422,0,0 0/1:544,21:0.035:565:254,8:289,13:260,284,10,11

     

    TP53 with 3% alternate bases: 45/1310

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 191950_440_g1 191951_440_c1
    17 7578413 . C T . PASS CONTQ=93;DP=2660;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=163,156;MMQ=60,60;MPOS=33;NALOD=2.90;NLOD=236.73;POPAF=6.00;ROQ=93;SEQQ=93;STRANDQ=93;TLOD=71.10 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:1283,0:1.259e-03:1283:556,0:710,0:589,694,0,0 0/1:1281,45:0.034:1326:594,19:675,26:627,654,21,24

     

    EGFR with 2%  alternate bases: 12/537  

    7 55259515 . T G . PASS CONTQ=93;DP=950;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=167,159;MMQ=60,60;MPOS=22;NALOD=2.41;NLOD=74.82;POPAF=6.00;ROQ=1;SEQQ=65;STRANDQ=52;TLOD=15.13 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:382,0:3.889e-03:382:142,0:235,0:178,204,0,0 0/1:500,12:0.025:512:212,0:279,12:246,254,6,6

     

    These are being picked up with force-active option. It was fairly quick for these few genes but when I do force-active on my entire bed file which is 600+ genes then it takes up to a day for some of the samples. 

    Looking forward to your opinion on these cases. 

     

     

    1
    Comment actions Permalink
  • Avatar
    Guruprasad Ananda

    Hi David Benjamin, I have encountered the same issue that Saba describes in this thread with a couple of low VAF (< 10%) variants, and was wondering if we had any updates on this topic. Thanks.

    0
    Comment actions Permalink
  • Avatar
    Saba

    David Benjamin, David Benjamin

    Yeah, it would be really helpful getting response on this matter, please. I am still waiting for that. 

     

     

     

    0
    Comment actions Permalink
  • Avatar
    Alastair Kerr

    Is there any update on this issue?  Many thanks, Alastair 

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    Saba I don't understand why these sites are not at least being found as active, and I will need bams containing a few of these variants to investigate.  Guruprasad Ananda or Alastair Kerr if you have small bams to reproduce the error feel free to share as well.

    0
    Comment actions Permalink
  • Avatar
    Richard

    Dear David Benjamin

    I came up with similar hotspot mutation come-up failures in M2 even after force call is switched on.

    The following is an example screenshot (EGFR exon 19 del15):

     

    The followings are the paired bams with hs37d5.fa as reference. The sample names are cancer and normal, respectively.

    1. cancer bam https://www.dropbox.com/s/5ic6jxj3uxgr87m/egfr_del15_case.bam?dl=0 
    2. normal bam https://www.dropbox.com/s/lz6iwfcwi2lb6kt/egfr_del15_ctrl.bam?dl=0 
    3. interval 
      7 55242149 55242815
    4. M2 command
    gatk Mutect2 \
    -I egfr_del15_case.bam \
    -I egfr_del15_ctrl.bam \
    -R hs37d5.fa \
    -normal normal \
    -O egfr_forcecall.vcf.gz \
    -L egfr.bed \
    --force-active true

     

    Cheers,

    Richard

     

     

    0
    Comment actions Permalink
  • Avatar
    Saba

    Hi @David, 

    Please find the link with the bam files for a sample having missing KRAS (KRAS with 3% alternate bases: 40/1182 Alternate allele). 

    https://www.dropbox.com/sh/utp3wtpaf6amfys/AABv5TPRsN0rFewGe8um8p9Ka?dl=0

     

    The rest of the information is in the above initial thread.

    Best regards, 

     

     

     

    0
    Comment actions Permalink
  • Avatar
    Xin Zheng

    Hi David Benjamin, I have encountered the same issue that Saba describes in this thread with a couple of low VAF (2%) variants, and the raw data Quality value was good. Waiting for the result. Thanks.

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    Saba (Xin Zheng too) I fixed the issue that was losing your KRAS mutation in a branch and will post a link to the PR on this github issue page: https://github.com/broadinstitute/gatk/issues/6724 once I submit it.  I expect that this will fix most other missed calls that could be restored with the -force-active option.  

    If anyone cares to try out the branch, it's in a public google bucket here: gs://broad-dsde-methods-davidben/gatk-builds/active-9-10-2020.jar

    Thank you for your assistance resolving this issue.

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    Saba Xin Zheng Here's the pull request: https://github.com/broadinstitute/gatk/pull/6821 

    0
    Comment actions Permalink
  • Avatar
    Saba

    David Benjamin

     

    Thank you so much for fixing this issue. Much appreciated. I have used the google bucket jar to test my 4 samples where we knew this issue existed. Here are my results and then a question arising from the data: 

     

    1:

    KRAS - Chr12: 25398284

    40/1182

    Picked up

    2:

    TP53 -  chr17: 757840

    21/556

    Picked up

    3:

    TP53 - chr17: 7578413,

    45/1310

    Picked up

    4: 

    EGFR – chr7: 55259515

    12/537  

    Not Picked

    5:

    TP53 – chr17: 7578242 

     4/363

    Not Picked

    6: 

    PTPN11 - chr12: 112924336 

    5/647

    When we do have alt alleles more than 20 then we can pick these previously missed mutations. But when the count for alt allelles is less than 20 at least in these examples, then we still don't pick these very low frequency ones. In a very old version of mutect2, we used to have an option to set this alt count by using the parameter (--unique-alt-read-count), which does not exist anymore. 

    So, what I did was to try to reduce (--initial-tumor-lod) from default 2.0 to 1.0. By doing so, I can pick up a missing (EGFR) in the active region which has 12/537 count but it gets filtered out due to orientation issues. However the other examples with 4 and 5 is not picked up at all with tumor-lod of 1.0. Can you please shed some light that how mutect2 is making decisions on choosing alt bases to incorporate in the active region and also if there is a way to set this alt count to a certain number of reads? 

     

    I will really appreciate your response on this. 

    Best regards, 

     

     

    0
    Comment actions Permalink
  • Avatar
    Saba

    David Benjamin

    In contrast to the above examples, I have a few examples which are picked up by mutect2 at a very very low alt count, i.e 2/8, 3/311, and 4/97. 

    Please see these:

    HOXA13 - 4/97
    0/0:240,0:5.648e-03:240:110,0:130,0:126,114,0,0
    0/1:97,4:0.041:101:36,1:61,2:46,51,2,2

     

    BRAF - 2/8
    0|0:12,0:0.083:12:5,0:5,0:0|1:140624393_CG_C:140624393:8,4,0,0
    0|1:8,2:0.200:10:3,0:5,2:0|1:140624393_CG_C:140624393:7,1,1,1

     

    KMT2D - 3/311
    0/0:280,0:5.151e-03:280:54,0:64,0:187,93,0,0
    0/1:311,3:0.020:314:58,2:93,1:196,115,3,0

     

    I am just struggling to understand that why do we see such cases sometimes and miss other times. 

    Note: these examples are run with default value of --initial-tumor-lod (2.0)

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    Saba Mutect2 doesn't use alt depth and allele fraction directly to decide whether a site is active.  Rather, it uses a fast approximation of the somatic likelihoods model (the model and its approximation are described in sections 1B and 1E of our docs: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf).  Base qualities in particular make a very big difference.

    It is also possible that some sites are found in the panel of normals.  By default Mutect2 considers these sites inactive and skips them, but you may use the -genotype-pon-sites option to treat them as active and defer all filtering to FilterMutectCalls.

    If the panel of normals is not the issue you may lower -initial-tumor-lod even further, to 0 or even -1.  However, I don't really see anything wrong with the default behavior now.  For example, one site has 5 alt reads and 637 ref reads.  Even if the base qualities are all quite high, say Q30, the p value (this is not to say that Mutect2 uses a p value, but it's a nice back-of-the-envelope illustration) comes out to 0.0005.  That is, if we accepted things based on this much evidence we would expect more than one million false positives in a genome.  On the other hand, if you want to use a lenient threshold just for known oncogenes a higher false positive rate may be tolerable.

    0
    Comment actions Permalink
  • Avatar
    Xin Zheng

    Hi David-Benjamin I have read  https://github.com/broadinstitute/gatk/pull/6821  and  in use gatk-4.1.9.0, but there are still some sites that cannot be detected (KRAS-vaf 0.01).  So I tested some parameters as below:

    • 1 --active-probability-threshold (defaut 0.002)
    • 2 -init-lod 2.0 -emit-lod 2.0
    • 3 --force-active  

     

    if --active-probability-threshold = -1 and  -init-lod = 2.0:
    can be detected
    if --active-probability-threshold = 0.002 or 0.00000001(Extremely positive number) and -init-lod = -1:
    can not be detected

    I guess the active-probability-threshold's priority is higher.
    So what I want to ask is the value(negative value ?) of active-probability-threshold of this value being close to the function of another parameter(--force-active ) ?

    And do I need to adjust this parameter to be negative, because I know it's like you said may Increase false positives(https://gatk.broadinstitute.org/hc/en-us/community/posts/360062144952/comments/360012886431)

    KRAS-vaf: 12 25398285 . C A . . AS_SB_TABLE=819,875|9,7;DP=1731;ECNT=1;MBQ=20,20;MFRL=165,151;MMQ=60,60;MPOS=35;POPAF=7.30;TLOD=18.04 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:1694,16:9.827e-03:1710:833,7:855,9:819,875,9,7).

     

     

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk