Mutect2 MISSING Low Frequency key (TP53 and KRAS) mutations
Dear GATK Team,
We have noticed that Mutect2 ( 4.1.4.0) is missing some of the key (TP53 and KRAS) mutations from tumor-normal paired cfDNA somatic mutation calling. We have an analysis from another source where we know that these mutations are being picked up. On investigation, our raw data (aligned bams) shows that the reads are present but the bamout shows that these regions were not selected as active regions. Hence, mutations are not coming up.
We have a handful of examples as follows:
KRAS with 3% alternate bases: 40/1182 Alternate allele,
TP53 with 4% alternate bases: 21/556
TP53 with 3% alternate bases: 45/1310
We do have cases when similar mutations have been picked up with similar or even lower alternate counts. See the positive cases where these regions appear in the active regions at a very low frequency.
KRAS at 4%- 52/1166
ATM at 3% : 23/849
Both these types;
a) active regions not containing variants and,
b) active regions containing variants
have been run from the same pipeline. The mutect2 command is as follows:
Mutect2 \
-R \
-I \
-I \
-normal \
--germline-resource \
-L \
--f1r2-tar-gz \
-bamout \
--pairHMM AVX_LOGLESS_CACHING_OMP \
--native-pair-hmm-threads 16
I have tried tunning a few parameters including
--max-reads-per-alignment-start 0 \
--f1r2-median-mq 5 \
--minimum-mapping-quality 10 \
--min-base-quality-score 10 \
--active-probability-threshold 0.001 (or 0.003)\
--initial-tumor-lod 1.0 \
--dont-trim-active-regions true \
But none of these helped to pick up these mutations. However when I used the force active mode then I picked up these mutations on their respective frequency.
--force-active true
KRAS - coming in active region now
TP53 - coming in active region now
Though this helps to solve the problem, this does not seem an appropriate solution because of
a). Some cases are being picked up at similar frequency while others are not
b). Computational time with this mode on is 10-12 times more compared to when you rely on mutect2 intrinsic behavior to choose active regions based on variations.
Based on these examples, can you please suggest what could fix this problem? I am happy to share bam files for investigations and testing if that's needed.
Looking forward to any solution to this problem.
Best regards,
-
Many thanks for your suggestions.
I have tried 2 things:
- Ran the samples with the option
- -dont-use-soft-clipped-bases true \
Switching this option on didn't help to pick up the missing mutations.
2. Ran mutect2 around these particular genes with -force-active true
This has picked the missing mutations and here are the VCF lines from FilterMutectCalls.
KRAS with 3% alternate bases: 40/1182 Alternate allele,
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 181171_415_g1 181172_415_c1
12 25398284 . C T . PASS CONTQ=93;DP=1704;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=175,166;MMQ=60,60;MPOS=53;NALOD=2.70;NLOD=147.41;POPAF=6.00;ROQ=80;SEQQ=93;STRANDQ=79;TLOD=39.00 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:732,0:2.014e-03:732:323,0:404,0:367,365,0,0 0/1:921,28:0.029:949:414,16:505,12:443,478,14,14TP53 with 4% alternate bases: 21/556
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 190868_434_g1 190869_434_c1
17 7578402 . G C . PASS CONTQ=93;DP=1334;ECNT=2;GERMQ=93;MBQ=20,20;MFRL=180,128;MMQ=60,60;MPOS=22;NALOD=2.72;NLOD=155.59;POPAF=6.00;ROQ=59;SEQQ=93;STRANDQ=80;TLOD=29.01 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:727,0:1.891e-03:727:342,0:377,0:305,422,0,0 0/1:544,21:0.035:565:254,8:289,13:260,284,10,11TP53 with 3% alternate bases: 45/1310
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 191950_440_g1 191951_440_c1
17 7578413 . C T . PASS CONTQ=93;DP=2660;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=163,156;MMQ=60,60;MPOS=33;NALOD=2.90;NLOD=236.73;POPAF=6.00;ROQ=93;SEQQ=93;STRANDQ=93;TLOD=71.10 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:1283,0:1.259e-03:1283:556,0:710,0:589,694,0,0 0/1:1281,45:0.034:1326:594,19:675,26:627,654,21,24EGFR with 2% alternate bases: 12/537
7 55259515 . T G . PASS CONTQ=93;DP=950;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=167,159;MMQ=60,60;MPOS=22;NALOD=2.41;NLOD=74.82;POPAF=6.00;ROQ=1;SEQQ=65;STRANDQ=52;TLOD=15.13 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:382,0:3.889e-03:382:142,0:235,0:178,204,0,0 0/1:500,12:0.025:512:212,0:279,12:246,254,6,6
These are being picked up with force-active option. It was fairly quick for these few genes but when I do force-active on my entire bed file which is 600+ genes then it takes up to a day for some of the samples.
Looking forward to your opinion on these cases.
-
Saba As a first step, could you run Mutect2 and FilterMutectCalls around the variants in question with -force-active true and paste the corresponding VCF lines from FilterMutectCalls?
Also, could you report whether the variants are still missed with the -dont-use-soft-clipped-bases argument?
-
Hi David Benjamin, I have encountered the same issue that Saba describes in this thread with a couple of low VAF (< 10%) variants, and was wondering if we had any updates on this topic. Thanks.
-
David Benjamin, David Benjamin
Yeah, it would be really helpful getting response on this matter, please. I am still waiting for that.
-
Is there any update on this issue? Many thanks, Alastair
-
Saba I don't understand why these sites are not at least being found as active, and I will need bams containing a few of these variants to investigate. Guruprasad Ananda or Alastair Kerr if you have small bams to reproduce the error feel free to share as well.
-
Dear David Benjamin
I came up with similar hotspot mutation come-up failures in M2 even after force call is switched on.
The following is an example screenshot (EGFR exon 19 del15):
The followings are the paired bams with hs37d5.fa as reference. The sample names are cancer and normal, respectively.
- cancer bam https://www.dropbox.com/s/5ic6jxj3uxgr87m/egfr_del15_case.bam?dl=0
- normal bam https://www.dropbox.com/s/lz6iwfcwi2lb6kt/egfr_del15_ctrl.bam?dl=0
- interval
7 55242149 55242815
- M2 command
gatk Mutect2 \
-I egfr_del15_case.bam \
-I egfr_del15_ctrl.bam \
-R hs37d5.fa \
-normal normal \
-O egfr_forcecall.vcf.gz \
-L egfr.bed \
--force-active trueCheers,
Richard
-
Hi @David,
Please find the link with the bam files for a sample having missing KRAS (KRAS with 3% alternate bases: 40/1182 Alternate allele).
https://www.dropbox.com/sh/utp3wtpaf6amfys/AABv5TPRsN0rFewGe8um8p9Ka?dl=0
The rest of the information is in the above initial thread.
Best regards,
-
Hi David Benjamin, I have encountered the same issue that Saba describes in this thread with a couple of low VAF (2%) variants, and the raw data Quality value was good. Waiting for the result. Thanks.
-
Saba (Xin Zheng too) I fixed the issue that was losing your KRAS mutation in a branch and will post a link to the PR on this github issue page: https://github.com/broadinstitute/gatk/issues/6724 once I submit it. I expect that this will fix most other missed calls that could be restored with the -force-active option.
If anyone cares to try out the branch, it's in a public google bucket here: gs://broad-dsde-methods-davidben/gatk-builds/active-9-10-2020.jar
Thank you for your assistance resolving this issue.
-
Saba Xin Zheng Here's the pull request: https://github.com/broadinstitute/gatk/pull/6821
-
Thank you so much for fixing this issue. Much appreciated. I have used the google bucket jar to test my 4 samples where we knew this issue existed. Here are my results and then a question arising from the data:
1:
KRAS - Chr12: 25398284
40/1182
Picked up
2:
TP53 - chr17: 757840
21/556
Picked up
3:
TP53 - chr17: 7578413,
45/1310
Picked up
4:
EGFR – chr7: 55259515
12/537
Not Picked
5:
TP53 – chr17: 7578242
4/363
Not Picked
6:
PTPN11 - chr12: 112924336
5/647
When we do have alt alleles more than 20 then we can pick these previously missed mutations. But when the count for alt allelles is less than 20 at least in these examples, then we still don't pick these very low frequency ones. In a very old version of mutect2, we used to have an option to set this alt count by using the parameter (--unique-alt-read-count), which does not exist anymore.
So, what I did was to try to reduce (--initial-tumor-lod) from default 2.0 to 1.0. By doing so, I can pick up a missing (EGFR) in the active region which has 12/537 count but it gets filtered out due to orientation issues. However the other examples with 4 and 5 is not picked up at all with tumor-lod of 1.0. Can you please shed some light that how mutect2 is making decisions on choosing alt bases to incorporate in the active region and also if there is a way to set this alt count to a certain number of reads?
I will really appreciate your response on this.
Best regards,
-
In contrast to the above examples, I have a few examples which are picked up by mutect2 at a very very low alt count, i.e 2/8, 3/311, and 4/97.
Please see these:
HOXA13 - 4/97 0/0:240,0:5.648e-03:240:110,0:130,0:126,114,0,0 0/1:97,4:0.041:101:36,1:61,2:46,51,2,2 BRAF - 2/8 0|0:12,0:0.083:12:5,0:5,0:0|1:140624393_CG_C:140624393:8,4,0,0 0|1:8,2:0.200:10:3,0:5,2:0|1:140624393_CG_C:140624393:7,1,1,1 KMT2D - 3/311 0/0:280,0:5.151e-03:280:54,0:64,0:187,93,0,0 0/1:311,3:0.020:314:58,2:93,1:196,115,3,0 I am just struggling to understand that why do we see such cases sometimes and miss other times.
Note: these examples are run with default value of --initial-tumor-lod (2.0)
-
Saba Mutect2 doesn't use alt depth and allele fraction directly to decide whether a site is active. Rather, it uses a fast approximation of the somatic likelihoods model (the model and its approximation are described in sections 1B and 1E of our docs: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf). Base qualities in particular make a very big difference.
It is also possible that some sites are found in the panel of normals. By default Mutect2 considers these sites inactive and skips them, but you may use the -genotype-pon-sites option to treat them as active and defer all filtering to FilterMutectCalls.
If the panel of normals is not the issue you may lower -initial-tumor-lod even further, to 0 or even -1. However, I don't really see anything wrong with the default behavior now. For example, one site has 5 alt reads and 637 ref reads. Even if the base qualities are all quite high, say Q30, the p value (this is not to say that Mutect2 uses a p value, but it's a nice back-of-the-envelope illustration) comes out to 0.0005. That is, if we accepted things based on this much evidence we would expect more than one million false positives in a genome. On the other hand, if you want to use a lenient threshold just for known oncogenes a higher false positive rate may be tolerable.
-
Hi David-Benjamin I have read https://github.com/broadinstitute/gatk/pull/6821 and in use gatk-4.1.9.0, but there are still some sites that cannot be detected (KRAS-vaf 0.01). So I tested some parameters as below:
- 1 --active-probability-threshold (defaut 0.002)
- 2 -init-lod 2.0 -emit-lod 2.0
- 3 --force-active
if --active-probability-threshold = -1 and -init-lod = 2.0:
can be detected
if --active-probability-threshold = 0.002 or 0.00000001(Extremely positive number) and -init-lod = -1:
can not be detectedI guess the active-probability-threshold's priority is higher.
So what I want to ask is the value(negative value ?) of active-probability-threshold of this value being close to the function of another parameter(--force-active ) ?And do I need to adjust this parameter to be negative, because I know it's like you said may Increase false positives(https://gatk.broadinstitute.org/hc/en-us/community/posts/360062144952/comments/360012886431)
KRAS-vaf: 12 25398285 . C A . . AS_SB_TABLE=819,875|9,7;DP=1731;ECNT=1;MBQ=20,20;MFRL=165,151;MMQ=60,60;MPOS=35;POPAF=7.30;TLOD=18.04 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:1694,16:9.827e-03:1710:833,7:855,9:819,875,9,7).
Please sign in to leave a comment.
15 comments