Mutations in contiguous sites within a sample
Hi, I have followed the GATK somatic short variant discovery workflow, but I found that some variants are adjacent to each other, as shown below. Is there a reason why Mutect2 does not combine them?
Should I treat them as two mutations? For example, in M35, isn't 9683756 CC>TT the same mutation as 9683757 C>T? And in M48 (49041108 C>A and 49041109 C>A), should I change the pair into CC>AA?
And how would you deal with cases such as M46 (39727366 C>CAA and 39727367 TCC>T), where an insertion and a deletion are right next to each other? Or is it an indication that I might have done something wrong? These mutations represent less than 1% of the total number of mutations, but I just want to make sure I am doing the right thing.
Thank you for your help!
sampleID | chr | pos | ref | alt | GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB
M35 | 1 | 9683756 | CC | TT | 0|1:1585,67:0.042:1652:761,31:798,35:0|1:9683756_CC_TT:9683756:557,1028,19,48
M35 | 1 | 9683757 | C | T | 1|0:1553,90:0.031:1643:742,44:791,46:1|0:9683756_CC_TT:9683756:541,1012,30,60
M35 | 10 | 43114056 | G | GTT | 0|1:2377,87:0.038:2464:1178,42:1135,43:0|1:43114056_G_GTT:43114056:1227,1150,53,34
M35 | 10 | 43114057 | ACAC | A | 0|1:2441,88:0.038:2529:1196,44:1134,43:0|1:43114056_G_GTT:43114056:1250,1191,54,34
M45 | 14 | 22978099 | G | A | 0/1:324,5:0.022:329:147,3:157,2:217,107,4,1
M45 | 14 | 22978100 | A | AGGTGAGGAAGGAAGGAG | 0/1:204,86:0.297:290:104,26:92,43:135,69,55,31
M45 | 20 | 32434632 | C | T | 0|1:1726,46:0.027:1772:813,25:879,19:0|1:32434632_C_T:32434632:806,920,26,20
M45 | 20 | 32434633 | AT | A | 0|1:1726,46:0.027:1772:779,25:846,19:0|1:32434632_C_T:32434632:806,920,26,20
M45 | 4 | 186589050 | G | T | 0|1:2600,45:0.018:2645:1355,22:1233,20:0|1:186589050_G_T:186589050:1237,1363,22,23
M45 | 4 | 186589051 | AG | A | 0|1:2669,41:0.016:2710:1351,21:1234,20:0|1:186589050_G_T:186589050:1271,1398,21,20
M46 | 14 | 78835175 | G | A | 0/1:746,23:0.03:769:368,11:367,12:402,344,14,9
M46 | 14 | 78835176 | G | A | 0/1:737,21:0.029:758:364,13:366,8:396,341,10,11
M46 | 17 | 7674220 | C | G | 0|1:1670,102:0.059:1772:775,46:881,55:0|1:7674220_C_G:7674220:812,858,44,58
M46 | 17 | 7674221 | G | C | 1|0:1729,60:0.031:1789:812,28:906,31:1|0:7674220_C_G:7674220:847,882,28,32
M46 | 17 | 39727366 | C | CAA | 0|1:1098,15:0.015:1113:527,8:531,5:0|1:39727366_C_CAA:39727366:678,420,8,7
M46 | 17 | 39727367 | TCC | T | 0|1:1135,15:0.015:1150:526,10:533,5:0|1:39727366_C_CAA:39727366:681,454,8,7
M46 | 2 | 38986124 | T | TG | 0|1:2605,6:0.001323:2611:1205,3:1274,3:0|1:38986124_T_TG:38986124:983,1622,3,3
M46 | 2 | 38986125 | T | TAGGCTCTCTAGCTGGTAGAATAATCTGAGCTACAA | 0|1:2605,7:0.001325:2612:1162,3:1284,3:0|1:38986124_T_TG:38986124:983,1622,4,3
M47 | 4 | 186618715 | T | C | 0/1:2472,156:0.061:2628:1202,67:1215,87:1088,1384,83,73
M47 | 4 | 186618716 | C | T | 0/1:2571,53:0.02:2624:1222,25:1258,23:1150,1421,24,29
M47 | 9 | 132944745 | GG | AA | 0|1:976,9:0.011:985:443,4:511,4:0|1:132944745_GG_AA:132944745:442,534,3,6
M47 | 9 | 132944746 | G | A | 1|0:937,41:0.033:978:428,22:497,18:1|0:132944745_GG_AA:132944745:426,511,19,22
M48 | 12 | 49041108 | C | A | 0|1:1656,59:0.035:1715:806,30:840,29:0|1:49041108_C_A:49041108:727,929,24,35
M48 | 12 | 49041109 | C | A | 1|0:1674,28:0.015:1702:813,16:855,11:1|0:49041108_C_A:49041108:727,947,12,16
M51 | 11 | 24201469 | A | AGC | 0|1:18,3:0.179:21:8,3:10,0:0|1:24201469_A_AGC:24201469:2,16,0,3
M51 | 11 | 24201470 | A | ATTCTAGC | 0|1:18,3:0.179:21:8,3:9,0:0|1:24201469_A_AGC:24201469:2,16,0,3
M53 | 4 | 186706729 | C | CCA | 0|1:1941,107:0.053:2048:948,53:940,53:0|1:186706729_C_CCA:186706729:1143,798,70,37
M53 | 4 | 186706730 | ACGG | A | 0|1:2006,107:0.052:2113:953,52:938,52:0|1:186706729_C_CCA:186706729:1157,849,70,37
M54 | 17 | 12127139 | C | T | 0|0:630,0:0.001845:630:280,0:302,0:0|1:12127139_C_T:12127139:381,249,0,0
M54 | 17 | 12127140 | GT | G | 0|0:630,0:0.001814:630:298,0:308,0:0|1:12127139_C_T:12127139:381,249,0,0
REQUIRED for all errors and issues:
a) GATK version used: 4.1
b) Exact command used:
parallel --xapply -j 1 gatk Mutect2 \
-R ${ref} \
-
Thank you for your post, ashgorden! I want to let you know we have received your question and will be moving it to the Community Discussions -> General Discussion topic, as the Somatic topic is for reporting bugs and issues with GATK.
We'll get back to you if we have any updates or follow up questions. Please see our Support Policy for more details about how we prioritize responding to questions.
-
Hi ashgorden,
The reason that Mutect2 does not combine variants adjacent to each other is that most users prefer not to have MNPs in their VCF files. MNPs are much less commonly handled in downstream analysis and make that analysis more difficult. So we keep these records separate unless you change --max-mnp-distance.
The output you have shown here looks standard and fine to us; we don't see any problems with the format of these variant sites.
Let us know if you have any other questions.
Best,
Genevieve
-
Dear Genevieve and ashgorden,
I'm following this question since I was also curious about what's going on. Mutect2 DOES combine adjacent single-nucleotide substitutions, because --max-mnp-distance is 1 by default. I think the output above requires a bit more detailed explanation:
Most of the cases above, e.g. M54, are a consequence of the irreducible representation of indels chosen by Mutect2, either [N>N...N] or [N...N>N], plus some adjacent MNV. They can in principle be combined if the phasings (0|1 vs 1|0) of the indel and the MNV are the same.
The M35 chr1 variants have different phasing information (0|1 vs 1|0), so they represent two distinct variants and cannot be combined.
Cases where there is an insertion plus an adjacent deletion represent two distinct variants, as each of them is supported by its own set of reads.
My guess for why cases like M46 chr14 are not combined is that the phasing is uncertain (0/1), so that the first variant could be on one chromosome and the second on the other, leading to two distinct variants. They could also be the same and be an actual MNV if the phasing coincides, but we just don't know.
This is how I understand the annotations.
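The phasing comparison described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not how Mutect2 itself decides anything: the names Call, phase_info and compare_phase are hypothetical, and the sketch assumes that phased records carry the ten sample fields GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB while unphased records (0/1) omit PGT:PID:PS, as in the table in the question.

```python
from typing import NamedTuple, Optional, Tuple


class Call(NamedTuple):
    """One row of the table in the question (hypothetical container)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    sample: str  # colon-separated sample field, copied from the table


def phase_info(sample: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (PGT, PID), or (None, None) for an unphased record.

    Assumption: phased records have 10 fields with PGT and PID at
    indices 6 and 7; unphased records have only 7 fields.
    """
    fields = sample.split(":")
    if len(fields) == 10:
        return fields[6], fields[7]
    return None, None


def compare_phase(a: Call, b: Call) -> str:
    """Describe the phasing relationship of two adjacent records (a before b)."""
    # b must start inside a's reference span or directly after it.
    if a.chrom != b.chrom or not (a.pos < b.pos <= a.pos + len(a.ref)):
        return "not adjacent"
    pgt_a, pid_a = phase_info(a.sample)
    pgt_b, pid_b = phase_info(b.sample)
    if pid_a is None or pid_b is None:
        return "phase unknown"
    if pid_a != pid_b:
        return "different phase sets"
    return "same haplotype" if pgt_a == pgt_b else "different haplotypes"


# Three pairs taken from the table in the question.
m35_chr1 = (
    Call("1", 9683756, "CC", "TT",
         "0|1:1585,67:0.042:1652:761,31:798,35:0|1:9683756_CC_TT:9683756:557,1028,19,48"),
    Call("1", 9683757, "C", "T",
         "1|0:1553,90:0.031:1643:742,44:791,46:1|0:9683756_CC_TT:9683756:541,1012,30,60"),
)
m35_chr10 = (
    Call("10", 43114056, "G", "GTT",
         "0|1:2377,87:0.038:2464:1178,42:1135,43:0|1:43114056_G_GTT:43114056:1227,1150,53,34"),
    Call("10", 43114057, "ACAC", "A",
         "0|1:2441,88:0.038:2529:1196,44:1134,43:0|1:43114056_G_GTT:43114056:1250,1191,54,34"),
)
m46_chr14 = (
    Call("14", 78835175, "G", "A", "0/1:746,23:0.03:769:368,11:367,12:402,344,14,9"),
    Call("14", 78835176, "G", "A", "0/1:737,21:0.029:758:364,13:366,8:396,341,10,11"),
)

for name, pair in [("M35 chr1", m35_chr1), ("M35 chr10", m35_chr10), ("M46 chr14", m46_chr14)]:
    print(name, "->", compare_phase(*pair))
```

Under these assumptions, only pairs reported as "same haplotype" would be candidates for merging into a single complex variant; "different haplotypes" and "phase unknown" pairs would stay as separate records.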
Best,
Philipp
-
Thank you Philipp for taking the time and writing out this explanation! Complex sites like these are hard to represent.