Mutect2 Variant has very long ALT allele
REQUIRED for all errors and issues:
a) GATK version used:
4.4.0.0
b) Exact command used:
I am running Mutect2 using the workflow nf-core/sarek. The main script and config file are found on github (linked below). The input data is tumor-only samples.
The reads undergo QC via FastQC (read checking) and FastP (read trimming). The reads are mapped to GRCh38 via BWA. Mutect2 is used for variant calling. The workflow runs Mutect2, LearnReadOrientationModel, GetPileupSummaries, CalculateContamination, and FilterMutectCalls. Each variant is annotated by VEP.
Links:
c) Entire program log:
N/A
I've noticed that some Mutect2 variant calls have a very long ALT allele that I am unsure how to interpret. Below is an example (sorry, the Alt/Ref columns are switched):
Clean VCF:
Chr Pos Alt Ref AF DP Filter Gene_Symbol
chr13 32336895 GTCTGTTTCATGAAGTTCCTTAGTATTTCCTAAAGCAAGATTATTCCTTTCATTAGCTACTTGGAAGACAAAATTATTCTCATTGTCTGAGA G 0.028 276 PASS BRCA2
chr13 32337144 AATTTTCATTTAAAGCACATACATCTTGATTCTTTTCCATGGGAATATTTTTGGTTAATTCAACATCAGATTCATAATTGTTACCTT A 0.074 320 PASS BRCA2
chr13 32337995 GATAACAAATATACTGCTGCCAGTAGAAATTCTCATAACTTAGAATTTGATGGCAGTGATTCAAGTAAAAATGATACTGTT G 0.104 304 PASS BRCA2
chr13 32340733 CCCCATGATTTAGTTGCCTTCCATCGTGGAAGGCAACTCCAAAGACACGCGGAGATTCCCAAGACACGTGGAGATTCTGGGAGCTACAGTTCAAGATG C 0.046 238 PASS BRCA2
Mutect2 with VEP Annotation VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE_NAME
chr13 32336895 . G GTCTGTTTCATGAAGTTCCTTAGTATTTCCTAAAGCAAGATTATTCCTTTCATTAGCTACTTGGAAGACAAAATTATTCTCATTGTCTGAGA . PASS AS_FilterStatus=SITE;AS_SB_TABLE=132,140|0,4;DP=301;ECNT=1;GERMQ=88;MBQ=20,38;MFRL=186,146;MMQ=60,60;MPOS=65;POPAF=7.30;ROQ=93;TLOD=8.23;CSQ=TCTGTTTCATGAAGTTCCTTAGTATTTCCTAAAGCAAGATTATTCCTTTCATTAGCTACTTGGAAGACAAAATTATTCTCATTGTCTGAGA|stop_gained&frameshift_variant|HIGH|BRCA2|ENSG00000139618|Transcript|ENST00000380152|protein_coding|11/27||||2739-2740/11954|2540-2541/10257|847/3418|R/SLFHEVP*YFLKQDYSFH*LLGRQNYSHCLRX|aga/agTCTGTTTCATGAAGTTCCTTAGTATTTCCTAAAGCAAGATTATTCCTTTCATTAGCTACTTGGAAGACAAAATTATTCTCATTGTCTGAGAa|||1||insertion|HGNC|HGNC:1101|YES|NM_000059.4||5|P2|CCDS9344.1|ENSP00000369497|P51587.235||UPI00001FCBCC||1|||PIRSF:PIRSF002397&PANTHER:PTHR11289|||||||||||||||||||||||||||||||||||||| GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:272,4:0.028:276:60,4:72,0:180,4:132,140,0,4
chr13 32337144 . A AATTTTCATTTAAAGCACATACATCTTGATTCTTTTCCATGGGAATATTTTTGGTTAATTCAACATCAGATTCATAATTGTTACCTT . PASS AS_FilterStatus=SITE;AS_SB_TABLE=161,145|0,14;DP=320;ECNT=1;GERMQ=38;MBQ=20,30;MFRL=173,146;MMQ=60,60;MPOS=12;POPAF=7.30;ROQ=93;TLOD=39.64;CSQ=ATTTTCATTTAAAGCACATACATCTTGATTCTTTTCCATGGGAATATTTTTGGTTAATTCAACATCAGATTCATAATTGTTACCTT|stop_gained&frameshift_variant|HIGH|BRCA2|ENSG00000139618|Transcript|ENST00000380152|protein_coding|11/27||||2988-2989/11954|2789-2790/10257|930/3418|Y/*FSFKAHTS*FFSMGIFLVNSTSDS*LLPX|tat/taATTTTCATTTAAAGCACATACATCTTGATTCTTTTCCATGGGAATATTTTTGGTTAATTCAACATCAGATTCATAATTGTTACCTTt|||1||insertion|HGNC|HGNC:1101|YES|NM_000059.4||5|P2|CCDS9344.1|ENSP00000369497|P51587.235||UPI00001FCBCC||1|||PIRSF:PIRSF002397&PANTHER:PTHR11289|||||||||||||||||||||||||||||||||||||| GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:306,14:0.074:320:79,14:87,0:188,14:161,145,0,14
chr13 32337995 . G GATAACAAATATACTGCTGCCAGTAGAAATTCTCATAACTTAGAATTTGATGGCAGTGATTCAAGTAAAAATGATACTGTT . PASS AS_FilterStatus=SITE;AS_SB_TABLE=143,141|0,20;DP=304;ECNT=1;GERMQ=4;MBQ=20,39;MFRL=173,193;MMQ=60,60;MPOS=78;POPAF=7.30;ROQ=93;TLOD=60.70;CSQ=ATAACAAATATACTGCTGCCAGTAGAAATTCTCATAACTTAGAATTTGATGGCAGTGATTCAAGTAAAAATGATACTGTT|frameshift_variant|HIGH|BRCA2|ENSG00000139618|Transcript|ENST00000380152|protein_coding|11/27||||3839-3840/11954|3640-3641/10257|1214/3418|V/DNKYTAASRNSHNLEFDGSDSSKNDTVX|gtg/gATAACAAATATACTGCTGCCAGTAGAAATTCTCATAACTTAGAATTTGATGGCAGTGATTCAAGTAAAAATGATACTGTTtg|CI991982||1||insertion|HGNC|HGNC:1101|YES|NM_000059.4||5|P2|CCDS9344.1|ENSP00000369497|P51587.235||UPI00001FCBCC||1|||PIRSF:PIRSF002397&PROSITE_profiles:PS50138&PANTHER:PTHR11289||||||||||||||||||||||||||||||||1|||||| GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:284,20:0.104:304:61,9:83,8:184,20:143,141,0,20
chr13 32340733 . C CCCCATGATTTAGTTGCCTTCCATCGTGGAAGGCAACTCCAAAGACACGCGGAGATTCCCAAGACACGTGGAGATTCTGGGAGCTACAGTTCAAGATG . PASS AS_FilterStatus=SITE;AS_SB_TABLE=122,110|0,6;DP=238;ECNT=1;GERMQ=59;MBQ=20,40;MFRL=179,130;MMQ=60,60;MPOS=61;POPAF=7.30;ROQ=93;TLOD=14.47;CSQ=CCCATGATTTAGTTGCCTTCCATCGTGGAAGGCAACTCCAAAGACACGCGGAGATTCCCAAGACACGTGGAGATTCTGGGAGCTACAGTTCAAGATG|stop_gained&frameshift_variant|HIGH|BRCA2|ENSG00000139618|Transcript|ENST00000380152|protein_coding|11/27||||6577-6578/11954|6378-6379/10257|2126-2127/3418|-/PMI*LPSIVEGNSKDTRRFPRHVEILGATVQDX|-/CCCATGATTTAGTTGCCTTCCATCGTGGAAGGCAACTCCAAAGACACGCGGAGATTCCCAAGACACGTGGAGATTCTGGGAGCTACAGTTCAAGATG|||1||insertion|HGNC|HGNC:1101|YES|NM_000059.4||5|P2|CCDS9344.1|ENSP00000369497|P51587.235||UPI00001FCBCC||1|||PIRSF:PIRSF002397|||||||||||||||||||||||||||||||||||||| GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:232,6:0.046:238:76,6:49,0:146,6:122,110,0,6
Note: This phenomenon is not limited to BRCA2
Questions:
- Why is Mutect2 calling such a long variant for the ALT allele?
- Are these variant calls legitimate?
- Is there a way to tell what kind of mutations these are (i.e., indel)? I assume this is an insertion or frameshift?
-
Hi David Moline
What kind of data are you working on? hybrid capture, whole genome or amplicon?
-
Hi Gökalp Çelik,
This is WES of a tumor sample. Let me know if you need more info!
Thanks,
David
-
Here is the Mutect2 output (no annotation)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr13 32336895 . G GTCTGTTTCATGAAGTTCCTTAGTATTTCCTAAAGCAAGATTATTCCTTTCATTAGCTACTTGGAAGACAAAATTATTCTCATTGTCTGAGA . PASS AS_FilterStatus=SITE;AS_SB_TABLE=132,140|0,4;DP=301;ECNT=1;GERMQ=88;MBQ=20,38;MFRL=186,146;MMQ=60,60;MPOS=65;POPAF=7.30;ROQ=93;TLOD=8.23 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:272,4:0.028:276:60,4:72,0:180,4:132,140,0,4
chr13 32337144 . A AATTTTCATTTAAAGCACATACATCTTGATTCTTTTCCATGGGAATATTTTTGGTTAATTCAACATCAGATTCATAATTGTTACCTT . PASS AS_FilterStatus=SITE;AS_SB_TABLE=161,145|0,14;DP=320;ECNT=1;GERMQ=38;MBQ=20,30;MFRL=173,146;MMQ=60,60;MPOS=12;POPAF=7.30;ROQ=93;TLOD=39.64 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:306,14:0.074:320:79,14:87,0:188,14:161,145,0,14
chr13 32337995 . G GATAACAAATATACTGCTGCCAGTAGAAATTCTCATAACTTAGAATTTGATGGCAGTGATTCAAGTAAAAATGATACTGTT . PASS AS_FilterStatus=SITE;AS_SB_TABLE=143,141|0,20;DP=304;ECNT=1;GERMQ=4;MBQ=20,39;MFRL=173,193;MMQ=60,60;MPOS=78;POPAF=7.30;ROQ=93;TLOD=60.70 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:284,20:0.104:304:61,9:83,8:184,20:143,141,0,20
chr13 32340733 . C CCCCATGATTTAGTTGCCTTCCATCGTGGAAGGCAACTCCAAAGACACGCGGAGATTCCCAAGACACGTGGAGATTCTGGGAGCTACAGTTCAAGATG . PASS AS_FilterStatus=SITE;AS_SB_TABLE=122,110|0,6;DP=238;ECNT=1;GERMQ=59;MBQ=20,40;MFRL=179,130;MMQ=60,60;MPOS=61;POPAF=7.30;ROQ=93;TLOD=14.47 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:232,6:0.046:238:76,6:49,0:146,6:122,110,0,6
-
Looking at those insertions via BLAST and BLAT, the first three definitely match BRCA2 RNA sequence, and the last one matches a region on chromosome 11. There could be several explanations for these calls, but first I need to emphasize that tumor-only variant calling with Mutect2 is especially prone to reporting artifacts as variants, since there is no matched normal to check concordance against. Second, whole exome sequencing still uses PCR to amplify captured fragments, and during amplification unwanted chimeric PCR products sometimes form and contaminate the reads. If your samples come from FFPE, the excessive DNA fragmentation makes this more likely than it would be with more intact DNA.
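On the question of what kind of mutation these records are, the REF and ALT columns already answer part of it. The following is a minimal Python sketch, not part of GATK or VEP; the function name `describe_allele` is made up for illustration, and it classifies by allele length alone, so it says nothing about whether the call is real.

```python
def describe_allele(ref: str, alt: str) -> dict:
    """Classify one REF/ALT pair using only the allele lengths."""
    delta = len(alt) - len(ref)
    if delta > 0:
        kind = "insertion"
    elif delta < 0:
        kind = "deletion"
    elif len(ref) == 1:
        kind = "SNV"
    else:
        kind = "MNV"
    # A length change that is not a multiple of 3 shifts the reading frame
    # when the variant falls inside coding sequence.
    return {"type": kind, "length_change": delta, "frameshift": delta % 3 != 0}


# First record from the thread: REF is a single G, ALT is G plus inserted bases.
alt = ("GTCTGTTTCATGAAGTTCCTTAGTATTTCCTAAAGCAAGATTATTCCTTTCATTAGCTACTTGG"
       "AAGACAAAATTATTCTCATTGTCTGAGA")
print(describe_allele("G", alt))
```

Because REF is one base and ALT starts with that same base, each of the four records is an insertion anchored at POS, which agrees with the `insertion` class VEP reports in the annotated VCF.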
To understand the issue better, we would like to see those variants. If you can share IGV views of those regions with soft-clipped bases shown, that would be great. Alternatively, if possible, you could share a slice of your BAM file so that we can check and try to recreate those calls ourselves. If that is not possible, IGV views would still be helpful.
Besides, assuming those variants are really present in that region, it is possible that a larger structural variation exists, which would require further assays and methods to confirm.
Our suggestion would be to check the purity of your tumor samples and decide whether each variant's allele fraction is plausible for that purity. You may want to check the theoretical sensitivity for detecting given allele fractions using the CollectHsMetrics tool and decide whether the low-AF variants you have are reliable. Confirmation by an orthogonal method is always recommended, but with allele fractions below 5% it may be especially hard to verify that the variant is really there. The real solution to this problem is a matched normal sequenced in the same way as the tumor sample; if one is not available, there is not much more to advise.
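To get a rough feel for why low allele fractions are hard to trust, here is a small Python sketch of an idealized binomial model. It is not what CollectHsMetrics computes; it assumes every read independently carries the variant with probability equal to the allele fraction and ignores sequencing error, mapping problems, and the caller's own thresholds. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(min_alt_reads: int, depth: int, allele_fraction: float) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, allele_fraction)."""
    below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - below


# Depth and allele fraction taken from the first record in the thread (DP=276, AF=0.028).
# The threshold of 4 alt reads matches the AD value reported for that record.
print(prob_at_least(4, 276, 0.028))
```

A number like this only says how often a true variant at that fraction would produce that many supporting reads; it does not distinguish a true variant from an artifact supported by the same number of reads.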
-
Hi Gökalp Çelik,
Thank you for the in-depth response! I took some screenshots in IGV for each of the 4 sites. Under View > Preferences > Alignments > Show soft-clipped bases I enabled the display of soft-clipped bases; not sure if that's what you meant. Each variant position is highlighted in red. Let me know if these are sufficient or if you'd like a zoomed in/out screenshot.
chr13_32336895
chr13_32337144
chr13_32337995
chr13_32340733
Let me know what you think!
Hope all is well,
David
-
Hi David Moline
Looks like all bases are highlighted, not just the soft-clipped ones, so it is hard to tell which ones we should focus on. Can you turn off displaying all bases in the reads?
I have also contacted our team to provide additional opinions on this matter.
Regards.
-
Hi Gökalp Çelik,
How do these look?
chr13_32336895
chr13_32337144
chr13_32337995
chr13_32340733
-
Hi again. These images are much better. Looking at these sites, one thing that caught our attention is that none of the reads passing through the region span the insertions found by Mutect2, which reduces the likelihood that these variants are real. Additionally, the reads that support these variants all seem to originate from the same molecule, so they are likely the result of PCR artifacts.
If you wish to dig deeper into those sites, we recommend generating a bamout for those regions to see the haplotypes assembled by Mutect2 and check whether any actual reads (not the synthetic ones) give good support for any of those variants.
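A complementary check that can be done from the VCF alone is to look at how the ALT-supporting reads are split between strands. The Python sketch below assumes the `SB` FORMAT field is ordered as ref-forward, ref-reverse, alt-forward, alt-reverse (consistent with the `AS_SB_TABLE` values in the records above); the function names are made up for illustration. Single-strand support is only a hint, not proof of an artifact.

```python
def alt_strand_counts(format_keys: str, sample_values: str) -> tuple[int, int]:
    """Return (alt_forward, alt_reverse) from the SB field of one sample column."""
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    # Assumed order: ref forward, ref reverse, alt forward, alt reverse.
    _ref_fwd, _ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["SB"].split(","))
    return alt_fwd, alt_rev


def single_strand_support(format_keys: str, sample_values: str) -> bool:
    """True when all ALT-supporting reads come from exactly one strand."""
    alt_fwd, alt_rev = alt_strand_counts(format_keys, sample_values)
    return (alt_fwd == 0) != (alt_rev == 0)


# FORMAT and sample columns of the first record in the thread.
keys = "GT:AD:AF:DP:F1R2:F2R1:FAD:SB"
values = "0/1:272,4:0.028:276:60,4:72,0:180,4:132,140,0,4"
print(alt_strand_counts(keys, values))      # (0, 4)
print(single_strand_support(keys, values))  # True
```

In the four records posted above, the last two `SB` values are 0,4, 0,14, 0,20 and 0,6, so under this assumed ordering every ALT-supporting read sits on one strand while the reference reads are roughly balanced.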
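As a starting point for the bamout run, here is a hedged Python sketch that assembles a Mutect2 command restricted to a small window around one site. The file names and the 300 bp padding are placeholders, not values from this thread, and the helper names are hypothetical. To reproduce the original calls, any extra arguments the nf-core/sarek run passed to Mutect2 (for example a germline resource or panel of normals) would need to be added as well.

```python
import shlex


def padded_interval(chrom: str, pos: int, pad: int = 300) -> str:
    """Build a chrom:start-end interval string centered on pos."""
    return f"{chrom}:{max(1, pos - pad)}-{pos + pad}"


def mutect2_bamout_command(reference: str, tumor_bam: str, interval: str,
                           out_vcf: str, out_bam: str) -> list[str]:
    """Assemble a tumor-only Mutect2 call that also writes the assembled haplotypes."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-L", interval,
        "-O", out_vcf,
        "-bamout", out_bam,  # reassembled reads and artificial haplotypes
    ]


# Placeholder paths; replace with the real reference and tumor BAM.
cmd = mutect2_bamout_command(
    "GRCh38.fasta", "tumor.bam",
    padded_interval("chr13", 32336895),
    "chr13_32336895.vcf.gz", "chr13_32336895.bamout.bam",
)
print(shlex.join(cmd))
```

The printed command can be run in a shell and the resulting bamout loaded in IGV next to the original BAM to compare the assembled haplotypes against the real reads.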
I hope this helps.