Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 Variant has very long ALT allele

8 comments

  • Gökalp Çelik

    Hi David Moline

    What kind of data are you working with: hybrid capture, whole genome, or amplicon?

  • David Moline

    Hi Gökalp Çelik,

    This is WES of a tumor sample. Let me know if you need more info!

    Thanks,

    David

  • David Moline

    Here is the Mutect2 output (no annotation)

    #CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
    chr13    32336895    .    G    GTCTGTTTCATGAAGTTCCTTAGTATTTCCTAAAGCAAGATTATTCCTTTCATTAGCTACTTGGAAGACAAAATTATTCTCATTGTCTGAGA    .    PASS    AS_FilterStatus=SITE;AS_SB_TABLE=132,140|0,4;DP=301;ECNT=1;GERMQ=88;MBQ=20,38;MFRL=186,146;MMQ=60,60;MPOS=65;POPAF=7.30;ROQ=93;TLOD=8.23    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:272,4:0.028:276:60,4:72,0:180,4:132,140,0,4
    chr13    32337144    .    A    AATTTTCATTTAAAGCACATACATCTTGATTCTTTTCCATGGGAATATTTTTGGTTAATTCAACATCAGATTCATAATTGTTACCTT    .    PASS    AS_FilterStatus=SITE;AS_SB_TABLE=161,145|0,14;DP=320;ECNT=1;GERMQ=38;MBQ=20,30;MFRL=173,146;MMQ=60,60;MPOS=12;POPAF=7.30;ROQ=93;TLOD=39.64    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:306,14:0.074:320:79,14:87,0:188,14:161,145,0,14
    chr13    32337995    .    G    GATAACAAATATACTGCTGCCAGTAGAAATTCTCATAACTTAGAATTTGATGGCAGTGATTCAAGTAAAAATGATACTGTT    .    PASS    AS_FilterStatus=SITE;AS_SB_TABLE=143,141|0,20;DP=304;ECNT=1;GERMQ=4;MBQ=20,39;MFRL=173,193;MMQ=60,60;MPOS=78;POPAF=7.30;ROQ=93;TLOD=60.70    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:284,20:0.104:304:61,9:83,8:184,20:143,141,0,20
    chr13    32340733    .    C    CCCCATGATTTAGTTGCCTTCCATCGTGGAAGGCAACTCCAAAGACACGCGGAGATTCCCAAGACACGTGGAGATTCTGGGAGCTACAGTTCAAGATG    .    PASS    AS_FilterStatus=SITE;AS_SB_TABLE=122,110|0,6;DP=238;ECNT=1;GERMQ=59;MBQ=20,40;MFRL=179,130;MMQ=60,60;MPOS=61;POPAF=7.30;ROQ=93;TLOD=14.47    GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:232,6:0.046:238:76,6:49,0:146,6:122,110,0,6
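
The four records above share a pattern: a long inserted ALT allele, a low allele fraction, and only a handful of supporting reads. A minimal Python sketch for pulling those numbers out of a single-sample Mutect2 record follows. The function names and the thresholds (`min_ins_len`, `max_af`) are illustrative assumptions, not GATK recommendations, and the parser assumes biallelic records like the ones shown here.

```python
# Sketch only: summarize one biallelic, single-sample VCF record and flag
# long, low-allele-fraction insertions. Thresholds are hypothetical.

def parse_record(line):
    """Split one whitespace-delimited VCF data line into the fields used here."""
    cols = line.split()
    chrom, pos, _id, ref, alt, _qual, flt, _info, fmt, sample = cols[:10]
    # Pair FORMAT keys (GT, AD, AF, ...) with the sample's values.
    sample_fields = dict(zip(fmt.split(":"), sample.split(":")))
    ref_reads, alt_reads = (int(x) for x in sample_fields["AD"].split(",")[:2])
    return {
        "chrom": chrom,
        "pos": int(pos),
        "filter": flt,
        "ins_len": len(alt) - len(ref),  # inserted bases relative to REF
        "ref_reads": ref_reads,
        "alt_reads": alt_reads,
        "af": float(sample_fields["AF"]),  # assumes a single ALT allele
    }


def is_suspicious(rec, min_ins_len=50, max_af=0.15):
    """True for a long insertion seen at a low allele fraction."""
    return rec["ins_len"] >= min_ins_len and rec["af"] <= max_af


# The chr13:32337995 record from this thread, rebuilt with tab separators.
EXAMPLE_LINE = "\t".join([
    "chr13", "32337995", ".", "G",
    "GATAACAAATATACTGCTGCCAGTAGAAATTCTCATAACTTAGAATTTGATGGCAGTGATTCAAGTAAAAATGATACTGTT",
    ".", "PASS",
    "AS_FilterStatus=SITE;AS_SB_TABLE=143,141|0,20;DP=304;ECNT=1;GERMQ=4;"
    "MBQ=20,39;MFRL=173,193;MMQ=60,60;MPOS=78;POPAF=7.30;ROQ=93;TLOD=60.70",
    "GT:AD:AF:DP:F1R2:F2R1:FAD:SB",
    "0/1:284,20:0.104:304:61,9:83,8:184,20:143,141,0,20",
])

rec = parse_record(EXAMPLE_LINE)
print(rec["chrom"], rec["pos"], rec["ins_len"], rec["alt_reads"], rec["af"],
      is_suspicious(rec))
```

A flag from a check like this only marks a record for manual review in IGV; it says nothing about whether the call is real.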
  • Gökalp Çelik

    Looking at those insertions with BLAST and BLAT, the first three definitely match BRCA2 RNA sequence, and the last one matches a region on chromosome 11. There could be several explanations for these calls. First, I need to emphasize that tumor-only variant calling with Mutect2 is prone to reporting artifacts as variants, since there is no matched normal to check concordance against. Second, whole exome sequencing still uses PCR to amplify captured fragments, and unwanted chimeric PCR products sometimes arise during this step and contaminate the reads. If your samples come from FFPE tissue, excessive DNA fragmentation makes this more likely than it would be with more intact DNA.

    To understand the issue better, we would like to see those variants. It would be great if you could share IGV views of those regions with soft-clipped bases shown. Alternatively, if possible, you could share a slice of your BAM file so that we can try to recreate those calls ourselves. If that is not possible, IGV views would be good to have.

    Besides, assuming those variants are genuinely present in that region, it is possible that a larger structural variation exists, which may require further assays and methods to confirm.

    Our suggestion would be to check the purity of your tumor samples and decide whether a particular variant has an acceptable MAF for that tumor purity. You may need to check the sensitivity for detecting certain allele fractions using the CollectHsMetrics tool and decide whether the low-AF variants you have are reliable. Of course, confirmation via orthogonal methods is always recommended; however, with allele fractions below 5% it may be especially hard to ensure the variant is really there. The real solution to this problem is a matched normal sequenced the same way as the tumor sample, but if one is not available there is not much more we can advise.
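
To get a feel for why allele fractions below 5% are hard to trust, here is a rough Python sketch of a plain binomial model: given a depth and a true allele fraction, how likely is it to sample at least a given number of ALT reads? This is a simplification and is not how CollectHsMetrics computes its theoretical sensitivity, which works from the observed depth and base quality distributions. The function name and the example inputs (depth 300, at least 5 ALT reads) are assumptions chosen for illustration.

```python
from math import comb

# Sketch only: probability of seeing >= min_alt_reads ALT reads when each of
# `depth` reads independently carries the variant with probability
# `allele_fraction`. Ignores sequencing error, mapping bias, and PCR duplicates.

def prob_at_least_k_alt_reads(depth, allele_fraction, min_alt_reads):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, allele_fraction)."""
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Hypothetical scenario: ~300x depth, requiring at least 5 supporting reads.
for af in (0.01, 0.03, 0.05, 0.10):
    p = prob_at_least_k_alt_reads(300, af, 5)
    print(f"AF={af:.2f}  P(>=5 ALT reads at 300x)={p:.3f}")
```

The model only describes sampling of a variant that is really there. It cannot tell a real low-AF variant from an artifact that happens to produce the same number of reads, which is why the matched normal matters.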

     

  • David Moline

    Hi Gökalp Çelik,

    Thank you for the in-depth response! I took some screenshots in IGV for each of the 4 sites. Under View > Preferences > Alignments > Show soft-clipped bases, I enabled the display of soft-clipped bases; not sure if that's what you meant. Each variant position is highlighted in red. Let me know if these are sufficient or if you'd like zoomed-in or zoomed-out screenshots.

     

    [IGV screenshot: chr13_32336895]
    [IGV screenshot: chr13_32337144]
    [IGV screenshot: chr13_32337995]
    [IGV screenshot: chr13_32340733]

     

    Let me know what you think!

    Hope all is well,

    David

  • Gökalp Çelik

    Hi David Moline

    It looks like all bases are highlighted, not just the soft-clipped ones, so it is hard to tell which ones we should focus on. Can you turn off the display of all bases in the reads?

    I have also contacted our team to provide additional opinions on this matter.

    Regards. 

  • David Moline

    Hi Gökalp Çelik,

    How do these look?

    [IGV screenshot: chr13_32336895]
    [IGV screenshot: chr13_32337144]
    [IGV screenshot: chr13_32337995]
    [IGV screenshot: chr13_32340733]

  • Gökalp Çelik

    Hi again. These images are much better. Looking at these sites, one thing that caught our attention is that none of the reads passing through the region spans the insertions called by Mutect2, which reduces the likelihood that these variants are real. Additionally, the reads that support these variants all seem to originate from the same molecule, so they are likely the result of PCR artifacts.

    If you wish to dig deeper into those sites, we recommend generating a bamout for those regions to see the haplotypes assembled by Mutect2 and checking whether any actual reads (not the synthetic ones) support any of those variants.

    I hope this helps. 
