Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

artifacts called by Mutect2 on RNASeq data

0

3 comments

  • Avatar
    David Benjamin

    Unfortunately we don't have much experience with RNA calling but it looks to me like the reads are not getting split properly.  I mean, if Mutect2 is calling a read that spans your two exons against the full reference containing introns, it doesn't know anything about RNA and simply sees a read that goes a few bases past the end of the first exon into the intron and has no choice but to call a SNP and an indel.

    I am not an expert but I think that somewhere the pipeline ought to be splitting that read into a chunk along the first exon and a chunk along the second.

    0
    Comment actions Permalink
  • Avatar
    Alba Mas Malavila

    Hi David,

    thanks for your reply. I agree that it looks like the reads are not getting split properly, but I actually perform a step to split the exons after alignment with GATK4 SplitNCigarReads, as recommended in GATK Best Practices. So what I give as input to Mutect2 is the recalibrated CRAM obtained after alignment with STAR, removal of Ns with SplitNCigarReads and base quality score recalibration, and, as you can see in the second track of the IGV image, the exons are split properly.

    Looking at GATK documentation on how variant calling works, I guess that the errors occur during Mutect's reassembly of the active regions. I wonder if this could be optimized by modifying parameters such as -kmerSize or -minPruning, but I'm not sure how this is going to impact the variant calling accuracy. Do you have any suggestions on how to modify these parameters?

    Thank you very much,

    Alba

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    Hi Alba,

    We don't have nearly enough experience with RNA but I think the best way to get a few guesses would be to generate a bamout via the -bamout bamout.bam argument, and then post an IGV screenshot of the realigned reads in bamout.bam.  This way we will see how parts of reads are getting clipped and moved around.

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk