Genome Analysis Toolkit

Alba Mas Malavila · December 11, 2023 15:24

Hi,

I'm using Mutect2 v4.4.0.0 to call variants on RNAseq data without a matched reference. I'm following the GATK best practices for RNAseq short variant discovery (https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-), but using Mutect2 instead of HaplotypeCaller to call somatic variants from tumor samples. Looking at IGV, most of the variants seem to have been called properly (you can see them in both the BAM and the VCF), but several variants are called outside of the aligned regions (705 PASS variants out of 6508 total PASS variants). If I keep only SNPs with a depth of 10 or higher, only 7 variants are called outside of the aligned regions. All of them are located very close to an aligned region, and I've noticed that in most cases the sequence where they are located matches the first 3 or 4 bp of the next aligned region, so I guess that is where the error comes from.

I attach a picture to show what I mean. In the image, a substitution is detected at the first base of the intron, which is a G in the reference but a C in the sample. The first 4 bases of the next exon are the same as the first 4 bases of the intron, except for the G, which is a C in the exon. Three bases upstream, Mutect2 detects a large insertion, which maps perfectly onto the next exon.
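For anyone who wants to reproduce the count of variants falling outside the aligned regions, here is a minimal sketch in plain Python. It is not part of the GATK pipeline. It assumes that depth is read from the `DP` key of the VCF INFO column, that aligned blocks are derived from each read's SAM `POS` and CIGAR string (with `N` marking the spliced-out intron), and that deleted bases (`D`) count as not covered. The function names `aligned_blocks` and `uncovered_pass_snps` are hypothetical helpers made up for this illustration.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive reference intervals covered by a read.

    `pos` is the 1-based leftmost position (SAM POS). Only M, = and X
    produce covered bases. N (intron skip) and D advance along the
    reference without coverage (an assumption of this sketch).
    """
    blocks = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            blocks.append((ref, ref + length - 1))
            ref += length
        elif op in "DN":
            ref += length
        # I, S, H and P do not consume reference bases.
    return blocks


def is_covered(pos, blocks):
    """True if `pos` falls inside any aligned block."""
    return any(start <= pos <= end for start, end in blocks)


def uncovered_pass_snps(vcf_lines, blocks_by_chrom, min_depth=10):
    """List (chrom, pos) of PASS SNPs with INFO DP >= min_depth that
    lie outside every aligned block on their chromosome."""
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, _qual, flt, info = line.rstrip("\n").split("\t")[:8]
        if flt != "PASS":
            continue
        # Keep SNPs only: single-base REF and single-base ALT alleles.
        if len(ref) != 1 or any(len(a) != 1 for a in alt.split(",")):
            continue
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        if int(info_map.get("DP", 0)) < min_depth:
            continue
        if not is_covered(int(pos), blocks_by_chrom.get(chrom, [])):
            hits.append((chrom, int(pos)))
    return hits
```

With a single spliced read at position 100 and CIGAR `50M200N50M`, a PASS SNP at position 150 (the first intronic base) would be reported, while one at position 120 would not. On real data the blocks would have to be collected from every read in the BAM, for example with a BAM-reading library, which this sketch leaves out.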

Have you seen this type of error when performing variant calling with Mutect2 on RNAseq data? Do you have any suggestions on how it could be avoided?

Thank you very much,

Alba


artifacts called by Mutect2 on RNASeq data
