Multi-mapping reads causing errors in Mutect2 for RNAseq?
Hello all,
I am running Mutect2 (GATK version 4.1.4.0) using tumor BAM files from RNAseq aligned with STAR and matching germline from DNAseq (from the same sample).
I have pre-processed all the RNAseq BAM files as specified by GATK, yet when I run this Mutect2 command on the pre-processed data:
gatk Mutect2\
--reference input/ucsc.hg19.fasta\
--intervals tmp/${chr}_regions.list\
-I '${GERMLINE[@]}'\
-I '${SAMPLES[0]}'\
-I '${SAMPLES[1]}'\
-normal '${INPUT_PATIENT}'_BS_GL\
--independent-mates\
--tumor-lod-to-emit 2.0\
--f1r2-tar-gz tmp/'${INPUT_PATIENT}'_${chr}_f1r2.tar.gz\
--output tmp/'${INPUT_PATIENT}'_${chr}.vcf
I get the following error:
java.lang.IllegalArgumentException: Cannot construct fragment from more than two reads
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:725)
at org.broadinstitute.hellbender.utils.read.Fragment.create(Fragment.java:36)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1376)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.utils.genotyper.AlleleLikelihoods.groupEvidence(AlleleLikelihoods.java:595)
at org.broadinstitute.hellbender.tools.walkers.mutect.SomaticGenotypingEngine.callMutations(SomaticGenotypingEngine.java:93)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:251)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:320)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:308)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:281)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
I am aware that there is a solution for this here, and indeed, using the option '--independent-mates' seems to solve the issue. My question is, where is this error coming from? Is it inherent to RNAseq data? Is simply using '--independent-mates' the best course of action here?
If I have a look at the BAM files with samtools flagstat I get:
139395566 + 0 in total (QC-passed reads + QC-failed reads)
14875584 + 0 secondary
7662366 + 0 supplementary
33225475 + 0 duplicates
127939060 + 0 mapped (91.78% : N/A)
116857616 + 0 paired in sequencing
58428808 + 0 read1
58428808 + 0 read2
105401110 + 0 properly paired (90.20% : N/A)
105401110 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
So the duplicate reads have clearly been marked by MarkDuplicates. I suspect that the error might be caused by multi-mapping reads, resulting in the same read ID being more than once in the same BAM file. Is this correct? If so, do additional steps need to be taken in the pre-processing?
Thank you very much!
-
Hi ,
The GATK support team is focused on resolving errors and abnormal results from GATK tools. For all other questions, such as this one which is about understanding how the tool works, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply, however, we ask other community members to help out if you know the answer.
For context, check out our support policy.
-
This error can only happen when three or more reads have the same name. It can occur when a supplementary alignment falls very close to the primary alignment read pair, but if your samtools stats are to be believed it's happening for some other reason. We have often seen pipelines where it happens a handful of times in a bam.
Mutect2 now handles this by randomly choosing two reads rather than failing, but we haven't had a release since this change was merged. There should be a GATK minor release next week. Once that release comes out I would strongly recommend turning off the `independent-mates` argument, which tells Mutect2 to be ignorant of read-pairing during genotyping.
-
Good to hear, thank you!
I will try removing all multiple-mapping reads too, see if that solves the issue.
Please sign in to leave a comment.
3 comments