error: read goes past end of reference
Description of the error
Running Mutect2 (somatic mutation calling) on WES data, I get the error "read goes past end of reference" on a few samples (a minority). WES data was mapped with BWA MEM v0.7.17-r1188 to HG38 following GATK4 best practices.
Cutting up the BAM file a few times with Sambamba, I was able to infer that this read is the culprit (the error gets thrown on a BAM file that contains only this read):
---
HWI-ST1444:95:C99A3ACXX:5:1203:5262:99610 99 chr13 100516763 60 7S47M46S = 100516763 47 GTGTGTGTGTGTGTAAAATTCAGGGGAGGACATCTTCCTTCTCCAGAGTTAGGGCTGTCTCTTATACACATCTCCGAGCCCACGAGACTCCTGAGCATCT >?E=I>G>H@I?I>>ABB@AHBHHHH@FGAFB@GEAGDEBHGIHCIAG@A@IIIIGI@IGIGBAA@GCHCAIGIH>@IHGGBHAAJAHFJGFKAJIB@JG MC:Z:53S47M MD:Z:47 RG:Z:1 NM:i:0 AS:i:47 XS:i:25
HWI-ST1444:95:C99A3ACXX:5:1203:5262:99610 147 chr13 100516763 60 53S47M = 100516763 -47 TACACCTCTCTATTCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGTGTGTGTGTGTAAAATTCAGGGGAGGACATCTTCCTTCTCCAGAGTTAGGG @@F?IJAJAJ@@CAAHAAGIKEH@GAIGIACHCHAAABGIGI@IGHGCGCGCGCGCGCG@AAA@BAHFGGGHFGH?H@AHBAHHA@G@HIGGEEABEGG? MC:Z:7S47M46S MD:Z:47 RG:Z:1 NM:i:0 AS:i:47 XS:i:25
---
It seems that line 699 of "https://github.com/broadinstitute/gatk/blob/master/src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java" throws the error. I do not understand why that is the case, though; the read seems fine to me.
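For anyone checking a similar read by hand, here is a minimal Python sketch (not GATK code) that derives the aligned and unclipped reference span from POS and CIGAR for the two records above. The function names are made up for illustration, and the chr13 length is the hg38 value as I recall it, so please confirm it against your own .fai index.

```python
import re

# One CIGAR element: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")    # operations that advance along the reference
QUERY_CONSUMING = set("MIS=X")  # operations that advance along the read


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_end(pos, cigar):
    """1-based inclusive end of the aligned part of the read."""
    span = sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def query_length(cigar):
    """Number of read bases implied by the CIGAR (should match len(SEQ))."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_CONSUMING)


def unclipped_span(pos, cigar):
    """Start and end the read would cover if its clipped bases were aligned."""
    ops = parse_cigar(cigar)
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n
    return pos - lead, reference_end(pos, cigar) + trail


# Assumption: hg38 chr13 length; verify against hg38.fa.fai before relying on it.
CHR13_LENGTH = 114_364_328

for cigar in ("7S47M46S", "53S47M"):
    start, end = unclipped_span(100516763, cigar)
    print(cigar, query_length(cigar), reference_end(100516763, cigar),
          start, end, end <= CHR13_LENGTH)
```

Both mates align 47 bases starting at 100516763, and even their soft-clipped ends stay far from the end of the contig, which is consistent with the read looking fine at the level of the whole chromosome.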
a) GATK version used
The Genome Analysis Toolkit (GATK) v4.1.4.1-74-g5887df8-SNAPSHOT
HTSJDK Version: 2.21.1
Picard Version: 2.21.7
b) Exact GATK commands used
gatk Mutect2 -R hg38.fa -I input.bam --germline-resource GATK_resource_bundle/Mutect2/af-only-gnomad.hg38.vcf.gz --tmp-dir . -O out.bam
c) The entire error log if applicable.
java.lang.IllegalArgumentException: read goes past end of reference
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:727)
at org.broadinstitute.hellbender.utils.read.AlignmentUtils.leftAlignIndels(AlignmentUtils.java:804)
at org.broadinstitute.hellbender.utils.read.AlignmentUtils.createReadAlignedToRef(AlignmentUtils.java:105)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.realignReadsToTheirBestHaplotype(AssemblyBasedCallerUtils.java:84)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2Engine.callRegion(Mutect2Engine.java:249)
at org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2.apply(Mutect2.java:299)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Thanks in advance,
Wout Megchelenbrink
-
Hi Wout Megchelenbrink, this may be a problem with your alignment. Are you making sure to use the same reference for all your samples? Did you get any warnings or errors while running BWA?
Just to make sure you are using the correct best practices, here is the link: https://gatk.broadinstitute.org/hc/en-us/articles/360035535912-Data-pre-processing-for-variant-discovery
-
Wout Megchelenbrink, this reminds me of an embarrassing bug that I was responsible for around the time of the 4.1.4 release and have since fixed. Please try the most recent GATK release, and let us know if the problem remains.
By the way, this read does not go anywhere near the end of the chr13 reference, as you know. The reference referred to in the error is the locally-copied chunk of the reference.
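To illustrate the point about the local chunk, here is a toy Python sketch. It is not the GATK implementation: the function name, the chunk coordinates, and the exact form of the check are all hypothetical. It only shows how a bounds check against a small copied window can fail for a read that is nowhere near the end of the chromosome.

```python
def check_read_within_chunk(chunk_start, chunk_length, read_start, read_span):
    """Return the read's offset inside a local reference chunk.

    Hypothetical check: the read must fit entirely inside the copied chunk,
    regardless of how long the full contig is.
    """
    offset = read_start - chunk_start
    if offset < 0 or offset + read_span > chunk_length:
        raise ValueError("read goes past end of reference")
    return offset


# Made-up chunk coordinates around the read from the original post
# (POS 100516763, 47 aligned bases).
print(check_read_within_chunk(100516700, 200, 100516763, 47))  # fits

try:
    # Same read, but the copied chunk is only 100 bases long.
    check_read_within_chunk(100516700, 100, 100516763, 47)
except ValueError as err:
    print("error:", err)
```

In this toy setup the same read passes or fails depending only on the size of the window, which is why the error message can be misleading when read as a statement about chr13 as a whole.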
-
Dear David,
Thanks for the advice. I downloaded the latest GATK version today and Mutect2 seems to run fine now. Although it has not finished the full analysis yet, it passed chr13 without problems this time.
I will come back to confirm the problem is solved when the pipeline is completely finished.
Best regards,
Wout
-
All jobs finished successfully using GATK 4.1.8.1, so the problem is solved.
Best,
Wout
-
I'm glad to hear that!