Issue with RNA-seq Variant Calling Pipeline using GATK
Hello GATK community,
I'm encountering an issue while processing BAM files through an RNA-seq variant calling pipeline using GATK tools. I would greatly appreciate your insights and guidance in resolving this issue.
Issue:
During testing of the pipeline, I'm experiencing unexpected behavior at the apply_bqsr step. Although the split_reads process and base_recalibrator step seem to execute successfully, I encounter issues specifically when applying base quality score recalibration.
Pipeline Overview:
The pipeline follows these steps:
- MarkDuplicates
- SplitNCigarReads
- BaseRecalibrator
- ApplyBQSR
- HaplotypeCaller
- VariantFiltration
Observations:
- The mark_duplicates, split_ncigar_reads, and base_recalibrator steps appear to run without any errors.
- The split_reads.bam file appears to have been indexed by the split_reads step.
- However, during the apply_bqsr step, I encounter errors related to premature EOF: htsjdk.samtools.util.RuntimeEOFException: Premature EOF. Expected XXX but only received YYY; BinaryCodec in readmode; file: /path/in/container/8751-AM-0002_S1_L005/split_reads.bam
Additional Information:
- I'm using GATK version: 4.4.0.0
- Docker container: broadinstitute/gatk:latest
- BAM files are properly indexed before the apply_bqsr step.
- I have ensured that I'm using consistent GTF and genome versions.
- I've confirmed that my BAM files are intact and not corrupted.
- Each BAM file contains read group information.
- The reference genome matches the genome used for the analysis.
- This is on cancer RNA-seq data.
- I do not have tumor/normal pairs, so I am using the germline approach in GATK because the somatic approach needs tumor/normal pairs; am I correct in this?
Questions:
- What could be causing this premature EOF issue during the apply_bqsr step?
- The split_reads process and base_recalibrator seem to complete correctly. Why might applying BQSR encounter these errors?
- Are there specific considerations when running BQSR in an RNA-seq context that I should be aware of?
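A "Premature EOF" from htsjdk generally means the reader ran out of bytes before a record was complete, so one quick sanity check is whether the BAM still ends with the BGZF end-of-file marker. The following is only a minimal sketch in Python: the helper name `has_bgzf_eof` is hypothetical, and the 28-byte marker is the empty BGZF block given in the SAM/BAM specification. It detects a missing terminator only, not every kind of corruption.

```python
import os

# The 28-byte empty BGZF block that terminates a complete BAM file
# (byte values as listed in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        # Too short to contain the marker at all.
        return False
    with open(path, "rb") as handle:
        # Read exactly the last 28 bytes and compare them to the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling `has_bgzf_eof("split_reads.bam")` before and after a failing step may show whether the file was already truncated or became truncated during that step. `samtools quickcheck` performs a comparable check from the command line.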
Any insights or suggestions you can provide would be incredibly helpful in diagnosing and resolving this issue. If you need any further information or logs, please let me know, and I'll provide them promptly.
Thank you very much for your assistance.
apply_bqsr(split_reads_bam, recalibration_report)
23:16:06.353 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
23:16:06.394 INFO ApplyBQSR - ------------------------------------------------------------
23:16:06.398 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.4.0.0
23:16:06.398 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
23:16:06.398 INFO ApplyBQSR - Executing as root@7819211d41d4 on Linux v5.10.16.3-microsoft-standard-WSL2 amd64
23:16:06.398 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
23:16:06.398 INFO ApplyBQSR - Start Date/Time: August 29, 2023 at 11:16:06 PM GMT
23:16:06.398 INFO ApplyBQSR - ------------------------------------------------------------
23:16:06.399 INFO ApplyBQSR - ------------------------------------------------------------
23:16:06.400 INFO ApplyBQSR - HTSJDK Version: 3.0.5
23:16:06.400 INFO ApplyBQSR - Picard Version: 3.0.0
23:16:06.400 INFO ApplyBQSR - Built for Spark Version: 3.3.1
23:16:06.401 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:16:06.401 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:16:06.402 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:16:06.402 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:16:06.402 INFO ApplyBQSR - Deflater: IntelDeflater
23:16:06.402 INFO ApplyBQSR - Inflater: IntelInflater
23:16:06.403 INFO ApplyBQSR - GCS max retries/reopens: 20
23:16:06.403 INFO ApplyBQSR - Requester pays: disabled
23:16:06.403 INFO ApplyBQSR - Initializing engine
23:16:06.665 INFO ApplyBQSR - Done initializing engine
23:16:08.193 INFO ProgressMeter - Starting traversal
23:16:08.194 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
23:16:08.545 INFO ApplyBQSR - Shutting down engine
[August 29, 2023 at 11:16:08 PM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=285212672
htsjdk.samtools.util.RuntimeEOFException: Premature EOF. Expected 315 but only received 161; BinaryCodec in readmode; file: /path/in/container/8751-AM-0002_S1_L005/split_reads.bam
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:397)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:282)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:882)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:856)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:850)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:818)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:591)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:570)
    at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:27)
    at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:13)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:98)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gatk/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.4.0.0-local.jar ApplyBQSR -R /path/to/reference/GRCh38.p14.genome.fa -I /path/in/container/8751-AM-0002_S1_L005/split_reads.bam --bqsr-recal-file /path/in/container/8751-AM-0002_S1_L005/recalibration_report.txt -O /path/in/container/8751-AM-0002_S1_L005/split_reads.bam
Update a few hours later: After running ValidateSamFile from Picard, it seems that some reads are missing NM tags, and that counts as an error. I am adding those using samtools calmd, and will go from there.
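When ValidateSamFile prints many lines, tallying them by type can make it easier to see which problems dominate. The sketch below is a hedged Python example, not part of Picard: the function name `tally_validation_issues` is hypothetical, and it assumes VERBOSE-mode lines of the form `ERROR::TYPE:details` (with `WARNING::TYPE:details` assumed to follow the same pattern).

```python
import re
from collections import Counter

# Matches lines such as "ERROR::MATE_NOT_FOUND:Read name ..., Mate not found ..."
# The "SEVERITY::TYPE:" layout is an assumption about VERBOSE-mode output.
ISSUE_LINE = re.compile(r"^(ERROR|WARNING)::([A-Z0-9_]+):")


def tally_validation_issues(lines):
    """Count validation issues by (severity, type), ignoring other log lines."""
    counts = Counter()
    for line in lines:
        match = ISSUE_LINE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing the saved log, for example `tally_validation_issues(open("validate.log"))` with a hypothetical file name, returns a `Counter` keyed by severity and issue type. Picard stops reporting after MAX_OUTPUT issues, so the counts cover only the lines that were printed; running ValidateSamFile with MODE=SUMMARY gives a histogram directly.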
-
I have a question regarding your execution setup. Are you using SLURM or any other job scheduler to run your jobs? Are you using the Docker container runtime or Singularity?
If the answer to any of those is yes, then we can start with a few suggestions.
Sincerely.
-
Hi Gökalp, thanks for your response.
I am in fact using Docker, but not slurm or any other job scheduler. Here's an example of how I'm running the commands (from R):
# Function to execute the ApplyBQSR command
apply_bqsr <- function(split_reads_bam, recalibration_report) {
  apply_recalibration_command <- paste0(
    "docker run",
    " -v ", getwd(), ":/path/in/container",
    " -v /mnt/e/GenomesAndIndexes:/path/to/reference",
    " broadinstitute/gatk:latest",
    " gatk ApplyBQSR",
    " -R /path/to/reference/GRCh38.p14.genome.fa",
    " -I /path/in/container/", split_reads_bam,
    " --bqsr-recal-file /path/in/container/", recalibration_report,
    " -O /path/in/container/", split_reads_bam
  )
  system(apply_recalibration_command)
}
As an update, I read this thread: https://gatk.broadinstitute.org/hc/en-us/community/posts/1260803986229-ValiidateSamFile-Error-Mate-not-found and I think I'm having a similar issue. After alignment (with STAR), I added the NM tags with samtools calmd and the RG tags with Picard AddOrReplaceReadGroups; but then ValidateSamFile gives me many Mate Not Found errors instead of missing NM and RG tags. I think I need to include the RG tags in the alignment step itself, which I didn't realize I could do.
I am curious how running GATK within Docker might affect this? Thanks so much again.
Update: On a test BAM, after including the RG tags in the alignment step and then using samtools calmd for the NM tags, I am still getting MATE_NOT_FOUND errors with Picard ValidateSamFile.
Aug 30, 2023 9:04:24 PM com.intel.gkl.NativeLibraryLoader load
INFO: Loading libgkl_compression.so from jar:file:/usr/picard/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Aug 30 21:04:24 UTC 2023] ValidateSamFile INPUT=/data/8751-AM-0004_S1_L005/8751-AM-0004_S1_L005_Aligned.sortedByCoord.out.bam MODE=VERBOSE REFERENCE_SEQUENCE=/reference-genome.fasta MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Aug 30 21:04:24 UTC 2023] Executing as root@4e03cd952f31 on Linux 5.10.16.3-microsoft-standard-WSL2 amd64; OpenJDK 64-Bit Server VM 17.0.8+7; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: 3.1.0-2-g9206efcdc-SNAPSHOT
WARNING: BAM index file /data/8751-AM-0004_S1_L005/8751-AM-0004_S1_L005_Aligned.sortedByCoord.out.bam.bai is older than BAM /data/8751-AM-0004_S1_L005/8751-AM-0004_S1_L005_Aligned.sortedByCoord.out.bam
ERROR::INVALID_TAG_NM:Record 6065264, Read name A00252:321:HMW2TDSX5:1:2624:9236:29011, NM tag (nucleotide differences) in file [5] does not match reality [4]
INFO 2023-08-30 21:05:45 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:01:20s. Time for last 10,000,000: 77s. Last read position: chr3:113,810,169
INFO 2023-08-30 21:07:02 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:02:37s. Time for last 10,000,000: 77s. Last read position: chr7:82,332,874
INFO 2023-08-30 21:08:18 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:03:53s. Time for last 10,000,000: 75s. Last read position: chr11:65,504,524
INFO 2023-08-30 21:09:34 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:05:09s. Time for last 10,000,000: 76s. Last read position: chr15:65,769,146
INFO 2023-08-30 21:11:01 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:06:36s. Time for last 10,000,000: 87s. Last read position: chr22:39,358,074
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1142:5828:8797, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2648:3305:14418, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1134:31141:5008, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2137:22661:1658, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:2674:19542:27477, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2114:29921:10034, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1607:31901:31688, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1112:14326:28087, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1331:10890:3756, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1138:6677:31814, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1276:5466:33379, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2619:4463:29794, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2567:15411:36980, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2576:17770:22717, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:2377:7184:22795, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1576:6777:6026, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1561:12111:21527, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1152:32732:32189, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2446:6325:35837, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1626:30553:19335, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1613:11641:22279, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2120:5032:1251, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2240:12102:26083, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1551:3323:35837, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1616:24994:35008, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2450:1561:17628, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1265:7319:12352, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2527:4056:8954, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2550:32913:16188, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1661:12961:22717, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2421:29197:11224, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2625:27941:15562, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1538:21197:20040, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2464:15474:7936, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2428:14000:5541, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:2520:20229:23343, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1558:10239:13870, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1448:30165:21605, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2261:13404:16658, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1554:9263:33974, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:2124:16125:8437, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1672:24361:12023, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1131:22209:25426, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2625:25518:14403, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2377:11315:4460, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1375:5177:22514, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1142:13819:12931, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1240:1316:31297, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2647:27218:13714, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:2453:28067:15562, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1452:22516:14184, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2157:25816:14575, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1330:19587:27273, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1166:32371:3255, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1301:9570:24643, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1459:4707:18787, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1423:3757:4366, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1326:1271:9330, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2549:17020:2816, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1268:27724:28964, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1472:21956:13056, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1664:16595:24627, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2358:23466:22748, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2570:5909:5212, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2569:31539:24518, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:2529:11270:1846, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:2242:8169:15201, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2335:8323:18568, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2260:15157:21230, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2264:17698:1892, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1151:19361:24158, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1156:27859:35305, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2175:25599:21433, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2376:27968:32800, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2259:28348:19085, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2646:3522:33301, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1144:11948:11882, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:2143:31168:2018, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:2320:29053:28447, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1246:24017:31908, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1455:12391:25708, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1234:17481:36495, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2120:3314:16376, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1344:3965:25833, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1434:1615:35383, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2167:2754:32690, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1571:25563:20525, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2227:13711:30624, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:1:1216:16667:12602, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:2262:22616:13604, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2112:29532:4131, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1506:28194:2722, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2118:15103:10175, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1134:1886:21793, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:1566:21504:16501, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:1650:31756:9674, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:4:2166:22589:27211, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:3:2507:14751:13573, Mate not found for paired read
ERROR::MATE_NOT_FOUND:Read name A00252:321:HMW2TDSX5:2:1362:6542:32080, Mate not found for paired read
Maximum output of [100] errors reached.
[Wed Aug 30 21:11:57 UTC 2023] picard.sam.ValidateSamFile done. Elapsed time: 7.55 minutes.
Runtime.totalMemory()=2600468480
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Update 2: On that same test BAM, for which I am now getting the 'Mate Not Found' errors, I still get an error similar to the one I initially saw when running ApplyBQSR.
apply_bqsr(split_reads_bam, recalibration_report)
23:13:59.070 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
23:13:59.110 INFO ApplyBQSR - ------------------------------------------------------------
23:13:59.114 INFO ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.4.0.0
23:13:59.114 INFO ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/
23:13:59.114 INFO ApplyBQSR - Executing as root@d8da8438c99e on Linux v5.10.16.3-microsoft-standard-WSL2 amd64
23:13:59.114 INFO ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
23:13:59.114 INFO ApplyBQSR - Start Date/Time: August 30, 2023 at 11:13:59 PM GMT
23:13:59.115 INFO ApplyBQSR - ------------------------------------------------------------
23:13:59.115 INFO ApplyBQSR - ------------------------------------------------------------
23:13:59.116 INFO ApplyBQSR - HTSJDK Version: 3.0.5
23:13:59.116 INFO ApplyBQSR - Picard Version: 3.0.0
23:13:59.116 INFO ApplyBQSR - Built for Spark Version: 3.3.1
23:13:59.117 INFO ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:13:59.117 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:13:59.117 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:13:59.117 INFO ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:13:59.118 INFO ApplyBQSR - Deflater: IntelDeflater
23:13:59.118 INFO ApplyBQSR - Inflater: IntelInflater
23:13:59.118 INFO ApplyBQSR - GCS max retries/reopens: 20
23:13:59.118 INFO ApplyBQSR - Requester pays: disabled
23:13:59.119 INFO ApplyBQSR - Initializing engine
23:13:59.374 INFO ApplyBQSR - Done initializing engine
23:14:00.014 INFO ProgressMeter - Starting traversal
23:14:00.015 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
23:14:00.355 INFO ApplyBQSR - Shutting down engine
[August 30, 2023 at 11:14:00 PM GMT] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=285212672
htsjdk.samtools.util.RuntimeEOFException: Premature EOF. Expected 4 but only received 0; BinaryCodec in readmode; file: /path/in/container/8751-AM-0004_S1_L005/split_reads.bam
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:397)
    at htsjdk.samtools.util.BinaryCodec.readByteBuffer(BinaryCodec.java:507)
    at htsjdk.samtools.util.BinaryCodec.readInt(BinaryCodec.java:518)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:280)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:882)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:856)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:850)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:818)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:591)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:570)
    at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:27)
    at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:13)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:98)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gatk/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.4.0.0-local.jar ApplyBQSR -R /path/to/reference/GRCh38.p14.genome.fa -I /path/in/container/8751-AM-0004_S1_L005/split_reads.bam --bqsr-recal-file /path/in/container/8751-AM-0004_S1_L005/recalibration_report.txt -O /path/in/container/8751-AM-0004_S1_L005/split_reads.bam
-
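I noticed that your input and output files are the same file (both -I and -O point to split_reads.bam), so the tool cannot run properly. Because the output is opened for writing while the input is still being read, the input is overwritten mid-read, which would explain the premature EOF. You need to specify a new output file for the ApplyBQSR operation.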
I hope this helps.
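One way to avoid this class of mistake is to derive the output name from the input and refuse to run when the two coincide. The following is a minimal sketch in Python rather than the R wrapper used above; the function names and the `.recal` suffix are hypothetical choices, the ApplyBQSR flags are the ones from the command in this thread, and the Docker wrapping is left out for brevity.

```python
import os
import shlex


def recalibrated_name(input_bam, suffix=".recal"):
    """Derive an output name next to the input, e.g. a.bam -> a.recal.bam."""
    root, ext = os.path.splitext(input_bam)
    return f"{root}{suffix}{ext}"


def build_apply_bqsr_command(reference, input_bam, recal_table, output_bam=None):
    """Build the ApplyBQSR argument list, refusing to overwrite the input."""
    if output_bam is None:
        output_bam = recalibrated_name(input_bam)
    # Compare normalized paths so "./x.bam" and "x.bam" count as the same file.
    if os.path.normpath(input_bam) == os.path.normpath(output_bam):
        raise ValueError(f"-I and -O must differ, both are {input_bam}")
    return [
        "gatk", "ApplyBQSR",
        "-R", reference,
        "-I", input_bam,
        "--bqsr-recal-file", recal_table,
        "-O", output_bam,
    ]


if __name__ == "__main__":
    cmd = build_apply_bqsr_command(
        "/path/to/reference/GRCh38.p14.genome.fa",
        "/path/in/container/8751-AM-0004_S1_L005/split_reads.bam",
        "/path/in/container/8751-AM-0004_S1_L005/recalibration_report.txt",
    )
    # Print the command for inspection instead of executing it.
    print(shlex.join(cmd))
```

With the default suffix, the output in this example would be written to split_reads.recal.bam in the same directory, and that file (not split_reads.bam) is what the following HaplotypeCaller step would take as input.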