Value was put into PairInfoMap more than once
Hello everyone,
a) GATK version used: 2.18.12
b) Exact command used: I am running sarek, and MarkDuplicates is one step of that pipeline. This is the command I got from the error log:
Command executed:

  gatk --java-options "-Xmx49152M -XX:-UsePerfData" \
      MarkDuplicates \
      --INPUT 55_BoM-lane_2.0008.bam --INPUT 55_BoM-lane_2.0007.bam --INPUT 55_BoM-lane_2.0001.bam --INPUT 55_BoM-lane_2.0002.bam --INPUT 55_BoM-lane_2.0009.bam --INPUT 55_BoM-lane_2.0010.bam --INPUT 55_BoM-lane_2.0004.bam --INPUT 55_BoM-lane_2.0005.bam --INPUT 55_BoM-lane_2.0012.bam --INPUT 55_BoM-lane_2.0006.bam --INPUT 55_BoM-lane_2.0011.bam --INPUT 55_BoM-lane_1.0005.bam --INPUT 55_BoM-lane_1.0003.bam --INPUT 55_BoM-lane_2.0003.bam --INPUT 55_BoM-lane_1.0001.bam --INPUT 55_BoM-lane_1.0004.bam --INPUT 55_BoM-lane_1.0012.bam --INPUT 55_BoM-lane_1.0002.bam --INPUT 55_BoM-lane_1.0008.bam --INPUT 55_BoM-lane_1.0010.bam --INPUT 55_BoM-lane_1.0006.bam --INPUT 55_BoM-lane_1.0011.bam --INPUT 55_BoM-lane_1.0007.bam --INPUT 55_BoM-lane_1.0009.bam \
      --OUTPUT 55_BoM.md.bam \
      --METRICS_FILE 55_BoM.md.cram.metrics \
      --TMP_DIR . \
      --REFERENCE_SEQUENCE Homo_sapiens_assembly38.fasta \
      -REMOVE_DUPLICATES false -VALIDATION_STRINGENCY LENIENT

  # If cram files are wished as output, the run samtools for conversion
  if [[ 55_BoM.md.cram == *.cram ]]; then
      samtools view -Ch -T Homo_sapiens_assembly38.fasta -o 55_BoM.md.cram 55_BoM.md.bam
      rm 55_BoM.md.bam
      samtools index 55_BoM.md.cram
  fi

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS
c) Entire program log:
executor > slurm (498)
[- ] NFC…EPARE_GENOME:BWAMEM1_INDEX -
[- ] NFC…EPARE_GENOME:BWAMEM2_INDEX -
[- ] NFC…E_GENOME:DRAGMAP_HASHTABLE -
[- ] NFC…4_CREATESEQUENCEDICTIONARY -
[- ] NFC…E_GENOME:MSISENSORPRO_SCAN -
[- ] NFC…PARE_GENOME:SAMTOOLS_FAIDX -
[- ] NFC…PREPARE_GENOME:TABIX_DBSNP -
[- ] NFC…ME:TABIX_GERMLINE_RESOURCE -
[- ] NFC…RE_GENOME:TABIX_KNOWN_SNPS -
[- ] NFC…_GENOME:TABIX_KNOWN_INDELS -
[- ] NFC…K:PREPARE_GENOME:TABIX_PON -
[- ] NFC…EPARE_GENOME:UNZIP_ALLELES -
[- ] NFC…:PREPARE_GENOME:UNZIP_LOCI -
[- ] NFC…EK:PREPARE_GENOME:UNZIP_GC -
[- ] NFC…EK:PREPARE_GENOME:UNZIP_RT -
[4d/3a3143] NFC…s_Standard_modified_2.bed) | 1 of 1 ✔
[a4/d5d5b3] NFC…PLIT (X_44343546-44343600) | 343 of 343 ✔
[ef/7e073c] NFC…rgets_Standard_modified_2) | 1 of 1 ✔
[- ] NFC…NPUT:SAMTOOLS_VIEW_MAP_MAP -
[- ] NFC…:SAMTOOLS_VIEW_UNMAP_UNMAP -
[85/7739af] NFC…REK:FASTQC (53_BoM-lane_2) | 10 of 10 ✔
[6b/7a5686] NFC…AREK:FASTP (53_Org-lane_2) | 10 of 10 ✔
[e9/f85258] NFC…WAMEM1_MEM (53_Org-lane_2) | 120 of 120 ✔
[5b/a75a9f] NFC…K4_MARKDUPLICATES (53_Org) | 13 of 16, failed: 13, retries: 11
Plus 31 more processes waiting for tasks…
-[nf-core/sarek] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (55_BoM)'

Caused by:
  Process `NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (55_BoM)` terminated with an error exit status (3)

Command executed:

  gatk --java-options "-Xmx49152M -XX:-UsePerfData" \
      MarkDuplicates \
      --INPUT 55_BoM-lane_2.0008.bam --INPUT 55_BoM-lane_2.0007.bam --INPUT 55_BoM-lane_2.0001.bam --INPUT 55_BoM-lane_2.0002.bam --INPUT 55_BoM-lane_2.0009.bam --INPUT 55_BoM-lane_2.0010.bam --INPUT 55_BoM-lane_2.0004.bam --INPUT 55_BoM-lane_2.0005.bam --INPUT 55_BoM-lane_2.0012.bam --INPUT 55_BoM-lane_2.0006.bam --INPUT 55_BoM-lane_2.0011.bam --INPUT 55_BoM-lane_1.0005.bam --INPUT 55_BoM-lane_1.0003.bam --INPUT 55_BoM-lane_2.0003.bam --INPUT 55_BoM-lane_1.0001.bam --INPUT 55_BoM-lane_1.0004.bam --INPUT 55_BoM-lane_1.0012.bam --INPUT 55_BoM-lane_1.0002.bam --INPUT 55_BoM-lane_1.0008.bam --INPUT 55_BoM-lane_1.0010.bam --INPUT 55_BoM-lane_1.0006.bam --INPUT 55_BoM-lane_1.0011.bam --INPUT 55_BoM-lane_1.0007.bam --INPUT 55_BoM-lane_1.0009.bam \
      --OUTPUT 55_BoM.md.bam \
      --METRICS_FILE 55_BoM.md.cram.metrics \
      --TMP_DIR . \
      --REFERENCE_SEQUENCE Homo_sapiens_assembly38.fasta \
      -REMOVE_DUPLICATES false -VALIDATION_STRINGENCY LENIENT

  # If cram files are wished as output, the run samtools for conversion
  if [[ 55_BoM.md.cram == *.cram ]]; then
      samtools view -Ch -T Homo_sapiens_assembly38.fasta -o 55_BoM.md.cram 55_BoM.md.bam
      rm 55_BoM.md.bam
      samtools index 55_BoM.md.cram
  fi

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES":
      gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
      samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
  END_VERSIONS

Command exit status:
  3

Command output:
  (empty)

Command error:
  00:37:02.004 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
  [Wed Nov 06 00:37:02 GMT 2024] MarkDuplicates --INPUT 55_BoM-lane_2.0008.bam --INPUT 55_BoM-lane_2.0007.bam --INPUT 55_BoM-lane_2.0001.bam --INPUT 55_BoM-lane_2.0002.bam --INPUT 55_BoM-lane_2.0009.bam --INPUT 55_BoM-lane_2.0010.bam --INPUT 55_BoM-lane_2.0004.bam --INPUT 55_BoM-lane_2.0005.bam --INPUT 55_BoM-lane_2.0012.bam --INPUT 55_BoM-lane_2.0006.bam --INPUT 55_BoM-lane_2.0011.bam --INPUT 55_BoM-lane_1.0005.bam --INPUT 55_BoM-lane_1.0003.bam --INPUT 55_BoM-lane_2.0003.bam --INPUT 55_BoM-lane_1.0001.bam --INPUT 55_BoM-lane_1.0004.bam --INPUT 55_BoM-lane_1.0012.bam --INPUT 55_BoM-lane_1.0002.bam --INPUT 55_BoM-lane_1.0008.bam --INPUT 55_BoM-lane_1.0010.bam --INPUT 55_BoM-lane_1.0006.bam --INPUT 55_BoM-lane_1.0011.bam --INPUT 55_BoM-lane_1.0007.bam --INPUT 55_BoM-lane_1.0009.bam --OUTPUT 55_BoM.md.bam --METRICS_FILE 55_BoM.md.cram.metrics --REMOVE_DUPLICATES false --TMP_DIR . --VALIDATION_STRINGENCY LENIENT --REFERENCE_SEQUENCE Homo_sapiens_assembly38.fasta --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --FLOW_MODE false --FLOW_QUALITY_SUM_STRATEGY false --USE_END_IN_UNPAIRED_READS false --USE_UNPAIRED_CLIPPED_END false --UNPAIRED_END_UNCERTAINTY 0 --FLOW_SKIP_FIRST_N_FLOWS 0 --FLOW_Q_IS_KNOWN_END false --FLOW_EFFECTIVE_QUALITY_THRESHOLD 15 --ADD_PG_TAG_TO_READS true --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
  [Wed Nov 06 00:37:02 GMT 2024] Executing as huy45@htc-n43.crc.pitt.edu on Linux 3.10.0-1160.71.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.3-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.4.0.0
  INFO 2024-11-06 00:37:02 MarkDuplicates Start of doWork freeMemory: 105152752; totalMemory: 134217728; maxMemory: 51539607552
  INFO 2024-11-06 00:37:02 MarkDuplicates Reading input file and constructing read end information.
  INFO 2024-11-06 00:37:02 MarkDuplicates Will retain up to 186737708 data points before spilling to disk.
  INFO 2024-11-06 00:37:13 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:06s. Time for last 1,000,000: 6s. Last read position: chr1:22,514,371
  INFO 2024-11-06 00:37:13 MarkDuplicates Tracking 31280 as yet unmatched pairs. 0 records in RAM.
  INFO 2024-11-06 00:37:18 MarkDuplicates Read 2,000,000 records. Elapsed time: 00:00:12s. Time for last 1,000,000: 5s. Last read position: chr1:43,172,857
  INFO 2024-11-06 00:37:18 MarkDuplicates Tracking 61438 as yet unmatched pairs. 0 records in RAM.
  INFO 2024-11-06 00:37:23 MarkDuplicates Read 3,000,000 records. Elapsed time: 00:00:17s. Time for last 1,000,000: 5s. Last read position: chr1:77,566,058
  INFO 2024-11-06 00:37:23 MarkDuplicates Tracking 100124 as yet unmatched pairs. 0 records in RAM.
  INFO 2024-11-06 00:37:28 MarkDuplicates Read 4,000,000 records. Elapsed time: 00:00:22s. Time for last 1,000,000: 5s. Last read position: chr1:113,585,391
  INFO 2024-11-06 00:37:28 MarkDuplicates Tracking 141362 as yet unmatched pairs. 0 records in RAM.
  INFO 2024-11-06 00:37:34 MarkDuplicates Read 5,000,000 records. Elapsed time: 00:00:28s. Time for last 1,000,000: 5s. Last read position: chr1:151,287,316
  INFO 2024-11-06 00:37:34 MarkDuplicates Tracking 191840 as yet unmatched pairs. 0 records in RAM.
  INFO 2024-11-06 00:37:40 MarkDuplicates Read 6,000,000 records. Elapsed time: 00:00:33s. Time for last 1,000,000: 5s. Last read position: chr1:162,398,986
  INFO 2024-11-06 00:37:40 MarkDuplicates Tracking 220052 as yet unmatched pairs. 0 records in RAM.
  INFO 2024-11-06 00:37:44 MarkDuplicates Read 7,000,000 records. Elapsed time: 00:00:38s. Time for last 1,000,000: 4s. Last read position: chr1:192,659,177
  INFO 2024-11-06 00:37:44 MarkDuplicates Tracking 260584 as yet unmatched pairs. 0 records in RAM.
  INFO 2024-11-06 00:37:50 MarkDuplicates Read 8,000,000 records. Elapsed time: 00:00:44s. Time for last 1,000,000: 5s. Last read position: chr1:215,674,602
  INFO 2024-11-06 00:37:50 MarkDuplicates Tracking 296360 as yet unmatched pairs. 0 records in RAM.
  INFO 2024-11-06 00:37:56 MarkDuplicates Read 9,000,000 records. Elapsed time: 00:00:49s. Time for last 1,000,000: 5s. Last read position: chr1:244,995,544
  INFO 2024-11-06 00:37:56 MarkDuplicates Tracking 334792 as yet unmatched pairs. 0 records in RAM.
  [Wed Nov 06 00:37:56 GMT 2024] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.91 minutes.
  Runtime.totalMemory()=11844714496
  To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
  htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 1: RGA00364:92:HKLK7DRXX:2:1166:17644:18865
      at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
      at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
      at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
      at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:560)
      at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:270)
      at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:289)
      at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
      at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
      at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
      at org.broadinstitute.hellbender.Main.main(Main.java:289)

Work dir:
  /sarek/work/f5/52b3e1603b07f6131c512785d563ac

Container:
  /sarek/work/singularity/quay.io-biocontainers-mulled-v2-d9e7bad0f7fbc8f4458d5c3ab7ffaaf0235b59fb-f857e2d6cc88d35580d01cf39e0959a68b83c1d9-0.img

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
I understand that this issue has been posted many times, but after trying several of the suggested fixes I still get the same error. I have run MergeBamAlignment after FastqToSam, and I have also tried AddOrReplaceReadGroups. I checked that my FASTQ files do not contain duplicate reads. I then checked my BAM files and removed the duplicate reads I found there using samtools. Running those deduplicated BAMs through this workflow still produces the same error. I am running out of ideas and would appreciate any input on how to solve this. Thanks in advance.
-
Hi Will Ye
It looks like your inputs contain the very same read twice (same read name, same mapped coordinate). Please check your input BAMs for this case: the read may be present in more than one BAM, or more than once within a single BAM.
Unfortunately, we do not support third-party tools and workflows such as sarek, but you may wish to run the command on its own, outside the workflow, to see if you can find the violating alignment record.
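One way to look for such a record is sketched below in Python. This is an illustrative helper, not part of GATK or Picard: the function name `find_repeated_reads` and the choice of key (read group from the `RG:Z` tag, read name, and first/second-in-pair flag) are assumptions about what counts as "the same read", and secondary and supplementary alignments are skipped because one read can legitimately have several of those. It reads plain SAM text, such as the output of `samtools view`.

```python
import sys
from collections import defaultdict

# SAM FLAG bits (from the SAM specification)
FIRST_IN_PAIR = 0x40
LAST_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def find_repeated_reads(sam_lines):
    """Return {(read_group, read_name, mate): [(chrom, pos), ...]} for every
    primary alignment key that occurs more than once in the SAM text lines."""
    seen = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # a read may have several of these, so ignore them
        # Read group ID from the optional RG:Z tag, or None if the tag is absent
        read_group = next((f[5:] for f in fields[11:] if f.startswith("RG:Z:")), None)
        mate = 1 if flag & FIRST_IN_PAIR else 2 if flag & LAST_IN_PAIR else 0
        seen[(read_group, name, mate)].append((chrom, pos))
    return {key: places for key, places in seen.items() if len(places) > 1}


# Only runs when at least one argument is given: SAM text files, or "-" for stdin.
if __name__ == "__main__" and len(sys.argv) > 1:
    def _lines(paths):
        for path in paths:
            if path == "-":
                yield from sys.stdin
            else:
                with open(path) as handle:
                    yield from handle

    for (rg, name, mate), places in find_repeated_reads(_lines(sys.argv[1:])).items():
        print(f"RG={rg}\t{name}\tmate={mate}\t{places}")
```

For example, if the script is saved under a name of your choosing such as `find_repeated_reads.py`, you could stream every input BAM through it with `for f in 55_BoM-lane_*.bam; do samtools view "$f"; done | python find_repeated_reads.py -`. All keys are held in memory, so for very large inputs it may be more practical to restrict `samtools view` to one chromosome at a time.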
I hope this helps.