SAMException: Value was put into PairInfoMap more than once
Hi,
I have been running MarkDuplicates on data from a number of tumor samples, and I see an error message for some of my BAM inputs. I would be grateful if you could help me figure this out, as I have exhausted the resources in other forums, as well as the other two similar posts here on GATK, with no luck. I tried VALIDATION_STRINGENCY=LENIENT, though it did not resolve the issue.
Below is the information required. I also included the commands for the preceding steps for your information.
a) Picard version used:
2.26.3
b) Exact command used:
#realignment
bwa mem -M -t 4 $MY_ANALYSIS/$SAMPLE.fasta $MY_ANALYSIS/$SAMPLE.end1.fq $MY_ANALYSIS/$SAMPLE.end2.fq > $MY_ANALYSIS/$SAMPLE.aln.sam
#filter low mapping quality reads
samtools view -b -h -q 40 -o $MY_ANALYSIS/$SAMPLE.aln.bam $MY_ANALYSIS/$SAMPLE.aln.sam
#Sort and index the new bam file
samtools sort $MY_ANALYSIS/$SAMPLE.aln.bam > $MY_ANALYSIS/$SAMPLE.sorted.bam
samtools index $MY_ANALYSIS/$SAMPLE.sorted.bam
#Mark Duplicates
java -Xmx32g -XX:+UseSerialGC -jar /seq/software/picard-public/current/picard.jar MarkDuplicates I=$MY_ANALYSIS/$SAMPLE.sorted.bam O=$MY_ANALYSIS/$SAMPLE.deduped.bam M=$MY_ANALYSIS/$SAMPLE.dup_metrics.txt CREATE_INDEX=true REMOVE_DUPLICATES=true
c) Entire error log:
[main] CMD: bwa mem -M -t 4 /xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.fasta /xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.end1.fq /xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.end2.fq
[main] Real time: 12034.447 sec; CPU: 12032.787 sec
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
INFO 2021-10-16 07:10:23 MarkDuplicates
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** MarkDuplicates -I /xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.sorted.bam -O /xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.deduped.bam -M /xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.dup_metrics.txt -CREATE_INDEX true -REMOVE_DUPLICATES true
**********
07:10:24.734 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/seq/software/picard-public/2.26.3/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Oct 16 07:10:24 UTC 2021] MarkDuplicates INPUT=[/xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.sorted.bam] OUTPUT=/xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.deduped.bam METRICS_FILE=/xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.dup_metrics.txt REMOVE_DUPLICATES=true CREATE_INDEX=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat Oct 16 07:10:24 UTC 2021] Executing as mtanhaem@uger-c026.broadinstitute.org on Linux 3.10.0-1160.15.2.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.3
INFO 2021-10-16 07:10:24 MarkDuplicates Start of doWork freeMemory: 2052923368; totalMemory: 2076049408; maxMemory: 33214431232
INFO 2021-10-16 07:10:24 MarkDuplicates Reading input file and constructing read end information.
INFO 2021-10-16 07:10:24 MarkDuplicates Will retain up to 120342142 data points before spilling to disk.
INFO 2021-10-16 07:10:34 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:08s. Time for last 1,000,000: 8s. Last read position: chr3:175,143,875
INFO 2021-10-16 07:10:34 MarkDuplicates Tracking 11781 as yet unmatched pairs. 1102 records in RAM.
INFO 2021-10-16 07:10:40 MarkDuplicates Read 2,000,000 records. Elapsed time: 00:00:14s. Time for last 1,000,000: 5s. Last read position: chr10:63,826,756
INFO 2021-10-16 07:10:40 MarkDuplicates Tracking 23187 as yet unmatched pairs. 1799 records in RAM.
INFO 2021-10-16 07:10:45 MarkDuplicates Read 3,000,000 records. Elapsed time: 00:00:18s. Time for last 1,000,000: 4s. Last read position: chr13:41,166,179
INFO 2021-10-16 07:10:45 MarkDuplicates Tracking 31222 as yet unmatched pairs. 1713 records in RAM.
INFO 2021-10-16 07:10:50 MarkDuplicates Read 4,000,000 records. Elapsed time: 00:00:23s. Time for last 1,000,000: 4s. Last read position: chr17:35,540,735
INFO 2021-10-16 07:10:50 MarkDuplicates Tracking 40027 as yet unmatched pairs. 1424 records in RAM.
INFO 2021-10-16 07:10:55 MarkDuplicates Read 5,000,000 records. Elapsed time: 00:00:29s. Time for last 1,000,000: 5s. Last read position: chrX:39,909,983
INFO 2021-10-16 07:10:55 MarkDuplicates Tracking 48495 as yet unmatched pairs. 1557 records in RAM.
[Sat Oct 16 07:10:56 UTC 2021] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.53 minutes.
Runtime.totalMemory()=3317641216
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 24: RGHGLNHCCX2210714:8:1110:13707:58040
at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:558)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:258)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
[E::hts_open_format] Failed to open file /xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.deduped.bam
samtools view: failed to open "/xchip/cromptonlab/MoTanha/PSTquant_runs/runs_directory/18138-016M3-r_FLI-EWS_234-234/18138-016M3-r.deduped.bam" for reading: No such file or directory
----------
I also ran ValidateSamFile on the BAM file. Setting MODE=SUMMARY produced the same error message:
ValidateSamFile Value was put into PairInfoMap more than once. 24: HGLNHCCX2210714:8:1110:13707:58040
Running in MODE=VERBOSE (warnings off) reported these error types for at least 100 records: MISSING_READ_GROUP, MISMATCH_MATE_ALIGNMENT_START, MISMATCH_FLAG_MATE_NEG_STRAND, MISMATCH_MATE_CIGAR_STRING
I also tried FixMateInformation in the hope that it would resolve this, but I get the following error:
Exception in thread "main" htsjdk.samtools.SAMException: Found two records that are paired, not supplementary, and first of the pair: HGL2LCCX2210714:1:1101:2727:49285
at htsjdk.samtools.SamPairUtil$SetMateInfoIterator.advance(SamPairUtil.java:454)
at htsjdk.samtools.SamPairUtil$SetMateInfoIterator.next(SamPairUtil.java:501)
at htsjdk.samtools.SamPairUtil$SetMateInfoIterator.next(SamPairUtil.java:388)
at picard.sam.FixMateInformation.doWork(FixMateInformation.java:224)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
I would be grateful if you could help me understand what I can do to resolve these errors. If these errors do not point to an important issue, how can I dismiss them via Picard other than changing VALIDATION_STRINGENCY?
Thank you in advance.
Best,
Mo
-
Try adding read groups to your bam file. We have an article that gives an overview of how to check for read groups in your file: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups.
You can use the GATK tool AddOrReplaceReadGroups to add the read groups: https://gatk.broadinstitute.org/hc/en-us/articles/360035532352-Errors-about-read-group-RG-information
Best,
Genevieve
-
Hi Genevieve,
Thank you for your response. I tried your suggestion. Unfortunately, adding read group information to my BAM file did not solve the problem, and I still get the exact same exception. I run the same commands on other samples, and even without read group information they do not produce the SAM exception that I get here. Are there any other possible strategies to tackle this?
Best,
Mo
-
Thanks for the follow-up information. I'll help you figure out what is actually causing the issue. One note: read groups are required by GATK, so if your files do not have any read group information, you will get an error message at some point with GATK.
I was hoping that your issue had to do with read groups because the other reasons you would get this error message are more complicated issues with your reads. The last time this was posted on the forum, the user ended up finding that there was a problem in the demultiplexing step: https://gatk.broadinstitute.org/hc/en-us/community/posts/360071534472-SAMException-Value-was-put-into-PairInfoMap-error-even-after-using-M-flag.
I also found many helpful posts on Biostars and our legacy forum. Please take a look to determine which of the read issues could be happening in your file:
- https://www.biostars.org/p/60263/
- https://www.biostars.org/p/242269/
- https://www.biostars.org/p/176814/
- https://github.com/broadinstitute/picard/issues/1148
- https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2016-08-11-2016-04-07/7431-MarkDuplicates-error-Value-was-put-into-PairInfoMap-more-than-once
- https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2018-04-11-2017-12-02/11736-Markduplicate-error-on-gatk4
- https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2017-12-02-2017-06-19/10115-picard-markdup-errorValue-was-put-into-PairInfoMap-more-than-once
Please let me know if any of these solutions work for you and if you have further questions.
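As a starting point for that diagnosis, here is a minimal Python sketch (not part of Picard or GATK) that lists read names whose first or second end occurs more than once among the paired, primary records. That appears to be the situation the FixMateInformation message "Found two records that are paired, not supplementary, and first of the pair" describes. The function name `repeated_primary_ends` is hypothetical, and the sketch assumes it is given SAM text lines, such as the output of `samtools view`.

```python
from collections import Counter

# SAM flag bits, as defined in the SAM specification.
PAIRED = 0x1
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def repeated_primary_ends(sam_lines):
    """Return {(read_name, end): count} for pair ends seen more than once.

    Only paired, primary records are counted. Header lines and
    secondary/supplementary alignments are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header line
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED or flag & (SECONDARY | SUPPLEMENTARY):
            continue
        # Records without the first-in-pair bit are treated as end 2.
        end = 1 if flag & FIRST_IN_PAIR else 2
        counts[(name, end)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

For example, the output of `samtools view` on the sorted BAM could be piped into a small script that calls `repeated_primary_ends(sys.stdin)`. Any read name it reports is a candidate to trace back to the FASTQ files.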
Best,
Genevieve
-
Thank you so much for the links, Genevieve. These were really helpful and I was able to find a solution for my case.
It looks like our FASTQ files contained some reads twice. I checked the *.end1.fq and *.end2.fq files and found that the same reads appear twice in each file. I guess this led to those reads being improperly paired, so their SAM flags lack the 0x2 (properly paired) bit. I removed those reads by keeping only properly paired reads with
samtools view -f 0x2
which solved the problem. Of course, this assumes that removing those reads does not cause much information loss.
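For anyone who wants to check their own FASTQ files for the same problem, below is a rough Python sketch that lists read names occurring more than once in a single file. It assumes an uncompressed FASTQ file with exactly four lines per record. The function name `duplicated_fastq_names` is made up for illustration and is not part of samtools or Picard.

```python
from collections import Counter


def duplicated_fastq_names(path):
    """Return {read_name: count} for names that occur more than once.

    Assumes a plain-text FASTQ file with four lines per record.
    """
    counts = Counter()
    with open(path) as handle:
        for i, line in enumerate(handle):
            # Every fourth line, starting at the first, is a record header.
            if i % 4 == 0 and line.strip():
                # Drop the leading '@' and any comment after whitespace.
                fields = line[1:].split()
                if fields:
                    counts[fields[0]] += 1
    return {name: n for name, n in counts.items() if n > 1}
```

A non-empty result for both the *.end1.fq and *.end2.fq files would point to the kind of duplication described above.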
Thank you again for all the help and support.
Best,
Mo
-
Hi Mo,
Glad to hear that you were able to find and solve the problem! Thanks for posting the solution, I'm sure it will be helpful to users in the future.
Best,
Genevieve