SAMException: Value was put into PairInfoMap error even after using -M flag
Hi everyone,
I am facing a nontrivial issue at the MarkDuplicates step in all 14 pairs of samples (cancer samples). I keep getting the PairInfoMap error in every sample for some unknown reason. I referred to this blog of yours and used the -M flag in bwa mem, but without success. I use the following command:
```
bwa mem -t 20 -M -R @RG\tID:HNJ53DSXX:1\tSM:13KIINT_S7\tPL:ILLUMINA /home/rohit.igib/raid_drive/GRCh38_GATK/Homo_sapiens_assembly38.fasta /home/rohit.igib/raid_drive/correct/13KIINT_S7_L001_R1_001.fastq.gz /home/rohit.igib/raid_drive/correct/13KIINT_S7_L001_R2_001.fastq.gz
```
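As an aside on shell quoting: the `-R` string above is unquoted, so the shell splits it at the `\t` sequences instead of passing them to bwa. A minimal sketch of the quoting that keeps the read-group string intact (reference and FASTQ paths in the commented line are placeholders):

```shell
# Quote the -R string so the shell passes the \t separators to bwa
# literally rather than splitting the argument into several words.
RG='@RG\tID:HNJ53DSXX:1\tSM:13KIINT_S7\tPL:ILLUMINA'
printf '%s\n' "$RG"
# bwa mem -t 20 -M -R "$RG" Homo_sapiens_assembly38.fasta R1.fastq.gz R2.fastq.gz > aligned.sam
```

With single quotes the `\t` sequences reach bwa unexpanded, which is what bwa expects in a `-R` argument.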
Running ValidateSamFile produced the same ERROR as shown below. To be transparent, the FASTQ files I received were out of order as well, so I used BBMap repair.sh to reorder the reads. I don't know what other feasible approaches exist to remove such reads without aligning.
Aligning, running ValidateSamFile on all 14 pairs, and then pulling out the read pairs causing the issue and removing them from the BAMs sounds cumbersome. Is there a way to remove such reads directly from the FASTQ files? We don't want people to face similar issues when we make the data public.
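One way to check a FASTQ directly, before any alignment, is to look for read names that occur more than once within a single file, since that is what trips the PairInfoMap. A sketch (toy inline data here; for real gzipped files, replace the printf with `zcat reads.fastq.gz`):

```shell
# List read names that appear more than once in a FASTQ stream.
# Every 4th line starting at line 1 is a read-name line.
printf '@readA\nACGT\n+\nFFFF\n@readA\nACGT\n+\nFFFF\n@readB\nTTTT\n+\nFFFF\n' \
  | awk 'NR % 4 == 1' | sort | uniq -d
```

If this prints any names, those reads are duplicated in the file itself and can be filtered out before alignment.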
Tools and Versions
a) Picard Version: 2.23.0
b) Bwa Version: 0.7.17-r1188
Error output:
```
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 1: RGA00804:41:HNJ53DSXX:1:1459:5620:27915
	at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
	at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
	at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
	at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:559)
	at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
	at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
	at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
```
-
Hi rohit satyam, it looks like your BWA command is incorrect and is not adding read groups properly. You could also look at this tool to make sure your read groups are added properly: https://gatk.broadinstitute.org/hc/en-us/articles/360045800972-FastqToSam-Picard-. A uBAM can be used instead of FASTQ, as shown in the pipeline here: https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-
-
My BWA command ran successfully on 9 pairs of the data without any error. The new GATK preprocessing pipeline mentioned here and the one you mentioned should give practically the same output.
-
Hi rohit satyam, is your MarkDuplicates command working now? If not, even though BWA does not report an error, you may need to edit the command so that the read groups are set correctly.
-
Here is some more documentation about read groups: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
We also provide a tool that can add or edit your read groups: https://gatk.broadinstitute.org/hc/en-us/articles/360046223331-AddOrReplaceReadGroups-Picard- It may be useful to look into, because it seems like your PairInfoMap error is a result of the read groups not being set correctly.
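For reference, a hedged sketch of what an AddOrReplaceReadGroups invocation might look like for this sample; the BAM file names and the RGLB/RGPU values below are placeholders, not values from this thread (printed here rather than executed, since it requires picard.jar):

```shell
# Placeholder Picard command assembled for illustration only.
CMD='java -jar picard.jar AddOrReplaceReadGroups I=input.bam O=with_rg.bam RGID=HNJ53DSXX:1 RGSM=13KIINT_S7 RGPL=ILLUMINA RGLB=lib1 RGPU=HNJ53DSXX.1'
printf '%s\n' "$CMD"
```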
-
Hi Genevieve-Brandt-she-her, thanks for your input. We suspected something fishy with the demultiplexing step, and it turned out to be true. We had the data demultiplexed again and now it works fine. Thanks for the support.