Issue with MergeBamAlignment
Hello. I'm running into a persistent issue with Picard's MergeBamAlignment when running the GATK Best Practices pre-processing pipeline on targeted sequencing data. The tool fails with the following error:
java.lang.IllegalStateException: Aligned record iterator (A00685:197:HH2FVDSX3:3:1101:10004:11835) is behind the unmapped reads (A00685:197:HH2FVDSX3:3:1101:10004:11835 BC:Z:AGTCCTTC+TAGGACTC ZA:Z:GCGT ZB:Z:TGGT RX:Z:GCG-TGG QX:Z:FFF)
at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:557)
at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:366)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
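Roughly speaking, MergeBamAlignment walks both inputs in query-name order and expects each aligned record to pair up with the current unmapped record. The sketch below uses the two names quoted in the error message and assumes plain lexicographic string ordering (the same rule as Java's String.compareTo; whether Picard's comparator adds further rules is not shown here). Under that assumption the names are not equal, and the aligned name sorts first because it is a prefix of the unmapped one, which is consistent with the "behind" wording.

```python
# Read names copied from the error message above.
ALIGNED = "A00685:197:HH2FVDSX3:3:1101:10004:11835"
UNMAPPED = ("A00685:197:HH2FVDSX3:3:1101:10004:11835 BC:Z:AGTCCTTC+TAGGACTC "
            "ZA:Z:GCGT ZB:Z:TGGT RX:Z:GCG-TGG QX:Z:FFF")


def compare_names(aligned: str, unmapped: str) -> str:
    """Classify two read names as a query-name-ordered merge would see them.

    Assumption: plain lexicographic ordering, under which a proper prefix
    sorts before the longer string.
    """
    if aligned == unmapped:
        return "match"
    return "aligned behind" if aligned < unmapped else "aligned ahead"


print(compare_names(ALIGNED, UNMAPPED))  # aligned behind
```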
I've seen that several other people have reported this same issue previously (https://gatk.broadinstitute.org/hc/en-us/community/posts/360067295232-mergeBam-picard-issue, https://github.com/broadinstitute/picard/issues/1689, https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2019-02-11-2018-08-12/13322-picard-merge-bam-and-ubam-error) and have tried the solutions recommended there. However, nothing has worked so far.
The read in question is present in both the unmapped BAM and the aligned (unmerged) BAM:
samtools view A006850197_171828_S28_L003.unmapped.bam | grep "A00685:197:HH2FVDSX3:3:1101:10004:11835"
A00685:197:HH2FVDSX3:3:1101:10004:11835 BC:Z:AGTCCTTC+TAGGACTC ZA:Z:GCGT ZB:Z:TGGT RX:Z:GCG-TGG QX:Z:FFF 77 * 0 0 * * 0 0 GATGGAGCAAGAGCAGACTATTTACCGCAGGGTCTTGCCAGTCGACTACCTTTGCTTCTTAACACGGGACTTGGGCACTCCTGAATGCCAGAGCTCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF RG:Z:HH2FVDSX3.3
A00685:197:HH2FVDSX3:3:1101:10004:11835 BC:Z:AGTCCTTC+TAGGACTC ZA:Z:GCGT ZB:Z:TGGT RX:Z:GCG-TGG QX:Z:FFF 141 * 0 0 * * 0 0 ATCGACGCTGAGATGGATGCTTTGAGGCAGGGCAAGGAGCTCTGGCATTCAGGAGTGCCCAAGTCCCGTGTTAAGAAGCAAAGGTAGTCGACTGGCA FFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF RG:Z:HH2FVDSX3.3
samtools view A006850197_171828_S28_L003.unmerged.bam | grep "A00685:197:HH2FVDSX3:3:1101:10004:11835"
A00685:197:HH2FVDSX3:3:1101:10004:11835 99 chr10 95931061 60 97M = 95931096 132 GATGGAGCAAGAGCAGACTATTTACCGCAGGGTCTTGCCAGTCGACTACCTTTGCTTCTTAACACGGGACTTGGGCACTCCTGAATGCCAGAGCTCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:97 MC:Z:97M AS:i:97 XS:i:0
A00685:197:HH2FVDSX3:3:1101:10004:11835 147 chr10 95931096 60 97M = 95931061 -132 TGCCAGTCGACTACCTTTGCTTCTTAACACGGGACTTGGGCACTCCTGAATGCCAGAGCTCCTTGCCCTGCCTCAAAGCATCCATCTCAGCGTCGAT FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFF NM:i:0 MD:Z:97 MC:Z:97M AS:i:97 XS:i:0
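As a sanity check that these really are the same pair, the sketch below copies the SEQ fields from the output above and undoes the reverse-complementing that SAM applies to reverse-strand alignments (flag bit 0x10, set in 147 but not in 99). Both mates then match the unmapped records exactly, which suggests the read name, not the reads themselves, is where the two files differ.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence made of A/C/G/T/N."""
    return seq.translate(COMPLEMENT)[::-1]


def as_sequenced(seq: str, flag: int) -> str:
    """Return the read in sequencing orientation, undoing 0x10 if set."""
    return reverse_complement(seq) if flag & 0x10 else seq


# SEQ fields copied from the samtools output above.
unmapped_r1 = "GATGGAGCAAGAGCAGACTATTTACCGCAGGGTCTTGCCAGTCGACTACCTTTGCTTCTTAACACGGGACTTGGGCACTCCTGAATGCCAGAGCTCC"  # flag 77
unmapped_r2 = "ATCGACGCTGAGATGGATGCTTTGAGGCAGGGCAAGGAGCTCTGGCATTCAGGAGTGCCCAAGTCCCGTGTTAAGAAGCAAAGGTAGTCGACTGGCA"  # flag 141
aligned_r1 = "GATGGAGCAAGAGCAGACTATTTACCGCAGGGTCTTGCCAGTCGACTACCTTTGCTTCTTAACACGGGACTTGGGCACTCCTGAATGCCAGAGCTCC"  # flag 99
aligned_r2 = "TGCCAGTCGACTACCTTTGCTTCTTAACACGGGACTTGGGCACTCCTGAATGCCAGAGCTCCTTGCCCTGCCTCAAAGCATCCATCTCAGCGTCGAT"  # flag 147

print(as_sequenced(aligned_r1, 99) == unmapped_r1)   # True
print(as_sequenced(aligned_r2, 147) == unmapped_r2)  # True
```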
I've tried re-running MergeBamAlignment after sorting one or both files by query name with SortSam (-SO queryname); every combination produced the same error as above.
After that, I ran ValidateSamFile and got the following:
gatk ValidateSamFile -I A006850197_171828_S28_L003.unmapped.bam
Using GATK jar /cm/shared/apps/GenomeAnalysisTk/4.2.4.0/gatk-package-4.2.4.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /cm/shared/apps/GenomeAnalysisTk/4.2.4.0/gatk-package-4.2.4.0-local.jar ValidateSamFile -I A006850197_171828_S28_L003.unmapped.bam
14:57:45.828 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/cm/shared/apps/GenomeAnalysisTk/4.2.4.0/gatk-package-4.2.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Jul 21 14:57:45 EDT 2022] ValidateSamFile --INPUT A006850197_171828_S28_L003.unmapped.bam --MODE VERBOSE --MAX_OUTPUT 100 --IGNORE_WARNINGS false --VALIDATE_INDEX true --INDEX_VALIDATION_STRINGENCY EXHAUSTIVE --IS_BISULFITE_SEQUENCED false --MAX_OPEN_TEMP_FILES 8000 --SKIP_MATE_VALIDATION false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Jul 21, 2022 2:57:46 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Thu Jul 21 14:57:46 EDT 2022] Executing as lopezj3@x001 on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.4.0
WARNING 2022-07-21 14:57:46 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
INFO 2022-07-21 14:58:09 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:00:23s. Time for last 10,000,000: 23s. Last read position: */*
INFO 2022-07-21 14:58:32 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:00:46s. Time for last 10,000,000: 22s. Last read position: */*
INFO 2022-07-21 14:58:53 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:01:07s. Time for last 10,000,000: 21s. Last read position: */*
INFO 2022-07-21 14:59:15 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:01:29s. Time for last 10,000,000: 21s. Last read position: */*
INFO 2022-07-21 14:59:36 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:01:50s. Time for last 10,000,000: 21s. Last read position: */*
No errors found
[Thu Jul 21 14:59:39 EDT 2022] picard.sam.ValidateSamFile done. Elapsed time: 1.89 minutes.
Runtime.totalMemory()=2872049664
Tool returned:
0
gatk ValidateSamFile -I A006850197_171828_S28_L003.unmerged.bam -M SUMMARY
Using GATK jar /cm/shared/apps/GenomeAnalysisTk/4.2.4.0/gatk-package-4.2.4.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /cm/shared/apps/GenomeAnalysisTk/4.2.4.0/gatk-package-4.2.4.0-local.jar ValidateSamFile -I /mnt/beegfs/lopezj3/scripts/wdl/gatk_pp_fromFastq/May_June_panels/cromwell-executions/GATK_PPforVariantDiscovery_FastQ/66eac55f-7d5a-4dc4-aed4-7c49d8bbcf00/call-SamToFastqAndBwaMem/shard-0/execution/A006850197_171828_S28_L003.unmerged.bam -M SUMMARY
13:22:25.130 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/cm/shared/apps/GenomeAnalysisTk/4.2.4.0/gatk-package-4.2.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Jul 16 13:22:25 EDT 2022] ValidateSamFile --INPUT /mnt/beegfs/lopezj3/scripts/wdl/gatk_pp_fromFastq/May_June_panels/cromwell-executions/GATK_PPforVariantDiscovery_FastQ/66eac55f-7d5a-4dc4-aed4-7c49d8bbcf00/call-SamToFastqAndBwaMem/shard-0/execution/A006850197_171828_S28_L003.unmerged.bam --MODE SUMMARY --MAX_OUTPUT 100 --IGNORE_WARNINGS false --VALIDATE_INDEX true --INDEX_VALIDATION_STRINGENCY EXHAUSTIVE --IS_BISULFITE_SEQUENCED false --MAX_OPEN_TEMP_FILES 8000 --SKIP_MATE_VALIDATION false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Jul 16, 2022 1:22:25 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Sat Jul 16 13:22:25 EDT 2022] Executing as lopezj3@x001 on Linux 3.10.0-1062.12.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.4.0
WARNING 2022-07-16 13:22:25 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
INFO 2022-07-16 13:22:25 SamFileValidator Seen many non-increasing record positions. Printing Read-names as well.
INFO 2022-07-16 13:23:03 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:00:38s. Time for last 10,000,000: 38s. Last read position: NC_000016.9:53,337,842. Last read name: A00685:197:HH2FVDSX3:3:1321:4987:36057
INFO 2022-07-16 13:23:41 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:01:16s. Time for last 10,000,000: 37s. Last read position: NC_000013.10:95,179,260. Last read name: A00685:197:HH2FVDSX3:3:1541:9100:2002
INFO 2022-07-16 13:24:16 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:01:51s. Time for last 10,000,000: 34s. Last read position: NC_000001.10:11,227,409. Last read name: A00685:197:HH2FVDSX3:3:2166:30671:25332
INFO 2022-07-16 13:24:50 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:02:25s. Time for last 10,000,000: 34s. Last read position: NC_000002.11:196,765,025. Last read name: A00685:197:HH2FVDSX3:3:2417:32208:9267
INFO 2022-07-16 13:25:26 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:03:01s. Time for last 10,000,000: 35s. Last read position: NW_003871055.3:2,269,552. Last read name: A00685:197:HH2FVDSX3:3:2645:29288:15295
## HISTOGRAM java.lang.String
Error Type Count
ERROR:MISSING_READ_GROUP 1
WARNING:RECORD_MISSING_READ_GROUP 51807882
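The two histogram entries come from different checks: MISSING_READ_GROUP concerns the header (no @RG line), while RECORD_MISSING_READ_GROUP counts records that carry no RG tag, here essentially every record in the aligned BAM. A minimal sketch of both checks over SAM text (for example, what samtools view -h prints), assuming tab-separated fields and ignoring everything else ValidateSamFile validates; the read names in the example are hypothetical.

```python
from typing import Iterable, Tuple


def read_group_summary(sam_lines: Iterable[str]) -> Tuple[bool, int]:
    """Return (header has an @RG line, number of records without RG:Z).

    A simplified stand-in for two ValidateSamFile checks, not a full validator.
    """
    header_has_rg = False
    records_missing_rg = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            header_has_rg = header_has_rg or line.startswith("@RG\t")
            continue
        optional_tags = line.split("\t")[11:]  # fields 12+ are optional tags
        if not any(tag.startswith("RG:Z:") for tag in optional_tags):
            records_missing_rg += 1
    return header_has_rg, records_missing_rg


# Hypothetical, shortened records for illustration only.
example = [
    "@HD\tVN:1.6\tSO:queryname",
    "read1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF\tRG:Z:HH2FVDSX3.3",
    "read1\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
]
print(read_group_summary(example))  # (False, 1)
```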
I then used samtools addreplacerg to add the read group from the unmapped BAM to the aligned BAM:
samtools addreplacerg -r "@RG\tID:HH2FVDSX3.3\tLB:171828_lib\tPL:illumina\tSM:171828" -m overwrite_all -O b -o 171828_RG.bam 171828_S28_L003.unmerged.sort.bam
Rerunning MergeBamAlignment with the resulting file once again gave the same error.
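Since the aim was to copy the read group from the unmapped BAM, one quick check is that the ID in the -r string matches the RG:Z value on the unmapped records (HH2FVDSX3.3 above). A small sketch that parses the header line as it is written on the command line, with literal \t separators, and compares the ID:

```python
from typing import Dict


def parse_rg_header(rg: str) -> Dict[str, str]:
    """Parse an @RG header line into a {tag: value} dict.

    Accepts real tabs or the literal two-character "\\t" escapes used on the
    samtools addreplacerg command line.
    """
    fields = rg.replace("\\t", "\t").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG line: " + fields[0])
    return dict(field.split(":", 1) for field in fields[1:])


# String copied from the addreplacerg command above (backslash-t kept literal).
RG_ARG = r"@RG\tID:HH2FVDSX3.3\tLB:171828_lib\tPL:illumina\tSM:171828"
rg = parse_rg_header(RG_ARG)
print(rg["ID"] == "HH2FVDSX3.3")     # True
print(rg["SM"], rg["LB"], rg["PL"])  # 171828 171828_lib illumina
```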
Running the same Best Practices WDL pipeline on a different cohort of targeted sequencing data from the same sequencing center caused no issues, and the pipeline ran to completion. When I contacted the sequencing center, they said the only difference between the two batches is that, in the batch that triggers the error, the UMI sequences were trimmed from the reads before delivery, which added extra tags to the read names. They recommended that I rerun the pipeline with the -C option added to the bwa mem command. However, this still produced the same error: Aligned record iterator (A00685:197:HH2FVDSX3:3:1101:10004:11835) is behind the unmapped reads.
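For reference, BWA takes the read name from a FASTQ header only up to the first whitespace; -C controls whether the remainder (the comment) is appended to the SAM record, not what the name is. A sketch of that split, assuming the full text seen in the unmapped BAM's read name is what appears on the FASTQ header line passed to bwa mem:

```python
from typing import Tuple


def split_fastq_header(header: str) -> Tuple[str, str]:
    """Split a FASTQ header into (read name, comment) at the first whitespace.

    Follows the convention BWA uses: only the part before the first space or
    tab becomes QNAME; the rest is the comment that -C copies to SAM output.
    """
    body = header[1:] if header.startswith("@") else header
    parts = body.split(None, 1)
    name = parts[0]
    comment = parts[1] if len(parts) > 1 else ""
    return name, comment


header = ("@A00685:197:HH2FVDSX3:3:1101:10004:11835 BC:Z:AGTCCTTC+TAGGACTC "
          "ZA:Z:GCGT ZB:Z:TGGT RX:Z:GCG-TGG QX:Z:FFF")
name, comment = split_fastq_header(header)
print(name)     # A00685:197:HH2FVDSX3:3:1101:10004:11835
print(comment)  # BC:Z:AGTCCTTC+TAGGACTC ZA:Z:GCGT ZB:Z:TGGT RX:Z:GCG-TGG QX:Z:FFF
```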
Thank you in advance for any insights or assistance, and please let me know if you need any additional information to diagnose the cause of this issue.
Javier