filterSamReads stdout format error
Hi there, I am seeing some differences between the /dev/stdout and -O output.bam output formats for filterSamReads.
GATK version: gatk-4.0.10.0
Command 1:
gatk-4.0.10.0/gatk FilterSamReads -I input.bam -O /dev/stdout --READ_LIST_FILE unmap.wrong.excludes --FILTER excludeReadList --VALIDATION_STRINGENCY SILENT > stdout.bam
Command 2:
gatk-4.0.10.0/gatk FilterSamReads -I input.bam -O output.bam --READ_LIST_FILE unmap.wrong.excludes --FILTER excludeReadList --VALIDATION_STRINGENCY SILENT
When I use the bams that are created, Command 2 gives me a usable bam file, but command 1 gives me errors suggesting a malformed/truncated bam file:
samtools view stdout.bam
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.
samtools view output.bam #works fine
Ideally, I would be able to stream from gatk into a separate command. I tried using the --QUIET parameter to see if the stdout stream was interfered with by logging updates, but that didn't help. Do you have any suggestions?
-
Hi rcorbett,
It doesn't look like there are any issues in your command line arguments that should cause this issue. We also were not able to recreate this issue on our end.
Could you open the stdout.bam in a text editor and see if there are any log messages that show up in it?
Thank you,
Genevieve
-
Thanks Genevieve,
If I looks through the file I don't see anything human readable. Similarly, if I try
strings stdout.bam
I don't see anything of note.
Just a little more information on the above...
sambamba flagstat output.bam
476198926 + 76232480 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
248291543 + 32776940 duplicates
450811229 + 40501993 mapped (94.67%:53.13%)
476198926 + 76232480 paired in sequencing
238099463 + 38116240 read1
238099463 + 38116240 read2
427485816 + 29684552 properly paired (89.77%:38.94%)
431438088 + 29974900 with itself and mate mapped
19373141 + 10527093 singletons (4.07%:13.81%)
1774010 + 153138 with mate mapped to a different chr
1103088 + 97901 with mate mapped to a different chr (mapQ>=5)sambamba flagstat -p stdout.bam
Error reading BGZF block starting from offset 0: wrong BGZF magicdiff output.bam stdout.bam
Binary files passed.bam and passed_stdout.bam differdu passed.bam stdout.bam
54007684 output.bam
54007684 stdout.bamThe original bam is an old one with BWA 0.5.7 aln alignments. I'm trying this on some more recently data aligned with BWA 0.7.6a mem to see if the error persists.
-
Thanks for the quick reply and that info - let me know if the error persists and we will continue to look into it.
-
Hi again, we tried the same command on a BWA 0.7.6a mem aligned bam and ran in to the same issue.
-
Hi rcorbett,
There are a couple more things that would be helpful if you could try and let us know the output so that we can diagnose the issue:
- Run the GATK Tool PrintReads with the input as your stdout output bam. Please post the entire stack trace of this command.
- Manually g unzip the stdout output bam file and look at the first 10 or so lines of the file. Is there the correct sam header?
Thank you,
Genevieve
-
Hi Genevieve Brandt (she/her),
For 1. above:
gatk PrintReads -I stdout.bam -O tmp.tmp --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
Using GATK jar /gsc/software/linux-x86_64-centos6/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /gsc/software/linux-x86_64-centos6/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar PrintReads -I stdout.bam -O tmp.tmp
14:39:25.485 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gsc/software/linux-x86_64-centos6/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:39:25.788 INFO PrintReads - ------------------------------------------------------------
14:39:25.788 INFO PrintReads - The Genome Analysis Toolkit (GATK) v4.0.10.0
14:39:25.788 INFO PrintReads - For support and documentation go to https://software.broadinstitute.org/gatk/
14:39:25.789 INFO PrintReads - Executing as rcorbett@gphost11.bcgsc.ca on Linux v3.10.0-957.5.1.el7.x86_64 amd64
14:39:25.789 INFO PrintReads - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-b08
14:39:25.789 INFO PrintReads - Start Date/Time: February 8, 2021 2:39:25 PST PM
14:39:25.789 INFO PrintReads - ------------------------------------------------------------
14:39:25.789 INFO PrintReads - ------------------------------------------------------------
14:39:25.790 INFO PrintReads - HTSJDK Version: 2.16.1
14:39:25.790 INFO PrintReads - Picard Version: 2.18.13
14:39:25.791 INFO PrintReads - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:39:25.791 INFO PrintReads - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:39:25.791 INFO PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:39:25.791 INFO PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:39:25.791 INFO PrintReads - Deflater: IntelDeflater
14:39:25.791 INFO PrintReads - Inflater: IntelInflater
14:39:25.791 INFO PrintReads - GCS max retries/reopens: 20
14:39:25.792 INFO PrintReads - Requester pays: disabled
14:39:25.792 INFO PrintReads - Initializing engine
14:39:26.370 INFO PrintReads - Done initializing engine
14:39:26.374 WARN ReadUtils - Skipping index file creation for: tmp.tmp. Index file creation requires reads in coordinate sorted order.
14:39:26.386 INFO ProgressMeter - Starting traversal
14:39:26.387 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
14:39:26.440 INFO PrintReads - Shutting down engine
[February 8, 2021 2:39:26 PST PM] org.broadinstitute.hellbender.tools.PrintReads done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2107113472
htsjdk.samtools.SAMFormatException: Error parsing text SAM file. Not enough fields; File stdout.bam; Line 1
Line: Tool returned:
at htsjdk.samtools.SAMLineParser.reportFatalErrorParsingLine(SAMLineParser.java:446)
at htsjdk.samtools.SAMLineParser.parseLine(SAMLineParser.java:231)
at htsjdk.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:268)
at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:255)
at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:228)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:576)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:27)
at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:13)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:89)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
For 2. above:gunzip -c -d -f passed.bam | head -n 5
BAM?2@HD VN:1.5 GO:none SO:coordinate
@SQ SN:1 LN:249250621 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiens
@SQ SN:2 LN:243199373 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiens
@SQ SN:3 LN:198022430 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiens
@SQ SN:4 LN:191154276 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiens
@SQ SN:5 LN:180915260 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiensgunzip -c -d -f stdout.bam | head
Tool returned:
0
Jͽ{?$y~?3????;?ݕ?ݙ???;????W=??UuI?7?cw?????Vw??)????
#衃?3?Y?1X?? d??0a??!c?"D?C? -
Hi rcorbett,
Thanks for looking into that.
Could you run ValidateSamFile with your passed.bam to verify if there are any format issues? https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile
Thank you,
Genevieve
-
Good afternoon.
gatk ValidateSamFile -I passed.bam -M SUMMARY
Summary output:
## HISTOGRAM java.lang.String
Error Type Count
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 14964150
WARNING:ADJACENT_INDEL_IN_CIGAR 4 -
Hi rcorbett,
Would you be able to submit a bug report following these instructions with all of your bam files? The input.bam, output.bam, stdout.bam, and exclude files.
Our only other suggestion is that you could try using a newer version of GATK to verify that this is not a bug with the version you are using. We are currently on 4.1.9.0.
Genevieve
-
Thank you. I tried the same command with 4.1.9.0 and got the same errors. I will create a bug report.
-
Okay, great. Let me know once you have uploaded and the name of the folder.
-
OK.
I uploaded a file called
filterSamReads_output_format.tar.gz
I was surprised to not be asked for a password at the following address, so it might be worth checking if the file is in fact on your server.
gsapubftp@ftp.broadinstitute.org
-
Hi rcorbett, it's there! Thank you, I'll get back to you when I have more information.
-
Hi rcorbett,
I was able to look at your files and saw that the issue is a bug when running picard tools through gatk and the exit code text is getting printed to the stdout bam:
Tool returned: 0
I created a ticket on github for our team to solve this bug, you can follow along for more information there: https://github.com/broadinstitute/gatk/issues/7080
Thank you for sending in those files, it was helpful in determining the issue.
Best,
Genevieve
-
Hi rcorbett,
I did more digging and found that this issue only comes up when running this Picard tool through GATK. I tried with the Picard jar and this issue was not present. So, if you want to use the stdout, you can use the workaround of using the Picard jar file for Picard tools.
The team will continue to work on a solution to this at the github link I provided above. Thank you for bringing this to our attention.
Best,
Genevieve
Please sign in to leave a comment.
15 comments