Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

filterSamReads stdout format error

0

15 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi rcorbett,

    It doesn't look like there are any issues in your command line arguments that should cause this issue. We also were not able to recreate this issue on our end.

    Could you open the stdout.bam in a text editor and see if there are any log messages that show up in it? 

    Thank you,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    rcorbett

    Thanks Genevieve,

    If I looks through the file I don't see anything human readable.  Similarly, if I try 

    strings stdout.bam

    I don't see anything of note.

    Just a little more information on the above...

    sambamba flagstat output.bam
    476198926 + 76232480 in total (QC-passed reads + QC-failed reads)
    0 + 0 secondary
    0 + 0 supplementary
    248291543 + 32776940 duplicates
    450811229 + 40501993 mapped (94.67%:53.13%)
    476198926 + 76232480 paired in sequencing
    238099463 + 38116240 read1
    238099463 + 38116240 read2
    427485816 + 29684552 properly paired (89.77%:38.94%)
    431438088 + 29974900 with itself and mate mapped
    19373141 + 10527093 singletons (4.07%:13.81%)
    1774010 + 153138 with mate mapped to a different chr
    1103088 + 97901 with mate mapped to a different chr (mapQ>=5)

     

    sambamba flagstat -p  stdout.bam
    Error reading BGZF block starting from offset 0: wrong BGZF magic

     

    diff output.bam stdout.bam
    Binary files passed.bam and passed_stdout.bam differ

     

    du passed.bam stdout.bam
    54007684 output.bam
    54007684 stdout.bam

     

    The original bam is an old one with BWA 0.5.7 aln alignments.   I'm trying this on some more recently data aligned with BWA 0.7.6a mem to see if the error persists.

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Thanks for the quick reply and that info - let me know if the error persists and we will continue to look into it.

    0
    Comment actions Permalink
  • Avatar
    rcorbett

    Hi again, we tried the same command on a BWA 0.7.6a mem aligned bam and ran in to the same issue.

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi rcorbett,

    There are a couple more things that would be helpful if you could try and let us know the output so that we can diagnose the issue:

    1. Run the GATK Tool PrintReads with the input as your stdout output bam. Please post the entire stack trace of this command.
    2. Manually g unzip the stdout output bam file and look at the first 10 or so lines of the file. Is there the correct sam header?

    Thank you,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    rcorbett

    Hi Genevieve-Brandt-she-her,

    For 1. above:

    gatk PrintReads -I stdout.bam -O tmp.tmp --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
    Using GATK jar /gsc/software/linux-x86_64-centos6/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /gsc/software/linux-x86_64-centos6/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar PrintReads -I stdout.bam -O tmp.tmp
    14:39:25.485 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gsc/software/linux-x86_64-centos6/gatk-4.0.10.0/gatk-package-4.0.10.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    14:39:25.788 INFO  PrintReads - ------------------------------------------------------------
    14:39:25.788 INFO  PrintReads - The Genome Analysis Toolkit (GATK) v4.0.10.0
    14:39:25.788 INFO  PrintReads - For support and documentation go to https://software.broadinstitute.org/gatk/
    14:39:25.789 INFO  PrintReads - Executing as rcorbett@gphost11.bcgsc.ca on Linux v3.10.0-957.5.1.el7.x86_64 amd64
    14:39:25.789 INFO  PrintReads - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-b08
    14:39:25.789 INFO  PrintReads - Start Date/Time: February 8, 2021 2:39:25 PST PM
    14:39:25.789 INFO  PrintReads - ------------------------------------------------------------
    14:39:25.789 INFO  PrintReads - ------------------------------------------------------------
    14:39:25.790 INFO  PrintReads - HTSJDK Version: 2.16.1
    14:39:25.790 INFO  PrintReads - Picard Version: 2.18.13
    14:39:25.791 INFO  PrintReads - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    14:39:25.791 INFO  PrintReads - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    14:39:25.791 INFO  PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    14:39:25.791 INFO  PrintReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    14:39:25.791 INFO  PrintReads - Deflater: IntelDeflater
    14:39:25.791 INFO  PrintReads - Inflater: IntelInflater
    14:39:25.791 INFO  PrintReads - GCS max retries/reopens: 20
    14:39:25.792 INFO  PrintReads - Requester pays: disabled
    14:39:25.792 INFO  PrintReads - Initializing engine
    14:39:26.370 INFO  PrintReads - Done initializing engine
    14:39:26.374 WARN  ReadUtils - Skipping index file creation for: tmp.tmp. Index file creation requires reads in coordinate sorted order.
    14:39:26.386 INFO  ProgressMeter - Starting traversal
    14:39:26.387 INFO  ProgressMeter -        Current Locus  Elapsed Minutes       Reads Processed     Reads/Minute
    14:39:26.440 INFO  PrintReads - Shutting down engine
    [February 8, 2021 2:39:26 PST PM] org.broadinstitute.hellbender.tools.PrintReads done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=2107113472
    htsjdk.samtools.SAMFormatException: Error parsing text SAM file. Not enough fields; File stdout.bam; Line 1
    Line: Tool returned:
    at htsjdk.samtools.SAMLineParser.reportFatalErrorParsingLine(SAMLineParser.java:446)
    at htsjdk.samtools.SAMLineParser.parseLine(SAMLineParser.java:231)
    at htsjdk.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:268)
    at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:255)
    at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:228)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:576)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
    at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:27)
    at org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator.next(SAMRecordToReadIterator.java:13)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
    at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:89)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

     


    For 2. above:

    gunzip -c -d -f passed.bam | head -n 5
    BAM?2@HD VN:1.5 GO:none SO:coordinate
    @SQ SN:1 LN:249250621 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiens
    @SQ SN:2 LN:243199373 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiens
    @SQ SN:3 LN:198022430 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiens
    @SQ SN:4 LN:191154276 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiens
    @SQ SN:5 LN:180915260 AS:NCBI-Build-37 UR:http://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa SP:Homo sapiens

     

    gunzip -c -d -f stdout.bam | head
    Tool returned:
    0
    Jͽ{?$y~?3????;?ݕ?ݙ???;????W=??UuI?7?cw?????Vw??)????
                                                        #衃?3?Y?1X?? d??0a??!c?"D?C?

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi rcorbett,

    Thanks for looking into that.

    Could you run ValidateSamFile with your passed.bam to verify if there are any format issues? https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile

    Thank you,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    rcorbett

    Good afternoon.

    gatk ValidateSamFile -I passed.bam   -M SUMMARY

    Summary output:

    ## HISTOGRAM java.lang.String
    Error Type Count
    ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 14964150
    WARNING:ADJACENT_INDEL_IN_CIGAR 4

     

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi rcorbett,

    Would you be able to submit a bug report following these instructions with all of your bam files? The input.bam, output.bam, stdout.bam, and exclude files.

    Our only other suggestion is that you could try using a newer version of GATK to verify that this is not a bug with the version you are using. We are currently on 4.1.9.0.

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    rcorbett

    Thank you. I tried the same command with 4.1.9.0 and got the same errors.  I will create a bug report.

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Okay, great. Let me know once you have uploaded and the name of the folder.

    0
    Comment actions Permalink
  • Avatar
    rcorbett

    OK.

    I uploaded a file called

    filterSamReads_output_format.tar.gz

    I was surprised to not be asked for a password at the following address, so it might be worth checking if the file is in fact on your server.

    gsapubftp@ftp.broadinstitute.org
    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi rcorbett, it's there! Thank you, I'll get back to you when I have more information.

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi rcorbett,

    I was able to look at your files and saw that the issue is a bug when running picard tools through gatk and the exit code text is getting printed to the stdout bam:

    Tool returned:
    0

    I created a ticket on github for our team to solve this bug, you can follow along for more information there: https://github.com/broadinstitute/gatk/issues/7080

    Thank you for sending in those files, it was helpful in determining the issue.

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi rcorbett,

    I did more digging and found that this issue only comes up when running this Picard tool through GATK. I tried with the Picard jar and this issue was not present. So, if you want to use the stdout, you can use the workaround of using the Picard jar file for Picard tools.

    The team will continue to work on a solution to this at the github link I provided above. Thank you for bringing this to our attention.

    Best,

    Genevieve

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk