DepthOfCoverage with -gene-list and readgroup as --partition-type options
Hi all,
I’m trying to use the DepthOfCoverage tool in GATK 4.2.0.0 with the "-gene-list" option and read group as the partition type, but I get an error. If I run the same command without the "-gene-list" option, the program runs correctly. I don't know whether "-gene-list" is incompatible with every "--partition-type" value except sample, because the same error occurs with all the other partition types: readgroup, library, platform, center, sample_by_platform, sample_by_center and sample_by_platform_by_center.
My input BAM file comes from a sample sequenced in two lanes of a HiSeq4000 sequencer, so the BAM file has two different read groups (one for each lane). I tested several BAM files and got the same error. I also validated my BAM file with the ValidateSamFile tool and got no errors.
This is the command:
gatk DepthOfCoverage \
-R ${ref} \
-I ${infile} \
-L ${interval_list} \
-gene-list ${refseq} \
--partition-type readgroup \
--omit-depth-output-at-each-base \
-O ${outfile}
And the entire error log:
org.broadinstitute.hellbender.exceptions.GATKException: Unable to find appropriate stream for partition = sample, aggregation = gene, file type = summary
at org.broadinstitute.hellbender.tools.walkers.coverage.CoverageOutputWriter.getCorrectOutputWriter(CoverageOutputWriter.java:241)
at org.broadinstitute.hellbender.tools.walkers.coverage.CoverageOutputWriter.writePerGeneDepthInformation(CoverageOutputWriter.java:321)
at org.broadinstitute.hellbender.tools.walkers.coverage.DepthOfCoverage.onIntervalEnd(DepthOfCoverage.java:364)
at org.broadinstitute.hellbender.engine.LocusWalkerByInterval.apply(LocusWalkerByInterval.java:86)
at org.broadinstitute.hellbender.engine.LocusWalkerByInterval.lambda$traverse$0(LocusWalkerByInterval.java:54)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.engine.LocusWalkerByInterval.traverse(LocusWalkerByInterval.java:52)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1058)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
I hope you can help me. Thanks in advance!
-
Hi David Jaspez,
Thanks for posting here! We'll try to get it sorted out.
Could you post your entire stack trace with this java option: --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' ? I want to see where this is happening while the tool is running.
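As a rough sketch, the option goes on the gatk wrapper before the tool name. The paths below are placeholders, and the helper function name and the DRY_RUN variable are hypothetical conveniences for checking the command line before running it, not part of GATK:

```shell
# Placeholder paths; replace them with your own files.
ref=path/to/ucsc.hg19.fasta
infile=path/to/sample.bam
interval_list=path/to/file.interval_list
refseq=path/to/file.refseq
outfile=path/to/sample.DepthOfCoverage

# Hypothetical helper: prints the command by default (dry run).
# Call it as `DRY_RUN= depth_of_coverage_with_stacktrace` to actually run gatk.
depth_of_coverage_with_stacktrace() {
  ${DRY_RUN-echo} gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' \
    DepthOfCoverage \
    -R "${ref}" \
    -I "${infile}" \
    -L "${interval_list}" \
    -gene-list "${refseq}" \
    --partition-type readgroup \
    --omit-depth-output-at-each-base \
    -O "${outfile}"
}

depth_of_coverage_with_stacktrace
```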
Best,
Genevieve
-
Hi Genevieve,
Thanks for your help. I ran the command with this option and got the same output as before:
Using GATK jar /opt/gatk/4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -D samjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /opt/gatk/4.2.0.0/gatk-package-4.2.0.0-local.jar DepthOfCoverage -R path/to/ucsc.hg19.fasta -I path/to/sample.bam -L path/to/file.interval_list -gene-list path/to/file.refseq --partition-type readgroup --omit-depth-output-at-each-base -O path/to/sample.DepthOfCoverage
17:59:32.765 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/gatk/4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 06, 2021 5:59:32 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:59:32.928 INFO DepthOfCoverage - ------------------------------------------------------------
17:59:32.929 INFO DepthOfCoverage - The Genome Analysis Toolkit (GATK) v4.2.0.0
17:59:32.929 INFO DepthOfCoverage - For support and documentation go to https://software.broadinstitute.org/gatk/
17:59:32.929 INFO DepthOfCoverage - Executing as name@domain.es on Linux v2.6.32-431.el6.x86_64 amd64
17:59:32.929 INFO DepthOfCoverage - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_77-b03
17:59:32.929 INFO DepthOfCoverage - Start Date/Time: May 6, 2021 5:59:32 PM WEST
17:59:32.929 INFO DepthOfCoverage - ------------------------------------------------------------
17:59:32.929 INFO DepthOfCoverage - ------------------------------------------------------------
17:59:32.930 INFO DepthOfCoverage - HTSJDK Version: 2.24.0
17:59:32.930 INFO DepthOfCoverage - Picard Version: 2.25.0
17:59:32.930 INFO DepthOfCoverage - Built for Spark Version: 2.4.5
17:59:32.930 INFO DepthOfCoverage - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:59:32.930 INFO DepthOfCoverage - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:59:32.930 INFO DepthOfCoverage - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:59:32.930 INFO DepthOfCoverage - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:59:32.930 INFO DepthOfCoverage - Deflater: IntelDeflater
17:59:32.930 INFO DepthOfCoverage - Inflater: IntelInflater
17:59:32.930 INFO DepthOfCoverage - GCS max retries/reopens: 20
17:59:32.930 INFO DepthOfCoverage - Requester pays: disabled
17:59:32.931 WARN DepthOfCoverage -!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: DepthOfCoverage is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
17:59:32.931 INFO DepthOfCoverage - Initializing engine
17:59:33.488 INFO FeatureManager - Using codec IntervalListCodec to read file file://path/to/file.interval_list
17:59:34.782 INFO IntervalArgumentCollection - Processing 85392035 bp from intervals
17:59:34.828 INFO DepthOfCoverage - Done initializing engine
17:59:34.868 INFO ProgressMeter - Starting traversal
17:59:34.869 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
17:59:35.795 INFO FeatureManager - Using codec IntervalListCodec to read file file://path/to/file.interval_list
17:59:36.695 INFO FeatureManager - Using codec RefSeqCodec to read file file://path/to/file.refseq
17:59:37.902 INFO DepthOfCoverage - Shutting down engine
[May 6, 2021 5:59:37 PM WEST] org.broadinstitute.hellbender.tools.walkers.coverage.DepthOfCoverage done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=1965555712
org.broadinstitute.hellbender.exceptions.GATKException: Unable to find appropriate stream for partition = sample, aggregation = gene, file type = summary
at org.broadinstitute.hellbender.tools.walkers.coverage.CoverageOutputWriter.getCorrectOutputWriter(CoverageOutputWriter.java:241)
at org.broadinstitute.hellbender.tools.walkers.coverage.CoverageOutputWriter.writePerGeneDepthInformation(CoverageOutputWriter.java:321)
at org.broadinstitute.hellbender.tools.walkers.coverage.DepthOfCoverage.onIntervalEnd(DepthOfCoverage.java:364)
at org.broadinstitute.hellbender.engine.LocusWalkerByInterval.apply(LocusWalkerByInterval.java:86)
at org.broadinstitute.hellbender.engine.LocusWalkerByInterval.lambda$traverse$0(LocusWalkerByInterval.java:54)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.engine.LocusWalkerByInterval.traverse(LocusWalkerByInterval.java:52)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1058)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Hi David Jaspez,
We were able to confirm on our end that this looks like a bug. We're still looking into why it happens and whether there is a workaround. I'll hopefully get back to you next week.
Best,
Genevieve
-
Hello,
I spoke with the developers of this tool and found that you have run into a limitation of the tool. We think we have found a workaround, though. Could you test running with both --partition-type sample and --partition-type readgroup in the same command? It will produce a separate set of output files for each partition type. It looks like the gene list code is piggybacking off the sample partition type code, so it may work if the sample partition type is enabled as well.
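Based on the command posted earlier in this thread, a sketch of the combined run could look like the following. The paths are placeholders, and the helper function name and the DRY_RUN variable are hypothetical, there only so the command line can be inspected before it is run. This is a suggestion to test, not a confirmed fix:

```shell
# Placeholder paths; replace them with your own files.
ref=path/to/ucsc.hg19.fasta
infile=path/to/sample.bam
interval_list=path/to/file.interval_list
refseq=path/to/file.refseq
outfile=path/to/sample.DepthOfCoverage

# Hypothetical helper: prints the command by default (dry run).
# Call it as `DRY_RUN= depth_of_coverage_both_partitions` to actually run gatk.
depth_of_coverage_both_partitions() {
  ${DRY_RUN-echo} gatk DepthOfCoverage \
    -R "${ref}" \
    -I "${infile}" \
    -L "${interval_list}" \
    -gene-list "${refseq}" \
    --partition-type sample \
    --partition-type readgroup \
    --omit-depth-output-at-each-base \
    -O "${outfile}"
}

depth_of_coverage_both_partitions
```

The only change from the original command is the extra --partition-type sample line.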
This is a tool that was ported from GATK3 so we haven't made any major bug fix changes, but we did make a ticket to improve the error message, which you can track here: https://github.com/broadinstitute/gatk/issues/7246
Let me know if it works!
Best,
Genevieve
-
Hello Genevieve,
Thanks for your help. I ran the command with both --partition-type sample and --partition-type readgroup, and it generates all the files for the sample partition, but the 'gene_statistics' and 'gene_summary' files are missing for the readgroup partition.
Best,
David.
-
Hi David Jaspez,
It is not possible at this time to get the gene statistics and gene summary files for the read group partition type. I will add a comment to the github ticket about the feature request and our team will take a look at it when they have the capacity. We can't guarantee it will be added, but thank you for bringing it up!
Best,
Genevieve
-
Thank you so much! I will check the GitHub ticket for updates from time to time.
Best regards.