Problem with Germline CNV CollectReadCounts
Hi there,
I generated my BAM file from paired-end (germline whole-exome) FASTQ files using:
bwa mem -> Picard MarkDuplicates -> GATK BaseRecalibrator -> GATK ApplyBQSR
Now I'm trying to call CNVs from it by following https://gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants
The PreprocessIntervals step completed without errors, but in the CollectReadCounts step I got the following error and the TSV file was not generated:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar CollectReadCounts -L test1.preprocessed.interval_list -R /mnt/d/hg19/hg19_v0_Homo_sapiens_assembly19.fasta -imr OVERLAPPING_ONLY -I test.bam --format TSV -O test.tsv
09:46:20.948 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 14, 2020 9:46:21 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:46:21.523 INFO CollectReadCounts - ------------------------------------------------------------
09:46:21.523 INFO CollectReadCounts - The Genome Analysis Toolkit (GATK) v4.1.4.1
09:46:21.523 INFO CollectReadCounts - For support and documentation go to https://software.broadinstitute.org/gatk/
09:46:21.524 INFO CollectReadCounts - Executing as -----@BIOBAM on Linux v4.4.0-18362-Microsoft amd64
09:46:21.524 INFO CollectReadCounts - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09
09:46:21.524 INFO CollectReadCounts - Start Date/Time: February 14, 2020 9:46:20 AM EET
09:46:21.524 INFO CollectReadCounts - ------------------------------------------------------------
09:46:21.525 INFO CollectReadCounts - ------------------------------------------------------------
09:46:21.525 INFO CollectReadCounts - HTSJDK Version: 2.21.0
09:46:21.525 INFO CollectReadCounts - Picard Version: 2.21.2
09:46:21.525 INFO CollectReadCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:46:21.525 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:46:21.525 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:46:21.525 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:46:21.526 INFO CollectReadCounts - Deflater: IntelDeflater
09:46:21.526 INFO CollectReadCounts - Inflater: IntelInflater
09:46:21.526 INFO CollectReadCounts - GCS max retries/reopens: 20
09:46:21.526 INFO CollectReadCounts - Requester pays: disabled
09:46:21.526 INFO CollectReadCounts - Initializing engine
09:46:21.932 INFO FeatureManager - Using codec IntervalListCodec to read file file:///mnt/d/gatk-4.1.4.1/test1.preprocessed.interval_list
09:46:21.940 INFO IntervalArgumentCollection - Processing 0 bp from intervals
09:46:21.943 INFO CollectReadCounts - Done initializing engine
09:46:21.946 INFO CollectReadCounts - Collecting read counts...
09:46:21.946 INFO ProgressMeter - Starting traversal
09:46:21.947 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
09:46:21.961 INFO CollectReadCounts - Shutting down engine
[February 14, 2020 9:46:21 AM EET] org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=1212153856
java.lang.IllegalArgumentException: The collection is empty: collection must not be null or empty.
at org.broadinstitute.hellbender.utils.Utils.nonEmpty(Utils.java:619)
at org.broadinstitute.hellbender.utils.Utils.nonEmpty(Utils.java:671)
at org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts$CachedOverlapDetector.<init>(CollectReadCounts.java:221)
at org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts.apply(CollectReadCounts.java:180)
at org.broadinstitute.hellbender.engine.ReadWalker.lambda$traverse$0(ReadWalker.java:96)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Thanks in advance
-
Your test1.preprocessed.interval_list file is empty. Could you check that?
If it is not empty, it is possible that the file format is abnormal and cannot be recognized by GATK.
-
Thank you for your answer. No, it is not empty.
Note: I used the same BED file in "GATK BaseRecalibrator" when generating the BAM file and in "PreprocessIntervals", the first step of CNV calling.
-
Can you check again whether test1.preprocessed.interval_list contains intervals (i.e., it may be non-empty because it only contains a sequence dictionary, but does not contain any intervals)?
Given the following lines in your log, I suspect this might be the case (and I'm guessing @SkyWarrior did, too):
09:46:21.932 INFO FeatureManager - Using codec IntervalListCodec to read file file:///mnt/d/gatk-4.1.4.1/test1.preprocessed.interval_list
09:46:21.940 INFO IntervalArgumentCollection - Processing 0 bp from intervals
Is it possible that the preprocessing steps performed by PreprocessIntervals on your original BED file resulted in an empty interval list?
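One way to run that check is sketched below. It assumes the usual Picard-style interval_list layout (header lines beginning with "@", then tab-separated contig, start, end, strand, name records with 1-based inclusive coordinates). The function name count_intervals is made up for illustration, and the bp total is a plain sum, so it may differ from the figure GATK logs if intervals overlap.

```python
def count_intervals(lines):
    """Return (header_lines, interval_records, summed_bp) for interval_list content.

    Assumes header lines start with '@' and every other non-blank line is a
    tab-separated record: contig, start, end, strand, name.
    """
    header, intervals, total_bp = 0, 0, 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            header += 1  # sequence dictionary / header line
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        intervals += 1
        total_bp += end - start + 1  # coordinates are 1-based, inclusive
    return header, intervals, total_bp


# A file that is "not empty" but holds only a sequence dictionary:
header_only = ["@HD\tVN:1.6\n", "@SQ\tSN:1\tLN:249250621\n"]
print(count_intervals(header_only))  # (2, 0, 0)

# The same header followed by one interval record:
with_interval = header_only + ["1\t100\t200\t+\t.\n"]
print(count_intervals(with_interval))  # (2, 1, 101)

# On a real file (path taken from the command above):
# with open("test1.preprocessed.interval_list") as fh:
#     print(count_intervals(fh))
```

If the second number is 0, the file contains a header but no intervals, which would match the "Processing 0 bp from intervals" line in the log.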
-
Hello, I'm having a similar problem. Have you found a solution? I checked my preprocessed.interval_list file; it is not empty and it does contain intervals. I also have these lines in my log:
17:13:15.144 INFO CollectReadCounts - Initializing engine
17:13:18.773 INFO IntervalArgumentCollection - Processing 3117292070 bp from intervals
17:13:18.847 INFO CollectReadCounts - Done initializing engine
17:13:18.850 INFO CollectReadCounts - Shutting down engine
[October 11, 2023 5:13:18 PM CEST] org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=2076049408
java.lang.IllegalArgumentException: The string is null: string must not be null or empty
-
Can you post your whole command line and log here as well?
This exception message is quite generic and does not seem like a regular GATK exception to us.
-
I think I actually found the problem: my CRAM files didn't have read groups (RG), so I used samtools addreplacerg to add them. I'm still testing it, but it seems to work.
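For anyone who wants to check for the same situation before running CollectReadCounts, here is a rough sketch that lists the read group IDs found in a SAM/BAM/CRAM header. The function names are hypothetical. The header_of helper assumes samtools is installed and on the PATH, and uses `samtools view -H` to print the header only.

```python
import subprocess


def read_group_ids(header_text):
    """Return the ID values of all @RG lines in SAM header text."""
    ids = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("ID:"):
                ids.append(field[3:])
    return ids


def header_of(path):
    """Fetch the header via `samtools view -H` (assumes samtools is on PATH)."""
    result = subprocess.run(
        ["samtools", "view", "-H", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Header text without any @RG line, as described above:
no_rg = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
print(read_group_ids(no_rg))  # []

# Header text with one read group:
with_rg = no_rg + "@RG\tID:rg1\tSM:sample1\tPL:ILLUMINA\n"
print(read_group_ids(with_rg))  # ['rg1']

# On a real file (hypothetical path):
# print(read_group_ids(header_of("sample.cram")))
```

An empty list means the header has no @RG lines, which is the case I fixed with samtools addreplacerg.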
-
Great that you solved your problem.
Thank you for your feedback.