Problem with Germline CNV CollectReadCounts
Hi there,
I generated my BAM file from paired-end (germline whole-exome) FASTQ files using:
bwa mem -> Picard MarkDuplicates -> GATK BaseRecalibrator -> GATK ApplyBQSR
Now I'm trying to call CNVs from it by following https://gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants
The PreprocessIntervals step completed without errors, but in the CollectReadCounts step I got the following error and the TSV file was not generated:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar CollectReadCounts -L test1.preprocessed.interval_list -R /mnt/d/hg19/hg19_v0_Homo_sapiens_assembly19.fasta -imr OVERLAPPING_ONLY -I test.bam --format TSV -O test.tsv
09:46:20.948 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 14, 2020 9:46:21 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:46:21.523 INFO CollectReadCounts - ------------------------------------------------------------
09:46:21.523 INFO CollectReadCounts - The Genome Analysis Toolkit (GATK) v4.1.4.1
09:46:21.523 INFO CollectReadCounts - For support and documentation go to https://software.broadinstitute.org/gatk/
09:46:21.524 INFO CollectReadCounts - Executing as -----@BIOBAM on Linux v4.4.0-18362-Microsoft amd64
09:46:21.524 INFO CollectReadCounts - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09
09:46:21.524 INFO CollectReadCounts - Start Date/Time: February 14, 2020 9:46:20 AM EET
09:46:21.524 INFO CollectReadCounts - ------------------------------------------------------------
09:46:21.525 INFO CollectReadCounts - ------------------------------------------------------------
09:46:21.525 INFO CollectReadCounts - HTSJDK Version: 2.21.0
09:46:21.525 INFO CollectReadCounts - Picard Version: 2.21.2
09:46:21.525 INFO CollectReadCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:46:21.525 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:46:21.525 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:46:21.525 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:46:21.526 INFO CollectReadCounts - Deflater: IntelDeflater
09:46:21.526 INFO CollectReadCounts - Inflater: IntelInflater
09:46:21.526 INFO CollectReadCounts - GCS max retries/reopens: 20
09:46:21.526 INFO CollectReadCounts - Requester pays: disabled
09:46:21.526 INFO CollectReadCounts - Initializing engine
09:46:21.932 INFO FeatureManager - Using codec IntervalListCodec to read file file:///mnt/d/gatk-4.1.4.1/test1.preprocessed.interval_list
09:46:21.940 INFO IntervalArgumentCollection - Processing 0 bp from intervals
09:46:21.943 INFO CollectReadCounts - Done initializing engine
09:46:21.946 INFO CollectReadCounts - Collecting read counts...
09:46:21.946 INFO ProgressMeter - Starting traversal
09:46:21.947 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
09:46:21.961 INFO CollectReadCounts - Shutting down engine
[February 14, 2020 9:46:21 AM EET] org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=1212153856
java.lang.IllegalArgumentException: The collection is empty: collection must not be null or empty.
at org.broadinstitute.hellbender.utils.Utils.nonEmpty(Utils.java:619)
at org.broadinstitute.hellbender.utils.Utils.nonEmpty(Utils.java:671)
at org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts$CachedOverlapDetector.<init>(CollectReadCounts.java:221)
at org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts.apply(CollectReadCounts.java:180)
at org.broadinstitute.hellbender.engine.ReadWalker.lambda$traverse$0(ReadWalker.java:96)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.ReadWalker.traverse(ReadWalker.java:94)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Thanks in advance
-
Your test1.preprocessed.interval_list file is empty. Could you check that?
If it is not empty, it is possible that the file format is abnormal and cannot be recognized by GATK.
-
Thank you for your answer. No, it is not empty.
Note: I used the same BED file for GATK BaseRecalibrator (when generating the BAM file) and for PreprocessIntervals, the first step of CNV calling.
-
Can you check again whether test1.preprocessed.interval_list contains intervals (i.e., it may be non-empty because it only contains a sequence dictionary, but does not contain any intervals)?
Given the following lines in your log, I suspect this might be the case (and I'm guessing @SkyWarrior did, too):
09:46:21.932 INFO FeatureManager - Using codec IntervalListCodec to read file file:///mnt/d/gatk-4.1.4.1/test1.preprocessed.interval_list
09:46:21.940 INFO IntervalArgumentCollection - Processing 0 bp from intervals
Is it possible that the preprocessing steps performed by PreprocessIntervals on your original BED file resulted in an empty interval list?
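The distinction between a non-empty file and a file that actually contains intervals can be checked with a few lines of Python. This is a minimal sketch, assuming the interval list is in the usual Picard-style text layout where header lines start with `@` and every other non-blank line is an interval record. The function name `count_interval_records` and the sample text are illustrative only, not part of GATK.

```python
from typing import Tuple


def count_interval_records(interval_list_text: str) -> Tuple[int, int]:
    """Return (header_lines, interval_records) for interval-list text.

    Assumption: header lines start with '@' (e.g. @HD, @SQ) and every
    other non-blank line is one interval record.
    """
    header_lines = 0
    interval_records = 0
    for line in interval_list_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            header_lines += 1
        else:
            interval_records += 1
    return header_lines, interval_records


# Hypothetical file content: a sequence dictionary but no intervals.
header_only = "@HD\tVN:1.6\n@SQ\tSN:1\tLN:249250621\n"
# The same header followed by a single made-up interval record.
with_interval = header_only + "1\t10000\t10500\t+\ttarget_1\n"

print(count_interval_records(header_only))    # (2, 0)
print(count_interval_records(with_interval))  # (2, 1)
```

To run it on a real file, pass the file's text, for example `count_interval_records(open("test1.preprocessed.interval_list").read())`. A result with zero interval records would be consistent with the "Processing 0 bp from intervals" log line.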
-
Hello, I'm having a similar problem. Have you found a solution? I checked my preprocessed.interval_list file; it is not empty and it does contain intervals. I also have these lines in my log:
17:13:15.144 INFO CollectReadCounts - Initializing engine
17:13:18.773 INFO IntervalArgumentCollection - Processing 3117292070 bp from intervals
17:13:18.847 INFO CollectReadCounts - Done initializing engine
17:13:18.850 INFO CollectReadCounts - Shutting down engine
[October 11, 2023 5:13:18 PM CEST] org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=2076049408
java.lang.IllegalArgumentException: The string is null: string must not be null or empty
-
Can you post your whole command line and log here as well?
This exception message is quite generic and does not seem like a regular GATK exception to us.
-
I actually think I found the problem: my CRAM files didn't have read groups (RG), so I used samtools addreplacerg to add them. I'm still testing it, but it seems to work.
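For readers who want to script the same fix, here is a minimal sketch that only assembles a `samtools addreplacerg` command line. It relies on the tool's `-r` option, which can be repeated so that each tag is passed separately and joined with tabs by samtools, and on `-o` for the output file. The helper name, file names, read-group ID, sample name, and platform value are hypothetical placeholders, and output-format options are deliberately left out.

```python
import shlex
from typing import List


def addreplacerg_command(in_path: str, out_path: str, rg_id: str,
                         sample: str, platform: str = "ILLUMINA") -> List[str]:
    """Build an argument list for `samtools addreplacerg`.

    Each read-group tag is passed through its own -r option so that no
    literal tab characters need to be typed. All values are placeholders.
    """
    return [
        "samtools", "addreplacerg",
        "-r", f"ID:{rg_id}",
        "-r", f"SM:{sample}",
        "-r", f"PL:{platform}",
        "-o", out_path,
        in_path,
    ]


cmd = addreplacerg_command("sample.bam", "sample.rg.bam", "rg1", "sample1")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` on a machine where samtools is installed. The SM tag is included because GATK tools generally take the sample name from it.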
-
Great that you solved your problem.
Thank you for your feedback.
-
Hello, I get the same error and cannot understand the reason. This is the command and output:
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Using GATK jar /opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx9830M -XX:-UsePerfData -jar /opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar CollectReadCounts --input P20911_115_S15_L003_R1_001.bam --intervals genome.interval_list --output SLPD003.hdf5 --reference Homo_sapiens_assembly38.fasta --tmp-dir . --format HDF5 --imr OVERLAPPING_ONLY
08:50:34.540 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/share/gatk4-4.6.2.0-0/gatk-package-4.6.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:50:34.867 INFO CollectReadCounts - ------------------------------------------------------------
08:50:34.871 INFO CollectReadCounts - The Genome Analysis Toolkit (GATK) v4.6.2.0
08:50:34.871 INFO CollectReadCounts - For support and documentation go to https://software.broadinstitute.org/gatk/
08:50:34.871 INFO CollectReadCounts - Executing as peter@monod33.mbb.ki.se on Linux v4.18.0-553.53.1.el8_10.x86_64 amd64
08:50:34.871 INFO CollectReadCounts - Java runtime: OpenJDK 64-Bit Server VM v17.0.11-internal+0-adhoc..src
08:50:34.871 INFO CollectReadCounts - Start Date/Time: November 27, 2025 at 8:50:34 AM GMT
08:50:34.871 INFO CollectReadCounts - ------------------------------------------------------------
08:50:34.871 INFO CollectReadCounts - ------------------------------------------------------------
08:50:34.872 INFO CollectReadCounts - HTSJDK Version: 4.2.0
08:50:34.872 INFO CollectReadCounts - Picard Version: 3.4.0
08:50:34.872 INFO CollectReadCounts - Built for Spark Version: 3.5.0
08:50:34.874 INFO CollectReadCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:50:34.874 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:50:34.874 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:50:34.874 INFO CollectReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:50:34.874 INFO CollectReadCounts - Deflater: IntelDeflater
08:50:34.874 INFO CollectReadCounts - Inflater: IntelInflater
08:50:34.874 INFO CollectReadCounts - GCS max retries/reopens: 20
08:50:34.875 INFO CollectReadCounts - Requester pays: disabled
08:50:34.875 INFO CollectReadCounts - Initializing engine
08:50:35.395 INFO FeatureManager - Using codec IntervalListCodec to read file file:///datf/sl/users/peter/oncoanalyser/b5/3c129c1b7ef7657db931becbb05d75/genome.interval_list
08:50:42.804 INFO IntervalArgumentCollection - Processing 3043969085 bp from intervals
08:50:42.901 INFO CollectReadCounts - Done initializing engine
08:50:42.903 INFO CollectReadCounts - Shutting down engine
[November 27, 2025 at 8:50:42 AM GMT] org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts done. Elapsed time: 0.14 minutes.
Runtime.totalMemory()=2080374784
java.lang.IllegalArgumentException: The string is null: string must not be null or empty
at org.broadinstitute.hellbender.utils.Utils.nonNull(Utils.java:643)
at org.broadinstitute.hellbender.utils.Utils.nonEmpty(Utils.java:699)
at org.broadinstitute.hellbender.utils.Utils.nonEmpty(Utils.java:715)
at org.broadinstitute.hellbender.tools.copynumber.formats.metadata.SimpleSampleLocatableMetadata.<init>(SimpleSampleLocatableMetadata.java:18)
at org.broadinstitute.hellbender.tools.copynumber.formats.metadata.MetadataUtils.fromHeader(MetadataUtils.java:46)
at org.broadinstitute.hellbender.tools.copynumber.CollectReadCounts.onTraversalStart(CollectReadCounts.java:157)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1117)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:150)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:203)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:222)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
The interval list:
$ head /datf/sl/users/peter/oncoanalyser/b5/3c129c1b7ef7657db931becbb05d75/genome.interval_list
@HD VN:1.6
@SQ SN:chr1 LN:248956422 M5:6aef897c3d6ff0c78aff06ac189178dd AS:38 UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
@SQ SN:chr2 LN:242193529 M5:f98db672eb0993dcfdabafe2a882905c AS:38 UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
@SQ SN:chr3 LN:198295559 M5:76635a41ea913a405ded820447d067b0 AS:38 UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
@SQ SN:chr4 LN:190214555 M5:3210fecf1eb92d5489da4346b3fddc6e AS:38 UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
@SQ SN:chr5 LN:181538259 M5:a811b3dc9fe66af729dc0dddf7fa4f13 AS:38 UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
@SQ SN:chr6 LN:170805979 M5:5691468a67c7e7a7b5f2a3a683792c29 AS:38 UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
......
I do have read groups in the BAM file (added with samtools addreplacerg):
$ samtools view P20911_115_S15_L003_R1_001.bam | head
A00689:301:HCYTNDSX2:3:2408:30418:36479 81 chr1 9996 0 65S86M chr7 149581093 0 CACCTTTCGTTATAGGATATTTTAATGATTACAGTGAGAGTGTCTGGTGTTCATACGTTTGCTCTTCCGATATCCCTTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA ,,,,,,,,,:,,,F,FF,F,:FFFF,FF,,,FFFF,,F,,F,,,,F,:F,::,,,FFF:F::,FFF:FF:F,:,FFF,F:FFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:2 MD:Z:7A4A73 MC:Z:27S75M49S AS:i:76 XS:i:75 RG:Z:HCYTNDSX2.3 SM:SLPD003 PL:illumina LB:SLPD003L003
A00689:301:HCYTNDSX2:3:2516:6162:3615 81 chr1 9996 15 108S43M chr2 32916518 0 CCCCCCCCCCCCACCCCCCCCCCCAACCCTAACCCTATCCATTAACATTACGCAAACCGATACCCTTTGGCTAACCGTAAGAGTCACCGGATCTCTTACGTTTTCTCTTCCGATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA ,:FFF,,:FFF,,FFFF,,FFFF,:,F:F,::F::F,,,,,F,:,,:F,F::,,,,,F:,,,,F:FF:,,,F::,,:F:F,F:F,F,,F,:,,,FF,,:,,FF,F,F:::F,,F,,F,:FF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:43 MC:Z:112S39M AS:i:43 XS:i:39 RG:Z:HCYTNDSX2.3 SM:SLPD003 PL:illumina LB:SLPD003L003
A00689:301:HCYTNDSX2:3:2563:8748:7435 81 chr1 9996 14 114S37M chr2 32916486 0 CCCCCTCCCCCCCCCCCCACCCCCACCCCCCCCCCCACCCCTACTCCTGCCGCTCACGGTATACCTGATCGTACTGCTAACAGTCAGAGTTACTGTAGCCCTGACGGTTTCTCTTCCGATAACCCTAACCCTAACCCTAACCCTAACCCTA :,FFF,,FFFF,:,FFF,,,FFF,,,FFF,,,FF:,,,F:FF,,,F,,,,,,,F,FFF,,F:,F,,,:,F:,:,,FF,FFF,:FF,,,,F,FF,:,F:,,F:,FF,,,F,F,F:,,F,,F,,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:37 MC:Z:106S45M AS:i:37 XS:i:33 RG:Z:HCYTNDSX2.3 SM:SLPD003 PL:illumina LB:SLPD003L003
Thanks for any help on this!
-
Peter Lönnerberg, can you check whether the header of your BAM file has a corresponding @RG line with the appropriate SM tag?
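The stack trace above fails in the SimpleSampleLocatableMetadata constructor, called from MetadataUtils.fromHeader, which suggests that the sample name could not be read from the file header. The `samtools view` output shown in the question lists reads only, without the header, so it cannot confirm this on its own. A minimal sketch of the header check follows, assuming the header text has been obtained separately (for example with `samtools view -H`) and that fields are tab-separated. The function names and example headers are illustrative only; the read-group ID and sample name are borrowed from the post.

```python
from typing import Dict, List


def read_groups(header_text: str) -> List[Dict[str, str]]:
    """Parse the @RG lines of SAM header text into dicts of tag -> value."""
    groups = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@RG":
            continue
        tags = {}
        for field in fields[1:]:
            key, sep, value = field.partition(":")
            if sep:  # keep only well-formed TAG:value fields
                tags[key] = value
        groups.append(tags)
    return groups


def groups_missing_sample(header_text: str) -> List[str]:
    """Return the IDs of read groups that have no non-empty SM tag."""
    return [g.get("ID", "?") for g in read_groups(header_text) if not g.get("SM")]


# Illustrative headers, not taken from the poster's actual file.
no_sm = "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:HCYTNDSX2.3\n"
with_sm = "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:HCYTNDSX2.3\tSM:SLPD003\tPL:illumina\n"

print(groups_missing_sample(no_sm))    # ['HCYTNDSX2.3']
print(groups_missing_sample(with_sm))  # []
print(len(read_groups("@HD\tVN:1.6\n")))  # 0, i.e. no @RG line at all
```

An empty result from `read_groups`, or any ID returned by `groups_missing_sample`, would be worth fixing before rerunning CollectReadCounts.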