Sample intervals must be identical to the original intervals used to build the panel of normals
Hi.
When following the CNV pipeline (https://gatkforums.broadinstitute.org/gatk/discussion/11682#2), I got the following error whilst running the DenoiseReadCounts tool:
"Sample intervals must be identical to the original intervals used to build the panel of normals"
I used three random .bam files from whole-genome sequencing of female samples in the 1000 Genomes Project. They come from Phase 1, which uses the hg19 reference.
I followed the pipeline exactly as described, using the same files throughout for the genome reference, genomic intervals, etc., including for the construction of the Panel of Normals.
Given this, why am I getting this error?
Thank you,
Pedro Raposo
Commands
- Panel of Normals construction:
gatk CollectReadCounts -I HG00097.mapped.SOLID.bfast.GBR.low_coverage.20101123.bam -L intervals_list_hg19.bed --interval-merging-rule OVERLAPPING_ONLY -O HG00097.counts.hdf5
gatk CollectReadCounts -I HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam -L intervals_list_hg19.bed --interval-merging-rule OVERLAPPING_ONLY -O HG00096.counts.hdf5
gatk CollectReadCounts -I HG00102.mapped.SOLID.bfast.GBR.low_coverage.20101123.bam -L intervals_list_hg19.bed --interval-merging-rule OVERLAPPING_ONLY -O HG00102.counts.hdf5
gatk AnnotateIntervals -R hg19.fasta -L intervals_list_hg19.bed --interval-merging-rule OVERLAPPING_ONLY -O annotated_intervals.tsv
gatk CreateReadCountPanelOfNormals -I HG00096.counts.hdf5 -I HG00097.counts.hdf5 -I HG00102.counts.hdf5 --annotated-intervals annotated_intervals.tsv -O cnv.pon.hdf5
- Main pipeline:
gatk PreprocessIntervals -L intervals_list_hg19.bed -R hg19.fasta --interval-merging-rule OVERLAPPING_ONLY -O targets_C.preprocessed.interval_list
gatk CollectReadCounts -I My_sample.bam -L targets_C.preprocessed.interval_list --interval-merging-rule OVERLAPPING_ONLY -O My_sample.hdf5
gatk DenoiseReadCounts -I My_sample.hdf5 --count-panel-of-normals cnv.pon.hdf5 --standardized-copy-ratios My_sample.standardizedCR.tsv --denoised-copy-ratios My_sample.denoisedCR.tsv
GATK version
4.1.3.0
Entire error log
Using GATK jar /opt/software/conda2/envs/GATK/share/gatk4-4.1.3.0-0/gatk-package-4.1.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /opt/software/conda2/envs/GATK/share/gatk4-4.1.3.0-0/gatk-package-4.1.3.0-local.jar DenoiseReadCounts -I My_sample.counts.hdf5 --count-panel-of-normals cnv.pon.hdf5 --standardized-copy-ratios My_sample.standardizedCR.tsv --denoised-copy-ratios My_sample.denoisedCR.tsv
11:47:00.914 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/software/conda2/envs/GATK/share/gatk4-4.1.3.0-0/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 28, 2020 11:47:01 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:47:01.230 INFO DenoiseReadCounts - ------------------------------------------------------------
11:47:01.230 INFO DenoiseReadCounts - The Genome Analysis Toolkit (GATK) v4.1.3.0
11:47:01.230 INFO DenoiseReadCounts - For support and documentation go to https://software.broadinstitute.org/gatk/
11:47:01.231 INFO DenoiseReadCounts - Executing as praposo@b1s3 on Linux v3.10.0-957.21.3.el7.x86_64 amd64
11:47:01.231 INFO DenoiseReadCounts - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
11:47:01.232 INFO DenoiseReadCounts - Start Date/Time: 28 February 2020 11:47:00 GMT
11:47:01.232 INFO DenoiseReadCounts - ------------------------------------------------------------
11:47:01.232 INFO DenoiseReadCounts - ------------------------------------------------------------
11:47:01.233 INFO DenoiseReadCounts - HTSJDK Version: 2.20.1
11:47:01.233 INFO DenoiseReadCounts - Picard Version: 2.20.5
11:47:01.233 INFO DenoiseReadCounts - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:47:01.233 INFO DenoiseReadCounts - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:47:01.233 INFO DenoiseReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:47:01.233 INFO DenoiseReadCounts - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:47:01.233 INFO DenoiseReadCounts - Deflater: IntelDeflater
11:47:01.234 INFO DenoiseReadCounts - Inflater: IntelInflater
11:47:01.234 INFO DenoiseReadCounts - GCS max retries/reopens: 20
11:47:01.234 INFO DenoiseReadCounts - Requester pays: disabled
11:47:01.234 INFO DenoiseReadCounts - Initializing engine
11:47:01.234 INFO DenoiseReadCounts - Done initializing engine
log4j:WARN No appenders could be found for logger (org.broadinstitute.hdf5.HDF5Library).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
11:47:01.307 INFO DenoiseReadCounts - Reading read-counts file (My_sample.counts.hdf5)...
11:47:01.440 WARN SVDDenoisingUtils - Sequence dictionaries in panel and case sample do not match.
11:47:01.441 INFO SVDDenoisingUtils - Validating sample intervals against original intervals used to build panel of normals...
11:47:01.443 INFO DenoiseReadCounts - Shutting down engine
[28 February 2020 11:47:01 GMT] org.broadinstitute.hellbender.tools.copynumber.DenoiseReadCounts done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2019033088
java.lang.IllegalArgumentException: Sample intervals must be identical to the original intervals used to build the panel of normals.
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
at org.broadinstitute.hellbender.tools.copynumber.denoising.SVDDenoisingUtils.denoise(SVDDenoisingUtils.java:120)
at org.broadinstitute.hellbender.tools.copynumber.denoising.SVDReadCountPanelOfNormals.denoise(SVDReadCountPanelOfNormals.java:88)
at org.broadinstitute.hellbender.tools.copynumber.DenoiseReadCounts.doWork(DenoiseReadCounts.java:200)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
Official comment
When constructing the panel, you pass `-L intervals_list_hg19.bed` to CollectReadCounts. However, for the case sample, you instead pass `-L targets_C.preprocessed.interval_list` to CollectReadCounts, which results in the mismatch error.
It looks like you might want to pass `-L intervals_list_hg19.bed` to PreprocessIntervals, then pass the result via `-L` to CollectReadCounts for both the panel and the case samples.
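The order suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the file names and GATK flags are copied from the commands in the question, the case-sample output name follows the `My_sample.hdf5` used there, and the `run` helper is a hypothetical convenience (not part of GATK) that prints the commands instead of executing them when `gatk` is not on the PATH. It also assumes AnnotateIntervals should be given the same preprocessed intervals, so that the annotations line up with the counts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# File names taken from the question; adjust them to your own data.
REF=hg19.fasta
RAW_INTERVALS=intervals_list_hg19.bed
INTERVALS=targets_C.preprocessed.interval_list
RULE=OVERLAPPING_ONLY

# Hypothetical helper: execute the command if gatk is installed,
# otherwise just print it (dry run).
run() {
  if command -v gatk >/dev/null 2>&1; then
    "$@"
  else
    echo "$*"
  fi
}

# 1. Preprocess the raw intervals once.
run gatk PreprocessIntervals -L "$RAW_INTERVALS" -R "$REF" \
  --interval-merging-rule "$RULE" -O "$INTERVALS"

# 2. Collect counts for the panel samples over the PREPROCESSED intervals.
for bam in \
  HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20101123.bam \
  HG00097.mapped.SOLID.bfast.GBR.low_coverage.20101123.bam \
  HG00102.mapped.SOLID.bfast.GBR.low_coverage.20101123.bam
do
  sample="${bam%%.*}"   # e.g. HG00096
  run gatk CollectReadCounts -I "$bam" -L "$INTERVALS" \
    --interval-merging-rule "$RULE" -O "$sample.counts.hdf5"
done

# 3. Annotate the same preprocessed intervals and build the panel.
run gatk AnnotateIntervals -R "$REF" -L "$INTERVALS" \
  --interval-merging-rule "$RULE" -O annotated_intervals.tsv
run gatk CreateReadCountPanelOfNormals \
  -I HG00096.counts.hdf5 -I HG00097.counts.hdf5 -I HG00102.counts.hdf5 \
  --annotated-intervals annotated_intervals.tsv -O cnv.pon.hdf5

# 4. Collect counts for the case sample over the SAME intervals, then denoise.
run gatk CollectReadCounts -I My_sample.bam -L "$INTERVALS" \
  --interval-merging-rule "$RULE" -O My_sample.hdf5
run gatk DenoiseReadCounts -I My_sample.hdf5 \
  --count-panel-of-normals cnv.pon.hdf5 \
  --standardized-copy-ratios My_sample.standardizedCR.tsv \
  --denoised-copy-ratios My_sample.denoisedCR.tsv
```

The point of the sketch is that every CollectReadCounts call, for panel and case samples alike, receives the same `-L` file, so the intervals stored in the panel and in the case counts should be identical.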
Thank you Samuel. That was indeed the problem!