Sequence dictionary and index contain different numbers of contigs
Dear GATK,
I am trying to run GATK to detect somatic copy number variants. Workflow: https://gatk.broadinstitute.org/hc/en-us/articles/360035531092--How-to-part-I-Sensitively-detect-copy-ratio-alterations-and-allelic-segments
I get an error when I try to run:
gatk PreprocessIntervals -L targets_C.interval_list -R Homo_sapiens_assembly19.fasta --bin-length 0 --interval-merging-rule OVERLAPPING_ONLY -O sandbox/targets_C.preprocessed.interval_list
I am using GATK docker image 4.1.3.0.
Error:
07:02:56.434 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 09, 2020 7:02:57 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
07:02:57.600 INFO PreprocessIntervals - ------------------------------------------------------------
07:02:57.601 INFO PreprocessIntervals - The Genome Analysis Toolkit (GATK) v4.1.3.0
07:02:57.601 INFO PreprocessIntervals - For support and documentation go to https://software.broadinstitute.org/gatk/
07:02:57.601 INFO PreprocessIntervals - Executing as root@7645dd8077e3 on Linux v5.4.39-linuxkit amd64
07:02:57.601 INFO PreprocessIntervals - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
07:02:57.601 INFO PreprocessIntervals - Start Date/Time: December 9, 2020 7:02:56 AM UTC
07:02:57.601 INFO PreprocessIntervals - ------------------------------------------------------------
07:02:57.602 INFO PreprocessIntervals - ------------------------------------------------------------
07:02:57.602 INFO PreprocessIntervals - HTSJDK Version: 2.20.1
07:02:57.602 INFO PreprocessIntervals - Picard Version: 2.20.5
07:02:57.602 INFO PreprocessIntervals - HTSJDK Defaults.COMPRESSION_LEVEL : 2
07:02:57.602 INFO PreprocessIntervals - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
07:02:57.602 INFO PreprocessIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
07:02:57.602 INFO PreprocessIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
07:02:57.602 INFO PreprocessIntervals - Deflater: IntelDeflater
07:02:57.603 INFO PreprocessIntervals - Inflater: IntelInflater
07:02:57.603 INFO PreprocessIntervals - GCS max retries/reopens: 20
07:02:57.603 INFO PreprocessIntervals - Requester pays: disabled
07:02:57.603 INFO PreprocessIntervals - Initializing engine
07:02:57.640 INFO PreprocessIntervals - Shutting down engine
[December 9, 2020 7:02:57 AM UTC] org.broadinstitute.hellbender.tools.copynumber.PreprocessIntervals done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=148373504
***********************************************************************
A USER ERROR has occurred: Couldn't read file file:///gatk/USZ_melanoma/Homo_sapiens_assembly19.fasta. Error was: Sequence dictionary and index contain different numbers of contigs
I am using this reference file: https://storage.cloud.google.com/genomics-public-data/references/b37/Homo_sapiens_assembly19.fasta.gz
However, the .dict and .fai files are not available for this genome version, so I created them with the following commands:
- samtools faidx Homo_sapiens_assembly19.fasta
- gatk CreateSequenceDictionary -R Homo_sapiens_assembly19.fasta
I am open to any suggestions on how to solve this issue.
Thank you.
-
Hi rahelp, something may have gone wrong when you downloaded the files or when you created the index and dictionary files. Check those steps for errors and/or redo them to confirm that the files are intact.
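Since the error says the dictionary and the index disagree on the number of contigs, one quick check is to count the contigs in each file directly. The .fai written by samtools faidx has one tab-separated line per contig, and the .dict written by CreateSequenceDictionary is a SAM-style header with one "@SQ" line per contig. Below is a minimal sketch; the function name check_contig_counts is hypothetical, and the file names assume the tools' default outputs (Homo_sapiens_assembly19.fasta.fai and Homo_sapiens_assembly19.dict).

```shell
#!/bin/sh
# Hypothetical helper: compare contig counts in a FASTA index (.fai)
# and a sequence dictionary (.dict). Returns 1 on a mismatch.
check_contig_counts() {
  local fai="$1" dict="$2" n_fai n_dict
  if [ ! -f "$fai" ] || [ ! -f "$dict" ]; then
    echo "Missing file: $fai or $dict" >&2
    return 2
  fi
  # .fai: one tab-separated line per contig (name, length, offset, ...)
  n_fai=$(grep -c . "$fai" || true)
  # .dict: SAM-style header; one "@SQ" line per contig (plus one "@HD" line)
  n_dict=$(grep -c '^@SQ' "$dict" || true)
  echo "fai contigs: $n_fai, dict contigs: $n_dict"
  if [ "$n_fai" -ne "$n_dict" ]; then
    echo "Mismatch: regenerate the .fai and/or .dict from the same FASTA" >&2
    return 1
  fi
}
```

For example, run `check_contig_counts Homo_sapiens_assembly19.fasta.fai Homo_sapiens_assembly19.dict` in the directory holding the reference. If the counts differ, deleting both files and regenerating them from the same decompressed FASTA is the simplest way to rule out a bad index or dictionary.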
-
rahelp, another user is seeing the same error message. Did you end up solving this problem? If so, what was your solution?
Thank you,
Genevieve
-
Dear Genevieve,
The issue was that CreateSequenceDictionary was not running properly. To get it to work, I had to:
- complete Google authentication
- raise Docker's memory limit, which defaults to only 2 GB (I set it to 100)
- increase the Java heap size with --java-options:
gatk --java-options -Xmx12g CreateSequenceDictionary -R Homo_sapiens_assembly19.fasta
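Before requesting a large heap such as -Xmx12g, it can help to confirm that the Docker memory change actually took effect. A rough check is to read MemTotal from /proc/meminfo inside the container. This is a minimal sketch with hypothetical function names; it assumes Docker Desktop (the "linuxkit" kernel in the log above suggests it), where /proc/meminfo reflects the Docker VM's memory. It does not see per-container limits set with `docker run --memory`, and the JVM needs some memory beyond the heap, so passing the check is necessary but not sufficient.

```shell
#!/bin/sh
# Hypothetical helper: total memory visible in the container, in whole GiB
# (truncated). /proc/meminfo reports MemTotal in kB; 1 GiB = 1048576 kB.
mem_total_gib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: rough check that a heap of HEAP_GIB (as in -Xmx12g)
# fits in the visible memory. Optional second argument: a meminfo-style file.
check_heap_fits() {
  local heap_gib="$1" total
  total=$(mem_total_gib "${2:-/proc/meminfo}")
  if [ -z "$total" ]; then
    echo "MemTotal not found" >&2
    return 2
  fi
  if [ "$total" -le "$heap_gib" ]; then
    echo "Only ${total} GiB visible; too little for -Xmx${heap_gib}g" >&2
    return 1
  fi
  echo "${total} GiB visible; enough for a -Xmx${heap_gib}g heap"
}
```

For example, `check_heap_fits 12` run inside the container would fail under Docker Desktop's 2 GB default and pass once the memory setting has been raised well above 12 GB.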
I hope this helps.
Good luck!
-
Thank you so much, rahelp!