A USER ERROR has occurred: Contig chr1_KI270762v1_alt not present in reads sequence dictionary
Hello,
I was hoping that someone might be able to help me resolve an error I keep encountering while trying to use DepthOfCoverage (DOC). It says that "chr1_KI270762v1_alt" is not found in the sequence dictionary.
I run the command as follows:
gatk DepthOfCoverage -R hg38.fasta -I sample.final.bam -O DepthOfCoverage -L hg38.interval_list
The chromosomes are named the same way in all my files (i.e. chr1 chr2...) and 'chr1_KI270762v1_alt' is found in both my .fasta and .dict files. It is also in the .bam file and I guess this is the source of the error. The command I used to create the interval list was:
gatk BedToIntervalList -I hg38.interval.bed -O hg38.interval_list -SD hg38.dict
I reviewed similar posts and have not been able to find a solution that works for me.
Version I am using:
The Genome Analysis Toolkit (GATK) v4.2.6.1
HTSJDK Version: 2.24.1
Picard Version: 2.27.1
Entire program log:
Thanks!
-
Hi Maha Tageldein,
Have you tried to create a new sequence dictionary for your fasta file? The tool may be using an old version without this contig. You can use the tool CreateSequenceDictionary.
Let me know if that works.
Best,
Genevieve
-
Thank you for the suggestion. I confirmed that "chr1_KI270762v1_alt" was present in the fasta file, then used CreateSequenceDictionary to create a new sequence dictionary using:
java -jar picard.jar CreateSequenceDictionary R=hg38.fasta O=hg38.dict
When I run the command now (old .dict file deleted from the directory) like this:
gatk DepthOfCoverage -R hg38.fasta -I sample.final.bam -O DepthOfCoverage -L hg38.interval_list
I encounter the same error. Please find the logs below:
Using GATK jar /Applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /Applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar DepthOfCoverage -R hg38.fasta -I IR_MN_3.final.bam -O DepthOfCoverage -L hg38.interval_list
11:51:35.169 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Applications/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.dylib
11:51:35.497 INFO DepthOfCoverage - ------------------------------------------------------------
11:51:35.498 INFO DepthOfCoverage - The Genome Analysis Toolkit (GATK) v4.2.6.1
11:51:35.498 INFO DepthOfCoverage - For support and documentation go to https://software.broadinstitute.org/gatk/
11:51:35.500 INFO DepthOfCoverage - Executing as maha@Mahas-MacBook-Pro.local on Mac OS X v11.4 x86_64
11:51:35.500 INFO DepthOfCoverage - Java runtime: Java HotSpot(TM) 64-Bit Server VM v16.0.2+7-67
11:51:35.501 INFO DepthOfCoverage - Start Date/Time: June 6, 2022 at 11:51:35 a.m. EDT
11:51:35.501 INFO DepthOfCoverage - ------------------------------------------------------------
11:51:35.501 INFO DepthOfCoverage - ------------------------------------------------------------
11:51:35.503 INFO DepthOfCoverage - HTSJDK Version: 2.24.1
11:51:35.503 INFO DepthOfCoverage - Picard Version: 2.27.1
11:51:35.503 INFO DepthOfCoverage - Built for Spark Version: 2.4.5
11:51:35.503 INFO DepthOfCoverage - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:51:35.503 INFO DepthOfCoverage - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:51:35.503 INFO DepthOfCoverage - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:51:35.503 INFO DepthOfCoverage - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:51:35.503 INFO DepthOfCoverage - Deflater: IntelDeflater
11:51:35.503 INFO DepthOfCoverage - Inflater: IntelInflater
11:51:35.504 INFO DepthOfCoverage - GCS max retries/reopens: 20
11:51:35.504 INFO DepthOfCoverage - Requester pays: disabled
11:51:35.506 WARN DepthOfCoverage -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: DepthOfCoverage is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
11:51:35.506 INFO DepthOfCoverage - Initializing engine
11:51:36.355 INFO FeatureManager - Using codec IntervalListCodec to read file file:///Volumes/My%20Passport%20for%20Mac/Experiments-%20raw%20data/1041%20WGS%20Kate_MN%20sequencing%20experiments_20210917/20200409_Sequencing%20IR%20MN%20HeLa_Novogene_rep1/WGS_202004/hg38.interval_list
11:51:36.513 INFO IntervalArgumentCollection - Processing 2982123250 bp from intervals
11:51:36.526 INFO DepthOfCoverage - Done initializing engine
11:51:36.662 INFO ProgressMeter - Starting traversal
11:51:36.663 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
11:51:36.671 INFO DepthOfCoverage - Shutting down engine
[June 6, 2022 at 11:51:36 a.m. EDT] org.broadinstitute.hellbender.tools.walkers.coverage.DepthOfCoverage done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=119537664
***********************************************************************
A USER ERROR has occurred: Contig chr1_KI270762v1_alt not present in reads sequence dictionary
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
-
Thanks Maha Tageldein! Could you confirm that chr1_KI270762v1_alt is in the header sequence dictionary of your BAM file? https://gatk.broadinstitute.org/hc/en-us/articles/360035890791-SAM-or-BAM-or-CRAM-Mapped-sequence-data-formats
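One way to check this is sketched below. It assumes samtools is installed (it is not part of GATK), and the helper name contig_in_header is made up for illustration. The function reads SAM header text on stdin and succeeds only if the contig appears as an exact SN: value on an @SQ line, so that chr1 does not accidentally match chr1_KI270762v1_alt.

```shell
# Hypothetical helper: succeed if contig $1 is listed as an @SQ SN: entry
# in the SAM header text read from stdin.
contig_in_header() {
  awk -v c="$1" -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++) if ($i == "SN:" c) found = 1
    }
    END { exit !found }
  '
}

# Usage (assumes samtools is on the PATH; file name taken from the thread):
# samtools view -H sample.final.bam | contig_in_header chr1_KI270762v1_alt \
#   && echo "present in BAM header" || echo "missing from BAM header"
```

If the contig is reported as missing, the BAM was most likely aligned against a reference that does not include that contig.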
-
It turns out 'chr1_KI270762v1_alt' is not in the header of my BAM file. Does this mean I should use a different fasta file as a reference?
Thank you
-
Yeah, you'll need to make sure to use the same reference for DepthOfCoverage that was used for read mapping.
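To see how far the interval list and the BAM disagree, a rough sketch like the following may help. It assumes samtools is installed, and the helper names sq_names and missing_contigs and the intermediate .txt file names are hypothetical. It lists every contig used in the interval list that is absent from the BAM header.

```shell
# Hypothetical helper: print contig names from the @SQ lines of SAM-style
# header text read from stdin.
sq_names() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
  }'
}

# Hypothetical helper: print names listed in file $1 (one per line) that do
# not appear as a whole line in file $2.
missing_contigs() {
  grep -Fxv -f "$2" "$1"
}

# Usage (assumes samtools is on the PATH; input file names from the thread):
# samtools view -H sample.final.bam | sq_names > bam_contigs.txt
# grep -v '^@' hg38.interval_list | cut -f1 | sort -u > interval_contigs.txt
# missing_contigs interval_contigs.txt bam_contigs.txt
```

Any contig this prints is requested through -L but unknown to the BAM, which is the situation the error message describes.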