PROBLEM: "A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary"
Hello there,
I'm trying to run "PreprocessIntervals", but I get this error:
PROBLEM: "A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary"
Can you please help?
I have generated reference.dict and reference.fasta.fai properly, and my BED file looks like this (tab-delimited):
chr1 69090 70008 OR4F5_1 0 +
chr1 367658 368597 OR4F29_1 0 -
chr1 621095 622034 OR4F16_1 0 -
chr1 861321 861393 SAMD11_1 0 +
chr1 865534 865716 SAMD11_2 0 +
chr1 866418 866469 SAMD11_3 0 +
chr1 871151 871276 SAMD11_4 0 +
chr1 874419 874509 SAMD11_5 0 +
chr1 874654 874840 SAMD11_6 0 +
chr1 876523 876686 SAMD11_7 0 +
chr1 877515 877631 SAMD11_8 0 +
chr1 877789 877868 SAMD11_9 0 +
chr1 877938 878438 SAMD11_10 0 +
chr1 878632 878757 SAMD11_11 0 +
chr1 879077 879188 SAMD11_12 0 +
chr1 879287 879533 SAMD11_13 0 +
chr1 880073 880180 NOC2L_19 0 -
chr1 880436 880526 NOC2L_18 0 -
chr1 880897 881033 NOC2L_17 0 -
chr1 881552 881666 NOC2L_16 0 -
chr1 881781 881925 NOC2L_15 0 -
chr1 883510 883612 NOC2L_14 0 -
chr1 883869 883983 NOC2L_13 0 -
chr1 886506 886618 NOC2L_12 0 -
chr1 887379 887519 NOC2L_11 0 -
chr1 887791 887980 NOC2L_10 0 -
Thanks in advance
-
Can you paste the commands you used to get this error? What reference are you using? You may have a mismatch between the way intervals are named in your BED file and the contig names in the reference you are using.
-
My genome reference is "hg19_v0_Homo_sapiens_assembly19", which I downloaded from the bundle, and my command is:
gatk PreprocessIntervals \
    -R hg19_v0_Homo_sapiens_assembly19.fasta \
    --bin-length 0 \
    -L mybed.bed \
    -imr OVERLAPPING_ONLY \
    -O targets.interval_list
-
Thanks. Can you send the contents of your reference dict? Basically, you need to make sure that every input (BED file and reference files) follows the same contig naming convention: either chr1 or 1.
For example, looking at the reference dictionaries for hg19 and hg38 from this public reference bucket, you can see how the contigs are named differently.
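As a concrete check, you can compare the contig names recorded in your sequence dictionary with those used in your BED file. The file names below are the ones from this thread; note that hg19_v0_Homo_sapiens_assembly19 is a b37-style reference whose contigs are named 1, 2, ... without the chr prefix, while the BED above uses chr1:

# Show the first few contig names (SN: tags) recorded in the dictionary
grep '^@SQ' hg19_v0_Homo_sapiens_assembly19.dict | cut -f2 | head -5

# Show the distinct contig names used in the BED file
cut -f1 mybed.bed | sort -u

If the BED file is the one that disagrees, a minimal sketch of a fix, assuming every contig only needs the chr prefix stripped (caution: chrM corresponds to MT, not M, in b37-style references):

sed 's/^chr//' mybed.bed > mybed.nochr.bed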
-
Thank you for your answer, Tiffany.
My reference.dict file looks like this: [screenshot] and my mybed.bed file looks like this: [screenshot]
-
Ok, are you still getting the issue? If so, try using the fasta reference files available here and let me know if the same error occurs.
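The bundle normally ships the matching .fai and .dict alongside the FASTA, but if you ever need to regenerate them yourself, a minimal sketch, assuming samtools and gatk are on your PATH:

# Creates hg19_v0_Homo_sapiens_assembly19.fasta.fai
samtools faidx hg19_v0_Homo_sapiens_assembly19.fasta

# Creates hg19_v0_Homo_sapiens_assembly19.dict
gatk CreateSequenceDictionary -R hg19_v0_Homo_sapiens_assembly19.fasta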
-
Tiffany Miller: I am also facing the exact same issue. First I was using hg19 from the UCSC golden path; then I also tried the FASTA reference files (and .dict file) suggested above, but I am still getting the same error (command and error shown below):
12:09:27.234 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/akansha/vivekruhela/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 12, 2021 12:09:27 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
12:09:27.532 INFO GenomicsDBImport - ------------------------------------------------------------
12:09:27.532 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
12:09:27.532 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
12:09:27.532 INFO GenomicsDBImport - Executing as akansha@sbilab on Linux v4.4.0-169-generic amd64
12:09:27.533 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-8u265-b01-0ubuntu2~16.04-b01
12:09:27.533 INFO GenomicsDBImport - Start Date/Time: January 12, 2021 12:09:27 PM IST
12:09:27.533 INFO GenomicsDBImport - ------------------------------------------------------------
12:09:27.533 INFO GenomicsDBImport - ------------------------------------------------------------
12:09:27.533 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
12:09:27.533 INFO GenomicsDBImport - Picard Version: 2.23.3
12:09:27.534 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:09:27.534 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:09:27.534 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:09:27.534 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:09:27.534 INFO GenomicsDBImport - Deflater: IntelDeflater
12:09:27.534 INFO GenomicsDBImport - Inflater: IntelInflater
12:09:27.534 INFO GenomicsDBImport - GCS max retries/reopens: 20
12:09:27.534 INFO GenomicsDBImport - Requester pays: disabled
12:09:27.534 INFO GenomicsDBImport - Initializing engine
12:09:32.225 INFO FeatureManager - Using codec IntervalListCodec to read file file:///home/akansha/vivekruhela/hg19_v0_HybSelOligos_whole_exome_illumina_coding_v1_whole_exome_illumina_coding_v1.Homo_sapiens_assembly19.targets.interval_list
12:09:32.259 INFO GenomicsDBImport - Shutting down engine
[January 12, 2021 12:09:32 PM IST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.08 minutes.
Runtime.totalMemory()=2767716352
***********************************************************************
A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig 1 given as location, but this contig isn't present in the Fasta sequence dictionary
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /home/akansha/vivekruhela/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/akansha/vivekruhela/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport -R /home/akansha/vivekruhela/gatk_bundle/hg19_v0_Homo_sapiens_assembly19.fasta -V /home/akansha/vivekruhela/PON/SRR1566827.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566831.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566833.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566835.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566859.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566889.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566901.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566928.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566933.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566957.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566967.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566985.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606005.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606009.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606035.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606043.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606045.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606068.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128443.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128462.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128495.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128507.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128517.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128529.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128531.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128544.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128570.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128646.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128650.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128654.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128704.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128752.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128950.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128952.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128954.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128962.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128969.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128995.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129015.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129022.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129031.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129042.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129072.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129091.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129142.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129150.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129220.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129222.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129229.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129259.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2182804.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2182806.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3162947.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3162978.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163146.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163205.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163605.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163648.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163652.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163653.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163678.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163680.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163705.vcf.gz -V 
/home/akansha/vivekruhela/PON/SRR3163711.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4188892.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4188955.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189328.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189350.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189588.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189593.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189594.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189597.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189602.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189605.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189606.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189699.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189725.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566955.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129174.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163666.vcf.gz --genomicsdb-workspace-path pon_db --tmp-dir /home/akansha/vivekruhela/tmp1 -L /home/akansha/vivekruhela/hg19_v0_HybSelOligos_whole_exome_illumina_coding_v1_whole_exome_illumina_coding_v1.Homo_sapiens_assembly19.targets.interval_list

EDIT 1: I want to add my next attempt here. This time the previous error is resolved, but GenomicsDBImport is still not working. I downloaded the hg38 intervals from the GATK resource bundle and used UCSC liftOver to convert them into an hg19-based interval list. The error message is shown below:
15:36:14.002 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/akansha/vivekruhela/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 12, 2021 3:36:14 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:36:14.265 INFO GenomicsDBImport - ------------------------------------------------------------
15:36:14.266 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
15:36:14.266 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
15:36:14.266 INFO GenomicsDBImport - Executing as akansha@sbilab on Linux v4.4.0-169-generic amd64
15:36:14.266 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-8u265-b01-0ubuntu2~16.04-b01
15:36:14.266 INFO GenomicsDBImport - Start Date/Time: January 12, 2021 3:36:13 PM IST
15:36:14.266 INFO GenomicsDBImport - ------------------------------------------------------------
15:36:14.267 INFO GenomicsDBImport - ------------------------------------------------------------
15:36:14.267 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
15:36:14.267 INFO GenomicsDBImport - Picard Version: 2.23.3
15:36:14.267 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:36:14.267 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:36:14.267 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:36:14.268 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:36:14.268 INFO GenomicsDBImport - Deflater: IntelDeflater
15:36:14.268 INFO GenomicsDBImport - Inflater: IntelInflater
15:36:14.268 INFO GenomicsDBImport - GCS max retries/reopens: 20
15:36:14.268 INFO GenomicsDBImport - Requester pays: disabled
15:36:14.268 INFO GenomicsDBImport - Initializing engine
15:36:18.805 INFO FeatureManager - Using codec BEDCodec to read file file:///home/akansha/vivekruhela/gatk_bundle/hglft_genome_3bc14_d6f440.bed
15:36:18.810 INFO IntervalArgumentCollection - Processing 0 bp from intervals
15:36:18.847 INFO GenomicsDBImport - Done initializing engine
15:36:19.417 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
15:36:19.418 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/akansha/vivekruhela/pon_db/vidmap.json
15:36:19.418 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/akansha/vivekruhela/pon_db/callset.json
15:36:19.418 INFO GenomicsDBImport - Complete VCF Header will be written to /home/akansha/vivekruhela/pon_db/vcfheader.vcf
15:36:19.418 INFO GenomicsDBImport - Importing to workspace - /home/akansha/vivekruhela/pon_db
15:36:19.418 INFO ProgressMeter - Starting traversal
15:36:19.418 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
15:36:19.467 INFO GenomicsDBImport - Shutting down engine
[January 12, 2021 3:36:19 PM IST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=2850029568
java.lang.IndexOutOfBoundsException: Index: 0
at java.util.Collections$EmptyList.get(Collections.java:4456)
at org.genomicsdb.model.GenomicsDBImportConfiguration$ImportConfiguration.getColumnPartitions(GenomicsDBImportConfiguration.java:2083)
at org.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:203)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.traverse(GenomicsDBImport.java:745)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)

Kindly suggest. Thanks.
-
Hi vivekruhela, could you make a new post about this issue? The user above is running PreprocessIntervals, but you are running GenomicsDBImport, so it may be a different issue. Please see this document about what to include in your post: https://gatk.broadinstitute.org/hc/en-us/articles/360053424571-How-to-Write-a-Post
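One thing worth checking before you post: your log shows "Processing 0 bp from intervals", which suggests the lifted-over BED may be empty, or may use contig names that don't exist in the reference; that could also explain the IndexOutOfBoundsException when GenomicsDB tries to partition zero intervals. A quick sanity check (the BED path is taken from your log; the .dict path is my assumption, sitting next to the FASTA):

# How many intervals survived the liftover, and which contigs do they use?
wc -l /home/akansha/vivekruhela/gatk_bundle/hglft_genome_3bc14_d6f440.bed
cut -f1 /home/akansha/vivekruhela/gatk_bundle/hglft_genome_3bc14_d6f440.bed | sort -u

# Compare against the contig names in the reference dictionary (assumed path)
grep '^@SQ' /home/akansha/vivekruhela/gatk_bundle/hg19_v0_Homo_sapiens_assembly19.dict | cut -f2 | head -5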
-
Hi,
@preprocessintervals - A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "home/Reference_Sequences/xxxx.bed" is not valid for this input.
I am using the FASTA sequences and index files from the GATK server itself. Can you help me rectify the error?
-
Hi Dr N Ch,
It looks like this error is occurring because the tool is unable to recognize the file you provided as a proper interval list. Can you provide the full command that you ran and take a look at this article on interval lists to ensure yours is in the proper format?
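One common cause, offered here as an assumption about your setup rather than a diagnosis: when the path given to -L does not resolve to an existing file (note that "home/Reference_Sequences/xxxx.bed" has no leading slash), GATK falls back to parsing the string as a literal interval such as chr1:1000-2000, and the failure to parse it produces exactly this "Query interval ... is not valid" message. A sketch with a placeholder reference path:

gatk PreprocessIntervals \
    -R /path/to/reference.fasta \
    -L /home/Reference_Sequences/xxxx.bed \
    --bin-length 0 \
    -imr OVERLAPPING_ONLY \
    -O targets.interval_list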
Kind regards,
Pamela