Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

PROBLEM: "A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary"

7 comments

  • Tiffany Miller

    Can you paste the commands you used to get this error? What reference are you using? You may have a mismatch between the contig names in your BED file and those in your reference.

  • firat zahid

    My genome reference is "hg19_v0_Homo_sapiens_assembly19", which I downloaded from the bundle, and my command is:

    gatk PreprocessIntervals \
            -R hg19_v0_Homo_sapiens_assembly19.fasta \
            --bin-length 0 \
            -L mybed.bed \
            -imr OVERLAPPING_ONLY \
            -O targets.interval_list
  • Tiffany Miller

    Thanks. Can you send the contents of your reference dict? Basically, you need to make sure that every input (the BED file and the reference files) uses the same contig naming convention: either chr1 or 1.
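
    For example, a quick side-by-side check along these lines (a sketch; it assumes the dictionary sits next to the fasta with a .dict extension, so adjust the filenames to yours) will show whether the two naming schemes agree:

        # Contig names declared in the reference dictionary (SN: fields of the @SQ lines)
        grep '^@SQ' hg19_v0_Homo_sapiens_assembly19.dict | cut -f2 | head
        # Contig names actually used in the BED file (first column)
        cut -f1 mybed.bed | sort -u | head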

    For example, looking at the reference dictionaries for hg19 and hg38 from this public reference bucket, you can see how the contigs are named differently:

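    A few illustrative @SQ lines (an excerpt only; MD5 and URI fields omitted, lengths are the standard GRCh37/GRCh38 values) show the difference: the hg19/b37-style dictionary uses bare contig names, while hg38 prefixes them with chr:

        # hg19_v0_Homo_sapiens_assembly19.dict
        @SQ  SN:1     LN:249250621
        @SQ  SN:2     LN:243199373
        # Homo_sapiens_assembly38.dict
        @SQ  SN:chr1  LN:248956422
        @SQ  SN:chr2  LN:242193529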
  • firat zahid

    Thank you for your answer, Tiffany. My reference .dict file looks like this:

    [screenshot of the .dict contents; image not preserved]

    and my mybed.bed file looks like this:

    [screenshot of the BED contents; image not preserved]

  • Tiffany Miller

    Ok, are you still getting the issue? If so, try using the fasta reference files available here and let me know if the same error occurs. 
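
    If it helps, those references live in the Broad public Google Cloud bucket; a sketch for pulling the hg19 set with gsutil (the paths follow the usual bucket layout, so verify with the listing before copying):

        # Browse the hg19 reference files in the public bucket
        gsutil ls gs://gcp-public-data--broad-references/hg19/v0/
        # Copy the fasta (plus its .fai index) and the sequence dictionary
        gsutil cp gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta* .
        gsutil cp gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.dict .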

  • vivekruhela

    Tiffany Miller: I am also facing the exact same issue. First I was using hg19 from the UCSC golden path, then I also tried the fasta reference files (and .dict file), but I am still getting the same error (command and error shown below):

    12:09:27.234 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/akansha/vivekruhela/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 12, 2021 12:09:27 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    12:09:27.532 INFO GenomicsDBImport - ------------------------------------------------------------
    12:09:27.532 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
    12:09:27.532 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    12:09:27.532 INFO GenomicsDBImport - Executing as akansha@sbilab on Linux v4.4.0-169-generic amd64
    12:09:27.533 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-8u265-b01-0ubuntu2~16.04-b01
    12:09:27.533 INFO GenomicsDBImport - Start Date/Time: January 12, 2021 12:09:27 PM IST
    12:09:27.533 INFO GenomicsDBImport - ------------------------------------------------------------
    12:09:27.533 INFO GenomicsDBImport - ------------------------------------------------------------
    12:09:27.533 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
    12:09:27.533 INFO GenomicsDBImport - Picard Version: 2.23.3
    12:09:27.534 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    12:09:27.534 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    12:09:27.534 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    12:09:27.534 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    12:09:27.534 INFO GenomicsDBImport - Deflater: IntelDeflater
    12:09:27.534 INFO GenomicsDBImport - Inflater: IntelInflater
    12:09:27.534 INFO GenomicsDBImport - GCS max retries/reopens: 20
    12:09:27.534 INFO GenomicsDBImport - Requester pays: disabled
    12:09:27.534 INFO GenomicsDBImport - Initializing engine
    12:09:32.225 INFO FeatureManager - Using codec IntervalListCodec to read file file:///home/akansha/vivekruhela/hg19_v0_HybSelOligos_whole_exome_illumina_coding_v1_whole_exome_illumina_coding_v1.Homo_sapiens_assembly19.targets.interval_list
    12:09:32.259 INFO GenomicsDBImport - Shutting down engine
    [January 12, 2021 12:09:32 PM IST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.08 minutes.
    Runtime.totalMemory()=2767716352
    ***********************************************************************

    A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig 1 given as location, but this contig isn't present in the Fasta sequence dictionary

    ***********************************************************************

    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    Using GATK jar /home/akansha/vivekruhela/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/akansha/vivekruhela/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar GenomicsDBImport -R /home/akansha/vivekruhela/gatk_bundle/hg19_v0_Homo_sapiens_assembly19.fasta -V /home/akansha/vivekruhela/PON/SRR1566827.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566831.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566833.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566835.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566859.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566889.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566901.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566928.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566933.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566957.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566967.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566985.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606005.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606009.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606035.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606043.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606045.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1606068.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128443.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128462.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128495.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128507.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128517.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128529.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128531.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128544.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128570.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128646.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128650.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128654.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128704.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128752.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128950.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128952.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128954.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128962.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128969.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2128995.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129015.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129022.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129031.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129042.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129072.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129091.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129142.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129150.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129220.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129222.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129229.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129259.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2182804.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2182806.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3162947.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3162978.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163146.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163205.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163605.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163648.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163652.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163653.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163678.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163680.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163705.vcf.gz -V 
/home/akansha/vivekruhela/PON/SRR3163711.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4188892.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4188955.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189328.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189350.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189588.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189593.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189594.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189597.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189602.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189605.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189606.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189699.vcf.gz -V /home/akansha/vivekruhela/PON/SRR4189725.vcf.gz -V /home/akansha/vivekruhela/PON/SRR1566955.vcf.gz -V /home/akansha/vivekruhela/PON/SRR2129174.vcf.gz -V /home/akansha/vivekruhela/PON/SRR3163666.vcf.gz --genomicsdb-workspace-path pon_db --tmp-dir /home/akansha/vivekruhela/tmp1 -L /home/akansha/vivekruhela/hg19_v0_HybSelOligos_whole_exome_illumina_coding_v1_whole_exome_illumina_coding_v1.Homo_sapiens_assembly19.targets.interval_list

     

    EDIT 1: Adding my next attempt here. This time the previous error is resolved, but GenomicsDBImport is still not working. I downloaded the hg38 intervals from the GATK resource bundle and used UCSC liftOver to convert them into an hg19-based interval list. The error message is shown below:

    15:36:14.002 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/akansha/vivekruhela/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 12, 2021 3:36:14 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    15:36:14.265 INFO GenomicsDBImport - ------------------------------------------------------------
    15:36:14.266 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.9.0
    15:36:14.266 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    15:36:14.266 INFO GenomicsDBImport - Executing as akansha@sbilab on Linux v4.4.0-169-generic amd64
    15:36:14.266 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-8u265-b01-0ubuntu2~16.04-b01
    15:36:14.266 INFO GenomicsDBImport - Start Date/Time: January 12, 2021 3:36:13 PM IST
    15:36:14.266 INFO GenomicsDBImport - ------------------------------------------------------------
    15:36:14.267 INFO GenomicsDBImport - ------------------------------------------------------------
    15:36:14.267 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
    15:36:14.267 INFO GenomicsDBImport - Picard Version: 2.23.3
    15:36:14.267 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    15:36:14.267 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    15:36:14.267 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    15:36:14.268 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    15:36:14.268 INFO GenomicsDBImport - Deflater: IntelDeflater
    15:36:14.268 INFO GenomicsDBImport - Inflater: IntelInflater
    15:36:14.268 INFO GenomicsDBImport - GCS max retries/reopens: 20
    15:36:14.268 INFO GenomicsDBImport - Requester pays: disabled
    15:36:14.268 INFO GenomicsDBImport - Initializing engine
    15:36:18.805 INFO FeatureManager - Using codec BEDCodec to read file file:///home/akansha/vivekruhela/gatk_bundle/hglft_genome_3bc14_d6f440.bed
    15:36:18.810 INFO IntervalArgumentCollection - Processing 0 bp from intervals
    15:36:18.847 INFO GenomicsDBImport - Done initializing engine
    15:36:19.417 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
    15:36:19.418 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/akansha/vivekruhela/pon_db/vidmap.json
    15:36:19.418 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/akansha/vivekruhela/pon_db/callset.json
    15:36:19.418 INFO GenomicsDBImport - Complete VCF Header will be written to /home/akansha/vivekruhela/pon_db/vcfheader.vcf
    15:36:19.418 INFO GenomicsDBImport - Importing to workspace - /home/akansha/vivekruhela/pon_db
    15:36:19.418 INFO ProgressMeter - Starting traversal
    15:36:19.418 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
    15:36:19.467 INFO GenomicsDBImport - Shutting down engine
    [January 12, 2021 3:36:19 PM IST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.09 minutes.
    Runtime.totalMemory()=2850029568
    java.lang.IndexOutOfBoundsException: Index: 0
    at java.util.Collections$EmptyList.get(Collections.java:4456)
    at org.genomicsdb.model.GenomicsDBImportConfiguration$ImportConfiguration.getColumnPartitions(GenomicsDBImportConfiguration.java:2083)
    at org.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:203)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.traverse(GenomicsDBImport.java:745)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
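
    Side note: the log line "Processing 0 bp from intervals" above suggests the lifted-over BED matched nothing, i.e. it may be empty or its contig names (chr1 vs 1) may not match the dictionary. A quick check on the file named in the log:

        # Did liftOver actually emit any intervals?
        wc -l /home/akansha/vivekruhela/gatk_bundle/hglft_genome_3bc14_d6f440.bed
        # Which contig names does it use? Compare with the SN: fields in the reference .dict
        cut -f1 /home/akansha/vivekruhela/gatk_bundle/hglft_genome_3bc14_d6f440.bed | sort -u | head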

    Kindly suggest. Thanks.

  • Genevieve Brandt (she/her)

    Hi vivekruhela, could you make a new post about this issue? The user above is running PreprocessIntervals, but you are running GenomicsDBImport, so it may be a different issue. Please see this document about what to include in your post: https://gatk.broadinstitute.org/hc/en-us/articles/360053424571-How-to-Write-a-Post

