Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Error was: Sequence dictionary and index contain different numbers of contigs

7 comments

  • Genevieve Brandt (she/her)

    Hi Adam Session,

    I think you are getting this error message because a GFF file cannot be used as an interval file. Please see this resource we have on interval files: https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists

    Best,

    Genevieve
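If the annotation exists only as GFF, one option is to convert it to BED before passing it with -L. The sketch below is a minimal illustration, not part of the original reply: the function name gff_to_bed is hypothetical, and it assumes a tab-delimited GFF with the start in column 4 and the end in column 5. GFF coordinates are 1-based and inclusive while BED is 0-based and half-open, so only the start is shifted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert GFF features to 3-column BED intervals.
# GFF is 1-based inclusive; BED is 0-based half-open, so start becomes start - 1
# and the end coordinate is left unchanged.
gff_to_bed() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        !/^#/ && NF >= 5 { print $1, $4 - 1, $5 }' "$1"
}

# Example (file names are placeholders):
#   gff_to_bed annotation.gff > annotation.bed
```

The output file should carry a .bed extension so that it is parsed as BED. As the rest of this thread shows, the interval file was not the cause of the error in this case.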

  • Adam Session

    Genevieve Brandt,

    This error actually appears whether I filter with a GFF file, a BED file, or no interval file at all. I posted my preferred command, but I had already troubleshot the interval file before posting.

    Using GATK jar /global/u2/a/asession/SCRIPTS/GATK/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /global/u2/a/asession/SCRIPTS/GATK/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar PreprocessIntervals -O intervals.out -R ../hic_output.fasta --bin-length 0 --interval-merging-rule OVERLAPPING_ONLY
    09:21:23.665 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/global/u2/a/asession/SCRIPTS/GATK/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Feb 10, 2021 9:21:23 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    09:21:23.859 INFO PreprocessIntervals - ------------------------------------------------------------
    09:21:23.859 INFO PreprocessIntervals - The Genome Analysis Toolkit (GATK) v4.1.9.0
    09:21:23.859 INFO PreprocessIntervals - For support and documentation go to https://software.broadinstitute.org/gatk/
    09:21:23.859 INFO PreprocessIntervals - Executing as asession@cori08 on Linux v4.12.14-150.63-default amd64
    09:21:23.859 INFO PreprocessIntervals - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_202-b08
    09:21:23.860 INFO PreprocessIntervals - Start Date/Time: February 10, 2021 9:21:23 AM PST
    09:21:23.860 INFO PreprocessIntervals - ------------------------------------------------------------
    09:21:23.860 INFO PreprocessIntervals - ------------------------------------------------------------
    09:21:23.860 INFO PreprocessIntervals - HTSJDK Version: 2.23.0
    09:21:23.860 INFO PreprocessIntervals - Picard Version: 2.23.3
    09:21:23.860 INFO PreprocessIntervals - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    09:21:23.861 INFO PreprocessIntervals - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    09:21:23.861 INFO PreprocessIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    09:21:23.861 INFO PreprocessIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    09:21:23.861 INFO PreprocessIntervals - Deflater: IntelDeflater
    09:21:23.861 INFO PreprocessIntervals - Inflater: IntelInflater
    09:21:23.861 INFO PreprocessIntervals - GCS max retries/reopens: 20
    09:21:23.861 INFO PreprocessIntervals - Requester pays: disabled
    09:21:23.861 INFO PreprocessIntervals - Initializing engine
    09:21:24.017 INFO PreprocessIntervals - Shutting down engine
    [February 10, 2021 9:21:24 AM PST] org.broadinstitute.hellbender.tools.copynumber.PreprocessIntervals done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=2077229056
    ***********************************************************************

    A USER ERROR has occurred: Couldn't read file file:///global/cscratch1/sd/asession/Rmuscosa/Assem2/Try2/GATK/../hic_output.fasta. Error was: Sequence dictionary and index contain different numbers of contigs

  • Genevieve Brandt (she/her)

    Hi Adam Session, is the command above the one you ran with no interval file?

    Could you also provide the link to the forum thread you were looking at, so that I do not ask you to retry steps you have already tried?

    Thank you!

    Genevieve

  • Adam Session

    Genevieve Brandt,

    Yes, the command was "~/SCRIPTS/GATK/gatk-4.1.9.0/gatk PreprocessIntervals -O intervals.out -R ../hic_output.fasta --bin-length 0 --interval-merging-rule OVERLAPPING_ONLY >&error.log".

    Could the scaffold naming scheme cause problems if the names contain certain characters? Other programs have had trouble with the semicolons in my scaffold names, for example "ScmvaoM_5596;HRSCAF=7745".

    The other thread is here: https://gatk.broadinstitute.org/hc/en-us/community/posts/360075354612-Sequence-dictionary-and-index-contain-different-numbers-of-contigs

    Thanks,

    Adam

  • Genevieve Brandt (she/her)

    Thanks so much. Could you try deleting the sequence dictionary for your reference file and re-creating it with CreateSequenceDictionary?

    The other possibility is that the reference file was truncated when you downloaded it. You could try re-downloading the reference file as well.
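Before regenerating anything, the mismatch named in the error can be confirmed by counting contigs in both files. The following is a minimal sketch, not something posted in the thread: the function names are hypothetical, and it assumes the usual layout in which the FASTA index (.fai) has one line per contig and the sequence dictionary (.dict) has one @SQ line per contig. The file paths in the comments are inferred from the -R argument above and are also assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing a FASTA index with a sequence dictionary.

# A .fai file has one line per contig, so count every line.
count_fai_contigs() {
    grep -c '' "$1"
}

# A .dict file has one @SQ header line per contig.
count_dict_contigs() {
    grep -c '^@SQ' "$1"
}

# Print both counts; succeed (exit status 0) only if they match.
check_contig_counts() {
    local fai="$1" dict="$2"
    local n_fai n_dict
    n_fai=$(count_fai_contigs "$fai")
    n_dict=$(count_dict_contigs "$dict")
    echo "index: $n_fai contigs, dictionary: $n_dict contigs"
    [ "$n_fai" -eq "$n_dict" ]
}

# Example (assumed paths, based on -R ../hic_output.fasta):
#   check_contig_counts ../hic_output.fasta.fai ../hic_output.dict
#
# If the counts differ, remove the stale dictionary and rebuild it:
#   rm ../hic_output.dict
#   gatk CreateSequenceDictionary -R ../hic_output.fasta
# If the index is the stale file instead, rebuild it with:
#   samtools faidx ../hic_output.fasta
```

A count in either file that is lower than the number of ">" header lines in the FASTA itself would point to that file having been built from an older or truncated copy of the reference.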

  • Adam Session

    Genevieve Brandt,

    Thank you for the help. Remaking the .dict file worked.

    Adam

  • Genevieve Brandt (she/her)

    Great, thank you for the update!
