Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

How to use UpdateVCFSequenceDictionary if I don't have any input VCF files?

9 comments

  • Genevieve Brandt (she/her)

    Hi rahelp,

    For posts regarding GATK issues, we require three items to be included in the post.

    1. GATK version number
    2. Exact command used
    3. Complete stack trace/error log (set --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' to print the stack trace)

    Your post is missing #2 and #3, which must be included so that we can thoroughly look into the problem.

    Thank you,

    Genevieve

  • rahelp

    Hi Genevieve,

    I apologise for the incomplete post. Here are the missing details:

    Exact command used:

    gatk PreprocessIntervals -L targets_C.interval_list -R Homo_sapiens_assembly19.fasta --bin-length 0 --interval-merging-rule OVERLAPPING_ONLY -O sandbox/targets_C.preprocessed.interval_list

     

    Complete Stack Trace/ Error log: 

    Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar

    Running:

        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.3.0-local.jar PreprocessIntervals -L targets_C.interval_list -R Homo_sapiens_assembly19.fasta --bin-length 0 --interval-merging-rule OVERLAPPING_ONLY -O sandbox/targets_C.preprocessed.interval_list

    21:11:50.785 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so

    21:11:50.911 INFO  PreprocessIntervals - ------------------------------------------------------------

    21:11:50.912 INFO  PreprocessIntervals - The Genome Analysis Toolkit (GATK) v4.1.3.0

    21:11:50.912 INFO  PreprocessIntervals - For support and documentation go to https://software.broadinstitute.org/gatk/

    21:11:50.912 INFO  PreprocessIntervals - Executing as root@7645dd8077e3 on Linux v4.19.121-linuxkit amd64

    21:11:50.912 INFO  PreprocessIntervals - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12

    21:11:50.912 INFO  PreprocessIntervals - Start Date/Time: December 14, 2020 9:11:50 PM UTC

    21:11:50.912 INFO  PreprocessIntervals - ------------------------------------------------------------

    21:11:50.912 INFO  PreprocessIntervals - ------------------------------------------------------------

    21:11:50.913 INFO  PreprocessIntervals - HTSJDK Version: 2.20.1

    21:11:50.913 INFO  PreprocessIntervals - Picard Version: 2.20.5

    21:11:50.913 INFO  PreprocessIntervals - HTSJDK Defaults.COMPRESSION_LEVEL : 2

    21:11:50.913 INFO  PreprocessIntervals - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

    21:11:50.913 INFO  PreprocessIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

    21:11:50.913 INFO  PreprocessIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

    21:11:50.913 INFO  PreprocessIntervals - Deflater: IntelDeflater

    21:11:50.913 INFO  PreprocessIntervals - Inflater: IntelInflater

    21:11:50.913 INFO  PreprocessIntervals - GCS max retries/reopens: 20

    21:11:50.913 INFO  PreprocessIntervals - Requester pays: disabled

    21:11:50.913 INFO  PreprocessIntervals - Initializing engine

    21:11:51.117 INFO  PreprocessIntervals - Shutting down engine

    [December 14, 2020 9:11:51 PM UTC] org.broadinstitute.hellbender.tools.copynumber.PreprocessIntervals done. Elapsed time: 0.01 minutes.

    Runtime.totalMemory()=158334976

    ***********************************************************************

     

    A USER ERROR has occurred: We require a sequence dictionary from a reference, a source of reads, or a source of variants to process intervals.  Since reference and reads files generally contain sequence dictionaries, this error most commonly occurs for VariantWalkers that do not require a reference or reads.  You can fix the problem by passing a reference file with a sequence dictionary via the -R argument or you can run the tool UpdateVCFSequenceDictionary on your vcf.

     

    ***********************************************************************

    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

     

    I hope this helps.

     

    Thank you.
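
The closing hint in the log above can be turned into a small wrapper. This is only a sketch: the function name gatk_with_stacktrace is hypothetical, while the --java-options value is the one printed in the error message itself.

```shell
# Hypothetical wrapper (the name is an assumption, not part of GATK):
# run any GATK tool with user-exception stack traces enabled.
# The system property is the one named in the error message above.
gatk_with_stacktrace() {
    gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' "$@"
}
```

Called as gatk_with_stacktrace PreprocessIntervals -L targets_C.interval_list -R Homo_sapiens_assembly19.fasta --bin-length 0 --interval-merging-rule OVERLAPPING_ONLY -O sandbox/targets_C.preprocessed.interval_list, it should rerun the failing command with the stack trace included in the log.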

  • Genevieve Brandt (she/her)

    Hi rahelp,

    Thanks for updating with more information! It looks like in your case, you are missing the sequence dictionary from the reference file you provided. You can create one using the tool CreateSequenceDictionary.

    Hope this helps!

    Genevieve
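
As a rough illustration of the suggestion above, the sketch below derives the dictionary path for a reference, assuming the usual convention that the dictionary sits beside the FASTA with the final extension replaced by .dict. The helper name dict_path_for is hypothetical.

```shell
# Hypothetical helper: derive the dictionary path that sits next to a reference
# FASTA by swapping the final extension for .dict.
# Assumes the file name itself contains an extension (e.g. .fasta or .fa).
dict_path_for() {
    ref="$1"
    printf '%s\n' "${ref%.*}.dict"
}
```

The result can be passed explicitly as the output, for example gatk CreateSequenceDictionary -R Homo_sapiens_assembly19.fasta -O "$(dict_path_for Homo_sapiens_assembly19.fasta)", which makes it easy to see exactly which file the tool is expected to write.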

  • rahelp

    Dear Genevieve,

    Thank you for your reply.

    I have already created a dictionary file using CreateSequenceDictionary with the following command:

    gatk --java-options -Xmx12g CreateSequenceDictionary -R Homo_sapiens_assembly19.fasta

    and in response I get:

    Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar

    Running:

        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx12g -jar /gatk/gatk-package-4.1.3.0-local.jar CreateSequenceDictionary -R Homo_sapiens_assembly19.fasta

    INFO 2020-12-18 18:11:08 CreateSequenceDictionary Output dictionary will be written in Homo_sapiens_assembly19.dict

    18:11:08.306 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so

    [Fri Dec 18 18:11:08 UTC 2020] CreateSequenceDictionary  --REFERENCE Homo_sapiens_assembly19.fasta  --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

    [Fri Dec 18 18:11:08 UTC 2020] Executing as root@ on Linux 4.19.121-linuxkit amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.3.0

    I encounter the problem when I try to run PreprocessIntervals:

    A USER ERROR has occurred: We require a sequence dictionary from a reference, a source of reads, or a source of variants to process intervals.  Since reference and reads files generally contain sequence dictionaries, this error most commonly occurs for VariantWalkers that do not require a reference or reads.  You can fix the problem by passing a reference file with a sequence dictionary via the -R argument or you can run the tool UpdateVCFSequenceDictionary on your vcf.

    (see full stack trace in my last comment).

    I don't have any VCF files that I could use for UpdateVCFSequenceDictionary.

    Please let me know whether I can find a suitable VCF file somewhere or whether there is another solution to this problem.

    Thank you!

     

  • Genevieve Brandt (she/her)

    Hi, could you confirm that you have only the correct dictionary file in the directory with your reference?

    -Genevieve

  • rahelp

    Hi!

    I have checked, and the directory contains only one dictionary file: the one created with CreateSequenceDictionary (as in my last message).

    Regards,

    rahelp

  • Genevieve Brandt (she/her)

    I see. Could you try the newest version of GATK and see whether this issue persists? It does seem strange to me.

  • rahelp

    Dear Genevieve,

    I have now updated to the latest version.

    I noticed that even though the dict file is created, it is empty.
    (gatk) root@8af0014f731a:/gatk/# gatk --java-options -Xmx12g CreateSequenceDictionary -R Homo_sapiens_assembly19.fasta

    Using GATK jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar

    Running:

        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx12g -jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar CreateSequenceDictionary -R Homo_sapiens_assembly19.fasta

    INFO 2021-01-07 09:37:06 CreateSequenceDictionary Output dictionary will be written in Homo_sapiens_assembly19.dict

    09:37:06.501 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so

    [Thu Jan 07 09:37:06 GMT 2021] CreateSequenceDictionary --REFERENCE Homo_sapiens_assembly19.fasta --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

    [Thu Jan 07 09:37:06 GMT 2021] Executing as root@8af0014f731a on Linux 4.19.121-linuxkit amd64; OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.9.0-SNAPSHOT

    I had to use --java-options -Xmx12g; otherwise I get an out-of-memory error.

    I cannot detect any other possible errors in the error log. If you can help me solve this issue, I would be grateful.

    Regards,

    Rahel
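
An empty dictionary like the one described above can be caught before running PreprocessIntervals. The sketch below is only illustrative and uses a hypothetical helper name, dict_looks_valid; it relies on the fact that a sequence dictionary uses SAM header syntax, with one @SQ line per contig.

```shell
# Hypothetical helper: succeed only if the dictionary file is non-empty
# and contains at least one @SQ record.
dict_looks_valid() {
    dict="$1"
    if [ ! -s "$dict" ]; then
        echo "missing or empty: $dict"
        return 1
    fi
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
    # keeps the count while ignoring that exit status.
    n_sq=$(grep -c '^@SQ' "$dict" || true)
    if [ "$n_sq" -eq 0 ]; then
        echo "no @SQ records in: $dict"
        return 1
    fi
    echo "$dict: $n_sq @SQ records"
}
```

Running dict_looks_valid Homo_sapiens_assembly19.dict right after CreateSequenceDictionary would show whether the file that PreprocessIntervals will read actually contains contig records.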

  • Genevieve Brandt (she/her)

    Hi Rahel,

    I wonder whether the CreateSequenceDictionary command is failing to finish, which would leave the file empty. Does the process run to completion?

    Genevieve
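
One way to answer that question is to capture the exit status and look for a closing line in the log. The sketch below is hedged: run_and_check is a hypothetical helper, and it assumes the tool ends with a "done. Elapsed time" line like the one in the PreprocessIntervals log earlier in this thread.

```shell
# Hypothetical helper: run a command, keep its combined stdout/stderr in a log
# file, then report the exit status and whether a closing
# "done. Elapsed time" line appeared in that log.
run_and_check() {
    log="$1"
    shift
    status=0
    "$@" > "$log" 2>&1 || status=$?
    echo "exit status: $status"
    if grep -q 'done\. Elapsed time' "$log"; then
        echo "completion line: found"
    else
        echo "completion line: not found"
    fi
    return "$status"
}
```

For example, run_and_check create_dict.log gatk --java-options -Xmx12g CreateSequenceDictionary -R Homo_sapiens_assembly19.fasta. A non-zero exit status or a missing completion line would suggest the process stopped before the dictionary was written.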
