Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK configuration

5 comments

  • Chris Kachulis

    Hi Wondessen Ayalew,

     

    You need a FASTA sequence dictionary to go along with your reference FASTA. Unfortunately, the error message points to a page that no longer appears to exist, but you can use CreateSequenceDictionary in Picard to create the dictionary.

    (Note, if you are new to GATK/Picard: you can run Picard tools from GATK. So if you don't have Picard installed separately, you can run `gatk CreateSequenceDictionary -R ../reference/Bos_taurus.ARS-UCD1.2.dna.toplevel.fa -O ../reference/Bos_taurus.ARS-UCD1.2.dna.toplevel.dict`.)
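
Before rerunning a GATK tool, it can help to confirm that both companion files sit next to the reference. The sketch below is only an illustration: the function name `missing_reference_companions` is hypothetical, and it assumes the FASTA index is named `<reference>.fai` and the dictionary replaces the final extension with `.dict`, as in the command above.

```shell
# Print the companion files that are missing for a reference FASTA.
# Assumptions: the index is "<ref>.fai" and the dictionary is the
# reference path with its last extension replaced by ".dict".
missing_reference_companions() {
    local ref="$1"
    local fai="${ref}.fai"
    local dict="${ref%.*}.dict"
    [ -e "$fai" ]  || echo "$fai"
    [ -e "$dict" ] || echo "$dict"
}
```

Calling `missing_reference_companions ../reference/Bos_taurus.ARS-UCD1.2.dna.toplevel.fa` prints nothing when both files are present. A missing `.fai` can be created with `samtools faidx`, and a missing `.dict` with CreateSequenceDictionary.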

  • Wondessen Ayalew

    Dear GATK team,

    Thank you for your prompt response and valuable support. The problem still persists even though I generated an index file using `samtools faidx Bos_taurus.ARS-UCD1.2.dna.toplevel.fa`. The files in my directory are listed below.

    The error message is as follows:

    (/opt/sw/gatk/4.3/gatk4_env) gatk BaseRecalibrator -I ../data/Afar_rD/Afar_1_dedup.bam -R ../reference/Bos_taurus.ARS-UCD1.2.dna.toplevel.fa --known-sites ../GATK_reso/ARS1.2PlusY_BQSR.vcf.gz -O Afar1_data.table
    Using GATK jar /export/opt/sw/gatk/4.3/gatk4_env/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /export/opt/sw/gatk/4.3/gatk4_env/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar BaseRecalibrator -I ../data/Afar_rD/Afar_1_dedup.bam -R ../reference/Bos_taurus.ARS-UCD1.2.dna.toplevel.fa --known-sites ../GATK_reso/ARS1.2PlusY_BQSR.vcf.gz -O Afar1_data.table
    06:37:47.263 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/export/opt/sw/gatk/4.3/gatk4_env/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    06:37:47.540 INFO  BaseRecalibrator - ------------------------------------------------------------
    06:37:47.541 INFO  BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.3.0.0
    06:37:47.541 INFO  BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
    06:37:47.541 INFO  BaseRecalibrator - Executing as wondossen@planetsmasher.hgen.slu.se on Linux v3.10.0-693.21.1.el7.x86_64 amd64
    06:37:47.541 INFO  BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    06:37:47.541 INFO  BaseRecalibrator - Start Date/Time: February 14, 2023 6:37:47 AM CET
    06:37:47.541 INFO  BaseRecalibrator - ------------------------------------------------------------
    06:37:47.542 INFO  BaseRecalibrator - ------------------------------------------------------------
    06:37:47.542 INFO  BaseRecalibrator - HTSJDK Version: 3.0.1
    06:37:47.542 INFO  BaseRecalibrator - Picard Version: 2.27.5
    06:37:47.542 INFO  BaseRecalibrator - Built for Spark Version: 2.4.5
    06:37:47.543 INFO  BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    06:37:47.543 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    06:37:47.543 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    06:37:47.543 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    06:37:47.543 INFO  BaseRecalibrator - Deflater: IntelDeflater
    06:37:47.543 INFO  BaseRecalibrator - Inflater: IntelInflater
    06:37:47.543 INFO  BaseRecalibrator - GCS max retries/reopens: 20
    06:37:47.543 INFO  BaseRecalibrator - Requester pays: disabled
    06:37:47.544 INFO  BaseRecalibrator - Initializing engine
    06:37:48.593 INFO  FeatureManager - Using codec VCFCodec to read file file:///export/proj/ethiopian_cattle/NOBACKUP/../GATK_reso/ARS1.2PlusY_BQSR.vcf.gz
    06:37:48.606 INFO  BaseRecalibrator - Shutting down engine
    [February 14, 2023 6:37:48 AM CET] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=2171076608
    ***********************************************************************

    A USER ERROR has occurred: An index is required but was not found for file /export/proj/ethiopian_cattle/NOBACKUP/../GATK_reso/ARS1.2PlusY_BQSR.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    (/opt/sw/gatk/4.3/gatk4_env)

    Thank you!

     

  • Wondessen Ayalew

    Dear All,

    Thank you for your valuable comments. Your suggestions resolved my GATK configuration questions. In addition, sorting the VCF file (dbSNPs) removed the warning message posted in my second question.

    Now, I could not find how to run the second pass of base recalibration needed for AnalyzeCovariates in GATK 4.3. Instead, I skipped the AnalyzeCovariates and PrintReads steps and went straight to HaplotypeCaller. Any suggestions?

    Thank you!

  • Louis Bergelson

    Hello again,

    This time it's complaining that your VCF doesn't have an index. You can use the tool IndexFeatureFile to fix that.

    For example:

    gatk IndexFeatureFile -I /export/proj/ethiopian_cattle/NOBACKUP/../GATK_reso/ARS1.2PlusY_BQSR.vcf.gz

    I think it's likely that sorting the file fixed the problem incidentally, because the file was indexed as part of the sort operation.
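
A quick way to tell whether a known-sites file still needs indexing is to look for the index beside it. The following is a minimal sketch, not part of GATK: the helper name `index_command_for` is hypothetical, and it assumes block-compressed files carry a `.tbi` index and plain-text VCFs a `.idx` index. It prints the command instead of running it so the path can be reviewed first.

```shell
# Print the IndexFeatureFile command needed for a VCF, or nothing if an
# index already sits next to it.
# Assumptions: "<file>.gz" uses "<file>.gz.tbi"; anything else uses "<file>.idx".
index_command_for() {
    local vcf="$1" index
    case "$vcf" in
        *.gz) index="${vcf}.tbi" ;;
        *)    index="${vcf}.idx" ;;
    esac
    if [ ! -e "$index" ]; then
        echo "gatk IndexFeatureFile -I $vcf"
    fi
}
```

For instance, `index_command_for ../GATK_reso/ARS1.2PlusY_BQSR.vcf.gz` prints the indexing command only while the `.tbi` file is absent.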

    It's probably fine to skip AnalyzeCovariates if there isn't anything unusual about your sequencing. BQSR usually works fine, and AnalyzeCovariates is just a sanity check.

    Did you run ApplyBQSR? That's the step that actually does the recalibration, so if you run only BaseRecalibrator you haven't changed your data at all. That's probably fine too, since modern high-quality sequencing typically benefits only marginally from recalibration.
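
For readers who do want the full sequence, including the optional second pass that feeds AnalyzeCovariates, the steps can be sketched roughly as follows. This is an illustration under stated assumptions, not an official pipeline: the `run` and `bqsr_two_pass` helpers, the `DRY_RUN` switch, and the output file names derived from the prefix are all hypothetical. The second pass is simply BaseRecalibrator run again on the recalibrated BAM.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Two-pass BQSR sketch. Arguments: reference, input BAM, known sites, output prefix.
# Output names built from the prefix are placeholders.
bqsr_two_pass() {
    local ref="$1" bam="$2" known="$3" prefix="$4"
    # Pass 1: build the recalibration table from the original BAM.
    run gatk BaseRecalibrator -I "$bam" -R "$ref" --known-sites "$known" \
        -O "${prefix}.before.table" || return 1
    # Apply the table; this is the step that rewrites the base qualities.
    run gatk ApplyBQSR -I "$bam" -R "$ref" \
        --bqsr-recal-file "${prefix}.before.table" -O "${prefix}.recal.bam" || return 1
    # Pass 2 (optional): rebuild the table from the recalibrated BAM.
    run gatk BaseRecalibrator -I "${prefix}.recal.bam" -R "$ref" --known-sites "$known" \
        -O "${prefix}.after.table" || return 1
    # Compare the two tables as a sanity check.
    run gatk AnalyzeCovariates -before "${prefix}.before.table" \
        -after "${prefix}.after.table" -plots "${prefix}.plots.pdf" || return 1
}
```

Running `DRY_RUN=1 bqsr_two_pass ref.fa sample.bam known.vcf.gz sample` prints the four commands without executing them. Only the first two steps are needed to produce a recalibrated BAM for HaplotypeCaller.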

