Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

BaseRecalibrator - no suitable codecs

Answered

14 comments

  • Bhanu Gandham

    Hi,

    The input for the --known-sites argument should be a VCF file, but you are providing a VCF index file. Take a look at the tool docs: https://gatk.broadinstitute.org/hc/en-us/articles/360041850511-BaseRecalibrator
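
A small guard can catch this mix-up before GATK runs. The sketch below is only an illustration: the function name `is_index_path` and the placeholder file name are hypothetical, and it assumes an index can be recognized by the usual `.idx`, `.tbi`, or `.csi` suffix.

```shell
# Hypothetical helper: succeed (exit 0) when the path looks like an index
# file rather than a VCF, judging only by its suffix.
is_index_path() {
    case "$1" in
        *.idx|*.tbi|*.csi) return 0 ;;
        *) return 1 ;;
    esac
}

# Example guard before calling BaseRecalibrator (file name is a placeholder).
known_sites="known_sites.vcf.gz"
if is_index_path "$known_sites"; then
    echo "error: $known_sites is an index; pass the VCF itself to --known-sites" >&2
fi
```

The check looks at the name only, so it will not notice an index that has been renamed.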

  • Rhodri Smith

    Hi Bhanu 

    Thanks for your reply

    I think I initially got errors with my VCF file because it was in a different folder from its index file, and that was the problem. I then started passing the index files instead, as you identified. I have now got it working. Many thanks for your time and help.

    Best wishes 
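
The folder problem described above can be checked up front. This is a minimal sketch, assuming the index sits next to the variant file and is named by appending `.idx` (plain VCF) or `.tbi` (bgzipped VCF); the function name `sibling_index` is hypothetical.

```shell
# Hypothetical helper: print the index found next to a variant file,
# or fail (exit 1) if neither <file>.idx nor <file>.tbi exists.
sibling_index() {
    for suffix in idx tbi; do
        if [ -f "$1.$suffix" ]; then
            echo "$1.$suffix"
            return 0
        fi
    done
    return 1
}
```

If no index is found, recent GATK4 releases can create one beside the file with `gatk IndexFeatureFile -I <file>`; check the tool docs for the version you run.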

  • Miaoran ZHANG

    Hi, when I use ~/software/gatk-4.2.0.0/gatk FastaAlternateReferenceMaker -R chr1.fa -O chr1gatk.fasta -V variations.vcf.idx to build my own reference FASTA file, I get an error like this:

    A USER ERROR has occurred: Cannot read file:///data/zhangmr/peng/modifref/variations.vcf.idx because no suitable codecs found

    The full log is here:

    Using GATK jar /home/zhangmr/software/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/zhangmr/software/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar FastaAlternateReferenceMaker -R chr1.fa -O chr1gatk.fasta -V /data/zhangmr/peng/modifref/variations.vcf.idx
    10:34:28.181 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/zhangmr/software/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 15, 2021 10:34:35 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    10:34:35.402 INFO FastaAlternateReferenceMaker - ------------------------------------------------------------
    10:34:35.403 INFO FastaAlternateReferenceMaker - The Genome Analysis Toolkit (GATK) v4.2.0.0
    10:34:35.403 INFO FastaAlternateReferenceMaker - For support and documentation go to https://software.broadinstitute.org/gatk/
    10:34:35.403 INFO FastaAlternateReferenceMaker - Executing as zhangmr@centaur on Linux v4.15.0-128-generic amd64
    10:34:35.403 INFO FastaAlternateReferenceMaker - Java runtime: OpenJDK 64-Bit Server VM v11.0.10+9-Ubuntu-0ubuntu1.18.04
    10:34:35.404 INFO FastaAlternateReferenceMaker - Start Date/Time: April 15, 2021 at 10:34:28 AM CST
    10:34:35.404 INFO FastaAlternateReferenceMaker - ------------------------------------------------------------
    10:34:35.404 INFO FastaAlternateReferenceMaker - ------------------------------------------------------------
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Version: 2.24.0
    10:34:35.405 INFO FastaAlternateReferenceMaker - Picard Version: 2.25.0
    10:34:35.405 INFO FastaAlternateReferenceMaker - Built for Spark Version: 2.4.5
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    10:34:35.405 INFO FastaAlternateReferenceMaker - Deflater: IntelDeflater
    10:34:35.405 INFO FastaAlternateReferenceMaker - Inflater: IntelInflater
    10:34:35.406 INFO FastaAlternateReferenceMaker - GCS max retries/reopens: 20
    10:34:35.406 INFO FastaAlternateReferenceMaker - Requester pays: disabled
    10:34:35.406 INFO FastaAlternateReferenceMaker - Initializing engine
    10:34:35.567 INFO FastaAlternateReferenceMaker - Shutting down engine
    [April 15, 2021 at 10:34:35 AM CST] org.broadinstitute.hellbender.tools.walkers.fasta.FastaAlternateReferenceMaker done. Elapsed time: 0.12 minutes.
    Runtime.totalMemory()=2155872256
    ***********************************************************************

    A USER ERROR has occurred: Cannot read file:///data/zhangmr/peng/modifref/variations.vcf.idx because no suitable codecs found

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    Could you help me solve this problem?

  • Genevieve Brandt (she/her)

    Hi Miaoran ZHANG,

    It looks like this error is coming from a file format issue. Here are the tool docs for that tool: FastaAlternateReferenceMaker. The tool needs a VCF input for the -V argument, and you are submitting an index file (/data/zhangmr/peng/modifref/variations.vcf.idx) instead of the VCF file.

    Hope this helps!

    Genevieve
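
As a rough illustration of the fix, the variant file path can be recovered by dropping the index suffix. The function name `variant_path_from_index` is hypothetical, and the sketch assumes the VCF really exists under the resulting name.

```shell
# Hypothetical helper: strip a trailing .idx or .tbi to recover the path of
# the variant file the index belongs to. Other paths are printed unchanged.
variant_path_from_index() {
    case "$1" in
        *.idx) echo "${1%.idx}" ;;
        *.tbi) echo "${1%.tbi}" ;;
        *) echo "$1" ;;
    esac
}

vcf=$(variant_path_from_index "variations.vcf.idx")
# Print (rather than run) the command from the question with -V corrected.
echo "gatk FastaAlternateReferenceMaker -R chr1.fa -O chr1gatk.fasta -V $vcf"
```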

  • Ekin Köni

    Hey, I am facing the same error and can't figure out a way around it.

    A USER ERROR has occurred: Cannot read file:///mnt/e/thesis/data/hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz because no suitable codecs found

    I have all of the reference-related files, which I downloaded from the resource bundle:

    hg38_v0_1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf
    hg38_v0_1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx
    hg38_v0_1000G_omni2.5.hg38.vcf
    hg38_v0_1000G_omni2.5.hg38.vcf.gz.tbi
    hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf
    hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz
    hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
    hg38_v0_Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf
    hg38_v0_Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz.tbi
    hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf
    hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz
    hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi
    hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.idx
    hg38_v0_Homo_sapiens_assembly38.dict
    hg38_v0_Homo_sapiens_assembly38.fasta
    hg38_v0_Homo_sapiens_assembly38.fasta.64.alt
    hg38_v0_Homo_sapiens_assembly38.fasta.64.amb
    hg38_v0_Homo_sapiens_assembly38.fasta.64.ann
    hg38_v0_Homo_sapiens_assembly38.fasta.64.bwt
    hg38_v0_Homo_sapiens_assembly38.fasta.64.pac
    hg38_v0_Homo_sapiens_assembly38.fasta.64.sa
    hg38_v0_Homo_sapiens_assembly38.fasta.fai
    hg38_v0_Homo_sapiens_assembly38.known_indels.vcf
    hg38_v0_Homo_sapiens_assembly38.known_indels.vcf.gz
    hg38_v0_Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
    hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf
    hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
    hg38_v0_hapmap_3.3.hg38.vcf
    hg38_v0_hapmap_3.3.hg38.vcf.gz.tbi
    hg38_v0_wgs_calling_regions.hg38.interval_list

    I used the bgzip command to convert my VCF files to vcf.gz, as that is the proper way, but I am still facing the same error.

    The command that I ran is:

    gatk BaseRecalibrator \
       -I mySample68snc.bam \
       -R hg38_v0_Homo_sapiens_assembly38.fasta \
       --known-sites hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz \
       --known-sites hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz \
       --known-sites hg38_v0_Homo_sapiens_assembly38.known_indels.vcf.gz \
       --known-sites hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
       -O recal_data1.table

    If you have a suggestion, it would make me very happy. Thank you.

  • Pamela Bretscher

    Hi Ekin Köni,

    This error typically indicates that the input file is not being recognized properly as a vcf file. Could you try running "gzcat vcf | head -1" for the file that is causing the issue and paste the header line here? (example "##fileformat=VCFv4.2").

    Kind regards,

    Pamela
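
The header check suggested above can be scripted so that it works for plain and compressed files alike. This is a sketch under stated assumptions: the function names `vcf_first_line` and `looks_like_vcf` are hypothetical, and `gzip -cd` is used in place of `gzcat` because it is available on more systems.

```shell
# Hypothetical helper: print the first line of a VCF, whether plain or gzipped.
vcf_first_line() {
    case "$1" in
        *.gz) gzip -cd "$1" | head -n 1 ;;
        *) head -n 1 "$1" ;;
    esac
}

# Succeed only when the first line is a VCF fileformat declaration,
# for example "##fileformat=VCFv4.2".
looks_like_vcf() {
    case "$(vcf_first_line "$1")" in
        '##fileformat=VCF'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A passing check only confirms the first line; it says nothing about the rest of the file or about how it was compressed.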

  • Ekin Köni

    ##fileformat=VCFv4.1

     

  • Pamela Bretscher

    Ekin Köni, okay thank you for providing this. What version of GATK are you using?

  • Ekin Köni

    The Genome Analysis Toolkit (GATK) v4.2.3.0

  • Pamela Bretscher

    Hi Ekin Köni,

    Thank you. It's possible that there is an issue with how this file was compressed, causing it to be malformed. Could you try running PrintBGZFBlockInformation on the file? If there is an error, you may need to retry running bgzip on the vcf.

    Kind regards,

    Pamela
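
Before running the GATK tool, a quick look at the first bytes can show whether a file was written by bgzip at all or by plain gzip. The sketch below assumes the BGZF layout from the SAM/BAM specification (gzip magic with the FEXTRA flag, then a "BC" extra subfield at byte offsets 12 and 13); the function name `is_bgzf` is hypothetical.

```shell
# Hypothetical helper: succeed when the file starts with a BGZF block header.
# The first 14 bytes are dumped as hex: 1f 8b 08 04 (gzip magic, deflate,
# FEXTRA flag), 8 bytes that are not checked, then 42 43 ("BC").
is_bgzf() {
    header=$(od -An -tx1 -N 14 "$1" | tr -d ' \n')
    case "$header" in
        1f8b0804????????????????4243) return 0 ;;
        *) return 1 ;;
    esac
}
```

This only inspects the first block header, so a truncated or otherwise damaged file can still pass; PrintBGZFBlockInformation remains the more thorough check. If the check fails, recompressing with `bgzip -c file.vcf > file.vcf.gz` and reindexing with `tabix -p vcf file.vcf.gz` is one way to retry.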

  • Ekin Köni

    I will try this suggestion, thank you.

  • Rea Kalampaliki

    Hello! I am receiving exactly the same error with the file hg19_v0_Homo_sapiens_assembly19.dbsnp138.vcf

    Please help :)

    ref_snps='./reference_genome/hg19_v0_Homo_sapiens_assembly19.dbsnp138.vcf'
    ref_indels='./reference_genome/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf'
    ref='./reference_genome/hg19_v0_Homo_sapiens_assembly19.fasta'

    #..more lines with code..
    ./gatk BaseRecalibrator \
                    -I "$bam_marked_dup" \
                    -R "$ref" \
                    --known-sites "$ref_snps" \
                    --known-sites "$ref_indels" \
                    -O "$recal_table"
    # ..result..
    A USER ERROR has occurred: Cannot read file:///home/user/gatk_project/gatk-4.2.5.0/./reference_genome/hg19_v0_Homo_sapiens_assembly19.dbsnp138.vcf because no suitable codecs found

  • Genevieve Brandt (she/her)

    Ekin Köni, do you have any suggestions that might help out Rea Kalampaliki?

  • Genevieve Brandt (she/her)

    Rea Kalampaliki, if you are still having issues with this file, go ahead and make a new post so we can walk you through the troubleshooting steps.
