Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

BaseRecalibrator - no suitable codecs

Answered
1

30 comments

  • Bhanu Gandham

    Hi,

    The --known-sites argument expects a VCF file, but you are providing a VCF index file. Take a look at the tool docs: https://gatk.broadinstitute.org/hc/en-us/articles/360041850511-BaseRecalibrator

  • Rhodri Smith

    Hi Bhanu 

    Thanks for your reply

    I think I initially got errors with my VCF file because it was in a different folder from its index file. I then started passing the index files instead, which is the mistake you identified. I have now got it working. Many thanks for your time and help.

    Best wishes 

  • Miaoran ZHANG

    Hi, when I use ~/software/gatk-4.2.0.0/gatk FastaAlternateReferenceMaker -R chr1.fa -O chr1gatk.fasta -V variations.vcf.idx to build my own reference FASTA file, I get an error like this:

    A USER ERROR has occurred: Cannot read file:///data/zhangmr/peng/modifref/variations.vcf.idx because no suitable codecs found

    The full log is here:

    Using GATK jar /home/zhangmr/software/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/zhangmr/software/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar FastaAlternateReferenceMaker -R chr1.fa -O chr1gatk.fasta -V /data/zhangmr/peng/modifref/variations.vcf.idx
    10:34:28.181 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/zhangmr/software/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 15, 2021 10:34:35 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    10:34:35.402 INFO FastaAlternateReferenceMaker - ------------------------------------------------------------
    10:34:35.403 INFO FastaAlternateReferenceMaker - The Genome Analysis Toolkit (GATK) v4.2.0.0
    10:34:35.403 INFO FastaAlternateReferenceMaker - For support and documentation go to https://software.broadinstitute.org/gatk/
    10:34:35.403 INFO FastaAlternateReferenceMaker - Executing as zhangmr@centaur on Linux v4.15.0-128-generic amd64
    10:34:35.403 INFO FastaAlternateReferenceMaker - Java runtime: OpenJDK 64-Bit Server VM v11.0.10+9-Ubuntu-0ubuntu1.18.04
    10:34:35.404 INFO FastaAlternateReferenceMaker - Start Date/Time: April 15, 2021 at 10:34:28 AM CST
    10:34:35.404 INFO FastaAlternateReferenceMaker - ------------------------------------------------------------
    10:34:35.404 INFO FastaAlternateReferenceMaker - ------------------------------------------------------------
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Version: 2.24.0
    10:34:35.405 INFO FastaAlternateReferenceMaker - Picard Version: 2.25.0
    10:34:35.405 INFO FastaAlternateReferenceMaker - Built for Spark Version: 2.4.5
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    10:34:35.405 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    10:34:35.405 INFO FastaAlternateReferenceMaker - Deflater: IntelDeflater
    10:34:35.405 INFO FastaAlternateReferenceMaker - Inflater: IntelInflater
    10:34:35.406 INFO FastaAlternateReferenceMaker - GCS max retries/reopens: 20
    10:34:35.406 INFO FastaAlternateReferenceMaker - Requester pays: disabled
    10:34:35.406 INFO FastaAlternateReferenceMaker - Initializing engine
    10:34:35.567 INFO FastaAlternateReferenceMaker - Shutting down engine
    [April 15, 2021 at 10:34:35 AM CST] org.broadinstitute.hellbender.tools.walkers.fasta.FastaAlternateReferenceMaker done. Elapsed time: 0.12 minutes.
    Runtime.totalMemory()=2155872256
    ***********************************************************************

    A USER ERROR has occurred: Cannot read file:///data/zhangmr/peng/modifref/variations.vcf.idx because no suitable codecs found

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    Could you help me solve this problem?

  • Genevieve Brandt (she/her)

    Hi Miaoran ZHANG,

    It looks like this error is coming from a file format issue. Here are the tool docs for that tool: FastaAlternateReferenceMaker. The tool needs a VCF input for the -V argument, and you are submitting an index file (/data/zhangmr/peng/modifref/variations.vcf.idx) instead of the VCF file.

    Hope this helps!

    Genevieve
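
Both answers so far come down to the same check: the variant argument (-V or --known-sites) should name the VCF itself, not its .idx or .tbi index. The following is a minimal Python sketch of that check, assuming the index name is formed by appending a suffix to the VCF name. The function names and the suffix list are illustrative and are not part of GATK.

```python
from pathlib import PurePosixPath

# Index suffixes often seen next to VCF files (an assumed, non-exhaustive list).
INDEX_SUFFIXES = (".idx", ".tbi", ".csi")


def is_index_path(path: str) -> bool:
    """Return True when the file name ends in one of the index suffixes."""
    return PurePosixPath(path).name.lower().endswith(INDEX_SUFFIXES)


def variant_path_for(path: str) -> str:
    """Strip one trailing index suffix, e.g. 'x.vcf.idx' becomes 'x.vcf'."""
    if not is_index_path(path):
        return path
    return path[: path.rfind(".")]


if __name__ == "__main__":
    arg = "/data/zhangmr/peng/modifref/variations.vcf.idx"
    if is_index_path(arg):
        print(f"{arg} looks like an index file; try {variant_path_for(arg)}")
```

The check only looks at the file name, so it cannot tell whether the suggested VCF actually exists or is well formed.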

  • Ekin Köni

    Hey, I am facing the same error and can't figure it out.

    A USER ERROR has occurred: Cannot read file:///mnt/e/thesis/data/hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz because no suitable codecs found

    I have all the reference-related files, which I downloaded from the resource bundle:

    hg38_v0_1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf
    hg38_v0_1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx
    hg38_v0_1000G_omni2.5.hg38.vcf
    hg38_v0_1000G_omni2.5.hg38.vcf.gz.tbi
    hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf
    hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz
    hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
    hg38_v0_Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf
    hg38_v0_Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz.tbi
    hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf
    hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz
    hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz.tbi
    hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.idx
    hg38_v0_Homo_sapiens_assembly38.dict
    hg38_v0_Homo_sapiens_assembly38.fasta
    hg38_v0_Homo_sapiens_assembly38.fasta.64.alt
    hg38_v0_Homo_sapiens_assembly38.fasta.64.amb
    hg38_v0_Homo_sapiens_assembly38.fasta.64.ann
    hg38_v0_Homo_sapiens_assembly38.fasta.64.bwt
    hg38_v0_Homo_sapiens_assembly38.fasta.64.pac
    hg38_v0_Homo_sapiens_assembly38.fasta.64.sa
    hg38_v0_Homo_sapiens_assembly38.fasta.fai
    hg38_v0_Homo_sapiens_assembly38.known_indels.vcf
    hg38_v0_Homo_sapiens_assembly38.known_indels.vcf.gz
    hg38_v0_Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
    hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf
    hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
    hg38_v0_hapmap_3.3.hg38.vcf
    hg38_v0_hapmap_3.3.hg38.vcf.gz.tbi
    hg38_v0_wgs_calling_regions.hg38.interval_list

    I used the bgzip command to compress my VCF files to .vcf.gz, as that is the proper way,

    but I am still facing the same error.

    The command that I ran is:

    gatk BaseRecalibrator \
       -I mySample68snc.bam \
       -R hg38_v0_Homo_sapiens_assembly38.fasta \
       --known-sites hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf.gz \
       --known-sites hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz \
       --known-sites hg38_v0_Homo_sapiens_assembly38.known_indels.vcf.gz \
       --known-sites hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
       -O recal_data1.table

    So if you have a suggestion, it would make me very happy. Thank you...

  • Pamela Bretscher

    Hi Ekin Köni,

    This error typically indicates that the input file is not being recognized properly as a vcf file. Could you try running "gzcat vcf | head -1" for the file that is causing the issue and paste the header line here? (example "##fileformat=VCFv4.2").

    Kind regards,

    Pamela
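
For anyone without gzcat at hand, the same first-line check can be sketched in Python. This is only an illustration: the function names are made up for this example, and it assumes the file is either plain text or gzip/BGZF-compressed (Python's gzip module reads both).

```python
import gzip


def first_line(path: str) -> str:
    """Return the first line of a plain or gzip/BGZF-compressed text file."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    # Files starting with the gzip magic bytes are read through gzip.open.
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.readline().rstrip("\r\n")


def has_vcf_fileformat(path: str) -> bool:
    """True when the first line is a VCF version declaration."""
    return first_line(path).startswith("##fileformat=VCF")
```

A file that passes this check can still fail in GATK for other reasons; it only confirms that the first line declares a VCF version, as in the "##fileformat=VCFv4.2" example.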

  • Ekin Köni

    ##fileformat=VCFv4.1

     

  • Pamela Bretscher

    Ekin Köni, okay thank you for providing this. What version of GATK are you using?

  • Ekin Köni

    The Genome Analysis Toolkit (GATK) v4.2.3.0

  • Pamela Bretscher

    Hi Ekin Köni,

    Thank you. It's possible that there is an issue with how this file was compressed, causing it to be malformed. Could you try running PrintBGZFBlockInformation on the file? If there is an error, you may need to retry running bgzip on the vcf.

    Kind regards,

    Pamela
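
As a rough first look before running PrintBGZFBlockInformation, the start of the file can be inspected for a BGZF-style header. The sketch below is an assumption-laden illustration, not a replacement for that tool: it only tests the first block header (gzip magic, the extra-field flag, and a 'BC' subfield, as described in the SAM/BAM specification) and says nothing about later blocks. The function names are hypothetical.

```python
import struct


def looks_like_bgzf(header: bytes) -> bool:
    """Check the first bytes of a file for a BGZF-style gzip header."""
    if len(header) < 18:
        return False
    # gzip magic (1f 8b), deflate method (8), and the FEXTRA flag bit (4).
    if header[0] != 0x1F or header[1] != 0x8B or header[2] != 8:
        return False
    if not header[3] & 4:
        return False
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12 : 12 + xlen]
    # Walk the extra subfields looking for the 'BC' entry with length 2.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack_from("<H", extra, pos + 2)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False


def file_looks_like_bgzf(path: str) -> bool:
    """Apply the header check to the first bytes of a file on disk."""
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(64))
```

A file compressed with plain gzip rather than bgzip would fail this check, which is one way a .vcf.gz can end up in a form that block-aware tools do not expect.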

  • Ekin Köni

    I will try this suggestion, thank you.

  • Rea Kalampaliki

    Hello! I am receiving exactly the same error with the file: hg19_v0_Homo_sapiens_assembly19.dbsnp138.vcf

    Please help:)

    ref_snps='./reference_genome/hg19_v0_Homo_sapiens_assembly19.dbsnp138.vcf'
    ref_indels='./reference_genome/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf'
    ref='./reference_genome/hg19_v0_Homo_sapiens_assembly19.fasta'

    #..more lines with code..
    ./gatk BaseRecalibrator \
        -I "$bam_marked_dup" \
        -R "$ref" \
        --known-sites "$ref_snps" \
        --known-sites "$ref_indels" \
        -O "$recal_table"
    # ..result..
    A USER ERROR has occurred: Cannot read file:///home/user/gatk_project/gatk-4.2.5.0/./reference_genome/hg19_v0_Homo_sapiens_assembly19.dbsnp138.vcf because no suitable codecs found
  • Genevieve Brandt (she/her)

    Ekin Köni, do you have any suggestions that might help out Rea Kalampaliki?

  • Genevieve Brandt (she/her)

    Rea Kalampaliki if you are still having issues with this file, go ahead and make a new post so we can walk you through the troubleshooting steps.

  • oshara chamodi

    I get the same error and cannot work out what is wrong.

    Using GATK jar /home/hi/Documents/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hi/Documents/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar BaseRecalibrator -I /home/hi/Documents/bam_reads/file_copy_sorted.bam -R /home/hi/Documents/data/ref/hg38.fa --known-sites /home/hi/Documents/data/ref/Mills_and_1000G_gold_standard.indels.hg38.vcf -O /home/hi/Documents/data/sorted_ModyRead1.table
    04:16:30.217 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/hi/Documents/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    04:16:30.446 INFO  BaseRecalibrator - ------------------------------------------------------------
    04:16:30.451 INFO  BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.4.0.0
    04:16:30.459 INFO  BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
    04:16:30.459 INFO  BaseRecalibrator - Executing as oshi@oshi-VirtualBox on Linux v5.15.0-86-generic amd64
    04:16:30.459 INFO  BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v17.0.8.1+1-Ubuntu-0ubuntu120.04
    04:16:30.460 INFO  BaseRecalibrator - Start Date/Time: October 19, 2023 at 4:16:30 AM IST
    04:16:30.460 INFO  BaseRecalibrator - ------------------------------------------------------------
    04:16:30.460 INFO  BaseRecalibrator - ------------------------------------------------------------
    04:16:30.463 INFO  BaseRecalibrator - HTSJDK Version: 3.0.5
    04:16:30.464 INFO  BaseRecalibrator - Picard Version: 3.0.0
    04:16:30.471 INFO  BaseRecalibrator - Built for Spark Version: 3.3.1
    04:16:30.472 INFO  BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    04:16:30.480 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    04:16:30.495 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    04:16:30.495 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    04:16:30.496 INFO  BaseRecalibrator - Deflater: IntelDeflater
    04:16:30.497 INFO  BaseRecalibrator - Inflater: IntelInflater
    04:16:30.498 INFO  BaseRecalibrator - GCS max retries/reopens: 20
    04:16:30.498 INFO  BaseRecalibrator - Requester pays: disabled
    04:16:30.499 INFO  BaseRecalibrator - Initializing engine
    04:16:31.464 INFO  BaseRecalibrator - Shutting down engine
    [October 19, 2023 at 4:16:31 AM IST] org.broadinstitute.hellbender.tools.walkers.bqsr.BaseRecalibrator done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=63025152
    ***********************************************************************

    A USER ERROR has occurred: Cannot read file:///home/hi/Documents/data/ref/Mills_and_1000G_gold_standard.indels.hg38.vcf because no suitable codecs found

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace. 

     

    Please help if possible. Thank you.

  • Gökalp Çelik

    Hi oshara chamodi

    Can you check the contents of the file 

    /home/hi/Documents/data/ref/Mills_and_1000G_gold_standard.indels.hg38.vcf

    with bcftools view? It is possible that the file is corrupt or it may be empty. 

  • Heather Peng

    Hello, I'm trying to run gatk IndexFeatureFile on my VCF file, but it reports: A USER ERROR has occurred: Cannot read file filtered_knownsites.vcf because no suitable codecs found.
    When I run cat filtered_knownsites.vcf | head -100, I get the following output:
    chr1    8615    rs202245468     CCT     C       .       PASS    CAF=[0.9766,0.02342];COMMON=1;KGPROD;KGPhase1;KGPilot123;RS=202245468;RSPOS=233587;SAO=0;SSR=0;VC=DIV;VP=0x05000000000110001c000200;WGT=1;dbSNPBuildID=137
    chr1    8714    rs199948150     GC      G       .       PASS    OTHERKG;RS=199948150;RSPOS=233686;SAO=0;SSR=0;VC=DIV;VP=0x050000000001000002000200;WGT=1;dbSNPBuildID=137
    chr1    9261    rs201432159     G       A       .       PASS    OTHERKG;RS=201432159;RSPOS=234232;SAO=0;SSR=0;VC=SNV;VP=0x050000000001000002000100;WGT=1;dbSNPBuildID=137
    chr1    9342    rs56055731      C       T       .       PASS    HD;OTHERKG;RS=56055731;RSPOS=234313;SAO=0;SSR=0;VC=SNV;VP=0x050000000001000402000100;WGT=1;dbSNPBuildID=129
    chr1    9435    rs111659307     A       G       .       PASS    OTHERKG;RS=111659307;RSPOS=234408;SAO=0;SSR=0;VC=SNV;VP=0x050000000001000002000100;WGT=1;dbSNPBuildID=132
    chr1    9508    rs8179403       T       A       .       PASS    GNO;OTHERKG;RS=8179403;RSPOS=234481;SAO=0;SLO;SSR=0;VC=SNV;VLD;VP=0x050100000001040102000100;WGT=1;dbSNPBuildID=117
    chr1    9558    rs111281142     A       G       .       PASS    OTHERKG;RS=111281142;RSPOS=234534;SAO=0;SSR=0;VC=SNV;VP=0x050000000001000002000100;WGT=1;dbSNPBuildID=132
    chr1    9580    rs113278145     C       T       .       PASS    OTHERKG;RS=113278145;RSPOS=234556;SAO=0;SSR=0;VC=SNV;VP=0x050000000001000002000100;WGT=1;dbSNPBuildID=132

    One noteworthy detail: I am testing CHM13 for somatic variant calling and am trying to at least run BQSR. I lifted over Homo_Sapiens_Assembly38.vcf with CrossMap using a chain file, then split the lines that contain "duplicated alleles" (for example, REF:G, ALT:G,A became REF:G, ALT:G and REF:G, ALT:A), and then removed the lines where REF and ALT are the same (REF:G, ALT:G).

    I would really, really appreciate any help. Thanks!!

  • Gökalp Çelik

    Hi Heather Peng

    It looks like your file lacks a proper VCF header section, so our tool cannot index it.

    Regards. 
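
To illustrate what a missing header section looks like to a parser, here is a small Python sketch that scans the top of a VCF-like file. It is a simplified check written for this thread, not GATK's or htsjdk's logic, and the function name is hypothetical. It assumes the usual layout: a ##fileformat line first, then other ## meta lines, then a #CHROM column line before any record.

```python
from typing import Iterable, List


def header_problems(lines: Iterable[str]) -> List[str]:
    """Report basic header-section problems in a VCF-like sequence of lines."""
    problems = []
    saw_chrom = False
    saw_record = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if number == 1 and not line.startswith("##fileformat=VCF"):
            problems.append("line 1 does not declare ##fileformat=VCF...")
        if line.startswith("#CHROM"):
            saw_chrom = True
            break
        if line and not line.startswith("#"):
            # A record showed up before any #CHROM column line.
            saw_record = True
            problems.append(
                f"line {number} looks like a record, but no #CHROM line precedes it"
            )
            break
    if not saw_chrom and not saw_record:
        problems.append("no #CHROM column header line found")
    return problems
```

Run on a file that begins directly with records, like the output pasted above, this reports both a missing ##fileformat declaration and a record with no #CHROM line before it.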

  • Heather Peng

    Gökalp Çelik Thank you so much, I'll look into that right now.

  • Heather Peng

    After a series of revisions, I now get: A USER ERROR has occurred: Error while trying to create index for sorted_filtered_knownsites.vcf. Error was: htsjdk.tribble.TribbleException: Line 94: there aren't enough columns for line  (we expected 9 tokens, and saw 1 )

    When I run sed -n '94p' sorted_filtered_knownsites.vcf, I get:
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
    What could be the problem? Thank you!

  • Gökalp Çelik

    Those columns must be tab-separated. If they are not, the parser will detect only one column.
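
The "one column" effect is easy to reproduce: splitting a line on the tab character yields a single token when the fields are separated by spaces. The Python sketch below counts tab-separated tokens per line. The function name and the default of 8 required columns (the fixed VCF columns CHROM through INFO) are choices made for this example.

```python
from typing import Iterable, Iterator, Tuple


def short_lines(
    lines: Iterable[str], min_columns: int = 8
) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, token_count) for lines with too few tab-separated fields."""
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        # Skip blank lines and ## meta lines, which are not column-based.
        if not text or text.startswith("##"):
            continue
        tokens = text.split("\t")
        if len(tokens) < min_columns:
            yield number, len(tokens)
```

Splitting explicitly on "\t" rather than on any whitespace is deliberate: a line that only looks aligned because of spaces is reported with a token count of 1.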

  • Heather Peng

    Thank you for your fast response!

    However, when I run `cat -A sorted_filtered_knownsites.vcf | sed -n '94p'`, it gives me #CHROM^IPOS^IID^IREF^IALT^IQUAL^IFILTER^IINFO$, so I guess it is already tab-separated?

  • Gökalp Çelik

    You may need to run the ValidateVariants tool to check the integrity of the VCF file.

  • Heather Peng

    I ran it, but it still returns:
    htsjdk.tribble.TribbleException: Line 94: there aren't enough columns for line  (we expected 9 tokens, and saw 1 ), for input source: sorted_filtered_knownsites.vcf

    Sorry to keep asking, but is there any other angle from which I could look into this issue?

    Thank you!

  • Gökalp Çelik

    Your VCF is malformed at line 94. There is certainly an incompatible separator on that line. The tab separator is '\t', so you need to fix that line with the proper separator. Without this fix there is really no way to continue from this point.

    Regards. 

  • Heather Peng

    I tried everything I could think of and am still getting the same error, so I compared the two files:
    cat -A  Homo_sapiens_assembly38.vcf | sed -n '3436p'
    #CHROM^IPOS^IID^IREF^IALT^IQUAL^IFILTER^IINFO$

    cat -A  v2_tabbed_knownsites.vcf | sed -n '94p'
    #CHROM^IPOS^IID^IREF^IALT^IQUAL^IFILTER^IINFO$

    They give the same result. Is this the end of the road?
    Thank you for your patience.

  • Gökalp Çelik

    If you can open this file in a text editor, you may be able to see whether those columns are separated by tabs or not. The command line does not always show whitespace characters correctly.

    You can grab the header section of the file with

    bcftools view -h

    then fix the corresponding line and reheader the file with the bcftools reheader tool to see if it works. If you are not satisfied with GATK's error message, you can try compressing and indexing the file with bgzip and tabix to see whether they also raise an error for that file.

    I hope these help. 
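
One way to make hidden characters explicit without a text editor is to list every character on the suspect line that is neither a tab nor printable ASCII. The Python sketch below does that. It is only an illustration with a hypothetical function name, and it assumes the line was read without newline translation so that a stray carriage return survives.

```python
import unicodedata
from typing import List, Tuple


def unusual_characters(line: str) -> List[Tuple[int, str, str]]:
    """List (position, repr, name) for characters other than tab and printable ASCII."""
    found = []
    for position, char in enumerate(line.rstrip("\n")):
        if char == "\t" or " " <= char <= "~":
            continue
        # Control characters have no Unicode name, so fall back to a label.
        name = unicodedata.name(char, "UNNAMED CONTROL")
        found.append((position, repr(char), name))
    return found
```

Reading the file with open(path, encoding="utf-8", newline="") keeps carriage returns and a leading byte-order mark in the string, so both would show up in the result if present on the #CHROM line.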

  • Gökalp Çelik

    Can you also check the 94th variant entry in the file below the header section?

  • Can Kockan

    Just a tiny addition to Gökalp Çelik's suggestions regarding potential whitespace issues; the last two replies on this thread might be worth checking out:

    https://gatk.broadinstitute.org/hc/en-us/community/posts/21369679658267-Tribble-can-t-find-CHROM-header-but-line-is-present

     

  • Heather Peng

    Gökalp Çelik Can Kockan Thank you for kindly advising on the next steps.

    I think something went wrong at the very start (i.e. CrossMap), so I decided to take an alternative route to CHM13 somatic calling.

    Still, the discussion above gave me a lot of insight into VCF processing, much appreciated!!
