Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

af-only-gnomad.raw.sites.vcf

8 comments

  • Melcar Collodetti

    Changed GATK version to 4.2.6.1 and tried the following: 

    ./gatk-4.2.6.1/gatk GetPileupSummaries \
        --disable-sequence-dictionary-validation true \
        -I amostra1_sorted_rmdup_F4.bam \
        -V af-only-gnomad.raw.sites.vcf \
        -L amostra1_coverageBed20x.interval_list \
        -O amostra1.table
    But the same error occurred.
  • Melcar Collodetti

    I also tried:

    wget https://storage.googleapis.com/gatk-best-practices/somatic-b37/af-only-gnomad.raw.sites.vcf

    bgzip af-only-gnomad.raw.sites.vcf

    bcftools index -t af-only-gnomad.raw.sites.vcf.gz 

    Then I tried 

    ./gatk-4.2.6.1/gatk GetPileupSummaries \
        --disable-sequence-dictionary-validation true \
        -I amostra1_sorted_rmdup_F4.bam \
        -V af-only-gnomad.raw.sites.vcf.1.gz.tbi \
        -L amostra1_coverageBed20x.interval_list \
        -O amostra1.table
     
    and got the same error: A USER ERROR has occurred: Cannot read file:///workspace/Somatico_hg19/af-only-gnomad.raw.sites.vcf.1.gz.tbi because no suitable codecs found

     

  • Melcar Collodetti

    New try with GATK 4.2.2.0 and af-only-gnomad.raw.sites.vcf: 

    gatk-4.2.2.0/gatk IndexFeatureFile -I af-only-gnomad.raw.sites.vcf

    After the index was built:

    ./gatk-4.2.2.0/gatk GetPileupSummaries \
        -I amostra1_sorted_rmdup_F4.bam \
        -V af-only-gnomad.raw.sites.vcf.idx \
        -L amostra1_coverageBed20x.interval_list \
        -O amostra1.table
     
    Got the same error: A USER ERROR has occurred: Cannot read file:///workspace/Somatico_hg19/af-only-gnomad.raw.sites.vcf.idx because no suitable codecs found
  • Melcar Collodetti

    IT FINALLY WORKED: 

    ./gatk-4.2.6.1/gatk GetPileupSummaries \
        --disable-sequence-dictionary-validation true \
        -I amostra1_sorted_rmdup_F4.bam \
        -V af-only-gnomad.raw.sites.vcf.1 \
        -L amostra1_coverageBed20x.interval_list \
        -O amostra1.table

     

  • Gökalp Çelik

    Hi Melcar Collodetti

    Thanks for sharing your resolution. It looks like your earlier download was cut off and the file was then downloaded again under a different name, which is probably what is happening in many of the similar cases posted here.

    Regards. 
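
As a hedged illustration of the renamed-download problem described above: by default, wget keeps an existing file and saves a repeated download of the same name as `name.1`, `name.2`, and so on. The sketch below lists such numbered copies so they can be spotted before a stale or truncated file is passed to GATK. The function name `list_numbered_copies` is hypothetical and is not part of wget or GATK.

```shell
#!/usr/bin/env bash
# List numbered copies (file.1, file.2, ...) that wget leaves behind when the
# target name already exists. list_numbered_copies is a hypothetical helper.
list_numbered_copies() {
    local target="$1"
    local prefix="$target."
    local candidate suffix
    for candidate in "$target".*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$candidate" ] || continue
        suffix="${candidate#"$prefix"}"
        case "$suffix" in
            ''|*[!0-9]*) ;;                      # suffix is not purely numeric: skip
            *) printf '%s\n' "$candidate" ;;     # e.g. af-only-gnomad.raw.sites.vcf.1
        esac
    done
}
```

Running `list_numbered_copies af-only-gnomad.raw.sites.vcf` in the download directory prints any `.1`, `.2`, ... copies and ignores index files such as `.idx`. To avoid the situation in the first place, `wget -c URL` resumes a partial download and `wget -O NAME URL` writes to an explicit file name.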

  • Melcar Collodetti

    You're welcome. 

    Basically, what needs to be done is:

    - Fix the link to https://storage.googleapis.com/gatk-best-practices/somatic-b37/af-only-gnomad.raw.sites.vcf

    - Index the raw file with:

    gatk-4.2.2.0/gatk IndexFeatureFile -I af-only-gnomad.raw.sites.vcf

    - Use GATK 4.2.6.1 to ignore the 'chr' prefix with:

    ./gatk-4.2.6.1/gatk GetPileupSummaries \
        --disable-sequence-dictionary-validation true \
        -I amostra1_sorted_rmdup_F4.bam \
        -V af-only-gnomad.raw.sites.vcf.1 \
        -L amostra1_coverageBed20x.interval_list \
        -O amostra1.table
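
The indexing step above can be checked before GetPileupSummaries is run. This is a minimal sketch, assuming the usual GATK convention that IndexFeatureFile writes a `.idx` file next to a plain-text VCF and a `.tbi` file next to a bgzip-compressed one. The helper names `expected_index_path` and `has_index` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the index file name expected next to a VCF (assumed GATK convention:
# .idx for plain .vcf, .tbi for bgzipped .vcf.gz). Hypothetical helper.
expected_index_path() {
    case "$1" in
        *.vcf.gz) printf '%s.tbi\n' "$1" ;;
        *.vcf)    printf '%s.idx\n' "$1" ;;
        *)        return 1 ;;   # not a name this sketch recognizes as a VCF
    esac
}

# Succeed only if both the VCF and its expected index exist. Hypothetical helper.
has_index() {
    local index_path
    index_path="$(expected_index_path "$1")" || return 1
    [ -f "$1" ] && [ -f "$index_path" ]
}

vcf="af-only-gnomad.raw.sites.vcf"
if has_index "$vcf"; then
    echo "index found: $(expected_index_path "$vcf")"
else
    echo "no index next to $vcf; run: gatk IndexFeatureFile -I $vcf"
fi
```

A name such as `af-only-gnomad.raw.sites.vcf.1` is rejected by `expected_index_path`, which is a useful hint that the file was renamed during download.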
  • David Benjamin

    Melcar Collodetti A few comments on the workflow, which I hope will be helpful:

    • The -L argument in GetPileupSummaries should usually be the same vcf file as the -V argument.
    • For the -V argument we recommend the small_exac_common VCF, which is in the same best-practices Google bucket, rather than af-only-gnomad.
    • GATK tools require VCF inputs to be indexed, but the input is the VCF itself, not the .idx or .tbi index file.
    • If you install gsutil (Google Cloud utilities) you can run GATK tools using the gs:// bucket path for the -V argument (likewise for -I bam/cram files and -R reference files) without downloading them to your local machine. GATK uses the Java NIO library to read only part of each file at a time. This also works when some files are in a Google bucket and others are local.
    • The workflow https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect2.wdl in the GATK github repo does all of this for you.
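
The points above about -V and -L can be sketched as follows. This is only an illustration: `vcf_from_index` and `pileup_command` are hypothetical helpers, the file names are the ones used earlier in this thread, and the command is printed rather than executed. In practice the sites file would be the small_exac_common VCF recommended above.

```shell
#!/usr/bin/env bash
# Map an index path back to the VCF it belongs to; pass anything else through.
# Hypothetical helper: GATK wants the VCF itself, never the .idx or .tbi file.
vcf_from_index() {
    case "$1" in
        *.vcf.idx)    printf '%s\n' "${1%.idx}" ;;
        *.vcf.gz.tbi) printf '%s\n' "${1%.tbi}" ;;
        *)            printf '%s\n' "$1" ;;
    esac
}

# Print (do not run) a GetPileupSummaries command that uses the same sites
# VCF for both -V and -L. Arguments: BAM, sites VCF (or its index), output table.
pileup_command() {
    local bam="$1"
    local out="$3"
    local sites
    sites="$(vcf_from_index "$2")"
    printf 'gatk GetPileupSummaries -I %s -V %s -L %s -O %s\n' \
        "$bam" "$sites" "$sites" "$out"
}

pileup_command amostra1_sorted_rmdup_F4.bam af-only-gnomad.raw.sites.vcf.idx amostra1.table
```

Even though an `.idx` path is passed in, the printed command names the VCF for both -V and -L, which is the form the tool expects.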
