Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Resource bundle


  •

    Hello! Could you add or improve a README file that introduces the resource files?

  •


    I am trying to run BaseRecalibrator on my WGS data.

    My reference is hg19. Where can I get the SNP and indel VCF files for hg19?


    I found the b37 versions (below) in the Google Cloud bucket gs://gatk-legacy-bundles, but nothing for hg19.


    1000G_phase1.indels.b37.vcf (currently from the 1000 Genomes Phase I indel calls)
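
One reason the b37 files cannot simply be passed to a tool alongside an hg19 reference is contig naming: b37 names chromosomes "1", "2", ... "MT", while UCSC hg19 uses "chr1", "chr2", ... "chrM". The sketch below is a minimal, hedged illustration of checking for that mismatch before running anything. The function names and the example records are hypothetical and are not taken from the bundle files. Matching names is necessary but not sufficient, since the two builds also differ in some sequences, such as the mitochondrial contig.

```python
def vcf_contigs(lines):
    """Return the contig names found in the CHROM column, in first-seen order."""
    seen = {}
    for line in lines:
        # Skip header lines and blank lines; only data records carry CHROM.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        seen.setdefault(chrom, None)
    return list(seen)


def unmatched_contigs(vcf_names, reference_names):
    """Return the VCF contig names that the reference does not contain."""
    reference = set(reference_names)
    return [name for name in vcf_names if name not in reference]


# Illustrative records only (assumption): b37-style names in the VCF,
# hg19-style names in the reference.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tAT\t.\tPASS\t.",
    "1\t2000\t.\tG\tGC\t.\tPASS\t.",
    "2\t1500\t.\tT\tTA\t.\tPASS\t.",
]
hg19_style_names = ["chr1", "chr2", "chrM"]

print(unmatched_contigs(vcf_contigs(example_vcf), hg19_style_names))  # ['1', '2']
```

With a real file, the lines would come from an opened VCF (decompressed first if it is gzipped) and the reference names from the reference's index or sequence dictionary.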



  •


    I have another question, about the hg38 reference FASTA file.

    I downloaded "Homo_sapiens_assembly38.fasta" from your Google Cloud bucket, but this FASTA file does not have a chrEBV sequence.

    For comparison, I then downloaded the "hg38" file from here, and hg38.fasta does have a chrEBV sequence (I just checked with "grep hg38.fasta").


    Is there a reason chrEBV is excluded from the hg38 reference in your bundle?
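
A check like the one described above can be made exact by comparing sequence names rather than matching substrings. The following is a minimal sketch, assuming a standard FASTA layout where each record starts with ">" followed by the sequence name. The helper names are hypothetical, and the in-memory example merely stands in for a real file such as hg38.fasta.

```python
import io


def fasta_contig_names(handle):
    """Yield the sequence name (first word after '>') of every FASTA record."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def has_contig(handle, name):
    """Return True if a record with exactly this sequence name is present."""
    return any(contig == name for contig in fasta_contig_names(handle))


# Tiny in-memory stand-in for a real FASTA (assumption); with a real file,
# pass open("hg38.fasta") instead.
example = io.StringIO(">chr1 description\nACGT\n>chrEBV description\nTTGA\n")
print(has_contig(example, "chrEBV"))  # True
```

Comparing the whole name avoids false hits from header text that merely mentions the contig, which a plain substring search could report.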




  • Joy Bordini

    Hi Ashi,


    I'm having the same problem retrieving the hg19 resources. In particular:

    • 1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg19.vcf
    • Axiom_Exome_Plus.genotypes.all_populations.poly.hg19.vcf.gz
    • Homo_sapiens_assembly19.known_indels.vcf.gz

    Did you find a way to get them?





  • Patrick Blaney


    First, thank you to the members of the Broad for putting this bundle together for the genomics community.

    I have a question about the specific assembly build of the hg38 reference genome. The documentation refers to the GRCh38.p7 release in "Technical Documentation->Glossary->Reference Genome Components" and again in "Technical Documentation->Glossary->Human genome reference builds - GRCh38 or hg38 - b37 - hg19". However, the same paragraph states: "Note that the GATK team rarely if ever adopts patches due to constraints from our production operations. We are not currently able to provide support for the use of patches."

    Does this mean that the current FASTA file (Homo_sapiens_assembly38.fasta) in the resource bundle is in fact NOT GRCh38.p7, but rather the primary GRCh38 release from 2013 with no patches included? This remained unclear to me after searching the documentation.

    Thank you,


