Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Resource bundle

16 comments

  • Nickier

    Hello! Could you improve the README file so that it introduces the resource files?

    5
  • Ashi

    Hi,

    I am trying to run BaseRecalibrator with my WGS data.

    My reference is hg19. Where can I get the SNP and indel VCF files for hg19?

    I found the b37 versions (below) in the Google Cloud bucket gs://gatk-legacy-bundles, but nothing for hg19.

    dbsnp_138.b37.vcf

    1000G_phase1.indels.b37.vcf (currently from the 1000 Genomes Phase I indel calls)

    Mills_and_1000G_gold_standard.indels.b37.sites.vcf

    3
  • Ashi

    Hi,

    I have another question, about the hg38 reference genome FASTA file.

    I downloaded "Homo_sapiens_assembly38.fasta" from your Google Cloud bucket:

    https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/

    However, this FASTA file does not contain the chrEBV sequence.

    For comparison, I then downloaded the "hg38" file from https://support.illumina.com/sequencing/sequencing_software/igenome.html

    That hg38.fasta does contain the chrEBV sequence (I checked with "grep hg38.fasta").

    Is there a reason chrEBV is excluded from the hg38 reference in your bundle?

    1
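
The chrEBV check described in the comment above can be made explicit by listing the sequence names in the FASTA headers instead of grepping. This is a minimal sketch, assuming a plain-text (uncompressed) FASTA file; the function names are hypothetical and are not part of GATK or the resource bundle.

```python
from typing import Iterable, List


def fasta_sequence_names(lines: Iterable[str]) -> List[str]:
    """Return the sequence names found on FASTA header lines, in file order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def has_sequence(fasta_path: str, name: str) -> bool:
    """Report whether a sequence called `name` is present in the FASTA file."""
    with open(fasta_path) as handle:
        return name in fasta_sequence_names(handle)
```

For example, `has_sequence("Homo_sapiens_assembly38.fasta", "chrEBV")` would answer the question for a local copy of the bundle file. Matching on the exact header name avoids false hits from substrings elsewhere in the header line.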
  • Joy Bordini

    Hi Ashi,

    I'm having the same problem retrieving hg19 resources, in particular:

    • 1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg19.vcf
    • Axiom_Exome_Plus.genotypes.all_populations.poly.hg19.vcf.gz
    • Homo_sapiens_assembly19.known_indels.vcf.gz

    Did you find a way to get them?

    Thanks,

    Joy

    0
  • Patrick Blaney

    Hello,

    First, thank you to the members of the Broad for putting this bundle together for the genomics community.

    I have a question about the specific assembly build of the hg38 reference genome. The documentation refers to the GRCh38.p7 release in "Technical Documentation->Glossary->Reference Genome Components" and again in "Technical Documentation->Glossary->Human genome reference builds - GRCh38 or hg38 - b37 - hg19". However, the same paragraph states: "Note that the GATK team rarely if ever adopts patches due to constraints from our production operations. We are not currently able to provide support for the use of patches."

    Does this mean that the current FASTA file (Homo_sapiens_assembly38.fasta) in the resource bundle is in fact NOT GRCh38.p7, but rather the primary GRCh38 release from 2013 with no patches included? I could not find a clear answer anywhere in the documentation.

    Thank you,

    Patrick

    0
  • Lingyu Zhan

    Hi GATK Team,

    First, thank you for this post. I want to download the hg19 versions of the resources for VariantRecalibrator. From this page, it seems these resources were available through the FTP server, which is now disabled. Is there any official platform that still provides them? Thank you so much.

    Best,

    Lingyu

    0
  • Lingyu Zhan

    To add to my previous question: the 'genomics-public-data' bucket also does not seem to contain the complete list of b37 resources (for VariantRecalibrator) indicated here. Are there any other buckets that contain the complete set of b37 resources? Thank you so much.

    Best

    Lingyu

    0
  • Limin Chen

    Hello, 

    I know that `Homo_sapiens_assembly38.fasta.64.amb` is one of the BWA index files, but what does `.64` mean in the file name, given that the original FASTA file name DOES NOT contain `.64`? Why was `.64` added?

    Would it be possible to add a Readme.txt explaining what each file does?

    Thanks,

    Best,

    LC

    1
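
On the `.64` question above: as far as I know, `bwa index` writes its index files with a `.64` infix when it is run with the `-6` option, which would explain names such as `Homo_sapiens_assembly38.fasta.64.amb`. That the bundle files were produced this way is an assumption, not something stated on this page. The sketch below, with hypothetical helper names, lists the index files one would then expect next to a FASTA file and reports any that are missing.

```python
import os
from typing import List

# The five index files written by `bwa index` (classic BWA, not bwa-mem2).
BWA_INDEX_SUFFIXES = ("amb", "ann", "bwt", "pac", "sa")


def expected_bwa_index_files(fasta_path: str, sixty_four: bool = True) -> List[str]:
    """Build the expected index file names for a FASTA file.

    `sixty_four=True` assumes the index was built with `bwa index -6`,
    which inserts `.64` between the FASTA name and the index suffix.
    """
    infix = ".64" if sixty_four else ""
    return [f"{fasta_path}{infix}.{suffix}" for suffix in BWA_INDEX_SUFFIXES]


def missing_bwa_index_files(fasta_path: str, sixty_four: bool = True) -> List[str]:
    """Return the expected index files that do not exist on disk."""
    return [
        path
        for path in expected_bwa_index_files(fasta_path, sixty_four)
        if not os.path.exists(path)
    ]
```

Calling `missing_bwa_index_files("Homo_sapiens_assembly38.fasta")` in the download directory gives a quick check that all index files were fetched alongside the FASTA.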
  • HQ Zhao

    I can't access Google Cloud; it says permission is required:

    https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/
    0
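
A Cloud Storage console URL like the one above encodes the bucket name and object path, and objects in a publicly readable bucket can generally be fetched from `https://storage.googleapis.com/<bucket>/<object>` without going through the console. The sketch below converts between the two forms. Whether this particular bucket allows anonymous reads is an assumption, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlparse

# Path prefix used by Cloud Storage browser pages in the Google Cloud console.
CONSOLE_PREFIX = "/storage/browser/"


def console_url_to_gs_uri(console_url: str) -> str:
    """Turn a console browser URL into the equivalent gs:// URI."""
    path = urlparse(console_url).path
    if not path.startswith(CONSOLE_PREFIX):
        raise ValueError(f"not a Cloud Storage console URL: {console_url}")
    return "gs://" + path[len(CONSOLE_PREFIX):]


def gs_uri_to_https_url(gs_uri: str) -> str:
    """Turn a gs://bucket/object URI into a storage.googleapis.com URL."""
    if not gs_uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {gs_uri}")
    bucket, _, object_name = gs_uri[len("gs://"):].partition("/")
    # Percent-encode the object name but keep '/' as the path separator.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"
```

For instance, `console_url_to_gs_uri` applied to the console link above yields a `gs://genomics-public-data/...` prefix, and appending a file name such as `Homo_sapiens_assembly38.fasta` before calling `gs_uri_to_https_url` gives a direct download URL to try.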
  • Take Murata

    Hello, 

    I previously accessed ftp.broad.mit.edu/pub/human_STS_releases/july97/ to get "07-97.YAC2STS.txt". How can I get the file now?

    With best regards.

    Take

    0
  • Take Murata

    Progress report: I was able to access the FTP server and get the file.

    Thank you for the support.

    Take

    0
  • Emily

    Dear GATK team and community, 

    I have WES data that I aligned in previous steps with bwa-mem against the hg38 reference genome. I am now looking to run the BaseRecalibrator and BQSR steps with the same hg38 reference.

    However, the text above ("In addition, we are currently transitioning to support the Grch38/hg38 reference build, but have not yet generated all of the files necessary for all use cases (in particular we are still missing the Hg38 version of the Broad's exome intervals)") has made me reconsider. Should I be using a different reference genome?

    Any advice or clarification would be great! 

    0
  • Rahul Yadav

    I can't find a GTF file for Homo_sapiens_assembly38 in the resource bundle: v0 - genom…blic-data – Bucket details – Cloud Storage – Google Cloud console

    0
  • Gil Stelzer

    Hi GATK team

    Thanks for making this resource bundle.

    I am looking for an annotation file with gene symbols and their strand and exon/intron coordinates on the GRCh38/hg38 build. I looked through the resource bundle and found the following file: Homo_sapiens_assembly38.fasta.64.ann

    When I browsed the file I didn't see gene symbols (maybe I missed something). If you have the kind of annotation file I am looking for, do you also have it in GTF/GFF/BED format?

    Many thanks,

    Gil

    0
  • Yap Sing Yee

    Hi GATK team, I plan to use GATK4 to find SNPs and compare variants between samples. However, I couldn't find a reference resource file for Vibrio spp. Where can I get this file? And how do I set up and run GATK4 for my project?

    Thanks for your patience with my questions. Thank you!

    0
  • Julia Wiggeshoff

    Is there an estimate for when the exome files for the hg38 build will be released? The gtf files for that build are also still missing. Many thanks!

    0