Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Broad hg38 ICE interval list

Answered
3

17 comments

  • Louis Bergelson

    Is there a standard place to look for this sort of thing?  I'm always confused trying to find interval lists.

  • Tiffany Miller

    I've asked pipeline ops to move this interval list to the gcp-public-data--broad-references bucket, which is one of the standard places to find interval lists. The team is also working on documentation about the buckets that the best practice pipelines in the gatk-workflows repos point to, so that users get more metadata.

  • Nickier

    Hi Tiffany Miller, I have downloaded the interval file from https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0/HybSelOligos. Does the fifth field of the file "whole_exome_illumina_coding_v1.Homo_sapiens_assembly38.targets.interval_list" contain hg19 coordinates? And what is the difference between targets.interval_list and baits.interval_list?

  • Louis Bergelson

    Nickier, targets are the regions the exome capture aims to cover. Baits are the positions of the actual bait sequences used in the capture process. In other words, targets are the regions of interest that the assay is designed to capture, and baits are where the molecules used during DNA capture align. They roughly correspond, but targets are generally larger than baits, and difficult regions or long targets may require multiple baits.

    In general you probably want to use the targets file, since that is the region of interest. If you're trying to analyze capture efficiency or something about the sequencing process itself, you'd probably want to look at the baits as well.
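
One quick way to see how a targets file and a baits file differ is to compare how many bases each covers. The following is a minimal sketch, not an official GATK or Picard utility: the function name `interval_territory` is hypothetical, and it assumes the standard interval list layout (SAM-style header lines starting with `@`, then tab-separated contig, start, end, strand, name with 1-based inclusive coordinates). It does not merge overlapping intervals, so overlapping baits would be counted twice.

```shell
# Sum the bases covered by an interval list (hypothetical helper).
# Header lines start with "@" and are skipped.
# Coordinates are 1-based and inclusive, so an interval spans end - start + 1 bases.
# Overlapping intervals are NOT merged; each is counted in full.
interval_territory() {
    awk -F'\t' '!/^@/ && NF >= 3 { total += $3 - $2 + 1 } END { print total + 0 }' "$1"
}
```

For example, `interval_territory targets.interval_list` and `interval_territory baits.interval_list` (placeholder file names) print one number each, which can then be compared.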

  • Nickier

    Thank you very much, I still have a question. Can I use this bed file to replace the interval file? The bed file is downloaded from the CCDS database and converted to bed format.

    ## bed
    wget ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/current_human/CCDS.current.txt
    cat CCDS.current.txt | grep  "Public" | perl -alne '{/\[(.*?)\]/;next unless $1;$gene=$F[2];$exons=$1;$exons=~s/\s//g;$exons=~s/-/\t/g;print "$F[0]\t$_\t$gene" foreach split/,/,$exons;}'|sort -u |bedtools sort -i |awk '{if($3>$2) print "chr"$0}'  > hg38.exon.bed
    

  • Tiffany Miller

    Hi Chet! I am confirming with the production team, but I believe this is the one: gs://gcp-public-data--broad-references/hg38/v0/exome_calling_regions.v1.interval_list

     

  • Tiffany Miller

    This is not the correct file (apparently this is the joint calling interval list). We are finding the public one to point you to.

  • Tiffany Miller

    Update: The file is scheduled to be released in the gcp-public-data--broad-references bucket on Friday.

  • dannykwells

    Hi Tiffany Miller I am looking for the hg19 version of this file, both the targets and the baits. I poked around the buckets above but didn't see anything that immediately stood out. Any direction would be great. Thank you!!

     

  • Louis Bergelson

    Maybe we should have a readme file in each of the public data folders that describes what the files are?

  • Tiffany Miller

    Agreed, Louis Bergelson. There is a ReadMe, but it looks massively out of date.

    dannykwells, I've asked our team if we can get this file moved over. We then have to coordinate with GCP to transfer it, since they are sponsoring the bucket. I'll let you know when that is done. It may take a week or so.

     

  • Tiffany Miller

    Nickier the targets interval list for hg38 you pointed to was accurate for defining the hg38 target intervals for the ICE exome capture kit used at the Broad. Does using this bed file you are pointing to make sense for what you are doing? 

  • Nickier

    Tiffany Miller Thanks~~ Actually, I am not sure whether what I described above is correct; I just saw it in some tutorials. Maybe I should use the *_Regions.bed file provided by Agilent, because my exon capture kit is SureSelect Human All Exon V7, and I have also downloaded this file from the Agilent website. By the way, do I need to convert the BED file to an interval list? On the left is the regions BED file I downloaded from the Agilent website; on the right is the target interval list provided by the GATK team.
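
On the conversion question: BED coordinates are 0-based with an exclusive end, while interval list coordinates are 1-based with an inclusive end, so the start shifts by one and the end stays the same. The sketch below only illustrates that coordinate shift for the body lines. The function name `bed_to_interval_body` is hypothetical, and defaulting the strand to `+` and the name to `.` are assumptions made for illustration. A complete interval list also needs a SAM-style sequence dictionary header, which Picard's BedToIntervalList tool adds when given a sequence dictionary, so that tool is the safer route for real data.

```shell
# Convert BED records to interval-list body lines (hypothetical helper, no header).
# BED:           contig, 0-based start, exclusive end, [name], [score], [strand]
# interval list: contig, 1-based start, inclusive end, strand, name
bed_to_interval_body() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^(#|track|browser)/ { next }          # skip BED comment and track lines
        NF >= 3 {
            name   = (NF >= 4 && $4 != "") ? $4 : "."                   # assumed default name
            strand = (NF >= 6 && ($6 == "+" || $6 == "-")) ? $6 : "+"   # assumed default strand
            print $1, $2 + 1, $3, strand, name  # start shifts by one, end is unchanged
        }' "$1"
}
```

Running `bed_to_interval_body regions.bed` (a placeholder file name) writes the converted records to standard output.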

  • Nickier

    I already got the answer at this tutorial, thank you again~~

  • Tiffany Miller

    You want to use the target files for the capture kit your data was generated with. What we provided here to answer the original post was for the ICE exome capture kit by Illumina.

    dannykwells I am still waiting on these files to get transferred to GCP. Sorry for the wait. 

  • Tiffany Miller

    dannykwells FYI, the hg19 versions of this file, both the targets and the baits, are now available in the GCP bucket: https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg19/v0/HybSelOligos/whole_exome_illumina_coding_v1/?pli=1
