GATK gCNV References for hg19 and hg38
I cannot find the references recommended for GATK gCNV for hg19 and hg38. I have used them before, but I cannot locate them now. If this question is not allowed, I can take it down. Does anyone happen to have the link? Thank you for your time.
-
Hi Y R
Do you mean the reference genomes or reference documents?
Regards.
-
I mean the reference FASTA, interval lists, etc.: anything that can be used as input to GATK gCNV, given that it can have issues with mitochondrial regions as well as other regions.
-
Hi again.
We have resource bundles stored in Google Cloud. You can find the links below.
https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle
https://console.cloud.google.com/storage/browser/gatk-legacy-bundles/b37
https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0
I hope this helps.
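The console links above open each bucket in a web browser. Command-line clients such as `gsutil` address the same locations as `gs://BUCKET/PATH` URIs. Below is a minimal Python sketch of that mapping; it assumes every console link has the form `https://console.cloud.google.com/storage/browser/BUCKET/PATH`, optionally followed by page state such as `;tab=objects?...`, and the helper name `console_url_to_gs_uri` is hypothetical.

```python
from urllib.parse import unquote, urlsplit

CONSOLE_HOST = "console.cloud.google.com"
BROWSER_PREFIX = "/storage/browser/"


def console_url_to_gs_uri(url: str) -> str:
    """Map a Cloud Console storage-browser URL to a gs:// URI (hypothetical helper).

    Assumes the form https://console.cloud.google.com/storage/browser/BUCKET/PATH.
    """
    parts = urlsplit(url)
    if parts.netloc != CONSOLE_HOST or not parts.path.startswith(BROWSER_PREFIX):
        raise ValueError(f"not a Cloud Console storage-browser URL: {url}")
    # urlsplit has already separated the '?...' query; the ';tab=objects'
    # page-state suffix is still attached to the path, so cut it off here.
    path = parts.path.split(";", 1)[0]
    bucket_and_prefix = unquote(path[len(BROWSER_PREFIX):]).strip("/")
    if not bucket_and_prefix:
        raise ValueError(f"no bucket name in URL: {url}")
    return "gs://" + bucket_and_prefix
```

For example, the hg38 link maps to `gs://genomics-public-data/resources/broad/hg38/v0`, which can then be listed with `gsutil ls`.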
-
This is where I get confused. Here is what confuses me:
1) The first link you mentioned (https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle) leads to a Google Cloud bucket with 1000 Genomes data for humans (https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references;tab=objects?prefix=&forceOnObjectsSortingFiltering=false). If I understand this correctly, that does not seem to be useful for hg19 or hg38?
2) The second link you mentioned (https://console.cloud.google.com/storage/browser/gatk-legacy-bundles/b37;tab=objects?prefix=&forceOnObjectsSortingFiltering=false) seems to be for 1000 Genomes and dbSNP data. I am not sure how to apply this to GATK gCNV for hg19 or hg38.
3) The third link you mentioned (https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0;tab=objects?prefix=&forceOnObjectsSortingFiltering=false) seems to have the fasta, fasta.fai, and `wgs_calling_regions.hg38.interval_list` files for hg38. These seem to be for whole-genome sequencing (WGS) data, assuming I am interpreting "wgs" correctly. I am working with exome data, though.
Main Question Regarding Exome Data:
Where is the hg19 exome reference data (fasta, fai, interval_list, used as inputs for GATK gCNV), and where is the corresponding hg38 exome reference data? I hope this is okay to ask. Gökalp Çelik
-
Hi
The first link is the article that explains our resource bundles; we point every user who asks about our reference genomes to it. It contains a number of other links, but I listed the relevant ones after it.
The second link is the GRCh37 (b37) GATK resource bundle that we used in our pipelines for alignment. The third link is the GRCh38 reference that GATK currently supports and uses in its pipelines. Both resource bundles contain FASTA files as well as .fai and .dict files and all other accessory input files for sequence analysis following the GATK Best Practices.
For exome sequencing, we do not provide a particular set of capture intervals, because the choice depends on the capture kit each user works with, and there are many kits available. We already provide the means to convert manufacturers' target BED files into interval lists usable by gCNV, so the only thing we can offer here is the article that explains those steps.
I hope this helps.
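The BED-to-interval-list conversion mentioned above can be sketched as follows. This is a hypothetical illustration, not the GATK tool itself: it assumes a standard BED file (0-based, half-open coordinates) and the reference's `.dict` sequence dictionary, and writes a Picard-style interval_list (SAM-style header, then 1-based, closed intervals). The function names and the `exclude_contigs` parameter, which could be used to drop chrM given the mitochondrial concerns raised earlier, are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

# (contig, 1-based start, 1-based inclusive end, name)
Interval = Tuple[str, int, int, str]


def read_seq_dict(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Collect (contig, length) pairs from the @SQ lines of a .dict file, in order."""
    seq_dict = []
    for line in lines:
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            seq_dict.append((tags["SN"], int(tags["LN"])))
    return seq_dict


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse BED records, converting 0-based half-open to 1-based closed coordinates."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments, and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        contig, start0, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        intervals.append((contig, start0 + 1, end, name))
    return intervals


def to_interval_list(
    intervals: Iterable[Interval],
    seq_dict: List[Tuple[str, int]],
    exclude_contigs: Iterable[str] = (),
) -> str:
    """Render intervals as a coordinate-sorted Picard-style interval_list string."""
    order = {name: i for i, (name, _) in enumerate(seq_dict)}
    excluded = set(exclude_contigs)
    kept = [iv for iv in intervals if iv[0] not in excluded]
    for contig, _, _, _ in kept:
        if contig not in order:
            # Typical cause: hg19-style "chr1" names against a b37-style "1" dictionary.
            raise ValueError(f"contig {contig!r} is not in the sequence dictionary")
    out = ["@HD\tVN:1.6\tSO:coordinate"]
    out += [f"@SQ\tSN:{name}\tLN:{length}" for name, length in seq_dict]
    # Sort by dictionary order of the contig, then by start and end position.
    for contig, start, end, name in sorted(kept, key=lambda iv: (order[iv[0]], iv[1], iv[2])):
        out.append(f"{contig}\t{start}\t{end}\t+\t{name}")
    return "\n".join(out) + "\n"
```

In practice, GATK itself offers tools for this step, such as Picard's `BedToIntervalList` and `PreprocessIntervals`; the sketch only shows what the coordinate conversion involves. The contig check matters because hg19 names contigs `chr1`, `chr2`, ... while b37 uses `1`, `2`, ..., so a BED file and a reference from different naming conventions will not match.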
-
So, as I understand it, for exome data we need to use our own references? The references in the cloud are only for whole-genome data? Gökalp Çelik
-
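Yes, as far as the capture intervals are concerned: for exome data you need to build the interval list from your own capture kit's target BED file. The reference FASTA, .fai, and .dict files in the bundles apply to both genome and exome data.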