Resource bundle download error
Hello!
I am currently trying to download Homo_sapiens_assembly38.dbsnp138.vcf from the resource bundle from https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/
I tried to download it on Google Chrome and Microsoft Edge but they always give me the same error "Needs Authorization". I had no problem downloading other files from the resource files. However, I can't download this file Homo_sapiens_assembly38.dbsnp138.vcf
Why can't I download this file? Is there another way I can obtain this file?
-
Hi Linda Do, I am not able to reproduce this issue on my end so I cannot determine where the problem is coming from. The bucket you linked to is public and all files should be available.
-
Hi Linda Do,
I installed the Google Cloud Storage utility (gsutil) following https://cloud.google.com/storage/docs/gsutil_install. With it, you can download the file to the current directory with:
gsutil cp gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.dbsnp138.vcf .
and for hg38
gsutil cp gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf .
Genevieve Brandt (she/her), is there any way we can have this Homo_sapiens_assembly19.dbsnp138.vcf file compressed with bgzip so it's not a 10 GB download? It takes a long time to compress it.
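If installing gsutil is not an option, a publicly readable Cloud Storage object can usually also be fetched over plain HTTPS through the storage.googleapis.com endpoint. The following is only a minimal Python sketch of that idea, assuming the object is publicly readable (as stated earlier in this thread for the bucket); the helper names gs_to_https and download are made up for illustration and are not part of any Google tool.

```python
import shutil
import urllib.request
from urllib.parse import quote


def gs_to_https(gs_uri):
    """Translate gs://bucket/object into the public HTTPS endpoint URL.

    Hypothetical helper: assumes the object is publicly readable.
    """
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError("expected a gs:// URI, got: " + gs_uri)
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError("URI must contain both a bucket and an object path")
    # quote() leaves "/" unescaped by default, so the object path is preserved.
    return "https://storage.googleapis.com/" + bucket + "/" + quote(obj)


def download(gs_uri, dest):
    """Stream the object to dest without loading it into memory."""
    with urllib.request.urlopen(gs_to_https(gs_uri)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


# Example call (not run here, the file is roughly 10 GB):
# download(
#     "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",
#     "Homo_sapiens_assembly38.dbsnp138.vcf",
# )
```

The object name in the example call is the one from the gsutil command above; if the file in the bucket has since been renamed, the path needs to be adjusted accordingly.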
-
Hi Brian Wiley,
I can put in a request for this change and bring it up with my team. Our GATK Support Team is not the group that maintains these data resources, so I cannot guarantee any timeline for this. Here is our support policy for more details: https://gatk.broadinstitute.org/hc/en-us/articles/360038469272-What-types-of-questions-will-the-GATK-frontline-team-answer-
Genevieve
-
Hi Brian Wiley, wanted to give you a quick update. After bringing this up with the team, I found that they agreed and would prefer a zipped version of the file. Thank you for bringing this issue to our attention! We are looking into changing it but still cannot guarantee a timeline.
Genevieve
-
Thank you Genevieve Brandt and Brian Wiley.
You both were very helpful. I was able to successfully download the files.
-
Great! Thanks for the update Linda Do.
-
Brian Wiley, we were able to update the Homo_sapiens_assembly38.dbsnp138.vcf with Homo_sapiens_assembly38.dbsnp138.vcf.gz; it's in the bucket now. Thanks for bringing this to our attention, hopefully it helps the download speed!
-
Hi, I was wondering whether, instead of the "Homo_sapiens_assembly38.dbsnp138.vcf.gz" file, we can use the "00-All.vcf.gz" file from the dbSNP database at this location: https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/
For context, I am working with whole exome sequencing data, not whole genome sequencing data. Can I use this updated version of the common SNPs from the database for the base recalibration step of the variant calling pipeline, or should I use the file above?
Any and all help will be appreciated. Thank you
-
Hi Aravind Sundar,
It might work, but I can't say with full certainty. Base recalibration could benefit from the more up-to-date SNPs, but something may have changed between the two files that could cause issues. The most likely issue is contig names, so I would recommend comparing the contig names in those files before running to see if there are differences.
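One way to do that check is sketched below in Python. It is only an illustration, not a GATK tool: the function names contig_names and compare_contigs are made up for this example, and it assumes each file is a standard VCF, either plain text or gzip/bgzip compressed. It collects names from any ##contig header lines and from the CHROM column of the records, since some VCFs carry no ##contig lines. Scanning a full dbSNP file this way can take a while.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def contig_names(path):
    """Collect contig names from ##contig header lines and the CHROM column."""
    tag = "##contig=<ID="
    names = set()
    last = None
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith(tag):
                # e.g. ##contig=<ID=chr1,length=248956422>
                names.add(line[len(tag):].split(",")[0].rstrip(">\n"))
            elif not line.startswith("#") and line.strip():
                chrom = line.split("\t", 1)[0]
                if chrom != last:  # records are usually grouped by contig
                    names.add(chrom)
                    last = chrom
    return names


def compare_contigs(path_a, path_b):
    """Return (names only in A, names only in B), each as a sorted list."""
    a, b = contig_names(path_a), contig_names(path_b)
    return sorted(a - b), sorted(b - a)


# Example call (file names taken from this thread):
# only_broad, only_ncbi = compare_contigs(
#     "Homo_sapiens_assembly38.dbsnp138.vcf.gz", "00-All.vcf.gz")
```

If either returned list is non-empty, the naming schemes differ (a typical example would be "chr1" in one file versus "1" in the other), and the file would need its contigs renamed to match the reference before being used as known sites.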
Hope this helps!