Which file is af-only-gnomad.hg38.vcf.gz?
Hi Team!!
I have been searching for a file named :
I went to the gnomAD website and downloaded the "All chromosomes VCF (.tbi)" file from the gnomAD v3 tab. The file is overwhelmingly large, and I wonder how long the PONs would take if I used it. Am I referring to the correct file?
After reading various GATK blogs, I came across this bucket, which contains af-only-gnomad.hg38.vcf.gz, but that file is very small. I don't think the two files are the same. Are they?
Can you please point me to a GATK blog where the difference between the two files is clearly explained?
You mentioned that you don't provide the aforementioned file for hg19 here. Does that mean the file I downloaded from the bucket is reliable, or was it reduced for workshop purposes?
rohit satyam The gnomAD VCF is enormous because it contains a lot of INFO field annotations, none of which Mutect2 needs except for AF (allele frequency in the population). The AF-only gnomAD file that we provide in the Best Practices Google bucket is the gnomAD VCF with all extraneous annotations removed. In principle you could use gnomAD with all the annotations, but it would waste a lot of CPU time parsing the VCF.
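To make the idea of an "AF-only" file concrete, here is a minimal Python sketch of what stripping the INFO column down to AF could look like for a single VCF record. It is an illustration only, not the procedure the GATK team uses to build the bucket file. The function name `strip_info_to_af` and the sample record are made up for this example, and a real conversion would also need to prune the unused `##INFO` header lines.

```python
def strip_info_to_af(line: str) -> str:
    """Return a VCF line whose INFO column keeps only the AF entry.

    Hypothetical helper for illustration; not part of GATK.
    """
    if line.startswith("#"):
        # Header lines are passed through unchanged in this sketch.
        return line
    fields = line.rstrip("\n").split("\t")
    # Column 8 (index 7) of a VCF record is INFO: semicolon-separated entries.
    info_entries = fields[7].split(";")
    # Keep only the exact AF key; keys such as AF_popmax do not match "AF=".
    af_entries = [entry for entry in info_entries if entry.startswith("AF=")]
    # "." is the VCF placeholder for an empty INFO column.
    fields[7] = ";".join(af_entries) if af_entries else "."
    return "\t".join(fields) + "\n"


# Made-up record, for demonstration only.
record = "chr1\t100\t.\tA\tG\t.\tPASS\tAC=1;AF=0.5;AN=2;DP=40\n"
print(strip_info_to_af(record), end="")
```

The output record keeps its first seven columns and carries only `AF=0.5` in INFO, which is why the AF-only file is so much smaller and faster to parse than the full gnomAD release.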
Wow. Thanks a lot. David Benjamin
David Benjamin, I just did a quick check of the first 5 variants in af-only-gnomad.raw.sites.vcf for b37 on chromosomes 1 and 17. I could see all 5 from chr17 on the gnomAD site, but none of the first 5 from chr1 (positions 10067, 10108, 10109, 10114, 10119) are listed (see https://gnomad.broadinstitute.org/region/1-10067-10128?dataset=gnomad_r2_1). The first variant on gnomAD for chr1 is at 10128. How was af-only-gnomad.raw.sites.vcf created for b37?
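For anyone who wants to repeat this kind of spot check, the following is a rough Python sketch that lists the first few positions on a given contig from VCF lines. The helper name `first_positions` is hypothetical, and the demo lines are placeholders that reuse only the chr1 positions quoted above (the other columns are dummy values). For a real file, the lines could come from `open(path)` or, for a `.vcf.gz`, from `gzip.open(path, "rt")`.

```python
from itertools import islice
from typing import Iterable, List


def first_positions(lines: Iterable[str], contig: str, n: int = 5) -> List[int]:
    """Return the POS values of the first n records on the given contig.

    Hypothetical helper for illustration; not part of GATK.
    """
    matches = (
        int(line.split("\t", 2)[1])          # column 2 of a VCF record is POS
        for line in lines
        if not line.startswith("#")          # skip header lines
        and line.split("\t", 1)[0] == contig  # column 1 is CHROM
    )
    # islice stops reading as soon as n matching records have been found.
    return list(islice(matches, n))


# Placeholder lines: only CHROM and POS are meaningful here.
demo_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t10067\t.\tN\t.\t.\t.\t.\n",
    "1\t10108\t.\tN\t.\t.\t.\t.\n",
    "1\t10109\t.\tN\t.\t.\t.\t.\n",
    "1\t10114\t.\tN\t.\t.\t.\t.\n",
    "1\t10119\t.\tN\t.\t.\t.\t.\n",
    "1\t10128\t.\tN\t.\t.\t.\t.\n",
]
print(first_positions(demo_lines, "1"))  # [10067, 10108, 10109, 10114, 10119]
```

Because the generator is consumed lazily, the check only reads as far into the file as it needs to, which matters for a resource of this size.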
Hi Brian Wiley, from one of the other threads you commented on I found a WDL script that builds the resources. If you didn't already see it, you can check out this script for more detailed information about how the devs build the Mutect2 resources.
