gnomAD files for Mutect2
Hello all,
I am new to GATK and I am trying to perform somatic variant calling following this tutorial: https://gatk.broadinstitute.org/hc/en-us/articles/360035531132
I am unsure which af-only-gnomAD files should be used for the tutorial. Apparently, the gnomAD file must contain only PASS variants: https://gatkforums.broadinstitute.org/gatk/discussion/comment/58618/
It sounds like this applies both to the gnomAD file for Mutect2 (--germline-resource) and to the biallelic version for GATK GetPileupSummaries. However, contrary to what the thread says, the af-only-gnomad.hg38.vcf.gz and small_exac_common_3.hg38.vcf.gz files from https://console.cloud.google.com/storage/browser/_details/gatk-best-practices/somatic-hg38/ also contain variants that did not pass all filters.
--> somatic-hg38_small_exac_common_3.hg38.vcf.gz (md5: 4ac7593efd401234654fdf87ab1b5ef1)
# variants  Filter
       301  AC_Adj0_Filter
      2588  InbreedingCoeff_Filter
        13  VQSRTrancheINDEL99.50to99.90
         4  VQSRTrancheINDEL99.90to99.95
        14  VQSRTrancheINDEL99.95to100.00
       673  VQSRTrancheSNP99.60to99.80
       610  VQSRTrancheSNP99.80to99.90
       324  VQSRTrancheSNP99.90to99.95
       222  VQSRTrancheSNP99.95to100.00
--> somatic-hg38_af-only-gnomad.hg38.vcf.gz (md5: 30d500cc0f4adc640ddbb25eb341c89d)
# variants  Filter
    227532  InbreedingCoeff
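For anyone who wants to reproduce tallies like the ones above, here is a minimal Python sketch that counts the values in the FILTER column of a VCF. The function name `count_filters` is made up for illustration, and counting each name separately when a record carries several filters joined by `;` is my own choice, not necessarily how the numbers above were produced.

```python
import gzip
from collections import Counter


def count_filters(vcf_path):
    """Tally FILTER values (7th column) of a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.rstrip("\n").split("\t")
            # Assumption: a record with several filters ("A;B") counts once per name.
            for name in fields[6].split(";"):
                counts[name] += 1
    return counts
```

Calling `count_filters("af-only-gnomad.hg38.vcf.gz")` returns a `Counter`; any key other than `PASS` (or `.`) points to records that did not pass all filters.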
Can these files be used for the MuTect2 workflow or do I have to restrict them to PASS variants?
Best,
Felix
-
Hi Felix! I believe you can use the files as is if you are following the best-practices tutorial, since later steps in the pipeline should handle reducing to PASS variants, but I will double-check this. Here are some notes on this file (posted previously by David Benjamin):
The gnomAD VCF [if you take it from the gnomAD site] is enormous because it contains a lot of INFO field annotations, none of which Mutect2 needs except for AF (allele frequency in the population). The AF-only gnomAD file that we provide in the best practices Google bucket is the gnomAD VCF with all extraneous annotations removed. In principle you could use gnomAD with all the annotations, but it would waste a lot of CPU time parsing the VCF.
Also, you may appreciate this workspace that shows how to run Mutect2 on a cloud platform called Terra: https://app.terra.bio/#workspaces/help-gatk/Somatic-SNVs-Indels-GATK4. It has instructions on how to run the workflows, and the workflows are already configured with all the resources they need (though if you are using hg38 you'd need to add that reference table). If you are interested in using the platform and have any questions, let me know. We support Terra as well.
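To illustrate what "AF only" means for a single record, here is a rough Python sketch. It is not the procedure used to build the hosted file, and `keep_af_only` is a hypothetical helper; it only rewrites the INFO column of one data line and leaves the header untouched.

```python
def keep_af_only(record_line):
    """Return a VCF data line whose INFO column keeps only the AF entry."""
    fields = record_line.rstrip("\n").split("\t")
    info_entries = fields[7].split(";")
    # Match the exact key "AF" so that keys such as "AF_popmax" are dropped too.
    af_entries = [entry for entry in info_entries if entry.startswith("AF=")]
    # "." is the VCF placeholder for an empty INFO column.
    fields[7] = af_entries[0] if af_entries else "."
    return "\t".join(fields) + "\n"
```

For example, a record with `AC=2;AF=0.01;AN=200` in INFO comes back with just `AF=0.01`.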
-
Hi Tiffany,
Thank you for your reply! I have experimented with the original gnomAD VCF and also noticed the file size problem. It is a great idea to improve the speed by using the AF-only files.
My only remaining question is whether the AF-only files must contain only PASS variants. I have skimmed the source code of mutect2.wdl (https://github.com/broadinstitute/gatk/blob/master/scripts/mutect2_wdl/mutect2.wdl), but I haven't found any step that would filter the AF-only VCFs for PASS variants.
I may use Terra in the future, but for now I am trying to build my own Mutect2 pipeline on our local servers. I am just taking the mutect2.wdl as an example.
Have a nice day,
Felix
-
Hi Felix! Sorry, I don't have clearer news yet. I've messaged a few folks and will hopefully have a good answer soon.
-
Thank you for the update, Tiffany.
-
Alright Felix, I found out that the gnomAD file should only have passing variants in it. Please restrict the files to PASS variants for your use. I am going to get that prioritized on our side. Thank you for asking this question!
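One way to do that restriction is sketched below in plain Python. Treat it as an assumption-laden sketch, not an official GATK step: `keep_pass_only` is a made-up name, and records whose FILTER is `.` (no filtering applied) are dropped unless you opt in.

```python
import gzip


def keep_pass_only(in_path, out_path, keep_unfiltered=False):
    """Copy header lines and PASS records from in_path to a plain-text VCF at out_path.

    Returns the number of data records written.
    """
    opener = gzip.open if str(in_path).endswith(".gz") else open
    accepted = {"PASS", "."} if keep_unfiltered else {"PASS"}
    kept = 0
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # keep the header unchanged
                continue
            # FILTER is the 7th tab-separated column.
            if line.split("\t", 7)[6] in accepted:
                dst.write(line)
                kept += 1
    return kept
```

The output is uncompressed, so it would still need to be compressed and indexed (for example with bgzip and tabix) before being passed to GATK as a resource.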
-
Thank you very much for your help, Tiffany! I will prepare my files accordingly.
-
Hi all, I have 14 whole-exome samples (each from a different patient). At the moment I don't have any normal sample files, so based on the discussions in the GATK best practices and other forums, I settled on the approach below. I would be grateful if you could advise whether it is suitable.
1. I call the individual samples separately with Mutect2 using 1000g_pon.hg38.vcf.gz and af-only-gnomad.hg38.vcf.gz files (available at https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38;tab=objects?prefix)
2. Perform downstream analysis (e.g., annotation)
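Step 1 could be sketched as below. This Python snippet only builds the command line for one tumor-only Mutect2 call and does not run anything. `mutect2_tumor_only_cmd` and all file names in the example are placeholders I made up, and it assumes the `gatk` launcher is on the PATH.

```python
import shlex


def mutect2_tumor_only_cmd(reference, tumor_bam, germline_resource,
                           panel_of_normals, output_vcf):
    """Build the argument list for one tumor-only Mutect2 call (no matched normal)."""
    return [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "--germline-resource", germline_resource,
        "--panel-of-normals", panel_of_normals,
        "-O", output_vcf,
    ]


if __name__ == "__main__":
    # Hypothetical sample names; each sample is called separately.
    for sample in ["patient01", "patient02"]:
        cmd = mutect2_tumor_only_cmd(
            "reference.fasta",
            f"{sample}.bam",
            "af-only-gnomad.hg38.vcf.gz",
            "1000g_pon.hg38.vcf.gz",
            f"{sample}.unfiltered.vcf.gz",
        )
        print(shlex.join(cmd))
```

The printed commands can be reviewed first and then run one by one, or the list can be handed to `subprocess.run(cmd, check=True)`.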
-
Hi Vincent Appiah, the GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.
-
Vincent Appiah, that is correct.