GetPileupSummaries common germline variant sites VCF hg38
Hello,
I am using GATK 4.2.4.1 to perform somatic variant calling. I am trying to filter the called variants using GetPileupSummaries and CalculateContamination. GetPileupSummaries requires a:
"common germline variant sites VCF, e.g. derived from the gnomAD resource, with population allele frequencies (AF) in the INFO field. This resource must contain only biallelic SNPs and can be an eight-column sites-only VCF. The tool ignores the filter status of the variant calls in this germline resource."
I am looking for this reference VCF file for hg38, for the entire genome, rather than for individual chromosomes. Does this resource exist or is it currently being made?
-
Hi Anish K,
I am going to move your post into our Community Discussions -> Documentation Questions topic, as the Somatic topic is for reporting bugs and issues with GATK.
You can read more about our forum guidelines and the topics here: Forum Guidelines.
Best,
Genevieve Brandt
-
Hi Anish K,
Yes, we have this resource! I think the answer to your question is here: https://gatk.broadinstitute.org/hc/en-us/community/posts/360067310872-How-to-find-or-generate-common-germline-variant-sites-VCF-required-by-GetPileupSummaries
Please let me know if that doesn't answer your question.
Best,
Genevieve
-
I have a question about the somatic variant filtration step. I have completed everything up to somatic variant calling:
java -jar "$gatk" Mutect2 -R "$Ref_dir" -I "$tumor_bam_path" -I "$normal_bam_path" --tumor-sample "$tumor_sample" --normal-sample "$normal_sample" --germline-resource germline.vcf -pon "$pon" -O "$output_dir/${tumor_sample}_Somatic.vcf.gz"
For filtration, there are three steps to perform:
1. GetPileupSummaries
2. CalculateContamination
3. FilterMutectCalls
In the GetPileupSummaries step, which files should be used for -L and -V? For -V, do I need to use the germline.vcf file that I used in the somatic variant calling step? For -L, I have no idea which file to use, and I do not understand how to create the .bed or .interval_list file. The tool help does not clearly explain the files for -L and -V:
--input,-I <GATKPath> BAM/SAM/CRAM file containing reads This argument must be specified at least once. Required.
--intervals,-L <String> One or more genomic intervals over which to operate This argument must be specified at least once. Required.
--output,-O <File> The output table Required.
--variant,-V <FeatureInput> A VCF file containing variants and allele frequencies Required.
Thanks in advance for your guidance.
-
If you have the common biallelic sites VCF file, you can use the same file for both the -V and -L parameters of this tool.
https://gatk.broadinstitute.org/hc/en-us/articles/360042913771-GetPileupSummaries
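For illustration, the three filtration steps with one sites file passed to both -V and -L might look like the sketch below. All file names are placeholders that do not come from this thread, and the gatk wrapper script is assumed to be on the PATH (java -jar "$gatk" should work in its place). The commands are only printed unless DRY_RUN=0 is set, so nothing runs by accident.

```shell
# Sketch only: every file name here is a placeholder (an assumption, not from this thread).
sites="common_sites.hg38.vcf.gz"      # common biallelic SNP sites with AF in INFO
tumor_bam="tumor.bam"
normal_bam="normal.bam"
ref="reference.fasta"
unfiltered="tumor_Somatic.vcf.gz"     # Mutect2 output

# Print the command by default; run it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

filter_steps() {
  # 1. Pileup summaries at the common sites; the same file serves as -V and -L.
  run gatk GetPileupSummaries -I "$tumor_bam" -V "$sites" -L "$sites" -O tumor.pileups.table &&
  run gatk GetPileupSummaries -I "$normal_bam" -V "$sites" -L "$sites" -O normal.pileups.table &&
  # 2. Contamination estimate; the matched normal table is optional.
  run gatk CalculateContamination -I tumor.pileups.table -matched normal.pileups.table -O contamination.table &&
  # 3. Filtering; the .stats file written by Mutect2 is expected next to the unfiltered VCF.
  run gatk FilterMutectCalls -R "$ref" -V "$unfiltered" --contamination-table contamination.table -O tumor_Somatic.filtered.vcf.gz
}

filter_steps
```

Because -L accepts a VCF as an interval source, no separate .bed or .interval_list file has to be created for this step.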
Regards.
-
Thank you for the clarification. I used the hg38_af-only-gnomad.hg38.vcf file as a germline resource when calling somatic variants with Mutect2. My question is, can I use this file for both the -V and -L parameters in GetPileupSummaries?
-
Yes, you can, but keep in mind that this file also contains rare sites. It is usually better practice to keep only common sites, where the allele frequency is greater than 0.01 or even higher.
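As a rough sketch of that filtering, the function below keeps header lines plus biallelic SNP records whose INFO AF exceeds a threshold. It is a plain-text awk illustration, not a GATK feature: the function name filter_common_sites is made up here, it assumes an uncompressed VCF on standard input with a single AF value per record, and a dedicated VCF tool is probably the safer choice for production use.

```shell
# Hypothetical helper: reads a VCF on stdin, writes the filtered VCF to stdout.
# Usage: filter_common_sites [min_af]   (min_af defaults to 0.01)
filter_common_sites() {
  awk -F'\t' -v min_af="${1:-0.01}" '
    /^#/ { print; next }                               # keep header lines unchanged
    $4 !~ /^[ACGT]$/ || $5 !~ /^[ACGT]$/ { next }      # keep only single-base REF and ALT
    {
      n = split($8, info, ";")                         # INFO is the eighth column
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^AF=/) {
          af = substr(info[i], 4) + 0                  # numeric value after "AF="
          if (af > min_af) print
          break
        }
      }
    }
  '
}

# Example (placeholder file names):
#   zcat af-only-gnomad.hg38.vcf.gz | filter_common_sites 0.01 > common_sites.hg38.vcf
```

The output still has to be compressed and indexed before it can be passed to -V and -L.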