Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GetPileupSummaries common germline variant sites VCF hg38

Answered

6 comments

  • Genevieve Brandt (she/her)

    Hi Anish K,

    I am going to move your post into our Community Discussions -> Documentation Questions topic, as the Somatic topic is for reporting bugs and issues with GATK.

    You can read more about our forum guidelines and the topics here: Forum Guidelines.

    Best,

    Genevieve Brandt

  • Genevieve Brandt (she/her)

    Hi Anish K,

    Yes, we have this resource! I think the answer to your question is here: https://gatk.broadinstitute.org/hc/en-us/community/posts/360067310872-How-to-find-or-generate-common-germline-variant-sites-VCF-required-by-GetPileupSummaries

    Please let me know if that doesn't answer your question.

    Best,

    Genevieve

  • Anitha R

    I have a question about the somatic variant filtration step. I have completed the workflow up to somatic variant calling:

    java -jar "$gatk" Mutect2 -R "$Ref_dir" -I "$tumor_bam_path" -I "$normal_bam_path" --tumor-sample "$tumor_sample" --normal-sample "$normal_sample" --germline-resource germline.vcf -pon "$pon" -O "$output_dir/${tumor_sample}_Somatic.vcf.gz"

    Filtration involves three steps:

    1. GetPileupSummaries

    2. CalculateContamination

    3. FilterMutectCalls
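
Steps 2 and 3 of the list above could look roughly like the sketch below, once step 1 has produced a pileup table for each BAM. It assumes `$gatk` points at the GATK jar as in the Mutect2 command earlier in this comment; the function names and the default jar path are made up for illustration, and the arguments are the ones I know from the GATK4 tool documentation, so please check them against your GATK version.

```shell
# Placeholder jar path; in practice $gatk is already set, as in the Mutect2 command.
gatk="${gatk:-gatk.jar}"

# Step 2: estimate contamination in the tumor from its pileup table,
# using the matched normal's pileup table as a control.
calc_contamination() {
    tumor_table=$1
    normal_table=$2
    out_table=$3
    java -jar "$gatk" CalculateContamination \
        -I "$tumor_table" \
        -matched "$normal_table" \
        -O "$out_table"
}

# Step 3: filter the raw Mutect2 calls, passing in the contamination estimate.
filter_calls() {
    ref=$1
    raw_vcf=$2
    contamination_table=$3
    out_vcf=$4
    java -jar "$gatk" FilterMutectCalls \
        -R "$ref" \
        -V "$raw_vcf" \
        --contamination-table "$contamination_table" \
        -O "$out_vcf"
}
```

A call would then be `calc_contamination tumor_pileups.table normal_pileups.table contamination.table`, followed by `filter_calls "$Ref_dir" "$output_dir/${tumor_sample}_Somatic.vcf.gz" contamination.table filtered.vcf.gz`, where the table and output names are placeholders.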

     

    In the GetPileupSummaries step, which files should be used for -L and -V? For -V, do I need to use the same germline.vcf file that I used in the somatic variant calling step? For -L, I have no idea which file to use, and I can't work out how to create the .bed or .interval_list file. The tool's help output does not clearly describe the files expected for -L and -V:

    --input,-I <GATKPath>         BAM/SAM/CRAM file containing reads  This argument must be specified at least once.
                                  Required. 

    --intervals,-L <String>       One or more genomic intervals over which to operate  This argument must be specified at
                                  least once. Required. 

    --output,-O <File>            The output table  Required. 

    --variant,-V <FeatureInput>   A VCF file containing variants and allele frequencies  Required. 

    Thanks in advance for your guidance.

     

  • Gökalp Çelik

    If you have the common biallelic sites VCF file, you can use the same file for both the -V and -L parameters of this tool.

    https://gatk.broadinstitute.org/hc/en-us/articles/360042913771-GetPileupSummaries

    Regards. 
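
As a minimal sketch of the suggestion above, the same sites VCF can be passed to both -V and -L. The file name `common_biallelic.hg38.vcf.gz`, the function name, and the default jar path are placeholders rather than files or names mentioned in this thread; `$gatk` follows the Mutect2 command quoted earlier. GATK will generally expect the VCF to have an index next to it.

```shell
# Placeholder jar path; in practice $gatk is already set, as in the Mutect2 command.
gatk="${gatk:-gatk.jar}"

# Placeholder: a VCF of common biallelic sites carrying population allele frequencies (AF).
sites="common_biallelic.hg38.vcf.gz"

# Summarize read support at the known sites for one BAM.
# The same VCF supplies both the variants (-V) and the intervals (-L).
get_pileups() {
    bam=$1
    out_table=$2
    java -jar "$gatk" GetPileupSummaries \
        -I "$bam" \
        -V "$sites" \
        -L "$sites" \
        -O "$out_table"
}

# Example calls (uncomment to run on real data):
# get_pileups "$tumor_bam_path"  "$output_dir/${tumor_sample}_pileups.table"
# get_pileups "$normal_bam_path" "$output_dir/${normal_sample}_pileups.table"
```

Running it once per BAM gives the tumor and normal tables that CalculateContamination takes as input.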

  • Anitha R

    Thank you for the clarification. I used the hg38_af-only-gnomad.hg38.vcf file as a germline resource when calling somatic variants with Mutect2. My question is, can I use this file for both the -V and -L parameters in GetPileupSummaries?

  • Gökalp Çelik

    Yes, you can, but keep in mind that this file also contains rare sites. It is usually better practice to keep only common sites, i.e. those with allele frequencies greater than 0.01 or even higher.
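
One way to derive such a common-sites file from the gnomAD resource is sketched below with SelectVariants. This is an untested sketch, not an official recipe: the function name, the output file name, and the default jar path are hypothetical, and it assumes the AF INFO field holds a single value per record (records with several AF values may need separate handling). The GetPileupSummaries documentation describes the expected resource as biallelic SNPs, hence the two extra restrictions.

```shell
# Placeholder jar path; in practice $gatk is already set, as in the Mutect2 command.
gatk="${gatk:-gatk.jar}"

# Keep only biallelic SNPs whose population allele frequency exceeds a threshold.
# Arguments: input VCF, output VCF, minimum AF (defaults to 0.01).
select_common_sites() {
    in_vcf=$1
    out_vcf=$2
    min_af=${3:-0.01}
    java -jar "$gatk" SelectVariants \
        -V "$in_vcf" \
        --select-type-to-include SNP \
        --restrict-alleles-to BIALLELIC \
        -select "AF > $min_af" \
        -O "$out_vcf"
}

# Example call (uncomment to run on real data; the output name is a placeholder):
# select_common_sites hg38_af-only-gnomad.hg38.vcf common_biallelic.hg38.vcf.gz 0.01
```

The resulting file can then be given to GetPileupSummaries for both -V and -L.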
