Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Huge number of variants in tumor-only mode compared to T/N paired mode using Mutect2



  • David Benjamin

    Shivangi Agarwal, we need to change a few things:

    • You should always run Mutect2 with a germline resource. For human samples, we provide one in our best-practices Google bucket: gs://gatk-best-practices/somatic-b37/af-only-gnomad.raw.sites.vcf, which is just gnomAD with every annotation except AF (population allele frequency) stripped.
    • Except in very rare circumstances, it is best to use one of our public panels of normals, such as gs://gatk-best-practices/somatic-b37/Mutect2-WGS-panel-b37.vcf or gs://gatk-best-practices/somatic-b37/Mutect2-exome-panel.vcf.

    It is always far preferable to run in tumor-normal mode when a matched normal is available.  Your command should therefore be something like:

    gatk Mutect2 -R hg19.fasta -I tumor.bam -I normal.bam -normal normal_sample --germline-resource af-only-gnomad.raw.sites.vcf -pon Mutect2-WGS-panel-b37.vcf -O unfiltered.vcf

    gatk FilterMutectCalls -R hg19.fasta -V unfiltered.vcf -O filtered.vcf


    Finally, even if you run everything perfectly, tumor-only mode will still yield many more variants.  Almost all of this excess is attributable to rare germline variants not present in gnomAD, several tens of thousands of which occur in the average genome.
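One way to check the last point on a given call set is to tally the FILTER column of the FilterMutectCalls output and see how many records each filter accounts for. The following is a minimal sketch, assuming an uncompressed VCF; `tally_filters` is a hypothetical helper name, and the filter names that actually appear depend on the GATK version.

```shell
# Tally individual filter names in the FILTER column (column 7) of an
# uncompressed VCF. A record that fails several filters, written as
# "name1;name2", is counted once under each name.
# tally_filters is a hypothetical helper, not part of GATK.
tally_filters() {
    awk -F'\t' '
        !/^#/ {
            n = split($7, f, ";")
            for (i = 1; i <= n; i++) c[f[i]]++
        }
        END { for (k in c) print c[k] "\t" k }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

For example, `tally_filters filtered.vcf` prints one line per filter name with its count, most frequent first. For a compressed file, decompress to a temporary file first or adapt the function to read from standard input.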

  • Shivangi Agarwal

    Hi David, 

    Thanks so much for your response. I called variants both ways (first case: no germline resource and no public PON; second case: germline resource somatic-b37_af-only-gnomad.raw.sites.vcf and PON somatic-b37_Mutect2-exome-panel.vcf), but I did not find a difference in the number of variants called.

    Please have a look at the commands:

    First case:

    gatk Mutect2 -R hg19.fasta -I sample1.bam -O sample1-SS-unfiltered.vcf

    Second case:

    gatk Mutect2 -R hg19.fasta -L hg19bed.bed -I sample1.bam --germline-resource somatic-b37_af-only-gnomad.raw.sites.vcf -pon somatic-b37_Mutect2-exome-panel.vcf --f1r2-tar-gz f1r2.tar.gz -O sample1-unfiltered.vcf

    The number of variants called is 414,377 in the first case and 410,968 in the second.

    After filtering with FilterMutectCalls, 81,097 variants passed the filters in the first case and 98,300 in the second.

    gatk FilterMutectCalls -R hg19.fasta -V sample1-SS-unfiltered.vcf -O sample1-SS-filtered-testing.vcf

    gatk FilterMutectCalls -R hg19.fasta -V sample1-unfiltered.vcf -O sample1-filtered-testing.vcf

    I am surprised to see such a large number of variants, and almost no difference in the number of calls when using the germline resource and public PON. I hope to hear back from you soon.
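Record counts like the ones quoted in this comment can be obtained with standard shell tools. The sketch below assumes uncompressed VCFs; `count_records` and `count_pass` are hypothetical helper names, not GATK commands. Only the output of FilterMutectCalls carries meaningful FILTER values, so `count_pass` applies to the filtered files.

```shell
# Count all variant records (non-header lines) in an uncompressed VCF.
# Hypothetical helper, not part of GATK.
count_records() {
    awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Count records whose FILTER column (column 7) is exactly PASS.
# Hypothetical helper, not part of GATK.
count_pass() {
    awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}
```

With the file names from the commands above, `count_records sample1-unfiltered.vcf` would give the number of unfiltered calls and `count_pass sample1-filtered-testing.vcf` the number that passed filtering.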



  • David Benjamin

    Hi Shivangi,

    That's about the difference I would expect.  Using a matched normal will have more of an impact.
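To put a number on the impact of a matched normal, one option is to compare the PASS sites of a tumor-only call set with those of a tumor-normal call set for the same tumor. The sketch below is a rough illustration, assuming bash and uncompressed VCFs; `pass_sites` and `compare_pass_sites` are hypothetical helper names. It compares records by CHROM, POS, REF and the full ALT string, so multi-allelic records only match when written identically. A dedicated tool such as `bcftools isec` is likely the better choice for careful comparisons.

```shell
# Print one CHROM:POS:REF:ALT key per PASS record, sorted and de-duplicated.
# Hypothetical helper, not part of GATK.
pass_sites() {
    awk -F'\t' '!/^#/ && $7 == "PASS" { print $1 ":" $2 ":" $4 ":" $5 }' "$1" \
        | LC_ALL=C sort -u
}

# Report how many PASS sites are unique to each VCF and how many are shared.
# Hypothetical helper, not part of GATK.
compare_pass_sites() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    pass_sites "$1" > "$a"
    pass_sites "$2" > "$b"
    # comm needs both inputs sorted in the same collation order (C locale here).
    printf 'only_in_first\t%s\n'  "$(LC_ALL=C comm -23 "$a" "$b" | awk 'END { print NR }')"
    printf 'only_in_second\t%s\n' "$(LC_ALL=C comm -13 "$a" "$b" | awk 'END { print NR }')"
    printf 'shared\t%s\n'         "$(LC_ALL=C comm -12 "$a" "$b" | awk 'END { print NR }')"
    rm -f "$a" "$b"
}
```

Called as `compare_pass_sites tumor-only-filtered.vcf tumor-normal-filtered.vcf` (both file names are placeholders), the `only_in_first` line would then count the PASS calls that the matched normal removed.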


