Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Mutect2 Tumor-only calling of variants with high AF?



  • David Benjamin

    Usually not. Somatic variants are rare enough that even an allele absent from gnomAD (so you can safely say its population prevalence is below 1 in 100,000) is more likely to be a rare germline variant than a somatic one if it has a germline-het-like AF. You can turn this behavior off by setting -default-af 0, but you will get even more false positives.
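As a rough illustration of the argument above, the following shell sketch compares the two explanations for a het-looking allele. It is a back-of-the-envelope calculation, not Mutect2's actual model: the function name `somatic_posterior`, the per-site somatic prior of 1e-6, and the assumption that an AF near 1/2 is equally likely under both hypotheses are all assumptions made for this example. The germline prior is taken as the Hardy-Weinberg het frequency 2f(1-f) for a population allele frequency f.

```shell
# Hypothetical helper: posterior probability that a het-looking allele is somatic,
# assuming the observed AF (~1/2) is equally likely for a somatic or germline het.
somatic_posterior() {
  somatic_prior=$1   # assumed per-site prior probability of a somatic mutation
  pop_af=$2          # assumed population allele frequency of the allele
  awk -v s="$somatic_prior" -v f="$pop_af" 'BEGIN {
    germline_het = 2 * f * (1 - f)            # Hardy-Weinberg het frequency
    printf "%.3f\n", s / (s + germline_het)   # Bayes rule with equal likelihoods
  }'
}

# Allele absent from gnomAD: population AF bounded by roughly 1 in 100,000
somatic_posterior 1e-6 1e-5   # prints 0.048
```

Under these assumptions the allele is about 20 times more likely to be germline than somatic. Setting the population frequency to 0, which is roughly what -default-af 0 expresses, removes the germline explanation entirely: `somatic_posterior 1e-6 0` prints 1.000.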

  • I also got somatic variants with high AF from Mutect2 (GATK):

    gatk Mutect2 -R $GENOME/genome.fa -L $SCRATCH/S07604624_Regions_sorted.bed -I $WORK/${sample}_dedup.bam -tumor ${sample} --germline-resource $SCRATCH/af-only-gnomad.raw.sites_hg19.vcf -pon $SCRATCH/Mutect2-exome-panel_hg19.vcf -O $WORK/${sample}_tumor_only_unfiltered.vcf

    gatk GetPileupSummaries -I $WORK/${sample}_dedup.bam -V $SCRATCH/small_exac_common_3_hg19.vcf -L $SCRATCH/small_exac_common_3_hg19.vcf -O $WORK/${sample}_pileups.table

    gatk CalculateContamination -I $WORK/${sample}_pileups.table -O $WORK/${sample}.contamination.table

    gatk FilterMutectCalls -V $WORK/${sample}_tumor_only_unfiltered.vcf -R $GENOME/genome.fa --contamination-table $WORK/${sample}.contamination.table -O $WORK/${sample}_tumor_only.filtered.vcf

    Some of these variants overlap with calls from HaplotypeCaller:

    gatk --java-options -Xmx40g HaplotypeCaller -R $GENOME/genome.fa -I $WORK/${sample}_dedup.bam -O $WORK/${sample}_germline_unfiltered.vcf -L $SCRATCH/S07604624_Regions_sorted.bed --dbsnp $SCRATCH/Homo_sapiens_assembly19.dbsnp138_hg19.vcf

    gatk SelectVariants -R $GENOME/genome.fa -V $WORK/${sample}_germline_unfiltered.vcf --select-type-to-include SNP --restrict-alleles-to BIALLELIC --exclude-non-variants TRUE -O $WORK/${sample}_germline_unfiltered.snps.vcf

    gatk SelectVariants -R $GENOME/genome.fa -V $WORK/${sample}_germline_unfiltered.vcf --select-type-to-include INDEL --exclude-non-variants TRUE -O $WORK/${sample}_germline_unfiltered.indels.vcf

    gatk VariantFiltration -V $WORK/${sample}_germline_unfiltered.snps.vcf -filter 'QD < 2.0' --filter-name 'QD2' -filter 'QUAL < 30.0' --filter-name 'QUAL30' -filter 'SOR > 3.0' --filter-name 'SOR3' -filter 'FS > 60.0' --filter-name 'FS60' -filter 'MQ < 40.0' --filter-name 'MQ40' -filter 'MQRankSum < -12.5' --filter-name 'MQRankSum-12.5' -filter 'ReadPosRankSum < -8.0' --filter-name 'ReadPosRankSum-8' -O $WORK/${sample}T_germline_filtered.snps.vcf


    How should I deal with these variants?

  • David Benjamin

    There's nothing you can do. A variant that doesn't show up in your germline resource and has an allele fraction near 1/2 could be either somatic or germline. If the sample is impure, so that somatic hets with no CNVs have an allele fraction below 1/2, FilterMutectCalls can do better by modeling the spectrum of allele fractions. Likewise, FilterMutectCalls can do a decent job with tumor-only calling of cfDNA samples. In general, though, the uncertainty is unavoidable.
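To see why an impure sample helps, here is a minimal sketch of the expected allele fractions. It assumes a copy-neutral diploid site, a clonal somatic het present in every tumor cell, and contaminating normal cells that carry the same germline genotype; the function name `expected_somatic_het_af` is hypothetical. Under those assumptions a germline het stays near 1/2 at any purity, while a somatic het is diluted to purity/2.

```shell
# Hypothetical helper: expected allele fraction of a clonal somatic het
# at a copy-neutral site, given the tumor purity (fraction of tumor cells).
expected_somatic_het_af() {
  awk -v p="$1" 'BEGIN { printf "%.2f\n", p / 2 }'
}

expected_somatic_het_af 0.6   # prints 0.30; a germline het would still sit near 0.50
expected_somatic_het_af 1     # prints 0.50; indistinguishable from a germline het by AF
```

The further the purity is from 1, the wider the gap between the two allele-fraction clusters, which is the signal the comment above refers to. In a pure sample the clusters coincide and the ambiguity remains.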
