Mutect2 Tumor-only calling of variants with high AF?
When running Mutect2 in tumor-only mode, it seems that nearly all variants with a high allele fraction (above roughly 0.4) are classified as 'germline' by the filtering step. In matched tumor/normal mode, the same variants are classified as somatic. These variants are not in the germline resource and not in the panel of normals.
Are there conditions under which high-AF variants would be called somatic instead of germline when Mutect2 is run in tumor-only mode? Or is it simply not capable of this in the tumor-only scenario?
-
Usually not. The problem is that somatic variants are sufficiently rare that even alleles absent from gnomAD (so that you can safely say the prevalence is less than 1 in 100,000) are more likely to be rare germline variants than somatic ones if they have a germline het-like AF. You can turn this off by setting -default-af 0 (the short form of Mutect2's --af-of-alleles-not-in-resource), but there will be even more false positives.
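To make the prior argument concrete, here is a toy Bayes calculation. It is only a sketch and not Mutect2's actual model: the function name germline_posterior, the somatic prior of 1e-6 per site (roughly one mutation per megabase), and the optional likelihood ratio are all illustrative assumptions. The population frequency of 1e-5 is the "1 in 100,000" upper bound mentioned above.

```shell
# Toy Bayes calculation; NOT Mutect2's actual model. All inputs are assumptions.
# Usage: germline_posterior POP_AF SOMATIC_PRIOR [LIKELIHOOD_RATIO]
#   POP_AF            assumed population allele frequency of the variant
#   SOMATIC_PRIOR     assumed per-site prior probability of a somatic mutation
#   LIKELIHOOD_RATIO  P(reads | germline het) / P(reads | somatic); defaults to 1,
#                     i.e. the observed AF fits both hypotheses equally well
germline_posterior() {
  awk -v f="$1" -v s="$2" -v lr="${3:-1}" 'BEGIN {
    g = 2 * f * (1 - f)   # Hardy-Weinberg prior for a heterozygous genotype
    printf "%.3f\n", (g * lr) / (g * lr + s)
  }'
}

# Example call (hypothetical inputs):
#   germline_posterior 1e-5 1e-6
```

With these assumed inputs the posterior probability of "germline" comes out at about 0.95, which is the sense in which a het-like AF at an unlisted site still favors a rare germline variant. Passing a likelihood ratio well below 1 (an AF that clearly does not look like a germline het) pulls the result back toward somatic.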
-
I also got somatic variants with high AF from Mutect2 (GATK 4.1.7.0):
gatk Mutect2 -R $GENOME/genome.fa -L $SCRATCH/S07604624_Regions_sorted.bed -I $WORK/${sample}_dedup.bam -tumor ${sample} --germline-resource $SCRATCH/af-only-gnomad.raw.sites_hg19.vcf -pon $SCRATCH/Mutect2-exome-panel_hg19.vcf -O $WORK/${sample}_tumor_only_unfiltered.vcf
gatk GetPileupSummaries -I $WORK/${sample}_dedup.bam -V $SCRATCH/small_exac_common_3_hg19.vcf -L $SCRATCH/small_exac_common_3_hg19.vcf -O $WORK/${sample}_pileups.table
gatk CalculateContamination -I $WORK/${sample}_pileups.table -O $WORK/${sample}.contamination.table
gatk FilterMutectCalls -V $WORK/${sample}_tumor_only_unfiltered.vcf -R $GENOME/genome.fa --contamination-table $WORK/${sample}.contamination.table -O $WORK/${sample}_tumor_only.filtered.vcf
Some of these variants overlap with calls from HaplotypeCaller:
gatk --java-options -Xmx40g HaplotypeCaller -R $GENOME/genome.fa -I $WORK/${sample}_dedup.bam -O $WORK/${sample}_germline_unfiltered.vcf -L $SCRATCH/S07604624_Regions_sorted.bed --dbsnp $SCRATCH/Homo_sapiens_assembly19.dbsnp138_hg19.vcf
gatk SelectVariants -R $GENOME/genome.fa -V $WORK/${sample}_germline_unfiltered.vcf --select-type-to-include SNP --restrict-alleles-to BIALLELIC --exclude-non-variants TRUE -O $WORK/${sample}_germline_unfiltered.snps.vcf
gatk SelectVariants -R $GENOME/genome.fa -V $WORK/${sample}_germline_unfiltered.vcf --select-type-to-include INDEL --exclude-non-variants TRUE -O $WORK/${sample}_germline_unfiltered.indels.vcf
gatk VariantFiltration -V $WORK/${sample}_germline_unfiltered.snps.vcf -filter 'QD < 2.0' --filter-name 'QD2' -filter 'QUAL < 30.0' --filter-name 'QUAL30' -filter 'SOR > 3.0' --filter-name 'SOR3' -filter 'FS > 60.0' --filter-name 'FS60' -filter 'MQ < 40.0' --filter-name 'MQ40' -filter 'MQRankSum < -12.5' --filter-name 'MQRankSum-12.5' -filter 'ReadPosRankSum < -8.0' --filter-name 'ReadPosRankSum-8' -O $WORK/${sample}T_germline_filtered.snps.vcf
How should I deal with these variants?
-
There's nothing you can do. A variant that doesn't show up in your germline resource and has an allele fraction near 1/2 could be either somatic or germline. If the sample is impure, such that somatic hets with no CNVs have an allele fraction below 1/2, FilterMutectCalls can do better by modeling the spectrum of allele fractions. Likewise, FilterMutectCalls can do a decent job with tumor-only calling of cfDNA samples. In general, though, the uncertainty is unavoidable.
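Since these calls cannot be resolved automatically, it can at least help to pull them out for manual review. The following is a minimal sketch, assuming a single-sample (tumor-only) VCF from FilterMutectCalls in which the FILTER column uses the name "germline" and the sample column carries an AF field. The function name list_germline_high_af and the 0.4 default threshold are hypothetical choices for illustration, not part of GATK.

```shell
# Usage: list_germline_high_af FILTERED_VCF [MIN_AF]
# Prints CHROM, POS, REF, ALT, max tumor AF, and FILTER for every record that
# carries the "germline" filter and whose AF is at least MIN_AF (default 0.4).
# Assumes an uncompressed, single-sample VCF (sample genotype in column 10).
list_germline_high_af() {
  awk -F'\t' -v OFS='\t' -v min_af="${2:-0.4}" '
    /^#/ { next }                                  # skip header lines
    {
      # FILTER may hold several names separated by ";"
      n = split($7, flt, ";"); is_germline = 0
      for (i = 1; i <= n; i++) if (flt[i] == "germline") is_germline = 1
      if (!is_germline) next

      # locate AF in FORMAT and read the matching sample value
      nk = split($9, keys, ":"); split($10, vals, ":")
      max_af = -1
      for (i = 1; i <= nk; i++) if (keys[i] == "AF") {
        na = split(vals[i], afs, ",")              # one AF per ALT allele
        for (j = 1; j <= na; j++) if (afs[j] + 0 > max_af) max_af = afs[j] + 0
      }
      if (max_af >= min_af + 0) print $1, $2, $4, $5, max_af, $7
    }' "$1"
}

# Example call (path taken from the commands above):
#   list_germline_high_af $WORK/${sample}_tumor_only.filtered.vcf 0.4
```

The resulting list can then be compared with the HaplotypeCaller calls or checked against other evidence, keeping in mind that the tool itself cannot settle which of these are somatic.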