Mutect2 -- How is the --max-population-af filter used in tumor-only mode?
Hello,
It was my understanding that the `--max-population-af` filter removes germline variants present in the gnomAD germline resource with allele frequencies greater than 0.01 (the reported default value). However, in my resulting somatic_filtered.vcf.gz (from unpaired, tumor-only mode; see entire somatic workflow below), I am finding PASS variants that are present in the gnomAD resource with large allele frequencies -- chr5:157471682 is one example and has a reported allele frequency of 0.486. Could you please help me understand how this filter is working if it is not removing all variants that are present in the germline resource with AFs > 0.01?
Thanks in advance for your help,
Layne
REQUIRED for all errors and issues:
a) GATK version used: 4.2.2.0
b) Exact commands used:
1. Mutect2
gatk --java-options '{java_opts}' Mutect2 --native-pair-hmm-threads {threads} --reference {genome} --input {tumorbam} --panel-of-normals {pon} --germline-resource {afonly} --intervals targets.interval_list --f1r2-tar-gz f1r2.tar.gz --output somatic.vcf.gz --bam-output somatic.bam --tmp-dir /tmpdir
2. LearnReadOrientationModel
gatk --java-options '{java_opts}' LearnReadOrientationModel --input f1r2.tar.gz --output read-orientation-model.tar.gz --tmp-dir /tmpdir
3. GetPileupSummaries
gatk --java-options '{java_opts}' GetPileupSummaries --input {tumorbam} --variant {commonvcf} --intervals {commonvcf} --output pileup_summary_tumor.table --tmp-dir /tmpdir
4. CalculateContamination
gatk --java-options '{java_opts}' CalculateContamination --input pileup_summary_tumor.table --output contamination_unpaired.table --tmp-dir /tmpdir
5. FilterMutectCalls
gatk --java-options '{java_opts}' FilterMutectCalls --reference {genome} --contamination-table contamination_unpaired.table --orientation-bias-artifact-priors read-orientation-model.tar.gz --variant somatic.vcf.gz --output somatic_filtered.vcf.gz --tmp-dir /tmpdir
-
`--max-population-af` is a Mutect2 argument that acts as a performance optimization in the active region code. It is not an argument of FilterMutectCalls, and it is not a hard filter on the output. It is possible that a different site near this one is considered active and triggers assembly and genotyping of the surrounding area, including the site in question. FilterMutectCalls then uses the full germline probability model to decide whether to keep each variant. In this case, perhaps the allele fraction in the sample was low enough that, despite the large population AF, the variant did not appear to be germline.
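To see which calls this affects, one option is to list the PASS records whose population allele frequency exceeds the threshold, next to the allele fraction in the sample. The following is a minimal sketch, not part of GATK. It assumes the filtered VCF carries the `POPAF` INFO annotation as a negative log10 population allele frequency and a per-sample `AF` FORMAT field, and that the file has a single sample column, as in tumor-only mode. The function names are hypothetical.

```python
import gzip


def pass_variants_above_population_af(lines, max_population_af=0.01):
    """Yield (chrom, pos, population_af, sample_af) for PASS records whose
    population AF exceeds max_population_af.

    Assumption: INFO/POPAF holds -log10(population AF), one value per alt
    allele, and FORMAT/AF holds the sample allele fraction per alt allele.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # need at least one sample column
        chrom, pos, filt, info = fields[0], int(fields[1]), fields[6], fields[7]
        if filt != "PASS":
            continue
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        if "POPAF" not in info_map:
            continue
        # Convert -log10 values back to plain allele frequencies.
        pop_afs = [10 ** -float(x) for x in info_map["POPAF"].split(",")]
        # Pair FORMAT keys with the values of the first (only) sample.
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        sample_afs = [
            float(x) for x in sample.get("AF", "").split(",") if x not in ("", ".")
        ]
        for i, pop_af in enumerate(pop_afs):
            if pop_af > max_population_af:
                sample_af = sample_afs[i] if i < len(sample_afs) else None
                yield chrom, pos, pop_af, sample_af


def scan_vcf(path, max_population_af=0.01):
    """Run the check on a bgzipped or gzipped VCF file."""
    with gzip.open(path, "rt") as handle:
        return list(pass_variants_above_population_af(handle, max_population_af))
```

Calling `scan_vcf("somatic_filtered.vcf.gz")` would return one tuple per matching alt allele. A record with a high population AF but a low sample allele fraction is the kind of call described above: common in the population, yet not looking germline in this sample.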
Regardless, one should never take tumor-only calls too seriously. Even if one aggressively filtered out everything in gnomAD, tumor-only calls would inevitably be full of rare germline variants.