Benchmarking Mutect2 in tumor-only mode
I am using a benchmarking dataset to measure precision and recall for our pipeline's implementation of Mutect2 tumor-only mode. We found that we can get good recall, catching between 85% and 95% of true positives, but we cannot get precision above 8% without recall dropping to 20-30%.
With Mutect2 defaults and the gold-standard pipeline (learn orientation bias, contamination, and FilterMutectCalls), we get the following results (generated with som.py):
type total.truth total.query tp fp fn unk ambi recall recall_lower recall_upper recall2 precision precision_lower precision_upper na ambiguous fp.region.size fp.rate
0 indels 7990 13270 2443 10827 5547 0 0 0.305757 0.295727 0.315928 0.305757 0.184099 0.177575 0.190762 0.0 0.0 3033630367 3.568991
1 SNVs 7903 18684 1688 16996 6215 0 0 0.213590 0.204660 0.222729 0.213590 0.090345 0.086298 0.094519 0.0 0.0 3033630367 5.602528
5 records 15893 33949 4131 29818 11762 0 0 0.259926 0.253151 0.266788 0.259926 0.121683 0.118237 0.125192 0.0 0.0 3033630367 9.829147
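For readers who want to check the table, here is a minimal sketch of how the precision and recall columns follow from the tp, fp and fn counts. It assumes the unk and ambi columns are 0 (as they are here), so that total.query equals tp + fp; the helper name `precision_recall` is illustrative and not part of som.py.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Counts copied from the som.py table above (default settings).
rows = {
    "indels": (2443, 10827, 5547),
    "SNVs": (1688, 16996, 6215),
    "records": (4131, 29818, 11762),
}

for name, (tp, fp, fn) in rows.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{name}: precision={p:.6f} recall={r:.6f}")
```

The printed values should agree with the precision and recall columns of the table to the precision shown there.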
Changing af-of-alleles not in and using a false-discovery threshold changes the results as follows. (Note: this was run on only one chromosome, but I manually checked that the percentages from the default run are approximately the same when restricted to a single chromosome.)
,type,total.truth,total.query,tp,fp,fn,unk,ambi,recall,recall_lower,recall_upper,recall2,precision,precision_lower,precision_upper,na,ambiguous,fp.region.size,fp.rate,sompyversion,sompycmd
0,indels,667,2513,414,2099,253,0,0,0.620689655172,0.583408660615,0.656925787246,0.620689655172,0.16474333466,0.150629848617,0.179628529629,0.0,0.0,248956422,8.43119443611,som.py-v0.3.12-2-g9d128a9,/opt/hap.py-install/bin/som.py dream_files/truth_hg38_chr1.vcf SRR2020636_af3_NO_contamination_fbeta0.25.vcf -r /mnt/helomicsngs-s3/Reference/GRCh38/gatk_resource/Homo_sapiens_assembly38.fasta -o somatic_results/SRR2020636_af3_chr1
1,SNVs,676,6106,574,5532,102,0,0,0.849112426036,0.820662650549,0.874580664901,0.849112426036,0.0940058958402,0.0868782432617,0.101518314254,0.0,0.0,248956422,22.22075637,som.py-v0.3.12-2-g9d128a9,/opt/hap.py-install/bin/som.py dream_files/truth_hg38_chr1.vcf SRR2020636_af3_NO_contamination_fbeta0.25.vcf -r /mnt/helomicsngs-s3/Reference/GRCh38/gatk_resource/Homo_sapiens_assembly38.fasta -o somatic_results/SRR2020636_af3_chr1
5,records,1343,8976,988,7988,355,0,0,0.735666418466,0.711592644414,0.758725682797,0.735666418466,0.110071301248,0.103722428141,0.116671585458,0.0,0.0,248956422,32.0859367106,som.py-v0.3.12-2-g9d128a9,/opt/hap.py-install/bin/som.py dream_files/truth_hg38_chr1.vcf SRR2020636_af3_NO_contamination_fbeta0.25.vcf -r /mnt/helomicsngs-s3/Reference/GRCh38/gatk_resource/Homo_sapiens_assembly38.fasta -o somatic_results/SRR2020636_af3_chr1
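One way to put a single number on the precision/recall trade-off between the two runs is an F-beta score, which weights recall beta times as heavily as precision. The sketch below applies the standard formula to the SNV rows of the two tables; the beta values are illustrative assumptions, this score is not something som.py reports, and the two runs cover different regions (whole genome versus one chromosome), so the comparison is only approximate.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) for SNVs, copied from the two som.py tables.
runs = {
    "defaults": (0.090345, 0.213590),
    "adjusted settings (one chromosome)": (0.094006, 0.849112),
}

for label, (p, r) in runs.items():
    # beta < 1 favors precision, beta > 1 favors recall (values are illustrative).
    scores = ", ".join(f"F{b}={f_beta(p, r, b):.3f}" for b in (0.25, 1.0, 2.0))
    print(f"{label}: {scores}")
```

A beta below 1 rewards precision more heavily, so it shows how little the adjusted settings help when false positives are the main concern.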
How do I minimize the number of false positives that are occurring without sacrificing recall?
-
Thank you for your question! This question falls outside the scope of GATK Support. (See our support policy for more details.) However, we encourage you to keep posting questions because they help us improve our documentation and build resources. In addition, if you know the answer to other questions outside of our GATK support team's scope, please help out other users! Other users, feel free to chime in and discuss here.
Mutect2 is a popular topic on the forum so we wrote an FAQ article answering many of the questions posted in the last year. Please see that article, as well as our other comprehensive documentation, in case your answer is there.
One specific forum thread discusses this topic. Please see that post: https://gatk.broadinstitute.org/hc/en-us/community/posts/360057810051-Mutect2-somatic-variant-calling-with-without-matched-normal-sample
-
Hi Alexandra,
I don't think I can help you with your question, but could I ask you about your benchmarking experiment? We are trying something similar in my lab, and I am wondering:
1) Was this whole-genome, whole-exome, or some other kind of experiment?
2) What depth of sequencing did you have?
Thanks for any help,
Mike