Germline false positives in Mutect2
c) Why do I see a large number of false positives (especially germline variants) after calling somatic variants with Mutect2?
d) Where do I find the best way to decrease false positive rates without compromising sensitivity?
I used GATK Mutect2 (GenomeAnalysisTk/4.1.3.0) to call somatic variants on tumor-normal pairs. I used the default settings throughout, following the Best Practices protocol. However, I found many false positives (many appear to be germline variants, seen in the blood as well). On quick inspection of 113 variants in the original BAMs in IGV, I found a true positive rate of only 35%. For example, the following variants, which are obviously germline in IGV (source is the BAM), are called as somatic.
My questions are:
1) What is the known true positive rate for Mutect2 (GATK 4.1.3.0)?
2) Any suggestions for optional arguments or filtering to remove these obvious false positives without compromising sensitivity?
3) Or am I simply doing this wrong? Any comments/suggestions are appreciated.
Thank you
-
Thank you for your question! Mutect2 is a popular topic on the forum so we wrote a FAQ article answering many of the questions posted in the last year. Please see that article, as well as our other comprehensive documentation, in case your answer is there.
Mutect2 has undergone some updates since the version you are using, so I would recommend updating to version 4.1.9.0 and seeing if these issues persist.
-
@TMB What are your command lines for Mutect2 and FilterMutectCalls?
-
Hi David. My command lines for Mutect2 and FilterMutectCalls are as follows:
Mutect2
gatk --java-options "-Xmx6G" Mutect2 \
--reference $REF \
--germline-resource $RESOURCE \
--input ${SAMPLE}_Bld.bam \
--input ${SAMPLE}_T_1.bam \
--tumor-sample ${SAMPLE}_1_T \
--f1r2-tar-gz ${SAMPLE}_1_T.F1R2.tar.gz \
--output ${SAMPLE}_1_T.somatic.vcf.gz
FilterMutectCalls
--tumor-segmentation ${SAMPLE}_1_T.segments.table \
--contamination-table ${SAMPLE}_1_T.contamination.table \
--ob-priors ${SAMPLE}_1_T.read-orientation-model.tar.gz \
--output ${SAMPLE}_1_T.filtered.vcf \
--reference $REF
-
@TMB You are inadvertently running in multi-sample mode: Mutect2 thinks both samples are tumors. You need to specify --normal-sample (short form -normal). --tumor-sample is a deprecated optional argument; it does nothing, because Mutect2 treats every sample not named with --normal-sample as a tumor sample. By the way, --normal-sample can be specified multiple times, so you may run on multiple tumors and multiple normals at the same time.
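As a minimal sketch of that fix, the Mutect2 command from the post above could be rewritten as follows. The default values for SAMPLE, REF and RESOURCE are placeholders, and NORMAL_SM is an assumption: it must be whatever sample name (the SM tag of the read group) the normal BAM actually carries, which may differ from the file name. The call_somatic function and the DRY_RUN switch are illustrative helpers, not part of GATK.

```shell
# Placeholder inputs (hypothetical values; substitute your own).
SAMPLE="${SAMPLE:-patient01}"
REF="${REF:-reference.fasta}"
RESOURCE="${RESOURCE:-germline_resource.vcf.gz}"
# Assumption: the normal BAM's read-group SM tag is ${SAMPLE}_Bld.
NORMAL_SM="${NORMAL_SM:-${SAMPLE}_Bld}"

call_somatic() {
  # With DRY_RUN unset the command is only printed for review;
  # set DRY_RUN to an empty string to actually run gatk (must be on PATH).
  ${DRY_RUN-echo} gatk --java-options "-Xmx6G" Mutect2 \
    --reference "$REF" \
    --germline-resource "$RESOURCE" \
    --input "${SAMPLE}_Bld.bam" \
    --input "${SAMPLE}_T_1.bam" \
    --normal-sample "$NORMAL_SM" \
    --f1r2-tar-gz "${SAMPLE}_1_T.F1R2.tar.gz" \
    --output "${SAMPLE}_1_T.somatic.vcf.gz"
}

call_somatic
```

The only substantive change from the original command is that --tumor-sample is dropped and --normal-sample names the blood sample, so reads from that BAM are treated as the matched normal.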
-
Hi David,
Thank you so much for your inputs.
I tried Mutect2 version 4.1.9.0, which seems to avoid this problem. Do you agree?
Do I still need to specify anything for version 4.1.9.0 to get this right?
Takae
-
Just --normal-sample (or -normal), and keep your other arguments the same. And you can continue to specify --tumor-sample if it's a hassle to change all your scripts, but it has no effect.
-
Just to clarify. Do I have to say "-tumor-sample" instead of "-normal"?
Or either way works fine?
For 4.1.9.0, I used the following script:
gatk --java-options "-Xmx6G" Mutect2 -R $REF \
-I ${SAMPLE}_Bld.bam \
-I ${SAMPLE}_T.bam \
-normal ${BN}_Bld \
--germline-resource $RESOURCE \
--f1r2-tar-gz $OUT_DIR/${BN}.4.1.0.0.f1r2.tar.gz \
-O $OUT_DIR/${BN}.somatic.GATK.4.1.9.0.vcf.gz
-
You have to specify -normal to tell Mutect2 which sample is the normal. You may exclude -tumor-sample.
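The value given to -normal has to be the sample name recorded in the normal BAM's header (the SM field of its @RG lines), not the file name. A small sketch for checking that name, assuming samtools is available; sample_names is a hypothetical helper written for this example:

```shell
# Print the unique SM (sample) values found in the @RG lines
# of a SAM/BAM header supplied on standard input.
sample_names() {
  awk -F'\t' '/^@RG/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/) print substr($i, 4)   # drop the "SM:" prefix
  }' | sort -u
}

# Usage (requires samtools; file name taken from the commands in this thread):
#   samtools view -H "${SAMPLE}_Bld.bam" | sample_names
```

If the printed name differs from what was passed to -normal, the normal will not be matched to the intended sample.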
-
Got it. That's what I did with 4.1.9.0, and it worked well.
Thank you so much for answering questions!
Your help is much appreciated.