GATK 4.3.0.0 Mutect2 fail to call variant
AnsweredREQUIRED for all errors and issues:
a) GATK version used: 4.3.0.0
b) Exact command used: --linked-de-bruijn-graph true --max-reads-per-alignment-start 0
c) Entire program log: run sucess. See detials in the end.
Hi, GATK team! I am analyzing amplicon sequencing data of human mtDNA. A known variant 15607G can not be called in Mutect2 raw vcf result.
I have checked the sam file, the variant indeed exits. Most reads of this variant have flag=0, mapping quality=60, more than 50000x depths, and 100% ALT. I also have variant with higher depths and they can be called correctly. Here is the IGV view of this variant:
As the best practice advised, it's better to set --max-reads-per-alignment-start 0 (to cancel downsampling) for amplicon sequencing data. But I can't call this variant under this setting.
But if I use downsampling and set --downsampling-stride 50, I can call this variant.
And if I use samtools mpileup instead of Mutect2, I can also call this variant.
Could you please explain why this will happen?
Besides, some variants seem very strange to me. Two variants have the same PGT and PID. These variants are like this (the GT and AF are marked in red):
How to understand such variant? Should I still filter them based on AF or DP or StrandBias? Or should I filter out them?
Here are the detail log:
16:00:39.818 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
16:00:39.820 INFO IntelPairHmm - Available threads: 8
16:00:39.820 INFO IntelPairHmm - Requested threads: 4
16:00:39.820 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
16:00:39.841 INFO ProgressMeter - Starting traversal
16:00:39.841 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
16:01:00.964 INFO ProgressMeter - MT:1490 0.4 10 28.4
16:01:20.300 INFO ProgressMeter - MT:3302 0.7 20 29.7
16:01:55.966 INFO ProgressMeter - MT:5244 1.3 30 23.6
16:02:15.798 INFO ProgressMeter - MT:6992 1.6 40 25.0
16:02:33.243 INFO ProgressMeter - MT:8825 1.9 50 26.5
16:02:44.049 INFO ProgressMeter - MT:10722 2.1 60 29.0
16:03:06.120 INFO ProgressMeter - MT:12548 2.4 70 28.7
16:03:26.363 INFO ProgressMeter - MT:14420 2.8 80 28.8
16:03:40.293 INFO ProgressMeter - MT:16060 3.0 90 29.9
16:03:40.908 INFO Mutect2 - 18 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonChimericOriginalAlignmentReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: ReadLengthReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
18 total reads filtered
16:03:40.908 INFO ProgressMeter - MT:16060 3.0 92 30.5
16:03:40.908 INFO ProgressMeter - Traversal complete. Processed 92 total regions in 3.0 minutes.
16:03:40.915 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.1464474
16:03:40.915 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 86.5387548
16:03:40.915 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 11.88 sec
16:03:40.917 INFO Mutect2 - Shutting down engine
[February 16, 2023 4:03:40 PM CST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 3.02 minutes.
Runtime.totalMemory()=1944584192
Tool returned:
SUCCESS
-
yangjw GATK tools are often shaky with amplicon data, which we don't officially support. Using --linked-de-bruijn-graph and -downsampling-stride is the right thing to do. My guess is that with 50000x coverage something goes wrong in assembly, but with Mutect3 coming out soon I would not have time to diagnose it, unfortunately.
You should also turn on mitochondria mode.
For filtering, we recommend FilterMutectCalls. Mutect2 is not designed to be run without it. The best thing to do is run the entire best practices somatic small variants pipeline.
It is odd that the PGTs are the same for the first two variants you show because they seem to be out of phase. The PGT comes from code shared with HaplotypeCaller and it can be faulty. I would just ignore it.
-
Hi, David, thank you for your reply and advice!
Well, I haven't figure out what exactly cause 16507G missing. Because the other variants are called correctly and none of them is missing. I guess maybe 16507G has something special which disturb the Mutect2 calling. I will adopt your advice and go on to find a better solution.
As for the PGT issue, at present I still filter such variants with AF, and the results are consist with the known reference sample. Thanks to you, I decide not to worry much about such PGTs.
Thank you again. Looking forward to Mutect3 coming!
Please sign in to leave a comment.
2 comments