Mutect2 output False Negative SNV result
Hi,
I found the Mutect2 could not get some expected positive SNVs, which had relative high AF and AD value. And I can get it by force calling with the parameter '--alleles ', one of them is listed below:
3 178916930 . G T . . AS_SB_TABLE=717,910|21,21;DP=1721;ECNT=2;MBQ=20,20;MFRL=146,179;MMQ=60,60;MPOS=37;POPAF=7.30;TLOD=50
.93 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:1627,42:0.022:1669:350,6:577,16:967,22:717,910,21,21
Detail information in IGV is:
And the running info was:
a) GATK version used: v4.4.0.0
b) Exact command used:
java17 -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /bioinfo/software/packages/gatk4/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar Mutect2 -R /bioinfo/data/Genomes/NCBI/build37/Sequence/WholeGenomeFasta/human_g1k_v37.fasta -I intermediate_bam/CSHO230817-3-GWST007-2_raw.consensus.bam -O wtest_recalling_4.4/CSHO230817-3-GWST007-2_raw.mutect2.unfiltered.vc
f -L target.bed --f1r2-tar-gz wtest_recalling_4.4/CSHO230817-3-GWST007-2_raw.mutect2.f1r2.tar.gz -germline-resource /bioinfo/data/GnomAD/af-only-gnomad.new.sites.hg19.vcf.gz -pon ctdnawise-data/188pon.vcf.gz --max-reads-per-alignment-st
art 0 -RF GoodCigarReadFilter -RF NotSecondaryAlignmentReadFilter -RF NotSupplementaryAlignmentReadFilter --min-base-quality-score 20 --minimum-mapping-quality 30 --dont-use-soft-clipped-bases --add-output-vcf-command-line false --nativ
e-pair-hmm-threads 8 -bamout wtest_recalling_4.4/CSHO230817-3-GWST007-2_raw.mutect2.bamout.bam -ip 80 -init-lod -4
c) Entire program log:
18:49:05.593 WARN GATKReadFilterPluginDescriptor - Redundant enabled filter (NotSecondaryAlignmentReadFilter) is enabled for this tool by default
18:49:05.598 WARN GATKReadFilterPluginDescriptor - Redundant enabled filter (GoodCigarReadFilter) is enabled for this tool by default
18:49:05.662 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/bioinfo/software/packages/gatk4/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
18:49:05.702 INFO Mutect2 - ------------------------------------------------------------
18:49:05.706 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.4.0.0
18:49:05.706 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
18:49:05.706 INFO Mutect2 - Executing as bwang@mamba on Linux v3.10.0-1127.el7.x86_64 amd64
18:49:05.706 INFO Mutect2 - Java runtime: Java HotSpot(TM) 64-Bit Server VM v17.0.7+8-LTS-224
18:49:05.707 INFO Mutect2 - Start Date/Time: August 24, 2023 at 6:49:05 PM CST
18:49:05.707 INFO Mutect2 - ------------------------------------------------------------
18:49:05.707 INFO Mutect2 - ------------------------------------------------------------
18:49:05.708 INFO Mutect2 - HTSJDK Version: 3.0.5
18:49:05.708 INFO Mutect2 - Picard Version: 3.0.0
18:49:05.708 INFO Mutect2 - Built for Spark Version: 3.3.1
18:49:05.708 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:49:05.709 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:49:05.709 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:49:05.709 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:49:05.709 INFO Mutect2 - Deflater: IntelDeflater
18:49:05.710 INFO Mutect2 - Inflater: IntelInflater
18:49:05.710 INFO Mutect2 - GCS max retries/reopens: 20
18:49:05.710 INFO Mutect2 - Requester pays: disabled
18:49:05.711 INFO Mutect2 - Initializing engine
18:49:05.939 INFO FeatureManager - Using codec VCFCodec to read file file:///data/NGS/Ctdna-Data-Analysis.eta/test/106.DataAnalysis/20230819_E200004953_SuZhouJianLu_C_1_GenePlus/analysis_HO1/ctdnawise-data/188pon.vcf.gz
18:49:05.974 WARN IntelInflater - Zero Bytes Written : 0
18:49:05.992 INFO FeatureManager - Using codec VCFCodec to read file file:///bioinfo/data/GnomAD/af-only-gnomad.new.sites.hg19.vcf.gz
18:49:06.084 INFO FeatureManager - Using codec BEDCodec to read file file:///data/NGS/Ctdna-Data-Analysis.eta/test/106.DataAnalysis/20230819_E200004953_SuZhouJianLu_C_1_GenePlus/analysis_HO1/target.bed
18:49:06.099 INFO IntervalArgumentCollection - Processing 2651 bp from intervals
18:49:06.116 INFO Mutect2 - Done initializing engine
18:49:06.160 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/bioinfo/software/packages/gatk4/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
18:49:06.161 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/bioinfo/software/packages/gatk4/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
18:49:06.192 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
18:49:06.193 INFO IntelPairHmm - Available threads: 48
18:49:06.194 INFO IntelPairHmm - Requested threads: 8
18:49:06.194 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
18:49:06.387 INFO ProgressMeter - Starting traversal
18:49:06.388 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
18:49:20.293 INFO ProgressMeter - 7:55259553 0.2 20 86.3
18:49:23.039 INFO Mutect2 - 504 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonChimericOriginalAlignmentReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: ReadLengthReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
0 read(s) filtered by: NotSupplementaryAlignmentReadFilter
504 total reads filtered out of 67879 reads processed
18:49:23.040 INFO ProgressMeter - 7:55259553 0.3 29 104.5
18:49:23.040 INFO ProgressMeter - Traversal complete. Processed 29 total regions in 0.3 minutes.
18:49:23.289 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.023994052000000002
18:49:23.289 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.15249880200000002
18:49:23.290 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.12 sec
18:49:23.474 INFO Mutect2 - Shutting down engine
[August 24, 2023 at 6:49:23 PM CST] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.30 minutes.
Runtime.totalMemory()=1224736768
Thanks for any help!
Danyue
-
Official comment
Hi Danyue Wang
The parameter that you are referring to does not directly mean the base quality score at the variant site in bam but it refers to the base quality value within the kmer where assembly occurs. It may be possible that your variant site is close to a site where base quality is lower than your threshold although your variant base is of higher quality, assembly engine may remove your variant base due to kmer containing other lower quality bases. Therefore our team recommends using default parameters first before changing them for your own needs.
Comment actions -
Hi Danyue Wang
Mutect2 and HaplotypeCaller are known to be affected (under some conditions) by the interval sizes and padding when it comes to calling variants. It may be useful if you can increase interval padding even further like 200 bp from the edges of the bed file. You may also remove the interval file to unlimit Mutect2 for haplotype generation.
Other than this option you may try the
--linked-de-bruijin-graph true
option to test if Mutect2 can recover better haplotypes in that region.
If you need any further assistance you may refer to the document in the following link
I hope these help you solve your problem.
-
Hi, Gökalp Çelik
I tried the methods metioned by the documentation and remove the target region bed, but I couldn't find the right solution, and some of the information by '-debug' mode is as following:
12:06:56.273 INFO EventMap - > Cigar = 500M
12:06:56.273 INFO EventMap - >> Events = EventMap{}
12:06:56.352 INFO Mutect2Engine - Assembling 3:178916864-178916892 with 2821 reads: (with overlap region = 3:178916764-178916992)
12:06:56.568 INFO ReadThreadingAssembler - Using kmer size of 10 in read threading assembler
12:06:56.714 INFO ReadThreadingAssembler - Adding haplotype 229M from graph with kmer 10
12:06:56.736 INFO ReadThreadingAssembler - Adding haplotype 229M from graph with kmer 10
12:06:56.738 INFO ReadThreadingAssembler - Found 2 candidate haplotypes of 2 possible combinations to evaluate every read against.
12:06:56.738 INFO ReadThreadingAssembler - AAAGAAGCAAGAAAATACCCCCTCCATCAACTTCTTCAAGATGAATCTTCTTACATTTTCGTAAGTGTTACTCAAGAAGCAGAAAGGGAAGAATTTTTTGATGAAACAAGACGACTTTGTGACCTTCGGCTTTTTCAACCCTTTTTAAAAGTAATTGAACCAGTAGTCAACCGTGAAGAAAAGATCCTCAAT
CGAGAAATTGGTATGATACAATATCCTATTCTAAAAT
12:06:56.739 INFO ReadThreadingAssembler - > Cigar = 229M : 229 score -1.4035242344113994 ref false
12:06:56.739 INFO ReadThreadingAssembler - AAAGAAGCAAGAAAATACCCCCTCCATCAACTTCTTCAAGATGAATCTTCTTACATTTTCGTAAGTGTTACTCAAGAAGCAGAAAGGGAAGAATTTTTTGATGAAACAAGACGACTTTGTGACCTTCGGCTTTTTCAACCCTTTTTAAAAGTAATTGAACCAGTAGGCAACCGTGAAGAAAAGATCCTCAATCGAGAAATTGGTATGATACAATATCCTATTCTAAAAT
12:06:56.739 INFO ReadThreadingAssembler - > Cigar = 229M : 229 score NaN ref true
12:06:56.739 INFO EventMap - === Best Haplotypes ===
12:06:56.739 INFO EventMap - AAAGAAGCAAGAAAATACCCCCTCCATCAACTTCTTCAAGATGAATCTTCTTACATTTTCGTAAGTGTTACTCAAGAAGCAGAAAGGGAAGAATTTTTTGATGAAACAAGACGACTTTGTGACCTTCGGCTTTTTCAACCCTTTTTAAAAGTAATTGAACCAGTAGGCAACCGTGAAGAAAAGATCCTCAATCGAGAAATTGGTAT
GATACAATATCCTATTCTAAAAT
12:06:56.739 INFO EventMap - > Cigar = 229M
12:06:56.740 INFO EventMap - >> Events = EventMap{}
12:06:56.740 INFO EventMap - AAAGAAGCAAGAAAATACCCCCTCCATCAACTTCTTCAAGATGAATCTTCTTACATTTTCGTAAGTGTTACTCAAGAAGCAGAAAGGGAAGAATTTTTTGATGAAACAAGACGACTTTGTGACCTTCGGCTTTTTCAACCCTTTTTAAAAGTAATTGAACCAGTAGTCAACCGTGAAGAAAAGATCCTCAATCGAGAAATTGGTAT
GATACAATATCCTATTCTAAAAT
12:06:56.740 INFO EventMap - > Cigar = 229M
12:06:56.740 INFO EventMap - >> Events = EventMap{3:178916930-178916930 [G*, T],}
12:06:56.778 INFO Mutect2Engine - Assembling 3:178916893-178917088 with 2824 reads: (with overlap region = 3:178916793-178917188)can you give some advices, whether I missed some key point? And the link below is the sub-extracted bam file:
https://drive.google.com/file/d/1528JaKuDK9eAvAuqJHReAywol89QwSly/view?usp=drive_link
Thanks!
-
Hi Danyue Wang
Thank you for sending us a slice of your bam file. Interestingly when I removed all the parameters that you added to Mutect2 and tried calling the region by getting wider range for interval and also without any intervals I was able to call the variant that you are looking for with AF around 4%. It is possible that some of the parameters were causing this issue of FN call. Can you try your sample with the following command without any additional parameters?
gatk Mutect2 -I bamfile.bam -R genome.fa -O vcfout.vcf
Here is my output from both trials for the position that you are interested.
3 178916930 . G T . . AS_SB_TABLE=700,886|29,36;DP=1714;ECNT=2;MBQ=20,20;MFRL=143,139;MMQ=60,60;MPOS=29;POPAF=7.30;TLOD=93.68 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:1586,65:0.039:1651:383,15:538,24:960,39:700,886,29,36
3 178916930 . G T . . AS_SB_TABLE=700,886|29,36;DP=1714;ECNT=2;MBQ=20,20;MFRL=143,139;MMQ=60,60;MPOS=29;POPAF=7.30;TLOD=93.68 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:1586,65:0.039:1651:383,15:538,24:960,39:700,886,29,36
I hope this helps.
-
Dear Gökalp Çelik:
Thanks a lot for following up! Sometimes, more is less. I had tested the command and validated the results. And the reason is the parameter
--min-base-quality-score 20
I found that the default cutoff is 10 and if I set it to 14 or larger, the variant is not called. My confusion is that the base quality of the location is rather higher than this, why it's missed? or does the argument something else than I would expect?
Best,
Danyue
-
Hi Danyue Wang
I will check with the team and let you know soon.
Please sign in to leave a comment.
6 comments