How to recover a mutation from secondary alignment
REQUIRED for all errors and issues:
a) GATK version used: gatk-4.2.6.1
b) Exact command used:
gatk --java-options "-Xms80G -Xmx80G -XX:-UseGCOverheadLimit -XX:+UseParallelGC -XX:ParallelGCThreads=16" Mutect2 \
-native-pair-hmm-threads 16 \
-R $GENOME/GRCh38_p13.fa \
-I $OUTPUT_GATK/${lib}"_recalibrated.bam" \
-pon $SITES/Mutect2_1000g_pon.hg38.vcf.gz \
--germline-resource $SITES/af-only-gnomad.hg38.vcf.gz \
--read-filter NotSecondaryAlignmentReadFilter \
--read-filter NotSupplementaryAlignmentReadFilter \
-L $ONCO_BED \
-O $OUTPUT_GATK/${lib}".vcf.gz" \
-bamout $OUTPUT_GATK/${lib}"_mutect2.bam"
c) Entire program log:
16:11:25.140 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/GATK/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:11:25.461 INFO Mutect2 - ------------------------------------------------------------
16:11:25.462 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.2.6.1
16:11:25.462 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
16:11:25.463 INFO Mutect2 - Executing as sassou@node124 on Linux v3.10.0-1160.el7.x86_64 amd64
16:11:25.463 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_262-b10
16:11:25.464 INFO Mutect2 - Start Date/Time: January 26, 2023 4:11:25 PM CET
16:11:25.464 INFO Mutect2 - ------------------------------------------------------------
16:11:25.464 INFO Mutect2 - ------------------------------------------------------------
16:11:25.464 INFO Mutect2 - HTSJDK Version: 2.24.1
16:11:25.464 INFO Mutect2 - Picard Version: 2.27.1
16:11:25.465 INFO Mutect2 - Built for Spark Version: 2.4.5
16:11:25.465 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:11:25.465 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:11:25.465 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:11:25.465 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:11:25.465 INFO Mutect2 - Deflater: IntelDeflater
16:11:25.465 INFO Mutect2 - Inflater: IntelInflater
16:11:25.465 INFO Mutect2 - GCS max retries/reopens: 20
16:11:25.466 INFO Mutect2 - Requester pays: disabled
16:11:25.466 INFO Mutect2 - Initializing engine
16:11:26.353 INFO FeatureManager - Using codec VCFCodec to read file file:////GRCh38_p13/Mutect2_1000g_pon.hg38.vcf.gz
16:11:26.717 INFO FeatureManager - Using codec VCFCodec to read file file:////GRCh38_p13/af-only-gnomad.hg38.vcf.gz
16:11:27.125 INFO FeatureManager - Using codec BEDCodec to read file file:////REDA/REDA.bed
16:11:27.176 INFO IntervalArgumentCollection - Processing 102791 bp from intervals
16:11:27.290 INFO Mutect2 - Done initializing engine
16:11:27.368 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/GATK/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
16:11:27.370 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/GATK/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
16:11:27.448 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
16:11:27.461 INFO IntelPairHmm - Available threads: 1
16:11:27.461 INFO IntelPairHmm - Requested threads: 16
16:11:27.461 WARN IntelPairHmm - Using 1 available threads, but 16 were requested
16:11:27.461 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
16:11:27.589 INFO ProgressMeter - Starting traversal
16:11:27.589 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
16:11:54.454 INFO ProgressMeter - chr2:25235659 0.4 20 44.7
16:12:11.521 INFO ProgressMeter - chr2:25245332 0.7 30 41.0
16:12:42.043 INFO ProgressMeter - chr2:197402921 1.2 50 40.3
16:13:12.839 INFO ProgressMeter - chr3:128486291 1.8 60 34.2
16:13:26.717 INFO ProgressMeter - chr4:105275608 2.0 90 45.3
16:13:42.150 INFO ProgressMeter - chr5:177512483 2.2 100 44.6
16:14:10.866 INFO ProgressMeter - chr5:177516075 2.7 110 40.4
16:14:33.803 INFO ProgressMeter - chr7:102111676 3.1 120 38.7
16:14:44.959 INFO ProgressMeter - chr7:102196897 3.3 130 39.5
16:15:02.405 INFO ProgressMeter - chr7:102227466 3.6 140 39.1
16:15:12.890 INFO ProgressMeter - chr9:130862715 3.8 180 47.9
16:15:25.842 INFO ProgressMeter - chr11:32392636 4.0 200 50.4
16:15:38.416 INFO ProgressMeter - chr17:1660393 4.2 250 59.8
16:15:58.731 INFO ProgressMeter - chr17:7674145 4.5 290 64.2
16:16:22.600 INFO ProgressMeter - chr17:7676492 4.9 300 61.0
16:16:35.000 INFO ProgressMeter - chr17:31182479 5.1 310 60.5
16:16:51.587 INFO ProgressMeter - chr17:31232874 5.4 340 63.0
16:17:15.035 INFO ProgressMeter - chr17:31261654 5.8 350 60.4
16:17:32.244 INFO ProgressMeter - chr17:31350135 6.1 370 60.9
16:17:59.031 INFO ProgressMeter - chr17:60600376 6.5 380 58.2
16:18:13.944 INFO ProgressMeter - chr19:33301798 6.8 400 59.1
16:18:33.509 INFO ProgressMeter - chr20:32434714 7.1 420 59.2
16:18:59.833 INFO ProgressMeter - chr21:34892882 7.5 440 58.4
16:19:13.799 INFO ProgressMeter - chrX:40063578 7.8 470 60.5
16:19:34.457 INFO ProgressMeter - chrX:124022565 8.1 490 60.4
16:19:44.934 INFO ProgressMeter - chrX:130013501 8.3 530 63.9
16:20:10.057 INFO ProgressMeter - chrX:130015826 8.7 540 62.0
16:20:17.924 INFO Mutect2 - 2609 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonChimericOriginalAlignmentReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: ReadLengthReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
2609 total reads filtered
16:20:17.924 INFO ProgressMeter - chrX:134413756 8.8 563 63.7
16:20:17.925 INFO ProgressMeter - Traversal complete. Processed 563 total regions in 8.8 minutes.
16:20:17.976 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.227637305
16:20:17.976 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 346.50865353200004
16:20:17.976 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 13.21 sec
16:20:19.154 INFO Mutect2 - Shutting down engine
[January 26, 2023 4:20:19 PM CET] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 8.90 minutes.
Runtime.totalMemory()=65856339968
Tool returned:
SUCCESS
Using GATK jar /GATK/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar
Mutect2 failed when the mutation is located on a secondary alignment, see below:
M02792:126:000000000-K9RHL:1:1114:12346:24983 163 chr21 43104253 0 147M = 43104346 240 TTAACTGTCTTTGAAAAGAACATGAAGTTTTTATAATTTACATGAAAAAAAGGCAAACAAACCTGGCTAAACGTCGGTTTATTGTGCAACCGAGAGCACCTGTCTCCATGACGACATGCTCCAATTTTGAAATAAAATGAACAGTTG BEDHBFFDGFEEDEGGHHFIDECKFJGEFGFEBEBJEFEBDEEKGJJJJHGHDIEJJEEJJDEIKIFGBJJDBEHBHDFFBEEKFGFFJDEBFHFHGEFHHKEHHHHEELFDBFFEDLFHIHEJEFFFKFJKFBJJJDKGKFEGEFD XA:Z:chr21,+6495931,147M,0; ZA:Z:CGCT ZB:Z:CCGT BC:Z:5 MD:Z:147 RG:Z:SAD190004_S5 NM:i:0 AS:i:147 XS:i:147 QX:Z:CCB ABC RX:Z:CGC-CCG xc:i:1 zd:Z: xm:i:1 M02792:126:000000000-
K9RHL:1:1114:12346:24983 83 chr21 43104346 0 147M = 43104253 -240 GAGCACCTGTCTCCATGACGACATGCTCCAATTTTGAAATAAAATGAACAGTTGACTCTGTAAGGGAAAATGAGAGCTGATTATTTTGCTGGGAAGATATCAAACACATGGAATATGTCAGCAGCATGACATACACTATCAAATTAC HHFLEHGEFFFGIKFEGDEHEKEEFGGIKGEIJJEHEFEBFFGEEHFFKHEJEHEGGGEDBGHHHHFGGEFHHGIFHFHEJCEJJJEFHFIHHFIHEBEHKGFFLEKEEHHGEBEEEHLHGLHFLEEHEDDCDCDHCDEHDECHCCD XA:Z:chr21,-6496024,147M,0; ZA:Z:CGCT ZB:Z:CCGT BC:Z:5 MD:Z:147 RG:Z:SAD190004_S5 NM:i:0 AS:i:147 XS:i:147 QX:Z:CCB ABC RX:Z:CGC-CCG xc:i:1 zd:Z: xm:i:1
-
Laurent MANCHON The command line given above filters secondary and supplementary alignments (it turns on the corresponding read filters, which is also the default behavior). The program log most likely reports no reads filtered in this way because they fail the mapping quality filter first.
However, even if you disable these filters you will end up with a lot of mapping errors and double-counted variants. Simply running Mutect2 would not produce sensible output.
-
By the way, to turn off a read filter, use the --disable-read-filter argument.
-
Same region in my bam files do not contain any MAPQ0 reads. Could you be using a reference genome without unmasked alternate and unmapped contigs ? Can you try the very same analysis using GIAB hg38v2 reference genome?
-
that's what I did in my analysis, I used the genome GRCh38_p13 (08.2022) from gencode consortium. I have removed all the alts contigs and use only the chromosome sequences.
-
Interesting. Did you use bowtie2 with end-to-end mapping settings to align your reads ?
-
no.
this is the command i use:bwa-mem2 mem -M -w 150 -d 80 -c 2000 -t 24 -R "@RG\tID:${lib}\tSM:${lib}\tLB:${lib}\tPL:ILLUMINA" $GENOME/GRCh38_p13.fa $f1 $f2 | samtools fixmate -O bam -@ 24 - - | sambamba-0.8 sort -t 24 -m 128G -o $OUTPUT_BAM/$lib".bam" /dev/stdin &>
$BWA_LOG/$lib.log$f1 and $f2 are the input fastq file from my paired-end library.
-
Even more interesting because those reads seem to have aligned also to another location on chromosome 21 with MAPQ0. I use bwa mem 0.7.17 with default settings using GIAB hg38v2 genome and the very same locus does not have any reads with XA tags. I was wondering if you have a chance to test it using default settings and GIAB hg38v2 genome (Gencode genomes are not masked like GIAB reference genomes. Gencode genomes are more suitable for RNASeq applications which require unmasked regions sometimes). I also see that you are using bwa-mem2 which is a vector accelerated version. github page says outputs are the same as regular bwa however I am concerned that there may be some discrepancies.
So here are a couple of things to test
1- Using GIAB hg38v2
2- Using bwa mem with regular settings
3- All of the above.
If any of the parameters could save those reads to to go MAPQ0 then that is the most likely culprit here. I do not recommend removing read filters or accepting MAPQ0 reads into variant calling which will open up a whole bunch of can of worms later.
-
for now this problem is over for me because i have added those parameters to Mutect2:
--read-filter NotSecondaryAlignmentReadFilter \
--read-filter NotSupplementaryAlignmentReadFilter \
--interval-padding 10 \also i use VarDict in collaboration with Mutect2 and i combine the results of these two callers.
sometimes VarDict rescue some SNV not identified by Mutect2.best
Please sign in to leave a comment.
8 comments