Declared strand bias not observed in igv (only in local reassembly)
Hi all, I have this variant which is called on an amplicon design:
chr17 7669658 . G A . clustered_events;strand_bias AS_FilterStatus=strand_bias;AS_SB_TABLE=418;DP=2454;ECNT=6;GERMQ=93;MBQ=20,20;MFRL=144,144;MMQ=60,60;MPOS=63;POPAF=7.3;TLOD=68.15;BAM=Clipped;REL=High_Confidence GT:AD:AF:DP:F1R2:F2R1:FAD:SB:FMC 0/1:2384,62:0.016:2446:0,0:1912,27:1967,31:418,1966,31,31:2,18
as you can see, it is tagged as strand_bias by FilterMutectCalls. When I checked SB format value (according to vcf header: ##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">), I quickly saw that strand bias (since 418 <<< 1966). However, when I checked that in igv (top track is bwa-mem2 aligned bam and bottom track is locally realigned by mutect2 bam):
I don't see that strand bias. Many of the forward reads are gone (I asume they mapped somewhere else?). I find this a bit weird, since this is an amplicon based design.
My first question is which value should I use to perform the Fisher's exact test to evaluate the possible strand bias F1R2 and F2R1 or SB. How are these fields different?
My other question is about how is it possible for the SB first value (ref forward) to be so different from what I actually observe in the igv. This happens in more variants in my vcf.
P.D.: this variant has been called by other variant such as lofreq, but it was NOT discarded as strand bias. So, I would like to know more about this local reassembly and where have the "missing" reads gone given that, as I mentioned, this is an amplicon design.
REQUIRED for all errors and issues:
a) GATK version used:
The Genome Analysis Toolkit (GATK) v4.4.0.0
HTSJDK Version: 3.0.5
Picard Version: 3.0.0
b1) Exact Mutect2 command used:
/nfs/home/software/packages/gatk-4.4.0.0/gatk --java-options -Xmx40g Mutect2 --dont-use-soft-clipped-bases true --genotype-germline-sites --min-base-quality-score 20 --max-reads-per-alignment-start 0 --native-pair-hmm-threads 16 -R /nfs/home/references/genomes/human/GRCh38/bwa_mem2/Homo_sapiens.GRCh38.fa -I /media/scratch0/20231003_desarrollo_rutina_new_callers//analysis_TP53_100x/mapping/TP53-RED53-79484187.bam -L /media/scratch0/20231003_desarrollo_rutina_new_callers//analysis_TP53_100x/QC/mapping/TP53-RED53-79484187_covered_regions_100x.bed -O /media/scratch0/20231003_desarrollo_rutina_new_callers//tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1.vcf
b2) Exact FilterMutectCalls command used:
/nfs/home/software/packages/gatk-4.4.0.0/gatk --java-options -Xmx40g FilterMutectCalls --max-events-in-region 4 -V /media/scratch0/20231003_desarrollo_rutina_new_callers//tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1.vcf -R /nfs/home/references/genomes/human/GRCh38/bwa_mem2/Homo_sapiens.GRCh38.fa -O /media/scratch0/20231003_desarrollo_rutina_new_callers//tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1_filtered.vcf
c) Entire program log:
/nfs/home/software/packages/gatk-4.4.0.0/gatk --java-options -Xmx40g Mutect2 --dont-use-soft-clipped-bases true --genotype-germline-sites --min-base-quality-score 20 --max-reads-per-alignment-start 0 --native-pair-hmm-threads 16 -R /nfs/home/references/genomes/human/GRCh38/bwa_mem2/Homo_sapiens.GRCh38.fa -I /media/scratch0/20231003_desarrollo_rutina_new_callers//analysis_TP53_100x/mapping/TP53-RED53-79484187.bam -L /media/scratch0/20231003_desarrollo_rutina_new_callers//analysis_TP53_100x/QC/mapping/TP53-RED53-79484187_covered_regions_100x.bed -O /media/scratch0/20231003_desarrollo_rutina_new_callers//tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1.vcf
Using GATK jar /nfs/home/software/packages/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx40g -jar /nfs/home/software/packages/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar Mutect2 --dont-use-soft-clipped-bases true --genotype-germline-sites --min-base-quality-score 20 --max-reads-per-alignment-start 0 --native-pair-hmm-threads 16 -R /nfs/home/references/genomes/human/GRCh38/bwa_mem2/Homo_sapiens.GRCh38.fa -I /media/scratch0/20231003_desarrollo_rutina_new_callers//analysis_TP53_100x/mapping/TP53-RED53-79484187.bam -L /media/scratch0/20231003_desarrollo_rutina_new_callers//analysis_TP53_100x/QC/mapping/TP53-RED53-79484187_covered_regions_100x.bed -O /media/scratch0/20231003_desarrollo_rutina_new_callers//tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1.vcf
13:13:35.589 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/nfs/home/software/packages/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:13:35.624 INFO Mutect2 - ------------------------------------------------------------
13:13:35.627 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.4.0.0
13:13:35.627 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
13:13:35.627 INFO Mutect2 - Executing as fmartinez@vlinuxcervantes4srv on Linux v6.2.0-26-generic amd64
13:13:35.627 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v18.0.2-ea+9-Ubuntu-222.04
13:13:35.627 INFO Mutect2 - Start Date/Time: October 31, 2023 at 1:13:35 PM CET
13:13:35.627 INFO Mutect2 - ------------------------------------------------------------
13:13:35.627 INFO Mutect2 - ------------------------------------------------------------
13:13:35.628 INFO Mutect2 - HTSJDK Version: 3.0.5
13:13:35.628 INFO Mutect2 - Picard Version: 3.0.0
13:13:35.628 INFO Mutect2 - Built for Spark Version: 3.3.1
13:13:35.629 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:13:35.629 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:13:35.629 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:13:35.629 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:13:35.629 INFO Mutect2 - Deflater: IntelDeflater
13:13:35.629 INFO Mutect2 - Inflater: IntelInflater
13:13:35.629 INFO Mutect2 - GCS max retries/reopens: 20
13:13:35.630 INFO Mutect2 - Requester pays: disabled
13:13:35.630 INFO Mutect2 - Initializing engine
13:13:35.787 INFO FeatureManager - Using codec BEDCodec to read file file:///media/scratch0/20231003_desarrollo_rutina_new_callers/analysis_TP53_100x/QC/mapping/TP53-RED53-79484187_covered_regions_100x.bed
13:13:35.793 INFO IntervalArgumentCollection - Processing 1887 bp from intervals
13:13:35.799 INFO Mutect2 - Done initializing engine
13:13:35.817 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/nfs/home/software/packages/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
13:13:35.820 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/nfs/home/software/packages/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
13:13:35.833 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
13:13:35.833 INFO IntelPairHmm - Available threads: 20
13:13:35.833 INFO IntelPairHmm - Requested threads: 16
13:13:35.833 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
13:13:35.882 INFO ProgressMeter - Starting traversal
13:13:35.883 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
13:20:27.465 INFO ProgressMeter - chr17:7673687 6.9 10 1.5
13:26:09.576 INFO ProgressMeter - chr17:7675962 12.6 20 1.6
13:26:30.911 INFO Mutect2 - 0 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappedReadFilter
22 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonChimericOriginalAlignmentReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: ReadLengthReadFilter
2 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
24 total reads filtered out of 813991 reads processed
13:26:30.911 INFO ProgressMeter - chr17:7675962 12.9 26 2.0
13:26:30.911 INFO ProgressMeter - Traversal complete. Processed 26 total regions in 12.9 minutes.
13:26:30.923 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.866022091
13:26:30.923 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 90.21488249100001
13:26:30.923 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 16.62 sec
13:26:30.924 INFO Mutect2 - Shutting down engine
[October 31, 2023 at 1:26:30 PM CET] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 12.92 minutes.
Runtime.totalMemory()=13220446208
Tool returned:
SUCCESS
/nfs/home/software/packages/gatk-4.4.0.0/gatk --java-options -Xmx40g FilterMutectCalls --max-events-in-region 4 -V /media/scratch0/20231003_desarrollo_rutina_new_callers//tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1.vcf -R /nfs/home/references/genomes/human/GRCh38/bwa_mem2/Homo_sapiens.GRCh38.fa -O /media/scratch0/20231003_desarrollo_rutina_new_callers//tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1_filtered.vcf
Using GATK jar /nfs/home/software/packages/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx40g -jar /nfs/home/software/packages/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar FilterMutectCalls --max-events-in-region 4 -V /media/scratch0/20231003_desarrollo_rutina_new_callers//tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1.vcf -R /nfs/home/references/genomes/human/GRCh38/bwa_mem2/Homo_sapiens.GRCh38.fa -O /media/scratch0/20231003_desarrollo_rutina_new_callers//tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1_filtered.vcf
13:26:33.475 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/nfs/home/software/packages/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:26:33.508 INFO FilterMutectCalls - ------------------------------------------------------------
13:26:33.511 INFO FilterMutectCalls - The Genome Analysis Toolkit (GATK) v4.4.0.0
13:26:33.511 INFO FilterMutectCalls - For support and documentation go to https://software.broadinstitute.org/gatk/
13:26:33.511 INFO FilterMutectCalls - Executing as fmartinez@vlinuxcervantes4srv on Linux v6.2.0-26-generic amd64
13:26:33.511 INFO FilterMutectCalls - Java runtime: OpenJDK 64-Bit Server VM v18.0.2-ea+9-Ubuntu-222.04
13:26:33.511 INFO FilterMutectCalls - Start Date/Time: October 31, 2023 at 1:26:33 PM CET
13:26:33.512 INFO FilterMutectCalls - ------------------------------------------------------------
13:26:33.512 INFO FilterMutectCalls - ------------------------------------------------------------
13:26:33.512 INFO FilterMutectCalls - HTSJDK Version: 3.0.5
13:26:33.513 INFO FilterMutectCalls - Picard Version: 3.0.0
13:26:33.513 INFO FilterMutectCalls - Built for Spark Version: 3.3.1
13:26:33.513 INFO FilterMutectCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:26:33.513 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:26:33.514 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:26:33.514 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:26:33.514 INFO FilterMutectCalls - Deflater: IntelDeflater
13:26:33.514 INFO FilterMutectCalls - Inflater: IntelInflater
13:26:33.514 INFO FilterMutectCalls - GCS max retries/reopens: 20
13:26:33.514 INFO FilterMutectCalls - Requester pays: disabled
13:26:33.515 INFO FilterMutectCalls - Initializing engine
13:26:33.658 INFO FeatureManager - Using codec VCFCodec to read file file:///media/scratch0/20231003_desarrollo_rutina_new_callers/tmp_TP53_100x/variant_calling/TP53-RED53-79484187_mutect2_tmp1.vcf
13:26:33.681 INFO FilterMutectCalls - Done initializing engine
13:26:33.735 INFO ProgressMeter - Starting traversal
13:26:33.735 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
13:26:33.736 INFO FilterMutectCalls - Starting pass 0 through the variants
13:26:33.944 INFO FilterMutectCalls - Finished pass 0 through the variants
13:26:34.002 INFO FilterMutectCalls - Starting pass 1 through the variants
13:26:34.073 INFO FilterMutectCalls - Finished pass 1 through the variants
13:26:34.083 INFO FilterMutectCalls - Starting pass 2 through the variants
13:26:34.150 INFO FilterMutectCalls - Finished pass 2 through the variants
13:26:34.151 INFO FilterMutectCalls - Starting pass 3 through the variants
13:26:34.247 INFO FilterMutectCalls - Finished pass 3 through the variants
13:26:34.255 INFO FilterMutectCalls - No variants filtered by: AllowAllVariantsVariantFilter
13:26:34.256 INFO FilterMutectCalls - 0 read(s) filtered by: AllowAllReadsReadFilter 13:26:34.256 INFO ProgressMeter - unmapped 0.0 392 45144.0
13:26:34.256 INFO ProgressMeter - Traversal complete. Processed 392 total variants in 0.0 minutes.
13:26:34.262 INFO FilterMutectCalls - Shutting down engine
[October 31, 2023 at 1:26:34 PM CET] org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=469762048
-
Hi Fran
Mutect2 uses local reassembly to generate genotype calls therefore there are many parameters involving the decisions given within each local reassembly.
Looking at the bam views and your VCF line your variant is not only filtered due to Strand Bias but also due to clustered events nearby as they can be seen in the IGV image. Those clustered events are mostly due to low BQ basecalls therefore most of them get pruned within the local reassembly causing a strand bias for the reference nucleotide G. This in the end causes the whole variant to be filtered due to clustered events and strand bias.
Looking at the command line parameters you have you seem to have modified your
--max-events-in-region 4
which is directly involved in the creation of the clustered events filter. Those reads look like there may be more clustered events therefore increasing this parameter may change the way this variant gets filtered in the local reassembly.
Also since those clustered base changes will be of low BQ using the parameter
--min-base-quality-score 20
will get those bases pruned almost immediately. The default value of 10 is a much less strict than 20 and may help your reference base to be included with more balanced strand values.
Before the final one additional comment from our team is that although SB annotation in the FORMAT field is shared between HaplotypeCaller and Mutect2, unlike HaplotypeCaller Mutect2 does not use Fisher Exact Test for the strand bias filter but has its own model.
If you wish to get into more details about the local reassembly we do have documentation in the link down below.
https://github.com/broadinstitute/gatk/blob/master/docs/local_assembly.pdf
Also if you wish to understand more about the filtering strategy
https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf
I hope these help.
Please sign in to leave a comment.
1 comment