Spurios indel
REQUIRED for all errors and issues:
a) GATK version used: 4.3.0
b) Exact command used:
java -Xmx6g -jar gatk-package-4.3.0.0-local.jar HaplotypeCaller -R hg19.fa -L dummy.bed -mbq 15 -G StandardAnnotation --standard-min-confidence-threshold-for-calling 25 --disable-read-filter NotDuplicateReadFilter --kmer-size 30 --max-reads-per-alignment-start 0 -I dummy.bam -O dummy.vcf --dont-use-soft-clipped-bases
This command leads to a spurios confident indel call at position chr2:21225268. There is no evidence of this indel at all when looking at the reads (only 4 reads out of thousands have it). Removing "mbq 15" parameters removes the call (but this filter is important for other calls). Removing "kmer-size 30" also removes the indel call, but also removes the real SNP call at chr2:21225281.
I uploaded the data to: ulitsky_Jan25_2025.zip
c) Entire program log:
17:23:27.138 WARN GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardAnnotation) is enabled for this tool by default
17:23:27.174 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/labs/ulitsky/igoru/gatk/gatk4/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
17:23:27.258 INFO HaplotypeCaller - ------------------------------------------------------------
17:23:27.259 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.3.0.0
17:23:27.259 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
17:23:27.259 INFO HaplotypeCaller - Executing as igoru@login3.wexac.weizmann.ac.il on Linux v5.14.0-162.6.1.el9_1.x86_64 amd64
17:23:27.259 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v11.0.20+8
17:23:27.259 INFO HaplotypeCaller - Start Date/Time: January 25, 2025 at 5:23:27 PM IST
17:23:27.259 INFO HaplotypeCaller - ------------------------------------------------------------
17:23:27.259 INFO HaplotypeCaller - ------------------------------------------------------------
17:23:27.260 INFO HaplotypeCaller - HTSJDK Version: 3.0.1
17:23:27.260 INFO HaplotypeCaller - Picard Version: 2.27.5
17:23:27.260 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
17:23:27.260 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:23:27.260 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:23:27.260 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:23:27.260 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:23:27.260 INFO HaplotypeCaller - Deflater: IntelDeflater
17:23:27.260 INFO HaplotypeCaller - Inflater: IntelInflater
17:23:27.260 INFO HaplotypeCaller - GCS max retries/reopens: 20
17:23:27.260 INFO HaplotypeCaller - Requester pays: disabled
17:23:27.260 INFO HaplotypeCaller - Initializing engine
17:23:27.368 INFO FeatureManager - Using codec BEDCodec to read file file:///home/labs/ulitsky/igoru/gatk/wildmix/dummy.bed
17:23:27.372 INFO IntervalArgumentCollection - Processing 200 bp from intervals
17:23:27.375 INFO HaplotypeCaller - Done initializing engine
17:23:27.383 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
17:23:27.392 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/labs/ulitsky/igoru/gatk/gatk4/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:23:27.394 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/labs/ulitsky/igoru/gatk/gatk4/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
17:23:27.405 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions
17:23:27.405 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
17:23:27.405 INFO IntelPairHmm - Available threads: 256
17:23:27.406 INFO IntelPairHmm - Requested threads: 4
17:23:27.406 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
17:23:27.423 INFO ProgressMeter - Starting traversal
17:23:27.423 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
17:23:30.057 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position chr2:21225268 and possibly subsequent; at least 10 samples must have called genotypes
17:23:30.100 INFO HaplotypeCaller - 1 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
1 total reads filtered
17:23:30.100 INFO ProgressMeter - unmapped 0.0 3 67.2
17:23:30.100 INFO ProgressMeter - Traversal complete. Processed 3 total regions in 0.0 minutes.
17:23:30.111 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.015311268000000001
17:23:30.111 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.8879550620000001
17:23:30.111 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.54 sec
17:23:30.112 INFO HaplotypeCaller - Shutting down engine
[January 25, 2025 at 5:23:30 PM IST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.05 minutes.
-
Hi ulitskyi
Looks like you are using amplicon data with very high depths. When I looked at the toy data you provided that seemingly HOMVAR SNP call is also called as HET by HaplotypeCaller which sure tells me that this kind of data is usually not very compatible with HaplotypeCaller's default assembly engine.
Here are a couple of parameters that I recommend you to test. You need to add these parameters to HaplotypeCaller. My initial test showed me that SNP is now called as HOMVAR and the spurious INDEL is not called at all.
--pileup-detection true
--use-pdhmm true
--pileup-detection-enable-indel-pileup-calling trueThese parameters enable the partial hmms generated by the pileup detection mode to be added to the assembly engine to correct failed assemblies. Last parameter does not showup in the help text but believe us it is there. All of these parameters are also enabled by default when DRAGEN 3.7.8 compatibility mode is enabled. One thing to keep in mind is that when PDHMM is enabled GVCF output cannot be used (At least currently).
Normally when HaplotypeCaller is used without kmer and mbq parameters that indel site is not called at all and SNP site is called as HOMVAR correctly albeit AD values are not quite right. You might want to check your added parameters again if my additional parameters don't quite do justice for your calls.
I hope this helps.
Regards.
Please sign in to leave a comment.
1 comment