HaplotypeCaller not calling variants in the beginning of an alignment (except when --alleles is specified)
Dear all,
There are two variants in my viral dataset that are not called even though they are present in all reads (at high coverage): A SNP at position 5 (G > A) and a 1 bp indel at position 11. The same command worked correctly identified all datasets in other, very similar datasets.
I have used the following command (GATK 4.1.9.0):
gatk HaplotypeCaller --reference ref.fasta --input mapped_reads.sorted-rg_upd.bam --output variants.vcf --sample-ploidy 1 --max-reads-per-alignment-start 50 --verbosity DEBUG --bam-output tmp.bam;
Which gives the following output:
Using GATK jar /usr/local/bin/lmod/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/bin/lmod/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller --reference ref.fasta --input mapped_reads.sorted-rg_upd.bam --output gatk/variants.vcf --sample-ploidy 1 --max-reads-per-alignment-start 50 --verbosity DEBUG --bam-output gatk/output.bam
15:56:45.681 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/bin/lmod/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:56:45.697 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /tmp/libgkl_compression609740992347016330.so
Feb 03, 2023 3:56:45 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:56:45.814 INFO HaplotypeCaller - ------------------------------------------------------------
15:56:45.815 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.9.0
15:56:45.815 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
15:56:45.815 INFO HaplotypeCaller - Executing as bebogaerts@bioit-groot on Linux v5.4.0-135-generic amd64
15:56:45.815 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_172-b11
15:56:45.815 INFO HaplotypeCaller - Start Date/Time: February 3, 2023 3:56:45 PM UTC
15:56:45.815 INFO HaplotypeCaller - ------------------------------------------------------------
15:56:45.815 INFO HaplotypeCaller - ------------------------------------------------------------
15:56:45.816 INFO HaplotypeCaller - HTSJDK Version: 2.23.0
15:56:45.816 INFO HaplotypeCaller - Picard Version: 2.23.3
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.BUFFER_SIZE : 131072
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.CREATE_INDEX : false
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.CREATE_MD5 : false
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.CUSTOM_READER_FACTORY :
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.REFERENCE_FASTA : null
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:56:45.817 INFO HaplotypeCaller - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
15:56:45.817 DEBUG ConfigFactory - Configuration file values:
15:56:45.820 DEBUG ConfigFactory - gcsMaxRetries = 20
15:56:45.820 DEBUG ConfigFactory - gcsProjectForRequesterPays =
15:56:45.820 DEBUG ConfigFactory - gatk_stacktrace_on_user_exception = false
15:56:45.820 DEBUG ConfigFactory - samjdk.use_async_io_read_samtools = false
15:56:45.820 DEBUG ConfigFactory - samjdk.use_async_io_write_samtools = true
15:56:45.820 DEBUG ConfigFactory - samjdk.use_async_io_write_tribble = false
15:56:45.820 DEBUG ConfigFactory - samjdk.compression_level = 2
15:56:45.820 DEBUG ConfigFactory - spark.kryoserializer.buffer.max = 512m
15:56:45.821 DEBUG ConfigFactory - spark.driver.maxResultSize = 0
15:56:45.821 DEBUG ConfigFactory - spark.driver.userClassPathFirst = true
15:56:45.821 DEBUG ConfigFactory - spark.io.compression.codec = lzf
15:56:45.821 DEBUG ConfigFactory - spark.executor.memoryOverhead = 600
15:56:45.821 DEBUG ConfigFactory - spark.driver.extraJavaOptions =
15:56:45.821 DEBUG ConfigFactory - spark.executor.extraJavaOptions =
15:56:45.821 DEBUG ConfigFactory - codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
15:56:45.821 DEBUG ConfigFactory - read_filter_packages = [org.broadinstitute.hellbender.engine.filters]
15:56:45.821 DEBUG ConfigFactory - annotation_packages = [org.broadinstitute.hellbender.tools.walkers.annotator]
15:56:45.821 DEBUG ConfigFactory - cloudPrefetchBuffer = 40
15:56:45.821 DEBUG ConfigFactory - cloudIndexPrefetchBuffer = -1
15:56:45.821 DEBUG ConfigFactory - createOutputBamIndex = true
15:56:45.821 INFO HaplotypeCaller - Deflater: IntelDeflater
15:56:45.822 INFO HaplotypeCaller - Inflater: IntelInflater
15:56:45.822 INFO HaplotypeCaller - GCS max retries/reopens: 20
15:56:45.822 INFO HaplotypeCaller - Requester pays: disabled
15:56:45.822 INFO HaplotypeCaller - Initializing engine
15:56:46.114 DEBUG GenomeLocParser - Prepared reference sequence contig dictionary
15:56:46.115 DEBUG GenomeLocParser - NA (1406 bp)
15:56:46.117 DEBUG GenomeLocParser - Prepared reference sequence contig dictionary
15:56:46.117 DEBUG GenomeLocParser - NA (1406 bp)
15:56:46.119 DEBUG GenomeLocParser - Prepared reference sequence contig dictionary
15:56:46.119 DEBUG GenomeLocParser - NA (1406 bp)
15:56:46.119 INFO HaplotypeCaller - Done initializing engine
15:56:46.128 INFO HaplotypeCallerEngine - Currently, physical phasing is only available for diploid samples.
DEBUG 2023-02-03 15:56:46 BlockCompressedOutputStream Using deflater: IntelDeflater
15:56:46.166 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/usr/local/bin/lmod/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
15:56:46.168 DEBUG NativeLibraryLoader - Extracting libgkl_utils.so to /tmp/libgkl_utils3808127199146473835.so
15:56:46.169 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/usr/local/bin/lmod/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
15:56:46.185 DEBUG NativeLibraryLoader - Extracting libgkl_pairhmm_omp.so to /tmp/libgkl_pairhmm_omp8422231514735189967.so
15:56:46.197 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
15:56:46.197 INFO IntelPairHmm - Available threads: 1
15:56:46.197 INFO IntelPairHmm - Requested threads: 4
15:56:46.197 WARN IntelPairHmm - Using 1 available threads, but 4 were requested
15:56:46.198 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
15:56:46.222 INFO ProgressMeter - Starting traversal
15:56:46.222 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
15:56:46.227 DEBUG ReadsPathDataSource - Preparing readers for traversal
15:56:46.228 DEBUG SamReaderQueryingIterator - Preparing intervals for traversal
15:56:46.295 DEBUG IntToDoubleFunctionCache - cache miss 2 > -1 expanding to 12
15:56:46.319 DEBUG IntToDoubleFunctionCache - cache miss 2 > -1 expanding to 12
15:56:46.463 DEBUG HaplotypeCaller - Processing assembly region at NA:1-57 isActive: true numReads: 386
15:56:46.553 DEBUG ReadThreadingGraph - Recovered 2 of 2 dangling tails
15:56:46.556 DEBUG ReadThreadingGraph - Recovered 0 of 1 dangling heads
15:56:46.597 DEBUG ReadThreadingGraph - Recovered 2 of 2 dangling tails
15:56:46.598 DEBUG ReadThreadingGraph - Recovered 0 of 1 dangling heads
15:56:46.613 DEBUG IntToDoubleFunctionCache - cache miss 123 > 12 expanding to 133
15:56:46.613 DEBUG IntToDoubleFunctionCache - cache miss 169 > 133 expanding to 268
15:56:46.614 DEBUG IntToDoubleFunctionCache - cache miss 325 > 268 expanding to 538
15:56:46.709 DEBUG HaplotypeCallerEngine - Active Region NA:1-57
15:56:46.709 DEBUG HaplotypeCallerEngine - Extended Act Region NA:1-157
15:56:46.709 DEBUG HaplotypeCallerEngine - Ref haplotype coords NA:1-157
15:56:46.709 DEBUG HaplotypeCallerEngine - Haplotype count 128
15:56:46.709 DEBUG HaplotypeCallerEngine - Kmer sizes count 0
15:56:46.709 DEBUG HaplotypeCallerEngine - Kmer sizes values []
15:56:46.812 DEBUG GenotypeLikelihoodCalculators - Expanding capacity ploidy:2->2 allele:1->2
15:56:46.816 DEBUG GenotypeLikelihoodCalculators - Expanding capacity ploidy:2->2 allele:1->2
15:56:46.878 DEBUG HaplotypeCaller - Processing assembly region at NA:58-357 isActive: false numReads: 975
15:56:46.905 DEBUG HaplotypeCaller - Processing assembly region at NA:358-657 isActive: false numReads: 1535
15:56:46.926 DEBUG HaplotypeCaller - Processing assembly region at NA:658-957 isActive: false numReads: 1503
15:56:46.929 DEBUG HaplotypeCaller - Processing assembly region at NA:958-1257 isActive: false numReads: 1195
15:56:46.930 DEBUG HaplotypeCaller - Processing assembly region at NA:1258-1347 isActive: false numReads: 589
15:56:46.930 DEBUG HaplotypeCaller - Processing assembly region at NA:1348-1406 isActive: true numReads: 464
15:56:46.965 DEBUG ReadThreadingGraph - Recovered 0 of 7 dangling tails
15:56:46.965 DEBUG ReadThreadingGraph - Recovered 0 of 1 dangling heads
15:56:46.992 DEBUG ReadThreadingGraph - Recovered 0 of 7 dangling tails
15:56:46.992 DEBUG ReadThreadingGraph - Recovered 0 of 2 dangling heads
15:56:47.008 DEBUG HaplotypeCallerEngine - Active Region NA:1348-1406
15:56:47.008 DEBUG HaplotypeCallerEngine - Extended Act Region NA:1248-1406
15:56:47.008 DEBUG HaplotypeCallerEngine - Ref haplotype coords NA:1248-1406
15:56:47.008 DEBUG HaplotypeCallerEngine - Haplotype count 48
15:56:47.008 DEBUG HaplotypeCallerEngine - Kmer sizes count 0
15:56:47.008 DEBUG HaplotypeCallerEngine - Kmer sizes values []
15:56:47.030 INFO HaplotypeCaller - 29 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
29 total reads filtered
15:56:47.032 INFO ProgressMeter - unmapped 0.0 7 519.8
15:56:47.032 INFO ProgressMeter - Traversal complete. Processed 7 total regions in 0.0 minutes.
15:56:47.038 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.002506143
15:56:47.038 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.009118448000000001
15:56:47.038 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.09 sec
15:56:47.071 INFO HaplotypeCaller - Shutting down engine
[February 3, 2023 3:56:47 PM UTC] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2204631040
I have tried to change many parameters but the variants never show up (I have followed these steps:https://gatk.broadinstitute.org/hc/en-us/articles/360035891111-Expected-variant-at-a-specific-site-was-not-called). I have tried the '--alleles' option which gives the following line for the SNP at position 6:
NA 6 . G A 2230.04 . AC=1;AF=1.00;AN=1;DP=63;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=43.91;QD=26.81;SOR=3.407 GT:AD:DP:GQ:PL 1:0,62:62:99:2240,0
Which, to my knowledge, looks like a high quality variant. Could you help me identify why the variant is not called? And why the resulting bam output is empty.
You can find the data here:
https://drive.google.com/file/d/1zfSnBOWh_FT5PTpjvT3Ye7SbG_H9MFPH/view?usp=share_link
Let me know if you need any more info.
Many thanks in advance!
Best,
Bert
-
Hello Bert Bogaerts. We have a standard list of advice to try if variants are missing from your output. Here are some steps you should try to identify what might have gone wrong in your case: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant.
Please sign in to leave a comment.
1 comment