Why does my vcf contain a genotype when there are no reads supporting it?
Can you please provide
a) GATK version used
4.1.4.1
b) Exact GATK commands used:
gatk HaplotypeCaller --native-pair-hmm-threads 4 -I {sortedBamFile} -R {refFasta} -L {bedFile} -O {vcfFile} --output-mode EMIT_ALL_CONFIDENT_SITES --ERC BP_RESOLUTION
c) The entire error log if applicable:
[14:55:50 INF] Calling variants: /mnt/lab_share/... | /mnt/lab_share/...
Using GATK jar /home/.../gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/.../gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar HaplotypeCaller --native-pair-hmm-threads 4 -I /mnt/lab_share/... -R /mnt/lab_share/.../ref/hs38DH.fa -L /mnt/lab_share/... -O /mnt/lab_share/...-gatk.vcf --output-mode EMIT_ALL_CONFIDENT_SITES --ERC BP_RESOLUTION
14:55:52.771 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/.../gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 24, 2020 2:55:52 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
14:55:53.000 INFO HaplotypeCaller - ------------------------------------------------------------
14:55:53.001 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.4.1
14:55:53.001 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
14:55:53.001 INFO HaplotypeCaller - Executing as ... on Linux v4.1.12-124.35.1.el7uek.x86_64 amd64
14:55:53.001 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_241-b07
14:55:53.001 INFO HaplotypeCaller - Start Date/Time: February 24, 2020 2:55:52 PM CST
14:55:53.001 INFO HaplotypeCaller - ------------------------------------------------------------
14:55:53.001 INFO HaplotypeCaller - ------------------------------------------------------------
14:55:53.002 INFO HaplotypeCaller - HTSJDK Version: 2.21.0
14:55:53.002 INFO HaplotypeCaller - Picard Version: 2.21.2
14:55:53.002 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:55:53.002 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:55:53.002 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:55:53.002 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:55:53.002 INFO HaplotypeCaller - Deflater: IntelDeflater
14:55:53.002 INFO HaplotypeCaller - Inflater: IntelInflater
14:55:53.002 INFO HaplotypeCaller - GCS max retries/reopens: 20
14:55:53.002 INFO HaplotypeCaller - Requester pays: disabled
14:55:53.002 INFO HaplotypeCaller - Initializing engine
14:55:53.761 INFO FeatureManager - Using codec BEDCodec to read file file:///mnt/lab_share/...
14:55:53.785 INFO IntervalArgumentCollection - Processing 22 bp from intervals
14:55:53.803 INFO HaplotypeCaller - Done initializing engine
14:55:53.805 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
14:55:53.901 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
14:55:53.902 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
14:55:53.915 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/.../gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
14:55:53.919 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/.../gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
14:55:53.973 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
14:55:53.973 INFO IntelPairHmm - Available threads: 4
14:55:53.973 INFO IntelPairHmm - Requested threads: 4
14:55:53.973 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
14:55:54.037 INFO ProgressMeter - Starting traversal
14:55:54.038 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
14:55:54.975 WARN InbreedingCoeff - InbreedingCoeff will not be calculated; at least 10 samples must have called genotypes
14:55:58.306 INFO HaplotypeCaller - 30 read(s) filtered by: ((((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter) AND WellformedReadFilter)
30 read(s) filtered by: (((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter) AND GoodCigarReadFilter)
30 read(s) filtered by: ((((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter) AND NonZeroReferenceLengthAlignmentReadFilter)
30 read(s) filtered by: (((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter) AND PassesVendorQualityCheckReadFilter)
30 read(s) filtered by: ((((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter) AND NotDuplicateReadFilter)
30 read(s) filtered by: (((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter) AND NotSecondaryAlignmentReadFilter)
29 read(s) filtered by: ((MappingQualityReadFilter AND MappingQualityAvailableReadFilter) AND MappedReadFilter)
29 read(s) filtered by: (MappingQualityReadFilter AND MappingQualityAvailableReadFilter)
29 read(s) filtered by: MappingQualityReadFilter
1 read(s) filtered by: NotSecondaryAlignmentReadFilter
14:55:58.306 INFO ProgressMeter - chr22:35380679 0.1 22 309.3
14:55:58.306 INFO ProgressMeter - Traversal complete. Processed 22 total regions in 0.1 minutes.
14:55:58.511 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.005727958
14:55:58.511 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.18370724800000002
14:55:58.511 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 1.01 sec
14:55:58.513 INFO HaplotypeCaller - Shutting down engine
[February 24, 2020 2:55:58 PM CST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.10 minutes.
Runtime.totalMemory()=403701760
First off, I am very new to all of this and searched for previous posts that may contain my issue and a solution, but I was unable to find anything. In short, my issue is that the .vcf files I am producing will sometimes have a line where there are no supporting reads (DP = 0) but GATK has called the genotype as REF/REF (GT = 0/0). See example line from one of my files:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 109690516 . G <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:0,0:0:0:0,0,0
Based on what I read in The Variant Call Format (VCF) Version 4.2 Specification (https://samtools.github.io/hts-specs/VCFv4.2.pdf), "If a call cannot be made for a sample at a given locus, ‘.’ should be specified for each missing allele in the GT field (for example ‘./.’ for a diploid genotype and ‘.’ for haploid genotype)." So, my question is, why is the output for my example being expressed as 0/0 when I would think it should be ./.?
Thanks in advance,
Jeff
-
As a side note, I have removed some identifying text from the log I provided and replaced with "...".
-
I have a similar issue. INFO column of VCF only has counts of ALT allele, but haplotype caller called the position as heterozygous. Examining the BAM verified all reads were for ALT allele.
Any ideas?
-
Hi Mark Godek,
Please see this article that contains detailed information about HaplotypeCaller and why you might see unexpected results.
We also have other resources that may be helpful:
-
Check out the HaplotypeCaller option -bamout to view realigned reads
- Make sure that you are following our best practices including data pre-processing and variant calling workflows.
Please sign in to leave a comment.
3 comments