Haplotypecaller 4.5.2.0 --alleles unexpected output
Answered
- Output VCF file contains entries like below when '--alleles' is used:
chr5 125894866 . C T Infinity . AC=0;AF=0.00;AN=2;DP=31;ExcessHet=0.0000;FS=0.000;MLEAC=0;MLEAF=0.00;MQ=60.00;SOR=0.188 GT:AD:DP:GP:GQ:PG:PL 0/0:30,0:30:0,124.77,167.78:99:0,34.77,37.78:0,90,130
chr5 125895079 . A . 94.78 . AN=2;DP=37;MQ=60.00 GT:AD:DP 0/0:37:37
chr5 125895106 . G . 73.78 . AN=2;DP=28;MQ=60.00 GT:AD:DP 0/0:28:28
chr5 125895113 . G . 68.78 . AN=2;DP=25;MQ=60.00 GT:AD:DP 0/0:25:25
Position 125894866 is present in the --alleles file, and is genotyped correctrly as homozygous reference in the current sample. The following three positions are not present in the --alleles file, and do not contain an ALT allele in the output file ( dot for ALT). without '--alleles', these positions are not outputted.
Is this expected behaviour, and if so, why are they emitted ?
notes :
- same output is observed without threading ,
- same output is observed without dragen mode.
- --alleles is taken from a normal HC run.
- roughly the same heterozygous calls & hom.ALT calls are made with/without --alleles (which is expected behaviour)
=======================
REQUIRED for all errors and issues:
a) GATK version used: 4.2.5.0
b) Exact command used:
gatk --java-options "-Djava.io.tmpdir=/tmp -Xmx3g" HaplotypeCaller \
-R /home/gvandeweyer/elprep_streaming/reference/hg19.fasta \
-I /home/gvandeweyer/elprep_streaming/results/wesep-229191-f.bam \
-O results/wesep-229191-f.vcf \
--alleles affected_alleles.vcf \
-L 0005-scattered.interval_list \
-bamout results/wesep-229191-f.variants.bam \
-G StandardAnnotation -G StandardHCAnnotation \
--dragen-mode \
--dragstr-params-path /home/gvandeweyer/elprep_streaming/results/wesep-229191-f.bam.params \
--native-pair-hmm-threads 2
c) Entire program log:
(ELPREP) gvandeweyer@ngsvm-pipelines:~/elprep_streaming/VariantCalling_Test/scattered$ gatk --java-options "-Djava.io.tmpdir=/tmp -Xmx3g" HaplotypeCaller -R /home/gvandeweyer/elprep_streaming/reference/hg19.fasta -I /home/gvandeweyer/elprep_streaming/results/wesep-
229191-f.bam -O results/wesep-229191-f.vcf --alleles ../wesid-226998-m.haplotypecaller.final.vcf.gz -L 0005-scattered.interval_list -bamout results/wesep-229191-f.variants.bam -G StandardAnnotation -G StandardHCAnnotation --dragen-mode --dragstr-params-
path /home/gvandeweyer/elprep_streaming/results/wesep-229191-f.bam.params 2>&1 | tee Runtime.log.txt
Using GATK jar /home/gvandeweyer/miniconda3/envs/ELPREP/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Djava.io.tmpdir=/tmp -Xmx3g -jar /home/gvandeweyer/miniconda3/envs/ELPREP/share/gatk4-4.2.5.0-0/gatk-packa
ge-4.2.5.0-local.jar HaplotypeCaller -R /home/gvandeweyer/elprep_streaming/reference/hg19.fasta -I /home/gvandeweyer/elprep_streaming/results/wesep-229191-f.bam -O results/wesep-229191-f.vcf --alleles ../wesid-226998-m.haplotypecaller.final.vcf.gz -L 0005-scattered.inter
val_list -bamout results/wesep-229191-f.variants.bam -G StandardAnnotation -G StandardHCAnnotation --dragen-mode --dragstr-params-path /home/gvandeweyer/elprep_streaming/results/wesep-229191-f.bam.params
22:06:39.332 WARN GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardAnnotation) is enabled for this tool by default
22:06:39.337 WARN GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardHCAnnotation) is enabled for this tool by default
22:06:39.383 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/gvandeweyer/miniconda3/envs/ELPREP/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 12, 2022 10:06:39 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
22:06:39.543 INFO HaplotypeCaller - ------------------------------------------------------------
22:06:39.543 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.5.0
22:06:39.543 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
22:06:39.543 INFO HaplotypeCaller - Executing as gvandeweyer@ngsvm-pipelines.uza.be on Linux v4.4.0-210-generic amd64
22:06:39.543 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_312-b07
22:06:39.544 INFO HaplotypeCaller - Start Date/Time: March 12, 2022 10:06:39 PM CET
22:06:39.544 INFO HaplotypeCaller - ------------------------------------------------------------
22:06:39.544 INFO HaplotypeCaller - ------------------------------------------------------------
22:06:39.544 INFO HaplotypeCaller - HTSJDK Version: 2.24.1
22:06:39.544 INFO HaplotypeCaller - Picard Version: 2.25.4
22:06:39.544 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
22:06:39.544 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:06:39.545 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:06:39.545 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:06:39.545 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:06:39.545 INFO HaplotypeCaller - Deflater: IntelDeflater
22:06:39.545 INFO HaplotypeCaller - Inflater: IntelInflater
22:06:39.545 INFO HaplotypeCaller - GCS max retries/reopens: 20
22:06:39.545 INFO HaplotypeCaller - Requester pays: disabled
22:06:39.545 INFO HaplotypeCaller - Initializing engine
22:06:40.018 INFO FeatureManager - Using codec VCFCodec to read file file:///home/gvandeweyer/elprep_streaming/VariantCalling_Test/scattered/../wesid-226998-m.haplotypecaller.final.vcf.gz
22:06:40.091 INFO FeatureManager - Using codec IntervalListCodec to read file file:///home/gvandeweyer/elprep_streaming/VariantCalling_Test/scattered/0005-scattered.interval_list
22:06:40.375 INFO IntervalArgumentCollection - Processing 3884963 bp from intervals
22:06:40.411 INFO HaplotypeCaller - Done initializing engine
22:06:40.411 WARN HaplotypeCaller - *************************************************************************
22:06:40.411 WARN HaplotypeCaller - * DRAGEN-GATK mode enabled *
22:06:40.412 WARN HaplotypeCaller - * The following arguments have had their inputs overwritten: *
22:06:40.412 WARN HaplotypeCaller - * --apply-frd *
22:06:40.412 WARN HaplotypeCaller - * --apply-bqd *
22:06:40.412 WARN HaplotypeCaller - * --transform-dragen-mapping-quality *
22:06:40.412 WARN HaplotypeCaller - * --soft-clip-low-quality-ends *
22:06:40.413 WARN HaplotypeCaller - * --mapping-quality-threshold-for-genotyping 1 *
22:06:40.413 WARN HaplotypeCaller - * --minimum-mapping-quality 1 *
22:06:40.413 WARN HaplotypeCaller - * --allele-informative-reads-overlap-margin 1 *
22:06:40.413 WARN HaplotypeCaller - * --disable-cap-base-qualities-to-map-quality *
22:06:40.413 WARN HaplotypeCaller - * --enable-dynamic-read-disqualification-for-genotyping *
22:06:40.413 WARN HaplotypeCaller - * --expected-mismatch-rate-for-read-disqualification 0.03 *
22:06:40.413 WARN HaplotypeCaller - * --genotype-assignment-method USE_POSTERIOR_PROBABILITIES *
22:06:40.414 WARN HaplotypeCaller - * --padding-around-indels 150 *
22:06:40.414 WARN HaplotypeCaller - * --standard-min-confidence-threshold-for-calling 3.0 *
22:06:40.414 WARN HaplotypeCaller - * --use-posteriors-to-calculate-qual *
22:06:40.414 WARN HaplotypeCaller - * --allele-informative-reads-overlap-margin 1 *
22:06:40.414 WARN HaplotypeCaller - * *
22:06:40.414 WARN HaplotypeCaller - * If you would like to run DRAGEN-GATK with different inputs for any *
22:06:40.415 WARN HaplotypeCaller - * of the above arguments please manually construct the command. *
22:06:40.415 WARN HaplotypeCaller - *************************************************************************
22:06:40.437 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
22:06:40.484 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/gvandeweyer/miniconda3/envs/ELPREP/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
22:06:40.485 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/gvandeweyer/miniconda3/envs/ELPREP/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
22:06:40.515 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
22:06:40.516 INFO IntelPairHmm - Available threads: 4
22:06:40.516 INFO IntelPairHmm - Requested threads: 4
22:06:40.517 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
22:06:40.545 INFO ProgressMeter - Starting traversal
22:06:40.545 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
22:06:41.344 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position chr4:57843320 and possibly subsequent; at least 10 samples must have called genotypes
22:06:50.557 INFO ProgressMeter - chr4:69816964 0.2 570 3415.9
22:07:00.633 INFO ProgressMeter - chr4:74352584 0.3 1340 4002.4
22:07:10.827 INFO ProgressMeter - chr4:79856475 0.5 2370 4695.9
22:07:20.846 INFO ProgressMeter - chr4:88243684 0.7 3370 5017.2
22:07:32.170 INFO ProgressMeter - chr4:88577717 0.9 3470 4032.9
22:07:42.172 INFO ProgressMeter - chr4:102816662 1.0 4680 4556.4
22:07:52.185 INFO ProgressMeter - chr4:111397771 1.2 5780 4840.9
22:08:02.254 INFO ProgressMeter - chr4:122599470 1.4 6650 4883.2
22:08:12.357 INFO ProgressMeter - chr4:138450656 1.5 7590 4960.1
22:08:22.425 INFO ProgressMeter - chr4:147561485 1.7 8520 5017.7
22:08:32.475 INFO ProgressMeter - chr4:155719877 1.9 9500 5092.5
22:08:42.564 INFO ProgressMeter - chr4:169393558 2.0 10500 5163.1
22:08:52.656 INFO ProgressMeter - chr4:184628140 2.2 11430 5191.1
22:09:17.262 INFO ProgressMeter - chr4:190878453 2.6 12190 4667.0
22:09:31.627 INFO ProgressMeter - chr5:679327 2.9 12500 4383.9
22:09:41.640 INFO ProgressMeter - chr5:5303312 3.0 13080 4333.6
22:09:51.755 INFO ProgressMeter - chr5:15936459 3.2 14060 4411.9
22:10:01.826 INFO ProgressMeter - chr5:32403384 3.4 14710 4384.9
22:10:11.906 INFO ProgressMeter - chr5:38528906 3.5 15860 4502.2
22:10:22.512 INFO ProgressMeter - chr5:53815458 3.7 16970 4587.2
22:10:32.552 INFO ProgressMeter - chr5:64023887 3.9 17960 4644.7
22:10:42.616 INFO ProgressMeter - chr5:70888653 4.0 18930 4692.0
22:10:52.692 INFO ProgressMeter - chr5:78319994 4.2 20130 4790.1
22:11:02.777 INFO ProgressMeter - chr5:89939520 4.4 21130 4834.7
22:11:12.950 INFO ProgressMeter - chr5:96339058 4.5 21980 4841.3
22:11:23.401 INFO ProgressMeter - chr5:115188609 4.7 23170 4914.9
22:11:33.498 INFO ProgressMeter - chr5:127089898 4.9 24050 4925.7
22:11:33.815 INFO HaplotypeCaller - 69572 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
380 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
69952 total reads filtered
22:11:33.816 INFO ProgressMeter - chr5:127488298 4.9 24105 4931.6
22:11:33.816 INFO ProgressMeter - Traversal complete. Processed 24105 total regions in 4.9 minutes.
22:11:33.891 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 2.883281574
22:11:33.891 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 66.287158269
22:11:33.891 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 81.56 sec
22:11:35.558 INFO HaplotypeCaller - Shutting down engine
[March 12, 2022 10:11:35 PM CET] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 4.94 minutes.
Runtime.totalMemory()=1998061568
===========
See forum topic details at forum guidelines page: https://gatk.broadinstitute.org/hc/en-us/articles/360053845952-Forum-Guidelines
-
Hi Geert Vandeweyer,
Thank you for writing in about this. We are not sure why these variants would be output and we would like to take a closer look. And just to double check - you are certain that these variants are not in your affected_alleles.vcf?
Best,
Genevieve
-
Yes, but I'll double check. If you can send me a private message, I can share the data. (Patient material)
-
Geert Vandeweyer okay thank you. You only need to provide a small portion of your BAM file, just around 100 bases before and after this region. We have instructions here for how to upload a bug report: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671
-
Genevieve Brandt (she/her) : I've uploaded the data to the FPT client. Name == HaplotypeCaller.unknown_alleles.tar.gz
There's one of the unexpected alleles called in the provided region.
Thanks for looking into it !
-
Hi Geert Vandeweyer,
We took a look at your files, thank you for making the process so easy for us!
The only variant we were able to replicate on our end that was called in addition to chr5:125894866 was chr5:125895106. It does look like a bug and we think we have identified the cause as an edge case in the genotyper code. I have created a ticket in our repo so that our developer team can fix this issue: https://github.com/broadinstitute/gatk/issues/7741
For your specific issue with these variants popping up now, you can always remove them with SelectVariants.
Thank you for bringing this to our attention and please let me know if you have any other questions.
Best,
Genevieve
-
We were able to get a fix for this that will be in the next GATK release. https://github.com/broadinstitute/gatk/pull/7740
Thank you so much again for your very thorough files, they really helped us solve this quickly!
-
Thank you for looking into the problem !
Please sign in to leave a comment.
7 comments