Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
REQUIRED for all errors and issues:
a) GATK version used:
b) Exact command used:
c) Entire program log:
See forum topic details at forum guidelines page: https://gatk.broadinstitute.org/hc/en-us/articles/360053845952-Forum-Guidelines
Hi there, I am new to GATK. Below is the command I am running:
gatk VariantRecalibrator -R Homo_sapiens_hg38.fasta -L chrM -V SNP_samples.vcf --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.90 -tranche 99.80 -tranche 99.70 -tranche 99.60 -tranche 99.50 -tranche 99.40 -tranche 99.30 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an SOR -an DP -mode SNP --max-gaussians 6 --resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 omni.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 1000GI.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp138.vcf -O input_snp.recal --tranches-file input.tranches
I got the following warning:
WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
and the following error:
A USER ERROR has occurred: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
Here is my full log:
Using GATK jar /home/tahi/anaconda3/envs/annot/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/tahi/anaconda3/envs/annot/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar VariantRecalibrator -R known_snp/Homo_sapiens_hg38.fasta -L chrM -V SNP_samples.vcf --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.90 -tranche 99.80 -tranche 99.70 -tranche 99.60 -tranche 99.50 -tranche 99.40 -tranche 99.30 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an SOR -an DP -mode SNP --max-gaussians 6 --resource:hapmap,known=false,training=true,truth=true,prior=15.0 known_snp/hapmap.vcf.gz --resource:omni,known=false,training=true,truth=false,prior=12.0 known_snp/omni.vcf.gz --resource:1000G,known=false,training=true,truth=false,prior=10.0 known_snp/1000GI.vcf.gz --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 known_snp/dbsnp138.vcf -O input_snp.recal --tranches-file input.tranches
03:39:03.716 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/tahi/anaconda3/envs/annot/share/gatk4-4.2.5.0-0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 22, 2022 3:39:03 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
03:39:03.886 INFO VariantRecalibrator - ------------------------------------------------------------
03:39:03.886 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.2.5.0
03:39:03.886 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
03:39:03.886 INFO VariantRecalibrator - Executing as tahi@tahi-GL553VD on Linux v5.11.0-49-generic amd64
03:39:03.886 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v11.0.9.1-internal+0-adhoc..src
03:39:03.886 INFO VariantRecalibrator - Start Date/Time: February 22, 2022 at 3:39:03 AM EST
03:39:03.886 INFO VariantRecalibrator - ------------------------------------------------------------
03:39:03.886 INFO VariantRecalibrator - ------------------------------------------------------------
03:39:03.887 INFO VariantRecalibrator - HTSJDK Version: 2.24.1
03:39:03.887 INFO VariantRecalibrator - Picard Version: 2.25.4
03:39:03.887 INFO VariantRecalibrator - Built for Spark Version: 2.4.5
03:39:03.887 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
03:39:03.887 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
03:39:03.887 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
03:39:03.887 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
03:39:03.888 INFO VariantRecalibrator - Deflater: IntelDeflater
03:39:03.888 INFO VariantRecalibrator - Inflater: IntelInflater
03:39:03.888 INFO VariantRecalibrator - GCS max retries/reopens: 20
03:39:03.888 INFO VariantRecalibrator - Requester pays: disabled
03:39:03.888 INFO VariantRecalibrator - Initializing engine
03:39:04.153 INFO FeatureManager - Using codec VCFCodec to read file file:///home/tahi/Working_Space/NGS/analysis/6.calling/6.2.variant_discovery/known_snp/hapmap.vcf.gz
03:39:04.319 INFO FeatureManager - Using codec VCFCodec to read file file:///home/tahi/Working_Space/NGS/analysis/6.calling/6.2.variant_discovery/known_snp/omni.vcf.gz
03:39:04.382 INFO FeatureManager - Using codec VCFCodec to read file file:///home/tahi/Working_Space/NGS/analysis/6.calling/6.2.variant_discovery/known_snp/1000GI.vcf.gz
03:39:04.439 INFO FeatureManager - Using codec VCFCodec to read file file:///home/tahi/Working_Space/NGS/analysis/6.calling/6.2.variant_discovery/known_snp/dbsnp138.vcf
03:39:04.531 INFO FeatureManager - Using codec VCFCodec to read file file:///home/tahi/Working_Space/NGS/analysis/6.calling/6.2.variant_discovery/SNP_samples.vcf
03:39:04.612 INFO IntervalArgumentCollection - Processing 16569 bp from intervals
03:39:04.671 INFO VariantRecalibrator - Done initializing engine
03:39:04.673 INFO TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
03:39:04.673 INFO TrainingSet - Found omni track: Known = false Training = true Truth = false Prior = Q12.0
03:39:04.673 INFO TrainingSet - Found 1000G track: Known = false Training = true Truth = false Prior = Q10.0
03:39:04.673 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
03:39:04.690 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
03:39:04.784 INFO ProgressMeter - Starting traversal
03:39:04.784 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
03:39:04.803 INFO ProgressMeter - unmapped 0.0 28 88421.1
03:39:04.803 INFO ProgressMeter - Traversal complete. Processed 28 total variants in 0.0 minutes.
03:39:04.811 INFO VariantRecalibrator - Shutting down engine
[February 22, 2022 at 3:39:04 AM EST] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=470810624
***********************************************************************
A USER ERROR has occurred: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
I would appreciate your help finding the underlying problem, as I could not find a solution on the forum.
Warm regards
-
Hi tahi,
One of the annotations you are using to build your VariantRecalibrator model is QD (-an QD), but your VCF file does not contain the QD annotation.
You can add annotations to your VCF file with the GATK tool VariantAnnotator. The tool documentation page is here: https://gatk.broadinstitute.org/hc/en-us/articles/4418054223003-VariantAnnotator
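Before (or after) running VariantAnnotator, one way to confirm whether the input VCF actually carries QD is to count how many records have it in the INFO column. The following is a minimal Python sketch using only the standard library; the helper names open_vcf, info_keys and qd_summary are hypothetical and not part of GATK. It assumes a plain or bgzip-compressed VCF with the standard eight fixed columns, and it reports both whether the header declares QD and how many records actually have a value, since a header declaration alone does not mean the values are present.

```python
import gzip
from typing import Dict, Iterable, Iterator, Set, Union


def open_vcf(path: str) -> Iterator[str]:
    """Yield text lines from a plain or bgzip-compressed (.gz) VCF (bgzip is gzip-compatible)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def info_keys(info_field: str) -> Set[str]:
    """Return the INFO keys of one record, e.g. 'AC=2;QD=12.3;DB' -> {'AC', 'QD', 'DB'}."""
    if info_field == ".":
        return set()
    return {entry.split("=", 1)[0] for entry in info_field.split(";")}


def qd_summary(lines: Iterable[str]) -> Dict[str, Union[bool, int]]:
    """Count data records and how many carry QD; also report whether the header declares QD."""
    header_declares_qd = False
    records = with_qd = 0
    for line in lines:
        if line.startswith("##INFO=<ID=QD,"):
            header_declares_qd = True
        elif line.startswith("#") or not line.strip():
            continue  # other header lines and blank lines
        else:
            fields = line.rstrip("\n").split("\t")
            records += 1
            if "QD" in info_keys(fields[7]):  # column 8 is INFO
                with_qd += 1
    return {
        "header_declares_qd": header_declares_qd,
        "records": records,
        "records_with_qd": with_qd,
    }
```

For example, qd_summary(open_vcf("SNP_samples.vcf")) should report records_with_qd greater than zero if QD was annotated; a value of zero would mean the annotation still has to be added, for instance with VariantAnnotator.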
Please let me know if you have any further questions.
Best,
Genevieve
-
Dear Dr. Genevieve Brandt,
I got the same problem with GATK 4.3.0. Based on your suggestion, I ran "VariantRecalibrator model" and the problem still persists. Is there any other way to fix this problem? Or does hard filtering avoid this issue? Thank you!
Here is the error message:
Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
-
GenotypeGVCFs generates the QD annotation by default. What command did you use to generate the VCF you're supplying as input to VariantRecalibrator? If you've run GenotypeGVCFs, but excluded QD, you can calculate it with VariantAnnotator.
-
Hi all,
I am getting the same error when running VariantRecalibrator. I have checked the training set and my input VCF, and both have QD present in the annotations. Do you have any idea what could cause it and how to solve it?
Thanks in advance,
Emilia
My log file:
Using GATK jar gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar VariantRecalibrator -R <ref> -V <vcf> --resource:pf_crosses,known=false,training=true,truth=true,prior=15.0 <training_vcf> -an QD -an FS -an SOR -an DP --max-gaussians 8 --mq-cap-for-logit-jitter-transform 70 -mode SNP -O <ouput>.snps.recal --tranches-file Pv4_2195_merge_2023_08_02.snps.tranches --rscript-file <output>.snps.plots.R
12:52:14.589 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file: gatk4-4.1.4.1-1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 03, 2023 12:52:14 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
12:52:14.749 INFO VariantRecalibrator - ------------------------------------------------------------
12:52:14.749 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.4.1
12:52:14.749 INFO VariantRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/
12:52:14.749 INFO VariantRecalibrator - Executing as gabbie@s9 on Linux v5.4.0-153-generic amd64
12:52:14.749 INFO VariantRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
12:52:14.750 INFO VariantRecalibrator - Start Date/Time: August 3, 2023 12:52:14 PM UTC
12:52:14.750 INFO VariantRecalibrator - ------------------------------------------------------------
12:52:14.750 INFO VariantRecalibrator - ------------------------------------------------------------
12:52:14.750 INFO VariantRecalibrator - HTSJDK Version: 2.21.0
12:52:14.750 INFO VariantRecalibrator - Picard Version: 2.21.2
12:52:14.750 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:52:14.750 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:52:14.750 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:52:14.750 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:52:14.751 INFO VariantRecalibrator - Deflater: IntelDeflater
12:52:14.751 INFO VariantRecalibrator - Inflater: IntelInflater
12:52:14.751 INFO VariantRecalibrator - GCS max retries/reopens: 20
12:52:14.751 INFO VariantRecalibrator - Requester pays: disabled
12:52:14.751 INFO VariantRecalibrator - Initializing engine
12:52:14.991 INFO FeatureManager - Using codec VCFCodec to read file file://<training_vcf>
12:52:15.017 INFO FeatureManager - Using codec VCFCodec to read file file://<vcf>
12:52:15.052 INFO VariantRecalibrator - Done initializing engine
12:52:15.054 INFO TrainingSet - Found pf_crosses track: Known = false Training = true Truth = true Prior = Q15.0
12:52:15.061 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
12:52:15.079 INFO ProgressMeter - Starting traversal
12:52:15.079 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:52:15.084 INFO ProgressMeter - unmapped 0.0 0 0.0
12:52:15.084 INFO ProgressMeter - Traversal complete. Processed 0 total variants in 0.0 minutes.
12:52:15.088 INFO VariantRecalibrator - Shutting down engine
[August 3, 2023 12:52:15 PM UTC] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2152202240
***********************************************************************
A USER ERROR has occurred: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
-
Hi Emila Mańko,
It's suspicious to me that the log says zero variants were processed, and I have a theory. Can you confirm that there is at least one variant in your training VCF that is also in your input VCF? You could do that with GATK's SelectVariants, supplying one file with -V and the other with -conc.
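The error refers to training variants, that is, input records that fall on sites in a training resource, so as its wording suggests it can be triggered either when none of those records carries QD or when there are no such records at all. The following minimal Python sketch checks both in one pass. It is a simplification under stated assumptions: it matches sites by CHROM and POS only, whereas VariantRecalibrator may apply further checks (for example on variant type or filter status) that are not reproduced here, and the helper names are hypothetical, not part of GATK.

```python
import gzip
from typing import Dict, Iterable, Iterator, Set, Tuple

# Assumption of this sketch: a site is identified by (CHROM, POS) only.
Site = Tuple[str, int]


def open_vcf(path: str) -> Iterator[str]:
    """Yield text lines from a plain or bgzip-compressed (.gz) VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def records(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (CHROM, POS, INFO) for every data line, skipping header and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[1]), fields[7]


def training_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect the (CHROM, POS) sites of a training resource."""
    return {(chrom, pos) for chrom, pos, _ in records(lines)}


def overlap_report(input_lines: Iterable[str], training: Set[Site]) -> Dict[str, int]:
    """Count input records, those on a training site, and those among them carrying QD."""
    total = overlapping = overlapping_with_qd = 0
    for chrom, pos, info in records(input_lines):
        total += 1
        if (chrom, pos) not in training:
            continue
        overlapping += 1
        keys = {entry.split("=", 1)[0] for entry in info.split(";")}
        if "QD" in keys:
            overlapping_with_qd += 1
    return {
        "input_records": total,
        "overlapping_training": overlapping,
        "overlapping_with_qd": overlapping_with_qd,
    }
```

Typical use would be overlap_report(open_vcf("input.vcf.gz"), training_sites(open_vcf("training.vcf.gz"))) with your own file names. Loading every site of a large resource into a Python set can use a lot of memory, so restricting both files to the contigs of interest first (for example with SelectVariants -L) keeps it manageable. An overlapping_training count of 0 would point to the two files not sharing any sites (for instance, different contig names or intervals) rather than to a missing annotation.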
-Laura
-
Hi Laura,
Thank you for such a quick response, and sorry for the delay on my side. I can confirm that there is good overlap between the training set and the input VCF; we checked this with both GATK SelectVariants and bcftools.
We tried adjusting the VariantRecalibrator parameters, and decreasing the --max-gaussians argument seems to bypass the error, which is probably not an ideal solution. From the error itself, though, it seemed that something might be wrong with the input VCF. Any ideas what else we should look into to foresee this type of error in the future?
Thank you!
Emilia
-
Hi Emila Mańko
I've seen decreasing the --max-gaussians argument help when the error is about having no negative training data, but that's not what your log shows. It's going to be hard to debug further without seeing any of the data. Would you be able to share a small portion of the VCF so I could try to reproduce the error?
-Laura