Error on Google Cloud Platform: haplotypecaller-gvcf-gatk.wdl with make_gvcf = false
The gatk-workflows/gatk4-germline-snps-indels GitHub README says: "However, for instances when calling variants for one or a few samples it is possible to have the workflow directly call variants and output a VCF file by setting the make_gvcf input variable to false."
I made this modification in haplotypecaller-gvcf-gatk.wdl and, when running the workflow, came across the error below. Is there a way around this issue?
Command run:
$ gcloud alpha genomics pipelines run --pipeline-file wdl_pipeline.yaml --regions us-central1 --inputs-from-file WDL=${GATK_GOOGLE_DIR}/haplotypecaller-gvcf-gatk4.wdl,WORKFLOW_INPUTS=${GATK_GOOGLE_DIR}/haplotypecaller-gvcf-gatk4.hg38.wgs.inputs.json,WORKFLOW_OPTIONS=${GATK_GOOGLE_DIR}/ --env-vars WORKSPACE=${GATK_OUTPUT_DIR}/work,OUTPUTS=${GATK_OUTPUT_DIR}/output --logging ${GATK_OUTPUT_DIR}/logging/
Error log:
Picked up _JAVA_OPTIONS:
03:15:07.229 WARN GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardAnnotation) is enabled for this tool by default
03:15:07.231 WARN GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardHCAnnotation) is enabled for this tool by default
03:15:07.430 INFO NativeLibraryLoader - Loading from jar:file:/gatk/gatk-package-!/com/intel/gkl/native/
03:15:07.906 INFO HaplotypeCaller - ------------------------------------------------------------
03:15:07.907 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.4.0
03:15:07.907 INFO HaplotypeCaller - For support and documentation go to
03:15:07.908 INFO HaplotypeCaller - Executing as root@6385518f1e3c on Linux v4.19.112+ amd64
03:15:07.908 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03
03:15:07.909 INFO HaplotypeCaller - Start Date/Time: May 11, 2020 3:15:07 AM UTC
03:15:07.909 INFO HaplotypeCaller - ------------------------------------------------------------
03:15:07.910 INFO HaplotypeCaller - ------------------------------------------------------------
03:15:07.911 INFO HaplotypeCaller - HTSJDK Version: 2.20.3
03:15:07.911 INFO HaplotypeCaller - Picard Version: 2.21.1
03:15:07.912 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
03:15:07.912 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
03:15:07.912 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
03:15:07.913 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
03:15:07.913 INFO HaplotypeCaller - Deflater: IntelDeflater
03:15:07.913 INFO HaplotypeCaller - Inflater: IntelInflater
03:15:07.914 INFO HaplotypeCaller - GCS max retries/reopens: 20
03:15:07.914 INFO HaplotypeCaller - Requester pays: disabled
03:15:07.915 INFO HaplotypeCaller - Initializing engine
03:15:11.181 INFO FeatureManager - Using codec IntervalListCodec to read file file:///cromwell_root/gcp-public-data--broad-references/hg38/v0/scattered_calling_intervals/temp_0025_of_50/scattered.interval_list
03:15:11.391 INFO IntervalArgumentCollection - Processing 58850000 bp from intervals
03:15:11.429 INFO HaplotypeCaller - Done initializing engine
03:15:11.574 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
03:15:11.585 INFO HaplotypeCaller - Shutting down engine
[May 11, 2020 3:15:11 AM UTC] done. Elapsed time: 0.07 minutes.
A USER ERROR has occurred: Allele-specific annotations are not yet supported in the VCF mode
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /gatk/gatk-package-
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx6G -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -jar /gatk/gatk-package- HaplotypeCaller -R /cromwell_root/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://55trios/GATKprocessing/output/NA12878.hg38.bam -L /cromwell_root/gcp-public-data--broad-references/hg38/v0/scattered_calling_intervals/temp_0025_of_50/scattered.interval_list -O NA12878.hg38.vcf.gz -contamination 0 -G StandardAnnotation -G AS_StandardAnnotation -G StandardHCAnnotation
Hi ikeoluwao_o,
It looks like HaplotypeCaller cannot handle allele-specific annotations (-G AS_StandardAnnotation) when run in VCF mode. Two things you can try:
1) The workflow uses GATK 4.1.4.0 by default. Set the "gatk_docker" workflow parameter to use the latest version of GATK (broadinstitute/gatk: The latest version may already have a fix for this.
2) If the latest version doesn't work, try running this modified version of the workflow, which removes the annotation when run in VCF mode.
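For anyone reading along, here is a rough sketch of what option 2 amounts to. It assumes the modified workflow simply leaves out -G AS_StandardAnnotation whenever make_gvcf is false; I am not reproducing the modified WDL itself. The function name build_hc_args is hypothetical, and -ERC GVCF is the usual HaplotypeCaller switch for GVCF output rather than something shown in the log above.

```shell
#!/usr/bin/env bash
# Sketch only: build the HaplotypeCaller annotation arguments for each mode.
# build_hc_args is a hypothetical helper, not part of the published workflow.
build_hc_args() {
  local make_gvcf="$1"
  # Annotation groups passed in both modes (taken from the command in the log).
  local args="-G StandardAnnotation -G StandardHCAnnotation"
  if [ "$make_gvcf" = "true" ]; then
    # GVCF mode: add the allele-specific group and request GVCF output.
    args="$args -G AS_StandardAnnotation -ERC GVCF"
  fi
  # VCF mode falls through with the allele-specific group omitted,
  # which avoids the USER ERROR reported above.
  printf '%s\n' "$args"
}

build_hc_args false
build_hc_args true
```

With make_gvcf set to false, the resulting arguments no longer contain the allele-specific group, so the rest of the HaplotypeCaller command can stay as it is.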
Hi Beri,
Thank you for your response. The first option did not work but the second one did.
Can I ask if it's okay to proceed with the use of the VCF output for variant annotation (VEP) or is there another GATK tool I should be implementing? From my understanding, the use of the joint genotyping workflow improves variant calling accuracy. But in its absence, is the VCF output I just produced sufficient?
This article should answer your other question: Germline-short-variant-discovery-SNPs-Indels
