Error on Google Cloud Platform: haplotypecaller-gvcf-gatk.wdl with make_gvcf = false
The gatk-workflows/gatk4-germline-snps-indels GitHub README says "However, for instances when calling variants for one or a few samples it is possible to have the workflow directly call variants and output a VCF file by setting the make_gvcf input variable to false."
I made this modification in haplotypecaller-gvcf-gatk.wdl, and when I ran the workflow I came across the error below. Is there a way around this issue?
Command run:
$ gcloud alpha genomics pipelines run --pipeline-file wdl_pipeline.yaml --regions us-central1 --inputs-from-file WDL=${GATK_GOOGLE_DIR}/haplotypecaller-gvcf-gatk4.wdl,WORKFLOW_INPUTS=${GATK_GOOGLE_DIR}/haplotypecaller-gvcf-gatk4.hg38.wgs.inputs.json,WORKFLOW_OPTIONS=${GATK_GOOGLE_DIR}/generic.google-papi.options.json --env-vars WORKSPACE=${GATK_OUTPUT_DIR}/work,OUTPUTS=${GATK_OUTPUT_DIR}/output --logging ${GATK_OUTPUT_DIR}/logging/
Error log:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.f816abdf
03:15:07.229 WARN GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardAnnotation) is enabled for this tool by default
03:15:07.231 WARN GATKAnnotationPluginDescriptor - Redundant enabled annotation group (StandardHCAnnotation) is enabled for this tool by default
03:15:07.430 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
03:15:07.906 INFO HaplotypeCaller - ------------------------------------------------------------
03:15:07.907 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.4.0
03:15:07.907 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
03:15:07.908 INFO HaplotypeCaller - Executing as root@6385518f1e3c on Linux v4.19.112+ amd64
03:15:07.908 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03
03:15:07.909 INFO HaplotypeCaller - Start Date/Time: May 11, 2020 3:15:07 AM UTC
03:15:07.909 INFO HaplotypeCaller - ------------------------------------------------------------
03:15:07.910 INFO HaplotypeCaller - ------------------------------------------------------------
03:15:07.911 INFO HaplotypeCaller - HTSJDK Version: 2.20.3
03:15:07.911 INFO HaplotypeCaller - Picard Version: 2.21.1
03:15:07.912 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
03:15:07.912 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
03:15:07.912 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
03:15:07.913 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
03:15:07.913 INFO HaplotypeCaller - Deflater: IntelDeflater
03:15:07.913 INFO HaplotypeCaller - Inflater: IntelInflater
03:15:07.914 INFO HaplotypeCaller - GCS max retries/reopens: 20
03:15:07.914 INFO HaplotypeCaller - Requester pays: disabled
03:15:07.915 INFO HaplotypeCaller - Initializing engine
03:15:11.181 INFO FeatureManager - Using codec IntervalListCodec to read file file:///cromwell_root/gcp-public-data--broad-references/hg38/v0/scattered_calling_intervals/temp_0025_of_50/scattered.interval_list
03:15:11.391 INFO IntervalArgumentCollection - Processing 58850000 bp from intervals
03:15:11.429 INFO HaplotypeCaller - Done initializing engine
03:15:11.574 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
03:15:11.585 INFO HaplotypeCaller - Shutting down engine
[May 11, 2020 3:15:11 AM UTC] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 0.07 minutes.
Runtime.totalMemory()=320339968
***********************************************************************
A USER ERROR has occurred: Allele-specific annotations are not yet supported in the VCF mode
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /gatk/gatk-package-4.1.4.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx6G -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -jar /gatk/gatk-package-4.1.4.0-local.jar HaplotypeCaller -R /cromwell_root/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://55trios/GATKprocessing/output/NA12878.hg38.bam -L /cromwell_root/gcp-public-data--broad-references/hg38/v0/scattered_calling_intervals/temp_0025_of_50/scattered.interval_list -O NA12878.hg38.vcf.gz -contamination 0 -G StandardAnnotation -G AS_StandardAnnotation -G StandardHCAnnotation
-
Hi ikeoluwao_o
It looks like HaplotypeCaller cannot handle allele-specific annotations (-G AS_StandardAnnotation) when run in VCF mode. Two things you can try:
1) By default the workflow uses GATK 4.1.4.0. Set the "gatk_docker" workflow parameter to use the latest version of GATK (broadinstitute/gatk:4.1.7.0). The latest version may already have a fix for this.
2) If the latest version doesn't work, try running this modified version of the workflow, which removes the annotation when run in VCF mode: https://github.com/gatk-workflows/gatk4-germline-snps-indels/blob/bs-hc-annot-fix/haplotypecaller-gvcf-gatk4.wdl
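To illustrate the idea behind option 2, here is a minimal Python sketch of building the HaplotypeCaller arguments so that the allele-specific annotation group is only requested in GVCF mode. The helper name haplotypecaller_args and the file names are hypothetical, and this is not the actual WDL from the linked branch. It also assumes the workflow passes -ERC GVCF when make_gvcf is true.

```python
def haplotypecaller_args(ref, bam, intervals, out, make_gvcf):
    """Build a HaplotypeCaller argument list.

    Hypothetical helper for illustration only; the real workflow
    assembles this command inside its WDL task.
    """
    args = [
        "HaplotypeCaller",
        "-R", ref,
        "-I", bam,
        "-L", intervals,
        "-O", out,
        "-contamination", "0",
        "-G", "StandardAnnotation",
        "-G", "StandardHCAnnotation",
    ]
    if make_gvcf:
        # GVCF (reference-confidence) mode: allele-specific annotations
        # are only requested here, since VCF mode rejects them.
        args += ["-G", "AS_StandardAnnotation", "-ERC", "GVCF"]
    return args


# Placeholder file names, not the paths from the failing run.
vcf_args = haplotypecaller_args(
    "ref.fasta", "sample.bam", "calling.interval_list",
    "sample.vcf.gz", make_gvcf=False,
)
gvcf_args = haplotypecaller_args(
    "ref.fasta", "sample.bam", "calling.interval_list",
    "sample.g.vcf.gz", make_gvcf=True,
)
print("gatk " + " ".join(vcf_args))
print("gatk " + " ".join(gvcf_args))
```

With make_gvcf=False the printed command omits -G AS_StandardAnnotation, which is the argument the USER ERROR above complains about.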
-
Hi Beri,
Thank you for your response. The first option did not work but the second one did.
Can I ask if it's okay to proceed with the use of the VCF output for variant annotation (VEP) or is there another GATK tool I should be implementing? From my understanding, the use of the joint genotyping workflow improves variant calling accuracy. But in its absence, is the VCF output I just produced sufficient?
-
This article should answer your other question: Germline-short-variant-discovery-SNPs-Indels