Mutect2 PoN best practice pipeline failed with GenomicsDBImport 'A USER ERROR has occurred: Failed to create reader from file' ERROR
Can you please provide
a) GATK version used (docker 4.1.8.0)
b) Exact GATK commands used
config=${bindir}/cromwell.mutect2.conf
cromwell_engine=${bindir}/cromwell-49.jar
wdlscript=${bindir}/gatk4-somatic-snvs-indels-master/mutect2_pon.wdl
importzip=${outputdir}/imports.zip
json=${outputdir}/zb.mutect2.pon.json
java -jar -Dconfig.file=${config} ${cromwell_engine} run ${wdlscript} --imports ${importzip} --inputs ${json} > pon.log 2>&1
c) The entire error log if applicable. It is too long to post in full, so I have included only the parts I thought necessary.
I am running *gatk4-somatic-snvs-indels* best practice archived at this site:
https://github.com/gatk-workflows/gatk4-somatic-snvs-indels
to call snvs+indels in WES paired samples.
It said that the pipeline was tested successfully on:
GATK version 4.1.4.1
Cromwell version v47
and I am running it on my local machine with:
GATK version 4.1.8.0
Cromwell version v49
because I succeeded in running the *gatk4-data-processing* best practice workflow with GATK version 4.1.8.0. That workflow is archived at:
https://github.com/gatk-workflows/gatk4-data-processing
but when I was running *gatk4-somatic-snvs-indels* to build a panel of normals, I ran into several errors:
1. The first error was *solved* after I deleted the line:
"Mutect2_Panel.Mutect2.variants_for_contamination":"gs://gatk-best-practices/somatic-b37/small_exac_common_3.vcf"
in the provided json file:
https://github.com/gatk-workflows/gatk4-somatic-snvs-indels/blob/master/mutect2_pon.inputs.json
"Mutect2_Panel.Mutect2.variants_for_contamination" is not declared as an input anywhere in the mutect2_pon.wdl file, so my guess is that this input unexpectedly triggered the imported *mutect2.wdl* to execute the "call MergePileupSummaries as MergeTumorPileups" part, which produced an empty output and caused an error.
2. The second error has already been reported, but the existing reports do not match my case.
The error message from GenomicsDBImport was "A USER ERROR has occurred: Failed to create reader from file"
It was reported in:
a. https://gatk.broadinstitute.org/hc/en-us/community/posts/360056249211-genomicsDBImport-A-USER-ERROR-has-occurred-Failed-to-create-reader-from-file
There the cause was a wrong file format, but all the files in my run were generated by the archived GATK pipeline itself, so this post doesn't solve my problem.
b. https://gatkforums.broadinstitute.org/gatk/discussion/23966/failed-to-create-reader-error-in-genomicsdbimport
There the cause was a missing VCF index file, but in my case the index files do exist in the Cromwell execution directory.
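The index check described in (b) can be scripted so that it is repeatable for every sample. The following is a minimal sketch, not part of the workflow: the function name `check_vcf_inputs` is hypothetical, and it assumes the usual index naming of `<file>.idx` for an uncompressed VCF and `<file>.tbi` for a bgzipped one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each VCF path given, report whether the file
# and its index are readable from the current environment.
# Assumed index naming: <file>.idx for plain .vcf, <file>.tbi for .vcf.gz.
check_vcf_inputs() {
  local status=0 vcf idx
  for vcf in "$@"; do
    case "$vcf" in
      *.vcf.gz) idx="${vcf}.tbi" ;;
      *)        idx="${vcf}.idx" ;;
    esac
    if [ ! -r "$vcf" ]; then
      echo "MISSING_VCF $vcf"
      status=1
    elif [ ! -r "$idx" ]; then
      echo "MISSING_INDEX $idx"
      status=1
    else
      echo "OK $vcf"
    fi
  done
  # Non-zero exit status if any file or index could not be read.
  return "$status"
}
```

The result only describes the environment in which the function is run. Running it on the host and again inside the container used by the task would show whether both see the same files.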
Then I checked whether the 4.1.8.0 and 4.1.4.0 versions of the *gatk GenomicsDBImport* tool expect different input file formats:
4.1.4.0 https://gatk.broadinstitute.org/hc/en-us/articles/360036712071-GenomicsDBImport
4.1.8.0 https://gatk.broadinstitute.org/hc/en-us/articles/360046222251-GenomicsDBImport
There are several updates, but the input options remain unchanged.
The entire WDL execution log is too long to paste here (about 5000 lines). I am very willing to post any further information if necessary.
As far as I can see, the pipeline exits during CreatePanel: the CreatePanel call runs the CreatePanel task, which runs 'gatk GenomicsDBImport', and that command fails.
[First 300 bytes]:Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-CreatePanel/shard-2/attempt-3/tmp.acdf7c8b
15:29:51.117 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/n
Job Mutect2_Panel.CreatePanel:12:3 exited with return code 2 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /data/user/object/mutect2.pons/cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-CreatePanel/shard-12/attempt-3/execution/stderr.
The stderr message mentioned in the log file is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-CreatePanel/shard-0/attempt-3/tmp.d645415d
13:17:59.638 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 17, 2020 1:17:59 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:17:59.813 INFO GenomicsDBImport - ------------------------------------------------------------
13:17:59.814 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.8.0
13:17:59.814 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
13:17:59.814 INFO GenomicsDBImport - Executing as root@466abf09f44a on Linux v4.18.0-147.8.1.el8_1.x86_64 amd64
13:17:59.814 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
13:17:59.814 INFO GenomicsDBImport - Start Date/Time: July 17, 2020 1:17:59 PM GMT
13:17:59.814 INFO GenomicsDBImport - ------------------------------------------------------------
13:17:59.814 INFO GenomicsDBImport - ------------------------------------------------------------
13:17:59.815 INFO GenomicsDBImport - HTSJDK Version: 2.22.0
13:17:59.815 INFO GenomicsDBImport - Picard Version: 2.22.8
13:17:59.815 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:17:59.815 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:17:59.815 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:17:59.815 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:17:59.815 INFO GenomicsDBImport - Deflater: IntelDeflater
13:17:59.815 INFO GenomicsDBImport - Inflater: IntelInflater
13:17:59.815 INFO GenomicsDBImport - GCS max retries/reopens: 20
13:17:59.815 INFO GenomicsDBImport - Requester pays: disabled
13:17:59.815 INFO GenomicsDBImport - Initializing engine
13:17:59.832 INFO GenomicsDBImport - Shutting down engine
[July 17, 2020 1:17:59 PM GMT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1132986368
***********************************************************************
A USER ERROR has occurred: Failed to create reader from file:///data/user/zbraw_freashn/mutect2.pons/cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-Mutect2/shard-0/Mutect2/e296cf4a-1a6f-4f79-b252-473197857114/call-Filter/execution/samplename[concealed].b37-filtered.vcf
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /root/gatk.jar GenomicsDBImport --genomicsdb-workspace-path pon_db -R /cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-CreatePanel/shard-0/attempt-3/inputs/1559445236/human_g1k_v37_decoy.fasta -V /data/user/zbraw_freashn/mutect2.pons/cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-Mutect2/shard-0/Mutect2/e296cf4a-1a6f-4f79-b252-473197857114/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/zbraw_freashn/mutect2.pons/cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-Mutect2/shard-1/Mutect2/bff47bb1-702b-4bfc-829f-52ef94e3fa98/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/zbraw_freashn/mutect2.pons/cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-Mutect2/shard-2/Mutect2/0371d5cd-dcdb-4290-b9d4-021573f9675a/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/zbraw_freashn/mutect2.pons/cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-Mutect2/shard-3/Mutect2/b3fb5a46-eb33-402c-9508-522017ada5dd/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/zbraw_freashn/mutect2.pons/cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-Mutect2/shard-4/Mutect2/921c9f3f-2b08-403d-8ef3-2f949bcf9c0c/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/zbraw_freashn/mutect2.pons/cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-Mutect2/shard-5/Mutect2/c55aaa1a-e3ea-4e60-9477-d5d2cfea595d/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V 
/data/user/zbraw_freashn/mutect2.pons/cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-Mutect2/shard-6/Mutect2/562a6add-ac63-4199-813b-6d7f6e49d3f7/call-Filter/execution/samplename[concealed].b37-filtered.vcf -L /cromwell-executions/Mutect2_Panel/8a515b6f-89d2-4e2c-9adf-bcb612a520c2/call-CreatePanel/shard-0/attempt-3/inputs/1617384629/0000-scattered.interval_list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
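The stderr above suggests setting GATK_STACKTRACE_ON_USER_EXCEPTION to print the stack trace. As a sketch of how that hint could be applied when rerunning the import by hand, the hypothetical helper below prints (does not run) a GenomicsDBImport command with that property set. The function name and argument order are assumptions for illustration; the flags are the ones that appear in the logged command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a GenomicsDBImport command with stack traces
# enabled, using only flags that appear in the logged command.
# Usage: print_import_cmd WORKSPACE REFERENCE INTERVALS VCF [VCF...]
# Paths are printed unquoted; quote them when pasting if they contain
# spaces or shell metacharacters.
print_import_cmd() {
  local workspace="$1" reference="$2" intervals="$3" vcf
  shift 3
  printf '%s' "gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' GenomicsDBImport"
  printf ' --genomicsdb-workspace-path %s -R %s' "$workspace" "$reference"
  for vcf in "$@"; do
    printf ' -V %s' "$vcf"
  done
  printf ' -L %s\n' "$intervals"
}
```

Printing the command first makes it easy to compare against the command Cromwell generated before running it.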
The problematic command line is:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /root/gatk.jar GenomicsDBImport --genomicsdb-workspace-path pon_db -R /cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-CreatePanel/shard-0/attempt-3/inputs/1559445236/human_g1k_v37_decoy.fasta -V /data/user/object/mutect2.pons/cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-Mutect2/shard-0/Mutect2/72a2075c-b6f4-46c5-892a-f95885518121/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/object/mutect2.pons/cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-Mutect2/shard-1/Mutect2/571a1129-fec6-44eb-8b98-e8c91f2a1d70/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/object/mutect2.pons/cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-Mutect2/shard-2/Mutect2/daa95ba0-e4aa-423a-8970-2f66119ff914/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/object/mutect2.pons/cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-Mutect2/shard-3/Mutect2/ddb7d2fb-f764-4667-9ed6-52159d527b31/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/object/mutect2.pons/cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-Mutect2/shard-4/Mutect2/c587c2a2-4f6d-45c7-a296-99813dc4f0f6/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V /data/user/object/mutect2.pons/cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-Mutect2/shard-5/Mutect2/e3f328e4-15a3-41f5-98e6-f4e381e2c9b0/call-Filter/execution/samplename[concealed].b37-filtered.vcf -V 
/data/user/object/mutect2.pons/cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-Mutect2/shard-6/Mutect2/1102f3d0-0de1-4feb-9b93-9c9b0ba79b79/call-Filter/execution/samplename[concealed].b37-filtered.vcf -L /cromwell-executions/Mutect2_Panel/6b0d1f07-0af7-4df1-bd2e-eef952b5254e/call-CreatePanel/shard-0/attempt-3/inputs/1890662442/0000-scattered.interval_list
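In the command above, the -R and -L arguments point under /cromwell-executions/, while every -V argument points under /data/user/. Whether that difference is related to the error is not established in this thread, but it is easy to list. The sketch below extracts the -V paths from a pasted command line and prints those outside a chosen prefix; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path that follows a -V flag in a
# GATK command line read from stdin (line breaks inside the command are fine).
list_variant_args() {
  tr -s '[:space:]' '\n' | awk 'prev == "-V" { print } { prev = $0 }'
}

# Hypothetical helper: from paths on stdin (one per line), print those
# that do not start with the given prefix.
paths_outside_prefix() {
  local prefix="$1" path
  while IFS= read -r path; do
    case "$path" in
      "$prefix"*) ;;            # under the prefix: skip
      *) echo "$path" ;;        # outside the prefix: report
    esac
  done
}
```

For example, saving the command to a file and running `list_variant_args < cmd.txt | paths_outside_prefix /cromwell-executions/` lists the -V inputs that are referenced by a path outside that directory.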
I located the VCF files of each sample and, instead of running GenomicsDBImport as part of the WDL pipeline, ran it independently on the command line as follows:
gatk --java-options "-Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2" GenomicsDBImport --genomicsdb-workspace-path pon_db -R /data/wangzw/zbraw_freashn/input/human_g1k_v37_decoy.fasta -V ./100336AZW1.b37-filtered.vcf -V ./104260AZW1.b37-filtered.vcf -V ./104261AZW1.b37-filtered.vcf -V ./105023AZW1.b37-filtered.vcf -V ./105603AZW1.b37-filtered.vcf -V ./11T001353W.b37-filtered.vcf -V ./11T004419W.b37-filtered.vcf -L /data/wangzw/zbraw_freashn/input/common_sites.interval_list
The command now runs (although it has not finished yet). The log is:
Using GATK jar /root/WZW/20200512_wes_p3/bin/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /root/WZW/20200512_wes_p3/bin/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar GenomicsDBImport --genomicsdb-workspace-path pon_db -R /data/user/project/input/human_g1k_v37_decoy.fasta -V ./samplename[concealed].b37-filtered.vcf -V ./samplename[concealed].b37-filtered.vcf -V ./samplename[concealed].b37-filtered.vcf -V ./samplename[concealed].b37-filtered.vcf -V ./samplename[concealed].b37-filtered.vcf -V ./samplename[concealed].b37-filtered.vcf -V ./samplename[concealed].b37-filtered.vcf -L /data/user/project/input/common_sites.interval_list
10:32:03.046 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/root/WZW/20200512_wes_p3/bin/gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 20, 2020 10:32:03 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:32:03.210 INFO GenomicsDBImport - ------------------------------------------------------------
10:32:03.210 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.8.0
10:32:03.210 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
10:32:03.210 INFO GenomicsDBImport - Executing as root@ljlab on Linux v4.18.0-147.8.1.el8_1.x86_64 amd64
10:32:03.210 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v11.0.1+13-LTS
10:32:03.211 INFO GenomicsDBImport - Start Date/Time: July 20, 2020 at 10:32:03 AM CST
10:32:03.211 INFO GenomicsDBImport - ------------------------------------------------------------
10:32:03.211 INFO GenomicsDBImport - ------------------------------------------------------------
10:32:03.211 INFO GenomicsDBImport - HTSJDK Version: 2.22.0
10:32:03.211 INFO GenomicsDBImport - Picard Version: 2.22.8
10:32:03.211 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:32:03.211 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:32:03.211 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:32:03.211 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:32:03.212 INFO GenomicsDBImport - Deflater: IntelDeflater
10:32:03.212 INFO GenomicsDBImport - Inflater: IntelInflater
10:32:03.212 INFO GenomicsDBImport - GCS max retries/reopens: 20
10:32:03.212 INFO GenomicsDBImport - Requester pays: disabled
10:32:03.212 INFO GenomicsDBImport - Initializing engine
10:32:03.488 INFO FeatureManager - Using codec IntervalListCodec to read file file:///data/user/project/input/common_sites.interval_list
10:32:33.850 INFO IntervalArgumentCollection - Processing 11034962 bp from intervals
10:32:34.371 WARN GenomicsDBImport - A large number of intervals were specified. Using more than 100 intervals in a single import is not recommended and can cause performance to suffer. If GVCF data only exists within those intervals, performance can be improved by aggregating intervals with the merge-input-intervals argument.
10:32:34.392 INFO GenomicsDBImport - Done initializing engine
10:32:34.603 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
10:32:34.604 INFO GenomicsDBImport - Vid Map JSON file will be written to /data/user/project/mutect2.pons/cromwell-executions/Mutect2_Panel/50f317ea-4193-4d6d-86cf-b647e9fedd5e/pon_db/vidmap.json
10:32:34.604 INFO GenomicsDBImport - Callset Map JSON file will be written to /data/user/project/mutect2.pons/cromwell-executions/Mutect2_Panel/50f317ea-4193-4d6d-86cf-b647e9fedd5e/pon_db/callset.json
10:32:34.604 INFO GenomicsDBImport - Complete VCF Header will be written to /data/user/project/mutect2.pons/cromwell-executions/Mutect2_Panel/50f317ea-4193-4d6d-86cf-b647e9fedd5e/pon_db/vcfheader.vcf
10:32:34.604 INFO GenomicsDBImport - Importing to workspace - /data/user/project/mutect2.pons/cromwell-executions/Mutect2_Panel/50f317ea-4193-4d6d-86cf-b647e9fedd5e/pon_db
10:32:34.604 INFO ProgressMeter - Starting traversal
10:32:34.604 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
-
Hi WangZiwei,
I believe this is the original inquiry you referred to in your other post. I will follow up on this one when I have more information.
Kind regards,
Jason
-
This post was held as pending earlier, so I have posted another one here:
Could you please delete this one?
-
Hi WangZiwei,
Can you confirm which version of the Mutect2 workflow you are using?
https://dockstore.org/my-workflows/github.com/broadinstitute/gatk/mutect2
Examining the versions of the workflow, we don't currently have a 4.1.8.0 version. Can you confirm whether this workflow works when you use a workflow version that matches the GATK version, for example version 4.1.6.0 of Mutect2 with the 4.1.6.0 GATK4 docker?
Kind regards,
Jason