Fatal error detected by the Java Runtime Environment when running MarkDuplicates
This issue is being filed on behalf of Eduardo Maury.
Description of issue
Eduardo Maury wrote to Terra Support for assistance with running the featured workflow processing-for-variant-discovery-gatk. After a series of troubleshooting steps, Eduardo created their own version of the workflow, since the featured workflow did not include -Xmx flags for the MarkDuplicates and SortAndFixTags tasks. A copy of this WDL script is available at the bottom of this page under Submitted workflow script.
Eduardo is currently experiencing the following error message:
# A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0x00007f632e970bf1, pid=17, tid=0x00007ecf1d905700
# JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 )
# Problematic frame:
# V [libjvm.so+0x9d1bf1]
# Core dump written. Default location: /cromwell_root/core or core.17
# An error report file with more information is saved as:
# If you would like to submit a bug report, please visit:
We suspected this was happening because the BAM was unsorted, and MarkDuplicates expects coordinate- or queryname-sorted input. Eduardo updated the WDL so that the MergeBamAlignment task uses --SORT_ORDER "queryname" instead of the --SORT_ORDER "unsorted" that the featured workflow uses. The issue persisted despite this change.
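For reference, the change amounts to editing one argument in the MergeBamAlignment task's command block. The sketch below is illustrative only (input names and the surrounding arguments are assumptions, not copied from Eduardo's WDL):

```wdl
  # Inside the MergeBamAlignment task (illustrative sketch, not Eduardo's exact WDL)
  command {
    gatk --java-options "-Xmx3000m" \
      MergeBamAlignment \
      --ALIGNED_BAM ~{aligned_bam} \
      --UNMAPPED_BAM ~{unmapped_bam} \
      --OUTPUT ~{output_bam_basename}.bam \
      --REFERENCE_SEQUENCE ~{ref_fasta} \
      --SORT_ORDER "queryname"   # was: --SORT_ORDER "unsorted"
  }
```

With "queryname", MergeBamAlignment emits a queryname-sorted BAM, which matches the --ASSUME_SORT_ORDER queryname argument that the downstream MarkDuplicates command already passes.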
The issue also persisted after changing to version 18.104.22.168.
REQUIRED for all errors and issues:
a) GATK version used: 22.214.171.124 (also tried 126.96.36.199)
b) Exact command used:
For MarkDuplicates task
java -Dsamjdk.use_async_io_read_samtools=false \
  -Dsamjdk.use_async_io_write_samtools=true \
  -Dsamjdk.use_async_io_write_tribble=false \
  -Dsamjdk.compression_level=2 \
  -Dsamjdk.compression_level=5 \
  -Xms550G -Xmx590G -XX:+UseSerialGC \
  -jar /gatk/gatk-package-188.8.131.52-local.jar MarkDuplicates \
  --INPUT /cromwell_root/fc-b266996d-0c10-45f7-bd1c-39cb4eef6aa5/submissions/ef865dc6-fce6-46eb-a9ab-2e3c1c155d30/PreProcessingForVariantDiscovery_GATK4/1ec7301d-6836-405a-8d36-44d3bbac2a9d/call-MergeBamAlignment/attempt-4/RP-1044_00485262_v1_WGS_GCP.unmapped.aligned.unsorted.bam \
  --OUTPUT RP-1044_00485262_v1_WGS_GCP.b37.aligned.unsorted.duplicates_marked.bam \
  --METRICS_FILE RP-1044_00485262_v1_WGS_GCP.b37.duplicate_metrics \
  --VALIDATION_STRINGENCY SILENT \
  --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 \
  --ASSUME_SORT_ORDER queryname \
  --CREATE_MD5_FILE true \
  --SORTING_COLLECTION_SIZE_RATIO 0.125
c) Entire program log:
Eduardo Maury, is there a chance you changed any of the permissions on the workspace since I left that comment yesterday? My colleagues and I are no longer able to view the backend log.
From what I remember from yesterday, I was going to recommend setting the disk space for SortAndFixTags to 624 GB. I also noticed that you had the Java Xmx at 16 GB; you should decrease that to -Xmx14G. Once I'm able to see the log again, I can check with my colleagues whether there is anything else you should try for this task.
Eduardo Maury not sure if you changed anything but it's working now!
Hi Eduardo Maury,
I followed up with the Terra team and found that it is possible to give the task 800 GB of disk space. There is a 624 GB limit for memory, but that limit does not apply to disk space. Disk space is also much cheaper than memory, so increasing it should not be a problem. The disk-space input for SortAndFixTags is agg_large_disk, so you can change that to 800 GB.
To save considerable compute costs, I would also recommend decreasing the task's memory to 16 GB and setting the Xmx value to 14 GB (--java-options "-Xmx14G"). In your last run, the memory was still set to 600 GB and the Xmx value was 16 GB.
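Put together, the relevant parts of the SortAndFixTags task would look roughly like the sketch below. This is an illustrative assumption about the task's shape, not Eduardo's exact WDL; only the agg_large_disk value, the memory setting, and the -Xmx value come from the recommendations above:

```wdl
task SortAndFixTags {
  input {
    # Disk in GB; the 624 GB cap applies to memory, not disk
    Int agg_large_disk = 800
  }

  command {
    # Keep the JVM heap ~2 GB below the VM memory so the JVM's
    # non-heap overhead does not exhaust the machine
    gatk --java-options "-Xmx14G" SortSam ...
  }

  runtime {
    memory: "16 GB"   # was 600 GB
    disks: "local-disk " + agg_large_disk + " HDD"
  }
}
```

The general rule of thumb here is that -Xmx must sit a few GB below the runtime memory value, because the JVM needs room beyond the heap for its own metadata and native buffers.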
Let me know how this goes!
Still an error:
workspace-id: b266996d-0c10-45f7-bd1c-39cb4eef6aa5
submission-id: d4687ece-dac4-4b13-98de-bdc4bc2fb5d3
It looks like you didn't change the agg_large_disk parameter to 800. Could you try that again?
It seems that it worked!
Thank you for letting us know! I am so glad we were able to get this working. Please let us know if you run into any other issues and we will be happy to help.