Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Fatal error detected by the Java Runtime Environment when running MarkDuplicates

37 comments

  • Genevieve Brandt (she/her)

    Thank you Jason Cerrato for posting this issue! I'm sorry this has taken so long to get to the bottom of the problem, but I hope we will be able to figure this out soon. 

    The next step is that we want to determine if there is anything about this specific BAM file that could be causing these issues. Something like extremely large duplicate sets could be the reason why MarkDuplicates is failing.

    Eduardo, would it be possible to get more information from you about the BAM file? 
    1. What type of sequencing data is this? Is it amplicon data? 
    2. Have you tried yet to disable optical duplicate detection with READ_NAME_REGEX set to null (in the MarkDuplicates step)?
    3. Can you check the sort order tag in the header of the input BAM to MarkDuplicates and verify that it says "SO:queryname" (first line of the BAM header)?
    4. Can you provide an IGV screenshot of a representative section of the BAM, and/or some metrics such as size, total number of reads, maximum depth, etc.?

    Thank you for your help looking into this issue further.

    Best,

    Genevieve
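
A minimal sketch of the sort-order check from question 3, assuming the header has already been dumped as text (for example with `samtools view -H`). The function name and the example header are illustrative and are not taken from the thread:

```python
def sort_order_from_header(header_text):
    """Return the SO value from the @HD line of a SAM/BAM text header, or None."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                key, _, value = field.partition(":")
                if key == "SO":
                    return value
            return None  # @HD line present but no SO tag
    return None  # no @HD line at all


# Hypothetical header text standing in for the real BAM header.
example_header = "@HD\tVN:1.6\tSO:queryname\n@SQ\tSN:chr1\tLN:248956422\n"
print(sort_order_from_header(example_header))  # queryname
```

A result other than "queryname" would mean the input does not carry the sort order that the question asks about.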

  • Eduardo Maury

    This is just WGS, sequenced and QC’ed at the Broad with their standard pipelines and made available in a Terra workspace. I have not yet tried disabling optical duplicate marking.

  • Genevieve Brandt (she/her)

    Ok thanks! That is good to know. Can you try disabling the optical duplicate marking and provide some of the details from question #4?

  • Eduardo Maury

    Here is an IGV plot from a random region. [IGV screenshot]

    Currently running two samples with the following stats (let me know if you need any others):

    Total reads: 2104685646, 2425881222

    Mean coverage: 83.141155, 90.941138

    Chimera rate: 0.007568, 0.008076

  • Eduardo Maury

    Ran pipeline with READ_NAME_REGEX set to null. Still get errors.

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x00007fb0b0caebf1, pid=17, tid=0x00007fb0adb2e700
    #
    # JRE version: OpenJDK Runtime Environment (8.0_242-b08) (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
    # Java VM: OpenJDK 64-Bit Server VM (25.242-b08 mixed mode linux-amd64 )
    # Problematic frame:
    # V [libjvm.so+0x9d1bf1]
    #
    # Core dump written. Default location: /cromwell_root/core or core.17
    #
    # An error report file with more information is saved as:
    # /cromwell_root/hs_err_pid17.log
    #
    # If you would like to submit a bug report, please visit:
    # http://bugreport.java.com/bugreport/crash.jsp
    #

  • Eduardo Maury

    Any further recommendations? Is it possible to set up a meeting to discuss?

  • Genevieve Brandt (she/her)

    Hi Eduardo Maury,

    I am sorry this is taking so long to diagnose. This definitely looks like a GATK bug. Unfortunately, we are not able to hold meetings for GATK support tickets.

    I did get some further feedback from our GATK developers, with some options you can try to narrow down where the issue is coming from.

    First, could you try disabling the Snappy compression library? That will help narrow down the origin of the issue. If the code runs successfully with Snappy disabled, then we will know it's a pretty serious bug. You can do this with the Java option -Dsamjdk.snappy.disable=true.

    Second, could you try running the job with the JDK deflater and inflater? I can't remember if we have tried this already. If not, I definitely think you should try it. These are tool options: --USE_JDK_DEFLATER true and --USE_JDK_INFLATER true.

    Once again, I'm really sorry this issue has caused you such a big headache. I hope we can get to the bottom of what is causing it soon. 

    Best,

    Genevieve
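
A rough sketch of where the two diagnostics go on a `gatk` command line: the Snappy switch is a JVM system property, so it belongs in `--java-options`, while the JDK inflater/deflater switches are tool arguments. The option spelling follows the form quoted later in this thread (`--USE_JDK_INFLATER true --USE_JDK_DEFLATER true`). The function name and file names are placeholders, and the command is only assembled, not executed:

```python
import shlex


def diagnostic_markduplicates_argv(input_bam, output_bam, metrics_file,
                                   disable_snappy=True, use_jdk_codecs=True):
    """Build (but do not run) a gatk MarkDuplicates command for the two diagnostics."""
    argv = ["gatk"]
    if disable_snappy:
        # JVM system property: passed through --java-options, not as a tool argument.
        argv += ["--java-options", "-Dsamjdk.snappy.disable=true"]
    argv += ["MarkDuplicates",
             "--INPUT", input_bam,
             "--OUTPUT", output_bam,
             "--METRICS_FILE", metrics_file]
    if use_jdk_codecs:
        # Tool arguments: use the JDK's own inflater/deflater.
        argv += ["--USE_JDK_DEFLATER", "true", "--USE_JDK_INFLATER", "true"]
    return argv


# File names are placeholders, not the paths from the Terra workspace.
print(shlex.join(diagnostic_markduplicates_argv("in.bam", "out.bam", "metrics.txt")))
```

Toggling one switch at a time keeps the two experiments separate, which makes it easier to tell which change, if either, affects the crash.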

  • Eduardo Maury

    Currently running without Snappy. For the deflater/inflater, which one should I run, or am I supposed to run both?

  • Eduardo Maury

    Without Snappy, I get the following error:

     

    Failed to evaluate 'flowcell_unmapped_bams' (reason 1 of 1): Evaluating read_lines(flowcell_unmapped_bams_list) failed: Failed to read_lines("gs://fc-b266996d-0c10-45f7-bd1c-39cb4eef6aa5/d53b3f2b-5b38-4197-bb13-c44e21069ff1/MergeUnmappedBAMFiles/c4b1a615-4ae9-41a0-a603-7d5fc27e2df8/call-PerformMergeOperation/RP-1044_00485262_v1_WGS_GCP.unmapped.bam") (reason 1 of 1): [Attempted 1 time(s)] - IOException: Could not read from gs://fc-b266996d-0c10-45f7-bd1c-39cb4eef6aa5/d53b3f2b-5b38-4197-bb13-c44e21069ff1/MergeUnmappedBAMFiles/c4b1a615-4ae9-41a0-a603-7d5fc27e2df8/call-PerformMergeOperation/RP-1044_00485262_v1_WGS_GCP.unmapped.bam: File gs://fc-b266996d-0c10-45f7-bd1c-39cb4eef6aa5/d53b3f2b-5b38-4197-bb13-c44e21069ff1/MergeUnmappedBAMFiles/c4b1a615-4ae9-41a0-a603-7d5fc27e2df8/call-PerformMergeOperation/RP-1044_00485262_v1_WGS_GCP.unmapped.bam is larger than requested maximum of 10000000 Bytes.

  • Genevieve Brandt (she/her)

    For the deflater and inflater, run with both options! 

    I'm passing along your snappy update, thank you for trying that. Do you have the Terra job manager link in case we need to see more?

  • Genevieve Brandt (she/her)

    Eduardo Maury, it looks like the WDL wasn't configured properly when you tried disabling Snappy. The error you posted is a WDL error, not a GATK error.

    I found the job manager link and it looks like the job did not start correctly. 
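
The error message quoted earlier suggests that `read_lines` was handed the unmapped BAM itself rather than a small text file listing BAM paths, and that the file exceeded the 10000000-byte limit. A minimal pre-flight check along those lines might look like the sketch below. It assumes the input has been copied to a local path (the real inputs live in a gs:// bucket), and the function name is hypothetical:

```python
import os
import tempfile

READ_LINES_LIMIT_BYTES = 10_000_000  # limit quoted in the error message
GZIP_MAGIC = b"\x1f\x8b"             # BAM files are BGZF, a gzip-compatible format


def check_path_list(path, limit=READ_LINES_LIMIT_BYTES):
    """Return the reasons `path` looks unsuitable as a plain-text list of file paths."""
    problems = []
    if os.path.getsize(path) > limit:
        problems.append("larger than %d bytes" % limit)
    with open(path, "rb") as handle:
        if handle.read(2) == GZIP_MAGIC:
            problems.append("looks gzip/BGZF-compressed (a BAM?), not a plain-text list")
    return problems


# Usage with throwaway local files standing in for the real inputs.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "bams.list")
    with open(good, "w") as out:
        out.write("gs://bucket/a.unmapped.bam\n")
    bad = os.path.join(tmp, "sample.unmapped.bam")
    with open(bad, "wb") as out:
        out.write(GZIP_MAGIC + b"\x08\x04")
    good_problems = check_path_list(good)
    bad_problems = check_path_list(bad)

print(good_problems)  # []
print(bad_problems)
```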

  • Eduardo Maury

    A WDL error would have been caught by the compiler used when the WDL scripts are stored, since it checks for proper formatting. This error was not reported the previous times I ran the code. I am not sure what the error could be if the only edit I made was the one you suggested regarding Snappy.

  • Genevieve Brandt (she/her)

    Eduardo Maury I think you edited an older version of the WDL. For example, there is no Xmx option in your MarkDuplicates command, which I remember we edited.

  • Eduardo Maury

    That is correct. I just re-ran with the correct version and with Snappy disabled. There is still an error.

    Here is the run information:
    workspace-id: b266996d-0c10-45f7-bd1c-39cb4eef6aa5
    submission-id: 7405aabe-a0b2-40e2-9b59-e263a54615d4

  • Genevieve Brandt (she/her)

    Thank you for the update Eduardo Maury! Could you also try with the jdk inflater and deflater?

  • Eduardo Maury

    Still getting errors with the JDK inflater/deflater.

    workspace-id: b266996d-0c10-45f7-bd1c-39cb4eef6aa5
    submission-id: 23aed5a9-2cb0-403e-b1d9-db351e2c02f3

  • Genevieve Brandt (she/her)

    Thanks Eduardo Maury. The developers I'm working with on this issue are out of office this week. They will be back next week, so I'm hoping to get something else for you to try then.

    I am able to help with that second sample that completed the MarkDuplicates step. It looks like SortSam ran out of disk space. (Caused by: java.io.IOException: No space left on device). I'm wondering if that job doesn't have enough memory for your Xmx 590G. Jason Cerrato could you take a look at how the SortSam step is set up?

  • Jason Cerrato

    Genevieve Brandt (she/her) I'm seeing the WDL for the script in the Workflow Dashboard here. This is the relevant part of the WDL:

    Combined with the inputs for the task:

    command_mem_gb_sort = 550
    command_mem_gb_fix = 60
    command_mem_gb = 590

    this makes -Xms550G and -Xmx590G for the SortSam command, and -Xms60G and -Xmx590G for the SetNmMdAndUqTags command.

    The task has 600 GB disk space as well.

    Does it matter if the SORT_ORDER for SortSam is "coordinate"?

  • Eduardo Maury

    Is the question above addressed to me?

  • Genevieve Brandt (she/her)

    Eduardo Maury thanks for checking in, we don't need anything from you right now. I'm looking into it on my end with the info from Jason. Just pinged the developers again regarding the SortAndFixTags step. 

  • Genevieve Brandt (she/her)

    Hi Eduardo Maury,

    Since the second sample is in the SortAndFixTags step, I have some recommendations for how to proceed with that step. 

    Could you try doubling the disk space for the SortAndFixTags step? Then also set the --TMP_DIR argument for the SortSam GATK command to a directory within your working directory. It looks like the SortSam command is running out of disk space.

    Let us know how that goes. And I'll also get back to you next week regarding the sample that is failing the MarkDuplicates step. 

    Best,

    Genevieve
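
A small sketch of the two suggestions above: doubling the task's disk allocation and pointing SortSam's `--TMP_DIR` at a directory inside the working directory. The 600 GB starting value is the disk size reported earlier in the thread. The temporary directory name and the file names are made up, and the command is only assembled, not run:

```python
import os
import shlex


def sortsam_argv(input_bam, output_bam, work_dir=".", sort_order="coordinate"):
    """Build (but do not run) a gatk SortSam command with a local temp directory."""
    tmp_dir = os.path.join(work_dir, "sortsam_tmp")  # hypothetical directory name
    return ["gatk", "SortSam",
            "--INPUT", input_bam,
            "--OUTPUT", output_bam,
            "--SORT_ORDER", sort_order,
            "--TMP_DIR", tmp_dir]


current_disk_gb = 600                  # disk size reported for the task
doubled_disk_gb = 2 * current_disk_gb  # "doubling the disk space"
print(doubled_disk_gb)                 # 1200
print(shlex.join(sortsam_argv("in.bam", "sorted.bam")))
```

Keeping the temp directory under the working directory means SortSam's spill files count against the task's data disk rather than a smaller system location, which is presumably the point of the suggestion.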

  • Genevieve Brandt (she/her)

    Hi Eduardo Maury,

    I have an update regarding your sample #1, which is failing the MarkDuplicates step. Our Picard expert took a look and thinks that you should retry with a normal-size machine, along with setting the Java -Xmx option and decreasing the sorting collection size. They think that something is overflowing and causing these major problems. Here is what they recommend you try:

    • Input: queryname sorted bam
    • Remove -Dsamjdk.snappy.disable=true
    • -Xms8G
    • -Xmx12G
    • --SORTING_COLLECTION_SIZE_RATIO 0.125
    • --ASSUME_SORT_ORDER queryname
    • Remove --USE_JDK_INFLATER true --USE_JDK_DEFLATER true
    • Set the machine to 16GB memory and 16GB disk space

    Please let me know if you have any questions regarding this! And keep me posted if it fails or succeeds.

    Best,

    Genevieve
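
The recommended settings above can be collected into one command line. The sketch below is an illustration only: the function name, the file names, and the sanity check that the heap must fit inside the machine's memory are assumptions, while the option values come from the list above. The Snappy and JDK inflater/deflater switches are left out, as the list asks:

```python
import shlex


def recommended_markduplicates_argv(input_bam, output_bam, metrics_file,
                                    xms_gb=8, xmx_gb=12, machine_mem_gb=16,
                                    sorting_ratio=0.125):
    """Build (but do not run) MarkDuplicates with the settings recommended above."""
    # Assumed sanity check: the JVM heap has to fit inside the machine's memory.
    if not xms_gb <= xmx_gb < machine_mem_gb:
        raise ValueError("expected Xms <= Xmx < machine memory")
    java_options = "-Xms%dG -Xmx%dG" % (xms_gb, xmx_gb)
    return ["gatk", "--java-options", java_options, "MarkDuplicates",
            "--INPUT", input_bam,  # must be a queryname-sorted BAM
            "--OUTPUT", output_bam,
            "--METRICS_FILE", metrics_file,
            "--SORTING_COLLECTION_SIZE_RATIO", str(sorting_ratio),
            "--ASSUME_SORT_ORDER", "queryname"]


# File names are placeholders.
print(shlex.join(recommended_markduplicates_argv("in.bam", "out.bam", "metrics.txt")))
```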

  • Eduardo Maury

    Currently trying to run with these parameters. I have mentioned this in the past, but in the short term is there a pipeline that could just get me the bwa-aligned BAMs? I really just need to re-align to hg19. Ideally the samples would be cleaned and have duplicates marked, but optimizing this current pipeline has taken over 2 months...

  • Genevieve Brandt (she/her)

    Yes, the pipeline you have already run produces bwa-aligned BAMs. You could take the output from the MergeBamAlignment step if you want those.

  • Eduardo Maury

    With the new specifications, one of the samples ran to completion. However, the other sample still fails with an error, although it now gets past the MarkDuplicates step.

    workspace-id: b266996d-0c10-45f7-bd1c-39cb4eef6aa5
    submission-id: 6b54d9f9-a826-4632-b51b-4a1052a710bd

  • Genevieve Brandt (she/her)

    Thank you Eduardo Maury, this is great news! I am so glad that both of the samples are now past the MarkDuplicates step. For the sample that failed SortSam, could you try the recommendations in this comment?

  • Eduardo Maury

    I am not sure which specific recommendation you mean, since we tried many of the ones in the linked comment. Which one are you referring to?

  • Genevieve Brandt (she/her)

    Eduardo Maury, never mind those recommendations. I was able to have more developers take a look today and they think the boot disk error message was a red herring.

    For the sample that is failing during SortSam, could you specify these parameters for the SortAndFixTags step:

    • Decrease the memory for the task to 16 GB and the memory for each GATK command to 14 GB (--java-options "-Xmx14G"). Right now the task has 600 GB and each GATK command has 590 GB.  
    • Increase the disk space for the task to 800 GB

    This should work because we think the disk space needs to be more than 2x your input bam size for SortSam. And decreasing the memory should greatly decrease the cost too.

    Let me know if that works!
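
The sizing rule above (disk space greater than 2x the input BAM size) can be written as a quick check. The thread does not state the input BAM size, so the 350 GB figure below is purely hypothetical. The 600 GB and 800 GB values are the current and proposed disk sizes, and the factor of 2 is the rule of thumb quoted above, not a documented constant:

```python
def disk_is_sufficient(input_bam_gb, disk_gb, factor=2.0):
    """True if the disk is larger than `factor` times the input BAM size."""
    return disk_gb > factor * input_bam_gb


hypothetical_bam_gb = 350  # assumed size; not given in the thread

print(disk_is_sufficient(hypothetical_bam_gb, 600))  # False: 600 GB is below 2 x 350 GB
print(disk_is_sufficient(hypothetical_bam_gb, 800))  # True: 800 GB is above 700 GB

# Memory side of the recommendation: 16 GB for the task, 14 GB for each GATK command.
task_mem_gb = 16
java_options = "-Xmx%dG" % (task_mem_gb - 2)
print(java_options)  # -Xmx14G
```

Under this rule, an 800 GB disk would cover inputs of up to just under 400 GB.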

  • Eduardo Maury

    Still getting an error. I don't think we are able to allocate 800 GB for a task on Terra.

    workspace-id: b266996d-0c10-45f7-bd1c-39cb4eef6aa5
    submission-id: f4a815ce-e540-470a-839e-431848993ef4

  • Genevieve Brandt (she/her)

    Thanks Eduardo, I'm looking into this with the Terra team.
