Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



MarkDuplicatesSpark doesn't work for large bam files


2 comments

  • danilovkiri

    Hi TYA

    Have you looked at the error log?

    There is a line at the very beginning of the error message:

    java.io.IOException: No space left on device

    That line explains the failure. If you have plenty of disk space and find the "No space left" error confusing, note that Java tools often write their temp files to the system's default temp directory. As a result, the temp files may end up not on the SSD/HDD with terabytes of free space but on the 100 GB boot volume. I suggest running

    gatk --java-options "-XmxNG -XmsMG -Djava.io.tmpdir=/path/to/tmpdir"

    setting the Xmx argument too, since RAM is also limited. Point the tmpdir at a directory with plenty of free space and monitor disk usage with "df -h".
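The "df" check suggested above can be scripted before launching a long job. The following is only a sketch: the helper name free_gib is hypothetical, and it assumes a POSIX "df" and "awk" are available. It reports how much space is free on the filesystem that holds a candidate temp directory.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the free space, in whole GiB, of the
# filesystem that holds the directory given as $1.
free_gib() {
    # -P forces one POSIX-format line per filesystem, -k reports 1 KiB blocks.
    # Line 2 is the data line; column 4 is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Candidate temp directory; falls back to /tmp when TMPDIR is unset.
tmp_dir="${TMPDIR:-/tmp}"
echo "Free space on the volume holding ${tmp_dir}: $(free_gib "${tmp_dir}") GiB"
```

If the reported figure is small compared with the size of the input BAM, pick a directory on a larger volume and pass it through -Djava.io.tmpdir as in the command above.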

  • TYA

    Hi danilovkiri,

    Thank you very much for your response. As you suggested, I changed the tmpdir and it worked. :) Thanks a lot. The command that I used is the following:

    gatk MarkDuplicatesSpark \
    -I input.bam \
    -O output.bam \
    -R ref.fa \
    --tmp-dir ~/tmp
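The two suggestions in this thread (a JVM heap limit plus a relocated temp directory) could be combined along these lines. This is a sketch under assumptions, not a tested recipe: the heap sizes of 16G and 4G are placeholder values that are not from the thread, and the script only prints the assembled command for review rather than running it.

```shell
#!/usr/bin/env bash

# Temp directory: first script argument, else ~/tmp as in the command above.
tmp_dir="${1:-${HOME}/tmp}"

# Make sure the directory exists before handing it to the tool.
mkdir -p "${tmp_dir}"

# Placeholder heap limits (assumptions); adjust them to the RAM available.
java_opts="-Xmx16G -Xms4G -Djava.io.tmpdir=${tmp_dir}"

# Assemble the command as an array so that paths with spaces stay intact.
cmd=(gatk --java-options "${java_opts}" MarkDuplicatesSpark
     -I input.bam -O output.bam -R ref.fa --tmp-dir "${tmp_dir}")

# Dry run: show the command. To execute it, run "${cmd[@]}" instead.
printf '%s\n' "${cmd[*]}"
```

Building the command as an array keeps the quoted --java-options string as a single argument when the command is eventually executed.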
