Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MarkDuplicatesSpark

1 comment

  • Baptiste

    Hello,

    I'm trying to run MarkDuplicatesSpark on WGS files of approx. 130 GB.

    I've set spark.local.dir and --tmp-dir to a RAM disk to avoid very slow disk access (I don't have a large SSD, and scratch is on an NFS mount backed by HDDs).

    On a Slurm cluster, on a server with 128 CPUs and 1 TB of RAM, I get an "out of disk space" error (on the RAM disk).

    Is this normal? It looks like MarkDuplicatesSpark needs more than 1 TB of temporary files.

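For readers trying to reproduce the setup described in the post, here is a minimal Python sketch that assembles such a command line and reports how much free space the temporary directory has before launching. The file names, the `/dev/shm/gatk_tmp` path, and the helper names are hypothetical. The flags (`-I`, `-O`, `--tmp-dir`, `--spark-master`, `--conf`) reflect my understanding of the GATK4 command line and should be checked against `gatk MarkDuplicatesSpark --help`. How much temporary space the tool actually needs is not stated in the post, so the sketch does not assume a figure.

```python
import shutil


def build_markduplicates_cmd(input_bam, output_bam, tmp_dir, cores):
    """Return an argv list for a local-mode MarkDuplicatesSpark run.

    Both --tmp-dir and spark.local.dir point at the same directory,
    as in the post. All paths passed in are placeholders.
    """
    return [
        "gatk", "MarkDuplicatesSpark",
        "-I", input_bam,
        "-O", output_bam,
        "--tmp-dir", tmp_dir,
        "--spark-master", f"local[{cores}]",
        "--conf", f"spark.local.dir={tmp_dir}",
    ]


def free_gib(path):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


if __name__ == "__main__":
    # Hypothetical values; adjust to the actual node and RAM disk.
    cmd = build_markduplicates_cmd(
        "sample.bam", "sample.markdup.bam", "/dev/shm/gatk_tmp", 128
    )
    print(" ".join(cmd))
    # "." stands in for the temporary directory so the sketch runs anywhere.
    print(f"free space here: {free_gib('.'):.1f} GiB")
```

Keep in mind that a RAM disk shares physical memory with the JVM, so temporary files written there compete with the memory available to Spark itself.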
