Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MarkDuplicatesSpark failed


  • Genevieve Brandt (she/her)

    Hi lid.zigh,

    It can be complicated to find the root of an issue when running Spark tools. The part of your stack trace that seems most illuminating is here:

    21/11/21 05:44:43 ERROR ShuffleBlockFetcherIterator: Error occurred while fetching local blocks
    java.nio.file.NoSuchFileException: /scratch/lzeighami/WLCH3023/DownSample.60X/tmp/blockmgr-5057f706-90a5-4ef2-bc8f-7fd9d6e1832c/3a/shuffle_3_38190_0.index

    Spark is not finding one of its shuffle index files here. This could come from a few different problems with the command, but first I would recommend checking that your temporary directory has enough space to hold the temporary files that Spark creates. You'll also want to make sure there is no limit on the number of files that can exist in the temporary directory.
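
One way to run the two checks above (free space and room for more files) is sketched below. It assumes a Linux system with GNU coreutils `df`. `SPARK_TMP`, `free_kb`, and `free_inodes` are hypothetical names chosen for illustration, not anything from GATK or Spark. Per-user quotas, if your cluster enforces them, are not visible to `df` and would need to be checked separately.

```shell
#!/usr/bin/env bash
# Sketch only: report free space and free inodes for a scratch directory.
# Assumes GNU coreutils df; column layout may differ on other systems.

free_kb() {
  # -P keeps each filesystem on one line, -k reports 1024-byte blocks.
  # Column 4 of the second line is the "Available" figure.
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

free_inodes() {
  # -i switches df to inode counts; column 4 is "IFree" in GNU df output.
  # Some filesystems do not report inode counts and show "-" or 0 here.
  df -Pi "$1" | awk 'NR==2 {print $4}'
}

# SPARK_TMP is a hypothetical variable: set it to the directory you pass
# to spark.local.dir. It defaults to /tmp for this sketch.
SPARK_TMP="${SPARK_TMP:-/tmp}"
echo "Free space in $SPARK_TMP: $(free_kb "$SPARK_TMP") KB"
echo "Free inodes in $SPARK_TMP: $(free_inodes "$SPARK_TMP")"
```

A very low number on either line would be consistent with Spark failing to write or find its shuffle files, though it does not prove that is the cause.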

    You might have more success pointing the temporary directory at the system temp folder rather than a subdirectory of your working directory. For example, try changing

    --conf 'spark.local.dir=tmp' \
    to
    --conf 'spark.local.dir=/tmp' \
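
To see the difference between the two values, note that a relative path such as `tmp` is resolved against the directory the job is launched from, which matches the `/scratch/.../DownSample.60X/tmp/` path in the stack trace. The sketch below prints where a given value actually points and the corresponding flag. `resolve_dir` and `LOCAL_DIR` are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Sketch only: show the absolute location behind a spark.local.dir value.

resolve_dir() {
  # cd into the directory in a subshell and print its physical absolute path.
  # Returns non-zero if the directory does not exist.
  ( cd "$1" 2>/dev/null && pwd -P )
}

# LOCAL_DIR is a hypothetical variable holding the value under test.
LOCAL_DIR="${LOCAL_DIR:-/tmp}"

if abs=$(resolve_dir "$LOCAL_DIR"); then
  echo "spark.local.dir=$LOCAL_DIR resolves to: $abs"
  # Passing the absolute path removes any dependence on the launch directory.
  echo "--conf 'spark.local.dir=$abs'"
else
  echo "Directory does not exist: $LOCAL_DIR" >&2
fi
```

Running it with `LOCAL_DIR=tmp` from your project directory should print a path inside that project directory, while `LOCAL_DIR=/tmp` should print the system temp folder.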

    Let me know if this works or you have further issues/questions!

    Best,

    Genevieve

  • lid.zigh

    Hello Genevieve,

    Thank you for your kind help. I will try my script with the system tmp folder instead of my own tmp folder and hope that it solves the issue. I will keep you updated.

    Thank you,

    Lida

  • Genevieve Brandt (she/her)

    Sounds good, let me know!

  • Ambu

    Hi lid.zigh, if you are using Anaconda or Miniconda for GATK4, you can fix this issue by downgrading Java from 11 to 8.

    Activate your GATK conda environment and run this command:

    conda install openjdk==8.0.332=h166bdaf_0

    Then run MarkDuplicatesSpark again.
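
Before and after the downgrade, it may help to confirm which Java the activated environment actually picks up. The sketch below parses the version string that `java -version` prints. `java_major` is a hypothetical helper name, and the parsing assumes the usual `1.8.0_332` (Java 8) and `11.0.13` (Java 9 and later) version formats.

```shell
#!/usr/bin/env bash
# Sketch only: print the major version of the Java found on PATH.

java_major() {
  # Takes a version string such as 1.8.0_332 or 11.0.13.
  local v="$1"
  # Java 8 and earlier report themselves as 1.x, so drop the leading "1.".
  case "$v" in
    1.*) v="${v#1.}" ;;
  esac
  # Keep everything before the first '.', '_' or '-'.
  printf '%s\n' "${v%%[._-]*}"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; the version is the first quoted field.
  raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  echo "Java major version: $(java_major "$raw")"
else
  echo "java not found on PATH" >&2
fi
```

If this still reports 11 after the `conda install` step, the environment may not be activated, or another Java may come earlier on your PATH.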

  • Genevieve Brandt (she/her)

    Thank you for your insight Ambu!

