Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MarkDuplicatesSpark error: Couldn't write file sort.md.bam

2 comments

  • James Emery

    Hello mengting.

    The error message here implies that you are trying to overwrite an existing directory. First, I would recommend making sure that, wherever you are writing your outputs, the name of {outputbam}.parts is unique. Either delete the CRR032108.sort.md.bam.parts directory or change the name of your output to avoid the collision.

    However, we would generally recommend using a newer version of GATK than 4.1, as there have been a number of important bug fixes and improvements that could be relevant to this issue. Given how old that version is, it is possible that there is an issue with the tool creating and then attempting to overwrite its own directory. If you have confirmed that no existing output directory is being overwritten, I would next recommend using a newer version of GATK (we are up to 4.5) and testing that you don't see the same exception.
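For anyone who wants to script the cleanup step before rerunning, here is a minimal sketch of the "delete the leftover .parts directory" suggestion. It assumes the leftover directory is named by appending `.parts` to the output BAM path, as in the CRR032108.sort.md.bam.parts example above. The helper name `clear_stale_parts` and its `dry_run` flag are hypothetical and not part of GATK.

```python
import shutil
from pathlib import Path


def clear_stale_parts(output_bam: str, dry_run: bool = True) -> bool:
    """Look for a leftover '<output_bam>.parts' directory.

    Returns True if such a directory was found, False otherwise.
    With dry_run=True (the default) nothing is deleted; the path is only reported.
    """
    # Assumption: the temporary directory is the output path plus ".parts".
    parts_dir = Path(output_bam + ".parts")

    if not parts_dir.is_dir():
        return False

    if dry_run:
        print(f"Would remove stale directory: {parts_dir}")
    else:
        # Recursively delete the leftover directory and its contents.
        shutil.rmtree(parts_dir)
        print(f"Removed stale directory: {parts_dir}")
    return True
```

Calling `clear_stale_parts("CRR032108.sort.md.bam")` only reports what it finds; pass `dry_run=False` to actually delete the directory, and double-check the path first, since the removal is recursive.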

  • mengting

    Hi, James Emery,

    Thanks for your advice! I ran my code with a newer version of GATK (version 4.5), and the problems I was facing before were solved perfectly.
