MarkDuplicatesSpark error: Couldn't write file sort.md.bam
I have been running GATK 4.1 MarkDuplicatesSpark on a SAM file to get a BAM file, and after a couple of hours of running it failed. Could you please give me some advice on how to solve the problem? The command is as follows:
ID=CRR032108
gatk MarkDuplicatesSpark -I $ID.bam \
-O $ID.sort.md.bam \
-M $ID.sort.md.metricts.txt \
--tmp-dir ./tmp \
--conf 'spark.executor.cores=5' \
--conf 'spark.local.dir=./tmp'
While I got an error: A USER ERROR has occurred: Couldn't write file CRR032108.sort.md.bam because writing failed with exception Output directory CRR032108.sort.md.bam.parts already exists
And there is a directory named CRR032108.sort.md.bam.parts in my working directory.
-
Hello mengting.
The error message in question here implies that you are trying to overwrite an existing directory. First, I would recommend making sure that, wherever you are writing your outputs, the name {outputbam}.parts is absolutely unique. Either delete the existing CRR032108.sort.md.bam.parts directory or change the name of your output to avoid the collision.
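As a minimal sketch of the cleanup step, assuming the file names from the command above, you could remove any stale .parts directory before retrying the run (double-check the path first, since rm -rf is destructive):

```shell
# Hypothetical paths, taken from the command in the question.
ID=CRR032108
OUT=$ID.sort.md.bam

# MarkDuplicatesSpark writes its output in pieces to "$OUT.parts" and
# fails if that directory is left over from a previous (failed) run.
# Remove the stale directory, then rerun the gatk command.
if [ -d "$OUT.parts" ]; then
    rm -rf "$OUT.parts"
fi
```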
However, we would generally recommend using a newer version of GATK than 4.1, as there have been a number of important bugfixes and improvements that could be relevant to this issue. Given how old that version is, it is possible that there is an issue with the tool creating and then attempting to overwrite its own directory. Once you have confirmed that no existing output directory is being overwritten, I would next recommend using a newer version of GATK (we are up to 4.5) and testing that you don't see the same exception.
-
Hi, James Emery,
Thanks for your advice! I tried a newer version of GATK (version 4.5) to run my command, and the problem I was facing before was solved perfectly.