Picard MarkDuplicates Functionality
It's a general query. Does Picard MarkDuplicates re-mark reads that are already marked as duplicates before removing them, or does it remove them directly because they are already marked?
-
MarkDuplicates recalculates all duplicates and ignores any previous duplicate tags, so you don't have to worry about clearing duplicate tags on your reads. If a read was marked as a duplicate before and MarkDuplicates decides that it is a duplicate, it will be re-marked as if it had never been marked.
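To make that concrete, here is a minimal Python sketch of the behaviour described above. It is not Picard's code: `remark` and its `is_duplicate_now` argument are hypothetical names used only for illustration. The one fact it relies on is that bit 0x400 of the SAM FLAG field marks a read as a PCR or optical duplicate.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def remark(flag: int, is_duplicate_now: bool) -> int:
    """Return the FLAG a read would carry after duplicates are recalculated.

    Hypothetical helper: the previous state of the duplicate bit is
    discarded, and only the fresh decision (is_duplicate_now) is kept.
    """
    cleared = flag & ~DUPLICATE_FLAG  # forget any earlier duplicate mark
    return cleared | DUPLICATE_FLAG if is_duplicate_now else cleared


# A read that was marked before (99 + 1024 = 1123) but is no longer
# considered a duplicate loses the mark; one that still is keeps it.
print(remark(1123, False))
print(remark(1123, True))
```

The point is that the old value of the bit never influences the result, so there is no need to strip the marks yourself beforehand.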
Regards.
-
Hi all,
I wonder if the reads marked by Picard's MarkDuplicates will be automatically ignored in the analysis processes in subsequent steps. By this, I mean the reads are only marked but not removed (-REMOVE_DUPLICATES false). I've heard it mentioned that once reads are marked, they will never be considered again by any software. Is this true? Are there any arguments to indicate whether to overlook these reads or not?
Thanks!
-
Hi Dong Yiyi,
It is true that MarkDuplicates by default only marks reads as duplicates and does not remove them (unless you set REMOVE_DUPLICATES to true). Every GATK tool has a default duplicate read filter enabled and will therefore not consider reads marked as duplicates. Most other bioinformatics tools should behave the same way, since skipping marked duplicates is certainly the right thing to do, but we cannot guarantee that every third-party program does.
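If you are unsure whether a third-party tool respects the mark, you can check for it or filter on it yourself. The following is a minimal sketch in plain Python, assuming uncompressed SAM text as input. The function names and the three example records are made up for illustration; the only thing taken from the SAM format is that the second column is the FLAG and that bit 0x400 means "duplicate".

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def is_duplicate(sam_line: str) -> bool:
    """True if the alignment record has the duplicate bit set."""
    flag = int(sam_line.split("\t")[1])  # column 2 of a SAM record is FLAG
    return bool(flag & DUPLICATE_FLAG)


def drop_duplicates(sam_lines):
    """Keep header lines (starting with '@') and non-duplicate records."""
    return [
        line for line in sam_lines
        if line.startswith("@") or not is_duplicate(line)
    ]


# Made-up example records: read2 and read3 carry the duplicate bit.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "read2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "read3\t1024\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
]

kept = drop_duplicates(example)
print([line.split("\t")[0] for line in kept])
```

This mirrors what a duplicate read filter does conceptually: the reads stay in the file, and each tool decides whether to skip the ones that carry the mark.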
Hope this helps,
Michael
-
Thank you, Michael!