Where do I find guidance on when to mark duplicates a second time?
Answered
Dear GATK Team,
The (How to) Map and clean up short read sequence data efficiently tutorial describes:
For multiplexed samples, first perform the workflow steps on a file representing one sample and one lane. Then mark duplicates. Later, after some steps in the GATK's variant discovery workflow, and after aggregating files from the same sample from across lanes into a single file, mark duplicates again. These two marking steps ensure you find both optical and PCR duplicates.
1. After what steps should duplicates be marked a second time?
I understand it may depend on the variant discovery workflow being followed, but more specific guidance would be helpful, as a second duplicate-marking step is not mentioned in any of the following:
Somatic short variant discovery (SNVs + Indels)
Germline short variant discovery (SNPs + Indels)
Somatic copy number variant discovery (CNVs)
In addition, in the Data pre-processing for variant discovery tutorial, only marking duplicates per sample is described.
2. What are the limitations of marking duplicates only once, for example only after files from the same sample have been aggregated across lanes?
Thank you for your time and help.
Kind regards.
Official comment
Hi ISmolicz,
Based on your forum posts, it looks like the (How to) Map and clean up short read sequence data efficiently tutorial is out of date. In our pipelines, we run MarkDuplicates only once to get both optical and PCR duplicates. We can't think of a reason why it would need to be run twice.
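As a rough illustration of the single-pass approach described above, here is a minimal Python sketch that assembles one MarkDuplicates call covering all lanes of a sample. It assumes the GATK4 `gatk` wrapper is on the PATH and uses the `-I`, `-O`, and `-M` arguments (input, output, metrics file), with `-I` repeated once per lane-level BAM. The function name and all file names are hypothetical, and this is not taken from the team's actual pipelines.

```python
import shlex


def build_markduplicates_command(lane_bams, output_bam, metrics_file):
    """Assemble a single MarkDuplicates call for all lanes of one sample.

    Hypothetical helper: it only builds the argument list and does not
    run anything. Each lane-level BAM is passed as its own -I argument.
    """
    if not lane_bams:
        raise ValueError("at least one input BAM is required")
    cmd = ["gatk", "MarkDuplicates"]
    for bam in lane_bams:
        cmd += ["-I", bam]  # one -I per lane-level BAM
    cmd += ["-O", output_bam, "-M", metrics_file]
    return cmd


if __name__ == "__main__":
    # Placeholder file names for two lanes of the same sample.
    cmd = build_markduplicates_command(
        ["sampleA_lane1.bam", "sampleA_lane2.bam"],
        "sampleA_markdup.bam",
        "sampleA_markdup_metrics.txt",
    )
    print(shlex.join(cmd))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where GATK is installed.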
I have requested that this tutorial be updated, noting that it is out of date. When we have the capacity, we will take a look at it and try to bring it up to date. Thank you for writing in regarding this issue. I apologize for how long it took us to get you an answer.
Best,
Genevieve
Hi ISmolicz,
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply. However, we ask other community members to help out if they know the answer.
For context, check out our support policy.