Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MarkDuplicates (Picard)

    Fred Zhou



    I've recently been puzzled by how MarkDuplicates handles multi-lane data.

    I have WGS data sequenced across multiple lanes, and I added the read group (RG) information during the BWA-MEM alignment step.
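A minimal sketch of per-lane read groups for this setup, assuming paired-end FASTQ files per lane and placeholder sample, library, and flowcell names. All lanes of one sample share the same SM and LB values and differ only in ID and PU. The helper names are hypothetical; the `@RG` field tags come from the SAM specification, and `bwa mem -R` takes the line with escaped `\t` separators rather than literal tabs.

```python
import shlex

# Sketch: build per-lane read-group strings for `bwa mem -R`.
# Sample, library, flowcell, and file names are hypothetical placeholders.


def read_group_line(rg_id: str, sample: str, library: str,
                    platform_unit: str, platform: str = "ILLUMINA") -> str:
    """Return an @RG header line with the SAM fields ID, SM, LB, PU and PL."""
    fields = [f"ID:{rg_id}", f"SM:{sample}", f"LB:{library}",
              f"PU:{platform_unit}", f"PL:{platform}"]
    return "\t".join(["@RG", *fields])


def escape_for_bwa(rg_line: str) -> str:
    """Replace literal tabs with the two-character escape that bwa mem -R expects."""
    return rg_line.replace("\t", "\\t")


def bwa_mem_command(reference: str, rg_line: str, fastq_r1: str, fastq_r2: str,
                    threads: int = 8) -> list[str]:
    """Assemble (but do not run) a paired-end bwa mem command for one lane."""
    return ["bwa", "mem", "-t", str(threads), "-R", escape_for_bwa(rg_line),
            reference, fastq_r1, fastq_r2]


if __name__ == "__main__":
    # Every lane shares SM and LB; only ID and PU change from lane to lane.
    for lane in ("1", "2"):
        rg = read_group_line(rg_id=f"FLOWCELL1.{lane}", sample="sample1",
                             library="lib1", platform_unit=f"FLOWCELL1.{lane}")
        cmd = bwa_mem_command("ref.fa", rg,
                              f"sample1_L{lane}_R1.fq.gz",
                              f"sample1_L{lane}_R2.fq.gz")
        # shlex.join quotes the -R argument so the backslashes survive a shell.
        print(shlex.join(cmd))
```

Keeping LB identical across lanes is what lets a later library-level step treat the lanes as one library.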

    I learned that MarkDuplicates can take multiple BAM files as input, and that it uses only the library (LB) field of the read group when identifying duplicates. So I tried passing all the lane-level BAMs of the same sample to MarkDuplicates in a single run. However, the output seems to have lost the RG information. Is that expected?
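A sketch of the single-run approach described above, assuming a Picard release that accepts the `-I`/`-O`/`-M` argument syntax (older releases use `I=`/`O=`/`M=`) and hypothetical file names. It only assembles the command. The illustrative `read_group_ids` and `header_text` helpers let you compare the `@RG` lines of the inputs and the output via `samtools view -H`, rather than assuming what MarkDuplicates does with them.

```python
import shlex
import subprocess


def markduplicates_command(inputs: list[str], output: str, metrics: str,
                           picard_jar: str = "picard.jar") -> list[str]:
    """Assemble one MarkDuplicates call over all lane-level BAMs of a sample.

    Assumes a Picard release that accepts -I/-O/-M; older releases
    used I=/O=/M= instead.
    """
    cmd = ["java", "-jar", picard_jar, "MarkDuplicates"]
    for bam in inputs:
        cmd += ["-I", bam]  # one -I per lane-level BAM
    cmd += ["-O", output, "-M", metrics]
    return cmd


def read_group_ids(header_text: str) -> list[str]:
    """Extract the ID field of every @RG line from SAM header text."""
    ids = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            for field in line.split("\t")[1:]:
                if field.startswith("ID:"):
                    ids.append(field[3:])
    return ids


def header_text(bam: str) -> str:
    """Read a BAM header with `samtools view -H` (samtools must be on PATH)."""
    return subprocess.run(["samtools", "view", "-H", bam],
                          capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    cmd = markduplicates_command(["sample1_L1.bam", "sample1_L2.bam"],
                                 "sample1.markdup.bam",
                                 "sample1.markdup_metrics.txt")
    print(shlex.join(cmd))
    # After running the command, compare read_group_ids(header_text(path))
    # for each input and for the output to see which read groups are declared.
```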

    In that case, do I need to run MarkDuplicates twice (first per lane, then per library)?
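To reason about the two-round question, here is a deliberately simplified toy model. It assumes single-end reads and ignores mate positions and optical duplicates, so it is not Picard's actual implementation. Reads are grouped by library, contig, 5' position, and strand, and every read except the one with the highest base-quality sum in each group is flagged. In this model, duplicates within a lane share a library with reads from the other lanes, so a single library-level pass already puts them in the same group. The `Read` type and `mark_duplicates` function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str
    read_group: str      # RG ID, e.g. one per lane
    contig: str
    five_prime: int      # 5' alignment position (toy value)
    reverse: bool        # strand
    base_qualities: tuple[int, ...]


def mark_duplicates(reads: list[Read], library_of: dict[str, str]) -> set[str]:
    """Return names of reads flagged as duplicates in this toy model."""
    groups: dict[tuple, list[Read]] = defaultdict(list)
    for r in reads:
        # The grouping key uses the library, not the lane/read group.
        key = (library_of[r.read_group], r.contig, r.five_prime, r.reverse)
        groups[key].append(r)
    duplicates: set[str] = set()
    for members in groups.values():
        best = max(members, key=lambda r: sum(r.base_qualities))
        duplicates.update(r.name for r in members if r.name != best.name)
    return duplicates


if __name__ == "__main__":
    reads = [
        Read("a", "lane1", "chr1", 100, False, (30, 30)),
        Read("b", "lane2", "chr1", 100, False, (20, 20)),  # cross-lane duplicate
        Read("c", "lane1", "chr1", 100, False, (10, 10)),  # same-lane duplicate
        Read("d", "lane3", "chr1", 100, False, (10, 10)),  # different library
    ]
    library_of = {"lane1": "libA", "lane2": "libA", "lane3": "libB"}
    print(sorted(mark_duplicates(reads, library_of)))  # → ['b', 'c']
```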


    Please let me know whether my understanding is correct. Thank you very much!





