Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

How should I pre-process data from multiplexed sequencing and multi-library designs?

1 comment

  • Ury Alon

    Thanks for the great explanation.

    Regarding the following line:

    "Note that we used to do a first round of marking duplicates here for QC purposes but tool improvements have rendered this obsolete"

    If I am interested in per-lane statistics (namely, how many duplicates there are per lane), how can they be extracted if MarkDuplicates is executed only once, when merging the lanes into a single BAM?

    Looking at the metrics file (I'm using GATK v4.1.7.0), I see that the results are reported per library. Does this mean that if I want per-lane statistics, I should modify the read groups so that each lane has a distinct library (currently they all have the same library)?
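
One possible way to get per-lane counts without changing the library field is to tally the duplicate flag per read group in the merged, duplicate-marked file. The sketch below is only an illustration and rests on assumptions not stated in the thread: that each lane was given its own read group ID, and that the records are supplied as SAM-format text lines. It relies on two facts from the SAM specification: flag bit 0x400 marks a PCR or optical duplicate, and the `RG:Z:` tag names a record's read group. The function name is hypothetical, and the counts cover every record, so they are not expected to match the MarkDuplicates metrics file exactly.

```python
from collections import defaultdict

DUPLICATE_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate


def duplicates_per_read_group(sam_lines):
    """Return {read_group_id: (total_records, duplicate_records)}.

    Hypothetical helper: assumes one read group per lane and
    tab-separated SAM text lines as input.
    """
    counts = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second mandatory column
        read_group = "unknown"
        # Optional tags start after the 11 mandatory columns.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:"):
                read_group = tag[len("RG:Z:"):]
                break
        counts[read_group][0] += 1
        if flag & DUPLICATE_FLAG:
            counts[read_group][1] += 1
    return {rg: tuple(c) for rg, c in counts.items()}
```

The lines could come, for example, from the text output of `samtools view` on the merged BAM. The resulting dictionary gives the total and duplicate record counts for each read group.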

