Best practice with read groups
Hi,
Could you clarify something for my own understanding please?
I have read your post https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups on read groups and understand what they are.
My question is: if I have the same library run on the same flowcell over multiple lanes, do I need to preserve the lane information for downstream tools such as MarkDuplicates (optical duplicate detection)?
In other words, would it be incorrect to concatenate the FASTQs from different lanes before assigning read group information at the alignment stage?
-
Hi Sheryl,
MarkDuplicates specifically may not use the read group information, but BQSR definitely does, because the read group is one of the covariates it models. We recommend creating a separate FASTQ file for each read group and then converting those to an unaligned BAM, which will retain the read group information.
Your starting point will be either paired or interleaved FASTQs: paired FASTQs have a separate file for each read in the pair, while interleaved FASTQs have both reads of a pair one after another in the same file. If you have interleaved FASTQs, you'll have to split them first (see https://github.com/gatk-workflows/seq-format-conversion/blob/master/interleaved-fastq-to-paired-fastq.wdl). You'll also need to split the FASTQs by flowcell and lane, because the workflow for this purpose expects one read group per FASTQ. That should be doable with a relatively simple (but probably long-running) Python script, since the read names contain the flowcell and lane.
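A minimal sketch of that splitting script, assuming Illumina Casava 1.8+ read names (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`), where the flowcell is the third colon-separated field and the lane is the fourth. The function names, the output naming scheme (`prefix.FLOWCELL.LANE.fastq.gz`) and the gzip output are illustrative choices, not part of the GATK workflows.

```python
import gzip
import os
from typing import Dict, Iterator, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_records(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield FASTQ records as (header, sequence, plus, quality) line tuples."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not qual:
            raise ValueError("Truncated FASTQ record: " + header.strip())
        yield header, seq, plus, qual


def flowcell_lane(header: str) -> str:
    """Return 'FLOWCELL.LANE' parsed from an assumed Casava 1.8+ read name."""
    fields = header[1:].split()[0].split(":")
    if len(fields) < 7:
        raise ValueError("Unexpected read name format: " + header.strip())
    return f"{fields[2]}.{fields[3]}"


def split_by_lane(fastq_path: str, out_dir: str, prefix: str) -> Dict[str, int]:
    """Write one gzipped FASTQ per flowcell.lane and return read counts per key."""
    os.makedirs(out_dir, exist_ok=True)
    outputs: Dict[str, TextIO] = {}
    counts: Dict[str, int] = {}
    try:
        with open_text(fastq_path) as handle:
            for record in read_records(handle):
                key = flowcell_lane(record[0])
                if key not in outputs:
                    out_path = os.path.join(out_dir, f"{prefix}.{key}.fastq.gz")
                    outputs[key] = gzip.open(out_path, "wt")
                    counts[key] = 0
                outputs[key].writelines(record)
                counts[key] += 1
    finally:
        # Close every per-lane file even if parsing fails part-way.
        for out in outputs.values():
            out.close()
    return counts
```

Run it once on the read 1 file and once on the read 2 file (e.g. prefixes `sample_R1` and `sample_R2`). Because both mates carry the same flowcell and lane in their names and the script preserves input order within each lane, the per-lane R1 and R2 files should stay in matching pair order.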
You'll need a TSV with the following columns: readgroup, fastq_pair1, fastq_pair2, sample_name, library_name, platform_unit, run_date, platform_name, sequencing_center. This example is for paired FASTQ files.
Platform name is the technology used to produce the reads (e.g. ILLUMINA).
Platform unit should be unique to each read group (e.g. flowcell.lane.barcode). Once you have that TSV, you can run the commands as in https://github.com/gatk-workflows/seq-format-conversion/blob/master/paired-fastq-to-unmapped-bam.wdl to create an unmapped BAM that will have all the information BQSR needs.
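A rough sketch of generating that TSV from per-lane FASTQ pairs, using the column order listed above and building the platform unit as flowcell.lane.barcode. The `LaneFastqs` type, the read group naming (`FLOWCELL.LANE`) and the optional header row are hypothetical choices; check whether your version of the workflow expects a header line before using it.

```python
import csv
from dataclasses import dataclass
from typing import List

# Column order as described in the post above.
COLUMNS = ["readgroup", "fastq_pair1", "fastq_pair2", "sample_name",
           "library_name", "platform_unit", "run_date", "platform_name",
           "sequencing_center"]


@dataclass
class LaneFastqs:
    """Hypothetical container for one lane's pair of FASTQ files."""
    flowcell: str
    lane: str
    fastq_pair1: str
    fastq_pair2: str


def readgroup_rows(lanes: List[LaneFastqs], sample_name: str, library_name: str,
                   barcode: str, run_date: str, sequencing_center: str,
                   platform_name: str = "ILLUMINA") -> List[List[str]]:
    """Build one TSV row per flowcell/lane, i.e. one row per read group."""
    rows = []
    for lf in lanes:
        platform_unit = f"{lf.flowcell}.{lf.lane}.{barcode}"
        rows.append([f"{lf.flowcell}.{lf.lane}",  # hypothetical read group name
                     lf.fastq_pair1, lf.fastq_pair2, sample_name, library_name,
                     platform_unit, run_date, platform_name, sequencing_center])
    units = [row[5] for row in rows]
    if len(set(units)) != len(units):
        raise ValueError("platform_unit must be unique to each read group")
    return rows


def write_tsv(path: str, rows: List[List[str]], header: bool = False) -> None:
    """Write rows tab-separated, optionally preceded by a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if header:
            writer.writerow(COLUMNS)
        writer.writerows(rows)
```

The uniqueness check reflects the point that the platform unit must differ between read groups; a library sequenced across two lanes of one flowcell therefore yields two rows that share sample_name and library_name but have different platform_unit values.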
-Laura