Read groups: How do I assign RGID when all samples in a pool are run on all available lanes on a flow cell?
Dear GATK Team,
I have a question regarding assigning read groups for samples sequenced on platforms such as the Illumina NextSeq 500/550. On this platform, although the four lanes are physically distinct, libraries from multiple samples are pooled together and loaded into a single position on the reagent cartridge; there is no option to load individual lanes. A pool containing multiple samples therefore flows across all four lanes of the flow cell, so every sample in the pool is sequenced on every lane.
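Because every sample lands on every lane, the lane a read came from has to be recovered from the read itself. A minimal sketch, assuming the standard Illumina CASAVA 1.8+ FASTQ header layout shown in the docstring; `parse_illumina_header`, `ReadOrigin` and the example header are hypothetical, and demultiplexing settings may change what the final (index) field contains.

```python
from typing import NamedTuple


class ReadOrigin(NamedTuple):
    """Where a read was sequenced, as recorded in its Illumina read name."""
    flowcell: str
    lane: int
    barcode: str


def parse_illumina_header(header: str) -> ReadOrigin:
    """Extract flowcell, lane and index sequence from a FASTQ read header.

    Assumes the CASAVA 1.8+ layout:
    @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    name, _, comment = header.strip().lstrip("@").partition(" ")
    name_fields = name.split(":")
    comment_fields = comment.split(":")
    if len(name_fields) != 7 or len(comment_fields) != 4:
        raise ValueError(f"unexpected read header layout: {header!r}")
    return ReadOrigin(
        flowcell=name_fields[2],
        lane=int(name_fields[3]),
        barcode=comment_fields[3],
    )


# Hypothetical NextSeq read header from lane 3 of flowcell HXXXXBGXY.
origin = parse_illumina_header("@NB501234:12:HXXXXBGXY:3:11401:1234:5678 1:N:0:ATCACG")
print(origin)  # ReadOrigin(flowcell='HXXXXBGXY', lane=3, barcode='ATCACG')
```

The flowcell, lane and barcode extracted this way are exactly the pieces the PU field below is built from.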
Following guidance from the article https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups, my interpretation of how to assign read groups for three samples, each with one associated library, run on a platform such as the NextSeq 500/550 is the following:
@RG ID:FLOWCELL1.LANE1 PL:ILLUMINA LB:LIB-01 SM:SAMPLE-01 PU:FLOWCELL1.LANE1.SAMPLEBARCODE1
@RG ID:FLOWCELL1.LANE2 PL:ILLUMINA LB:LIB-01 SM:SAMPLE-01 PU:FLOWCELL1.LANE2.SAMPLEBARCODE1
@RG ID:FLOWCELL1.LANE3 PL:ILLUMINA LB:LIB-01 SM:SAMPLE-01 PU:FLOWCELL1.LANE3.SAMPLEBARCODE1
@RG ID:FLOWCELL1.LANE4 PL:ILLUMINA LB:LIB-01 SM:SAMPLE-01 PU:FLOWCELL1.LANE4.SAMPLEBARCODE1
@RG ID:FLOWCELL1.LANE1 PL:ILLUMINA LB:LIB-02 SM:SAMPLE-02 PU:FLOWCELL1.LANE1.SAMPLEBARCODE2
@RG ID:FLOWCELL1.LANE2 PL:ILLUMINA LB:LIB-02 SM:SAMPLE-02 PU:FLOWCELL1.LANE2.SAMPLEBARCODE2
@RG ID:FLOWCELL1.LANE3 PL:ILLUMINA LB:LIB-02 SM:SAMPLE-02 PU:FLOWCELL1.LANE3.SAMPLEBARCODE2
@RG ID:FLOWCELL1.LANE4 PL:ILLUMINA LB:LIB-02 SM:SAMPLE-02 PU:FLOWCELL1.LANE4.SAMPLEBARCODE2
@RG ID:FLOWCELL1.LANE1 PL:ILLUMINA LB:LIB-03 SM:SAMPLE-03 PU:FLOWCELL1.LANE1.SAMPLEBARCODE3
@RG ID:FLOWCELL1.LANE2 PL:ILLUMINA LB:LIB-03 SM:SAMPLE-03 PU:FLOWCELL1.LANE2.SAMPLEBARCODE3
@RG ID:FLOWCELL1.LANE3 PL:ILLUMINA LB:LIB-03 SM:SAMPLE-03 PU:FLOWCELL1.LANE3.SAMPLEBARCODE3
@RG ID:FLOWCELL1.LANE4 PL:ILLUMINA LB:LIB-03 SM:SAMPLE-03 PU:FLOWCELL1.LANE4.SAMPLEBARCODE3
However, this means the RGID is not unique across samples. Should I add a further distinguishing element to the RGID to make it unique, or is this unnecessary given that the other RG fields differ between samples and/or each sample's data will remain in separate files?
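One way to keep the naming scheme above while making every ID unique is to carry the sample barcode into the ID, for example by reusing the PU value. A minimal sketch under that assumption; `build_read_groups` and the "ID equals PU" convention are hypothetical, and any scheme works as long as no two read groups share an ID. The fields are joined with tabs because SAM header lines are tab-delimited.

```python
from itertools import product


def build_read_groups(flowcell, lanes, samples):
    """Return one tab-delimited @RG header line per (sample, lane) pair.

    samples is a list of (sample_name, library, sample_barcode) tuples.
    Hypothetical convention: the ID reuses the PU value (flowcell.lane.barcode),
    so samples sharing a lane still get distinct IDs.
    """
    lines = []
    seen_ids = set()
    for (sample, library, barcode), lane in product(samples, lanes):
        platform_unit = f"{flowcell}.{lane}.{barcode}"
        rg_id = platform_unit
        if rg_id in seen_ids:
            # Two samples with the same barcode on the same lane would collide.
            raise ValueError(f"duplicate read group ID: {rg_id}")
        seen_ids.add(rg_id)
        fields = ["@RG", f"ID:{rg_id}", "PL:ILLUMINA", f"LB:{library}",
                  f"SM:{sample}", f"PU:{platform_unit}"]
        lines.append("\t".join(fields))
    return lines


samples = [
    ("SAMPLE-01", "LIB-01", "SAMPLEBARCODE1"),
    ("SAMPLE-02", "LIB-02", "SAMPLEBARCODE2"),
    ("SAMPLE-03", "LIB-03", "SAMPLEBARCODE3"),
]
lanes = ["LANE1", "LANE2", "LANE3", "LANE4"]
for line in build_read_groups("FLOWCELL1", lanes, samples):
    print(line)
```

This yields the same twelve read groups as the list above, except that each ID now ends in the sample barcode (for example `FLOWCELL1.LANE1.SAMPLEBARCODE1`).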
Thank you for your time and help.
Kind regards.
-
ISmolicz, each ID must be unique for each read group; it does not necessarily need to be named with the flowcell.
For example, consider this read group:
@RG ID:H0164.2 PL:illumina PU:H0164ALXX140820.2 LB:Solexa-272222 PI:0 DT:2014-08-20T00:00:00-0400 SM:NA12878 CN:BI
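Since the requirement is simply that no two read groups share an ID, it can be checked mechanically on a header. A minimal sketch; `parse_rg_line` and `duplicate_rg_ids` are hypothetical helpers, not part of GATK. Real SAM headers are tab-delimited, and the whitespace fallback (for lines pasted with spaces, as in this thread) assumes no value contains a space.

```python
from collections import Counter


def parse_rg_line(line):
    """Split an @RG header line into a {tag: value} dict."""
    if not line.startswith("@RG"):
        raise ValueError(f"not an @RG line: {line!r}")
    fields = line.split("\t") if "\t" in line else line.split()
    tags = {}
    for field in fields[1:]:
        # Partition on the first colon only, so values like DT keep their colons.
        tag, sep, value = field.partition(":")
        if not sep:
            raise ValueError(f"malformed field {field!r}")
        tags[tag] = value
    return tags


def duplicate_rg_ids(header_lines):
    """Return the read group IDs that appear on more than one @RG line."""
    ids = []
    for line in header_lines:
        if line.startswith("@RG"):
            tags = parse_rg_line(line)
            if "ID" not in tags:
                raise ValueError(f"@RG line without ID: {line!r}")
            ids.append(tags["ID"])
    return sorted(rg_id for rg_id, n in Counter(ids).items() if n > 1)


example = ("@RG ID:H0164.2 PL:illumina PU:H0164ALXX140820.2 LB:Solexa-272222 "
           "PI:0 DT:2014-08-20T00:00:00-0400 SM:NA12878 CN:BI")
print(parse_rg_line(example)["DT"])  # 2014-08-20T00:00:00-0400

clashing = [
    "@RG\tID:FLOWCELL1.LANE1\tSM:SAMPLE-01",
    "@RG\tID:FLOWCELL1.LANE1\tSM:SAMPLE-02",
]
print(duplicate_rg_ids(clashing))  # ['FLOWCELL1.LANE1']
```

Note how in the example the ID (`H0164.2`) is a short label, while the full flowcell and lane information lives in PU (`H0164ALXX140820.2`).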
-
Thank you for confirming, Genevieve Brandt. I will ensure the RGID is unique per read group.