AddOrReplaceReadGroups argument value query
Hi,
I just wanted to ask for clarification on what exactly GATK needs from Picard's AddOrReplaceReadGroups tool. From reading other forum posts, it seems that the only field that matters later in the somatic variant calling pipeline is RGSM, so can I just fill the rest of the fields with the same value as a placeholder?
For example:
picard AddOrReplaceReadGroups I=../out/samtools/AA-DNA_S27_L001.bam \
O=../out/picard_read_groups_again/AA-DNA_S27_L001_headers.bam \
RGID=AA-DNA_S27_L001 RGLB=AA-DNA_S27_L001 \
RGPL=ILLUMINA RGPU=AA-DNA_S27_L001 \
If that's not correct, how should I do this differently?
-
The most frequently used read group (@RG) fields are SM and ID, which distinguish samples and reads within and across files. SM is required because the sample name is carried through every step, from the very first BAM all the way to the final VCF. ID, on the other hand, matters if you sequence the same sample in different libraries or on different lanes of the same sequencer: base quality score recalibration uses the read group as a covariate, so reads from different libraries and/or lanes are modeled separately and do not interfere with one another. The other fields may look entirely optional, and most tools ignore them. Still, you may want to fill them with accurate information (sequencer, technology, run date, sequencing center) in case you later receive a very heterogeneous set of samples and want to do a retrospective study.
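To make the "shared SM, unique ID per lane" idea concrete, here is a minimal bash sketch that derives the read group values from the BAM file name instead of repeating one value everywhere. It rests on assumptions that are not from the original post: the files follow the Illumina bcl2fastq naming pattern `<sample>_S<number>_L<lane>` (as `AA-DNA_S27_L001` does), each sample was prepared as a single library, and the platform is Illumina. `rg_args` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Picard read-group arguments for one BAM.
# ASSUMPTIONS: names look like <sample>_S<number>_L<lane>.bam,
# one library per sample, Illumina platform.
rg_args() {
  local base sample
  base=$(basename "$1" .bam)   # e.g. AA-DNA_S27_L001
  sample=${base%_S*_L*}        # strip "_S27_L001" -> AA-DNA
  # ID and PU differ for every lane; SM (and LB, under the
  # one-library assumption) are shared by all lanes of a sample.
  # PU is only a placeholder here: ideally it would hold the
  # flowcell barcode, lane and sample barcode, which the file
  # name does not contain.
  printf 'RGID=%s RGSM=%s RGLB=%s RGPL=ILLUMINA RGPU=%s\n' \
    "$base" "$sample" "$sample" "$base"
}
```

Its output could then be spliced into the command from the question, e.g. `picard AddOrReplaceReadGroups I="$bam" O=... $(rg_args "$bam")`, leaving the substitution unquoted so the shell splits it into separate arguments (safe here only because the values contain no spaces). Lanes L001 and L002 of the same sample then get different IDs but the same SM.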
I hope this helps.