Query regarding Read group effect on HaplotypeCaller and Pipeline steps
Dear Genevieve Brandt (she/her) and GATK community,
I would like to know how read groups affect HaplotypeCaller. I show one example here; I have more samples in the same situation.
516_FCH7L2KCCX2_L7_BISvveXAAEBAAA-91 and 516_BISvveX_2nd are two different samples. ID, SM, and PL are the same for both samples, but LB and PU differ. Will they be treated as the same sample by HaplotypeCaller? Will this affect the number of variants that HaplotypeCaller calls? Kindly help me with this issue :)
First Sample name -> 516_BISvveX_2nd
@RG ID:E00591_309_H7L2KCCX2_7 SM:516 PL:ILLUMINA LB:516_BISvveX PU:E00591_309_H7L2KCCX2_7.516_BISvveX
Second sample name -> 516_FCH7L2KCCX2_L7_BISvveXAAEBAAA-91
@RG ID:E00591_309_H7L2KCCX2_7 SM:516 PL:ILLUMINA LB:516_FCH7L2KCCX2 PU:E00591_309_H7L2KCCX2_7.516_FCH7L2KCCX2
Thank you so much in advance.
I am going to move your post into our Community Discussions -> General Discussion topic, as the Non-Human topic is for reporting bugs and issues with GATK.
You can read more about our forum guidelines and the topics here: Forum Guidelines.
HaplotypeCaller treats reads as the same sample if they share the same SM (Sample) tag. With HaplotypeCaller, you'll want different samples to have different SM tags. You can manually change the read group SM tag in your input BAM and then you should have no issues with HaplotypeCaller.
You can read more about read groups in this article: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
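To illustrate the point about the SM tag, here is a minimal Python sketch that parses the two @RG lines from the question and compares their SM values. It does not run GATK; it only mirrors the rule described above (same SM means same sample). The helper name `parse_rg` is hypothetical, and the lines are split on whitespace because that is how they appear in this thread (real SAM headers are tab-separated).

```python
def parse_rg(line):
    """Parse one @RG header line into a dict of tag -> value (hypothetical helper)."""
    fields = line.split()  # whitespace split also handles tab-separated headers
    if not fields or fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    return dict(field.split(":", 1) for field in fields[1:])


# The two read group lines quoted in the question, keyed by file/sample name.
headers = {
    "516_BISvveX_2nd": (
        "@RG ID:E00591_309_H7L2KCCX2_7 SM:516 PL:ILLUMINA "
        "LB:516_BISvveX PU:E00591_309_H7L2KCCX2_7.516_BISvveX"
    ),
    "516_FCH7L2KCCX2_L7_BISvveXAAEBAAA-91": (
        "@RG ID:E00591_309_H7L2KCCX2_7 SM:516 PL:ILLUMINA "
        "LB:516_FCH7L2KCCX2 PU:E00591_309_H7L2KCCX2_7.516_FCH7L2KCCX2"
    ),
}

# Map each input to its SM tag; identical SM values mean "same sample".
samples = {name: parse_rg(line)["SM"] for name, line in headers.items()}
same_sample = len(set(samples.values())) == 1

print(samples)
print("Treated as one sample:", same_sample)
```

With the headers as posted, both inputs carry SM:516, so the check reports that they would be treated as one sample even though LB and PU differ.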
Hi Genevieve Brandt (she/her),
Thank you so much. I had read the article, but I find it somewhat confusing. I would like to clarify one thing: are both ways of adding read groups correct for HaplotypeCaller? From your answer, it looks to me as though both are. I would appreciate it if you could clarify.
Thank you so much in advance.
The way you assign read groups depends on how your sequencing was performed. Sometimes one sample contains multiple read groups. In that case, there would be multiple read group identifiers (ID) but the same sample name (SM).
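As a rough sketch of that case, the Python snippet below groups read groups by their SM tag. All IDs and sample names in it are made-up placeholders, not values from this thread, and `group_by_sample` is a hypothetical helper; it only shows the shape of "several IDs, one SM".

```python
from collections import defaultdict


def group_by_sample(read_groups):
    """Collect read group IDs under the sample (SM) they belong to."""
    by_sample = defaultdict(list)
    for rg in read_groups:
        by_sample[rg["SM"]].append(rg["ID"])
    return dict(by_sample)


# Hypothetical example: sampleA was sequenced on two lanes, sampleB on one.
read_groups = [
    {"ID": "flowcell1.lane1", "SM": "sampleA"},
    {"ID": "flowcell1.lane2", "SM": "sampleA"},
    {"ID": "flowcell1.lane3", "SM": "sampleB"},
]

grouped = group_by_sample(read_groups)
for sample, ids in grouped.items():
    print(sample, "->", ids)
```

Here sampleA has two read groups with distinct IDs but one shared SM, so its reads belong to a single sample, while sampleB stays separate because its SM differs.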
Does this answer your question?
Dear Genevieve Brandt (she/her),
Thank you so much.