Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Collecting alignment metrics at READ_GROUP level with Picard

4 comments

  • Gökalp Çelik

    Hi Edward Formaini

    Can you share the @RG sections of your file header with us? 

  • Edward Formaini

    Gökalp Çelik sure - this aligned genome comprises 76 read groups:

    @RG    ID:sample123    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-28923D42    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-3CEDF6CF    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-5885F29A    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-3520DFFE    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-366A9FB8    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-4F7A0D5C    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-115C8711    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-18F99BFC    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-120B2C3F    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-3E8D6E03    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-4AD9D46B    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-1E7884DB    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-757E36E4    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-62FBF36A    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-1F664086    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-730D5BF9    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-67B3974F    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-72E1B077    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-3B8C00F6    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-27649607    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-2FBB9824    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-593E25C1    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-583E3F74    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-3DE454D0    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-321B27FF    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-17E95D28    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-782072ED    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-6ED91621    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-7D479A11    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-6B8118C    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-2599CB31    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-156E0DC9    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-6C66E462    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-497B28A2    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-3A1A2D3F    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-55AB40C8    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-2E3DE9AB    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-6FB3E4EB    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-505DDE31    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-1877978D    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-6D60FCD7    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-2C7955C7    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-12529AD1    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-46273D5F    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-3FE36C1F    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-4944ABAE    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-7C2479C0    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-578EC2E6    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-278981C9    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-5F493686    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-338A9647    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-40999CB    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-75850883    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-42C1F418    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-754B21C4    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-317E9F82    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-11A70168    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-A59B4A5    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-39BC19BB    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-414F8D31    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-5E51D997    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-68A3FB9E    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-2861807E    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-1F6842B1    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-6FF60D74    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-7AA3E279    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-3D539683    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-4A7102B9    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-361FC785    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-56BB63A6    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-196F3249    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-2E9AD623    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-38B6A8F1    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-B3093A9    LB:LB0    PL:PL0    PU:PU0    SM:sample123
    @RG    ID:sample123-6F17F4EE    LB:LB0    PL:PL0    PU:PU0    SM:sample123
  • Gökalp Çelik

    Hi again.

    We checked the SAM spec and its requirements for @RG header lines. Two fields are meant to be unique per read group: ID and PU. However, the SAM spec also mentions that ID can be modified to prevent collisions within a sample during merging, and two different samples may share the same ID value, so an ID by itself does not tell read groups from different samples apart. PU, on the other hand, is a unique identifier and is never touched during any SAM/BAM operation. For this reason, the code uses PU to distinguish between read groups. Since all your PU values are the same (PU0), metric collection cannot tell your read groups apart and collects all metrics under a single PU.
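To illustrate the effect described above, here is a minimal Python sketch. It is not Picard's code; the helper names `parse_rg_line` and `group_ids_by_pu` are made up for this example, and the header lines are the first three from the post, assuming the fields are tab-separated as in a real SAM header. Grouping the @RG lines by PU shows every read group falling into the same bucket.

```python
from collections import defaultdict

def parse_rg_line(line):
    """Split one tab-separated @RG header line into a {tag: value} dict."""
    fields = line.rstrip("\n").split("\t")[1:]  # drop the leading "@RG"
    return dict(field.split(":", 1) for field in fields)

def group_ids_by_pu(header_lines):
    """Map each PU value to the list of read group IDs that carry it."""
    groups = defaultdict(list)
    for line in header_lines:
        if line.startswith("@RG\t"):
            rg = parse_rg_line(line)
            groups[rg.get("PU")].append(rg["ID"])
    return dict(groups)

# First three @RG lines from the header posted above.
header = [
    "@RG\tID:sample123\tLB:LB0\tPL:PL0\tPU:PU0\tSM:sample123",
    "@RG\tID:sample123-28923D42\tLB:LB0\tPL:PL0\tPU:PU0\tSM:sample123",
    "@RG\tID:sample123-3CEDF6CF\tLB:LB0\tPL:PL0\tPU:PU0\tSM:sample123",
]

by_pu = group_ids_by_pu(header)
for pu, ids in by_pu.items():
    print(pu, len(ids), "read group(s)")
```

Any PU that maps to more than one ID is a read group set that a PU-keyed collector would merge into one.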

    If you want separate metrics for each read group, we suggest modifying your @RG tags with the AddOrReplaceReadGroups tool.
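One thing to keep in mind: per its documentation, AddOrReplaceReadGroups assigns all reads in a file to a single new read group, so it fits best when run on per-read-group files before merging. For a file that already carries many read groups, an alternative is to edit only the header so that each PU is unique. The Python sketch below does that by copying each read group's ID into its PU. This is an illustration under assumptions, not a GATK/Picard recommendation: `make_pu_unique` is a hypothetical helper, and using the ID as the PU is just one convenient choice when the real flowcell/lane information is not available.

```python
def make_pu_unique(header_lines):
    """Return header lines where every @RG line has PU set to its own ID.

    Lines are expected without trailing newlines; non-@RG lines pass through.
    """
    fixed = []
    for line in header_lines:
        if not line.startswith("@RG\t"):
            fixed.append(line)
            continue
        fields = line.split("\t")
        # Every @RG line must have an ID field according to the SAM spec.
        rg_id = next(f[3:] for f in fields[1:] if f.startswith("ID:"))
        if any(f.startswith("PU:") for f in fields[1:]):
            # Replace the existing PU in place to keep the field order.
            fields = ["PU:" + rg_id if f.startswith("PU:") else f for f in fields]
        else:
            fields.append("PU:" + rg_id)
        fixed.append("\t".join(fields))
    return fixed

# Example header: one @HD line plus two @RG lines from the post above.
example_header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:sample123\tLB:LB0\tPL:PL0\tPU:PU0\tSM:sample123",
    "@RG\tID:sample123-28923D42\tLB:LB0\tPL:PL0\tPU:PU0\tSM:sample123",
]

fixed_header = make_pu_unique(example_header)
print("\n".join(fixed_header))
```

The edited header text could then be applied to the BAM with a header-replacement tool such as `samtools reheader` before re-running metric collection at the READ_GROUP level.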

    I hope this helps. 

    Regards. 

  • Edward Formaini

    This has been very helpful, thank you very much Gökalp Çelik!

    Ed
