Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 somatic call error

9 comments

  • Genevieve Brandt (she/her)

    Hi Tanay Biswas,

    I think this issue is coming from your read group line for your IITK-P6-BD_recal.bam sample:

    @RG    ID:normal    PL:ILLUMINA    LB:TruSeq    SM:normal    PI:200

    The sample name for the -normal argument should correspond to the sample name in the read group. The sample name in your read group is normal, but it should be IITK-P6-BD, to match your command line.

    You can fix your read group with the tool AddOrReplaceReadGroups: https://gatk.broadinstitute.org/hc/en-us/articles/360035532352-Errors-about-read-group-RG-information

    Let me know if you have any other questions!

    Best,

    Genevieve
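
One way to check this before rerunning Mutect2 is to compare the SM value in each @RG header line with the name passed to -normal. The following is a minimal sketch, not part of GATK: the function names are hypothetical, and it assumes the header fields are tab-separated as in the SAM format (the spaces in the line quoted above are taken to be display formatting).

```python
def parse_read_group(header_line):
    """Parse one tab-separated @RG header line into a {tag: value} dict."""
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG header line")
    # Each remaining field looks like TAG:value; split on the first colon only.
    return dict(field.split(":", 1) for field in fields[1:])


def normal_name_matches(header_lines, normal_name):
    """Return True if any @RG line has an SM value equal to normal_name."""
    samples = {
        parse_read_group(line).get("SM")
        for line in header_lines
        if line.startswith("@RG")
    }
    return normal_name in samples


# The read group line from this thread, with tabs restored (an assumption).
header = ["@RG\tID:normal\tPL:ILLUMINA\tLB:TruSeq\tSM:normal\tPI:200"]
print(normal_name_matches(header, "IITK-P6-BD"))  # False: SM is "normal"
print(normal_name_matches(header, "normal"))      # True
```

The header lines themselves can be obtained with samtools view -H on the BAM file and passed in as a list of strings.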

  • Tanay Biswas
    Hi Genevieve,

    Thanks. I have checked the link, but I don't understand the read-group platform unit. What should I specify for the --RGPU (-PU) option, described as "Read-Group platform unit (eg. run barcode)", when running Picard? And -SM should be IITK-P6-BD, right?

     

    Thanks.

     

    Regards,

    Tanay

     

  • Genevieve Brandt (she/her)

    Hi Tanay,

    Yes, the -SM should be IITK-P6-BD. Here is a description of what should be in the Platform Unit, from the read groups article:

    PU = Platform Unit. The PU holds three types of information: {FLOWCELL_BARCODE}.{LANE}.{SAMPLE_BARCODE}. The {FLOWCELL_BARCODE} refers to the unique identifier for a particular flow cell. The {LANE} indicates the lane of the flow cell, and the {SAMPLE_BARCODE} is a sample/library-specific identifier. Although the PU is not required by GATK, it takes precedence over ID for base recalibration if it is present. In the example shown earlier in that article, two read group fields, ID and PU, appropriately differentiate flow cell lane, marked by .2, a factor that contributes to batch effects.

    Let me know if you have any further questions!

    Best,

    Genevieve
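
To make the {FLOWCELL_BARCODE}.{LANE}.{SAMPLE_BARCODE} layout concrete, here is a minimal sketch that assembles and splits a PU string. The helper names and the example values (FLOWCELL1, lane 2, ACGT) are hypothetical placeholders, and the sketch assumes none of the three parts contains a dot.

```python
def make_platform_unit(flowcell_barcode, lane, sample_barcode):
    """Join the three parts into FLOWCELL_BARCODE.LANE.SAMPLE_BARCODE."""
    return f"{flowcell_barcode}.{lane}.{sample_barcode}"


def split_platform_unit(pu):
    """Split a three-part PU string back into its named components."""
    parts = pu.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected 3 dot-separated parts, got {len(parts)}")
    flowcell_barcode, lane, sample_barcode = parts
    return {
        "flowcell_barcode": flowcell_barcode,
        "lane": lane,  # kept as a string, exactly as it appears in the PU
        "sample_barcode": sample_barcode,
    }


pu = make_platform_unit("FLOWCELL1", 2, "ACGT")  # placeholder values
print(pu)
print(split_platform_unit(pu))
```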

  • Tanay Biswas
    Hi Genevieve,

    I understand the description, but I cannot find any PU information for my data. Can you suggest where I should get it?

    Thank you.

     

    Regards,

    Tanay

     

  • Genevieve Brandt (she/her)

    Hi Tanay,

    Usually you would get the PU information from wherever your sample was sequenced. However, if you do not have the information, PU is not required by GATK so you do not need to include it.

    Best,

    Genevieve
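
If the sequencing provider cannot supply the PU, the flow cell ID and lane can sometimes be recovered from the read names. The sketch below assumes Illumina-style read names of the form instrument:run:flowcell:lane:tile:x:y, which is not guaranteed for every dataset; the function name and the example read name are hypothetical, and the sequencing facility remains the authoritative source.

```python
def flowcell_and_lane(read_name):
    """Extract (flowcell_id, lane) from an Illumina-style read name.

    Assumes the colon-separated layout instrument:run:flowcell:lane:tile:x:y,
    optionally preceded by '@' and followed by a space-separated comment.
    """
    name = read_name.lstrip("@").split()[0]
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError("read name does not look like an Illumina-style name")
    return fields[2], fields[3]


# Hypothetical read name, for illustration only.
example = "@INSTR1:42:FLOWCELL1:2:1101:1000:2000 1:N:0:ACGT"
print(flowcell_and_lane(example))  # ('FLOWCELL1', '2')
```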

  • Tanay Biswas
    Hi Genevieve

    Picard gave an error when PU was not specified, so I ran the command below:

    [tbiswas@un02 ~]$ java -jar picard.jar AddOrReplaceReadGroups -I /scratch/tbiswas/IITK-P6-BD_fixmate_sorted_duprm.recal.bam -O /scratch/tbiswas/IITK-P6-BD_fixmate_sorted.duprm.recal_RG.bam -LB TruSeq -PL ILLUMINA -PU barcode -SM IITK-P6-BD --CREATE_INDEX true

     

    This produced the output file, but it is about 4 GB smaller than the input. Can you comment on that? I'll now run Mutect2 again and see what happens.

    Thanks.
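
For anyone scripting this step, the command above can be assembled programmatically so that a missing read group field is caught before Picard runs. This is a minimal sketch under stated assumptions: the function name is hypothetical, only the options that appear in the command above are used, and in.bam and out.bam are placeholder paths.

```python
def add_or_replace_read_groups_cmd(input_bam, output_bam, *, lb, pl, pu, sm,
                                   picard_jar="picard.jar"):
    """Build the argument list for Picard AddOrReplaceReadGroups."""
    required = {"LB": lb, "PL": pl, "PU": pu, "SM": sm}
    missing = [tag for tag, value in required.items() if not value]
    if missing:
        # Mirrors the thread: Picard reported an error when PU was left out.
        raise ValueError("missing read group fields: " + ", ".join(missing))
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        "-I", input_bam,
        "-O", output_bam,
        "-LB", lb,
        "-PL", pl,
        "-PU", pu,
        "-SM", sm,
        "--CREATE_INDEX", "true",
    ]


# Placeholder paths; the field values are the ones used in this thread.
cmd = add_or_replace_read_groups_cmd(
    "in.bam", "out.bam",
    lb="TruSeq", pl="ILLUMINA", pu="barcode", sm="IITK-P6-BD",
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) to execute it.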

     

  • Genevieve Brandt (she/her)

    You should be able to run Mutect2 without the PU read group field.

    I don't know why your file size is different; I don't have enough information to tell. If you share the complete stack trace from AddOrReplaceReadGroups, I can get a better idea of whether it was successful.

  • Tanay Biswas
    Hi Genevieve

    Mutect2 is running, and I now understand why the output file size was different.

    Thank you.

     

    Regards,

    Tanay

     

  • Genevieve Brandt (she/her)

    Thanks for the update, Tanay! Glad it is working for you now!

