Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Should joint-calling be performed for the control and the disease group separately?

Answered
4 comments

  • Philipp Hähnel

    Hi Jiayi,

    If by joint calling you mean multi-sample variant calling with Mutect2, that is done per patient, not per cohort. Multi-sample calling pools the evidence for a variant across all samples from one patient, which gives it more power to detect variants in that patient.

    Please read the best practices tutorial.

    Best,

    Philipp

  • Jiayi Zhao

    Hi Philipp,

    Thanks for your reply. 

    Actually, I am using HaplotypeCaller, and I am going to try GenotypeGVCFs. Is this a good choice? And should I run joint calling on the disease and control groups separately?

    Best,

    Jiayi

  • Philipp Hähnel

    Hi Jiayi,

    Are you interested in germline variants or somatic variants? For the former, use HaplotypeCaller; for the latter, use Mutect2.

    Are the disease and control samples patient-matched? If so, you can use them as tumor-normal pairs in Mutect2, where the matched control serves to filter out germline variants.

    Joint calling should only ever be done for multiple samples coming from the same patient.

    EDIT: The statement above certainly holds for somatic calling. After rereading the documentation for germline calling, I see that you can run it in cohort mode on multiple patients. If you are interested in which germline variants may be responsible for the disease, then to maximize power I'd run it in two batches: the case batch and the control batch. Maybe someone from the GATK team who is more familiar with germline calling could elaborate on that?

    Best,

    Philipp
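
For the somatic case described in this comment, the per-patient tumor-normal call could be assembled roughly as follows. This is only a sketch: it builds the command line without running it, and the file names, sample names, and the helper `mutect2_pair_cmd` are hypothetical placeholders, not part of the thread. The value passed to `-normal` is assumed to be the sample name recorded in the control BAM's read group.

```python
from typing import Dict, List


def mutect2_pair_cmd(reference: str, tumor_bams: List[str], normal_bam: str,
                     normal_sample_name: str, out_vcf: str) -> List[str]:
    """Build one Mutect2 command for a single patient (hypothetical helper).

    All BAMs passed here must come from the same patient; the matched
    control is flagged with -normal so Mutect2 can use it to exclude
    germline variants.
    """
    cmd = ["gatk", "Mutect2", "-R", reference]
    for bam in tumor_bams:
        cmd += ["-I", bam]
    cmd += ["-I", normal_bam, "-normal", normal_sample_name, "-O", out_vcf]
    return cmd


# Hypothetical patients: each entry lists that patient's disease BAMs,
# the matched control BAM, and the control's sample name.
patients: Dict[str, dict] = {
    "patient01": {
        "tumors": ["patient01_disease_a.bam", "patient01_disease_b.bam"],
        "normal": "patient01_control.bam",
        "normal_name": "patient01_control",
    },
    "patient02": {
        "tumors": ["patient02_disease.bam"],
        "normal": "patient02_control.bam",
        "normal_name": "patient02_control",
    },
}

# One command per patient: samples are never pooled across patients.
somatic_commands = [
    mutect2_pair_cmd("reference.fasta", p["tumors"], p["normal"],
                     p["normal_name"], f"{name}.unfiltered.vcf.gz")
    for name, p in patients.items()
]

for cmd in somatic_commands:
    print(" ".join(cmd))
```

The raw Mutect2 output would normally still need to go through FilterMutectCalls before use.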

  • Genevieve Brandt (she/her)

    Thanks so much for posting your insight here, Philipp Hähnel! I would recommend that Jiayi Zhao run the 50 disease and 20 control samples together, because joint calling all 70 samples gives the workflow more statistical power to make better calls. You will get a single joint-called VCF. If you want the calls separated by group, you can split that VCF with SelectVariants.
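
The recommendation above (joint-call all 70 samples, then split by group) could look roughly like the sketch below. It only builds the GATK command lines rather than executing them. The sample names, file names, and helper functions are hypothetical, and it assumes per-sample GVCFs already exist from HaplotypeCaller run with `-ERC GVCF`. CombineGVCFs is used here to merge them for brevity; GenomicsDBImport is the other common option for that step.

```python
from typing import Dict, List


def combine_gvcfs_cmd(reference: str, gvcfs: List[str], out_gvcf: str) -> List[str]:
    """Merge per-sample GVCFs into one cohort GVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd + ["-O", out_gvcf]


def genotype_gvcfs_cmd(reference: str, cohort_gvcf: str, out_vcf: str) -> List[str]:
    """Joint-genotype the whole cohort in a single pass."""
    return ["gatk", "GenotypeGVCFs", "-R", reference, "-V", cohort_gvcf, "-O", out_vcf]


def select_group_cmd(reference: str, joint_vcf: str, samples: List[str],
                     out_vcf: str) -> List[str]:
    """Extract one group's samples from the joint-called VCF."""
    cmd = ["gatk", "SelectVariants", "-R", reference, "-V", joint_vcf]
    for sample in samples:
        cmd += ["--sample-name", sample]
    # Optional clean-up: drop sites and alleles that no longer vary in the subset.
    return cmd + ["--exclude-non-variants", "--remove-unused-alternates", "-O", out_vcf]


# Hypothetical sample names for the 50 disease and 20 control samples.
groups: Dict[str, List[str]] = {
    "disease": [f"D{i:02d}" for i in range(1, 51)],
    "control": [f"C{i:02d}" for i in range(1, 21)],
}
all_samples = groups["disease"] + groups["control"]

commands = [
    combine_gvcfs_cmd("reference.fasta",
                      [f"{s}.g.vcf.gz" for s in all_samples], "cohort.g.vcf.gz"),
    genotype_gvcfs_cmd("reference.fasta", "cohort.g.vcf.gz", "cohort.vcf.gz"),
]
# Split only after joint calling, so every call benefits from all 70 samples.
for group, samples in groups.items():
    commands.append(select_group_cmd("reference.fasta", "cohort.vcf.gz",
                                     samples, f"{group}.vcf.gz"))

for cmd in commands:
    print(" ".join(cmd))
```

The ordering is the point of the sketch: the groups are separated only at the SelectVariants step, after genotyping has seen the full cohort.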