Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Error in document

  • Official comment
    Genevieve Brandt (she/her)

    Hi stella,

    I don't think there is a mistake in the tutorial. The tutorial itself explains what you are seeing:

    The tutorial provides example small WGS data sourced from the 1000 Genomes Project. Cohort mode illustrations use 24 samples, while case mode illustrations analyze one sample against a cohort model made from the remaining 23 samples. The tutorial uses a fraction of the workflow's recommended hundred samples for ease of illustration.

    I'll see if I can find recommendations for your use case, but we don't guarantee specific solutions for users. If anyone on the forum has thoughts, please chime in!

    Please let me know if you have further questions I can help with.

    Best,

    Genevieve

  • Genevieve Brandt (she/her)

    Hi Oh,

    I was able to look further into your methods and have some recommendations:

    The gCNV method is not like the somatic CNV method: cohort mode does call CNVs for the samples in the cohort. There is no reason to use method 1, which calls the same samples in both cohort mode and case mode.

    In your case, we would recommend running all 300 samples in cohort mode. However, the general maximum we recommend is 200 samples, so 300 might take too long to finish. If that is the case, build a cohort of 200 samples that is half male and half female (it doesn't matter which are diseased and which are normal), and run the rest of your samples in case mode.
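    The split described above can be sketched as follows. This is a minimal illustration, assuming each sample is known as a `(sample_id, sex)` pair and that 200 is the cohort cap; the `split_cohort` function and that input format are hypothetical, not part of GATK. It draws a cohort that is half male and half female, ignores disease status, and sends every remaining sample to case mode.

```python
import random
from typing import List, Tuple

# Hypothetical input format: (sample_id, sex) pairs, where sex is "M" or "F".
Sample = Tuple[str, str]


def split_cohort(samples: List[Sample], cohort_size: int = 200,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return (cohort_ids, case_ids) with a sex-balanced cohort.

    Sketch only: disease status is deliberately ignored, since it does not
    matter which cohort samples are diseased and which are normal.
    """
    if cohort_size % 2 != 0:
        raise ValueError("cohort_size must be even to split half and half")
    half = cohort_size // 2
    males = [s for s, sex in samples if sex == "M"]
    females = [s for s, sex in samples if sex == "F"]
    if len(males) < half or len(females) < half:
        raise ValueError("not enough samples of each sex for a balanced cohort")

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    cohort = rng.sample(males, half) + rng.sample(females, half)
    chosen = set(cohort)
    # Every sample not used to build the cohort model goes to case mode.
    case = [s for s, _ in samples if s not in chosen]
    return cohort, case


# Usage with 300 made-up sample IDs (150 male, 150 female).
samples = [(f"S{i:03d}", "M" if i % 2 == 0 else "F") for i in range(300)]
cohort_ids, case_ids = split_cohort(samples)
print(len(cohort_ids), len(case_ids))  # 200 100
```

    A fixed random seed keeps the cohort reproducible if the split has to be regenerated later.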

    Best,

    Genevieve

  • Bhanu Gandham

    Hi,

    I am going to move your post into our Community Discussions -> General Discussion topic, as the Germline topic is for reporting bugs and issues with the GATK tools.

    You can read more about our forum guidelines and the topics here: Forum Guidelines.

    Thanks for the heads-up on the documentation error. We will tag the documentation team and make the necessary change.

    Best,

    Bhanu

  • stella

    Hi Genevieve Brandt (she/her),

    Thanks for your response.

    I'd like to clarify three more points.

    Q1. Is this what you meant?

    1) Run 200 samples in cohort mode to get the model and the VCF files.

    2) For the remaining 100 of the 300 samples, analyze each sample in case mode against the cohort model made from the 200 samples.

    3) Is it correct to combine the 100 VCF files obtained by running case mode 100 times with the 200 VCF files obtained in step 1), and then compare disease and control?

    Q2. Suppose I have spare time and resources.

    Can I analyze it like this?

    1) Run cohort mode on all 300 samples and get the VCFs.

    2) Merge the 300 VCF files and compare disease and control.

    Q3. Based on your tutorial, I had thought the approach was:

    Use all 300 samples to build a model in cohort mode, then apply that model to each sample (n=300) in case mode.

    Now it's clear.

    Only samples that were not used to build the cohort model can be run in case mode.

    Right?

    Thank you.

    Oh.

  • Genevieve Brandt (she/her)

    Hi Oh,

    Q1)

    1. Yes
    2. This matches what I meant. We have a WDL that takes the full set of 100 samples and scatters the job to make it much easier.
    3. Yes. We have WDLs that combine the VCFs, because this takes more than simply concatenating them: the breakpoints in the different files won't necessarily match, and the script also annotates the result with site frequency counts.
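    To illustrate why mismatched breakpoints make combining per-sample calls more than concatenation, here is a toy sketch. It is not the algorithm the GATK WDLs use; it assumes each sample's calls on one contig are half-open `(start, end, copy_number)` segments with a normal copy number of 2, and the function names are hypothetical. It splits every segment on the union of all samples' breakpoints and counts, per resulting interval, how many samples carry a non-normal copy number, a simple stand-in for a site frequency count.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: half-open [start, end) segments with a copy number.
Segment = Tuple[int, int, int]


def copy_number_at(segments: List[Segment], pos: int, default: int = 2) -> int:
    """Copy number of the segment covering `pos`, or `default` if none covers it."""
    for start, end, cn in segments:
        if start <= pos < end:
            return cn
    return default


def joint_intervals(calls: Dict[str, List[Segment]],
                    normal_cn: int = 2) -> List[Tuple[int, int, int]]:
    """Split on the union of all breakpoints; count non-normal samples per interval."""
    # Breakpoints from different samples rarely coincide, so collect all of them.
    points = sorted({p for segs in calls.values() for s, e, _ in segs for p in (s, e)})
    result = []
    for start, end in zip(points, points[1:]):
        # No breakpoint falls strictly inside [start, end), so checking `start`
        # gives each sample's copy number for the whole interval.
        carriers = sum(
            1 for segs in calls.values()
            if copy_number_at(segs, start, normal_cn) != normal_cn
        )
        if carriers:
            result.append((start, end, carriers))
    return result


calls = {
    "sampleA": [(1000, 5000, 1)],  # deletion
    "sampleB": [(3000, 8000, 1)],  # overlapping deletion with different breaks
}
print(joint_intervals(calls))
# [(1000, 3000, 1), (3000, 5000, 2), (5000, 8000, 1)]
```

    The two deletions overlap but share no breakpoint, so neither sample's original segment can serve as a common site; only after splitting on all breakpoints can the shared region be counted as seen in two samples.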

    Q2)

    1. Yes, you can use the joint-calling WDL for both of these steps.

    Q3)

    1. We wouldn't recommend this method. Although there is no check to make sure the case-mode samples are not in the cohort, running them that way would not be a good idea.
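    Because there is no built-in check that case-mode samples were left out of the cohort, a small guard in your own pipeline can catch this before any jobs are launched. This is a hypothetical helper, not a GATK feature, and it assumes you keep the cohort and case sample IDs as plain lists of strings.

```python
from typing import Iterable


def assert_disjoint(cohort_ids: Iterable[str], case_ids: Iterable[str]) -> None:
    """Raise ValueError if any case-mode sample was also used to build the cohort model."""
    overlap = set(cohort_ids) & set(case_ids)
    if overlap:
        raise ValueError(
            f"{len(overlap)} case-mode sample(s) are in the cohort: {sorted(overlap)}"
        )


assert_disjoint(["S001", "S002"], ["S003"])  # passes silently
try:
    assert_disjoint(["S001", "S002"], ["S002", "S003"])
except ValueError as err:
    print(err)  # 1 case-mode sample(s) are in the cohort: ['S002']
```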

    Hope this helps!
    Genevieve

  • stella

    Hi Genevieve Brandt (she/her),

    Thanks for your helpful answer. :) 

    Oh

  • Genevieve Brandt (she/her)

    No problem!
