Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calling variants on cohorts of samples using the HaplotypeCaller in GVCF mode

4 comments

  • Robert Gilmore

    All of the links that I've come across for the Best Practices Workflow return a 404
    (e.g. https://gatk.zendesk.com/hc/en-us/articles/360035894751).

  • Jonathan Klonowski

    It is not clear at which step you add a new batch of samples to the previously genotyped data to solve the N + 1 problem. Do I run GenomicsDBImport on the new "N + 1" batch and then run GenotypeGVCFs on all the GVCFs, new and old? Or does this workflow solve the problem in the previous step, HaplotypeCaller?

    If all the GVCFs (the original N and the new N + 1) have to be run together through GenomicsDBImport and GenotypeGVCFs, then adding just a handful of samples to a larger, previously called sample set takes a lot of resources and time.

  • max genetti

    Running this workflow results in genotype calls defaulting to the reference allele at sites without reads (DP of 0). Is there a flag to leave those uncalled, i.e. with a GT of ./.?

  • Jia-Ying Su

    Is this workflow applicable to RNAseq data?

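On the question above about sites with a DP of 0: independent of whether the toolkit has a flag for this, one possible workaround is to post-process the joint-called VCF and set GT to ./. wherever a sample's DP is 0. The sketch below is not part of the workflow and does not use any GATK API. The function name `mask_zero_depth` is hypothetical, and it assumes uncompressed, tab-delimited VCF text, diploid samples, and per-sample GT and DP keys in the FORMAT column.

```python
def mask_zero_depth(line: str) -> str:
    """Return a VCF line with GT set to ./. for every sample whose DP is 0.

    Hypothetical helper: assumes diploid genotypes, so the no-call is
    hard-coded as "./.".
    """
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no FORMAT or sample columns to edit
    keys = fields[8].split(":")  # column 9 is FORMAT
    if "GT" not in keys or "DP" not in keys:
        return line
    gt_i, dp_i = keys.index("GT"), keys.index("DP")
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # Trailing FORMAT fields may be omitted, so check the length first.
        if len(values) > max(gt_i, dp_i) and values[dp_i] == "0":
            values[gt_i] = "./."
            fields[col] = ":".join(values)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

Applied line by line to a decompressed VCF, this leaves every other field untouched, so annotations such as PL or GQ at the masked sites still reflect the original reference call and may need separate handling.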
