Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Using haplotype information from high-coverage samples as a prior for HaplotypeCaller on other low-coverage samples in the same dataset?


1 comment

  • Laura Gauthier

    Hi Nils Paffen,

    Do you mean phasing by haplotype or phasing by transmission? If you have related samples, supplying a pedigree file to the CalculateGenotypePosteriors tool will apply Bayesian priors appropriate for related samples, provided that the related samples are all in the input VCF.

    There are very limited scenarios where HaplotypeCaller can take advantage of haplotype phasing. To make the most of those scenarios, you may want to run HaplotypeCaller in multi-input mode, i.e. with multiple `-I` arguments, one for each BAM. That way, the assembly graph is built using both the high-coverage and the low-coverage samples, and the variants discovered in the graph are genotyped for all samples. Depending on the reference context and the distance between two phased variants, if only one haplotype contains a pair of variants that are in cis phase, then a read covering one variant will also give support to the other variant, even if that other variant is not covered.

    HaplotypeCaller can't change its priors on a site-by-site basis. If you want to use your high-coverage samples as an allele frequency prior, you'll need to pass a VCF as a resource to CalculateGenotypePosteriors.
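
To make the two suggestions above concrete, here is a minimal sketch in Python that only assembles the command lines; it does not run GATK. All file names are hypothetical placeholders. The flag spellings (`-R`, `-I`, `-O`, `-V`, `-ped`, `--supporting-callsets`) are assumed to match the GATK4 command line and are worth checking against `gatk <ToolName> --help` for your installed version.

```python
import shlex
from typing import List, Optional, Sequence


def haplotypecaller_cmd(reference: str, bams: Sequence[str], output_vcf: str) -> List[str]:
    """Build a multi-input HaplotypeCaller command with one -I per BAM."""
    if not bams:
        raise ValueError("at least one BAM is required")
    cmd = ["gatk", "HaplotypeCaller", "-R", reference]
    for bam in bams:
        # Each BAM gets its own -I so all samples share one assembly graph.
        cmd += ["-I", bam]
    cmd += ["-O", output_vcf]
    return cmd


def genotype_posteriors_cmd(
    input_vcf: str,
    output_vcf: str,
    supporting_vcf: Optional[str] = None,
    pedigree: Optional[str] = None,
) -> List[str]:
    """Build a CalculateGenotypePosteriors command.

    supporting_vcf: callset used as an allele frequency prior (assumed flag
                    name: --supporting-callsets).
    pedigree:       pedigree file for related samples (assumed flag: -ped).
    """
    cmd = ["gatk", "CalculateGenotypePosteriors", "-V", input_vcf, "-O", output_vcf]
    if supporting_vcf is not None:
        cmd += ["--supporting-callsets", supporting_vcf]
    if pedigree is not None:
        cmd += ["-ped", pedigree]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    call = haplotypecaller_cmd(
        "reference.fasta",
        ["high_cov_1.bam", "high_cov_2.bam", "low_cov_1.bam"],
        "joint_calls.vcf.gz",
    )
    refine = genotype_posteriors_cmd(
        "joint_calls.vcf.gz",
        "joint_calls.posteriors.vcf.gz",
        supporting_vcf="high_cov_calls.vcf.gz",
    )
    print(shlex.join(call))
    print(shlex.join(refine))
```

The printed commands can be pasted into a shell or passed to `subprocess.run`. Keeping the two steps as separate builders mirrors the advice: HaplotypeCaller shares the assembly across samples, while any allele frequency or pedigree prior is applied afterwards by CalculateGenotypePosteriors.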

    Hopefully that helps,

