Using haplotype information from high-coverage samples as a HaplotypeCaller prior for low-coverage samples in the same dataset?
Dear GATK-devs,
On your page describing the default procedure of HaplotypeCaller (https://gatk.broadinstitute.org/hc/en-us/articles/360035890511-Assigning-per-sample-genotypes-HaplotypeCaller-), you state the following:
"
$P(G)$ represents how probably we expect to see this genotype based on previous observations, studies of the population, and so on. By default, the GATK tools use a flat prior (always the same value) but you can input your own set of priors if you have information about the frequency of certain genotypes in the population you're studying.
"
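The quoted sentence is Bayes' rule applied per genotype: the posterior $P(G \mid D)$ is proportional to the likelihood $P(D \mid G)$ times the prior $P(G)$. The Python sketch below illustrates what swapping a flat prior for an informative one does at ~1x coverage. It assumes a biallelic diploid site, a single alt-supporting read, and a deliberately simplified error model; the function names are hypothetical, and GATK itself works with Phred/log-scaled values rather than plain probabilities, so this is not its implementation.

```python
# Minimal illustration of P(G | D) ∝ P(D | G) * P(G) at one biallelic diploid site.
# Simplifying assumptions (not GATK's implementation): plain probabilities instead of
# Phred/log values, and a sequencing error always turns a ref base into the alt base.
from typing import List

GENOTYPES = ("0/0", "0/1", "1/1")


def single_alt_read_likelihoods(error: float) -> List[float]:
    """P(one read shows the alt allele | G) for G in 0/0, 0/1, 1/1."""
    return [
        error,                            # hom-ref: alt base only via an error
        0.5 * (1 - error) + 0.5 * error,  # het: the read comes from either copy
        1 - error,                        # hom-alt: alt base unless an error occurs
    ]


def flat_prior() -> List[float]:
    """The same prior for every genotype, as in the quoted default."""
    return [1 / 3] * 3


def hwe_prior(alt_freq: float) -> List[float]:
    """Hardy-Weinberg genotype frequencies for an assumed alt-allele frequency."""
    q = alt_freq
    p = 1 - q
    return [p * p, 2 * p * q, q * q]


def posterior(likelihoods: List[float], prior: List[float]) -> List[float]:
    """Multiply likelihood by prior for each genotype, then normalize to sum to 1."""
    products = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(products)
    return [x / total for x in products]


def best_genotype(post: List[float]) -> str:
    """Genotype with the highest posterior probability."""
    return GENOTYPES[max(range(len(post)), key=post.__getitem__)]


if __name__ == "__main__":
    lik = single_alt_read_likelihoods(error=0.01)
    print("flat prior:", best_genotype(posterior(lik, flat_prior())))
    print("HWE prior, alt freq 0.1:", best_genotype(posterior(lik, hwe_prior(0.1))))
```

With a single alt read, the flat prior makes 1/1 the most probable genotype, whereas a Hardy-Weinberg prior for an alt frequency of 0.1 makes 0/1 the most probable. At such low depth the prior, not the data, largely decides the call.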
Is it possible to use phasing information from high-coverage samples (30x) to inform Bayesian priors when genotyping low-coverage (>1x) samples in a pedigree dataset?
Best regards,
Nils
-
Hi Nils Paffen,
Do you mean phasing by haplotype or phasing by transmission?

If your samples are related, supplying a pedigree file to the CalculateGenotypePosteriors tool will apply Bayesian priors appropriate for related samples, provided that all of the related samples are in the input VCF.

There are very limited scenarios in which HaplotypeCaller can take advantage of haplotype phasing. To make the most of them, you may want to run HaplotypeCaller in multi-input mode, i.e. with multiple `-I` arguments, one for each BAM. That way, the assembly graph is built from both the high-coverage and the low-coverage samples, and the variants discovered in the graph are genotyped in all samples. Depending on the reference context and the distance between two phased variants, if only one haplotype carries a given pair of variants in cis, then a read covering one variant also gives support to the other variant, even if the read does not cover it.

HaplotypeCaller can't change its priors on a site-by-site basis. If you want to use your high-coverage samples as an allele-frequency prior, you'll need to pass a VCF as a resource to CalculateGenotypePosteriors.
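The idea of using high-coverage calls as an allele-frequency prior can be illustrated with a simplified Python sketch: estimate a per-site alt-allele frequency from the high-coverage genotypes, turn it into Hardy-Weinberg genotype priors, and combine those with a low-coverage sample's genotype likelihoods. This is an assumption-laden stand-in, not CalculateGenotypePosteriors' actual model. It assumes a biallelic site and a pseudocount of 1, uses made-up genotype calls and plain-probability likelihoods (VCF likelihoods are Phred-scaled), and ignores relatedness, which the pedigree option handles separately. All function names are hypothetical.

```python
# Hypothetical sketch: per-site allele-frequency prior from high-coverage calls,
# applied to a low-coverage sample. Not the CalculateGenotypePosteriors algorithm.
from typing import Iterable, List

GENOTYPES = ("0/0", "0/1", "1/1")


def alt_allele_frequency(genotypes: Iterable[str], pseudocount: float = 1.0) -> float:
    """Estimate the alt-allele frequency at one site from diploid GT strings."""
    ref = alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")  # phased and unphased count the same
        if "." in alleles:                         # skip missing calls such as ./.
            continue
        for allele in alleles:
            if allele == "0":
                ref += 1
            else:                                  # biallelic simplification
                alt += 1
    # Pseudocounts on both alleles keep the estimate strictly between 0 and 1.
    return (alt + pseudocount) / (ref + alt + 2 * pseudocount)


def hwe_prior(alt_freq: float) -> List[float]:
    """Hardy-Weinberg genotype priors for 0/0, 0/1, 1/1."""
    q = alt_freq
    p = 1 - q
    return [p * p, 2 * p * q, q * q]


def posterior(likelihoods: List[float], prior: List[float]) -> List[float]:
    """P(G | D) proportional to P(D | G) * P(G), normalized over genotypes."""
    products = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(products)
    return [x / total for x in products]


if __name__ == "__main__":
    # Made-up 30x calls at one site, e.g. read from a resource VCF.
    high_cov_calls = ["0/0", "0/1", "0/0", "0/0", "0|1"]
    q = alt_allele_frequency(high_cov_calls)  # (2 + 1) / (10 + 2) = 0.25
    # Made-up P(D | G) for a ~1x sample with one alt read and a 1% error rate.
    low_cov_likelihoods = [0.01, 0.5, 0.99]
    post = posterior(low_cov_likelihoods, hwe_prior(q))
    for gt, prob in zip(GENOTYPES, post):
        print(gt, round(prob, 3))
```

In this made-up example the high-coverage calls give an alt frequency of 0.25, and the low-coverage sample's single alt read is then most probably explained by a 0/1 genotype. The prior is computed separately for each site, which is the step done after calling rather than inside HaplotypeCaller.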
Hopefully that helps,
Laura