HaplotypeCaller and GenotypeGVCFs
Hello!
This article https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels- says I should use HaplotypeCaller in GVCF mode followed by GenotypeGVCFs, while this article https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels- advises using HaplotypeCaller without GenotypeGVCFs. I tried the former with one sample, and the result is similar to that of HaplotypeCaller in non-GVCF mode, but it differs in some entries.
What is the difference between these two approaches, and in which cases should I use one or the other? And what does GenotypeGVCFs actually do? The manual page says "joint genotyping", but I have no idea what that means.
Thanks in advance.
-
If you have more than one sample, we recommend running HaplotypeCaller in GVCF mode and then GenotypeGVCFs. This is our joint genotyping method; we have a couple of resources about what that means here and here. A quick rundown: HaplotypeCaller in GVCF mode outputs a GVCF, which contains information about all sites, not just sites with variation. GenotypeGVCFs then uses the information at all sites and across all samples to call variants that could not be called from a single sample's information alone. This can make a big difference depending on how many samples you have.
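To make the multi-sample workflow above concrete, here is a rough sketch in Python that only builds the GATK4 command lines (it does not run them). The file names, sample names, and the helper function names are hypothetical placeholders. The sketch also assumes the per-sample GVCFs are merged with CombineGVCFs before GenotypeGVCFs; for larger cohorts GenomicsDBImport is the usual alternative for that step.

```python
import shlex


def haplotypecaller_gvcf_cmd(reference, bam, out_gvcf):
    """HaplotypeCaller in GVCF mode for a single sample (-ERC GVCF)."""
    return ["gatk", "HaplotypeCaller",
            "-R", reference, "-I", bam, "-O", out_gvcf, "-ERC", "GVCF"]


def combine_gvcfs_cmd(reference, gvcfs, out_gvcf):
    """Merge per-sample GVCFs into one multi-sample GVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd + ["-O", out_gvcf]


def genotype_gvcfs_cmd(reference, combined_gvcf, out_vcf):
    """Joint genotyping across all samples in the combined GVCF."""
    return ["gatk", "GenotypeGVCFs",
            "-R", reference, "-V", combined_gvcf, "-O", out_vcf]


def joint_genotyping_plan(reference, bams_by_sample, prefix="cohort"):
    """Return the ordered list of commands for the whole workflow."""
    commands, gvcfs = [], []
    for sample, bam in bams_by_sample.items():
        gvcf = f"{sample}.g.vcf.gz"  # hypothetical output naming
        commands.append(haplotypecaller_gvcf_cmd(reference, bam, gvcf))
        gvcfs.append(gvcf)
    combined = f"{prefix}.g.vcf.gz"
    commands.append(combine_gvcfs_cmd(reference, gvcfs, combined))
    commands.append(genotype_gvcfs_cmd(reference, combined, f"{prefix}.vcf.gz"))
    return commands


if __name__ == "__main__":
    # Placeholder inputs; replace with your own reference and BAM files.
    plan = joint_genotyping_plan(
        "reference.fasta",
        {"sampleA": "sampleA.bam", "sampleB": "sampleB.bam"},
    )
    for cmd in plan:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them against the tool documentation for your GATK version before running anything.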
With just one sample, running HaplotypeCaller as normal is sufficient. It should give the same results as running that sample in GVCF mode followed by GenotypeGVCFs.
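If the two single-sample outputs do differ in a few entries, as described in the question, one way to see exactly which records differ is to compare them site by site. The following is a minimal sketch in plain Python, not a GATK tool. The function names are made up for illustration, and it assumes single-sample VCFs in which GT is the first FORMAT field, as the VCF specification requires when GT is present.

```python
import gzip


def read_sites(path):
    """Map (chrom, pos, ref, alt) -> genotype string for each record in a VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    sites = {}
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, _id, ref, alt = fields[:5]
            # Column 10 is the first sample; GT is its first colon-separated value.
            genotype = fields[9].split(":")[0] if len(fields) > 9 else None
            sites[(chrom, int(pos), ref, alt)] = genotype
    return sites


def compare_calls(path_a, path_b):
    """Return sites unique to A, sites unique to B, and shared sites whose genotypes differ."""
    a, b = read_sites(path_a), read_sites(path_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    gt_diff = sorted(key for key in set(a) & set(b) if a[key] != b[key])
    return only_a, only_b, gt_diff
```

For example, `compare_calls("direct.vcf.gz", "via_gvcf.vcf.gz")` (both file names are placeholders) returns three lists you can inspect to see whether the differences are missing sites or changed genotypes.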
Let me know if you have further questions!