Does GATK offer a way to match alleles in two VCF files?
Hi GATK, I have run the following steps: 1) HaplotypeCaller, 2) CombineGVCFs, 3) GenotypeGVCFs. Now I am interested in matching alleles between two samples, i.e. checking whether the alleles are the same or different at a particular chromosomal location. Does GATK offer a way to perform this test? If yes, which GATK tool would be preferable? Looking forward to your reply.
-
Hi,
I think this depends a little bit on what you're trying to do, and what you hope to get out of "matching alleles." Here are two use cases I can imagine:
1. If you have a cohort of many samples and want to tell whether one sample shares a mutation with another, then it sounds like you've already run the correct tools to create the VCF you need. The output from GenotypeGVCFs should be a cohort VCF which contains a list of variants, along with genotype information (the GT field) for every sample used to create it. You can go to the site of the variant in question and inspect the sample genotypes to see whether both of your samples carry the variant or not. You should also inspect the quality scores (e.g. the per-sample GQ field) to make sure the genotypes you're looking at were called with high confidence, i.e. filter your results carefully.
2. If you are instead trying to benchmark a pipeline and have a sample VCF from a well-characterized cell line with truth data (e.g. NIST Genome in a Bottle), you should compare your VCF to the "truth" VCF using a comparison tool like RTG's vcfeval, which reconstructs haplotypes from the VCF records and compares those, so the comparison is fair regardless of how the variants are represented. This only applies when you have a VCF of confident calls for your sample and want to calibrate your pipeline; it is not meant for novel studies.
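For the first use case, here is a rough sketch of what "inspecting the sample genotypes" could look like in plain Python. This is not a GATK tool, just an illustration that reads the text of a cohort VCF. The sample names S1 and S2, the function names, and the GQ cutoff of 20 are all assumptions made for the example, so pick values that suit your data. It treats a missing GQ as low confidence, ignores phasing, and does not give special handling to the `*` spanning-deletion allele.

```python
def parse_genotype(format_keys, sample_field, ref, alts):
    """Return (sorted list of called allele strings, GQ) for one sample.

    Returns (None, None) if the genotype is a no-call.
    """
    values = dict(zip(format_keys.split(":"), sample_field.split(":")))
    alleles = [ref] + alts.split(",")
    # Treat phased (|) and unphased (/) genotypes the same way.
    gt = values.get("GT", ".").replace("|", "/").split("/")
    if "." in gt:
        return None, None
    called = sorted(alleles[int(i)] for i in gt)
    gq = values.get("GQ")
    gq = int(gq) if gq not in (None, ".") else None
    return called, gq


def compare_samples(vcf_lines, sample_a, sample_b, min_gq=20):
    """Classify each site as same / different / low-confidence / no-call.

    min_gq=20 is an arbitrary illustrative threshold, not a GATK default.
    """
    results = []
    col_a = col_b = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # The header row tells us which column each sample lives in.
            col_a = fields.index(sample_a)
            col_b = fields.index(sample_b)
            continue
        chrom, pos, ref, alt, fmt = (
            fields[0], int(fields[1]), fields[3], fields[4], fields[8]
        )
        a, gq_a = parse_genotype(fmt, fields[col_a], ref, alt)
        b, gq_b = parse_genotype(fmt, fields[col_b], ref, alt)
        if a is None or b is None:
            status = "no-call"
        elif gq_a is None or gq_b is None or min(gq_a, gq_b) < min_gq:
            status = "low-confidence"
        elif a == b:
            status = "same"
        else:
            status = "different"
        results.append((chrom, pos, status))
    return results


# Tiny made-up cohort VCF with two hypothetical samples, S1 and S2.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:99\t1/0:60",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:GQ\t0/0:99\t0/1:45",
    "chr1\t300\t.\tG\tA,T\t50\tPASS\t.\tGT:GQ\t1/2:10\t1/2:99",
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT:GQ\t./.:.\t0/1:99",
]

if __name__ == "__main__":
    for chrom, pos, status in compare_samples(EXAMPLE_VCF, "S1", "S2"):
        print(chrom, pos, status)
```

To run this on a real file you would pass an open file handle instead of the example list (for a bgzipped VCF, `gzip.open(path, "rt")` should work). For anything beyond a quick look, a dedicated VCF library or tool is probably a safer choice than hand-parsing.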
Hope one of these helps!
Ricky