Joint Call on individuals sequenced with differing coverage
Hello GATK community forum!
I have two sets of sequencing data for the same species that I would like to combine in one joint call. I have called variants on each sample individually using HaplotypeCaller and now want to run GenotypeGVCFs to combine all samples into one VCF file. However, one dataset of around 300 samples was sequenced at 50X coverage, whilst the other dataset of about 2000 samples was sequenced at 20X coverage. How do I carry out a joint call that combines datasets of differing coverage? Downstream I will be performing a GEA, trying to associate genetic differences with the environment, so it is important that the difference in depth does not itself create apparent variation that separates the two datasets. Is there a way to account for this within GATK, or do I need to downsample the 50X samples at the BAM stage? Any help would be greatly appreciated. Thanks!
-
Hi Phoebe
I see that you have posted the same topic under two different categories, so I will close the other one and let the discussion continue here.
-
Hi Phoebe
We've done this in the past with exomes plus ~20X whole genomes and it worked pretty well. Most of our filtering annotations are built to compare the distribution of reference reads to the distribution of alternate-allele reads, so they don't depend much on depth. You do need to make sure you exclude DP from the filtering annotations, as we recommend for exomes, otherwise you might introduce bias (likely towards the higher-coverage samples).
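For anyone wanting to see what "exclude DP" could look like in practice, here is a minimal sketch. It assumes the filtering step is VQSR with `VariantRecalibrator`, which the reply above does not actually state, and it assumes a typical whole-genome SNP annotation list as the starting point. The helper names and file names are hypothetical, and the species-specific `--resource` arguments that `VariantRecalibrator` also needs are deliberately left out.

```python
# Assumed starting list: annotations often used for whole-genome SNP recalibration.
WGS_SNP_ANNOTATIONS = ["QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR", "DP"]


def annotation_args(annotations, exclude=("DP",)):
    """Return ['-an', NAME, ...] for every annotation that is not excluded."""
    args = []
    for name in annotations:
        if name in exclude:
            continue  # depth differs between the 20X and 50X cohorts, so skip it
        args += ["-an", name]
    return args


def build_vqsr_command(input_vcf, output_recal, tranches_file):
    """Assemble a VariantRecalibrator command line as a list of arguments.

    The file names are placeholders, and the required --resource arguments
    are omitted because they depend on the species and available truth sets.
    """
    return [
        "gatk", "VariantRecalibrator",
        "-V", input_vcf,
        *annotation_args(WGS_SNP_ANNOTATIONS),
        "-mode", "SNP",
        "-O", output_recal,
        "--tranches-file", tranches_file,
    ]


if __name__ == "__main__":
    cmd = build_vqsr_command(
        "cohort.vcf.gz", "cohort.snps.recal", "cohort.snps.tranches"
    )
    print(" ".join(cmd))
```

The same idea applies if you hard-filter instead: leave any DP-based threshold out of the filter expressions so that the 50X samples are not treated differently from the 20X samples.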