Joint-calling samples with different ploidy on chrX & chrY (males vs. females)
Hi there,
I'm using GATK 4.2 to run germline variant discovery (SNVs + small indels) on ~800 WES samples. The autosomes went fine, but I have a question about joint calling on chrX & chrY. When generating gVCFs with HaplotypeCaller, I set different ploidy for male and female samples: on chrX, ploidy=2 for females, and for males ploidy=2 in the PAR regions and ploidy=1 in the non-PAR regions. I skipped chrY for female samples and set ploidy=1 for male samples. So I now have region-specific and sample-specific gVCF files (e.g. chr1-sample-gvcf). The next step is to merge these gVCFs with either CombineGVCFs or GenomicsDBImport. I had no problem with the autosomes.
However, for chrX & chrY, should I put the male and female samples all together, or handle them separately? Similarly, in GenotypeGVCFs and VQSR, should I treat male and female samples together or run them separately by sex? For VQSR, since all the autosomes have ploidy=2 but the sex chromosomes have mixed ploidy, should I run VQSR separately for the autosomes and the sex chromosomes? On the other hand, I was told I should use the entire genome for VQSR. Could you please give any suggestions on how to deal with the sex chromosomes in GATK variant discovery?
Thank you!
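For readers following along, the per-sex, per-region ploidy scheme described above can be sketched roughly as follows. This is only an illustration, not an official GATK recipe: the PAR coordinates and chrX length are assumed GRCh38 values (check them against your own reference), the helper names `chrx_segments`, `ploidy_plan` and `haplotypecaller_args` are hypothetical, and the output file naming is made up. The HaplotypeCaller options used (`-L`, `-ERC GVCF`, `--sample-ploidy`) are standard.

```python
# Assumed GRCh38 PAR coordinates on chrX (1-based, inclusive); verify for your reference.
PAR_X = [(10_001, 2_781_479), (155_701_383, 156_030_895)]
CHRX_LEN = 156_040_895  # assumed GRCh38 chrX length


def chrx_segments():
    """Split chrX into (start, end, is_par) segments that cover the whole contig."""
    segments = []
    cursor = 1
    for start, end in PAR_X:
        if cursor < start:
            segments.append((cursor, start - 1, False))
        segments.append((start, end, True))
        cursor = end + 1
    if cursor <= CHRX_LEN:
        segments.append((cursor, CHRX_LEN, False))
    return segments


def ploidy_plan(sex):
    """Return (interval, ploidy) pairs for the sex chromosomes of one sample."""
    if sex == "female":
        return [("chrX", 2)]  # chrY is skipped for females
    if sex == "male":
        plan = [
            (f"chrX:{start}-{end}", 2 if is_par else 1)
            for start, end, is_par in chrx_segments()
        ]
        plan.append(("chrY", 1))
        return plan
    raise ValueError(f"unknown sex: {sex!r}")


def haplotypecaller_args(sample, bam, ref, interval, ploidy):
    """Build one HaplotypeCaller gVCF command as an argument list (not executed here)."""
    out = f"{sample}.{interval.replace(':', '_')}.g.vcf.gz"  # hypothetical naming
    return [
        "gatk", "HaplotypeCaller",
        "-R", ref, "-I", bam, "-L", interval,
        "-ERC", "GVCF", "--sample-ploidy", str(ploidy),
        "-O", out,
    ]
```

Each `(interval, ploidy)` pair from `ploidy_plan("male")` would map to one HaplotypeCaller run, which reproduces the region-specific and sample-specific gVCFs mentioned in the question.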
-
Okay... I did some research and found that CombineGVCFs and GenotypeGVCFs can deal with samples of mixed ploidy (https://gatk.broadinstitute.org/hc/en-us/articles/360035889691-Does-GATK-work-on-non-diploid-organisms-). However, VQSR has to be applied to sites with the same ploidy. That means I can run VQSR on the autosomes of all samples together with female chrX (ploidy=2), but need a separate VQSR run for male chrY and the chrX non-PAR regions (ploidy=1). That raises the question: would VQSR perform well with only the few variants on chrY and chrX non-PAR? I remember that VQSR requires plenty of variants to build the model, otherwise it will not converge. For chrY and chrX non-PAR in male samples, should I use VQSR or hard filtering? From your experience, which would perform better?
-
Hi Mingzhou Fu,
There have been some forum discussions on this topic that should help you get some perspective. We don't have any general Best Practices recommendations for the X and Y chromosomes, but you should be able to combine X and Y with the rest of the genome for joint genotyping.
Here are the links:
- https://gatk.broadinstitute.org/hc/en-us/community/posts/360077511272-CombineGVCF-on-chrX-and-chrY-hg38-running-for-ever
- https://gatk.broadinstitute.org/hc/en-us/community/posts/360057847692-Calling-variants-on-chromosome-Y
Best,
Genevieve
-
I confirmed with my colleagues that CombineGVCFs and GenomicsDBImport can handle samples with different ploidies for X and Y. You should definitely run VQSR on everything together: since you have WES samples, you won't have enough data to build the model for each chromosome on its own.
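As a rough illustration of what "together" could look like at the import step, the sketch below assembles a single GenomicsDBImport command for one chrX interval that takes both female (diploid) and male (haploid) gVCFs. The helper name `genomicsdb_import_args`, the workspace path and the file names are all hypothetical; the options used (`--genomicsdb-workspace-path`, `-L`, repeated `-V`) are standard GenomicsDBImport arguments. It only builds the argument list and does not run anything.

```python
def genomicsdb_import_args(workspace, interval, gvcfs):
    """Build one GenomicsDBImport command that imports all samples for an interval."""
    args = [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "-L", interval,
    ]
    for path in gvcfs:
        # Male and female gVCFs go into the same workspace, whatever their ploidy.
        args += ["-V", path]
    return args


# Hypothetical inputs: one female gVCF (ploidy 2) and one male gVCF (ploidy 1).
command = genomicsdb_import_args(
    "chrX_nonPAR_db",
    "chrX:2781480-155701382",
    ["female01.chrX.g.vcf.gz", "male01.chrX_2781480-155701382.g.vcf.gz"],
)
print(" ".join(command))
```

For a cohort of ~800 samples a sample map file (`--sample-name-map`) would probably be more practical than repeating `-V`, but the idea is the same: the samples are not split by sex.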