I would like to perform joint variant calling on female and male human samples, and I am interested in the sex chromosomes as well. To handle the different ploidy of the male sex chromosomes, I split the male BAM files into autosomal and sex-chromosomal regions. Following the GATK Best Practices, I generated gVCFs with HaplotypeCaller using the default ploidy (2) for the female samples and for the male autosomal regions, and with ploidy 1 for the male sex-chromosomal regions. The next steps would be to consolidate the gVCFs with GenomicsDBImport and then produce a joint VCF with GenotypeGVCFs.
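To make the per-sample gVCF step described above concrete, here is a minimal Python sketch that only builds GATK4 command lines (it does not run anything). The helper functions, output file names and UCSC-style contig names (chr1 to chrY) are hypothetical and should be adapted to the actual reference. The sketch also ignores the pseudoautosomal regions (PARs) on chrX/chrY, which are diploid in males, so treating all of X/Y as haploid is a simplifying assumption.

```python
"""Sketch: build (not run) GATK4 commands for the sex-aware gVCF workflow.

Assumptions (hypothetical, adjust to your data): UCSC-style contig names,
the output file naming scheme, and these helper functions themselves.
Pseudoautosomal regions (PARs) are ignored here for simplicity.
"""
import shlex
from typing import List

# Hypothetical contig names; a b37-style reference would use "1".."22", "X", "Y".
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]
SEX_CHROMS = ["chrX", "chrY"]


def haplotypecaller_cmd(reference: str, bam: str, out_gvcf: str,
                        intervals: List[str], ploidy: int) -> List[str]:
    """HaplotypeCaller in gVCF mode, restricted to the given intervals."""
    cmd = ["gatk", "HaplotypeCaller",
           "-R", reference, "-I", bam, "-O", out_gvcf,
           "-ERC", "GVCF",                  # emit reference confidence as a gVCF
           "--sample-ploidy", str(ploidy)]  # 2 by default, 1 for male X/Y here
    for interval in intervals:
        cmd += ["-L", interval]
    return cmd


def plan_female(reference: str, sample: str, bam: str) -> List[List[str]]:
    """One diploid gVCF covering autosomes and sex chromosomes."""
    return [haplotypecaller_cmd(reference, bam, f"{sample}.g.vcf.gz",
                                AUTOSOMES + SEX_CHROMS, ploidy=2)]


def plan_male(reference: str, sample: str,
              autosomal_bam: str, sex_bam: str) -> List[List[str]]:
    """Two gVCFs per male: diploid autosomes, haploid X/Y (PARs ignored)."""
    return [
        haplotypecaller_cmd(reference, autosomal_bam,
                            f"{sample}.autosomes.g.vcf.gz", AUTOSOMES, ploidy=2),
        haplotypecaller_cmd(reference, sex_bam,
                            f"{sample}.sex.g.vcf.gz", SEX_CHROMS, ploidy=1),
    ]


def genomicsdb_import_cmd(gvcfs: List[str], workspace: str,
                          intervals: List[str]) -> List[str]:
    """Consolidate gVCFs into a GenomicsDB workspace for the given intervals."""
    cmd = ["gatk", "GenomicsDBImport", "--genomicsdb-workspace-path", workspace]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    for interval in intervals:
        cmd += ["-L", interval]
    return cmd


def genotype_gvcfs_cmd(reference: str, workspace: str, out_vcf: str) -> List[str]:
    """Joint-genotype all samples stored in the GenomicsDB workspace."""
    return ["gatk", "GenotypeGVCFs", "-R", reference,
            "-V", f"gendb://{workspace}", "-O", out_vcf]


if __name__ == "__main__":
    for cmd in plan_male("ref.fa", "M1", "M1.autosomes.bam", "M1.sex.bam"):
        print(shlex.join(cmd))
```

The sketch deliberately leaves open which male gVCFs are passed to GenomicsDBImport, since that is exactly the question here.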
My question is how to handle the male samples after splitting them and generating separate gVCFs for the autosomes and the sex chromosomes.
Is it essential to merge/concatenate them back into one gVCF per sample before running GenotypeGVCFs? If so, how would you suggest doing the merge correctly?
If I do not merge them, would the information that the two gVCFs (autosomes and sex chromosomes) belong to the same male sample be carried through the header during joint genotyping? Does this affect the joint genotyping results at all?
Alternatively, and perhaps more simply, I could skip merging the male gVCFs and proceed directly with GenotypeGVCFs. What would the resulting genotypes look like for the males in that case?
Are there any advantages or disadvantages to joining (or not joining) the male autosomal and sex-chromosomal gVCFs before joint genotyping?
After genotyping, I plan to filter the variants and then annotate them with ANNOVAR.
Thank you very much for your help; any comments are appreciated.