UKB - Merging a Y chromosome VCF file that has a different number of samples
Hello everyone,
I am new to GWAS analysis and am using the ukb-imputed WGS data. I have converted the BGEN files for all chromosomes to VCF format and now want to merge all the VCF files.
I tried to merge them using bcftools "concat", but the Y chromosome file contains only male samples, so the merge fails with the error: "unequal number of samples".
I looked at the GATK documentation for MergeVcfs (https://gatk.broadinstitute.org/hc/en-us/articles/360037226612-MergeVcfs-Picard-) and didn't find any mention of merging VCFs with differing numbers of samples.
I would be grateful if anyone could help me find a way to merge the Y chromosome VCF file with the other VCF files, using GATK or any other approach.
Thank you.
-
This has been an issue since the beginning of time and I can suggest a variety of workarounds. If you're really intent on having a single VCF file, then you'll need to fill in no-calls `./.` for all the female samples. We don't have a tool to do that, but it would be pretty quick to write a new GATK walker of your own if you know any Java. (I don't suggest the awk route because there are many ways that handcrafted, artisanal VCFs go awry, not the least of which is people forgetting to reindex.)
That said, all of the statistical tests in a traditional GWAS are independent. You could certainly run the Y chromosome separately. After that, depending on your desired next steps, you could keep the Y results separate or combine the GWAS-derived variant-p-value pairs in pandas or hail or similar. At this point the genotype data is gone and the ploidy won't be an issue.
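As a rough illustration of that combining step, here is a minimal sketch that stacks per-chromosome association results into one table. It assumes each GWAS run wrote a tab-separated file with a header row; the column names (CHROM, POS, ID, P) and the function name are hypothetical, so adjust them to whatever your association tool actually emits. It uses only the Python standard library to stay dependency-free; with pandas, reading each file and passing the frames to `pandas.concat` would do the same job.

```python
import csv
import io


def combine_gwas_results(tables):
    """Stack per-chromosome association results into one list of rows.

    `tables` is an iterable of open text handles, each a tab-separated
    results file with a header row. The column names used here
    (CHROM, POS, ID, P) are assumptions, not a fixed standard.
    """
    combined = []
    for handle in tables:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Convert the numeric fields so they can be sorted and filtered.
            row["POS"] = int(row["POS"])
            row["P"] = float(row["P"])
            combined.append(row)
    return combined


# Tiny in-memory stand-ins for an autosomal results file and a Y results file.
autosomes = io.StringIO("CHROM\tPOS\tID\tP\n1\t1000\trs1\t0.5\n2\t2000\trs2\t1e-9\n")
chr_y = io.StringIO("CHROM\tPOS\tID\tP\nY\t3000\trs3\t0.01\n")

results = combine_gwas_results([autosomes, chr_y])

# Example downstream step: keep variants below a genome-wide threshold.
hits = [r["ID"] for r in results if r["P"] < 5e-8]
print(len(results), hits)
```

Because each row is a variant rather than a sample, it does not matter that the Y chromosome test was run on fewer individuals than the autosomal tests.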
-Laura
-
Yeah, it sounds logical. Thank you Laura Gauthier for your suggestions.