Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

ukb - Merging Y chromosome VCF file having different number of samples



  • Laura Gauthier

    Hi Shashank Shekhar Padhi,

    This has been an issue since the beginning of time and I can suggest a variety of workarounds.  If you're really intent on having a single VCF file, then you'll need to fill in no-calls `./.` for all the female samples.  We don't have a tool to do that, but it would be pretty quick to write a new GATK walker of your own if you know any Java.  (I don't suggest the awk route because there are many ways that handcrafted, artisanal VCFs go awry, not the least of which is people forgetting to reindex.)

    That said, all of the statistical tests in a traditional GWAS are independent.  You could certainly run the Y chromosome separately. After that, depending on your desired next steps, you could keep the Y results separate or combine the GWAS-derived variant-p-value pairs in pandas or hail or similar.  At this point the genotype data is gone and the ploidy won't be an issue.


  • Shashank Shekhar Padhi

    Yeah, it sounds logical. Thank you Laura Gauthier for your suggestions. 

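
For anyone taking the second route from the thread (running the Y chromosome GWAS separately and combining the results afterwards), here is a minimal pandas sketch of the combining step. It assumes each run has already been loaded into a table of per-variant summary statistics. The column names (`variant_id`, `chrom`, `pos`, `p_value`), the function name `combine_gwas_results`, and the toy values are all hypothetical and only illustrate the shape of the data; they are not the output format of any particular GWAS tool.

```python
import pandas as pd

# Hypothetical column names; adjust to whatever your GWAS tool actually emits.
REQUIRED_COLUMNS = ["variant_id", "chrom", "pos", "p_value"]


def combine_gwas_results(autosomal: pd.DataFrame, chr_y: pd.DataFrame) -> pd.DataFrame:
    """Stack two per-variant summary tables and sort them by p-value.

    Only summary statistics are combined here, so the differing sample
    counts (and ploidy) of the underlying VCFs no longer matter.
    """
    for name, table in (("autosomal", autosomal), ("chr_y", chr_y)):
        missing = [col for col in REQUIRED_COLUMNS if col not in table.columns]
        if missing:
            raise ValueError(f"{name} table is missing columns: {missing}")

    combined = pd.concat(
        [autosomal[REQUIRED_COLUMNS], chr_y[REQUIRED_COLUMNS]],
        ignore_index=True,
    )

    # A variant appearing in both inputs usually means the runs overlapped.
    if combined["variant_id"].duplicated().any():
        raise ValueError("duplicate variant_id values found across the two tables")

    return combined.sort_values("p_value", kind="stable").reset_index(drop=True)


# Toy, made-up values purely to show the call; not real results.
autosomal_demo = pd.DataFrame(
    {
        "variant_id": ["1:100:A:G", "2:200:C:T"],
        "chrom": ["1", "2"],
        "pos": [100, 200],
        "p_value": [0.04, 1e-6],
    }
)
chr_y_demo = pd.DataFrame(
    {"variant_id": ["Y:300:G:A"], "chrom": ["Y"], "pos": [300], "p_value": [0.001]}
)
combined_demo = combine_gwas_results(autosomal_demo, chr_y_demo)
```

The duplicate check is a design choice rather than a requirement: it is a cheap guard against accidentally feeding the same chromosome into both runs. Any multiple-testing correction would then be applied to the combined table as a whole.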
