Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CombineGVCFs vs GenomicsDBImport in 1000+ WES

  • SkyWarrior

    Hi Tamer Mansour

    With 1200 GVCF files, GenomicsDBImport is the tool to use to collect them together. Its main advantage is that you can add more samples to your GenomicsDB instance later if you need to. CombineGVCFs is the older approach that predates GenomicsDB. It is still valid and useful, but maintaining one huge GVCF file can be a hassle and is sometimes problematic. GenomicsDBImport was developed with this problem, and scalability, in mind.
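As a rough sketch of the "add more samples later" workflow mentioned above, the snippet below builds (but does not run) the argument lists for an initial import and for a later import that appends samples to the same workspace. The workspace path, sample-map file names, and interval are hypothetical placeholders. The update step assumes a GATK4 release that supports `--genomicsdb-update-workspace-path`, so check the GenomicsDBImport documentation for your version before relying on it.

```python
def initial_import_cmd(workspace, sample_map, interval):
    """Argument list for creating a new GenomicsDB workspace."""
    return [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "--sample-name-map", sample_map,   # tab-separated: sample name, GVCF path
        "-L", interval,
    ]


def add_samples_cmd(workspace, new_sample_map):
    """Argument list for appending samples to an existing workspace.

    No -L is passed here on the assumption that the incremental import
    reuses the intervals stored in the existing workspace.
    """
    return [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-update-workspace-path", workspace,
        "--sample-name-map", new_sample_map,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(initial_import_cmd("wes_db", "batch1.sample_map", "chr1")))
    print(" ".join(add_samples_cmd("wes_db", "batch2.sample_map")))
```

Either list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files. Backing up the workspace before an incremental import is a sensible precaution.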

    You can accelerate your import by running multiple GenomicsDBImport instances, one per contig, so that your chromosomes are kept in separate, smaller DBs that can be genotyped in parallel. This way you can set the thread and bundle parameters as you see fit for your infrastructure.
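The per-contig idea above could be sketched as follows. This only generates one GenomicsDBImport command and one GenotypeGVCFs command per contig; it does not run them. The workspace naming scheme, sample-map file, reference path, and the default values are all assumptions for illustration, and mapping "thread and bundle parameters" to `--reader-threads` and `--batch-size` is my reading of the paragraph, not something it states.

```python
def per_contig_import_cmds(contigs, sample_map, batch_size=50, reader_threads=4):
    """Build one GenomicsDBImport argument list per contig."""
    cmds = []
    for contig in contigs:
        cmds.append([
            "gatk", "GenomicsDBImport",
            "--genomicsdb-workspace-path", f"genomicsdb_{contig}",  # one DB per contig
            "--sample-name-map", sample_map,
            "-L", contig,
            "--batch-size", str(batch_size),
            "--reader-threads", str(reader_threads),
        ])
    return cmds


def genotype_cmd(contig, reference):
    """Build the GenotypeGVCFs argument list for one per-contig workspace."""
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://genomicsdb_{contig}",
        "-O", f"{contig}.vcf.gz",
    ]


if __name__ == "__main__":
    # Hypothetical contig names and file names.
    for cmd in per_contig_import_cmds(["chr1", "chr2"], "cohort.sample_map"):
        print(" ".join(cmd))
    print(" ".join(genotype_cmd("chr1", "ref.fasta")))
```

Each command is independent of the others, so they can be submitted as separate cluster jobs or run through a process pool, with the batch size and thread count tuned to the memory and cores available per job.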

    Since all your GVCFs were generated with a BED file, I don't think you need to provide an additional BED file to the import step. However, an interval parameter is required, so you can provide your contig names as the intervals to import.
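If you want the contig names to come from the same BED file used for calling, a small helper along these lines could list them in order of first appearance. It is only a sketch: the function name is hypothetical, and it assumes a standard whitespace-delimited BED whose first column is the contig name.

```python
def contigs_from_bed(lines):
    """Return unique contig names, in first-seen order, from BED-formatted lines."""
    seen = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC-style header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contig = line.split()[0]
        if contig not in seen:
            seen.append(contig)
    return seen


if __name__ == "__main__":
    example = [
        "chr1\t100\t200",
        "chr1\t300\t400",
        "# a comment line",
        "chr2\t5\t10",
    ]
    print(contigs_from_bed(example))
```

With a real file, pass the open file handle, for example `contigs_from_bed(open("targets.bed"))` where `targets.bed` is a placeholder name, and give each returned name to the import as its interval.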
