Genome Analysis Toolkit

CombineGVCFs vs GenomicsDBImport for target sequencing data

1 comment

  • Gökalp Çelik

    Hi Karen Holm,

    Does your organism of interest have contigs longer than 2^29 bases? If so, the bad news is that none of our tools can work with contigs that long unless you split them into smaller parts, which may not be something you want. Also, 35,000 contigs is far too many for these tools to handle at the same time. We recommend importing variants into separate import instances, one per contig or per contig part. The process will also run much faster if the imports run simultaneously.

    Additionally, if your ploidy is greater than 2, we recommend reducing the number of alleles per site so that the import finishes in a reasonable time with reasonable resources. The higher the ploidy, the slower and more resource-intensive the imports will be. Finally, don't forget to leave plenty of memory for the native GenomicsDBImport library: it works outside the Java heap and may fail if not enough memory is left for it.

    Regards. 
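
The advice in the reply above (split over-long contigs, run one import per contig or contig part, keep the Java heap below total memory) could be sketched roughly as follows. This is an illustrative helper, not an official GATK script: the function names, the workspace layout, and the memory figures are hypothetical, and it only builds command lines using the `--java-options`, `--sample-name-map`, `--genomicsdb-workspace-path`, and `-L` options of `GenomicsDBImport` without running anything.

```python
import shlex

# Contig length limit mentioned in the reply above (2^29).
MAX_CONTIG_LEN = 2**29


def split_contig(name, length, max_len=MAX_CONTIG_LEN):
    """Return 1-based inclusive intervals 'name:start-end', each at most max_len long."""
    intervals = []
    start = 1
    while start <= length:
        end = min(start + max_len - 1, length)
        intervals.append(f"{name}:{start}-{end}")
        start = end + 1
    return intervals


def import_command(interval, sample_map, workspace_root, total_mem_gb, native_reserve_gb):
    """Build one GenomicsDBImport command for a single interval.

    The Java heap is set below the total memory so that the native
    GenomicsDB library, which allocates outside the heap, has room left.
    The split between heap and native memory is an assumption to tune.
    """
    heap_gb = total_mem_gb - native_reserve_gb
    if heap_gb < 1:
        raise ValueError("not enough memory left for the Java heap")
    # Hypothetical layout: one workspace directory per interval.
    workspace = f"{workspace_root}/{interval.replace(':', '_')}"
    return [
        "gatk", "--java-options", f"-Xmx{heap_gb}g",
        "GenomicsDBImport",
        "--sample-name-map", sample_map,
        "--genomicsdb-workspace-path", workspace,
        "-L", interval,
    ]


if __name__ == "__main__":
    # Hypothetical contig of 600 Mbp, which exceeds 2^29 and is split in two.
    for interval in split_contig("chr1", 600_000_000):
        cmd = import_command(interval, "samples.map", "workspaces", 16, 6)
        print(shlex.join(cmd))
```

Each printed command is independent of the others, so the commands can be submitted as separate jobs and run simultaneously, as the reply recommends.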
