Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenomicsDBImport

  • jfarrell

    According to the documentation, the input VCFs must be normalized, block-compressed and indexed to use the --bypass-feature-reader option with GenomicsDBImport:

    Use htslib to read input VCFs instead of GATK's FeatureReader. This will reduce memory usage and potentially speed up the import. Lower memory requirements may also enable parallelism through max-num-intervals-to-import-in-parallel. To enable this option, VCFs must be normalized, block-compressed and indexed.

    Is that format different from the output of ReblockGVCF? If so, what is the best way to normalize the GVCF file?
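Two of the three requirements in the quoted documentation, block compression and indexing, can be checked mechanically before running GenomicsDBImport. The following is a minimal Python sketch, assuming "block-compressed" means BGZF (the gzip variant written by `bgzip`, identified by a 'B','C' extra subfield in each gzip header) and that the index is a tabix `.tbi` or `.csi` file sitting next to the VCF. The helper names `is_bgzf` and `has_index` are hypothetical. It does not check normalization, which cannot be judged from the file header alone.

```python
import os
import struct
import sys

# BGZF is a series of gzip members; each member's header sets the FEXTRA flag
# and carries an extra subfield with SI1='B' (66), SI2='C' (67), SLEN=2.
GZIP_MAGIC = b"\x1f\x8b"
FEXTRA = 0x04


def is_bgzf(path):
    """Return True if the first gzip header in the file is a BGZF header."""
    with open(path, "rb") as fh:
        # ID1 ID2 CM FLG MTIME(4) XFL OS = 10 bytes, then XLEN (2 bytes)
        header = fh.read(12)
        if len(header) < 12 or header[:2] != GZIP_MAGIC:
            return False
        # CM must be 8 (deflate) and the FEXTRA flag must be set
        if header[2] != 8 or not header[3] & FEXTRA:
            return False
        xlen = struct.unpack("<H", header[10:12])[0]
        extra = fh.read(xlen)
    # Walk the extra subfields looking for the BGZF 'BC' subfield
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        slen = struct.unpack("<H", extra[pos + 2:pos + 4])[0]
        if si1 == 66 and si2 == 67 and slen == 2:
            return True
        pos += 4 + slen
    return False


def has_index(path):
    """Return True if a tabix (.tbi) or CSI (.csi) index exists beside the file."""
    return os.path.exists(path + ".tbi") or os.path.exists(path + ".csi")


if __name__ == "__main__":
    # Usage: python check_vcfs.py sample1.g.vcf.gz sample2.g.vcf.gz ...
    for vcf in sys.argv[1:]:
        print(f"{vcf}: bgzf={is_bgzf(vcf)} indexed={has_index(vcf)}")
```

A plain `gzip`-compressed VCF fails the first check even though it decompresses normally, which is a common reason a file is rejected as not block-compressed.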

