Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Joint genotyping as cohort size grows

3 comments

  • Gökalp Çelik

    Hi Asma Riyaz

    GenomicsDB accepts incremental updates to its variant storage, so you can add new samples whenever they become available. Once you have updated your GenomicsDB workspace, you can re-run joint genotyping to get an updated set of variants covering all your samples. Keep in mind that as the number of samples increases, the storage, time and compute resources you need will increase as well. To keep resources in check, you may want to use the newer ReblockGVCF tool, which reduces the storage needed for your variants by removing less confident calls and merging reference confidence blocks into fewer distinct confidence levels.
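
    As a rough sketch of the reblocking step, each GVCF could be passed through ReblockGVCF before it is imported. The reference name reference.fasta and the output file name are placeholders, not taken from this thread, and the small run helper is only an illustrative wrapper that prints the command and executes it when RUN=1 is set.

    ```shell
    set -eu

    # Hypothetical dry-run helper: print the command, run it only when RUN=1.
    run() {
      printf '%s\n' "$*"
      if [ "${RUN:-0}" = "1" ]; then "$@"; fi
    }

    ref="reference.fasta"                        # assumed reference FASTA
    in_gvcf="data/gvcfs/sample51.g.vcf.gz"       # GVCF to be reblocked
    out_gvcf="data/gvcfs/sample51.rb.g.vcf.gz"   # placeholder output name

    # Reblock one GVCF; the reblocked file is what would go into GenomicsDBImport.
    run gatk ReblockGVCF -R "$ref" -V "$in_gvcf" -O "$out_gvcf"
    ```

    Running it as is only prints the command for review; RUN=1 would execute it, which requires gatk on the PATH.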

    I hope this helps. 

  • Asma Riyaz

    Hello again,

    Does the following command look right to you? This is the second time I am running the pipeline, in order to add samples for joint genotyping. Here previous_db is the database generated the first time the pipeline was run, for samples 1 to 50.

        gatk --java-options "-Xmx4g -Xms4g" GenomicsDBImport \
          -V data/gvcfs/sample51.g.vcf.gz \
          -V data/gvcfs/sample52.g.vcf.gz \
          -V previous_db \
          --genomicsdb-workspace-path my_database \
          --tmp-dir=/path/to/large/tmp \
          -L 20

     

  • Gökalp Çelik

    Hi again. 

    GenomicsDBImport has a different parameter for incremental updates. You just need to use the parameter

    --genomicsdb-update-workspace-path

    and pass it the path of your existing GenomicsDB workspace. When this parameter is used, the existing workspace is not passed with -V, and --genomicsdb-workspace-path is not used.
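
    Applied to the command in the question, a minimal sketch of the incremental update might look like the following. It assumes previous_db is the workspace built from samples 1 to 50. The run helper is a hypothetical dry-run wrapper that prints the command and executes it only when RUN=1 is set. No -L is given because, as far as the tool documentation describes it, an incremental import reuses the intervals of the initial import; please check this against your GATK version.

    ```shell
    set -eu

    # Hypothetical dry-run helper: print the command, run it only when RUN=1.
    # Note that the printed form drops the shell quoting around --java-options.
    run() {
      printf '%s\n' "$*"
      if [ "${RUN:-0}" = "1" ]; then "$@"; fi
    }

    workspace="previous_db"   # existing workspace holding samples 1 to 50

    # Add samples 51 and 52 to the existing workspace.
    run gatk --java-options "-Xmx4g -Xms4g" GenomicsDBImport \
      -V data/gvcfs/sample51.g.vcf.gz \
      -V data/gvcfs/sample52.g.vcf.gz \
      --genomicsdb-update-workspace-path "$workspace" \
      --tmp-dir /path/to/large/tmp
    ```

    Backing up the workspace before updating is a sensible precaution, since the update modifies it in place.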

    One thing to note: each incremental import creates a new subdirectory under the GenomicsDB workspace folder. To make genotyping faster and more convenient, there is another parameter called

    --consolidate

    which prevents too many fragments from accumulating in the imported collection.
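
    Putting the two parameters together, a hedged sketch of an update with consolidation followed by re-genotyping could look like this. The sample53 GVCF, reference.fasta and cohort.vcf.gz are placeholder names, previous_db is assumed to be the existing workspace, and run is a hypothetical dry-run helper that executes the command only when RUN=1 is set.

    ```shell
    set -eu

    # Hypothetical dry-run helper: print the command, run it only when RUN=1.
    run() {
      printf '%s\n' "$*"
      if [ "${RUN:-0}" = "1" ]; then "$@"; fi
    }

    workspace="previous_db"   # assumed existing GenomicsDB workspace
    ref="reference.fasta"     # assumed reference FASTA

    # Incremental import with consolidation of the stored fragments.
    run gatk GenomicsDBImport \
      -V data/gvcfs/sample53.g.vcf.gz \
      --genomicsdb-update-workspace-path "$workspace" \
      --consolidate

    # Re-run joint genotyping over every sample now in the workspace.
    run gatk GenotypeGVCFs \
      -R "$ref" \
      -V "gendb://$workspace" \
      -O cohort.vcf.gz
    ```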

    I hope this helps. 
