Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

HaplotypeCaller - Shutting down engine - Encountering a large genome

7 comments

  • Official comment
    Gökalp Çelik

    Hi Hanan Sela

    We have a prototype of CSI index reading for HTSJDK, and we hope to get it reviewed and merged into GATK and other downstream tools shortly. For now, GLnexus may be the only way to joint-genotype your files.
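    Some background on why CSI matters here. A CSI index generalizes the binning scheme behind .tbi indexes with two parameters, min_shift and depth, and it can address positions below 2^(min_shift + 3*depth). A .tbi index is fixed at the equivalent of min_shift 14 and depth 5, which is 2^29 bp (about 512 Mbp). The Python sketch below computes the smallest depth a contig of a given length needs. It assumes htslib's default min_shift of 14 and only illustrates the arithmetic. It is not the code that htslib or HTSJDK actually runs, and htslib's own choice of depth may include a small safety margin.

```python
# Minimal sketch: smallest CSI depth needed so that a CSI index with the
# given min_shift can address every position of the longest contig.
# Assumes htslib's default min_shift of 14 (16,384 bp leaf bins).
# A CSI index with min_shift m and depth d covers 2**(m + 3 * d) positions.


def csi_depth(max_contig_len: int, min_shift: int = 14) -> int:
    """Return the smallest depth whose coverage reaches max_contig_len."""
    depth = 0
    span = 1 << min_shift  # positions covered with depth 0
    while max_contig_len > span:
        depth += 1
        span <<= 3  # each extra level multiplies coverage by 8
    return depth


if __name__ == "__main__":
    # 248,956,422 bp is GRCh38 chr1; 830,000,000 bp is a hypothetical long contig.
    for length in (248_956_422, 2**29, 830_000_000):
        depth = csi_depth(length)
        print(f"{length:,} bp -> depth {depth}, covers {2 ** (14 + 3 * depth):,} positions")
```

    A contig of exactly 2^29 bp still fits at depth 5, which is the .tbi range. Anything longer needs depth 6, which covers up to 2^32 positions.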

    Regards. 

  • Genevieve Brandt (she/her)

    zyw, could you specify which GATK version you are using?

  • Genevieve Brandt (she/her)

    Hi zyw, it looks like there is a size limit when creating the output variant index, and that limit is causing this issue. You can work around the problem by setting the option --create-output-variant-index to false with HaplotypeCaller. You should then be able to index your variant output separately with samtools/htslib tools (for a VCF or GVCF, `bcftools index` or `tabix -C` writes a CSI index).
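    One quick way to see whether a reference is likely to hit this limit is to scan its .fai file for contigs longer than 2^29 bp (536,870,912 bp). That is the largest range a tabix (.tbi) index can address. Treating this as the limit in question is an assumption here, because the thread does not name the exact figure. The helper name below is hypothetical, and the example .fai content is made up.

```python
# Minimal sketch: flag reference contigs too long for a tabix (.tbi) index.
# Assumption: the limit involved is the 2**29 bp range of the TBI binning
# scheme; contigs longer than that need a CSI index instead.
TBI_MAX_LEN = 2**29  # 536,870,912 bp


def contigs_too_long_for_tbi(fai_text: str) -> list[tuple[str, int]]:
    """Parse .fai content (name<TAB>length<TAB>...) and return oversized contigs."""
    oversized = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])  # only the first two columns are used
        if length > TBI_MAX_LEN:
            oversized.append((name, length))
    return oversized


if __name__ == "__main__":
    # Hypothetical .fai content; in practice, read the text of reference.fasta.fai.
    example_fai = "chrA\t450000000\t6\t60\t61\nchrB\t830000000\t1\t60\t61\n"
    print(contigs_too_long_for_tbi(example_fai))
```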

    I have created a GitHub ticket so that we can improve the error message for future cases with long reference contigs. We would like to test any changes we make with your use case. Would you be able to upload your reference to a folder following these instructions? Please let me know when it is there and what the folder is named. If this is a big hassle, we can find a workaround.

  • zyw

    Thank you very much, Genevieve Brandt. Setting the parameter --create-output-variant-index to false solved this problem completely.

  • omicsgene omicsgene

    I have set the parameter --create-output-variant-index to false, but HaplotypeCaller still always fails with "Shutting down engine". The log is as follows:

    wheat genome

  • Pamela Bretscher

    Hi omicsgene omicsgene,

    It looks like HaplotypeCaller is running into an "ArrayIndexOutOfBoundsException", which likely indicates a formatting problem or an incompatibility between your input files. Check your input files to make sure they are compatible with each other and don't contain errors. This article about mismatching reference files has more information: https://gatk.broadinstitute.org/hc/en-us/articles/360035891131-Errors-about-input-files-having-missing-or-incompatible-contigs.
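    One incompatibility worth ruling out is a mismatch between the contigs in the reference and those in the BAM header. The Python sketch below is a minimal, hypothetical helper and is not part of GATK. It compares contig names and lengths from a reference .fai file with the @SQ lines of a SAM/BAM header, such as the output of `samtools view -H`. It does not check contig order. An ArrayIndexOutOfBoundsException can also have other causes, so a clean result does not rule out other problems.

```python
# Minimal sketch: compare contig names/lengths between a reference .fai file
# and the @SQ lines of a SAM/BAM header. All function names are hypothetical.


def parse_fai(fai_text: str) -> dict[str, int]:
    """Map contig name -> length from .fai content (first two columns)."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def parse_sq_lines(header_text: str) -> dict[str, int]:
    """Map contig name -> length from @SQ lines (SN: and LN: tags)."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each tag looks like KEY:VALUE; split on the first colon only.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def compare_dictionaries(ref: dict[str, int], reads: dict[str, int]) -> list[str]:
    """Describe every contig that is missing or has a different length."""
    problems = []
    for name, length in reads.items():
        if name not in ref:
            problems.append(f"{name}: in BAM header but not in reference")
        elif ref[name] != length:
            problems.append(f"{name}: length {length} in BAM vs {ref[name]} in reference")
    for name in sorted(ref.keys() - reads.keys()):
        problems.append(f"{name}: in reference but not in BAM header")
    return problems


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    fai = "chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n"
    header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr3\tLN:700\n"
    for problem in compare_dictionaries(parse_fai(fai), parse_sq_lines(header)):
        print(problem)
```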

    Kind regards,

    Pamela

  • Hanan Sela

    Hello 

    Can I use a GVCF file generated with

    --create-output-variant-index false

    in downstream applications such as GenomicsDBImport and GenotypeGVCFs?

    Is a CSI index of a GVCF file generated by SAMtools or BCFtools compatible with downstream applications such as GenomicsDBImport and GenotypeGVCFs, or do these commands not need an index?

    Thank you
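    Whether an existing index is CSI or tabix (.tbi) matters for this question, because the official reply at the top of this thread says HTSJDK could not yet read CSI indexes. For reference, `bcftools index` writes CSI by default (`-t` gives .tbi), and `tabix` writes .tbi unless it is given `-C`. The minimal Python sketch below uses a hypothetical helper and only the standard library. It identifies the format from the magic bytes. Both formats are BGZF-compressed, BGZF is valid gzip, and the decompressed data begins with `TBI\1` or `CSI\1` respectively.

```python
# Minimal sketch: tell a tabix (.tbi) index from a CSI (.csi) index by its
# magic bytes. Both are BGZF-compressed, and BGZF is valid multi-member gzip,
# so the standard gzip module can decompress the first few bytes.
import gzip
import sys


def index_format(path: str) -> str:
    """Return "TBI", "CSI", or "unknown" based on the decompressed magic."""
    try:
        with gzip.open(path, "rb") as handle:
            magic = handle.read(4)
    except (OSError, EOFError):  # missing file, not gzip, or truncated
        return "unknown"
    if magic == b"TBI\x01":
        return "TBI"
    if magic == b"CSI\x01":
        return "CSI"
    return "unknown"


if __name__ == "__main__":
    for index_path in sys.argv[1:]:
        print(index_path, index_format(index_path))
```

    Usage, with hypothetical file names: `python index_format.py sample.g.vcf.gz.tbi sample.g.vcf.gz.csi`.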
