HaplotypeCaller - Shutting down engine - Encountering a large genome
I am using HaplotypeCaller (GATK4) to detect mutations on a long chromosome (800 Mb). When the run reaches about 536 Mb, it always stops with "HaplotypeCaller - Shutting down engine". The log is as follows:
The command:
gatk HaplotypeCaller --emit-ref-confidence GVCF -R fa -dont-use-soft-clipped-bases -L chr1A -O out.vcf.gz
How can I solve this problem without splitting the chromosome?
-
Official comment
Hi Hanan Sela
We have a prototype CSI index reading functionality for HTSJDK and hopefully will get that reviewed and merged to GATK and other downstream tools shortly. For now glnexus could be the only way of joint genotyping your files.
Regards.
-
zyw, could you specify which GATK version you are using?
-
Hi zyw, it looks like there is a size limit when creating the output variant index, and that limit is causing this issue. You can work around the problem by setting the option --create-output-variant-index to false with HaplotypeCaller. You should then be able to index your variant output separately with samtools.
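The workaround above can be sketched as follows. The 2^29 bp figure is the coordinate limit of TBI (tabix) indexes; that it is the limit behind the failure at about 536 Mb is an inference from the numbers in this thread, not something the log confirms. The file names and the `-I` input are placeholders, and `bcftools index --csi` is used here as one way to build a CSI index on the bgzipped output.

```shell
# TBI (tabix) indexes cannot address positions beyond 2^29 bp (assumed cause of the failure).
TBI_MAX_CONTIG=$((1 << 29))   # 536870912

# Prints "csi" if a contig of the given length is too long for a TBI index, else "tbi".
index_type_for_contig() {
    if [ "$1" -gt "$TBI_MAX_CONTIG" ]; then echo csi; else echo tbi; fi
}

# Sketch of the two-step workaround; defined here but not run.
# Arguments are placeholders: $1 = reference FASTA, $2 = input BAM, $3 = output .g.vcf.gz
call_then_index() {
    gatk HaplotypeCaller --emit-ref-confidence GVCF \
        -R "$1" -I "$2" -L chr1A \
        -dont-use-soft-clipped-bases \
        --create-output-variant-index false \
        -O "$3" &&
    bcftools index --csi "$3"   # writes "$3.csi"
}

index_type_for_contig 800000000   # the 800 Mb chromosome from the question
```

Whether the resulting CSI index is accepted by downstream GATK tools is a separate question; see the official comment at the top of the thread.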
I have created a GitHub ticket so that we can improve the error message for future cases involving long reference contigs. We would like to test any changes we make against your use case. Would you be able to upload your reference in a folder following these instructions? Please let me know when it is there and the folder name. If it is a large hassle, we can find a workaround.
-
Thank you very much, Genevieve Brandt. By setting parameter --create-output-variant-index to false, this problem has been perfectly solved.
-
I have set the parameter --create-output-variant-index to false, but the run still always ends with "HaplotypeCaller - Shutting down engine". The log is as follows:
wheat genome
-
It looks like HaplotypeCaller is running into an "ArrayIndexOutOfBoundsException" which likely indicates that there is an issue with the formatting or compatibility between your files. You will want to check your input files to make sure they are compatible and don't contain errors. You can check out this resource about mismatching reference files for some more information: https://gatk.broadinstitute.org/hc/en-us/articles/360035891131-Errors-about-input-files-having-missing-or-incompatible-contigs.
Kind regards,
Pamela
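One way to run the compatibility check suggested above is to compare contig names and lengths between the reference and the BAM header. This is a minimal sketch, assuming the reference has a `.fai` index and `samtools` is available; the function names and file names are hypothetical, and a clean result does not rule out other causes of the exception.

```shell
# Convert SAM header text on stdin into "name<TAB>length" lines, one per @SQ record.
sq_to_table() {
    awk -F'\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len  = substr($i, 4)
        }
        print name "\t" len
    }'
}

# Compare two "name<TAB>length" tables: $1 = reference, $2 = BAM.
# Prints one line per BAM contig that is absent from the reference or differs in length.
compare_contigs() {
    awk -F'\t' '
        NR == FNR     { len[$1] = $2; next }
        !($1 in len)  { print "missing in reference: " $1; next }
        len[$1] != $2 { print "length mismatch: " $1 " (" len[$1] " vs " $2 ")" }
    ' "$1" "$2"
}

# Example usage (placeholder file names):
#   cut -f1,2 ref.fa.fai > ref.contigs.tsv
#   samtools view -H sample.bam | sq_to_table > bam.contigs.tsv
#   compare_contigs ref.contigs.tsv bam.contigs.tsv
```

Empty output means every contig in the BAM header has a same-named, same-length counterpart in the reference.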
-
Hello,
Can I use a GVCF file generated with
--create-output-variant-index false
in downstream applications such as GenomicsDBImport and GenotypeGVCFs?
Is a CSI index of a GVCF file generated with SAMtools or BCFtools compatible with downstream applications such as GenomicsDBImport and GenotypeGVCFs, or do these commands not need an index?
Thank you