Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK4.0.3.0 GenotypeGVCFs - Could not open array genomicsdbarray

8 comments

  • Pamela Bretscher

    Hi HT,

    The error about opening the array file is expected with the older GATK version you are using. Additionally, you cannot combine output from different GATK versions because of changes to the MQ annotation, which is why you are receiving this error. Would it be possible for you to update your GATK version and start over with GenomicsDBImport? If that is not possible, the GATK team could try to debug your specific issue.

    In regards to the FAI index error, there should be a way for you to still use the old fasta data. Could you try deleting this file and re-indexing the fasta file using samtools? Please let me know if this does not answer your question. 

    Kind regards,

    Pamela
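
The re-indexing step suggested above can be sketched as follows. This is a minimal illustration, not part of the original reply: it assumes `samtools` is on the PATH and that the stale index sits next to the FASTA file as `<fasta>.fai`. The helper names (`stale_index_path`, `faidx_command`, `reindex`) are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def stale_index_path(fasta: str) -> Path:
    """Location of the .fai index that samtools writes next to the FASTA."""
    return Path(fasta + ".fai")


def faidx_command(fasta: str) -> List[str]:
    """Argument list for `samtools faidx <fasta>`, which creates <fasta>.fai."""
    return ["samtools", "faidx", fasta]


def reindex(fasta: str) -> None:
    """Delete the old index (if any) and rebuild it with samtools.

    Assumption: samtools is installed and the FASTA itself is intact.
    """
    stale_index_path(fasta).unlink(missing_ok=True)  # requires Python 3.8+
    subprocess.run(faidx_command(fasta), check=True)
```

Calling `reindex("reference.fasta")` (a placeholder file name) would remove `reference.fasta.fai` and regenerate it.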

  • HT

    Hi Pamela,

    Thanks for your quick reply!

    I reindexed the FASTA file using samtools as you suggested. GenotypeGVCFs in GATK version 4.2.1.0 now works with the old FASTA data.

    I was wondering which plan is best. Here are three scenarios. Which of them avoids the MQ annotation problems caused by mixing GATK versions?

    1. Start over from the BaseRecalibrator and ApplyBQSR step using the new GATK version, i.e. update the whole workflow.
    2. Start over from the HaplotypeCaller step, re-calling the gVCFs with the new GATK version.
    3. Start over from GenomicsDBImport with the new GATK version, keeping GATK 4.0.3.0 for the previous steps.

    We previously applied a GATK 4.0.3.0 workflow to ~1000 WES samples, although the joint calling step used CombineGVCFs. I would like to keep using this GATK version so that the two batches of samples can be combined properly. If none of the scenarios above works and I still want to use GATK version 4.0.3.0, could the GATK team kindly help debug this specific issue?

    Thank you so much for any help you can provide!!

    All the Best, HT.
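
To make the three scenarios concrete, here is a rough sketch of the tool chain they refer to, written as functions that only build command lines. It is an illustration under stated assumptions, not the poster's actual pipeline: all file names are placeholders, the function names are hypothetical, and only commonly documented arguments are shown. A real WES run would likely add further options, such as intervals for HaplotypeCaller.

```python
from typing import List


def per_sample_commands(sample: str, bam: str, reference: str,
                        known_sites: str) -> List[List[str]]:
    """BQSR and gVCF calling for one sample (scenario 1 restarts here)."""
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    gvcf = f"{sample}.g.vcf.gz"
    return [
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", bam,
         "--known-sites", known_sites, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-R", reference, "-I", bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
        # Scenario 2 restarts at this step.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", recal_bam,
         "-ERC", "GVCF", "-O", gvcf],
    ]


def joint_calling_commands(gvcfs: List[str], reference: str, intervals: str,
                           workspace: str, out_vcf: str) -> List[List[str]]:
    """Joint calling over all gVCFs (scenario 3 restarts here)."""
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace,
                  "-L", intervals]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]
    genotype_cmd = ["gatk", "GenotypeGVCFs", "-R", reference,
                    "-V", f"gendb://{workspace}", "-O", out_vcf]
    return [import_cmd, genotype_cmd]
```

The sketch shows why the scenarios differ: scenario 1 reruns every command with one GATK version, while scenarios 2 and 3 reuse files produced by 4.0.3.0 as input to newer tools.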

  • Pamela Bretscher

    Hi HT,

    I'm glad to hear that reindexing the file was successful! Regarding your workflow, any mixing of GATK versions is prone to errors arising from changes in annotations, calculations, and algorithms. I would suggest using the newer GATK version from the earliest possible step in your workflow (i.e. scenario 1). This would minimize the possibility of errors. If this is not feasible for you, or if you would like to continue using the older version, I can submit a GitHub ticket for the GATK team to look into how you can do this.

    Kind regards,

    Pamela
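
One way to spot accidental version mixing before joint calling is to group gVCFs by the GATK version recorded in their headers. The sketch below is an illustration only. It assumes each gVCF header carries a `##GATKCommandLine=<ID=HaplotypeCaller,...,Version=...>` line, which may not hold for every file, and the function names are hypothetical. It works on header lines already read into memory, so reading (possibly compressed) files is left to the caller.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Assumed header layout; the version may or may not be quoted.
_VERSION_RE = re.compile(
    r'^##GATKCommandLine=<ID=HaplotypeCaller,.*?Version="?([^",>]+)"?'
)


def haplotypecaller_version(header_lines: Iterable[str]) -> Optional[str]:
    """Return the HaplotypeCaller version found in a VCF header, if any."""
    for line in header_lines:
        if not line.startswith("##"):
            break  # past the meta-information lines
        match = _VERSION_RE.match(line)
        if match:
            return match.group(1)
    return None


def group_by_version(
    headers: Dict[str, Iterable[str]]
) -> Dict[Optional[str], List[str]]:
    """Map each detected version (or None) to the files that report it."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for name, lines in headers.items():
        groups[haplotypecaller_version(lines)].append(name)
    return dict(groups)
```

If `group_by_version` returns more than one key, the batch contains gVCFs from different GATK versions (or files whose version could not be detected) and deserves a closer look before GenomicsDBImport.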

  • HT

    Hi Pamela,

    Understood, I should always use the same version of GATK. Thank you for your suggestions! They are really helpful!

    Could you please submit a GitHub ticket? I would indeed like to keep using GATK 4.0.3.0, as we have already generated 1K gVCF files with this version. It would save a lot of time if GenomicsDBImport and GenotypeGVCFs also worked on this older GATK version. I hope this post and the ticket can help other users as well.

    Thank you again for your kind help Pamela!

    All the Best, HT

  • Pamela Bretscher

    Hi HT,

    I have created a GitHub ticket and will follow up with you if/when the GATK team determines a way for you to move forward with the older version. You can follow the progress of the ticket here.

    Kind regards,

    Pamela

  • Pamela Bretscher

    Hi HT,

    The GATK team has been working on your issue and I received the following update today:

    "Our strong recommendation would be to upgrade the pipeline to a modern version of GATK, if at all possible. 4.0.3.0 is many years out of date at this point, and I'm not sure we'll be able to diagnose issues with the GenomicsDB version in use at that time. The user should also consider the many improvements and bug fixes to the HaplotypeCaller that have gone in since that version."

    It seems that you may run into too many issues using version 4.0.3.0, and it is very difficult for the team to pinpoint your initial error given how outdated the version is. Would it be possible for you to update your workflow and rerun the initial WES samples using the most recent GATK version? This would likely save you a lot of trouble downstream when using tools that have had significant bug fixes since version 4.0.3.0.

    Kind regards,

    Pamela

  • HT

    Hi Pamela,

    OK, I see. The latest workflow is indeed a better choice.

    Thank you again for your kind help!

    Best, HT

  • Pamela Bretscher

    Hi HT,

    I'm glad I could help and I apologize that there was not a better solution for continuing with your existing workflow. I hope that updating the workflow is successful and gives you better results.

    Best, Pamela

