Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

VariantEval IndexOutOfBoundsException

Answered

6 comments

  • Genevieve Brandt (she/her)

    Hi Samantha Zarate, I just wanted to let you know we are looking into this issue. We don't have any updates as of now but we'll get back to you as soon as possible.

  • Genevieve Brandt (she/her)

    Hi Samantha Zarate,

    We are thinking this is related to the data being on the X chromosome and the tool not being able to access the Y chromosome data. VariantEval has underlying assumptions about the data being diploid so the sex chromosomes cause issues with that.

    Could you try running with the X and Y together and see if that helps?

    Best,

    Genevieve

  • Samantha Zarate

    Hi Genevieve Brandt (she/her), thanks for the insight! How would you recommend combining X and Y population-scale VCF files, given that the samples are not the same between the two (not every sample with an X has a Y)?

  • Genevieve Brandt (she/her)

    Hi Samantha Zarate, we are looking into solutions. Regarding this note from you:

    "This error only occurs for the chromosome X file, and it only occurs with this FASTA file (GRCh38 on chrX does not cause this issue)."

    It looks like you are using a chromosome Y FASTA file, but your VCF is named chrX. Are you trying to evaluate variants on chromosome Y using a combined X and Y file with Y as the reference? If not, could you clarify what you are running?

    Thank you,

    Genevieve

  • Samantha Zarate

    Hi Genevieve Brandt (she/her), let me clarify what I mean. I'm comparing VCFs generated using two reference genomes, GRCh38 and CHM13. CHM13, as an organism, doesn't have a Y chromosome, so it's been taken from GRCh38 (hence the `withGRCh38chrY` part of the FASTA name). However, I have confirmed that this issue does in fact occur with the VCFs aligned to GRCh38, so I think your comment about ploidy is correct.

  • Genevieve Brandt (she/her)

    Thanks for the clarification. Unfortunately, I don't think there is a workaround at this point that would make VariantEval usable on the X chromosome. I have created a ticket on GitHub so that the developers can add this functionality in the future: https://github.com/broadinstitute/gatk/issues/7304. You can follow along there as our team works on it!

    Thanks for writing in and bringing this up! I'm sorry we were not able to solve it for you at this time.
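
For anyone who wants to check whether ploidy is a plausible trigger in their own data, here is a minimal sketch that tallies how many alleles each genotype call carries. It assumes a plain-text VCF with tab-separated columns and a `GT` key in the FORMAT column. The function name `ploidy_counts` and the example records are hypothetical, made up for illustration. This is only a diagnostic; it is not a workaround for the VariantEval exception discussed above.

```python
from collections import Counter
import re


def ploidy_counts(vcf_lines):
    """Tally genotype ploidy (number of alleles in GT) over all samples.

    `vcf_lines` is any iterable of VCF text lines, e.g. an open file handle.
    """
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # sites-only record: no FORMAT/sample columns
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # record carries no genotype calls
        gt_index = format_keys.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_index >= len(parts):
                continue  # sample entry is missing the GT value
            # Alleles are separated by "/" (unphased) or "|" (phased).
            alleles = re.split(r"[/|]", parts[gt_index])
            counts[len(alleles)] += 1
    return counts


# Hypothetical records: S2 has haploid calls, S1 and S3 have diploid calls.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chrX\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\t1:15\t./.:0",
    "chrX\t200\t.\tC\tT\t50\tPASS\t.\tGT\t1|1\t0\t0/1",
]
print(ploidy_counts(example))
```

A result containing both a key of 1 and a key of 2 means the file mixes haploid and diploid calls, which would be consistent with the diploid assumption described earlier in the thread. To run it on a real file, pass an open text handle, for example `open(path)` for an uncompressed VCF or `gzip.open(path, "rt")` for a compressed one, where `path` is your own file.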
