Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Bug: Badly formed genome unclippedLoc

12 comments

  • GE

    Actually, I figured out the problem above. I was passing the file as a GCP (gs://) URL, and for some reason -XL is not able to accept that. I also had to rename the file to combinedblacklist.intervals.

    However, now there is another error:

    A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "@HD VN:1.6 SO:coordinate" is not valid for this input.

    So there is still some problem.

  • Genevieve Brandt (she/her)

    Hi GE, could you please provide more information so that we can look into your question? We have an article describing what we need to provide support: https://gatk.broadinstitute.org/hc/en-us/articles/360053424571-How-to-Write-a-Post

  • GE

    Hi, I found the error. GATK is not correctly interpreting a file with the .intervals extension as an interval list format. Once I changed it to .interval_list it worked.

  • Genevieve Brandt (she/her)

    Hi GE, glad you found the issue. This sounds like a GATK bug, and I will put in an issue ticket to get it changed. Could you please provide more information: the specific command and the complete stack trace you get when the issue occurs?

  • GE

    The tool was Mutect2, and the interval list file was passed to the -XL command-line option. I no longer have the full stack trace available.

  • Genevieve Brandt (she/her)

    Ok, I will look into this further and let you know when I have updates.

  • Genevieve Brandt (she/her)

    Hi GE, could you specify what type of list your interval list was from these options? (A, B, C, or D)

    https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists

    I have also created an issue ticket for our team to look into this.

  • GE

    I made it with Picard BedToIntervalList, so it must be option A.

    I would recommend GATK add auto-detection of format for intervals. interval_list, list, intervals as extensions can be confusing.

  • Genevieve Brandt (she/her)

    Here is the ticket. The developers will further discuss the preferred behavior for interval lists: https://github.com/broadinstitute/gatk/issues/7095

  • Genevieve Brandt (she/her)

    Hi GE, I spoke with the developers and we determined that the tool is consistent with the preferred behavior, which is to determine the interval list type from the file's extension. We think this helps avoid confusion between interval list types.

    I have updated the ticket so that we can improve the error messages when the wrong extension is used.

    Best,

    Genevieve
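
Because the format is chosen from the extension alone, a quick pre-flight check can catch the mismatch reported earlier in this thread, where a Picard-style file (with an "@HD" header) carried a .intervals name. The following is only a rough sketch and not part of GATK: the function names are hypothetical, and the extension mapping is an assumption based on the "Intervals and interval lists" article linked above.

```python
from pathlib import Path

# Assumed mapping, based on the "Intervals and interval lists" article:
# Picard-style uses .interval_list, GATK-style uses .list or .intervals.
FORMAT_BY_EXTENSION = {
    ".interval_list": "picard",
    ".list": "gatk",
    ".intervals": "gatk",
    ".bed": "bed",
    ".vcf": "vcf",
}


def format_from_extension(path):
    """Return the format implied by the file name, or None if unrecognized."""
    name = Path(path).name.lower()
    for ext, fmt in FORMAT_BY_EXTENSION.items():
        if name.endswith(ext):
            return fmt
    return None


def check_interval_file(path, first_line):
    """Return a warning if the extension and the content disagree, else None.

    `first_line` is the first line of the file. Picard-style interval lists
    begin with a SAM-style header line such as '@HD'.
    """
    fmt = format_from_extension(path)
    if fmt is None:
        return f"{path}: unrecognized interval file extension"
    if first_line.startswith("@") and fmt != "picard":
        return (
            f"{path}: has a SAM-style header but the extension implies "
            f"{fmt} format; consider renaming to .interval_list"
        )
    return None


if __name__ == "__main__":
    header = "@HD\tVN:1.6\tSO:coordinate"
    print(check_interval_file("combinedblacklist.intervals", header))
    print(check_interval_file("combinedblacklist.interval_list", header))
```

With a .intervals name, each line (including the header) would be read as a literal interval string, which is consistent with the "Query interval "@HD VN:1.6 SO:coordinate" is not valid" message above.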

  • Sheryl

    I'm getting a similar-sounding issue with Mutect2. The interval list file was passed through the inputs.json for the mutect2.wdl:

      "Mutect2.split_intervals_extra_args": "-XL wgs_calling_regions.hg38.interval_list"

    I downloaded this file from here:

    https://storage.googleapis.com/genomics-public-data/references/hg38/v0/wgs_calling_regions.hg38.interval_list

    I get the error:

    A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "wgs_calling_regions.hg38.interval_list" is not valid for this input

     

  • David Benjamin

    Sheryl, are you sure that you want to exclude the entire callable part of the genome from analysis?  That interval list is most often used as the intervals input.

    If this is really what you intend to do, note that the extra_args parameters to the Mutect2 WDL are not intended for file inputs, because the string literal you put in the json is not resolved to an appropriate relative path when the command actually gets run by the Cromwell engine.  And if you're running on the cloud, the engine won't know that the file needs to be localized.  extra_args parameters are only for things like numerical arguments and boolean flags.

    One workaround is to use IntervalListTools with the -XL argument to produce the intervals input for the Mutect2 WDL.
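
To illustrate what such a workaround computes, here is a small pure-Python sketch of interval subtraction: the regions to analyze minus the regions to exclude. It is not an IntervalListTools invocation and does not read or write interval list files; the function name and the tuple representation are hypothetical, chosen only for this example.

```python
def subtract_intervals(include, exclude):
    """Return the parts of `include` that are not covered by `exclude`.

    Intervals are (contig, start, end) tuples with 1-based inclusive
    coordinates, as in Picard-style interval lists.
    """
    result = []
    for contig, start, end in sorted(include):
        # Keep only excluded intervals on this contig that overlap the current one.
        cuts = sorted(
            (s, e) for c, s, e in exclude
            if c == contig and e >= start and s <= end
        )
        cursor = start  # first position not yet emitted or excluded
        for s, e in cuts:
            if s > cursor:
                result.append((contig, cursor, s - 1))
            cursor = max(cursor, e + 1)
        if cursor <= end:
            result.append((contig, cursor, end))
    return result


if __name__ == "__main__":
    calling_regions = [("chr1", 1, 100)]
    blacklist = [("chr1", 20, 30), ("chr1", 90, 120), ("chr2", 1, 50)]
    print(subtract_intervals(calling_regions, blacklist))
```

The remaining intervals would then be written out as an interval list and supplied as the intervals input to the workflow, rather than passing a file name through an extra_args string.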
