Bug: Badly formed genome unclippedLoc
Hi,
I'm getting this error:
A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "combinedblacklist.intervallist" is not valid for this input
This happens in Mutect2 (GATK v4.1.9.0). The file is being passed to the -XL option.
The interval list was made with BedToIntervalList using the standard hg38 sequence dictionary.
-
Actually, I figured out the problem above. I was passing the file as a GCP (gs://) URL, and for some reason -XL is not able to accept that. I also had to rename the file to combinedblacklist.intervals
However, now there is another error:
A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "@HD VN:1.6 SO:coordinate" is not valid for this input.
So there is still some problem.
-
Hi GE, could you please provide more information so that we can look into your question? We have an article describing what we need to provide support: https://gatk.broadinstitute.org/hc/en-us/articles/360053424571-How-to-Write-a-Post
-
Hi, I found the error. GATK is not correctly interpreting a file with the .intervals extension as an interval list format. Once I changed it to .interval_list it worked.
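
For anyone hitting the same thing, here is a minimal sketch of a pre-flight check that compares a file's extension with what its first line looks like. It assumes the extension-to-format mapping described in the "Intervals and interval lists" article (.interval_list for Picard-style, .list or .intervals for GATK-style, .bed for BED). The helper names and the content heuristic are hypothetical and not part of GATK.

```python
from pathlib import Path
from typing import Optional

# Extension -> expected format (assumed from the GATK intervals article).
EXPECTED_FORMAT = {
    ".interval_list": "picard",  # SAM-style header, then tab-separated intervals
    ".list": "gatk",             # one chr:start-stop per line, no header
    ".intervals": "gatk",
    ".bed": "bed",
}


def sniff_format(first_line: str) -> str:
    """Guess the format from the first non-empty line (rough heuristic)."""
    if first_line.startswith("@"):
        return "picard"  # e.g. "@HD VN:1.6 SO:coordinate"
    if "\t" in first_line:
        return "bed"
    return "gatk"


def check_interval_file(path: str) -> Optional[str]:
    """Return a warning if extension and content disagree, else None."""
    p = Path(path)
    expected = EXPECTED_FORMAT.get(p.suffix)
    with p.open() as handle:
        first_line = next((line for line in handle if line.strip()), "")
    found = sniff_format(first_line.rstrip("\n"))
    if expected is None:
        return f"{p.name}: unrecognized extension {p.suffix!r}"
    if expected != found:
        return (f"{p.name}: extension implies {expected} format "
                f"but content looks like {found}")
    return None
```

Running check_interval_file on a Picard-style file named with the .intervals extension would return a warning, which matches the "@HD VN:1.6 SO:coordinate" error above: the header line was being read as an interval string.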
-
Hi GE, glad you found the issue. This sounds like a GATK bug, and I will put in an issue ticket to get it changed. Could you please provide more info: the specific command and the complete stack trace from when you hit the issue?
-
The tool was Mutect2, and the interval list file was passed to the -XL command line option. I no longer have the full stack trace available.
-
Ok, I will look into this further and let you know when I have updates.
-
Hi GE, could you specify what type of list your interval list was from these options? (A, B, C, or D)
https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
I have also created an issue ticket for our team to look into this.
-
I made it with Picard BedToIntervalList, so it must be option A.
I would recommend that GATK add auto-detection of the format for intervals. Having interval_list, list, and intervals as separate extensions can be confusing.
-
Here is the ticket; the developers will further discuss the preferred behavior for the interval lists: https://github.com/broadinstitute/gatk/issues/7095
-
Hi GE, I spoke with the developers and we determined that the tool is consistent with the preferred behavior, which is to determine the interval list type from the extension of the file. We think this helps with confusion between interval list types.
I have updated the ticket so that we can improve the error messages when the wrong extension is used.
Best,
Genevieve
-
I'm getting a similar-sounding issue with Mutect2. The interval list file was passed through the inputs.json for mutect2.wdl:
"Mutect2.split_intervals_extra_args": "-XL wgs_calling_regions.hg38.interval_list"
I downloaded this file from here:
I get the error:
A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "wgs_calling_regions.hg38.interval_list" is not valid for this input
-
Sheryl, are you sure that you want to exclude the entire callable part of the genome from analysis? That interval list is most often used as the intervals input.
If this is really what you intend to do, note that the extra_args parameters to the Mutect2 WDL are not intended for file inputs: the string literal you put in the json is not processed as an appropriate relative path when the command actually gets run by the Cromwell engine. And if you're running on the cloud, the engine won't know that the file needs to be localized. extra_args parameters are only for things like numerical arguments and boolean flags.
One workaround is to use IntervalListTools with the -XL argument to produce the intervals input for the Mutect2 WDL.
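
A minimal sketch of that workaround, assuming the exclusion is expressed through IntervalListTools' SECOND_INPUT and ACTION SUBTRACT options (the Picard-documented way to remove one interval list from another). The file names and the helper function are hypothetical; the sketch only builds and prints the command so it can be checked before running.

```python
import shlex
from typing import List


def build_subtract_command(include: str, exclude: str, output: str) -> List[str]:
    """Build a gatk IntervalListTools call that removes `exclude` from `include`."""
    return [
        "gatk", "IntervalListTools",
        "--ACTION", "SUBTRACT",
        "--INPUT", include,          # regions you want to call on
        "--SECOND_INPUT", exclude,   # regions to remove (what -XL would have excluded)
        "--OUTPUT", output,          # pass this file as the WDL intervals input
    ]


# Hypothetical file names, for illustration only.
cmd = build_subtract_command(
    "calling_regions.interval_list",
    "combinedblacklist.interval_list",
    "calling_minus_blacklist.interval_list",
)
print(shlex.join(cmd))
```

The resulting interval list can then be given to the workflow as a proper File input, so Cromwell knows to localize it, instead of being passed as a string in extra_args.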