Mutect errors
Hello,
I used the Mutect2 Terra workflow to successfully call mutations in 232/241 samples.
Mutect2 version: mutect2:4.1.6.0
JobID: 45b78f7b-c418-47b5-bc64-f729602e6679
The 9 samples that fail produce one of two errors, both of which seem to be associated with the bam files. Examples are below:
1) htsjdk.samtools.util.RuntimeIOException: example.bam has invalid uncompressedLength: -995523422
2) htsjdk.samtools.SAMFormatException: Invalid GZIP header example.bam
By googling around, I see that the issues in 1) and 2) may have to do with index files. However, because the .bam and .bam.bai files were all produced by Broad's GP using standardized processes, I'd be surprised if something went wrong there, especially as 232 samples from the same study ran successfully. One option is to try re-indexing, but before I do, I would like to know:
1) Is it reasonable to assume that these are indexing problems, and if so, based on what?
2) If so, could the source of either one of these problems be the bam file itself?
3) How could I address the issues?
4) If I were to regenerate the index file, could you please let me know which GATK tools compatible with Mutect2 can do that? I found this recommendation (https://gatk.broadinstitute.org/hc/en-us/articles/360037057032-BamIndexStats-Picard-), but I do not know what picard.jar is, nor where to find it.
I'd really appreciate help on this.
Best wishes,
Mia
-
Hi MPetlj,
Thanks for reaching out. Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace? The Share option is in the three-dots menu at the top-right.
- Add GROUP_FireCloud-Support@firecloud.org to the User email field and press enter on your keyboard.
- Click Save.
Let us know the workspace name, as well as the relevant submission and workflow IDs. We’ll be happy to take a closer look as soon as we can.
Best,
Samantha
-
Hi Samantha,
Thanks for reaching out.
Your team already has access to this workspace, called htapp-project/661-Clonal hematopoiesis.
Please let me know if you can see this and if there is anything else you need.
Thanks,
Mia
-
Hi MPetlj,
It looks like the workspace is protected by an authorization domain. Can you please add svelasqu@broadinstitute.org to the authorization domain?
Thanks,
Samantha
-
All set, please let me know if you can see it.
Thanks,
Mia
-
Update from GP regarding the bam behind 1 of the 9 failed jobs under discussion ('PAIR_29'; workflow ID: 633568d2-1bb4-47be-a655-a5b78e04f08e).
Mutect2 error was:
Bam malformed, samtools error: 05246_CCPM_0300532_T1.bam java.lang.IllegalArgumentException: Reference name for '1279721472' not found in sequence dictionary.
GP said:
"I have run ValidateSam on the bam file, with no validation errors found. I then tried reverting it, with Sanitize=true (so the program would throw out any invalid reads) but it found no reads to remove. I urge you to re-try the sample and let us know if you still run into trouble."
So this does not seem to be an issue with the bam... any clues?
The other 8 of the 9 failed jobs seem to have to do with the .bai files, but it would be great to have this confirmed, as per my initial enquiry.
Thanks!
Mia
-
Hi MPetlj,
Sorry for the delayed response.
We think the error
java.lang.IllegalArgumentException: Reference name for '1279721472' not found in sequence dictionary
for the other failed job could possibly be a bug, since GP successfully validated the BAM file, but we would like to confirm a couple of things first:
- Are the reference and dictionary files you are using the same version?
- Can you provide the exact GATK commands you used?
- Have you tried re-running the workflow on just this sample as GP suggested?
Regarding the other 8 of the 9 failed jobs, based on the errors it seems like an issue with the BAM files being malformed. Have you tried validating those BAM files yet? If not, can you validate them by following these instructions?
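Before a full validation, a quick structural check can sometimes show whether a BAM was truncated or damaged in transfer. Here is a minimal sketch in Python, using only the standard library. The function name quick_bgzf_check is hypothetical and not part of GATK or Picard. It only looks at the first bytes of the file and at the 28-byte BGZF end-of-file marker defined in the SAM/BAM specification, so a file that passes can still have a corrupt block in the middle and still needs ValidateSamFile.

```python
# 28-byte empty BGZF block that terminates a complete BAM file (SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def quick_bgzf_check(path):
    """Cheap sanity check for a BAM file (hypothetical helper, not a GATK tool).

    Returns a list of problems; an empty list means only that these two
    cheap checks passed, not that the file is valid.
    """
    problems = []
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(0, 2)  # jump to the end to learn the file size
        size = fh.tell()
        if size >= len(BGZF_EOF):
            fh.seek(-len(BGZF_EOF), 2)
            tail = fh.read()
        else:
            tail = b""

    # gzip magic (1f 8b), deflate method (08), FEXTRA flag set (04)
    if head != b"\x1f\x8b\x08\x04":
        problems.append("file does not start with a BGZF/gzip header")
    if tail != BGZF_EOF:
        problems.append("BGZF EOF marker missing (file may be truncated)")
    return problems
```

Calling quick_bgzf_check on each of the failing BAMs and on a few that ran successfully would show whether the failures share an obvious truncation. It does not replace a full validation.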
Best,
Samantha
-
Hi Samantha:
- Are the reference and dictionary files you are using the same version?
I am not sure what you are referring to, but I ran this as a Terra workflow, to which we gave you access, and the job IDs are in this thread, so you could use those to check the setup.
- Can you provide the exact GATK commands you used?
I did not use GATK commands directly, I used the Terra workflow. The information and job IDs are all in this thread, and we gave you access to have a look.
- Have you tried re-running the workflow on just this sample as GP suggested?
Yes, this failed multiple times.
Thanks
Mia
-
Hi MPetlj,
Thanks for the information.
For the sample with the error message "java.lang.IllegalArgumentException: Reference name for '1279721472' not found in sequence dictionary":
1. How did you generate this file? Please share the commands that were used to align this to a reference so we can confirm you used the same reference that you are using in Terra, which is Homo_sapiens_assembly19.fasta. If you ran this step in Terra, please share with me which WDL, and your job IDs for that.
2. Please share the complete stack trace from ValidateSamFile for this input BAM.
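If it helps with the reference check, here is a rough Python sketch that compares the sequence names and lengths in a BAM header against a reference .dict file. It assumes you have the BAM header as text (for example from samtools view -H) and the contents of the .dict file, which is itself SAM-header text. The function names parse_sq_lines and compare_dictionaries are hypothetical helpers, not GATK tools, and this only detects header-level mismatches.

```python
def parse_sq_lines(header_text):
    """Map sequence name -> length from the @SQ lines of SAM-style header text."""
    seqs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field looks like TAG:value; split only on the first colon.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            seqs[fields["SN"]] = int(fields["LN"])
    return seqs


def compare_dictionaries(bam_header_text, ref_dict_text):
    """Report contigs in the BAM header that the reference dictionary lacks
    or lists with a different length (hypothetical helper)."""
    bam = parse_sq_lines(bam_header_text)
    ref = parse_sq_lines(ref_dict_text)
    return {
        "missing_from_reference": sorted(set(bam) - set(ref)),
        "length_mismatch": sorted(
            name for name in bam if name in ref and bam[name] != ref[name]
        ),
    }
```

If both lists in the result are empty, the BAM header and the reference dictionary agree on contig names and lengths, which would point away from a simple reference mismatch.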
Thanks,
Samantha