HaplotypeCaller - incompatible contigs (one scaffold only)
If you are seeing an error, please provide (REQUIRED):
a) GATK version used: 4.1.7.0
b) Exact command used:
gatk HaplotypeCaller -R final.fasta -I ${sampleName}_markdup.bam -O /${sampleName}.g.vcf.gz -ERC GVCF
c) Entire error log:
A USER ERROR has occurred: Input files reference and reads have incompatible contigs: Found contigs with the same name but different lengths:
contig reference = 000295F_arrow_pilon_obj_pilon_fragment_4_debris / 20000
contig reads = 000295F_arrow_pilon_obj_pilon_fragment_4_debris / 26333.
A couple of years ago I used bcftools to do joint-calling for 100 WGS samples, aligned to a draft assembly (non-model organism).
I am now trying to run single-sample GVCF calling with GATK’s HaplotypeCaller on the .bam files I generated two years ago, but I keep getting the error pasted above.
I understand what the error means and have read GATK’s documentation on this (https://gatk.broadinstitute.org/hc/en-us/articles/360035891131-Errors-about-input-files-having-missing-or-incompatible-contigs), but the suggested solution doesn’t help me.
I’ve checked the .bam files, and I am absolutely certain I am using the same version of the assembly that was used to align the sequences.
I did notice that this particular scaffold is the last one listed in the bam file (using samtools view -H).
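One way to check every contig at once, rather than only the one GATK stops on, is to compute sequence lengths directly from the FASTA and compare them with the `@SQ` lines in the BAM header. HaplotypeCaller takes the reference contig lengths from the sequence dictionary (`.dict`) next to the FASTA, so computing lengths from the sequence itself also rules out a stale index. This is a sketch assuming bash and awk; `fasta_lengths`, `header_lengths` and `compare_lengths` are hypothetical helper names, not part of samtools or GATK.

```shell
#!/usr/bin/env bash
# Sketch: report contigs whose length differs between a BAM header and a FASTA.
# Hypothetical helpers; assumes bash (for process substitution) and awk.
set -euo pipefail

# Print "name<TAB>length" for every sequence in a FASTA file ($1).
# The name is the first whitespace-delimited token after ">".
fasta_lengths() {
  awk '/^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
       { len += length($0) }
       END { if (name != "") print name "\t" len }' "$1"
}

# Print "name<TAB>length" for every @SQ line of SAM header text on stdin.
header_lengths() {
  awk -F'\t' '$1 == "@SQ" {
      sn = ""; ln = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) sn = substr($i, 4)
        else if ($i ~ /^LN:/) ln = substr($i, 4)
      }
      print sn "\t" ln
    }'
}

# $1 = FASTA, $2 = file holding the output of `samtools view -H`.
# Prints only @SQ contigs that are missing from the FASTA or have another length.
compare_lengths() {
  awk -F'\t' 'NR == FNR { ref[$1] = $2; next }
              !($1 in ref) { print $1 "\treference=missing\treads=" $2; next }
              ref[$1] != $2 { print $1 "\treference=" ref[$1] "\treads=" $2 }' \
    <(fasta_lengths "$1") <(header_lengths < "$2")
}
```

Usage, assuming samtools is on the PATH: save the header with `samtools view -H ${sampleName}_markdup.bam > header.sam`, then run `compare_lengths final.fasta header.sam`. An empty result means every contig listed in the BAM header exists in the FASTA with the same length; any line printed is a contig GATK would reject.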
Does anyone know what might be happening? I am confident that the fasta file hasn’t been renamed or changed. This file is in a directory that other people don’t have access to, and I haven’t been messing around in there (the last modified date is from when I received the assembly from a collaborator two years ago).
I’m happy to throw away variants on this scaffold. Is there a way of calling variants while just excluding this scaffold? I tried running:
gatk HaplotypeCaller -R final.fasta -I ${sampleName}_markdup.bam -XL 000295F_arrow_pilon_obj_pilon_fragment_4_debris -O /${sampleName}.g.vcf.gz -ERC GVCF
but still got the error.
Any help would be much appreciated!
-
Hi suzy_bunters,
Unfortunately it is not possible to get around this error message using intervals. One option could be to use Picard tools to revert your BAM file to unaligned reads and then realign those reads to this FASTA file so that the contigs match. Another option (if you don't mind losing the contig) would be to use samtools to remove the contig from your BAM. There are various examples of this on other forums.
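A minimal sketch of the second option, assuming bash, awk and samtools: stream the BAM as SAM text, drop the `@SQ` line for the contig together with every read placed on it or whose mate is placed on it (so pairs that touch the contig are removed together), and re-encode to BAM. `drop_contig` is a hypothetical helper name. Since the scaffold in question is the last `@SQ` entry in the header, removing it does not change the order of the remaining contigs. Supplementary-alignment (`SA`) tags on other reads may still name the dropped contig, so running ValidateSamFile on the result before calling variants is a sensible check.

```shell
#!/usr/bin/env bash
# Sketch: remove one contig from a SAM stream (header and reads).
# Hypothetical helper; assumes awk and SAM text on stdin.
set -euo pipefail

# $1 = contig name. Reads SAM on stdin, writes SAM on stdout:
#  - drops the @SQ header line whose SN equals $1 (other header lines kept)
#  - drops reads whose RNAME (column 3) or RNEXT (column 7) equals $1
#  - passes every other line through unchanged
drop_contig() {
  awk -F'\t' -v ctg="$1" '
    /^@/ {
      if ($1 == "@SQ") {
        for (i = 2; i <= NF; i++) if ($i == ("SN:" ctg)) next
      }
      print; next
    }
    $3 == ctg || $7 == ctg { next }
    { print }'
}
```

Usage (the output file name is a placeholder): `samtools view -h ${sampleName}_markdup.bam | drop_contig 000295F_arrow_pilon_obj_pilon_fragment_4_debris | samtools view -b -o ${sampleName}_filtered.bam -`, followed by `samtools index ${sampleName}_filtered.bam`. The filter only removes lines and never reorders them, so a coordinate-sorted input stays coordinate-sorted and does not need re-sorting.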
Sorry that there is not an easier solution!
Best,
Genevieve
-
Hi Genevieve Brandt (she/her),
Thanks so much for your reply.
I have a couple of follow up questions, if you have the time!
1) Do you know of any reason why this could have happened, other than the assembly file being altered?
2) I recently re-called these variants using the same assembly and BCFTools without any problems. I know you can't be expected to be an expert in BCFTools, but do you know why GATK can't overcome this problem when BCFTools can?
Kind regards,
Suzanne
-
Hi Suzanne,
I can provide some insight,
1) There is also a chance that your BAM file was changed, or there was an issue present 2 years ago that you were not aware of and did not address. Did you successfully run HaplotypeCaller 2 years ago?
2) Unfortunately I am not an expert on BCFTools. I do know, however, that GATK is sometimes stricter than other tools about these errors, because we do not want anything to pass one step and then cause issues later in the pipeline.
Best,
Genevieve
-
No, this is the first time I've tried using GATK. It's very possible there was an issue two years ago that I didn't address - I'd only just started working with WGS and wasn't accustomed to working with these file formats.
Not to worry, thank you so much for your help. At least I now know there's no simple solution!