GATK4 HaplotypeCaller ERROR : expected haplotypes.size() >= eventsAtThisLoc.size() + 1
Hi,
I was trying to call SNPs with HaplotypeCaller (GATK 4.0.5.1) from multiple BAM files (low-coverage ancient human DNA). I used the emit-all-sites, genotype-given-alleles, minimum mapping quality, and minimum base quality arguments.
Here is my command:
java -Xmx25g -jar .../gatk-package-4.0.5.1-local.jar HaplotypeCaller -I Bam.list -R .../hs37d5.fa -L .../1000G.bed --output-mode EMIT_ALL_SITES --genotyping-mode GENOTYPE_GIVEN_ALLELES --alleles .../1000genome.phase3.v5a.ALL.beagle.vcf.gz --output xxx.vcf --minimum-mapping-quality 30 --min-base-quality-score 30
I got an error like this:
java.lang.IllegalArgumentException: expected haplotypes.size() >= eventsAtThisLoc.size() + 1
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:722)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerGenotypingEngine.createAlleleMapper(AssemblyBasedCallerGenotypingEngine.java:158)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:143)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:596)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:248)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:295)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:271)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:994)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Any suggestions?
Thank you in advance.
-
Hi,
You are using a very old version of GATK. Could you upgrade to the latest one and try again? The latest versions are here: https://github.com/broadinstitute/gatk/releases
I suspect this is solved in the more recent releases.
-
Thanks a lot, but I've already used GATK 4.1 with the same arguments (except minimum base and mapping quality) for another BAM file. Here is my command:
java "-Xmx25g" -jar ..../gatk HaplotypeCaller -I xxx.bam -R hs37d5.fa -L ..../1000G_chr21.bed --output-mode EMIT_ALL_SITES --genotyping-mode GENOTYPE_GIVEN_ALLELES --alleles ALL.chr21.phase3.vcf .vcf --output kkk.vcf
Again, there was another error:
badly formed variant context at location 21:10945870; getEnd() was 10945870 but this VariantContext contains an END key with value 10990763
What could be the reason for that?
-
4.1 is pretty old by now, too. There have been several improvements to HaplotypeCaller's force-calling mode since then. Note that if you use the latest release you need only specify "--alleles ____.vcf" and not "--genotyping-mode GENOTYPE_GIVEN_ALLELES".
However, it seems that there might just be something wrong with the VCF, if the error message is to be believed. Could you paste the VCF record at 21:10945870 from ALL.chr21.phase3.vcf?
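In case it helps with the lookup, here is a minimal Python sketch (standard library only) that lists records which either start at a given position or span it through an END key in INFO. The names `open_vcf` and `records_touching` are illustrative and not part of GATK, and the sketch assumes a plain or gzip-compressed VCF with tab-separated columns.

```python
import gzip
import re

# Match the END key only, not keys that merely end in "END" (e.g. CIEND).
END_RE = re.compile(r"(?:^|;)END=(\d+)(?:;|$)")


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def records_touching(lines, chrom, pos):
    """Yield VCF data lines on `chrom` that start at `pos` or span it.

    A record spans `pos` when POS <= pos <= end, where end is the END key in
    INFO if present, otherwise POS + len(REF) - 1.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[0] != chrom:
            continue
        start = int(fields[1])
        match = END_RE.search(fields[7])
        end = int(match.group(1)) if match else start + len(fields[3]) - 1
        if start <= pos <= end:
            yield line.rstrip("\n")
```

It could be run as `with open_vcf("ALL.chr21.phase3.vcf") as fh:` followed by a loop over `records_touching(fh, "21", 10945870)`. Matching on the span rather than on POS alone matters here because a record that starts upstream and carries a large END value would not show up in a search for the exact position.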
-
Thank you. I searched for it in the ALL.chr21.phase3.vcf file, but I couldn't find that position on chr21.
I also ran the same command on the same input BAM file with the GATK3 UnifiedGenotyper tool, and it worked without any error like this.
That's why I don't understand what is going wrong when I use HaplotypeCaller.
-
I would still try GATK 4.1.7, but what's the stack trace in 4.1?
-
I'm sorry, I no longer have the stack trace for 4.1, but I tried the latest version as you suggested.
I tried SNP calling from the same multiple BAM files (low-coverage human DNA samples with all the chromosomes) with GATK 4.1.7 HaplotypeCaller. I used the emit-all-active-sites, alleles, minimum mapping quality, and minimum base quality arguments.
Here is my command:
java -Xmx25g -jar .../gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar HaplotypeCaller -I Bam.list -R .../hs37d5.fa -L ../1000G.bed --output-mode EMIT_ALL_ACTIVE_SITES --alleles .../1000genome.phase3.vcf.gz --output xxx.vcf --minimum-mapping-quality 30 --min-base-quality-score 30
Here is the error I got:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1927)
at org.broadinstitute.hellbender.tools.walkers.annotator.TandemRepeat.getNumTandemRepeatUnits(TandemRepeat.java:54)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyRegionTrimmer.trim(AssemblyRegionTrimmer.java:175)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:552)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Could you also please give a suggestion for that?
-
Şevval Aktürk That looks like a bug in the TandemRepeat code. I'm surprised we have never seen it before. We will have to fix it.
-
Thank you so much for the reply.
-
Şevval Aktürk This PR: https://github.com/broadinstitute/gatk/pull/6583 should fix the bug. Thank you for finding and reporting it!
I have put a patched version of 4.1.7 in a public google bucket:
gs://broad-dsde-methods-davidben/gatk-builds/gatk-4.1.7-tandem-repeat-patch.jar
Do not hesitate to let me know if this fails.
-
I am using GATK 4.1.9.0 and it seems the issue has happened again, although I'm not using HaplotypeCaller; I used LeftAlignAndTrimVariants instead.
Here is the error log:
htsjdk.tribble.TribbleException: Badly formed variant context at location chrM:3745; getEnd() was 3834 but this VariantContext contains an END key with value 3836
at htsjdk.variant.variantcontext.VariantContext.validateStop(VariantContext.java:1401)
at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1383)
at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:489)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:647)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:638)
at org.broadinstitute.hellbender.tools.walkers.variantutils.LeftAlignAndTrimVariants.leftAlignAndTrim(LeftAlignAndTrimVariants.java:333)
at org.broadinstitute.hellbender.tools.walkers.variantutils.LeftAlignAndTrimVariants.apply(LeftAlignAndTrimVariants.java:244)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
I confirmed that there is no variant record at chrM:3745; there is only a record at chrM:3747, and the END key in its INFO field is 3836.
Here is the VCF record:
chrM 3747 . CCATCATTCTACTATCAACATTACTAATAAGTGGCTCCTTTAACCTCTCCACCCTTATCACAACACAAGAACACCTCTGATTACTCCTGC C . PASS END=3836;HOMLEN=6;HOMSEQ=CATCAT;SVLEN=-89;SVTYPE=DEL GT:AD 0/0:2711,1
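For what it's worth, the numbers in the log are consistent with the record having been shifted two bases to the left while its END key stayed at the original value: a 90-base REF starting at 3745 ends at 3834, whereas the same REF starting at 3747 ends at 3836. Below is a small Python sketch of that arithmetic. `implied_end` and `end_conflict` are hypothetical helper names, and the two-base shift is an inference from the error message, not something confirmed from the GATK source.

```python
import re

# Match the END key only, not keys that merely end in "END" (e.g. CIEND).
END_RE = re.compile(r"(?:^|;)END=(\d+)(?:;|$)")


def implied_end(pos, ref):
    """End coordinate implied by POS and the REF allele (1-based, inclusive)."""
    return pos + len(ref) - 1


def end_conflict(pos, ref, info):
    """Return (implied end, declared END) if they disagree, else None.

    Records without an END key in INFO never conflict.
    """
    match = END_RE.search(info)
    if match is None:
        return None
    declared = int(match.group(1))
    implied = implied_end(pos, ref)
    return None if declared == implied else (implied, declared)


# Stand-in for the 90-base REF allele of the chrM record (only its length matters).
ref = "C" * 90
info = "END=3836;HOMLEN=6;HOMSEQ=CATCAT;SVLEN=-89;SVTYPE=DEL"

# Original placement: POS 3747 agrees with END=3836.
print(end_conflict(3747, ref, info))  # prints None
# Shifted two bases left with END left untouched: the mismatch from the log.
print(end_conflict(3745, ref, info))  # prints (3834, 3836)
```

If that reading is right, dropping or recomputing the END key before left-aligning might avoid the validation failure, but I have not verified this.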
-
I can provide VCF to you if needed.
-
Hi Yangyxt,
I am not sure the bug fix addressed the issue you posted above. It fixed a problem in the TandemRepeat code, whereas your issue is in the VariantContext part of the tool.
Could you post more details, including your complete command line?
Şevval Aktürk were there any fixes you made to resolve the VariantContext issue?
Thank you,
Genevieve