GATK4.0.3.0 - CombineGVCFs - Unexpected base in allele bases
I have combined 400 gVCFs into two files (S2_200_F1.g.vcf.gz and S2_200_F2.g.vcf.gz), each containing 200 samples. When I try to combine these two files, the run fails with the error: Unexpected base in allele bases '*CG'.
How can I fix this error by working on the g.vcf.gz files, instead of going back to the BAM files as suggested in https://gatk.broadinstitute.org/hc/en-us/community/posts/360071826651-GATK4-1-3-0-HaplotypeCaller-ERROR ?
a) GATK version used: GATK 4.0.3.0
b) Exact command used:
time gatk --java-options "-Xmx90G" CombineGVCFs \
-R b37_human_g1k_v37_decoy.fasta \
--variant S2_200_F1.g.vcf.gz \
--variant S2_200_F2.g.vcf.gz \
-O S2_all.g.vcf.gz
c) Entire error log:
12:26:50.512 INFO CombineGVCFs - Done initializing engine
12:27:20.968 INFO ProgressMeter - Starting traversal
12:27:20.968 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:28:37.418 INFO ProgressMeter - 1:16712 1.3 1000 784.8
12:29:33.168 INFO ProgressMeter - 1:24589 2.2 3000 1361.6
12:30:03.475 INFO ProgressMeter - 1:65581 2.7 5000 1846.1
12:30:15.035 INFO ProgressMeter - 1:69504 2.9 6000 2068.2
12:31:16.751 INFO ProgressMeter - 1:120843 3.9 7000 1781.3
12:31:27.904 INFO ProgressMeter - 1:639065 4.1 11000 2672.8
12:32:27.876 INFO ProgressMeter - 1:664208 5.1 13000 2541.5
12:33:04.896 INFO ProgressMeter - 1:703850 5.7 14000 2442.4
12:33:18.757 INFO ProgressMeter - 1:762047 6.0 18000 3018.5
12:33:19.283 INFO CombineGVCFs - Shutting down engine
[October 1, 2021 12:33:19 PM SGT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 6.58 minutes.
Runtime.totalMemory()=9969860608
java.lang.IllegalArgumentException: Unexpected base in allele bases '*CG'
at htsjdk.variant.variantcontext.Allele.<init>(Allele.java:165)
at htsjdk.variant.variantcontext.Allele.create(Allele.java:239)
at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.extendAllele(ReferenceConfidenceVariantContextMerger.java:406)
at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.remapAlleles(ReferenceConfidenceVariantContextMerger.java:178)
at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:70)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.endPreviousStates(CombineGVCFs.java:340)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.createIntermediateVariants(CombineGVCFs.java:189)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.apply(CombineGVCFs.java:134)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:73)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.traverse(MultiVariantWalkerGroupedOnStart.java:118)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Thank you!
-
Hi HT,
I would first recommend upgrading GATK: 4.0.3.0 is quite an old version, and this could be related to a bug that has since been resolved. We are currently on 4.2.2.0. We do support '*' as an allele representing a spanning deletion. However, I'm not sure it will be accepted when it is combined with the bases CG in a single allele, as in '*CG'.
How did you create these VCF files?
Best,
Genevieve
-
Hi Genevieve,
Thank you for your speedy reply!
Noted that this may be a bug in old versions. I followed the GATK Best Practices workflow and used HaplotypeCaller to create the per-sample gVCF files, then combined them 200 samples at a time. That step did not produce the "Unexpected base in allele bases" error. The error occurred only when I tried to combine the resulting multi-sample gVCF files.
If I still want to use version 4.0.3.0, is there a way to fix this?
Thank you!!
Best,
HT
-
Hi HT,
Our Best Practices pipelines combine samples with GenomicsDBImport rather than CombineGVCFs. Samples can be added incrementally with this tool, which would probably work a lot better for you.
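As a rough illustration of that suggestion, the sketch below only prints a GenomicsDBImport command line for a set of per-sample gVCFs, so it can be reviewed before anything is run. The helper name genomicsdb_import_cmd, the workspace name S2_db and the sample file names are hypothetical and not taken from this thread; --genomicsdb-workspace-path, -V and -L are GenomicsDBImport arguments. In later releases, samples are added to an existing workspace with --genomicsdb-update-workspace-path, which may not be available in 4.0.3.0.

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK): print, but do not run, a
# GenomicsDBImport command for the given per-sample gVCF files.
# Usage: genomicsdb_import_cmd WORKSPACE INTERVAL GVCF [GVCF...]
genomicsdb_import_cmd() {
    workspace=$1
    interval=$2
    shift 2
    printf 'gatk GenomicsDBImport --genomicsdb-workspace-path %s -L %s' \
        "$workspace" "$interval"
    # One -V argument per input gVCF.
    for gvcf in "$@"; do
        printf ' -V %s' "$gvcf"
    done
    printf '\n'
}

# Example with made-up file names; inspect the printed command before running it.
genomicsdb_import_cmd S2_db 1 sample1.g.vcf.gz sample2.g.vcf.gz
```

Printing the command first keeps the sketch safe to try on a machine without GATK installed; the interval (here chromosome 1) has to be supplied because GenomicsDBImport works per interval.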
I'm not sure why this error message came up in CombineGVCFs, but unfortunately we can't fix that tool for you unless you are able to upgrade. As a workaround, you could try to find the *CG allele in your files after 1:762047 (the last position reported before the failure) and then skip that site.
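A minimal sketch of the search step, assuming the inputs are bgzipped or plain-text gVCFs with the standard tab-separated columns. The function name list_star_alleles is hypothetical. Because the stack trace shows the exception being raised in extendAllele during merging, the literal string *CG may not appear in either input; it may instead be built from a plain '*' allele. The sketch therefore reports both plain '*' alleles ("spanning") and '*' fused with bases ("malformed").

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK): list ALT alleles containing '*'
# on one contig at or after a start position.
# Usage: list_star_alleles FILE CHROM START_POS
# Output columns: CHROM, POS, REF, allele, kind (spanning | malformed)
list_star_alleles() {
    gzip -cdf "$1" | awk -F '\t' -v chrom="$2" -v start="$3" '
        /^#/ { next }                               # skip header lines
        $1 != chrom || $2 + 0 < start + 0 { next }  # keep only the region of interest
        {
            n = split($5, alts, ",")
            for (i = 1; i <= n; i++) {
                if (substr(alts[i], 1, 1) == "<") continue  # symbolic, e.g. <NON_REF>
                if (index(alts[i], "*") == 0) continue      # ordinary allele
                kind = (alts[i] == "*") ? "spanning" : "malformed"
                print $1 "\t" $2 "\t" $4 "\t" alts[i] "\t" kind
            }
        }'
}

# Example call, using the file names from the question:
# list_star_alleles S2_200_F1.g.vcf.gz 1 762047
# list_star_alleles S2_200_F2.g.vcf.gz 1 762047
```

A reported position could then be left out of the merge, for example with the engine-level -XL (--exclude-intervals) argument. Whether that avoids the exception in 4.0.3.0 has not been tested here.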
Best,
Genevieve
-
Hi Genevieve,
Understood. Thank you for your kind help!
Best,
HT