Haplotype caller error; java.lang.IllegalStateException: Graph must have ref source and sink vertices
Hello!
I am using GATK 4.1.7.0 to call SNPs per-sample but have run into an issue I don't know how to resolve.
I have several haploid samples, and the HaplotypeCaller run that produces GVCFs does not seem to finish on any of them, so I am not sure whether the resulting GVCFs are complete.
My Slurm scripts produce a 'run-fail' message, and the Slurm log files show that the algorithm runs up to a certain contig and then terminates. The contig differs for each BAM file, so I cannot pinpoint one exact contig or location that causes the failure. See this snippet as an example:
15:40:38.652 INFO ProgressMeter - contig009842:1 172.0 272100 1582.1
15:40:42.574 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:44.825 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:44.826 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:45.484 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:45.484 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:45.484 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:48.797 INFO ProgressMeter - contig009939:98 172.2 272370 1582.1
15:40:54.442 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:55.434 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 6.058770996000001
15:40:55.434 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 5220.671827919
15:40:55.434 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 2564.86 sec
15:40:55.435 INFO HaplotypeCaller - Shutting down engine
[August 10, 2020 3:40:55 PM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 172.29 minutes.
Runtime.totalMemory()=24984944640
java.lang.IllegalStateException: Graph must have ref source and sink vertices
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:500)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.getAssemblyResult(ReadThreadingAssembler.java:665)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:643)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:534)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assembleKmerGraphsAndHaplotypeCall(ReadThreadingAssembler.java:181)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:146)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:270)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:541)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar /crex/proj/uppstore2017083/bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80G -jar /crex/proj/bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar H$
The exact command I used for the run was:
/gatk-4.1.7.0/gatk --java-options "-Xmx80G" HaplotypeCaller -R CC_combined_no_repeats.FIXD4GATK.fasta -ploidy 1 -I CC10_bwa.mdsRG.bam -O CC10.g.vcf -ERC GVCF --annotation AlleleFraction
I have checked my bam files using ValidateSamFile and they are all error-free.
My reference FASTA is a draft whole-genome assembly (with repeat sequences removed), and the contigs vary widely in length (the longest is 196571 bp and the shortest is 1 bp). I don't know whether this has anything to do with the error.
However, a separate SNP-calling run against a slightly different version of the assembly (the same assembly as above, but with the repeats still in place) appears to run to completion, even though its alternate allele frequency spectrum looks a bit strange considering I am working with a haploid (non-model) organism.
Neither of these two versions of the draft whole-genome assembly is scaffolded.
I would highly appreciate suggestions on how to resolve this issue as I am stuck at the moment.
Thanks in advance!
-
New_gatk_user could you please post the entire error log?
-
HaplotypeCaller runs successfully through many contigs, so I'm not sure I can paste the entire log here. Instead, here are the last few lines before the point where the algorithm appears to terminate, along with the error it gives at that point:
05:24:10.865 INFO ProgressMeter - contig007053:494 195.3 258390 1323.3
05:24:14.721 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:14.721 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:17.474 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:17.634 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:18.358 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:18.359 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:18.359 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:18.359 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:18.401 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:18.647 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:19.927 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:19.927 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:19.927 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:25.555 INFO ProgressMeter - contig007098:755 195.5 258620 1322.8
05:24:26.955 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:26.956 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:29.569 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:29.570 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:31.278 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 7.139873477
05:24:31.278 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 6470.669570967
05:24:31.278 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 2306.76 sec
05:24:31.278 INFO HaplotypeCaller - Shutting down engine
[August 10, 2020 5:24:31 AM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 195.64 minutes.
Runtime.totalMemory()=29464461312
java.lang.IllegalStateException: Graph must have ref source and sink vertices
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:500)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.getAssemblyResult(ReadThreadingAssembler.java:665)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:643)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:534)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assembleKmerGraphsAndHaplotypeCall(ReadThreadingAssembler.java:181)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:146)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:270)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:541)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar /bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80G -jar /bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar HaplotypeCaller -R CC_combined_no_repeats.FIXD4GATK.fasta -ploidy 1 -I CC1_bwa.mdsRG.bam -O CC1.g.vcf -ERC GVCF --annotation AlleleFraction
Another one, from a different sample:
05:53:05.263 INFO ProgressMeter - contig009844:550 202.3 239820 1185.5
05:53:11.912 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:53:11.912 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:53:17.076 INFO ProgressMeter - contig009943:1 202.5 240060 1185.5
05:53:18.191 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:53:24.490 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 7.68426567
05:53:24.491 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 7678.790590590001
05:53:24.491 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 1562.25 sec
05:53:24.491 INFO HaplotypeCaller - Shutting down engine
[August 10, 2020 5:53:24 AM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 202.65 minutes.
Runtime.totalMemory()=4958715904
java.lang.IllegalStateException: Graph must have ref source and sink vertices
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:500)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.getAssemblyResult(ReadThreadingAssembler.java:665)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:643)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:534)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assembleKmerGraphsAndHaplotypeCall(ReadThreadingAssembler.java:181)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:146)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:270)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:541)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar /bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80G -jar /bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar HaplotypeCaller -R CC_combined_no_repeats.FIXD4GATK.fasta -ploidy 1 -I CC11_bwa.mdsRG.bam -O CC11.g.vcf -ERC GVCF --annotation AlleleFraction
I have 24 samples in total, and all of them look like this; the run just seems to terminate at different points.
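Since each sample stops at a different place, one way to compare the runs is to pull the last locus reported by ProgressMeter out of each log. The following is only a rough sketch: the function name `last_progress_locus` is made up for illustration, and the pattern assumes ProgressMeter lines look like the ones pasted above. ProgressMeter only reports periodically, so the region that actually fails presumably lies at or somewhat after the locus this returns.

```python
import re

# Matches lines such as:
# 15:40:48.797 INFO ProgressMeter - contig009939:98 172.2 272370 1582.1
PROGRESS_RE = re.compile(r"INFO\s+ProgressMeter\s+-\s+(\S+):(\d+)\s")


def last_progress_locus(log_text):
    """Return (contig, position) from the last ProgressMeter line, or None.

    Hypothetical helper: it only reads the log text and does not call GATK.
    """
    last = None
    for line in log_text.splitlines():
        match = PROGRESS_RE.search(line)
        if match:
            last = (match.group(1), int(match.group(2)))
    return last
```

Running this over the text of each sample's Slurm log gives one contig name per sample, which can then be looked up in the reference index to check the contig lengths around the point of failure.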
-
Hi New_gatk_user, this issue has been discussed on our old forum. I found the links where it is discussed, and I think they may help you figure out what is going on. It looks like HaplotypeCaller has a hard time with small contigs, which may be your issue. Some solutions would be to generate artificial scaffolds or exclude those contigs that are causing issues.
- https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2016-08-11-2016-04-07/7644-ERROR-MESSAGE-Graph-must-have-ref-source-and-sink-vertices
- https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2017-06-18-2017-01-18/8877-Haplotype-Caller-Error-Graph-must-have-ref-source-and-sink-vertices
- https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2014-03-21-2013-10-14/3452-Error-Graph-must-have-ref-source-and-sink-vertices
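As a rough illustration of the "exclude those contigs" option, the sketch below builds a BED file of contigs at or above a length cutoff from the reference's `.fai` index (the tab-separated index produced by `samtools faidx`, with the contig name in column 1 and its length in column 2). The function name `contigs_to_bed` and the cutoff are hypothetical, not an official GATK recommendation, and whether dropping short contigs avoids this particular error is an assumption that would need to be tested.

```python
def contigs_to_bed(fai_lines, min_length):
    """Yield BED lines (0-based, half-open) for contigs of at least min_length bases.

    fai_lines: iterable of lines from a FASTA .fai index (name<TAB>length<TAB>...).
    min_length: hypothetical cutoff chosen by the user.
    """
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])
        if length >= min_length:
            # BED start is 0-based and end is exclusive, so a whole contig is 0..length
            yield f"{name}\t0\t{length}"
```

The resulting lines could be written to a file (for example `kept_contigs.bed`, a made-up name) and passed to HaplotypeCaller with `-L kept_contigs.bed`, so that only the listed contigs are processed.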
-
Hi Genevieve,
Thank you for the links, much appreciated.
However, the very last comment by Geraldine on this discussion (https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2016-08-11-2016-04-07/7644-ERROR-MESSAGE-Graph-must-have-ref-source-and-sink-vertices) has caught my attention because I'm using draft genomes assembled from WGS reads:
" ...GATK is not designed to run on draft genomes..."
Does this mean that any results I get from GATK would not be valid?
I ask especially because, on the full version of my draft genome (with repeats intact), the alternate allele frequency spectrum looks a bit odd for a haploid organism, so I wonder whether this is because "GATK is not designed to run on draft genomes".
Kindly clarify this for me.
Much appreciated!
-
Hi New_gatk_user, the GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. You can find more clarification about this within our documentation, I'll just point you to this link for some initial information.
For questions such as this one, we are building a backlog to work through when we have the capacity. Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.
We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.
For context, check out our support policy.
-
Many thanks again for pointing out some helpful discussion links.
I'll continue to look for ways to resolve this.
Best.