Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Haplotype caller error; java.lang.IllegalStateException: Graph must have ref source and sink vertices

Answered

6 comments

  • Genevieve Brandt (she/her)

    New_gatk_user could you please post the entire error log?

  • New_gatk_user

    HaplotypeCaller runs successfully through many contigs, so I'm not sure I can paste the entire log here. Instead, I'm pasting the last few lines before the point where the run appears to terminate, along with the error it gives at that point:

    05:24:10.865 INFO  ProgressMeter -     contig007053:494            195.3                258390           1323.3

    05:24:14.721 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:14.721 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:17.474 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:17.634 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:18.358 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:18.359 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:18.359 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:18.359 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:18.401 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:18.647 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:19.927 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:19.927 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:19.927 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:25.555 INFO  ProgressMeter -     contig007098:755            195.5                258620           1322.8

    05:24:26.955 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:26.956 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:29.569 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:29.570 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:24:31.278 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 7.139873477

    05:24:31.278 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 6470.669570967

    05:24:31.278 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 2306.76 sec

    05:24:31.278 INFO  HaplotypeCaller - Shutting down engine

    [August 10, 2020 5:24:31 AM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 195.64 minutes.

    Runtime.totalMemory()=29464461312

    java.lang.IllegalStateException: Graph must have ref source and sink vertices

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:500)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.getAssemblyResult(ReadThreadingAssembler.java:665)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:643)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:534)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assembleKmerGraphsAndHaplotypeCall(ReadThreadingAssembler.java:181)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:146)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:270)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:541)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)

            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)

            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)

            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)

            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)

            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)

            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)

            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)

            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)

            at org.broadinstitute.hellbender.Main.main(Main.java:292)

    Using GATK jar /bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar

    Running:

        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80G -jar /bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar HaplotypeCaller -R CC_combined_no_repeats.FIXD4GATK.fasta -ploidy 1 -I CC1_bwa.mdsRG.bam -O CC1.g.vcf -ERC GVCF --annotation AlleleFraction

     

     

    Here is another one, from a different sample:

     

    05:53:05.263 INFO  ProgressMeter -     contig009844:550            202.3                239820           1185.5

    05:53:11.912 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:53:11.912 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:53:17.076 INFO  ProgressMeter - contig009943:1            202.5                240060           1185.5

    05:53:18.191 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

    05:53:24.490 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 7.68426567

    05:53:24.491 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 7678.790590590001

    05:53:24.491 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 1562.25 sec

    05:53:24.491 INFO  HaplotypeCaller - Shutting down engine

    [August 10, 2020 5:53:24 AM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 202.65 minutes.

    Runtime.totalMemory()=4958715904

    java.lang.IllegalStateException: Graph must have ref source and sink vertices

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:500)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.getAssemblyResult(ReadThreadingAssembler.java:665)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:643)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:534)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assembleKmerGraphsAndHaplotypeCall(ReadThreadingAssembler.java:181)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:146)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:270)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:541)

            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)

            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)

            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)

            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)

            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)

            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)

            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)

            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)

            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)

            at org.broadinstitute.hellbender.Main.main(Main.java:292)

    Using GATK jar /bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar

    Running:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80G -jar /bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar HaplotypeCaller -R CC_combined_no_repeats.FIXD4GATK.fasta -ploidy 1 -I CC11_bwa.mdsRG.bam -O CC11.g.vcf -ERC GVCF --annotation AlleleFraction

     

     

    I have 24 samples in total, and the same error appears in all of them; the run just seems to terminate at different points.
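
    To see roughly where each run stopped, the last ProgressMeter line before the exception can be pulled out of each log. The following is a minimal sketch, assuming the log format shown above. The function name `last_progress_locus` is hypothetical, and because ProgressMeter reports only periodically, the failing region lies at or after the reported locus rather than exactly on it.

```python
import re

# Matches GATK ProgressMeter lines such as:
# "05:24:25.555 INFO  ProgressMeter -     contig007098:755   195.5   258620   1322.8"
PROGRESS_RE = re.compile(r"INFO\s+ProgressMeter\s+-\s+(\S+):(\d+)\s")


def last_progress_locus(log_text):
    """Return (contig, position) from the last ProgressMeter line, or None."""
    last = None
    for line in log_text.splitlines():
        match = PROGRESS_RE.search(line)
        if match:
            last = (match.group(1), int(match.group(2)))
    return last


# Lines copied from the first log excerpt in this thread.
sample_log = """\
05:24:10.865 INFO  ProgressMeter -     contig007053:494            195.3                258390           1323.3
05:24:14.721 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
05:24:25.555 INFO  ProgressMeter -     contig007098:755            195.5                258620           1322.8
05:24:31.278 INFO  HaplotypeCaller - Shutting down engine
java.lang.IllegalStateException: Graph must have ref source and sink vertices
"""

print(last_progress_locus(sample_log))  # ('contig007098', 755)
```

    Running this over all 24 logs would give one approximate stopping point per sample, which may help narrow down which contigs to inspect.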

  • Genevieve Brandt (she/her)

    Hi New_gatk_user, this issue has been discussed on our old forum. I found the links where it is discussed, and I think they may help you figure out what is going on. It looks like HaplotypeCaller has a hard time with small contigs, which may be your issue. Some solutions would be to generate artificial scaffolds or exclude those contigs that are causing issues.
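
    As one way to try the "exclude those contigs" suggestion, short contigs could be filtered out using the reference's `.fai` index and the remainder written to an interval list. This is only a sketch under stated assumptions: the 1000 bp cutoff is arbitrary (the thread does not say what counts as too small), the file names and demo contig lengths are made up, and the helper names are hypothetical.

```python
import os
import tempfile


def contigs_at_least(fai_path, min_length):
    """Return names of contigs whose length is >= min_length.

    A .fai index (as written by `samtools faidx`) is tab-separated; the
    first column is the contig name and the second is its length in bases.
    """
    keep = []
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and int(fields[1]) >= min_length:
                keep.append(fields[0])
    return keep


def write_interval_list(contigs, out_path):
    """Write one contig name per line (GATK-style intervals in a .list file)."""
    with open(out_path, "w") as handle:
        for name in contigs:
            handle.write(name + "\n")


# Tiny demonstration with made-up contig lengths (hypothetical values).
workdir = tempfile.mkdtemp()
demo_fai = os.path.join(workdir, "demo.fasta.fai")
with open(demo_fai, "w") as handle:
    handle.write("contigA\t12000\t9\t60\t61\n")
    handle.write("contigB\t350\t12220\t60\t61\n")
    handle.write("contigC\t2000\t12590\t60\t61\n")

list_path = os.path.join(workdir, "long_contigs.list")
kept = contigs_at_least(demo_fai, min_length=1000)
write_interval_list(kept, list_path)
print(kept)  # ['contigA', 'contigC']
```

    The resulting file could then be passed to HaplotypeCaller with `-L long_contigs.list`. Whether this avoids the exception depends on whether short contigs are actually the cause in a given dataset.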

  • New_gatk_user

    Hi Genevieve,

    Thank you for the links, much appreciated.

    However, the very last comment by Geraldine on this discussion (https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2016-08-11-2016-04-07/7644-ERROR-MESSAGE-Graph-must-have-ref-source-and-sink-vertices) has caught my attention because I'm using draft genomes assembled from WGS reads:

    " ...GATK is not designed to run on draft genomes..."

     

    Does this mean that any results I get from GATK would not be valid?

    I ask in particular because, on the full version of my draft genome (with repeats intact), the alternate allele frequency spectrum looks a bit odd for a haploid organism, so I'm wondering whether this might be because "GATK is not designed to run on draft genomes".

    Kindly clarify this for me.

    Much appreciated!
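
    For readers who want to check such a spectrum themselves, the alternate read fraction per site can be computed from the FORMAT/AD values in the VCF and then binned. This is a rough sketch, assuming AD strings in the usual "ref,alt,..." form. The helper names and the example depths are made up, and every non-reference depth (including `<NON_REF>` in a GVCF) is counted as alternate. As a general expectation not stated in this thread, fractions in a haploid sample would tend to cluster near 0 or 1, so a large middle bin may be worth investigating.

```python
def alt_fraction(ad_field):
    """Alt read fraction from a VCF FORMAT/AD string such as "3,17".

    Returns None when the field is missing, has no alt allele, or has
    zero total depth.
    """
    parts = ad_field.split(",")
    if len(parts) < 2 or "." in parts:
        return None
    depths = [int(value) for value in parts]
    total = sum(depths)
    if total == 0:
        return None
    return sum(depths[1:]) / total


def spectrum(ad_fields, bins=5):
    """Count alt fractions in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for ad in ad_fields:
        frac = alt_fraction(ad)
        if frac is None:
            continue
        # A fraction of exactly 1.0 goes in the last bin.
        index = min(int(frac * bins), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical AD values, for illustration only.
example_ad = ["0,20", "1,19", "10,10", "9,11", "20,0", "0,0"]
print(spectrum(example_ad))  # [1, 0, 2, 0, 2]
```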

  • Genevieve Brandt (she/her)

    Hi New_gatk_user, the GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. You can find more clarification about this within our documentation; I'll just point you to this link for some initial information.

    For questions such as this one, we are building a backlog to work through when we have the capacity. Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply. However, we ask other community members to help out if they know the answer.

    For context, check out our support policy.

  • New_gatk_user

    Many thanks again for pointing out some helpful discussion links. 

    I'll continue to look for ways to resolve this.

    Best.

