Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect:The stop position 0 is less than start 1 in contig contig004333

Answered

8 comments

  • Genevieve Brandt (she/her)

    Hi New_gatk_user, there is an issue with one of your files, at contig004333. It has the stop position at 0 and the start at 1, which will throw an error with GATK. You will need to fix this issue so that GATK can run properly.

  • New_gatk_user

    Hi Genevieve,

    I looked in my FASTA file and found that this specific contig, and several others, had no bases, only headers, e.g.

    >contig004332

    AAGGGCCT ...

    >contig004333

    >contig004334

    TTTTCCCCAAA ...

    I also see that these contigs are skipped when the FASTA dictionary is created, so the dict file reads:

    @SQ     SN:contig004332 LN:1965 M5:09a3a3cf32c10a4c052170cba5adcd85    ...

    @SQ     SN:contig004334 LN:2931 M5:5bf2f7ccb1a0d7c57dfb0ca10c46252e      ...

     

    So I removed all these "empty" contigs from the FASTA file and made a new index file and a new dict file, but I still get the same error:

    "A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect:The stop position 0 is less than start 1 in contig contig004333"

    Could you kindly suggest another way to resolve this issue?

    Many thanks!
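
Editor's sketch: the check described above (finding FASTA headers that are not followed by any bases) could look roughly like the following. This is a minimal illustration, not the poster's actual script; the function names `read_fasta` and `drop_empty_records` are hypothetical, and the sequences are toy data shortened from the example in the post.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the sequence may be an empty string."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_empty_records(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split records into those with bases and the names of those without."""
    kept: List[Tuple[str, str]] = []
    empty: List[str] = []
    for header, seq in read_fasta(lines):
        if seq:
            kept.append((header, seq))
        else:
            # Use the first word of the header as the contig name.
            empty.append(header.split()[0] if header.strip() else "")
    return kept, empty


# Toy input modelled on the post (hypothetical, shortened sequences).
example = [
    ">contig004332", "AAGGGCCT",
    ">contig004333",
    ">contig004334", "TTTTCCCCAAA",
]
kept, empty = drop_empty_records(example)
print("empty records:", empty)
print("kept records:", [name for name, _ in kept])
```

For a real file, the same functions can be fed an open file handle instead of a list. After filtering, the FASTA index and sequence dictionary would need to be regenerated, as the poster did.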

  • Genevieve Brandt (she/her)

    There may be a bigger issue with your file: please see ValidateSamFile https://gatk.broadinstitute.org/hc/en-us/articles/360035891231 to see if there are issues.

  • New_gatk_user

    Hi again Genevieve,

     

    Thanks for your pointers.

    I looked through my entire assembly fasta file and removed the empty headers with no bases, and using this fixed assembly fasta file, I ran alignment again and made new SAM/BAM files. 

    I have validated them as you pointed out to me and they have no errors.
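
Editor's sketch: the thread does not state why the first error persisted after the FASTA was cleaned. One possibility, consistent with the fact that realigning resolved it, is that the old BAM header still listed the empty contig. Under that assumption, a quick consistency check between the `@SQ` lines of a BAM header (for example, as printed by `samtools view -H`) and those of the reference `.dict` file might look like this. The function names are hypothetical, and the `LN:0` entry is an illustrative guess at what a stale header could contain.

```python
from typing import Dict, Iterable, List, Tuple


def parse_sq_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Map contig name (SN) to length (LN) from SAM-style @SQ header lines."""
    contigs: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("@SQ"):
            continue
        fields: Dict[str, str] = {}
        for token in line.split()[1:]:
            if ":" in token:
                tag, value = token.split(":", 1)
                fields[tag] = value
        if "SN" in fields and "LN" in fields:
            contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def compare_contigs(
    bam_sq: Dict[str, int], ref_sq: Dict[str, int]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (only in BAM, only in reference, shared but different length)."""
    only_in_bam = sorted(set(bam_sq) - set(ref_sq))
    only_in_ref = sorted(set(ref_sq) - set(bam_sq))
    mismatched = sorted(
        name for name in set(bam_sq) & set(ref_sq) if bam_sq[name] != ref_sq[name]
    )
    return only_in_bam, only_in_ref, mismatched


# Hypothetical header of a BAM aligned against the old, unfixed assembly.
bam_header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:contig004332\tLN:1965",
    "@SQ\tSN:contig004333\tLN:0",  # assumed stale entry, for illustration only
    "@SQ\tSN:contig004334\tLN:2931",
]
# Dictionary lines as quoted in the thread (other fields elided there).
ref_dict = [
    "@SQ\tSN:contig004332\tLN:1965\tM5:09a3a3cf32c10a4c052170cba5adcd85",
    "@SQ\tSN:contig004334\tLN:2931\tM5:5bf2f7ccb1a0d7c57dfb0ca10c46252e",
]
only_in_bam, only_in_ref, mismatched = compare_contigs(
    parse_sq_lines(bam_header), parse_sq_lines(ref_dict)
)
print("contigs only in the BAM header:", only_in_bam)
```

Any contig reported as present only in the BAM header would suggest the BAM was aligned against a different version of the reference.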

    So now my latest challenge is that when I ran HaplotypeCaller again to call variants using the new BAM files and the fixed assembly, everything seems to go well until something fails towards the end of the run.

    I am not sure whether the resulting GVCFs are complete, because my Slurm scripts produce a 'run-fail' message and the Slurm log shows that the algorithm runs up to a certain contig and then terminates. The contig differs for each BAM file, so I'm not able to pinpoint one exact contig/location that causes the failure. See this snippet as an example:

     

    15:40:38.652 INFO  ProgressMeter - contig009842:1            172.0                272100           1582.1
    15:40:42.574 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    15:40:44.825 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    15:40:44.826 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    15:40:45.484 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    15:40:45.484 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    15:40:45.484 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    15:40:48.797 INFO  ProgressMeter - contig009939:98            172.2                272370           1582.1
    15:40:54.442 WARN  StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
    15:40:55.434 INFO  VectorLoglessPairHMM - Time spent in setup for JNI call : 6.058770996000001
    15:40:55.434 INFO  PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 5220.671827919
    15:40:55.434 INFO  SmithWatermanAligner - Total compute time in java Smith-Waterman : 2564.86 sec
    15:40:55.435 INFO  HaplotypeCaller - Shutting down engine
    [August 10, 2020 3:40:55 PM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 172.29 minutes.
    Runtime.totalMemory()=24984944640
    java.lang.IllegalStateException: Graph must have ref source and sink vertices
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:500)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.getAssemblyResult(ReadThreadingAssembler.java:665)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:643)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:534)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assembleKmerGraphsAndHaplotypeCall(ReadThreadingAssembler.java:181)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:146)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:270)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:541)
            at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)
            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
            at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
            at org.broadinstitute.hellbender.Main.main(Main.java:292)
    Using GATK jar /crex/proj/uppstore2017083/bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80G -jar /crex/proj/bin/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar H$

     

     

    The exact command I used for the run was:

     

    /gatk-4.1.7.0/gatk --java-options "-Xmx80G" HaplotypeCaller -R CC_combined_no_repeats.FIXD4GATK.fasta -ploidy 1 -I CC10_bwa.mdsRG.bam -O CC10.g.vcf -ERC GVCF --annotation AlleleFraction

     

    A small side note, FYI:

    - My reference FASTA file is a draft whole-genome assembly with repeat sequences removed (hence the initial problem above, with header lines that had no bases), and the contigs vary in length (the longest is 196571 and the shortest is 1). I don't know whether this has anything to do with the error.

    - Another, separate SNP-calling run with a slightly different version of the assembly (the same assembly as above, but with the repeats not removed) appears to run to completion, but the allele frequency spectrum looks a bit strange considering I am working with a haploid (non-model) organism.

    - Neither of these two versions of the draft whole-genome assembly is scaffolded.

     

    I would highly appreciate any more insights from you or the team on this issue.

    Thanks in advance!

     

  • Genevieve Brandt (she/her)

    Hi New_gatk_user, have you solved the error above? If this is a new issue, please make a new post.

  • New_gatk_user

    Hi Genevieve,

     

    Not yet, and I still don't have a clue how to proceed.

    I thought maybe the issue was somehow related to my initial problem and this is why I've followed it up here.

    I will then make a new post asap, as you have advised.

    Kindly share your insights/suggestions there.

    Many thanks!

  • Flora

    Hi There,


    I am trying to run the germline caller in gatk-4.2.3.0 and I got this error; also, no *.vcf file is produced:


    "A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect: The genome loc coordinates 249106097-249106571 exceed the contig size (248956422)"

     

    Could someone kindly point out what I may be missing here?

  • Genevieve Brandt (she/her)

    Hi Flora,

    Can you validate your input file with ValidateSamFile?

    Thank you,

    Genevieve
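
Editor's sketch: both error messages in this thread describe an interval that does not fit its contig (a stop before the start, or coordinates past the end of the contig). The following is a rough illustration of that kind of bounds check, not GATK's actual GenomeLocParser code. The function name `check_interval` is hypothetical, and the contig name `chr1` is an assumption: the post does not name the contig, although 248956422 matches the length of chr1 in GRCh38.

```python
from typing import Dict, List


def check_interval(
    contig: str, start: int, stop: int, contig_lengths: Dict[str, int]
) -> List[str]:
    """Return a list of problems for a 1-based, inclusive interval."""
    if contig not in contig_lengths:
        return [f"unknown contig {contig}"]
    length = contig_lengths[contig]
    problems: List[str] = []
    if start < 1:
        problems.append(f"start {start} is less than 1")
    if stop < start:
        problems.append(f"stop {stop} is less than start {start}")
    if stop > length:
        problems.append(f"{start}-{stop} exceeds contig size ({length})")
    return problems


# Lengths: contig004333 is assumed to have length 0 (the empty record);
# "chr1" is an assumed name for the contig of size 248956422.
lengths = {"contig004333": 0, "chr1": 248956422}

# First error in the thread: the whole-contig interval 1..0 of an empty contig.
print(check_interval("contig004333", 1, 0, lengths))

# Second error in the thread: coordinates beyond the end of the contig.
print(check_interval("chr1", 249106097, 249106571, lengths))
```

Coordinates that run past the end of a contig usually mean that the reads, intervals, or other inputs were generated against a different reference than the one passed with `-R`, which is the kind of mismatch ValidateSamFile can help narrow down.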

