A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect:The stop position 0 is less than start 1 in contig contig004333
Hi There,
I am trying to run gatk-'s HaplotypeCaller and I keep getting this error:
"A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect: The stop position 0 is less than start 1 in contig contig004333"
I am using a non-model genome assembly that has its repeats masked. I indexed it using samtools and made a dict file, first using Picard and later using gatk's CreateSequenceDictionary command. With both of these options, I still get the same error.
Could someone kindly point out what I may be missing here, as I am unable to proceed further?
Hi New_gatk_user, there is an issue with one of your files, at contig004333. It has the stop position at 0 and the start at 1, which will throw an error with GATK. You will need to fix this issue so that GATK can run properly.
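A stop position of 0 with a start of 1 is what a zero-length contig would produce, so one place to look is the reference itself. As a minimal sketch (not a GATK tool; the function names and the toy sequences are made up for illustration, only the contig names echo this thread), the following lists FASTA records that have a header but no bases:

```python
from typing import Dict, Iterable, List


def contig_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_name: number_of_bases} for FASTA text given line by line."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def empty_contigs(fasta_lines: Iterable[str]) -> List[str]:
    """Names of records that consist of a header with no sequence."""
    return [name for name, n in contig_lengths(fasta_lines).items() if n == 0]


# Toy input: the bases are invented, the names are taken from the error message.
example = [
    ">contig004332",
    "ACGTACGT",
    ">contig004333",
    ">contig004334",
    "GGCC",
]
print(empty_contigs(example))  # ['contig004333']
```

For a real reference, an open file handle can be passed instead of the list, since the function only iterates over lines.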
Hi Genevieve,
I looked in my fasta file and found that this specific contig, and several others, had no bases at all, only their headers, e.g.
I also see that these contigs are skipped when the fasta dictionary is created, so the dict file reads:
@SQ SN:contig004332 LN:1965 M5:09a3a3cf32c10a4c052170cba5adcd85 ...
@SQ SN:contig004334 LN:2931 M5:5bf2f7ccb1a0d7c57dfb0ca10c46252e ...
So I removed all these "empty" contigs from the fasta file and made a new index file and a new dict file, but I still get the same error:
"A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect:The stop position 0 is less than start 1 in contig contig004333"
Kindly suggest another way to resolve this issue.
Many thanks!
There may be a bigger issue with your file: please see ValidateSamFile https://gatk.broadinstitute.org/hc/en-us/articles/360035891231 to see if there are issues.
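One possible reason for the error surviving a cleaned-up fasta is that the BAM was aligned against the old reference, so its header may still list the removed contigs. The thread does not confirm this, so treat it as a hypothesis. A minimal sketch of the comparison, assuming the BAM header has been dumped to text (for example with `samtools view -H`) and that the header and dict lines are available as lists of strings; the `LN:0` header entry below is invented for illustration:

```python
def parse_sq(lines):
    """Collect {contig_name: length} from @SQ lines of a SAM header or .dict file."""
    contigs = {}
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        # Split on any whitespace so both tab- and space-separated text works.
        fields = dict(tok.split(":", 1) for tok in line.split()[1:] if ":" in tok)
        if "SN" in fields and "LN" in fields:
            contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def stale_contigs(bam_header_lines, dict_lines):
    """Contigs in the BAM header that are absent from, or differ in length in, the dict.

    Returns tuples of (name, length_in_bam, length_in_dict_or_None).
    """
    bam = parse_sq(bam_header_lines)
    ref = parse_sq(dict_lines)
    problems = []
    for name, length in bam.items():
        if name not in ref:
            problems.append((name, length, None))
        elif ref[name] != length:
            problems.append((name, length, ref[name]))
    return problems


# Dict lines as quoted in the thread; the BAM header lines are hypothetical.
dict_lines = [
    "@SQ SN:contig004332 LN:1965",
    "@SQ SN:contig004334 LN:2931",
]
bam_header = [
    "@SQ SN:contig004332 LN:1965",
    "@SQ SN:contig004333 LN:0",
    "@SQ SN:contig004334 LN:2931",
]
print(stale_contigs(bam_header, dict_lines))  # [('contig004333', 0, None)]
```

If this reports anything, the reads would need to be realigned to the corrected reference rather than only re-indexing the fasta.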
Hi again Genevieve,
Thanks for your pointers.
I looked through my entire assembly fasta file and removed the empty headers with no bases, and using this fixed assembly fasta file, I ran alignment again and made new SAM/BAM files.
I have validated them as you pointed out to me and they have no errors.
So now my latest challenge is that when I ran HaplotypeCaller again to call variants using the new BAM files and the fixed assembly, everything seems to go well until something fails towards the end of the run.
I am not sure whether the resulting GVCFs are complete, because my slurm scripts produce a 'run-fail' message and the slurm file shows that the algorithm runs up to a certain contig number and then terminates. It is a different contig for each BAM file, so I am not able to pinpoint one exact contig/location that causes the algorithm to fail. See the snippet here as an example:
15:40:38.652 INFO ProgressMeter - contig009842:1 172.0 272100 1582.1
15:40:42.574 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:44.825 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:44.826 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:45.484 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:45.484 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:45.484 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:48.797 INFO ProgressMeter - contig009939:98 172.2 272370 1582.1
15:40:54.442 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:40:55.434 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 6.058770996000001
15:40:55.434 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 5220.671827919
15:40:55.434 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 2564.86 sec
15:40:55.435 INFO HaplotypeCaller - Shutting down engine
[August 10, 2020 3:40:55 PM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 172.29 minutes.
java.lang.IllegalStateException: Graph must have ref source and sink vertices
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs.BaseGraph.removePathsNotConnectedToRef(BaseGraph.java:500)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.getAssemblyResult(ReadThreadingAssembler.java:665)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.createGraph(ReadThreadingAssembler.java:643)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assemble(ReadThreadingAssembler.java:534)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.assembleKmerGraphsAndHaplotypeCall(ReadThreadingAssembler.java:181)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.readthreading.ReadThreadingAssembler.runLocalAssembly(ReadThreadingAssembler.java:146)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.AssemblyBasedCallerUtils.assembleReads(AssemblyBasedCallerUtils.java:270)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:541)
at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:210)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar /crex/proj/uppstore2017083/bin/gatk-
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx80G -jar /crex/proj/bin/gatk- H$
The exact command I used for the run was:
/gatk- --java-options "-Xmx80G" HaplotypeCaller -R CC_combined_no_repeats.FIXD4GATK.fasta -ploidy 1 -I CC10_bwa.mdsRG.bam -O CC10.g.vcf -ERC GVCF --annotation AlleleFraction
A small side note, FYI:
- My ref fasta file is a draft assembled whole genome (repeat sequences removed, hence the initial problem above with headers that had no bases), and the contigs are of different lengths (the biggest has length 196571 and the smallest has length 1). I don't know whether this has anything to do with this error.
- A separate SNP calling run with a slightly different version of the assembly (the same assembly as above but with the repeats not removed) appears to run to completion, but the allele frequency spectrum looks a bit strange considering I am working with a haploid (non-model) organism.
- Neither of these two versions of the whole genome draft assembly is scaffolded.
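Whether the very short contigs are related to the "Graph must have ref source and sink vertices" exception is not established in this thread. One way to test the idea would be a trial run restricted to longer contigs through an interval list passed with `-L`. The sketch below builds such a list from dict `@SQ` lines; the 100 bp cutoff, the function name, and the `contig_tiny` entry are arbitrary assumptions for illustration:

```python
def intervals_for_long_contigs(dict_lines, min_len=100):
    """Return 'name:1-length' interval strings for contigs of at least min_len bases."""
    intervals = []
    for line in dict_lines:
        if not line.startswith("@SQ"):
            continue
        fields = dict(tok.split(":", 1) for tok in line.split()[1:] if ":" in tok)
        name, length = fields.get("SN"), fields.get("LN")
        if name is None or length is None:
            continue  # skip malformed @SQ lines
        if int(length) >= min_len:
            intervals.append(f"{name}:1-{int(length)}")
    return intervals


# Two lines from the dict quoted earlier plus one hypothetical 1 bp contig.
dict_lines = [
    "@SQ SN:contig004332 LN:1965",
    "@SQ SN:contig_tiny LN:1",
    "@SQ SN:contig004334 LN:2931",
]
for interval in intervals_for_long_contigs(dict_lines):
    print(interval)
# contig004332:1-1965
# contig004334:1-2931
```

Writing these lines to a file and comparing a restricted run with the full run would at least show whether the failure follows the short contigs.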
I would highly appreciate any more insights from you or the team on this issue.
Thanks in advance!
Hi New_gatk_user, have you solved the error above? If this is a new issue, please make a new post.
Hi Genevieve,
Not yet, and I still don't have a clue on how to proceed/resolve.
I thought maybe the issue was somehow related to my initial problem and this is why I've followed it up here.
I will then make a new post asap, as you have advised.
Kindly share your insights/suggestions there.
Many thanks!
Hi There,
I am trying to run gatk-'s germline caller and I got this error; also, no *.vcf file was produced:
"A USER ERROR has occurred: Badly formed genome unclippedLoc: Parameters to GenomeLocParser are incorrect: The genome loc coordinates 249106097-249106571 exceed the contig size (248956422)"
Could someone kindly point out what I may be missing here?
Hi Flora,
Can you validate your input file with ValidateSamFile?
Thank you,
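A note on the numbers in that message: 248956422 is the length of chr1 in GRCh38, while chr1 in GRCh37/hg19 is 249250621 bases long. Coordinates that fit the older build but not the newer one can point to reads or intervals from a different build than the reference. The error does not name the contig or the builds involved, so this is only a possibility to check. A minimal sketch of the bounds test (the function name is made up):

```python
# chr1 lengths in the two human reference builds.
CHR1_LEN = {"GRCh37/hg19": 249_250_621, "GRCh38/hg38": 248_956_422}


def fits(start: int, stop: int, contig_len: int) -> bool:
    """True if the 1-based inclusive interval start-stop lies inside the contig."""
    return 1 <= start <= stop <= contig_len


locus = (249_106_097, 249_106_571)  # coordinates quoted in the error message
for build, length in CHR1_LEN.items():
    print(build, fits(*locus, length))
# GRCh37/hg19 True
# GRCh38/hg38 False
```

The same test also describes the first error in this thread: a zero-length contig gives start 1 and stop 0, which can never satisfy `start <= stop`.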
