VariantEval does not work or something wrong with my ref genome and dbsnps?
AnsweredIf you are seeing an error, please provide(REQUIRED) :
a) GATK version used: gatk-4.1.8.1-local.jar
b) Exact command used: java -jar $gatk VariantEval -R $ref -eval iPAM61_sorted_file.vcf --comp GCA_000003025.6_current_ids.vcf -O iPAM61_snps_results.eval.grp
c) Entire error log:
Dear Teams,
I am a new comer in this field.
I've been running variant discovery using HaplotypeCaller.
When I tried to run variant evaluation for my vcf file rnaseq data. it appeared that error. I've already sorted that vcf file by using SortVcf of picard:
java -jar $picard.jar SortVcf
I=iPAM61_filtered_snps.vcf
I=iPAM61_filtered_indels.vcf
O=iPAM61_sorted_file.vcf
then i used output file to run VariantEval.
and dbsnps file i got form : ftp://ftp.ebi.ac.uk/pub/databases/eva/rs_releases/release_2/by_species/sus_scrofa/Sscrofa11.1/GCA_000003025.6_multimap_ids.vcf.gz
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: VariantEval is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
17:32:35.137 INFO VariantEval - Initializing engine
17:32:35.286 INFO FeatureManager - Using codec VCFCodec to read file file:///media/thong/PAM-RNAseq
/rnaseq_tram/iPAM61_1_filtered_snps.vcf
17:32:35.362 INFO FeatureManager - Using codec VCFCodec to read file file:///media/thong/PAM-RNAseq
/rnaseq_tram/GCA_000003025.6_current_ids.vcf
17:32:35.570 INFO VariantEval - Done initializing engine
17:32:35.575 INFO VariantEval - Creating 3 combinatorial stratification states
17:32:35.586 INFO ProgressMeter - Starting traversal
17:32:35.587 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
17:32:45.593 INFO ProgressMeter - Chr1:12152818 0.2 647000
3880447.8
17:32:55.592 INFO ProgressMeter - Chr1:51298106 0.3 1773000
5317936.4
17:33:05.595 INFO ProgressMeter - Chr1:106681758 0.5 2965000
5928419.1
17:33:15.602 INFO ProgressMeter - Chr1:167383788 0.7 4139000
6206327.8
17:33:25.602 INFO ProgressMeter - Chr1:225246459 0.8 5223000
6265845.6
17:33:35.601 INFO ProgressMeter - Chr1:265064813 1.0 6285000
6283533.8
17:33:45.605 INFO ProgressMeter - Chr10:16347042 1.2 7314000
6267620.7
17:34:05.613 INFO ProgressMeter - Chr11:5688224 1.5 9466000
6308844.1
17:34:15.623 INFO ProgressMeter - Chr11:42702086 1.7 10571000
6340380.9
17:34:25.624 INFO ProgressMeter - Chr11:73803413 1.8 11668000
6362223.6
17:34:55.647 INFO ProgressMeter - Chr13:45754507 2.3 15065000
6453662.7
17:35:05.653 INFO ProgressMeter - Chr13:112124302 2.5 16341000
6533568.8
17:35:15.654 INFO ProgressMeter - Chr13:175160707 2.7 17579000
6589365.7
17:35:25.662 INFO ProgressMeter - Chr13:208033404 2.8 18713000
6601714.5
17:35:35.661 INFO ProgressMeter - Chr14:33384694 3.0 19844000
6611948.4
17:35:45.667 INFO ProgressMeter - Chr14:89820063 3.2 21076000
6652777.8
17:35:55.672 INFO ProgressMeter - Chr14:130386098 3.3 22227000
6665267.3
17:36:15.692 INFO ProgressMeter - Chr15:74720074 3.7 24610000
6708646.8
17:36:25.703 INFO ProgressMeter - Chr15:127504466 3.8 25856000
6741643.3
17:36:45.713 INFO ProgressMeter - Chr16:62352803 4.2 28180000
6759793.1
17:36:55.714 INFO ProgressMeter - Chr17:12361116 4.3 29308000
6760082.6
17:36:55.714 INFO ProgressMeter - Chr18:12300006 4.6 26754000
6760082.6
Runtime.totalMemory()=2839543808
java.lang.IllegalStateException: The elements of the input Iterators are not sorted according to the
comparator htsjdk.variant.variantcontext.VariantContextComparator
at htsjdk.samtools.util.MergingIterator.next(MergingIterator.java:107)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1
801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:
173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:
118)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:
140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(Comman
dLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.
java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Hi Anh Mai,
One of the other VCF files could be the unsorted culprit here. Could you sort all the other inputs as well and see if you get the same error?
Best,
Genevieve
-
Hi Ms,
Thanks for your reply! I've sorted my input files and also dbsnps file! but it still appeared this error!
java.lang.IllegalStateException: The elements of the input Iterators are not sorted according to the comparator htsjdk.variant.variantcontext.VariantContextComparator
at htsjdk.samtools.util.MergingIterator.next(MergingIterator.java:107)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:118)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289) -
Thanks for running that, Anh Mai. This kind of issue from VCF formatting can come from a few different issues, so at this point you'll want to test out different solutions to see what works.
- First, somewhat easy options. This could be a bug from how you are using Picard or because you are running an old GATK version. Try updating your GATK version to our current version 4.2.4.1. Then, make sure you are specifying this command using the GATK wrapper script (more details about the GATK wrapper script).
- Check the formatting of your VCF file. All the VCF files must contain proper headers and variant records. We have an article containing details about VCF files when using GATK here. You can check the formatting of your VCF file with ValidateVariants. Make sure to run it in the default strict settings.
- Make sure again that the sort order is compatible by specifying matching sequence dictionaries when running SortVCF for your files (-SD with SortVCF).
- Make sure your VCF file matches the contigs in your dbSNP file. Please see this related forum post for more details: https://gatk.broadinstitute.org/hc/en-us/community/posts/360072397571-VariatEval-ERROR-.
Let me know if these solutions work for you or if you have further questions!
Best,
Genevieve
-
Hi,
I've tried your suggestion, but:1. I'd updated and used your newest version, but it didn't work at all. :((3. I do not understand sequence dictionaries in SortVcf (-SD Undocumented option), does it refer to the reference genome or what? I'd already created one for reference genome. Does it mean that I have to link to the .dict file of ref genome ?4. I ran the whole chromosome set variants in comparison to the dbsnps file I got from ncbi. so I don't think I have to subset chromosomes of interest out, do I?2. I'd run ValidateVariants and it appeared like these:16:03:09.480 INFO ValidateVariants - Picard Version: 2.25.4
16:03:09.481 INFO ValidateVariants - Built for Spark Version: 2.4.5
16:03:09.481 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:03:09.481 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_
FOR_SAMTOOLS : false 16:03:09.481 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_
FOR_SAMTOOLS : true 16:03:09.481 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_
FOR_TRIBBLE : false 16:03:09.481 INFO ValidateVariants - Deflater: IntelDeflater
16:03:09.481 INFO ValidateVariants - Inflater: IntelInflater
16:03:09.481 INFO ValidateVariants - GCS max retries/reopens: 20
16:03:09.481 INFO ValidateVariants - Requester pays: disabled
16:03:09.481 INFO ValidateVariants - Initializing engine
16:03:09.667 INFO FeatureManager - Using codec VCFCodec to read file file:///media/thong/PAM-
RNAseq/rnaseq_tram/dbsnps_ pigs.vcf 16:03:09.754 INFO FeatureManager - Using codec VCFCodec to read file file:///media/thong/PAM-
RNAseq/rnaseq_tram/iPAM61_db_ annotated_sorted.vcf 16:03:09.818 INFO ValidateVariants - Done initializing engine
16:03:09.830 INFO ProgressMeter - Starting traversal
16:03:09.830 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:03:19.850 INFO ProgressMeter - 2:5190274 0.2 27000 161741.2
16:03:29.924 INFO ProgressMeter - 4:13447988 0.3 74000 220983.5
16:03:39.963 INFO ProgressMeter - 6:99161523 0.5 124000 246913.6
16:03:50.162 INFO ProgressMeter - 8:137877192 0.7 162000 241011.7
16:04:00.204 INFO ProgressMeter - 12:5866242 0.8 202000 240609.9
16:04:10.398 INFO ProgressMeter - 14:74736512 1.0 246000 243697.1
16:04:20.439 INFO ProgressMeter - 17:48291527 1.2 284000 241332.4
16:04:27.294 INFO ProgressMeter - AEMK02000256.1:184154 1.3 304758 236057.4
16:04:27.294 INFO ProgressMeter - Traversal complete. Processed 304758 total variants in 1.3 minutes.
16:04:27.294 INFO ValidateVariants - Shutting down engine
[January 10, 2022 at 4:04:27 PM ICT] org.broadinstitute.hellbender.
tools.walkers.variantutils. ValidateVariants done. Elapsed time: 1.30 minutes. Runtime.totalMemory()=
3078619136 -
Hi Anh Mai,
Thanks for these updates!
- It looks like your VCFs are all in the correct format from ValidateVariants.
- You can apply a sequence dictionary to SortVCF which is the sequence dictionary from your reference genome used to create these VCFs. This will guarantee that your VCFs are sorted all to the same order.
- #4 - you still need to verify that your dbSNP file has matching contigs to your VCFs. You can check this in the header of your VCFs looking at the contig lines. (For example, ##contig=<ID=20,length=63025520>). https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format
Let me know if you are then still getting the issue.
Best,
Genevieve
Please sign in to leave a comment.
5 comments