Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

VariantEval does not work or something wrong with my ref genome and dbsnps?

Answered
0

5 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi Anh Mai,

    One of the other VCF files could be the unsorted culprit here. Could you sort all the other inputs as well and see if you get the same error?

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Anh Mai

    Hi Ms,

    Thanks for your reply! I've sorted my input files and also dbsnps file! but it still appeared this error!

    java.lang.IllegalStateException: The elements of the input Iterators are not sorted according to the comparator htsjdk.variant.variantcontext.VariantContextComparator
    at htsjdk.samtools.util.MergingIterator.next(MergingIterator.java:107)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:118)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Thanks for running that, Anh Mai. This kind of issue from VCF formatting can come from a few different issues, so at this point you'll want to test out different solutions to see what works.

    1. First, somewhat easy options. This could be a bug from how you are using Picard or because you are running an old GATK version. Try updating your GATK version to our current version 4.2.4.1. Then, make sure you are specifying this command using the GATK wrapper script (more details about the GATK wrapper script).
    2. Check the formatting of your VCF file. All the VCF files must contain proper headers and variant records. We have an article containing details about VCF files when using GATK here. You can check the formatting of your VCF file with ValidateVariants. Make sure to run it in the default strict settings.
    3. Make sure again that the sort order is compatible by specifying matching sequence dictionaries when running SortVCF for your files (-SD with SortVCF).
    4. Make sure your VCF file matches the contigs in your dbSNP file. Please see this related forum post for more details: https://gatk.broadinstitute.org/hc/en-us/community/posts/360072397571-VariatEval-ERROR-.

    Let me know if these solutions work for you or if you have further questions!

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Anh Mai

    Hi, 

    I've tried your suggestion, but: 
    1. I'd updated and used your newest version, but it didn't work at all. :((
    3. I do not understand sequence dictionaries in SortVcf (-SD Undocumented option), does it refer to the reference genome or what? I'd already created one for reference genome. Does it mean that I have to link to the .dict file of ref genome ?
    4. I ran the whole chromosome set variants in comparison to the dbsnps file I got from ncbi. so I don't think I have to subset chromosomes of interest out, do I?
    2. I'd run ValidateVariants and it appeared like these: 

    16:03:09.480 INFO  ValidateVariants - Picard Version: 2.25.4

    16:03:09.481 INFO  ValidateVariants - Built for Spark Version: 2.4.5

    16:03:09.481 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2

    16:03:09.481 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

    16:03:09.481 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

    16:03:09.481 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

    16:03:09.481 INFO  ValidateVariants - Deflater: IntelDeflater

    16:03:09.481 INFO  ValidateVariants - Inflater: IntelInflater

    16:03:09.481 INFO  ValidateVariants - GCS max retries/reopens: 20

    16:03:09.481 INFO  ValidateVariants - Requester pays: disabled

    16:03:09.481 INFO  ValidateVariants - Initializing engine

    16:03:09.667 INFO  FeatureManager - Using codec VCFCodec to read file file:///media/thong/PAM-RNAseq/rnaseq_tram/dbsnps_pigs.vcf

    16:03:09.754 INFO  FeatureManager - Using codec VCFCodec to read file file:///media/thong/PAM-RNAseq/rnaseq_tram/iPAM61_db_annotated_sorted.vcf

    16:03:09.818 INFO  ValidateVariants - Done initializing engine

    16:03:09.830 INFO  ProgressMeter - Starting traversal

    16:03:09.830 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute

    16:03:19.850 INFO  ProgressMeter -            2:5190274              0.2                 27000         161741.2

    16:03:29.924 INFO  ProgressMeter -           4:13447988              0.3                 74000         220983.5

    16:03:39.963 INFO  ProgressMeter -           6:99161523              0.5                124000         246913.6

    16:03:50.162 INFO  ProgressMeter -          8:137877192              0.7                162000         241011.7

    16:04:00.204 INFO  ProgressMeter -           12:5866242              0.8                202000         240609.9

    16:04:10.398 INFO  ProgressMeter -          14:74736512              1.0                246000         243697.1

    16:04:20.439 INFO  ProgressMeter -          17:48291527              1.2                284000         241332.4

    16:04:27.294 INFO  ProgressMeter - AEMK02000256.1:184154              1.3                304758         236057.4

    16:04:27.294 INFO  ProgressMeter - Traversal complete. Processed 304758 total variants in 1.3 minutes.

    16:04:27.294 INFO  ValidateVariants - Shutting down engine

    [January 10, 2022 at 4:04:27 PM ICT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 1.30 minutes.

    Runtime.totalMemory()=3078619136

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Anh Mai,

    Thanks for these updates!

    • It looks like your VCFs are all in the correct format from ValidateVariants.
    • You can apply a sequence dictionary to SortVCF which is the sequence dictionary from your reference genome used to create these VCFs. This will guarantee that your VCFs are sorted all to the same order. 
    • #4 - you still need to verify that your dbSNP file has matching contigs to your VCFs. You can check this in the header of your VCFs looking at the contig lines. (For example, ##contig=<ID=20,length=63025520>). https://gatk.broadinstitute.org/hc/en-us/articles/360035531692-VCF-Variant-Call-Format

    Let me know if you are then still getting the issue.

    Best,

    Genevieve

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk