GATK concordance
REQUIRED for all errors and issues:
a) GATK version used: 4.1.3.0
b) Exact command used:
gatk Concordance \
  -eval AC.vcf \
  --truth HG003_GRCh37_1_22_v4.2.1_benchmark.vcf
c) Entire program log:
Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.3.0-local.jar Concordance -eval AC.vcf --truth HG003_GRCh37_1_22_v4.2.1_benchmark.vcf
**BETA FEATURE - WORK IN PROGRESS**
USAGE: Concordance [arguments]
This tool evaluates an input VCF against a VCF that has been validated and is considered to represent ground truth.
The summary statistics (# true positives, # false positives, # false negatives, sensitivity, precision) are reported
in a TSV file (--summary). Note that this tool assumes that the truth VCF only contains PASS variants.
Version:4.1.3.0
Required Arguments:
--evaluation,-eval:String A VCF containing variants to be compared to the truth Required.
--summary,-S:File A table of summary statistics (true positives, sensitivity, etc.) Required.
--truth,-truth:String A VCF containing truth variants Required.
Optional Arguments:
--add-output-sam-program-record,-add-output-sam-program-record:Boolean
If true, adds a PG tag to created SAM/BAM/CRAM files. Default value: true. Possible
values: {true, false}
--add-output-vcf-command-line,-add-output-vcf-command-line:Boolean
If true, adds a command line header line to created VCF files. Default value: true.
Possible values: {true, false}
--arguments_file:File read one or more arguments files and add them to the command line This argument may be
specified 0 or more times. Default value: null.
--cloud-index-prefetch-buffer,-CIPB:Integer
Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to
cloudPrefetchBuffer if unset. Default value: -1.
--cloud-prefetch-buffer,-CPB:Integer
Size of the cloud-only prefetch buffer (in MB; 0 to disable). Default value: 40.
--confidence,-C:File TO BE IMPLEMENTED Default value: null.
--create-output-bam-index,-OBI:Boolean
If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. Default
value: true. Possible values: {true, false}
--create-output-bam-md5,-OBM:Boolean
If true, create a MD5 digest for any BAM/SAM/CRAM file created Default value: false.
Possible values: {true, false}
--create-output-variant-index,-OVI:Boolean
If true, create a VCF index when writing a coordinate-sorted VCF file. Default value:
true. Possible values: {true, false}
--create-output-variant-md5,-OVM:Boolean
If true, create a a MD5 digest any VCF file created. Default value: false. Possible
values: {true, false}
--disable-bam-index-caching,-DBIC:Boolean
If true, don't cache bam indexes, this will reduce memory requirements but may harm
performance if many intervals are specified. Caching is automatically disabled if there
are no intervals specified. Default value: false. Possible values: {true, false}
--disable-read-filter,-DF:String
Read filters to be disabled before analysis This argument may be specified 0 or more
times. Default value: null. Possible Values: {WellformedReadFilter}
--disable-sequence-dictionary-validation,-disable-sequence-dictionary-validation:Boolean
If specified, do not check the sequence dictionaries from our inputs for compatibility.
Use at your own risk! Default value: false. Possible values: {true, false}
--exclude-intervals,-XL:StringOne or more genomic intervals to exclude from processing This argument may be specified 0
or more times. Default value: null.
--filter-analysis:File A table of the contribution of each filter to true and false negatives Default value:
null.
--filtered-true-negatives-and-false-negatives,-ftnfn:File
A vcf to write filtered true negatives and false negatives Default value: null.
--gatk-config-file:String A configuration file to use with the GATK. Default value: null.
--gcs-max-retries,-gcs-retries:Integer
If the GCS bucket channel errors out, how many times it will attempt to re-initiate the
connection Default value: 20.
--gcs-project-for-requester-pays:String
Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be
accessed. Default value: .
--help,-h:Boolean display the help message Default value: false. Possible values: {true, false}
--input,-I:String BAM/SAM/CRAM file containing reads This argument may be specified 0 or more times.
Default value: null.
--interval-exclusion-padding,-ixp:Integer
Amount of padding (in bp) to add to each interval you are excluding. Default value: 0.
--interval-merging-rule,-imr:IntervalMergingRule
Interval merging rule for abutting intervals Default value: ALL. Possible values: {ALL,
OVERLAPPING_ONLY}
--interval-padding,-ip:IntegerAmount of padding (in bp) to add to each interval you are including. Default value: 0.
--interval-set-rule,-isr:IntervalSetRule
Set merging approach to use for combining interval inputs Default value: UNION. Possible
values: {UNION, INTERSECTION}
--intervals,-L:String One or more genomic intervals over which to operate This argument may be specified 0 or
more times. Default value: null.
--lenient,-LE:Boolean Lenient processing of VCF files Default value: false. Possible values: {true, false}
--QUIET:Boolean Whether to suppress job-summary info on System.err. Default value: false. Possible
values: {true, false}
--read-filter,-RF:String Read filters to be applied before analysis This argument may be specified 0 or more
times. Default value: null. Possible Values: {AlignmentAgreesWithHeaderReadFilter,
AllowAllReadsReadFilter, AmbiguousBaseReadFilter, CigarContainsNoNOperator,
FirstOfPairReadFilter, FragmentLengthReadFilter, GoodCigarReadFilter,
HasReadGroupReadFilter, IntervalOverlapReadFilter, LibraryReadFilter, MappedReadFilter,
MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter,
MappingQualityReadFilter, MatchingBasesAndQualsReadFilter, MateDifferentStrandReadFilter,
MateOnSameContigOrNoMappedMateReadFilter, MateUnmappedAndUnmappedReadFilter,
MetricsReadFilter, NonChimericOriginalAlignmentReadFilter,
NonZeroFragmentLengthReadFilter, NonZeroReferenceLengthAlignmentReadFilter,
NotDuplicateReadFilter, NotOpticalDuplicateReadFilter, NotSecondaryAlignmentReadFilter,
NotSupplementaryAlignmentReadFilter, OverclippedReadFilter, PairedReadFilter,
PassesVendorQualityCheckReadFilter, PlatformReadFilter, PlatformUnitReadFilter,
PrimaryLineReadFilter, ProperlyPairedReadFilter, ReadGroupBlackListReadFilter,
ReadGroupReadFilter, ReadLengthEqualsCigarLengthReadFilter, ReadLengthReadFilter,
ReadNameReadFilter, ReadStrandFilter, SampleReadFilter, SecondOfPairReadFilter,
SeqIsStoredReadFilter, SoftClippedReadFilter, ValidAlignmentEndReadFilter,
ValidAlignmentStartReadFilter, WellformedReadFilter}
--read-index,-read-index:String
Indices to use for the read inputs. If specified, an index must be provided for every read
input and in the same order as the read inputs. If this argument is not specified, the
path to the index for each input will be inferred automatically. This argument may be
specified 0 or more times. Default value: null.
--read-validation-stringency,-VS:ValidationStringency
Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default
stringency value SILENT can improve performance when processing a BAM file in which
variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default
value: SILENT. Possible values: {STRICT, LENIENT, SILENT}
--reference,-R:String Reference sequence Default value: null.
--seconds-between-progress-updates,-seconds-between-progress-updates:Double
Output traversal statistics every time this many seconds elapse Default value: 10.0.
--sequence-dictionary,-sequence-dictionary:String
Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a
.dict file. Default value: null.
--sites-only-vcf-output:Boolean
If true, don't emit genotype fields when writing vcf file output. Default value: false.
Possible values: {true, false}
--tmp-dir:GATKPathSpecifier Temp directory to use. Default value: null.
--true-positives-and-false-negatives,-tpfn:File
A vcf to write true positives and false negatives Default value: null.
--true-positives-and-false-positives,-tpfp:File
A vcf to write true positives and false positives Default value: null.
--use-jdk-deflater,-jdk-deflater:Boolean
Whether to use the JdkDeflater (as opposed to IntelDeflater) Default value: false.
Possible values: {true, false}
--use-jdk-inflater,-jdk-inflater:Boolean
Whether to use the JdkInflater (as opposed to IntelInflater) Default value: false.
Possible values: {true, false}
--verbosity,-verbosity:LogLevel
Control verbosity of logging. Default value: INFO. Possible values: {ERROR, WARNING,
INFO, DEBUG}
--version:Boolean display the version number for this tool Default value: false. Possible values: {true,
false}
Advanced Arguments:
--disable-tool-default-read-filters,-disable-tool-default-read-filters:Boolean
Disable all tool default read filters (WARNING: many tools will not function correctly
without their default read filters on) Default value: false. Possible values: {true,
false}
--showHidden,-showHidden:Boolean
display hidden arguments Default value: false. Possible values: {true, false}
***********************************************************************
A USER ERROR has occurred: Argument summary was missing: Argument 'summary' is required.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
What I am looking for: I have a sequencer whose consistency in producing results needs to be determined. For the same sample I have multiple VCF files (the sample was sequenced multiple times). I want to compare these VCF files against a truth VCF file and measure their concordance. I do not have a summary table. Can GATK solve my problem?
-
Hi nagam surya,
The summary file is an output file, you just need to give a path where the output should be written.
Best,
Genevieve
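
To make this concrete for the use case in the question (several VCFs from the same sample, one truth VCF), here is a minimal sketch that runs the tool once per evaluation VCF and gives each run its own `--summary` output path. The function name `run_concordance_batch` and the `<name>.summary.tsv` naming scheme are assumptions made for illustration; `-eval`, `--truth` and `--summary` are the arguments listed in the usage text above.

```shell
# Run GATK Concordance for several evaluation VCFs against one truth VCF.
# --summary is an OUTPUT path: the tool writes the table there, so the
# file does not have to exist beforehand.
# Usage: run_concordance_batch truth.vcf eval1.vcf eval2.vcf ...
run_concordance_batch() {
    local truth="$1"
    shift
    local eval_vcf summary
    for eval_vcf in "$@"; do
        # Hypothetical naming scheme: AC.vcf -> AC.summary.tsv
        summary="${eval_vcf%.vcf}.summary.tsv"
        gatk Concordance \
            -eval "$eval_vcf" \
            --truth "$truth" \
            --summary "$summary" || return 1
    done
}
```

Called as `run_concordance_batch truth.vcf run1.vcf run2.vcf`, this would leave one summary table per sequencing run, which can then be compared side by side.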
-
oh ok, got it. Thanks!
Also, is there a way to find the percentage of matching records, and to see which genes are matched and unmatched between the two VCFs?
-
Yes, you can get the percentage of matching records in the summary file. And you can optionally have the tool output a VCF with the variants annotated with their concordance status.
See the tool docs page for more info: https://gatk.broadinstitute.org/hc/en-us/community/posts/5982862303515-GATK-concordance
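
As a sketch of the optional VCF outputs mentioned here: the usage text earlier in this thread lists `-tpfp` (true positives and false positives) and `-tpfn` (true positives and false negatives) as optional output files. The wrapper name `run_concordance_annotated` and the output file names below are assumptions for illustration only.

```shell
# Write the summary table plus the two optional VCF outputs.
# -tpfp: a VCF of true positives and false positives
# -tpfn: a VCF of true positives and false negatives
# Usage: run_concordance_annotated eval.vcf truth.vcf output_prefix
run_concordance_annotated() {
    local eval_vcf="$1" truth="$2" prefix="$3"
    gatk Concordance \
        -eval "$eval_vcf" \
        --truth "$truth" \
        --summary "${prefix}.summary.tsv" \
        -tpfp "${prefix}.tpfp.vcf" \
        -tpfn "${prefix}.tpfn.vcf"
}
```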
-
You mean precision is the percentage of matching records right?
-
Yes, precision is the percentage of your VCF calls that match the calls in the truth VCF.
Also here's the correct tool docs link: https://gatk.broadinstitute.org/hc/en-us/articles/5358936704667-Concordance
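
For reference, the two ratios reported in the summary can be recomputed from the true positive (TP), false positive (FP) and false negative (FN) counts using the standard definitions. The helper name `concordance_metrics` is hypothetical, and the counts in the usage note are made-up numbers, not results from this thread.

```shell
# precision   = TP / (TP + FP)  -> share of eval calls that are in the truth set
# sensitivity = TP / (TP + FN)  -> share of truth variants that were called
# Usage: concordance_metrics TP FP FN
concordance_metrics() {
    awk -v tp="$1" -v fp="$2" -v fn="$3" 'BEGIN {
        # Guard against division by zero when there are no calls or no truth variants.
        if (tp + fp == 0 || tp + fn == 0) { print "undefined"; exit 1 }
        printf "precision=%.4f sensitivity=%.4f\n", tp / (tp + fp), tp / (tp + fn)
    }'
}
```

For example, `concordance_metrics 90 10 30` prints `precision=0.9000 sensitivity=0.7500`.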
-
(gatk) root@abaf494be0a0:/gatk/my_data# gatk Concordance -eval NA24149.snpeff.vcf --truth HG003_GRCh37_1_22_v4.2.1_benchmark.vcf --summary zebeo
Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.3.0-local.jar Concordance -eval NA24149.snpeff.vcf --truth HG003_GRCh37_1_22_v4.2.1_benchmark.vcf --summary zebeo
16:12:37.856 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 26, 2022 4:12:39 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:12:39.505 INFO Concordance - ------------------------------------------------------------
16:12:39.505 INFO Concordance - The Genome Analysis Toolkit (GATK) v4.1.3.0
16:12:39.505 INFO Concordance - For support and documentation go to https://software.broadinstitute.org/gatk/
16:12:39.506 INFO Concordance - Executing as root@abaf494be0a0 on Linux v5.10.104-linuxkit amd64
16:12:39.506 INFO Concordance - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
16:12:39.506 INFO Concordance - Start Date/Time: May 26, 2022 4:12:37 PM UTC
16:12:39.506 INFO Concordance - ------------------------------------------------------------
16:12:39.506 INFO Concordance - ------------------------------------------------------------
16:12:39.507 INFO Concordance - HTSJDK Version: 2.20.1
16:12:39.507 INFO Concordance - Picard Version: 2.20.5
16:12:39.507 INFO Concordance - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:12:39.507 INFO Concordance - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:12:39.507 INFO Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:12:39.507 INFO Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:12:39.508 INFO Concordance - Deflater: IntelDeflater
16:12:39.508 INFO Concordance - Inflater: IntelInflater
16:12:39.508 INFO Concordance - GCS max retries/reopens: 20
16:12:39.508 INFO Concordance - Requester pays: disabled
16:12:39.508 WARN Concordance -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: Concordance is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
16:12:39.508 INFO Concordance - Initializing engine
16:12:39.716 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/HG003_GRCh37_1_22_v4.2.1_benchmark.vcf
16:12:39.742 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/NA24149.snpeff.vcf
16:12:39.780 INFO Concordance - Done initializing engine
16:12:39.788 INFO ProgressMeter - Starting traversal
16:12:39.788 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute
16:12:39.800 INFO Concordance - Shutting down engine
[May 26, 2022 4:12:39 PM UTC] org.broadinstitute.hellbender.tools.walkers.validation.Concordance done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=307757056
java.lang.NullPointerException
at htsjdk.variant.variantcontext.VariantContextComparator.compare(VariantContextComparator.java:87)
at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker$ConcordanceIterator.next(AbstractConcordanceWalker.java:192)
at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker$ConcordanceIterator.next(AbstractConcordanceWalker.java:174)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker.traverse(AbstractConcordanceWalker.java:132)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
This is my error now. Can you please advise how to solve this?
-
nagam surya can you try this same command with a newer GATK version to check that this isn't a bug that has already been solved? The current GATK version is 4.2.6.1.
-
I tried with the latest version as well, and it gives me the same error. But I figured out the problem: the contigs in the VCF files do not match. The eval VCF was generated using hg38 as the reference and the truth VCF was generated using GRCh37.
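
A quick way to see this kind of mismatch is to list the contig names that actually appear in the records of each file and compare them by eye (for example `1` versus `chr1`). This is only a sketch; the function name `vcf_record_contigs` is an assumption, and it expects an uncompressed, tab-delimited VCF.

```shell
# Print each contig name (column 1) in the order it first appears in the records.
# Header lines (those starting with '#') are skipped.
# Usage: vcf_record_contigs file.vcf
vcf_record_contigs() {
    awk -F '\t' '!/^#/ && !seen[$1]++ { print $1 }' "$1"
}
```

Running it on the eval VCF and on the truth VCF gives two short lists that should use the same naming if the files share a reference.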
-
Now this is strange. The process started successfully but in the middle it gives me the null pointer exception. This time I am sure that I am using the correct truth vcf. What is the problem now?
Here's the stack trace:
(gatk) root@184678cd1c78:/gatk/my_data# gatk Concordance -eval NA24149_ILLM.vcf --truth Homo_sapiens_assembly38.dbsnp138.vcf --summary zebeo
Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.3.0-local.jar Concordance -eval NA24149_ILLM.vcf --truth Homo_sapiens_assembly38.dbsnp138.vcf --summary zebeo
15:35:28.041 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 31, 2022 3:35:29 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:35:29.691 INFO Concordance - ------------------------------------------------------------
15:35:29.692 INFO Concordance - The Genome Analysis Toolkit (GATK) v4.1.3.0
15:35:29.692 INFO Concordance - For support and documentation go to https://software.broadinstitute.org/gatk/
15:35:29.693 INFO Concordance - Executing as root@184678cd1c78 on Linux v5.10.104-linuxkit amd64
15:35:29.693 INFO Concordance - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
15:35:29.693 INFO Concordance - Start Date/Time: May 31, 2022 3:35:28 PM UTC
15:35:29.693 INFO Concordance - ------------------------------------------------------------
15:35:29.693 INFO Concordance - ------------------------------------------------------------
15:35:29.694 INFO Concordance - HTSJDK Version: 2.20.1
15:35:29.694 INFO Concordance - Picard Version: 2.20.5
15:35:29.694 INFO Concordance - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:35:29.694 INFO Concordance - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:35:29.694 INFO Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:35:29.695 INFO Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:35:29.695 INFO Concordance - Deflater: IntelDeflater
15:35:29.695 INFO Concordance - Inflater: IntelInflater
15:35:29.695 INFO Concordance - GCS max retries/reopens: 20
15:35:29.695 INFO Concordance - Requester pays: disabled
15:35:29.696 WARN Concordance -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: Concordance is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
15:35:29.696 INFO Concordance - Initializing engine
15:35:29.898 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/Homo_sapiens_assembly38.dbsnp138.vcf
15:35:29.965 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/NA24149_ILLM.vcf
15:35:29.993 INFO Concordance - Done initializing engine
15:35:29.998 INFO ProgressMeter - Starting traversal
15:35:29.998 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute
15:35:40.005 INFO ProgressMeter - chr1:86934407 0.2 1814000 10877473.5
15:35:50.010 INFO ProgressMeter - chr1:227747073 0.3 4196000 12580451.7
15:36:00.014 INFO ProgressMeter - chr2:86888818 0.5 6581000 13154984.0
15:36:10.016 INFO ProgressMeter - chr2:211139230 0.7 8981000 13465440.6
15:36:20.018 INFO ProgressMeter - chr3:75445233 0.8 11314000 13571914.1
15:36:30.018 INFO ProgressMeter - chr3:185751375 1.0 13527000 13522492.5
15:36:40.019 INFO ProgressMeter - chr4:93855504 1.2 15871000 13599634.4
15:36:50.021 INFO ProgressMeter - chr5:8124970 1.3 18083000 13558352.0
15:37:00.022 INFO ProgressMeter - chr5:122487383 1.5 20372000 13577712.6
15:37:10.023 INFO ProgressMeter - chr6:40657757 1.7 22592000 13551812.0
15:37:20.025 INFO ProgressMeter - chr6:151496431 1.8 24837000 13544130.1
15:37:30.029 INFO ProgressMeter - chr7:82096722 2.0 27065000 13529005.0
15:37:40.031 INFO ProgressMeter - chr8:25020766 2.2 29409000 13570044.3
15:37:50.031 INFO ProgressMeter - chr8:143342183 2.3 31819000 13633500.7
15:38:00.034 INFO ProgressMeter - chr9:128003522 2.5 34178000 13668010.8
15:38:10.037 INFO ProgressMeter - chr10:99934821 2.7 36558000 13705909.2
15:38:20.039 INFO ProgressMeter - chr11:78157402 2.8 38971000 13751154.1
15:38:30.039 INFO ProgressMeter - chr12:54976345 3.0 41367000 13785859.9
15:38:40.041 INFO ProgressMeter - chr13:56739339 3.2 43815000 13833257.9
15:38:50.042 INFO ProgressMeter - chr14:59366234 3.3 45897000 13766071.5
15:39:00.046 INFO ProgressMeter - chr15:63183969 3.5 47828000 13662020.1
15:39:10.050 INFO ProgressMeter - chr16:69498223 3.7 50009000 13635595.2
15:39:20.051 INFO ProgressMeter - chr17:80642111 3.8 52237000 13623904.1
15:39:30.056 INFO ProgressMeter - chr19:35068034 4.0 54720000 13676694.8
15:39:40.107 INFO ProgressMeter - chr21:14719022 4.2 56800000 13626059.0
15:39:50.109 INFO ProgressMeter - chrX:58400529 4.3 59166000 13647865.7
15:40:00.115 INFO ProgressMeter - chr2:105148418 4.5 61073000 13565899.2
15:40:10.118 INFO ProgressMeter - chr7:10098478 4.7 62228000 13328859.1
15:40:20.122 INFO ProgressMeter - chr14:22101409 4.8 63482000 13128593.3
15:40:27.864 INFO Concordance - Shutting down engine
[May 31, 2022 3:40:27 PM UTC] org.broadinstitute.hellbender.tools.walkers.validation.Concordance done. Elapsed time: 5.00 minutes.
Runtime.totalMemory()=339738624
java.lang.NullPointerException
at htsjdk.variant.variantcontext.VariantContextComparator.compare(VariantContextComparator.java:87)
at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker$ConcordanceIterator.next(AbstractConcordanceWalker.java:192)
at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker$ConcordanceIterator.next(AbstractConcordanceWalker.java:174)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker.traverse(AbstractConcordanceWalker.java:132)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
-
nagam surya thanks for the update. Could you try this again with a newer GATK version and see if the error message will give more information? Some of these java.lang.NullPointerException errors that are cryptic have been fixed since 4.1.3.0.
-
Same issue with the latest version as well.
-
Ok, thanks for the update. I will look into this.
-
Hey, how's it going? Just wanted to give you an update on the issue. The same error pops up whenever my eval file is large (around 2 or 3 GB). It works fine when my eval file is between 100 and 400 MB. Thought this information might help you.
-
Also, I watched the progress meter. It always crashes after processing roughly 65 million records.
-
Hi nagam surya,
Thank you for these updates. I think this is a potential contig mismatch between your eval and truth files. Could you post the VCF headers here so that we can verify that the contigs match between the eval file and the truth file?
Best,
Genevieve
-
I am not sure how to share the headers, so I am sharing the VCF files themselves, which can be found at this Drive link: https://drive.google.com/drive/folders/1gp9j0Ut1LlQkDvf0rnJcE12QJOiiGfRd?usp=sharing
-
Hi nagam surya,
These VCF files do not match: the headers show different contigs in the two files. For example, there is a contig chrUn_gl000214 in your eval, but it is identified as chrUn_GL000214v1 in the truth. It looks like these VCFs were created with two different reference versions. You'll need to get a truth VCF that was built on the same reference as your eval VCF to be able to compare the variants.
Best,
Genevieve
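
The same header comparison can be scripted. The sketch below pulls the contig IDs out of the `##contig=<ID=...>` header lines of each file and prints the ones declared in the first file but not in the second. The function names are hypothetical, and it assumes uncompressed VCFs whose headers carry `##contig` lines.

```shell
# Print the contig IDs declared in a VCF header, sorted.
# Usage: vcf_header_contigs file.vcf
vcf_header_contigs() {
    # Stop at the first record line so large files are not read in full.
    awk '/^##contig=/ { if (match($0, /ID=[^,>]+/)) print substr($0, RSTART + 3, RLENGTH - 3) }
         !/^#/ { exit }' "$1" | LC_ALL=C sort
}

# Print contig IDs declared in the first VCF but missing from the second.
# Usage: vcf_contigs_only_in_first eval.vcf truth.vcf
vcf_contigs_only_in_first() {
    local first second
    first="$(mktemp)" || return 1
    second="$(mktemp)" || return 1
    vcf_header_contigs "$1" > "$first"
    vcf_header_contigs "$2" > "$second"
    # comm -23 keeps lines that are unique to the first sorted input.
    LC_ALL=C comm -23 "$first" "$second"
    rm -f "$first" "$second"
}
```

An empty result in both directions (eval against truth, then truth against eval) would suggest the two headers declare the same contigs; any name printed is a candidate for the mismatch described above.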
-
Oh ok. Thanks for letting me know. As these files are very large, I am not able to open them in Excel and view the meta-information. I tried using R, but I can only view the first 100 lines. Could you please guide me on how to see the entire meta-information, so that I can see which reference versions were used for both files?
-
nagam surya I opened them in terminal using the command line and saw the beginning of the files with the less command.
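
For files too large for Excel or R, a small shell sketch along these lines prints only the header (the meta-information lines and the column-header line) and stops reading at the first variant record. The function name `vcf_header` is an assumption, and it expects an uncompressed VCF.

```shell
# Print only the header lines of a VCF (every leading line that starts with '#').
# Reading stops at the first record, so this stays fast on multi-GB files.
# Usage: vcf_header file.vcf
vcf_header() {
    awk '/^#/ { print; next } { exit }' "$1"
}
```

The output can be paged with `vcf_header NA24149_ILLM.vcf | less` or saved to a small text file for sharing.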
-
Hi Genevieve, I found out that my eval VCF was generated using hg19 as the reference. I had a look at the resource bundle maintained by GATK and found that an hg38 reference is available for download. Do you also have an hg19 reference VCF? If so, could you please share the link?
-
nagam surya I don't have any other links available than what is already in the resource bundle page.
-
Okay. Thanks