Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK concordance


  • Genevieve Brandt (she/her)

    Hi nagam surya,

    The summary file is an output file, you just need to give a path where the output should be written.

    Best,

    Genevieve

  • nagam surya

    Oh OK, got it. Thanks!
    Also, is there a way to find the percentage of matching records and to see which genes are matched and unmatched between the two VCFs?

  • Genevieve Brandt (she/her)

    Yes, you can get the percentage of matching records from the summary file. You can also optionally have the tool output a VCF with the variants annotated with their concordance status.

    See the tool docs page for more info: https://gatk.broadinstitute.org/hc/en-us/community/posts/5982862303515-GATK-concordance
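    A minimal POSIX-shell sketch for turning the summary table into a per-type precision percentage. It assumes the summary file is tab-separated with a header row that includes columns named `type`, `true-positive` and `false-positive`; those column names are an assumption, so check the first line of your own summary file before relying on it. `precision_pct` is a hypothetical helper, not part of GATK.

```shell
# Hypothetical helper: print "type<TAB>precision%" for each row of a
# Concordance summary table.
# ASSUMPTION: tab-separated, header row containing the columns
# "type", "true-positive" and "false-positive".
precision_pct() {
  awk '
    BEGIN { FS = "\t" }
    NR == 1 {
      # Map column name to column index so the column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("type" in col) || !("true-positive" in col) || !("false-positive" in col)) {
        print "summary header lacks type/true-positive/false-positive columns" | "cat 1>&2"
        exit 1
      }
      next
    }
    {
      tp = $(col["true-positive"])
      fp = $(col["false-positive"])
      # Precision = TP / (TP + FP); print NA when a type has no eval calls.
      if (tp + fp > 0) printf "%s\t%.2f\n", $(col["type"]), 100 * tp / (tp + fp)
      else printf "%s\tNA\n", $(col["type"])
    }
  ' "$1"
}
```

    For example, `precision_pct zebeo` would print one line per variant type from the summary written by the commands in this thread. Looking columns up by name keeps the sketch working if the column order ever differs from what you expect.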

  • nagam surya

    So precision is the percentage of matching records, right?

  • Genevieve Brandt (she/her)

    Yes, precision is the percentage of calls in your VCF that match calls in the truth VCF.

    Also here's the correct tool docs link: https://gatk.broadinstitute.org/hc/en-us/articles/5358936704667-Concordance
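    Written out in terms of true positives (TP), false positives (FP) and false negatives (FN), precision is the share of eval calls confirmed by the truth VCF, and sensitivity (recall) is the share of truth variants that the eval VCF recovers; multiply by 100 to get a percentage. The numbers in the example line are made up for illustration:

```latex
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{sensitivity} = \frac{TP}{TP + FN}

\text{e.g. } TP = 90,\ FP = 10 \;\Rightarrow\; \mathrm{precision} = \frac{90}{90 + 10} = 0.90 = 90\%
```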

  • nagam surya

    (gatk) root@abaf494be0a0:/gatk/my_data# gatk Concordance -eval NA24149.snpeff.vcf --truth HG003_GRCh37_1_22_v4.2.1_benchmark.vcf --summary zebeo
    Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.3.0-local.jar Concordance -eval NA24149.snpeff.vcf --truth HG003_GRCh37_1_22_v4.2.1_benchmark.vcf --summary zebeo
    16:12:37.856 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    May 26, 2022 4:12:39 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:12:39.505 INFO  Concordance - ------------------------------------------------------------
    16:12:39.505 INFO  Concordance - The Genome Analysis Toolkit (GATK) v4.1.3.0
    16:12:39.505 INFO  Concordance - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:12:39.506 INFO  Concordance - Executing as root@abaf494be0a0 on Linux v5.10.104-linuxkit amd64
    16:12:39.506 INFO  Concordance - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
    16:12:39.506 INFO  Concordance - Start Date/Time: May 26, 2022 4:12:37 PM UTC
    16:12:39.506 INFO  Concordance - ------------------------------------------------------------
    16:12:39.506 INFO  Concordance - ------------------------------------------------------------
    16:12:39.507 INFO  Concordance - HTSJDK Version: 2.20.1
    16:12:39.507 INFO  Concordance - Picard Version: 2.20.5
    16:12:39.507 INFO  Concordance - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:12:39.507 INFO  Concordance - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:12:39.507 INFO  Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:12:39.507 INFO  Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:12:39.508 INFO  Concordance - Deflater: IntelDeflater
    16:12:39.508 INFO  Concordance - Inflater: IntelInflater
    16:12:39.508 INFO  Concordance - GCS max retries/reopens: 20
    16:12:39.508 INFO  Concordance - Requester pays: disabled
    16:12:39.508 WARN  Concordance -

       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

       Warning: Concordance is a BETA tool and is not yet ready for use in production

       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    16:12:39.508 INFO  Concordance - Initializing engine
    16:12:39.716 INFO  FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/HG003_GRCh37_1_22_v4.2.1_benchmark.vcf
    16:12:39.742 INFO  FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/NA24149.snpeff.vcf
    16:12:39.780 INFO  Concordance - Done initializing engine
    16:12:39.788 INFO  ProgressMeter - Starting traversal
    16:12:39.788 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Records Processed   Records/Minute
    16:12:39.800 INFO  Concordance - Shutting down engine
    [May 26, 2022 4:12:39 PM UTC] org.broadinstitute.hellbender.tools.walkers.validation.Concordance done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=307757056
    java.lang.NullPointerException
    at htsjdk.variant.variantcontext.VariantContextComparator.compare(VariantContextComparator.java:87)
    at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker$ConcordanceIterator.next(AbstractConcordanceWalker.java:192)
    at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker$ConcordanceIterator.next(AbstractConcordanceWalker.java:174)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker.traverse(AbstractConcordanceWalker.java:132)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)

    This is the error I get now. Could you please advise how to solve it?

  • Genevieve Brandt (she/her)

    nagam surya, can you try the same command with a newer GATK version to check that this isn't a bug that has already been fixed? The current GATK version is 4.2.6.1.

  • nagam surya

    I tried with the latest version as well, and it gives me the same error. But I think I figured out the problem: the contigs in the two VCF files do not match. The eval VCF was generated using hg38 as the reference, while the truth VCF was generated using GRCh37.
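    A minimal POSIX-shell sketch for checking this kind of mismatch before running Concordance: it lists the contig IDs declared in each VCF header (the `##contig=<ID=...>` lines) and prints the ones that appear in only one file. `vcf_contigs` and `compare_contigs` are hypothetical helpers, and the sketch assumes `ID` is the first field of each `##contig` line, which is the usual layout. If many contigs show up on only one side (for example `chrUn_gl000214` versus `chrUn_GL000214v1`), that is a strong hint the two files were called against different references.

```shell
# Hypothetical helper: print the sorted, unique contig IDs declared in a
# VCF header. Reads .vcf or .vcf.gz and stops at the #CHROM line, so the
# (large) data section is never scanned.
vcf_contigs() {
  if [ "${1%.gz}" != "$1" ]; then gzip -dc "$1"; else cat "$1"; fi |
    awk '
      # Header lines look like: ##contig=<ID=chr1,length=248956422>
      /^##contig=<ID=/ {
        id = $0
        sub(/^##contig=<ID=/, "", id)   # drop the prefix
        sub(/[,>].*$/, "", id)          # drop everything after the ID
        print id
        next
      }
      /^#CHROM/ { exit }
    ' |
    LC_ALL=C sort -u
}

# Hypothetical helper: report contigs declared in only one of two VCFs.
compare_contigs() {
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
  vcf_contigs "$1" > "$tmp_a"
  vcf_contigs "$2" > "$tmp_b"
  # comm needs both inputs sorted with the same collation (LC_ALL=C).
  echo "only in $1:"
  LC_ALL=C comm -23 "$tmp_a" "$tmp_b"
  echo "only in $2:"
  LC_ALL=C comm -13 "$tmp_a" "$tmp_b"
  rm -f "$tmp_a" "$tmp_b"
}
```

    Usage would be along the lines of `compare_contigs NA24149.snpeff.vcf HG003_GRCh37_1_22_v4.2.1_benchmark.vcf`; an empty list under both headings means the two headers declare the same contig names.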

  • nagam surya

    Now this is strange. The process starts successfully but fails partway through with the NullPointerException. This time I am sure I am using the correct truth VCF. What is the problem now?

    Here's the stack trace:

    (gatk) root@184678cd1c78:/gatk/my_data# gatk Concordance -eval NA24149_ILLM.vcf --truth Homo_sapiens_assembly38.dbsnp138.vcf --summary zebeo
    Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.3.0-local.jar Concordance -eval NA24149_ILLM.vcf --truth Homo_sapiens_assembly38.dbsnp138.vcf --summary zebeo
    15:35:28.041 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    May 31, 2022 3:35:29 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    15:35:29.691 INFO  Concordance - ------------------------------------------------------------
    15:35:29.692 INFO  Concordance - The Genome Analysis Toolkit (GATK) v4.1.3.0
    15:35:29.692 INFO  Concordance - For support and documentation go to https://software.broadinstitute.org/gatk/
    15:35:29.693 INFO  Concordance - Executing as root@184678cd1c78 on Linux v5.10.104-linuxkit amd64
    15:35:29.693 INFO  Concordance - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
    15:35:29.693 INFO  Concordance - Start Date/Time: May 31, 2022 3:35:28 PM UTC
    15:35:29.693 INFO  Concordance - ------------------------------------------------------------
    15:35:29.693 INFO  Concordance - ------------------------------------------------------------
    15:35:29.694 INFO  Concordance - HTSJDK Version: 2.20.1
    15:35:29.694 INFO  Concordance - Picard Version: 2.20.5
    15:35:29.694 INFO  Concordance - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    15:35:29.694 INFO  Concordance - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    15:35:29.694 INFO  Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    15:35:29.695 INFO  Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    15:35:29.695 INFO  Concordance - Deflater: IntelDeflater
    15:35:29.695 INFO  Concordance - Inflater: IntelInflater
    15:35:29.695 INFO  Concordance - GCS max retries/reopens: 20
    15:35:29.695 INFO  Concordance - Requester pays: disabled
    15:35:29.696 WARN  Concordance -

       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

       Warning: Concordance is a BETA tool and is not yet ready for use in production

       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    15:35:29.696 INFO  Concordance - Initializing engine
    15:35:29.898 INFO  FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/Homo_sapiens_assembly38.dbsnp138.vcf
    15:35:29.965 INFO  FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/NA24149_ILLM.vcf
    15:35:29.993 INFO  Concordance - Done initializing engine
    15:35:29.998 INFO  ProgressMeter - Starting traversal
    15:35:29.998 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Records Processed   Records/Minute
    15:35:40.005 INFO  ProgressMeter -        chr1:86934407              0.2               1814000       10877473.5
    15:35:50.010 INFO  ProgressMeter -       chr1:227747073              0.3               4196000       12580451.7
    15:36:00.014 INFO  ProgressMeter -        chr2:86888818              0.5               6581000       13154984.0
    15:36:10.016 INFO  ProgressMeter -       chr2:211139230              0.7               8981000       13465440.6
    15:36:20.018 INFO  ProgressMeter -        chr3:75445233              0.8              11314000       13571914.1
    15:36:30.018 INFO  ProgressMeter -       chr3:185751375              1.0              13527000       13522492.5
    15:36:40.019 INFO  ProgressMeter -        chr4:93855504              1.2              15871000       13599634.4
    15:36:50.021 INFO  ProgressMeter -         chr5:8124970              1.3              18083000       13558352.0
    15:37:00.022 INFO  ProgressMeter -       chr5:122487383              1.5              20372000       13577712.6
    15:37:10.023 INFO  ProgressMeter -        chr6:40657757              1.7              22592000       13551812.0
    15:37:20.025 INFO  ProgressMeter -       chr6:151496431              1.8              24837000       13544130.1
    15:37:30.029 INFO  ProgressMeter -        chr7:82096722              2.0              27065000       13529005.0
    15:37:40.031 INFO  ProgressMeter -        chr8:25020766              2.2              29409000       13570044.3
    15:37:50.031 INFO  ProgressMeter -       chr8:143342183              2.3              31819000       13633500.7
    15:38:00.034 INFO  ProgressMeter -       chr9:128003522              2.5              34178000       13668010.8
    15:38:10.037 INFO  ProgressMeter -       chr10:99934821              2.7              36558000       13705909.2
    15:38:20.039 INFO  ProgressMeter -       chr11:78157402              2.8              38971000       13751154.1
    15:38:30.039 INFO  ProgressMeter -       chr12:54976345              3.0              41367000       13785859.9
    15:38:40.041 INFO  ProgressMeter -       chr13:56739339              3.2              43815000       13833257.9
    15:38:50.042 INFO  ProgressMeter -       chr14:59366234              3.3              45897000       13766071.5
    15:39:00.046 INFO  ProgressMeter -       chr15:63183969              3.5              47828000       13662020.1
    15:39:10.050 INFO  ProgressMeter -       chr16:69498223              3.7              50009000       13635595.2
    15:39:20.051 INFO  ProgressMeter -       chr17:80642111              3.8              52237000       13623904.1
    15:39:30.056 INFO  ProgressMeter -       chr19:35068034              4.0              54720000       13676694.8
    15:39:40.107 INFO  ProgressMeter -       chr21:14719022              4.2              56800000       13626059.0
    15:39:50.109 INFO  ProgressMeter -        chrX:58400529              4.3              59166000       13647865.7
    15:40:00.115 INFO  ProgressMeter -       chr2:105148418              4.5              61073000       13565899.2
    15:40:10.118 INFO  ProgressMeter -        chr7:10098478              4.7              62228000       13328859.1
    15:40:20.122 INFO  ProgressMeter -       chr14:22101409              4.8              63482000       13128593.3
    15:40:27.864 INFO  Concordance - Shutting down engine
    [May 31, 2022 3:40:27 PM UTC] org.broadinstitute.hellbender.tools.walkers.validation.Concordance done. Elapsed time: 5.00 minutes.
    Runtime.totalMemory()=339738624
    java.lang.NullPointerException
    at htsjdk.variant.variantcontext.VariantContextComparator.compare(VariantContextComparator.java:87)
    at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker$ConcordanceIterator.next(AbstractConcordanceWalker.java:192)
    at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker$ConcordanceIterator.next(AbstractConcordanceWalker.java:174)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at org.broadinstitute.hellbender.engine.AbstractConcordanceWalker.traverse(AbstractConcordanceWalker.java:132)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)

  • Genevieve Brandt (she/her)

    nagam surya, thanks for the update. Could you try this again with a newer GATK version and see whether the error message gives more information? Some cryptic java.lang.NullPointerException errors have been fixed since 4.1.3.0.

  • nagam surya

    Same issue with the latest version as well.

  • Genevieve Brandt (she/her)

    Ok, thanks for the update. I will look into this.

  • nagam surya

    Hey, how's it going? Just wanted to update you on the issue: the same error pops up whenever my eval file is large (2 or 3 GB). It works fine when my eval file is between 100 and 400 MB. I thought this information might help.

  • nagam surya

    I also watched the progress meter: it always crashes after processing roughly 65 million records.

  • Genevieve Brandt (she/her)

    Hi nagam surya,

    Thank you for these updates. I suspect a contig mismatch between your eval and truth files. Could you post the VCF headers here so that we can verify that the contigs match between the eval file and the truth file?

    Best,

    Genevieve

  • nagam surya

    I'm not sure how to share the headers, so I am sharing the VCF files themselves (see the Drive link): https://drive.google.com/drive/folders/1gp9j0Ut1LlQkDvf0rnJcE12QJOiiGfRd?usp=sharing

  • Genevieve Brandt (she/her)

    Hi nagam surya,

    These VCF files do not match: the headers show different contigs in the two files. For example, there is a contig chrUn_gl000214 in your eval, but it is identified as chrUn_GL000214v1 in the truth. It looks like these VCFs were created with two different reference versions. You'll need a truth VCF based on the same reference as your eval VCF to be able to compare the variants.

    Best,

    Genevieve
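    One quick way to tell which build a VCF was called against is the length declared for chromosome 1 in its header: chr1 is 249,250,621 bp in GRCh37/hg19 and 248,956,422 bp in GRCh38/hg38. The sketch below assumes the header has a `##contig` line for `chr1` (or `1`) with `ID` first and a `length=` field; `guess_build` is a hypothetical helper, not a GATK tool. Contig naming is a second hint, as with `chrUn_gl000214` versus `chrUn_GL000214v1` above.

```shell
# Hypothetical helper: guess the reference build of a VCF from the chr1
# length in its ##contig header lines.
# ASSUMPTION: a line like ##contig=<ID=chr1,length=...> or
# ##contig=<ID=1,length=...> is present in the header.
guess_build() {
  len=$(
    if [ "${1%.gz}" != "$1" ]; then gzip -dc "$1"; else cat "$1"; fi |
      awk '
        /^##contig=<ID=(chr)?1,/ {
          if (match($0, /length=[0-9]+/)) {
            # Skip the 7 characters of "length=" and keep the digits.
            print substr($0, RSTART + 7, RLENGTH - 7)
            exit
          }
        }
        /^#CHROM/ { exit }
      '
  )
  case "$len" in
    249250621) echo "GRCh37/hg19 (chr1 length $len)" ;;
    248956422) echo "GRCh38/hg38 (chr1 length $len)" ;;
    "")        echo "unknown (no chr1 contig length in header)" ;;
    *)         echo "unknown (chr1 length $len)" ;;
  esac
}
```

    Running it on both files, for example `guess_build NA24149_ILLM.vcf` and `guess_build Homo_sapiens_assembly38.dbsnp138.vcf`, should make a build mismatch obvious without opening either file in full.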

  • nagam surya

    Oh OK, thanks for letting me know. These files are too large to open in Excel to view the meta-information, and in R I can only view the first 100 lines. Could you please tell me how to view the entire meta-information so that I can check which reference versions were used for each file?

  • Genevieve Brandt (she/her)

    nagam surya, I opened them in a terminal and looked at the beginning of each file with the less command.
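    Since the VCF header sits at the top of the file, you never need to load the whole file to read it. A minimal sketch, using a hypothetical `vcf_header` helper in POSIX shell: it prints the `##` meta-information lines and the `#CHROM` line, then stops at the first data record, so it stays fast on multi-gigabyte files, and it also reads `.vcf.gz`.

```shell
# Hypothetical helper: print only the header of a VCF (.vcf or .vcf.gz).
# Header lines start with "#"; awk exits at the first data line, so the
# rest of the file is never read.
vcf_header() {
  if [ "${1%.gz}" != "$1" ]; then gzip -dc "$1"; else cat "$1"; fi |
    awk '/^#/ { print; next } { exit }'
}
```

    For example, `vcf_header NA24149_ILLM.vcf | less` pages through the header, and `vcf_header NA24149_ILLM.vcf | grep '^##contig'` lists only the contig lines. If `bcftools` is installed, `bcftools view -h` prints a VCF header as well.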

  • nagam surya

    Hi Genevieve, I found out that my eval VCF was created using hg19 as the reference. I looked at the resource bundle maintained by GATK and found that hg38 resources are available for download. Do you also have an hg19 reference VCF? If so, could you please share the link?

  • Genevieve Brandt (she/her)

    nagam surya, I don't have any links beyond what is already on the resource bundle page.

  • nagam surya

    Okay. Thanks
