Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

How GenotypeConcordance calculates genotype concordance, sensitivity, and specificity

Answered

4 comments

  • Genevieve Brandt (she/her)

    Hi panhong liu,

    Our best explanation for why you would be seeing this is on the tool documentation page for GenotypeConcordance: https://gatk.broadinstitute.org/hc/en-us/articles/4418051360155-GenotypeConcordance-Picard-

    Please take a look and let me know if you have any questions about that page. The PDF you linked was not written by our team so I'm not familiar with the contents.

    It would also help if you shared examples of the results you find confusing.

    Best,

    Genevieve

  • panhong liu

    Thank you for your reply.

    I compared the genotypes called from a chip array and from whole-genome sequencing of the same sample using GATK GenotypeConcordance. This produced the *detail_metrics and *summary_metrics files. I want to know how GENOTYPE_CONCORDANCE in the *summary_metrics file is calculated from the information in the *detail_metrics file. Thanks a lot.

    The *detail_metrics file:

    VARIANT_TYPE    TRUTH_SAMPLE    CALL_SAMPLE    TRUTH_STATE    CALL_STATE    COUNT    CONTINGENCY_VALUES
    SNP    S1    S1    MISSING    HET_REF_VAR1    881    FP,TN
    SNP    S1    S1    MISSING    HOM_VAR1    91    FP
    SNP    S1    S1    HOM_REF    HET_REF_VAR1    4614    FP,TN
    SNP    S1    S1    HOM_REF    HOM_VAR1    83    FP
    SNP    S1    S1    HET_REF_VAR1    MISSING    11229    TN,FN
    SNP    S1    S1    HET_REF_VAR1    HOM_REF    3892    TN,FN
    SNP    S1    S1    HET_REF_VAR1    HET_REF_VAR1    123752    TP,TN
    SNP    S1    S1    HET_REF_VAR1    HOM_VAR1    1788    TP,FP
    SNP    S1    S1    HET_REF_VAR1    HET_REF_VAR2    10    FP,TN,FN
    SNP    S1    S1    HET_REF_VAR1    HOM_VAR2    3    FP,FN
    SNP    S1    S1    HET_VAR1_VAR2    MISSING    28    FN
    SNP    S1    S1    HET_VAR1_VAR2    HOM_REF    6    FN
    SNP    S1    S1    HET_VAR1_VAR2    HET_REF_VAR1    51    TP,FN
    SNP    S1    S1    HET_VAR1_VAR2    HOM_VAR1    3    TP,FN
    SNP    S1    S1    HOM_VAR1    MISSING    6311    FN
    SNP    S1    S1    HOM_VAR1    HOM_REF    316    FN
    SNP    S1    S1    HOM_VAR1    HET_REF_VAR1    1529    TP,FN
    SNP    S1    S1    HOM_VAR1    HOM_VAR1    101027    TP
    SNP    S1    S1    HOM_VAR1    HET_REF_VAR2    5    FP,FN
    SNP    S1    S1    HOM_VAR1    HOM_VAR2    7    FP,FN
    SNP    S1    S1    NO_CALL    HET_REF_VAR1    418    EMPTY
    SNP    S1    S1    NO_CALL    HOM_VAR1    409    EMPTY
    SNP    S1    S1    VC_FILTERED    MISSING    5806    EMPTY
    SNP    S1    S1    VC_FILTERED    HOM_REF    370    EMPTY
    SNP    S1    S1    VC_FILTERED    HET_REF_VAR1    533    EMPTY
    SNP    S1    S1    VC_FILTERED    HOM_VAR1    647    EMPTY
    SNP    S1    S1    IS_MIXED    HET_REF_VAR1    39    EMPTY
    SNP    S1    S1    IS_MIXED    HOM_VAR1    8    EMPTY


    The *summary_metrics file:

    VARIANT_TYPE    TRUTH_SAMPLE    CALL_SAMPLE    HET_SENSITIVITY    HET_PPV    HET_SPECIFICITY    HOMVAR_SENSITIVITY    HOMVAR_PPV    HOMVAR_SPECIFICITY    VAR_SENSITIVITY    VAR_PPV    VAR_SPECIFICITY    GENOTYPE_CONCORDANCE    NON_REF_GENOTYPE_CONCORDANCE
    SNP    S1    S1    0.891901    0.957888    ?    0.926231    0.981181    ?    0.907013    0.968247    0.950731    0.938495    0.939947
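
As a rough check, the two concordance values in the summary row can be reproduced from the detail rows. The sketch below assumes a rule inferred from these posted numbers rather than taken from the Picard source: pairs where either state is MISSING are skipped, a match means the truth and call states are identical, and the non-ref variant additionally skips pairs where neither state is a variant genotype. The names `DETAIL_ROWS`, `is_variant`, and `concordance` are hypothetical helpers, not part of GATK or Picard.

```python
# (TRUTH_STATE, CALL_STATE, COUNT) copied from the *detail_metrics table above.
DETAIL_ROWS = [
    ("MISSING", "HET_REF_VAR1", 881),
    ("MISSING", "HOM_VAR1", 91),
    ("HOM_REF", "HET_REF_VAR1", 4614),
    ("HOM_REF", "HOM_VAR1", 83),
    ("HET_REF_VAR1", "MISSING", 11229),
    ("HET_REF_VAR1", "HOM_REF", 3892),
    ("HET_REF_VAR1", "HET_REF_VAR1", 123752),
    ("HET_REF_VAR1", "HOM_VAR1", 1788),
    ("HET_REF_VAR1", "HET_REF_VAR2", 10),
    ("HET_REF_VAR1", "HOM_VAR2", 3),
    ("HET_VAR1_VAR2", "MISSING", 28),
    ("HET_VAR1_VAR2", "HOM_REF", 6),
    ("HET_VAR1_VAR2", "HET_REF_VAR1", 51),
    ("HET_VAR1_VAR2", "HOM_VAR1", 3),
    ("HOM_VAR1", "MISSING", 6311),
    ("HOM_VAR1", "HOM_REF", 316),
    ("HOM_VAR1", "HET_REF_VAR1", 1529),
    ("HOM_VAR1", "HOM_VAR1", 101027),
    ("HOM_VAR1", "HET_REF_VAR2", 5),
    ("HOM_VAR1", "HOM_VAR2", 7),
    ("NO_CALL", "HET_REF_VAR1", 418),
    ("NO_CALL", "HOM_VAR1", 409),
    ("VC_FILTERED", "MISSING", 5806),
    ("VC_FILTERED", "HOM_REF", 370),
    ("VC_FILTERED", "HET_REF_VAR1", 533),
    ("VC_FILTERED", "HOM_VAR1", 647),
    ("IS_MIXED", "HET_REF_VAR1", 39),
    ("IS_MIXED", "HOM_VAR1", 8),
]


def is_variant(state):
    # Assumption: only HET_* and HOM_VAR* states count as variant genotypes.
    return state.startswith(("HET_", "HOM_VAR"))


def concordance(rows, non_ref_only=False):
    """Matching counts divided by compared counts (rule inferred from the data)."""
    matched = 0
    compared = 0
    for truth, call, count in rows:
        if "MISSING" in (truth, call):
            continue  # assumed: sites missing from either file are not compared
        if non_ref_only and not (is_variant(truth) or is_variant(call)):
            continue  # assumed: non-ref concordance needs a variant on one side
        compared += count
        if truth == call:
            matched += count
    return matched / compared


genotype_concordance = concordance(DETAIL_ROWS)
non_ref_genotype_concordance = concordance(DETAIL_ROWS, non_ref_only=True)
print(f"GENOTYPE_CONCORDANCE         {genotype_concordance:.6f}")
print(f"NON_REF_GENOTYPE_CONCORDANCE {non_ref_genotype_concordance:.6f}")
```

With these rows, the numerator is 123752 + 101027 = 224779 in both cases. The denominators are 239510 and 239140, which differ only by the 370 VC_FILTERED/HOM_REF sites. Both results agree with the summary row to six decimals, but the rule may behave differently on state pairs that do not occur in this table.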

  • Genevieve Brandt (she/her)

    Hi panhong liu,

    Have you seen these tables on the Picard site? We are working on moving this information over to the GATK site, but this is what we have for now:

    In the code, the calculations happen in the summary metrics, whereas the detail metrics are just counts. Check out these files for more details about the calculations; there are helpful details in the comments too:

    Let me know if there is a specific metric you want more details on.
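
To illustrate how the counts turn into summary values, here is a minimal sketch that rebuilds the VAR_* columns from the detail rows posted earlier in this thread. It assumes that each row's COUNT is added to every label listed in its CONTINGENCY_VALUES and that the usual definitions apply (sensitivity = TP / (TP + FN), PPV = TP / (TP + FP), specificity = TN / (TN + FP)). That reading is inferred from the posted numbers, not confirmed against the Picard source, and `CONTINGENCY_ROWS` is a hypothetical name. Rows marked EMPTY are left out because they contribute to no label.

```python
from collections import Counter

# (COUNT, CONTINGENCY_VALUES) for the non-EMPTY rows of the posted *detail_metrics table.
CONTINGENCY_ROWS = [
    (881, "FP,TN"),
    (91, "FP"),
    (4614, "FP,TN"),
    (83, "FP"),
    (11229, "TN,FN"),
    (3892, "TN,FN"),
    (123752, "TP,TN"),
    (1788, "TP,FP"),
    (10, "FP,TN,FN"),
    (3, "FP,FN"),
    (28, "FN"),
    (6, "FN"),
    (51, "TP,FN"),
    (3, "TP,FN"),
    (6311, "FN"),
    (316, "FN"),
    (1529, "TP,FN"),
    (101027, "TP"),
    (5, "FP,FN"),
    (7, "FP,FN"),
]

# Assumption: a row counts once toward each label in its contingency list.
totals = Counter()
for count, labels in CONTINGENCY_ROWS:
    for label in labels.split(","):
        totals[label] += count

var_sensitivity = totals["TP"] / (totals["TP"] + totals["FN"])
var_ppv = totals["TP"] / (totals["TP"] + totals["FP"])
var_specificity = totals["TN"] / (totals["TN"] + totals["FP"])

print(dict(totals))
print(f"VAR_SENSITIVITY {var_sensitivity:.6f}")
print(f"VAR_PPV         {var_ppv:.6f}")
print(f"VAR_SPECIFICITY {var_specificity:.6f}")
```

For the posted table this gives TP = 228150, FP = 7482, TN = 144378, and FN = 23390, and the three ratios agree with VAR_SENSITIVITY, VAR_PPV, and VAR_SPECIFICITY in the summary row to six decimals. The HET_* and HOMVAR_* columns are not covered by this sketch.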

    Best,

    Genevieve

  • Dahn-young Dong

    I have two VCF files of the same sample that include only variants. Why do GENOTYPE_CONCORDANCE and NON_REF_GENOTYPE_CONCORDANCE differ so drastically? The genotype concordance is 77% and the non-ref genotype concordance is 22%. I would have expected them to be the same, as there are no ref-ref matches.

    https://github.com/broadinstitute/picard/issues/1943
