Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

In the seminar, you recommended evaluating sensitivity and PPV against truth data. What truth data do you recommend using for that purpose?

Answered

2 comments

  • Mark Fleharty

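    We typically use pooled HapMap samples. We pool them in equal fractions of 5, 10, and 20 samples. This gives us a distribution of truth variants with allele fractions as low as 2.5% (in the 20-sample pool, a heterozygous variant private to one sample sits on 1 of 40 chromosome copies).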

     

  • Mark Fleharty

    I would also encourage you to take a look at:

    https://www.biorxiv.org/content/10.1101/825042v1.full

    This describes LinSeq, which is one of the methods we use to construct somatic truth data.

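
The arithmetic behind the pooled-sample answer, and the two metrics named in the question, can be sketched in a few lines of Python. This is only an illustration and not part of GATK or of the workflow described above. It assumes diploid samples pooled at exactly equal fractions and a heterozygous variant that is private to a single sample. The function names and the example counts are hypothetical.

```python
from fractions import Fraction


def min_allele_fraction(n_samples, ploidy=2):
    """Expected allele fraction of a heterozygous variant private to one
    sample in an equal-fraction pool (assumption: ideal, unbiased pooling)."""
    # One variant allele out of ploidy * n_samples chromosome copies.
    return Fraction(1, ploidy * n_samples)


def sensitivity(tp, fn):
    """Fraction of truth variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Fraction of calls that are in the truth set: TP / (TP + FP)."""
    return tp / (tp + fp)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"{n}-sample pool: {float(min_allele_fraction(n)):.3f}")

    # Hypothetical counts from comparing a call set against a truth set.
    tp, fp, fn = 90, 30, 10
    print(f"sensitivity = {sensitivity(tp, fn):.2f}")
    print(f"PPV = {ppv(tp, fp):.2f}")
```

Under these assumptions the 5-, 10-, and 20-sample pools give lowest expected allele fractions of 10%, 5%, and 2.5%, which matches the 2.5% figure quoted for the 20-sample pool. Real pools will deviate from these values because of pipetting and sequencing noise.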
