
GATK Best Practices validation


4 comments

  • AM

    Hi Marcella,

    If you've modified your pipeline and validated it, then you're good to go: your pipeline is valid. For my part, I used the GATK Best Practices for short variant discovery in human WES. For performance, I replaced Picard with Sambamba for marking and removing PCR duplicates. The benchmark results against the GIAB dataset were the same, and I also determined the limit of detection of the pipeline I used. On the other hand, GATK proposes hard-filtering, which I did not use because some "good" variants could be removed along with the "bad" ones. I believe everyone has their own needs and goals and can apply these recommendations as they fit their experiment.

    If you could be more specific, and let us know which workflow you are looking to validate, that would be very helpful.

  • Marcella Toma

    Hello, thank you for your feedback.

    I have adhered strictly to the GATK documentation's suggested pipeline without making any modifications. I am following the Best Practices step by step. I am inquiring whether, by adhering to the presented Best Practices, I can deem my workflow as "validated."

    If so, does using data other than Illumina WGS or WES (the data types on which the GATK Best Practices are tested) render my workflow "invalidated"?

  • Laura Gauthier

    Just following the GATK Best Practices does not mean that your pipeline is validated. Typically, validation refers to a specific truth comparison that you can define yourself. For our clinical sequencing and analysis pipeline, we use the Genome in a Bottle truth data for NA12878 to show the sensitivity and specificity. If you change your pipeline for any reason, you should show that the results for whatever validation you used stay the same or improve.
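As a rough illustration of the "stay the same or improve" check described above, here is a minimal Python sketch. The function name `regressions`, the metric names, and the numbers in the usage example are hypothetical and not part of GATK or any Broad pipeline; the sketch also assumes every metric is one where higher is better.

```python
def regressions(baseline: dict, candidate: dict, tolerance: float = 0.0) -> dict:
    """Return the metrics on which the changed pipeline is worse than before.

    `baseline` and `candidate` map a metric name to its value for the old and
    the modified pipeline on the same truth comparison. Assumption: higher is
    better for every metric passed in (e.g. sensitivity, specificity).
    """
    worse = {}
    for name, old_value in baseline.items():
        new_value = candidate[name]
        # Flag the metric only if it dropped by more than the allowed tolerance.
        if new_value < old_value - tolerance:
            worse[name] = (old_value, new_value)
    return worse


# Hypothetical numbers, for illustration only.
before = {"sensitivity": 0.990, "specificity": 0.995}
after = {"sensitivity": 0.985, "specificity": 0.996}
print(regressions(before, after))
```

An empty result would mean the modified pipeline did at least as well as the original on every metric you chose to track.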

  • AM

    Hi Marcella,

    How do you define valid and invalid pipelines?

    I assume you are processing clinical data. As far as I know, there are no cutoffs or standards that judge a pipeline as valid or invalid for clinical testing. However, people strive to achieve the best possible results, and following the GATK Best Practices is a solid way to do so. In a clinical setting, it's important to report the false discovery rate, and you will have to accept that it will not be zero. Even if you apply GATK hard filters, you might lose some good variants.

    To make a long story short, here are the steps to validate your pipeline:

    1. Download one of the GIAB samples, e.g. "Garvan_NA12878_HG001_HiSeq_Exome".
    2. Run it through your own pipeline.
    3. Download the high-confidence variant VCF and the high-confidence BED file (in the case of exomes) from https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/
    4. Take your VCF file from step 2, and your BED file (in the case of exomes).
    5. Benchmark with RTG-Tools or a similar tool.
    6. Calculate your metrics, which include specificity, sensitivity, false positive rate, and false discovery rate.
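Steps 5 and 6 above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: all file names are placeholders, the `rtg vcfeval` command is built but not executed, the reference is assumed to have already been converted to RTG's SDF format (e.g. with `rtg format`), and the counts in the usage example are made up. Specificity and false positive rate need a true-negative count, which you have to define yourself (for example, confident non-variant positions), so they are only reported when one is supplied.

```python
from typing import Optional


def vcfeval_command(truth_vcf, calls_vcf, confident_bed, reference_sdf, out_dir):
    """Build (but do not run) an `rtg vcfeval` command line for step 5."""
    return [
        "rtg", "vcfeval",
        "--baseline", truth_vcf,                # GIAB high-confidence VCF (step 3)
        "--calls", calls_vcf,                   # your pipeline's VCF (step 2)
        "--evaluation-regions", confident_bed,  # GIAB high-confidence BED (step 3)
        "--template", reference_sdf,            # reference genome in SDF format
        "--output", out_dir,
    ]


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def benchmark_metrics(tp: int, fp: int, fn: int, tn: Optional[int] = None) -> dict:
    """Compute step 6 metrics from true/false positive and negative counts."""
    metrics = {
        "sensitivity": _ratio(tp, tp + fn),
        "precision": _ratio(tp, tp + fp),
        "false_discovery_rate": _ratio(fp, tp + fp),
    }
    if tn is not None:
        # Only meaningful once you have defined what counts as a true negative.
        metrics["specificity"] = _ratio(tn, tn + fp)
        metrics["false_positive_rate"] = _ratio(fp, tn + fp)
    return metrics


# Placeholder file names and made-up counts, for illustration only.
print(" ".join(vcfeval_command("truth.vcf.gz", "calls.vcf.gz",
                               "confident.bed", "ref_sdf", "vcfeval_out")))
print(benchmark_metrics(tp=90, fp=10, fn=10))
```

To actually run the comparison, the command list could be passed to `subprocess.run`, with the counts then taken from the tool's summary output.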

    Once you've completed these steps, your pipeline is considered validated, with clear limitations identified through your benchmarking process. Remember, in a clinical setting, variants derived from WES/WGS are subject to further confirmation steps like Sanger sequencing.

    For additional resources and information on GIAB data, refer to the GIAB data index: https://github.com/genome-in-a-bottle/giab_data_indexes?tab=readme-ov-file

    I hope this is clear and apologize for the late response!

