Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Evaluation of a VCF file produced by GATK

Answered

6 comments

  • Bhanu Gandham

    Hi,

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For context, check out our support policy.

  • Pamela Bretscher

    You can use ValidateVariants to validate your VCF file. 

  • Joshua Davies

    Is there any update on this? ValidateVariants does not seem very useful here: CNV calling does not use dbSNP, and the reference base is not meaningful either, since GATK now defines it as "N" in the CNV VCF output of GermlineCNVCaller, as per https://gatk.broadinstitute.org/hc/en-us/articles/13832655155099--Tool-Documentation-Index#GermlineCNVCaller.

  • Andrey Smirnov

    Joshua Davies, we do not have a standard tool for performing CNV validation. The CNV pipeline (https://github.com/broadinstitute/gatk/tree/master/scripts/cnv_wdl/germline) has a few QC steps that mark bad samples or batches based on things like an excessive number of events or a poor model fit.

    For internal validations, we usually take a callset with a matching set of samples and calculate concordance, which can be done with something like bedtools.

  • Joshua Davies

    Thanks for getting back to me, Andrey Smirnov. However, I was really referring to the validation of the VCF itself, i.e. whether the format would be valid for downstream clinical decision tools and the like.

  • Laura Gauthier

    We're still using VCF 4.2, so there's not a lot in the spec about SVs. Most tools are going to depend on INFO annotations of their own creation without a lot of standardization. If you run into specific issues with specific tools we can take a look, but in terms of whether annotations that downstream tools expect to see will be there, it's hard to say.

    The latest version of PostprocessGermlineCNVCalls _should_ apply the correct reference base to the ref allele if you pass it the reference with -R.

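For anyone wanting to try the concordance approach mentioned above without bedtools, here is a minimal Python sketch. It assumes both callsets have already been reduced to BED-like (contig, start, end) intervals, and it uses a 50% reciprocal-overlap criterion, roughly what `bedtools intersect -f 0.5 -r` applies. The threshold, the function names, and the example intervals are all illustrative assumptions, not something the GATK team has described as their method.

```python
from typing import List, Tuple

# (contig, start, end), 0-based half-open coordinates as in BED
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def concordance(calls: List[Interval], truth: List[Interval],
                min_overlap: float = 0.5) -> float:
    """Fraction of calls matched by at least one truth interval."""
    if not calls:
        return 0.0
    matched = sum(
        1 for c in calls
        if any(reciprocal_overlap(c, t) >= min_overlap for t in truth)
    )
    return matched / len(calls)


# Made-up intervals, for illustration only
calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 300)]
truth = [("chr1", 120, 210), ("chr2", 1000, 2000)]
print(concordance(calls, truth))
```

This is a quadratic all-against-all comparison, which is fine for a few thousand events per sample; for larger callsets an interval index or bedtools itself would be the more practical choice. It also ignores event type (deletion vs. duplication), which a real comparison would probably want to match on.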

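On the question of whether a downstream tool will find the annotations it expects, one pragmatic check is to compare the INFO and FORMAT IDs declared in the VCF header against whatever that tool documents as required, and to count how many records still carry "N" as the reference allele. The sketch below does this with plain text parsing. The example lines and the `expected_by_tool` set are hypothetical and would need to be replaced with your own file and the actual requirements of the tool in question.

```python
import re
from collections import Counter


def summarize_vcf(lines):
    """Collect header-declared INFO/FORMAT IDs and tally REF alleles of records."""
    declared = {"INFO": set(), "FORMAT": set()}
    ref_counts = Counter()
    malformed = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines such as ##INFO=<ID=END,...>
            m = re.match(r"##(INFO|FORMAT)=<ID=([^,>]+)", line)
            if m:
                declared[m.group(1)].add(m.group(2))
        elif line.startswith("#") or not line:
            # Column header line or empty line
            continue
        else:
            fields = line.split("\t")
            if len(fields) < 8:  # VCF records have 8 fixed columns
                malformed += 1
                continue
            ref_counts[fields[3]] += 1  # column 4 is REF
    return declared, ref_counts, malformed


# Made-up VCF content, for illustration only
example = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=END,Number=1,Type=Integer,Description="End coordinate">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t1000\tcnv_1\tN\t<DEL>\t.\t.\tEND=2000\tGT\t1",
]
declared, ref_counts, malformed = summarize_vcf(example)

expected_by_tool = {"END", "SVTYPE"}  # hypothetical downstream requirement
missing_info = sorted(expected_by_tool - declared["INFO"])
print("INFO IDs missing from header:", missing_info)
print("records with REF=N:", ref_counts["N"], "malformed records:", malformed)
```

To run it on a real file, pass an open file handle instead of the list (for a bgzipped VCF, `gzip.open(path, "rt")` works). This only checks what the header declares and what sits in the REF column; it does not replace ValidateVariants or a spec-level validator.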