Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Variant Quality Score Recalibration (VQSR)

4 comments

  • Begonia_pavonina

    The link in this sentence is broken:

    "The human genome training, truth and known resource datasets that are used in our Best Practices workflow applied to human data are all available from our Resource Bundle."

    A link to the tools for building truth and training resource datasets would also be welcome.

  • Jacob Shujui Hsu

    I can confirm that the VQSR links are still missing (May 2021).

    I also found a discrepancy in the VQSR parameters for the omni dataset.

    Some earlier GATK3 posts give this setting for the omni dataset:

    --resource:omni,known=false,training=true,truth=true,prior=12.0

    Here is the parameter I found in this article:

    --resource:omni,known=false,training=true,truth=false,prior=12.0

     

    Q1: Why are they different? I cannot find any post discussing this.

    Q2: Given this discrepancy, official parameter recommendations are needed more than ever. I cannot find any parameter recommendations for INDEL at all.

  • Jacob Wang

    @Begonia_pavonina, @Jacob Shujui Hsu

    For the Resource Bundle, I think you can use the following link instead.  

    https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle

  • Jacob Wang

    @Jacob Shujui Hsu

    I was also puzzled about whether truth should be set to true or false. The article linked below explains it well.

    In brief, it depends on how conservative you want to be about what counts as a "true variant". As the article discusses, this dataset comes not from NGS data but from Illumina's Omni genotyping microarray (2.5M SNPs), so in most cases its SNPs can be regarded as true SNPs.

    URL: https://zhuanlan.zhihu.com/p/40823886
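
For readers comparing the two omni settings quoted in this thread, here is a minimal Python sketch that splits each `--resource:` argument into its tags and reports which ones differ. It only handles the argument strings exactly as quoted above and does not call GATK. The helper names `parse_resource` and `diff_tags` are hypothetical and are not part of any GATK tool.

```python
from typing import Dict, Tuple


def parse_resource(arg: str) -> Tuple[str, Dict[str, str]]:
    """Split '--resource:name,key=value,...' into (name, {key: value})."""
    prefix = "--resource:"
    if not arg.startswith(prefix):
        raise ValueError(f"not a resource argument: {arg!r}")
    # The first comma-separated field is the resource name; the rest are tags.
    name, *pairs = arg[len(prefix):].split(",")
    tags = dict(pair.split("=", 1) for pair in pairs)
    return name, tags


def diff_tags(a: str, b: str) -> Dict[str, Tuple[str, str]]:
    """Return the tags whose values differ between two resource arguments."""
    name_a, tags_a = parse_resource(a)
    name_b, tags_b = parse_resource(b)
    if name_a != name_b:
        raise ValueError("arguments describe different resources")
    keys = sorted(set(tags_a) | set(tags_b))
    return {
        k: (tags_a.get(k, ""), tags_b.get(k, ""))
        for k in keys
        if tags_a.get(k) != tags_b.get(k)
    }


# The two settings quoted in the comments above.
gatk3_style = "--resource:omni,known=false,training=true,truth=true,prior=12.0"
article_style = "--resource:omni,known=false,training=true,truth=false,prior=12.0"

print(diff_tags(gatk3_style, article_style))
# prints {'truth': ('true', 'false')}
```

The two settings differ only in the `truth` tag, which is the choice discussed in the last comment.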
