Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Variant Quality Score Recalibration (VQSR)

7 comments

  • Begonia_pavonina

    The Resource Bundle link in this sentence is broken:

    "The human genome training, truth and known resource datasets that are used in our Best Practices workflow applied to human data are all available from our Resource Bundle."

    A link to the tools for building truth and training resource datasets would also be welcome.

  • Jacob Shujui Hsu

    I can confirm that the links for VQSR are still missing (as of May 2021).

    I also found a discrepancy in the VQSR parameters recommended for the omni dataset.

    Some earlier GATK3 posts give this setting for the omni dataset:

    --resource:omni,known=false,training=true,truth=true,prior=12.0

    And here is the setting I found in this post:

    --resource:omni,known=false,training=true,truth=false,prior=12.0

    Q1: Why are they different? I cannot find any post discussing this issue.

    Q2: Given this discrepancy, clear parameter recommendations are needed more than ever. I cannot even find the recommended parameters for INDELs.
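
To pin down exactly where the two settings quoted above differ, here is a minimal Python sketch that parses both `--resource` strings and reports the tags whose values disagree. It is not part of GATK; the helper names `parse_resource` and `diff_resources` are hypothetical and exist only for this illustration.

```python
from __future__ import annotations


def parse_resource(arg: str) -> tuple[str, dict[str, str]]:
    """Split '--resource:name,key=value,...' into (name, {key: value})."""
    prefix = "--resource:"
    if not arg.startswith(prefix):
        raise ValueError(f"not a resource argument: {arg!r}")
    name, *pairs = arg[len(prefix):].split(",")
    return name, dict(pair.split("=", 1) for pair in pairs)


def diff_resources(a: str, b: str) -> dict[str, tuple[str | None, str | None]]:
    """Return {tag: (value_in_a, value_in_b)} for every tag that differs."""
    name_a, tags_a = parse_resource(a)
    name_b, tags_b = parse_resource(b)
    if name_a != name_b:
        raise ValueError(f"different resources: {name_a!r} vs {name_b!r}")
    return {
        key: (tags_a.get(key), tags_b.get(key))
        for key in sorted(tags_a.keys() | tags_b.keys())
        if tags_a.get(key) != tags_b.get(key)
    }


# The two settings quoted in the comment above.
gatk3_posts = "--resource:omni,known=false,training=true,truth=true,prior=12.0"
this_post = "--resource:omni,known=false,training=true,truth=false,prior=12.0"

print(diff_resources(gatk3_posts, this_post))  # {'truth': ('true', 'false')}
```

The only difference is the `truth` tag; `known`, `training` and `prior` are identical in both settings.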

  • Jacob Wang

    @Begonia_pavonina, @Jacob Shujui Hsu

    For the Resource Bundle, I think you can use the following link instead.  

    https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle

  • Jacob Wang

    @Jacob Shujui Hsu

    I was also puzzled about whether truth should be true or false. The following article explains it well.

    In brief, it depends on how conservative you want to be about what counts as a "true variant". As the article discusses, this dataset comes not from NGS data but from Illumina's Omni genotyping microarray (2.5M SNPs); in most cases, the SNPs in this dataset can be regarded as true SNPs.

    URL: https://zhuanlan.zhihu.com/p/40823886
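
As far as the VQSR documentation describes it, resources tagged `training=true` are used to build the model, while resources tagged `truth=true` are the ones used to set the sensitivity cutoffs for the tranches. The Python sketch below illustrates that second role under simplifying assumptions: it is not GATK code, the function name `vqslod_threshold` is hypothetical, and the scores are made-up numbers standing in for the VQSLOD values of call-set variants that overlap a truth resource.

```python
import math


def vqslod_threshold(truth_scores: list[float], target_sensitivity: float) -> float:
    """Lowest score cutoff that still keeps the target fraction of truth-site variants."""
    if not truth_scores:
        raise ValueError("need at least one truth-site score")
    if not 0 < target_sensitivity <= 1:
        raise ValueError("target_sensitivity must be in (0, 1]")
    ranked = sorted(truth_scores, reverse=True)
    # Number of truth-site variants that must pass the cutoff
    # (small epsilon guards against floating-point overshoot).
    keep = math.ceil(target_sensitivity * len(ranked) - 1e-9)
    return ranked[keep - 1]


# Made-up scores for ten variants that overlap the truth resource.
scores = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.0, -1.0, -2.0, -8.0]

print(vqslod_threshold(scores, 0.9))  # -2.0: nine of the ten truth sites pass
print(vqslod_threshold(scores, 1.0))  # -8.0: every truth site passes
```

Under this reading, switching omni to `truth=false` would remove its sites from `truth_scores` while still letting them train the model, so the choice affects where the tranche cutoffs land rather than which sites the model learns from.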

  • Jacob Wang

    I merged the VCFs of 96 exome samples and ran VQSR with the default parameters. However, the novel Ti/Tv was low (less than 1), and it seems that even the first tranche (90) contains a lot of false positives. Also, probably because the Ti/Tv value did not decrease monotonically as the truth sensitivity increased, the later parts of the plot are unreadable. I have posted my pipeline and files below. Could anyone help me check what went wrong with my analysis? Thank you in advance!

    https://drive.google.com/file/d/1ArFX2YLPYnzq-3OIlWMrECW3NdkeBwi9/view?usp=sharing
    https://drive.google.com/drive/folders/1ZOvM8KLuxBfzmxW2c9noSAu_So1qRaHv?usp=sharing
    https://drive.google.com/drive/folders/1skhQluZDqchqRnrZrXFvxN-UPv4UHWf7?usp=sharing
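
For readers unfamiliar with the metric in the comment above: Ti/Tv is the number of transitions (A<->G, C<->T) divided by the number of transversions (every other single-base change). The Python sketch below shows the calculation on (REF, ALT) pairs. It is an illustration only, not the code VQSR uses for its tranche plots, and the function name `titv_ratio` is hypothetical.

```python
# The four transition substitutions; every other single-base change is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snps: list[tuple[str, str]]) -> float:
    """Compute transitions / transversions over (REF, ALT) pairs, skipping non-SNPs."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        is_snp = (
            len(ref) == 1 and len(alt) == 1
            and ref in "ACGT" and alt in "ACGT"
            and ref != alt
        )
        if not is_snp:
            continue  # indels, symbolic alleles, etc. do not count
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions found; Ti/Tv is undefined")
    return ti / tv


# Every possible substitution once: 4 transitions, 8 transversions.
all_changes = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
print(titv_ratio(all_changes))  # 0.5
```

Because there are twice as many possible transversions as transitions, substitutions drawn at random give a ratio near 0.5, whereas real human variants are enriched for transitions. A novel Ti/Tv below 1 is therefore consistent with the suspicion that the tranche contains many false positives.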

  • 冰李

    When I run HaplotypeCaller on different Linux platforms, the same code generates different results. I am very confused by this and look forward to hearing back.

  • Qiongfen Lin

    Hi,

    I have a set of WES data (sample size > 100) and WGS data (sample size > 50), and I would like to combine them for further analysis. Do you have any suggestions regarding VQSR: is it better to run VQSR separately (WES and WGS each on their own) or together (combining them first)? Will the much larger number of variants from WGS cause VQSR to overcorrect the WES data, that is, lead to more false positives?

    Thank you in advance! 

