Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

(How to) Filter variants either with VQSR or by hard-filtering

9 comments

  • Andrew Zhang

    Very useful! I am wondering: if the data is whole-genome sequencing, is it necessary to add DP < min || DP > 2.5 times the average depth in the hard-filter step?

    I look forward to your reply.

     

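The depth filter asked about above can be written as a JEXL expression of the kind VariantFiltration accepts. The following is a minimal Python sketch, not an official recommendation: the function name `depth_filter_expression`, the example mean depth of 30 and the minimum depth of 10 are hypothetical, and only the 2.5x multiplier comes from the comment. Whether such a filter is advisable for whole-genome data is not settled in this thread.

```python
def depth_filter_expression(mean_depth: float, min_depth: int,
                            max_factor: float = 2.5) -> str:
    """Build a JEXL expression that flags sites whose INFO DP falls
    below min_depth or above max_factor * mean_depth.

    All thresholds here are illustrative assumptions, not GATK defaults.
    """
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    # Round the upper bound to a whole number of reads.
    max_depth = round(max_factor * mean_depth)
    return f"DP < {min_depth} || DP > {max_depth}"


# Hypothetical example values: 30x mean depth, minimum depth of 10.
expr = depth_filter_expression(mean_depth=30.0, min_depth=10)
print(expr)  # DP < 10 || DP > 75
```

Note that the INFO-level DP in a multi-sample VCF is the depth combined across samples, so the mean depth passed in would need to be computed on the same basis.
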
  • Min Ou

    I cannot view the files in gs://gcp-public-data--broad-references/hg38/v0.

    It seems we need the Storage Object Viewer permission.

  • Mareike Wendorff

    Hi

    I had the same problem as Min Ou with downloading the data, but I was able to find the files at https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0;tab=objects?prefix= .

    Unfortunately, I still cannot run VariantRecalibrator with the suggested parameters: the relevant INFO fields are missing, and because the individual-level information is not included, they cannot be added by hand either. I therefore get this error:

    A USER ERROR has occurred: Bad input: Values for FS annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations.

     

    Is there a way to get the missing INFO fields for the resource datasets?

     

  • stl23

    The syntax for specifying argument tags in VariantRecalibrator has changed. I came across the error "No argument value found for tagged argument:" using GATK (v4.2.0.0). It works when I change the parameters like this: --resource:hapmap,known=false,training=true,truth=true,prior=15.0 /trainee/ref/hapmap_3.3.hg38.vcf

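The tag syntax that stl23 reports working with GATK v4.2.0.0 can be assembled programmatically, which may help when a pipeline passes several resources. A minimal Python sketch follows; the helper name `resource_args` is hypothetical, and the argument layout is copied from the comment above rather than from the tool documentation.

```python
def resource_args(name: str, path: str, known: bool, training: bool,
                  truth: bool, prior: float) -> list[str]:
    """Return the two command-line tokens for one tagged resource,
    in the form quoted in the comment above:
    --resource:NAME,known=...,training=...,truth=...,prior=... PATH
    """
    def flag(value: bool) -> str:
        # The quoted syntax uses lowercase true/false.
        return "true" if value else "false"

    tag = (
        f"--resource:{name},known={flag(known)},"
        f"training={flag(training)},truth={flag(truth)},prior={prior}"
    )
    return [tag, path]


# Reproduces the hapmap example from the comment.
args = resource_args("hapmap", "/trainee/ref/hapmap_3.3.hg38.vcf",
                     known=False, training=True, truth=True, prior=15.0)
print(" ".join(args))
```

The returned list can be appended to a larger argument list for a subprocess call, which avoids shell-quoting problems with the commas in the tag.
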
  • Olivia R

    Hi, I am performing a whole-exome sequencing study and I am wondering whether I could apply some hard filters such as QD<2, DP<3 & GQ<20 after the VQSR filter, or whether using both together is not recommended. My study looks at germline variants, and I have a small sample size (around 70).

  • Matthew Galbraith

    As mentioned by @stl23, the example commands currently given above need updating to account for the syntax change; see https://gatk.broadinstitute.org/hc/en-us/articles/360035532192

  • Max Bär

    Dear GATK development team,

    I am working on some fairly rare/neglected parasites, for which we have hundreds of samples sequenced at 25X coverage (some samples up to 50X) across the whole genome. I'd like to use VQSR filtering and incorporate it into our Nextflow pipeline; however, there are no "True" datasets out there. I was wondering if I could create one myself by selecting a subset of the best-quality files, conducting joint genotyping, and using the resulting cohort VCF as a True dataset to correct the others. Would that be possible?

    All my best,
    Max

  • Tehseen Afridi

    Subject: Issue with undefined variables in VariantFiltration tool

    Dear GATK Community,

    I am encountering warnings about undefined variables when using the VariantFiltration tool in GATK version gatk4-4.5.0.0-0. Specifically, I am applying filters based on the `MQRankSum` and `ReadPosRankSum` annotations in my VCF file. Despite confirming the presence of these annotations and modifying the JEXL expressions accordingly, the warnings persist.

    Key details:

    - GATK version: gatk4-4.5.0.0-0
    - Command used: VariantFiltration with modified JEXL expressions:
      - `MQRankSum < -12.5` for MQRankSum filter
      - `ReadPosRankSum < -8.0` for ReadPosRankSum filter
    - Issue: Warnings about undefined variables for `MQRankSum` and `ReadPosRankSum`

    "15:14:50.582 WARN  JexlEngine - ![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum
    15:14:50.582 WARN  JexlEngine - ![0,9]: 'MQRankSum < -12.5;' undefined variable MQRankSum
    "

    I kindly request your guidance in resolving this issue. If there are known bugs or compatibility concerns related to these annotations in the VariantFiltration tool of GATK version gatk4-4.5.0.0-0, please advise.

    Thank you for your valuable support. I look forward to your response.

    Sincerely,
    Afridi

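One way to investigate warnings like these is to count how many records actually carry each annotation, since a VCF header can declare an INFO field that individual records omit. That this is the cause here is an assumption, not something confirmed in this thread. The Python sketch below uses a hypothetical helper, `count_missing_info`, and toy records invented for illustration; it reads plain-text VCF lines only.

```python
def count_missing_info(vcf_lines, keys):
    """Count variant records whose INFO column lacks each key.

    vcf_lines: iterable of uncompressed VCF text lines.
    Returns (number_of_records, {key: records_missing_that_key}).
    """
    missing = {key: 0 for key in keys}
    total = 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a well-formed record
        # INFO is the 8th column: semicolon-separated KEY=VALUE or flags.
        present = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        total += 1
        for key in keys:
            if key not in present:
                missing[key] += 1
    return total, missing


# Toy records (invented): the second one lacks both rank-sum annotations.
records = [
    "##fileformat=VCFv4.2",
    "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS",
               "DP=30;MQRankSum=-0.5;ReadPosRankSum=0.2"]),
    "\t".join(["chr1", "200", ".", "C", "T", "60", "PASS", "DP=28"]),
]
total, missing = count_missing_info(records, ["MQRankSum", "ReadPosRankSum"])
print(total, missing)  # 2 {'MQRankSum': 1, 'ReadPosRankSum': 1}
```

For a real file, the lines could come from `open(path)` for plain text or `gzip.open(path, "rt")` for a gzip-compressed VCF.
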
  • Isadora Machado Ghilardi

    Hello!

    I have a question about the application of CollectVariantCallingMetrics. Should I run it on the separate files generated by hard filtering (indels and SNPs), or on my cohort file? I ask because the result shown here is divided into indels and SNPs: https://github.com/broadgsa/gatk/blob/master/doc_archive/tutorials/(howto)_Evaluate_a_callset_with_CollectVariantCallingMetrics.md

    I'm confused. Thank you. 

     

