Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GenotypeGVCFs -stand-call-conf filtering high-QUAL variants

3 comments

  • Bhanu Gandham

    Hi Tyler,

     

    We recommend that you upgrade to the latest version of GATK4 and try again. We no longer support GATK3 and, as you mentioned, there is a high possibility that this issue has been resolved in the newer versions.

  • Tyler Medina

    Hi Bhanu,

    I've rerun a batch using GATK/4.1.7.0, and I'm still seeing the same effect. In this particular batch:

    28 variants were 'missing' when run with my additional control sample.

        > of these, 24 originally had 30<QUAL<32, so these make sense since the default cut-off is 30

        > the other 4 originally had 40<QUAL<60

    I then re-ran with stand-call-conf = 0, as before, to check if the missing variants' QUAL scores truly did fall far enough to fail the original stand-call-conf = 30 filter. No variants are 'missing', as expected.

    Of the 28 variants that were previously missed:

        > 23 now have QUAL < 30

        > 1 went from QUAL = 31.37 --> 43.05

        > The other 4 that originally had 40<QUAL<60 are now 40<QUAL<80

    So I'm still seeing the same effect. I don't know why these 5 variants are being filtered during genotyping, or why the stand-call-conf level is determining their filtering. However, I do observe once again that when run with my additional control sample, these 5 variants each have at least 2 alternate alleles, whereas the 23 low-QUAL variants each only have 1. Again, this makes me think that the QUAL filtering is normalized by alleles somehow, but it's just a hunch.
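
The comparison described above (finding sites that disappear at the default threshold, then looking up their QUAL and alternate-allele count in the stand-call-conf = 0 run) can be sketched in Python. This is a minimal sketch, not the poster's actual workflow: the two VCF snippets are made-up records, and `parse_vcf` and `missing_sites` are hypothetical helper names. It assumes plain-text VCF with a numeric QUAL column.

```python
# Made-up VCF records for illustration only; they are NOT data from this thread.
RUN_DEFAULT = """\
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t1000\t.\tA\tG\t85.20\t.\t.
chr1\t2000\t.\tC\tT\t31.37\t.\t.
"""
RUN_CONF_ZERO = """\
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t1000\t.\tA\tG\t85.20\t.\t.
chr1\t2000\t.\tC\tT\t31.37\t.\t.
chr1\t3000\t.\tG\tA\t12.50\t.\t.
chr1\t4000\t.\tT\tTA,TAA\t55.00\t.\t.
"""


def parse_vcf(text):
    """Map (CHROM, POS, REF) -> (QUAL, list of ALT alleles) for each record.

    Assumes every record has a numeric QUAL (no "." placeholders).
    """
    records = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt, qual = line.split("\t")[:6]
        records[(chrom, int(pos), ref)] = (float(qual), alt.split(","))
    return records


def missing_sites(filtered_run, unfiltered_run):
    """Sites present in the unfiltered run but absent from the filtered run."""
    out = []
    for key, (qual, alts) in sorted(unfiltered_run.items()):
        if key not in filtered_run:
            out.append((key, qual, len(alts)))
    return out


default_run = parse_vcf(RUN_DEFAULT)
conf_zero_run = parse_vcf(RUN_CONF_ZERO)
for (chrom, pos, ref), qual, n_alt in missing_sites(default_run, conf_zero_run):
    side = "above" if qual >= 30 else "below"
    print(f"{chrom}:{pos} {ref} QUAL={qual:.2f} n_alt={n_alt} ({side} 30)")
```

Sites reported as "above 30" with more than one alternate allele are the ones worth a closer look, since a site-level QUAL cut-off alone would not explain their absence.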

  • Tyler Medina

    Hi again,

    I think I found my answer in GitHub open issue 5793.

    Long story short, it looks like the QUAL score filtering is performed per allele. Alleles that fail are removed, but the output QUAL score is unaffected. Hence the presence of seemingly high-QUAL variants with lots of alleles when stand-call-conf is turned down.

    From davidbenjamin's comment:

    "It's kind of tricky because suppose eg that we have three alt alleles each with an allele qual of 19, so that the overall variant qual is roughly 3x19 = 57. If we filter alleles with a confidence of 20, we get no alleles and the variant qual changes to 0.

    Now, if instead of filtering by allele we only filter by overall variant qual we then have to keep an arbitrary number of sketchy alleles. I mean, what if we have 30 alleles each with a qual of 1? The current behavior seems preferable to me because the usual question users would ask downstream is whether some allele is real, not whether some site exhibits variation. As long as we define -stand-call-conf to pertain to alleles everything is consistent."

    From ldgauthier's comment:

    "So we definitely don't update the QUAL if we drop alternate alleles" ... "Note that the QUAL is based off of the AFResult that had alleles removed if they exceeded the output limit, but not if they had less evidence than the calling confidence threshold."
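
A toy model of the behaviour described in the quoted comments may make the arithmetic concrete. It assumes, as davidbenjamin's example does, that the site QUAL is roughly the sum of the per-allele quals; this is not GATK's actual calculation. The names `surviving_alleles`, `site_qual`, and `emit` are hypothetical, and dropping a site when no allele survives is inferred from the observations in this thread.

```python
def surviving_alleles(allele_quals, threshold):
    """Per-allele filter: keep only alleles whose own qual meets the threshold."""
    return [q for q in allele_quals if q >= threshold]


def site_qual(allele_quals):
    """Toy approximation taken from the quoted example: QUAL ~ sum of allele quals."""
    return sum(allele_quals)


def emit(allele_quals, threshold):
    """Return (kept_alleles, reported_qual), or None if no allele survives.

    The reported QUAL is deliberately NOT recomputed after alleles are
    dropped, mirroring "we definitely don't update the QUAL".
    """
    kept = surviving_alleles(allele_quals, threshold)
    if not kept:
        return None  # no allele passes, so the site is not emitted
    return kept, site_qual(allele_quals)


# Three alt alleles with qual 19 each: site QUAL ~57, yet nothing passes at 20.
print(site_qual([19, 19, 19]), emit([19, 19, 19], threshold=20))

# Same site with the threshold turned down to 0: all alleles kept, QUAL ~57.
print(emit([19, 19, 19], threshold=0))

# Thirty alleles with qual 1 each: a site-level cut-off of 20 would keep all of them.
print(site_qual([1] * 30), emit([1] * 30, threshold=20))

# One strong and one weak allele: the weak one is dropped, QUAL stays at 55.
print(emit([50, 5], threshold=30))
```

Under these assumptions, a multi-allelic site can have a QUAL well above the threshold and still vanish at the default setting, then reappear with a high QUAL when stand-call-conf is set to 0.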

