Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Germline pipeline - incorrect read depth reported after filtering variants.

  • Gökalp Çelik

    Hi Jaime Alvarez Benayas

    The numbers are correct. They may not match exactly what you observe in IGV because they are derived from the local reassembly and realignment operation. A short article explaining this is linked below.

    https://gatk.broadinstitute.org/hc/en-us/articles/360035532252-Allele-Depth-AD-is-lower-than-expected 

    In the single-sample GVCF, which also includes reference confidence values, those numbers are distributed across the alleles as follows:

    |----|----|-----|------|-------|-----------|
    | T  | TA | TAA | TAAA | TAAAA | <NON_REF> |
    |----|----|-----|------|-------|-----------|
    | 34 | 16 |  14 |   26 |     6 |         0 |
    |----|----|-----|------|-------|-----------|

    T is the REF allele, which is also included in the counts, and <NON_REF> stands for anything that is neither the reference nor one of the chosen alternate alleles.

    In the combined GVCF, the same values look like this:

    |-------|------|--------|---------|---|----|-----|----------|-----------|-----------|
    | TAAAA | TAAA | TAAAAA | TAAAAAA | T | TA | TAA | TAAAAAAA | TAAAAAAAA | <NON_REF> |
    |-------|------|--------|---------|---|----|-----|----------|-----------|-----------|
    |    34 |    0 |     16 |      14 | 0 |  0 |   0 |       26 |         6 |         0 |
    |-------|------|--------|---------|---|----|-----|----------|-----------|-----------|

    The values match the alleles in the first table one for one. The reference allele changes in the combined GVCF because other samples at this site carry deletion alleles rather than the insertion alleles found in the sample you showed. All sites in a GVCF/VCF file are represented in their left-aligned form, so when your sample is on its own, the insertion is written as T to TAAA. Once it is combined with samples that carry deletions, the REF allele has to be long enough to cover the deleted bases, so your insertion allele is extended to the right and becomes TAAAA to TAAAAAAA. That is still a correct representation for that single sample.
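
    The re-expression described above can be checked by hand with a small sketch. This is only an illustration of the allele arithmetic, not how GATK implements it: the helper name extend_alleles is hypothetical, and the allele strings and counts are copied from the two tables in this reply (<NON_REF> is left out because it is 0 in both).

```python
def extend_alleles(ref, alts, new_ref):
    """Re-express alleles against a longer REF by appending the extra reference bases.

    Hypothetical helper for illustration only; it assumes new_ref starts with ref.
    """
    if not new_ref.startswith(ref):
        raise ValueError("new_ref must start with the original ref")
    suffix = new_ref[len(ref):]  # bases added to REF to cover the deletions
    return new_ref, [alt + suffix for alt in alts]


# Single-sample GVCF: REF first, then each ALT, with the values from the first table.
single_ref = "T"
single_alts = ["TA", "TAA", "TAAA", "TAAAA"]
single_counts = [34, 16, 14, 26, 6]

# Combined GVCF: alleles and values from the second table (REF is now TAAAA).
combined_alleles = ["TAAAA", "TAAA", "TAAAAA", "TAAAAAA", "T", "TA", "TAA",
                    "TAAAAAAA", "TAAAAAAAA"]
combined_counts = [34, 0, 16, 14, 0, 0, 0, 26, 6]

new_ref, new_alts = extend_alleles(single_ref, single_alts, "TAAAA")
remapped = dict(zip([new_ref] + new_alts, single_counts))
combined = dict(zip(combined_alleles, combined_counts))

# Each re-expressed allele should carry the same value in the combined record.
for allele, count in remapped.items():
    print(allele, count, combined[allele])
```

    Each line printed shows a re-expressed allele followed by its value in the single-sample record and in the combined record. The alleles that appear only in the combined table (TAAA, T, TA, TAA) are the deletions contributed by other samples, which is why they are 0 for this sample.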

    Once the site is genotyped, some more of those alleles are pruned, so the reference allele may change slightly again in the final VCF. What you observe here is the correct and expected behavior for GATK.

    I hope this helps. 

     
