Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenotypeGVCFs has formatting issues in both v4.1.6.0 and v4.1.7.0

8 comments

  • Bhanu Gandham

    Hi ABours,

    This is not an invalid VCF and not something to worry about. ValidateVariants is going beyond the VCF spec here. I agree with you, and we are working on better ValidateVariants messaging for this issue.

    We have created a ticket for this and you can follow its progress here: https://github.com/broadinstitute/gatk/issues/6630

  • ABours

    Hi Bhanu,

    Thanks for your reply and making a ticket.

    As I said, though, you already have a ticket for the ValidateVariants issue, to allow these lines. What I wanted to point out is that GenotypeGVCFs created this incorrectly formatted site. Could you also look into why GenotypeGVCFs keeps these sites in the VCF when there is no alternate allele other than the spanning deletion, which should already be marked in the VCF by the deletion itself?

    I'm sorry if this point/comment got lost in my previous post.
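    To make the kind of site under discussion easier to find, here is a minimal sketch that lists records whose only ALT allele is the spanning-deletion placeholder. It assumes an uncompressed, tab-delimited VCF in which the spanning deletion is written as `*` (the VCF convention for an allele missing due to an overlapping deletion). The function name and the example records are hypothetical and not taken from the shared data.

```python
from typing import Iterable, Iterator, Tuple


def star_only_sites(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (chrom, pos, alt) for records whose ALT alleles are all '*'."""
    for line in vcf_lines:
        # Skip blank lines, meta-information lines and the column header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        # ALT is a comma-separated list; keep the site only if every
        # listed allele is the spanning-deletion placeholder.
        if all(allele == "*" for allele in alt.split(",")):
            yield chrom, pos, alt


# Hypothetical records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tATG\tA\t50\t.\t.\n",   # the deletion itself
    "chr1\t101\t.\tT\t*\t50\t.\t.\n",     # only ALT is the spanning deletion
    "chr1\t102\t.\tG\tC,*\t50\t.\t.\n",   # a real ALT plus the spanning deletion
]
print(list(star_only_sites(example)))
```

    With a real file, the same function could be fed an open file handle, for example `star_only_sites(open("output.vcf"))`, where the file name is a placeholder.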

    Best,

     

  • Bhanu Gandham

    Hi ABours,

    Apologies, I seem to have missed that. Let me look into this and get back to you.

    PS: We have been receiving a high volume of questions on the forum lately, so it might take me a few days to get back to you. Rest assured, this is on my radar.

  • Bhanu Gandham

    Hi ABours

    We would like to investigate by reproducing the error caused by GenotypeGVCFs. Can you please share a snippet of the data that reproduces it? You can find instructions for sharing data here: https://gatk.zendesk.com/hc/en-us/articles/360035889671

  • ABours

    Hi Bhanu,

    I believe I have shared the data (files_to_share_with_gatk_ABours.tar.gz). Can you please confirm that you received it?

    I made sure that all three versions of the formatting error occur within the snippet.

    Best,

  • Bhanu Gandham

    Hi ABours

    We are looking into it and will get back to you shortly.

  • Bhanu Gandham

    Hi ABours,

    We looked into it, and we don't think this is necessarily unwanted behaviour. The call is telling us that something is going on in this region. You can see that the PL field contains three 0s, which means that it is equally likely that this variant is alt1, alt2 or non-ref, but the tool chose to represent it as hom alt1. This could be due to repeats in that region. It may not be the most accurate call, but the tool is indicating that the region should be inspected, rather than not calling anything at all.

    For more info on how PL is calculated: https://gatk.broadinstitute.org/hc/en-us/articles/360035890451-Calculation-of-PL-and-GQ-by-HaplotypeCaller-and-GenotypeGVCFs
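    As a rough illustration of reading tied PL values, the sketch below maps each position in a diploid PL field back to its genotype and reports every genotype that shares the lowest PL. It relies on the VCF specification's ordering, where genotype j/k (with j <= k) sits at index k * (k + 1) / 2 + j. The function names and the PL string are hypothetical; the values are not taken from the data in this thread.

```python
from typing import List, Tuple


def pl_index_to_genotype(index: int) -> Tuple[int, int]:
    """Map a diploid PL index to allele indices (j, k) with j <= k.

    The VCF specification orders diploid genotypes so that genotype j/k
    sits at position k * (k + 1) // 2 + j.
    """
    k = 0
    # Advance k until the next block of genotypes would start past index.
    while (k + 1) * (k + 2) // 2 <= index:
        k += 1
    j = index - k * (k + 1) // 2
    return j, k


def most_likely_genotypes(pl_field: str) -> List[str]:
    """Return every genotype whose PL equals the minimum (normally 0)."""
    pls = [int(value) for value in pl_field.split(",")]
    best = min(pls)
    ties = []
    for index, pl in enumerate(pls):
        if pl == best:
            j, k = pl_index_to_genotype(index)
            ties.append(f"{j}/{k}")
    return ties


# Hypothetical PL values for a site with REF plus two ALT alleles
# (genotype order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2).
example_pl = "45,30,0,30,0,0"
print(most_likely_genotypes(example_pl))
```

    When more than one genotype is returned, the data do not distinguish between them, which is the situation described above: the reported GT is only one of several equally supported options.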

  • ABours

    Hi Bhanu,

    Thank you for checking it, and it's good to know what's happening here. It's just striking that I didn't have this on a similar run with a different reference.

    Best,
