Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Why does --sample-ploidy/-ploidy option not work in HaplotypeCaller or GenotypeGVCFs?

  • Bhanu Gandham

    Hi,

     

    This site might not be haploid to begin with, which would explain the AD values. HaplotypeCaller is doing the right thing, though: you can tell from the GT field that it is providing haploid genotype calls.

    If your question is based on results from a pileup-based tool showing a different result, then we can look into why that might be.

  • Mark Farman

    I'm working with a haploid organism. Therefore, it can’t possibly have multiple alleles at all of the sites in the pasted data snippet, as is shown in the AD field. Instead, what the data tell us is that the region in question is duplicated in the test genome but not in the reference. BUT IN TERMS OF HAPLOTYPING/GENOTYPING, THE TEST GENOME IS WILD-TYPE (I.E. NO SNPS) AT EACH OF THE REFERENCE GENOME SITES REPORTED. Therefore, these are all false SNP calls. GATK should know this because it’s been told it’s calling in a HAPLOID organism. This is an ERROR in the program. You can’t say an organism only has the ALT allele (which is what the GT=1 field implies) when you’ve detected two alleles. In fact, based on it having returned these false calls, as well as many other bizarre genotyping decisions, I can tell that GATK is making invalid assumptions and arbitrary decisions in the calculations it makes when genotyping in triplicated/quadruplicated/etc. chromosome regions in haploids. This in turn tells me the genotyper doesn’t properly consider haploid genome organization and dynamics and their impacts on the haplotyping/genotyping process.

  • Bhanu Gandham

    Hi Mark Farman

     

    Sorry for the delay. We are looking into this and will get back to you shortly.

  • Bhanu Gandham

    Hi Mark Farman

     

    I checked with our dev team and this is what they said:

    HaplotypeCaller expects messy reads that may not always align correctly. In that situation, even in a haploid organism, it's possible to see reads supporting two or more alleles at a given site: one of the alleles is real, and the others are driven by mapping artifacts or sequencing errors. In this case it's seeing a lot of support for the reference allele and even more support for the alt allele, so it calls alt. It always lists the other possible genotypes it considered, but it isn't saying that the sample is het; it's saying there's some evidence for ref and some evidence for alt, and reporting how much for each. This isn't fundamentally a problem of ploidy; rather, something messy is happening at those sites.
    It sounds like you are expecting duplications, which can lead to messy alignments that can be mistaken for SNPs. HaplotypeCaller should be able to handle small duplications in high-complexity regions, but if those conditions don't hold, it can make errors. We don't know the nature of these duplications, but they could also have resulted in a paralogous sequence variant, which would look "het". And of course there could be a bug. Images of the pileups (or a link to a BAM trimmed down to these regions) would help us figure out what's going on in this case.
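
As a rough illustration of the point in the reply above (a haploid GT can coexist with AD counts that support more than one allele), the following Python sketch flags haploid calls where a sizeable fraction of reads supports an allele other than the called one. It is not part of GATK and not the developers' method. The function names, the 0.2 threshold, and the example values are all hypothetical; the only assumption about the data is the standard VCF layout of a colon-separated FORMAT column with `GT` and `AD` keys.

```python
from typing import Dict, Optional


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair FORMAT keys (e.g. 'GT:AD:DP') with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def minor_support(format_field: str, sample_field: str) -> Optional[float]:
    """Fraction of AD reads supporting alleles other than the called one.

    Returns None when the call is missing, is not haploid, or has no
    usable AD depths.
    """
    fields = parse_sample(format_field, sample_field)
    gt = fields.get("GT", ".")
    # A haploid GT is a single allele index with no '/' or '|' separator.
    if not gt.isdigit():
        return None
    try:
        depths = [int(x) for x in fields.get("AD", ".").split(",")]
    except ValueError:
        # AD is absent or contains missing values ('.').
        return None
    called = int(gt)
    total = sum(depths)
    if total == 0 or called >= len(depths):
        return None
    return 1.0 - depths[called] / total


def is_suspect(format_field: str, sample_field: str,
               threshold: float = 0.2) -> bool:
    """True if the non-called alleles hold at least `threshold` of the reads.

    The threshold is an arbitrary illustrative value, not a GATK default.
    """
    frac = minor_support(format_field, sample_field)
    return frac is not None and frac >= threshold
```

For example, a made-up sample column `1:25,75:100` under FORMAT `GT:AD:DP` would be flagged, since a quarter of the reads support the reference allele even though the haploid call is alt. Sites flagged this way are candidates for the pileup inspection suggested above; the flag by itself does not say whether the cause is a duplication, a mapping artifact, or something else.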
