Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GQ=0 upstream of regions with no mapped reads


4 comments

  • Gökalp Çelik

    Hi Frederik Valeur Seersholm

    Does this behavior happen with all your samples or is it specific to a single sample?

    Can you provide more details about your BAM file, images of the problematic region, or perhaps a snippet of the BAM file itself?

  • Frederik Valeur Seersholm

    Hi Gökalp Çelik,

    Thanks for your quick reply. It seems to happen across samples (at least in the three I checked).

    Here's a link to a very small BAM and FASTA that should replicate the problem:

    https://drive.google.com/drive/folders/1nZ6eewdXvL04OhQ9UnCNZ4ukV3fzNRWx?usp=sharing

    I've also attached an IGV overview.

    Best,

    Frederik

  • Frederik Valeur Seersholm

    Hi Gökalp Çelik,

    Did you find time to look at this?

    Thanks!

  • Gökalp Çelik

    Hi Frederik Valeur Seersholm

    HaplotypeCaller's indel handling behaves in a particular way in repetitive and high-complexity regions: it reports GQ=0 when the assembly cannot decide between a large spanning indel and an immediate short indel, and which of the two it favors depends on the final assembly produced. Because the HaplotypeCaller engine determines genotypes from the final assembly rather than directly from the mapped reads, this is much like having reads that map equally well to multiple regions with MAPQ=0.

    The whole region you have seems to contain homopolymer runs of Ts and Gs surrounded by palindromes or partly repetitive sequence, so it is highly likely that your case ends up with GQ=0 because of this behavior.

    If all your data are small like this, I would recommend not using HaplotypeCaller. Instead, you may want to stick with bcftools mpileup or GATK3 UnifiedGenotyper, both of which use the mapped reads directly rather than a graph assembly.

    I hope this helps. 
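
To make the GQ=0 outcome described above more concrete, here is a minimal sketch of how an undecided call collapses to zero quality. It assumes GQ is the gap between the two smallest phred-scaled genotype likelihoods (PL), capped at 99, which is how GATK describes the field. The function name `genotype_quality` and the PL values are hypothetical and chosen only for illustration; they are not taken from the shared BAM.

```python
def genotype_quality(pls, cap=99):
    """Return GQ as the gap between the two smallest PL values, capped at `cap`.

    Assumption: this mirrors the documented GQ definition; it is not
    HaplotypeCaller's actual implementation.
    """
    if len(pls) < 2:
        raise ValueError("need at least two genotype likelihoods")
    best, second = sorted(pls)[:2]
    return min(second - best, cap)


# Hypothetical PL values in the order 0/0, 0/1, 1/1 (illustration only).
confident_pls = [0, 60, 900]  # one genotype is clearly favored
ambiguous_pls = [0, 0, 450]   # two genotypes fit the evidence equally well

print(genotype_quality(confident_pls))  # 60
print(genotype_quality(ambiguous_pls))  # 0
```

When two competing explanations of the region (for example, a spanning indel versus a short indel) score equally, the two best PLs tie and the gap, and therefore GQ, is 0.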

