Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

depth of indel in a uncalled snp site

0

3 comments

  • Avatar
    Gökalp Çelik

    Hi Sinem Selvi

    HaplotypeCaller uses local reassembly of reads using debruijin graphs and PairHMM to generate its calls in detected active regions of your alignment files. Due to this behavior what you observe in IGV as the pileup of reads may not fully represent what HaplotypeCaller sees and genotypes in that region. If you wish to check what HaplotypeCaller generates as assembly in that region you may use the 

    --bam-output bamout.bam

    option to generate a bam file of genotyped haplotypes of the active region and you may compare that to the original alignment file in IGV. From the command line that you posted you seem to have used this option already. You may check the hc_S16.bam file with IGV to see the actual calls. 

    Looking at your issue here we have a few suggestions for you to try before you decide the best option for your calls. 

    - Interval padding you use for your bed file is 20 basepairs which may be a little too low for our recommendation. We have seen by multiple user feedbacks and our own internal testing that haplotypecaller may generate slightly different calls due to reference intervals provided for the walker. As more or less reference is provided, different number of reads are available for the tool to generate and realign to the provided reference therefore graph construction and genotyping may slightly change due to this behavior. This may result in certain calls to be unexpected or missing. To change this behavior you may decide to remove bed file usage completely which gives the tool ultimate collection of reads to process and realign to the whole active region freely (recommended option) resulting in less false negatives and better assemblies or if you still have to use it increase interval padding to at least 150 to 250 basepairs to provide more room to recover and assemble haplotypes

    - Certain experimental options of HaplotypeCaller may provide more refined assemblies and less false negatives but these options are still beta such as 

    --linked-de-bruijn-graph true

    More information can be found in the following article

    https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant 

    If you still observe an issue you may share a slice of your bam file including up to 1000 BPs upstream and downstream of the region of interest and upload it according to the instructions given below and we may try to replicate your issue. 

    https://gatk.broadinstitute.org/hc/en-us/articles/360035889671 

    I hope this helps. 

    PS: The region you are checking in this instance is a quite variable region with lots of INDELs and SNPs with mostly low qual scores. A very GC rich region and you may observe lots of bad assemblies with short reads in there so be careful. 

    0
    Comment actions Permalink
  • Avatar
    Sinem Selvi

    1- represented bam file is output of HaplotypeCaller.

    2-Actually before solving the problem, I would like to know what caused the issue of not matching variant informations on hc bam and vcf file?

     

    Thank you

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Sinem Selvi

    The answer lies with uninformative reads in the assembly/bamout and actual bam file. 

    In terms of local assembly not all reads are equal to HaplotypeCaller therefore some get filtered out in the calculation steps however this does not mean they are not used by local assembly or the bamout file. 

    As we mentioned before local reassembly can be more complicated that it looks especially when there are repetitive regions of GC rich complex regions. In that case some of the reads although they seem to be supporting a variant they are eliminated by HaplotypeCaller and local assembly filters. 

    If you still think that you cannot call a known variant you can send us a snippet of the bam file and we may check those reasons in more detail. 

    Regards. 

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk