Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

disabled all read filters but reads are still filtering

5 comments

  • SkyWarrior

    Hi Sinem Selvi

    You might also want to disable downsampling in HaplotypeCaller using the `--max-reads-per-alignment-start 0` parameter.

     

  • Sinem Selvi

    Hi SkyWarrior

    Thank you so much. I think that was one of the reasons. The depth increased to 694, but that is still less than in the input BAM file. I will keep this parameter for this data. Do you have any other suggestions for why the rest of the reads are removed?

    Thank you in advance

  • SkyWarrior

    Hi again. 

    Although you are disabling your read filters and downsampling, local reassembly and pairHMM still do their job, and they act not only on read filters but also on base qualities and mapping qualities. The local reassembly algorithm discards many non-useful reads and k-mers based on base and mapping qualities, so the depth you see in IGV will not perfectly match the depth in the bamout or in the HaplotypeCaller VCF. I hope this answers your question.

  • Sinem Selvi

    Thank you SkyWarrior

    The problem was solved after I supplied the target regions. Do you know why the target file affected the depth at the variants that much?

     

     

  • James Emery

    Hello @Sinem Selvi.

    If you are looking at the bamout for HaplotypeCaller, we don't expect it to be representative of every read in your sample, because a number of internal filters are applied to the reads independently of the input filtering. Specifically, we filter reads based on MappingQuality, length after trimming low quality, overhanging bases, excessive low BQ, etc. These filters are part of the genotyping code and thus are not disabled by the `--disable-tool-default-read-filters` argument. Furthermore, reads can fail to be realigned to their best-scoring haplotypes or be poorly concordant with any haplotype, which can also cause them to be dropped from the bamout.

    Without more context from your site it is hard to say exactly what caused it in this particular case. Very often the signal you are seeing here (lots of reads not in the bamout) is a sign that there might have been a problem with the local assembly. Many of the assembly-related arguments in this article might be worth testing if you need to run this particular case to ground:
    https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant 

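For anyone landing on this thread later, the suggestions above can be combined into a single command. The following is only a dry-run sketch: it prints the command rather than running it. The file names (`reference.fasta`, `sample.bam`, `targets.interval_list`, and the output names) are hypothetical placeholders, and passing the target regions with `-L` is an assumption, since the thread does not say how they were supplied. As the replies explain, the bamout may still contain fewer reads than the input BAM even with these options.

```shell
#!/bin/sh
# Dry-run sketch: prints a HaplotypeCaller command instead of executing it.
# All file names below are hypothetical placeholders, not from the thread.
REF="reference.fasta"
IN_BAM="sample.bam"
TARGETS="targets.interval_list"

hc_cmd() {
  # -L                                   restrict calling to the target regions (assumed usage)
  # -bamout                              write the reassembled reads for inspection in IGV
  # --disable-tool-default-read-filters  turn off the tool's default input read filters
  # --max-reads-per-alignment-start 0    disable downsampling
  echo gatk HaplotypeCaller \
    -R "$REF" \
    -I "$IN_BAM" \
    -L "$TARGETS" \
    -O sample.vcf.gz \
    -bamout sample.bamout.bam \
    --disable-tool-default-read-filters \
    --max-reads-per-alignment-start 0
}

# Print the command for review.
hc_cmd
```

Once the printed command looks right for your data, it can be copied and run directly. Loading the input BAM and the bamout side by side in IGV then shows which reads were dropped by the internal filtering and reassembly steps described above.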
