Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Interval Filtering -- How does it actually work?

4 comments

  • Genevieve Brandt

    Hi Alijah O'Connor, have you seen our documentation on intervals and how GATK uses those lists? https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists

    Our tools also have an option, --interval-set-rule, that changes how multiple interval inputs are combined. You can see an explanation in the HaplotypeCaller documentation.
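
To make the combining behavior concrete, here is a minimal Python sketch of taking the union versus the intersection of two interval lists. It is an illustration only, not GATK's code: the helper names (`merge`, `union`, `intersection`) are hypothetical, intervals are assumed to be 1-based inclusive `(start, end)` pairs on a single contig, and abutting intervals are assumed to be merged.

```python
def merge(intervals):
    """Sort (start, end) pairs and merge any that overlap or abut (assumed behavior)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a, b):
    """All positions covered by either list."""
    return merge(list(a) + list(b))


def intersection(a, b):
    """Only positions covered by both lists."""
    out = []
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            lo, hi = max(s1, s2), min(e1, e2)
            if lo <= hi:  # the two intervals share at least one base
                out.append((lo, hi))
    return out


# Two hypothetical interval lists on the same contig.
list_a = [(100, 200), (500, 600)]
list_b = [(150, 550)]

print(union(list_a, list_b))         # [(100, 600)]
print(intersection(list_a, list_b))  # [(150, 200), (500, 550)]
```

With the union, any read overlapping either list would be considered; with the intersection, only reads overlapping the shared territory would be.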

  • Alijah O'Connor

    For sure, but I'm more trying to get at the exact operations that are going on under the hood.  The documentation doesn't really help with the questions I posed in the original post.

    An additional related question: does the entire read get filtered out if only part of it overlaps, or just the parts of the read that don't overlap?

    But I'm most interested in the questions from the original post.

  • Genevieve Brandt

    Alijah O'Connor, here are the answers to these questions:

    1. Yes, every read is checked. Each mate is checked independently, so one mate can be dropped if it does not overlap the interval even when the other is kept.
    2. Yes, an overlap can be just one base. The start and end positions of each read are checked, and a read is kept if it overlaps the interval at all, even if only at the interval's end position.
    3. Secondary/supplementary reads are not treated differently regarding intervals. However, other GATK filtering might lose those reads. You can get more information about those filters with the specific tool you are running.
    4. The whole read is kept if part of it overlaps with an interval. Keep in mind, though, that this differs from how HaplotypeCaller defines regions: with HaplotypeCaller regions, a read is clipped if it extends past a region boundary, although the rest of the read may be present in the next region.
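
The four answers above can be sketched in a few lines of Python. This is a hedged illustration of the described behavior, not GATK's implementation: the `Read` and `Interval` types and the function names are hypothetical, and coordinates are assumed to be 1-based and inclusive at both ends.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive


class Read(NamedTuple):
    name: str
    contig: str
    start: int  # alignment start, 1-based inclusive
    end: int    # alignment end, 1-based inclusive


def overlaps(read: Read, interval: Interval) -> bool:
    """True if the read shares at least one base with the interval."""
    return (
        read.contig == interval.contig
        and read.start <= interval.end
        and read.end >= interval.start
    )


def filter_reads(reads, intervals):
    """Keep each read whole if it overlaps any interval; mates are checked independently."""
    return [r for r in reads if any(overlaps(r, iv) for iv in intervals)]


def clip_to_region(read: Read, region: Interval) -> Optional[Read]:
    """Contrast case: trim a read's span to a region; None if there is no overlap."""
    if not overlaps(read, region):
        return None
    return read._replace(
        start=max(read.start, region.start),
        end=min(read.end, region.end),
    )


intervals = [Interval("chr1", 1000, 2000)]
reads = [
    Read("pair1/1", "chr1", 1900, 2050),  # extends past the interval end
    Read("pair1/2", "chr1", 2200, 2350),  # mate lies fully outside the interval
    Read("single", "chr1", 2000, 2150),   # touches the interval at exactly one base
]

kept = filter_reads(reads, intervals)
print([r.name for r in kept])  # ['pair1/1', 'single']
print(kept[0].end)             # 2050: the kept read is not trimmed
print(clip_to_region(reads[0], intervals[0]).end)  # 2000: clipped at the boundary
```

Note that `pair1/2` is dropped even though its mate is kept, and that the interval filter never shortens a read, whereas the region-style clipping does.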

    Hope this helps!

  • Alijah O'Connor

    Thank you! Yes, this is very helpful in understanding exactly what behavior to expect when I'm using this parameter.

