Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MTLowHeteroplasmyFilterTool documentation

  • Bhanu Gandham

    Hi,

    Thank you for your post. We are currently prioritizing bugs and errors with GATK tools. You can read more about our forum guidelines and the topics here: Forum Guidelines. We are tracking documentation posts such as this and will circle back to it.

    Best,

    Bhanu

  • Genevieve Brandt (she/her)

    Hi jorgez,

    This argument controls a filter that is activated once a sample reaches a certain number of low heteroplasmy sites. By default, a site counts as low heteroplasmy when its heteroplasmy level (allele fraction) is below 10%. The MTLowHeteroplasmyFilterTool counts the number of sites that fall below this threshold. Once that count exceeds the allowed number (--max-allowed-low-hets, default=3), all of the low heteroplasmy sites are marked with the mt_many_low_hets filter and are therefore filtered out. Increase --max-allowed-low-hets to keep more low heteroplasmy calls (higher sensitivity), or decrease it to make it more likely that low heteroplasmy sites get filtered out.

    Best,

    Genevieve
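
    The counting logic described above can be sketched in a few lines of Python. This is a hypothetical illustration, not GATK source code: the input format (a dict of position to allele fraction for one sample), the function name apply_low_het_filter, and the assumption that the filter fires only when the count strictly exceeds --max-allowed-low-hets are all choices made for this sketch. The 10% threshold, the default of 3, and the mt_many_low_hets filter name come from the answer above.

```python
# Hypothetical sketch of the "count, then filter" behaviour described above.
# NOT GATK source code. Assumptions: a site is "low heteroplasmy" when its
# alternate allele fraction (AF) is below the threshold, and the filter fires
# only when a sample's count of such sites exceeds max_allowed.
from typing import Dict, Set

LOW_HET_THRESHOLD = 0.10      # default threshold quoted in this thread
MAX_ALLOWED_LOW_HETS = 3      # default of --max-allowed-low-hets quoted in this thread
FILTER_NAME = "mt_many_low_hets"


def apply_low_het_filter(
    sites: Dict[int, float],
    threshold: float = LOW_HET_THRESHOLD,
    max_allowed: int = MAX_ALLOWED_LOW_HETS,
) -> Dict[int, Set[str]]:
    """Return {position: filters} for ONE sample; `sites` maps position -> AF."""
    # Pass 1: count this sample's low heteroplasmy sites.
    low_het = [pos for pos, af in sites.items() if af < threshold]

    # Pass 2: only if the count exceeds the limit, flag every low het site.
    # Sites at or above the threshold are never flagged.
    filters: Dict[int, Set[str]] = {pos: set() for pos in sites}
    if len(low_het) > max_allowed:
        for pos in low_het:
            filters[pos].add(FILTER_NAME)
    return filters


# Made-up allele fractions for two samples, evaluated independently.
sample_a = {73: 0.99, 150: 0.04, 263: 0.06, 750: 0.08, 1438: 0.05}  # 4 low het sites
sample_b = {73: 0.98, 150: 0.04, 263: 0.07}                          # 2 low het sites

for name, sites in [("sample_a", sample_a), ("sample_b", sample_b)]:
    flagged = sorted(p for p, f in apply_low_het_filter(sites).items() if f)
    print(name, flagged)
# sample_a [150, 263, 750, 1438]
# sample_b []
```

    Because the count is kept per sample in this sketch, sample_b's two low heteroplasmy sites are left alone even though sample_a's are all filtered; raising max_allowed to 4 would leave sample_a untouched as well.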

  • jorgez

    Hello Genevieve,

    Thanks so much for the explanation. It's perfectly clear now.

    Best

    Jorge

  • eric

    Hello jorgez,

    I would like to know whether this argument filters a multiallelic site when that one site has more than three low heteroplasmy alleles, or whether it filters all heteroplasmic sites in a sample that has more than 3 low heteroplasmy sites.

    Best

    Eric

  • Pamela Bretscher

    Hi eric,

    I believe your second interpretation is the more accurate one: the filter works per sample, not per site. Once more than 3 low heteroplasmy sites are detected in a particular sample, the argument filters out that sample's low heteroplasmy sites. As Genevieve noted, you can increase or decrease the allowed number of sites to change the sensitivity of the filter. I hope this answers your question.

    Kind regards,

    Pamela

  • eric

    Hi Pamela,

    That's helpful. Thanks!

    However, I have another question. The pipeline filters NuMTs during mitochondrial variant calling, and I would like to understand the principle or rule behind that filter.

    Kind regards,

    Eric

  • Pamela Bretscher

    Hi eric,

    I'm glad to hear that the explanation was helpful. I was able to find this previous forum post with an extensive discussion of the mitochondrial pipeline and the filtering of NuMTs. One of the developers provided the following explanation in the post:

    "We use a Poisson distribution taking the median autosomal coverage that was observed and assume that NuMT insertions could have happened multiple times (based on observations we've made in our samples). Then we filter out any site that has a lower number of reads supporting the alternate allele than a cutoff based on the Poisson distribution I just described. This is very broad and will also filter out real low AF mitochondrial variants, so it's a balance as far as how sensitive or precise you'd like to be with these low AF calls."

    Additionally, here is the related tool documentation. Is this helpful in understanding the NuMT filtering logic?

    Kind regards,

    Pamela
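
    The Poisson-based cutoff in the quoted explanation can be sketched as follows. This is a hypothetical illustration, not GATK's implementation: the copy-number multiplier (ASSUMED_NUMT_COPIES), the upper-tail probability (ASSUMED_TAIL_PROB), the function names, and the choice to model expected NuMT-derived reads as median autosomal coverage times that multiplier are all assumptions made for this sketch. Only the overall idea (a Poisson model based on median autosomal coverage, allowance for multiple NuMT insertions, and filtering sites whose alternate-allele read count falls below the cutoff) comes from the quote.

```python
# Hypothetical sketch of a Poisson-based NuMT cutoff, following the quoted
# description. NOT GATK source code: the copy-number multiplier, the tail
# probability and the way they combine are assumptions for illustration.
import math

ASSUMED_NUMT_COPIES = 2       # hypothetical: NuMT inserted more than once
ASSUMED_TAIL_PROB = 0.01      # hypothetical: 1% upper-tail probability


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Poisson(mean), computed in log space to avoid overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_upper_quantile(mean: float, tail_prob: float) -> int:
    """Smallest k with P(X <= k) >= 1 - tail_prob."""
    if mean <= 0:
        return 0
    target = 1.0 - tail_prob
    k, cdf = 0, poisson_pmf(0, mean)
    while cdf < target:
        k += 1
        cdf += poisson_pmf(k, mean)
    return k


def numt_alt_read_cutoff(median_autosomal_coverage: float,
                         numt_copies: int = ASSUMED_NUMT_COPIES,
                         tail_prob: float = ASSUMED_TAIL_PROB) -> int:
    """Alt-read count that NuMT-derived reads alone would rarely exceed."""
    # Expected NuMT-derived reads: autosomal coverage scaled by the assumed copies.
    mean = median_autosomal_coverage * numt_copies
    return poisson_upper_quantile(mean, tail_prob)


def is_possible_numt(alt_reads: int, cutoff: int) -> bool:
    """Flag a site whose alternate-allele read count is below the cutoff."""
    return alt_reads < cutoff


cutoff = numt_alt_read_cutoff(median_autosomal_coverage=30)
for alt_reads in (12, 45, 400):
    print(alt_reads, "possible NuMT" if is_possible_numt(alt_reads, cutoff) else "kept")
```

    With these made-up settings, a site with 45 alternate reads is still flagged. This illustrates the trade-off described in the quote: a broad cutoff also removes some real low allele fraction mitochondrial variants.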
