b) What does the --max-allowed-low-hets parameter of MTLowHeteroplasmyFilterTool mean? The help text says:
Number of low het sites allowed to pass other filters before filtering out all low het sites. Default is 3 Default value: 3.
I was also wondering whether, aside from that parameter, there is a threshold on the alternative allele fraction below which a variant is flagged as 'mt_many_low_hets'.
Thanks so much
Thank you for your post. We are currently prioritizing bugs and errors with GATK tools. You can read more about our forum guidelines and the topics here: Forum Guidelines. We are tracking documentation posts such as this and will circle back to it.
This argument controls a filter that is activated once a sample has too many low heteroplasmy sites. The default low heteroplasmy threshold is 10%. MTLowHeteroplasmyFilterTool counts the sites that pass the other filters but fall below this threshold. Once that count goes past the allowed number (--max-allowed-low-hets, default=3), all the low heteroplasmy sites are marked as mt_many_low_hets and filtered out. You can raise --max-allowed-low-hets to increase sensitivity, or lower it to make it more likely that low heteroplasmy sites get filtered out.
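To make the counting rule concrete, here is a minimal Python sketch of the logic described above. It is not the GATK implementation: the Site class and the apply_low_het_filter function are hypothetical names made up for illustration, and it assumes the filter triggers when the count is strictly greater than the allowed number, which matches the "more than 3" reading given later in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for one variant site in a single sample."""
    pos: int
    allele_fraction: float
    filters: set = field(default_factory=set)


def apply_low_het_filter(sites, low_het_threshold=0.10, max_allowed_low_hets=3):
    """Flag all low heteroplasmy sites if there are too many of them.

    Assumption: "too many" means strictly more than max_allowed_low_hets.
    """
    # First pass: collect sites that pass every other filter but sit
    # below the low heteroplasmy threshold.
    low_het_sites = [
        s for s in sites
        if not s.filters and s.allele_fraction < low_het_threshold
    ]

    # Second pass: only when the count exceeds the allowance are the
    # low heteroplasmy sites flagged; other sites are left alone.
    if len(low_het_sites) > max_allowed_low_hets:
        for s in low_het_sites:
            s.filters.add("mt_many_low_hets")
    return sites
```

With the defaults, a sample with three otherwise-passing sites under 10% keeps all of them, while a sample with four such sites has all four flagged. Sites at or above the threshold are untouched in both cases.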
Thanks so much for the explanation. It's perfectly clear now.
I want to know whether this argument filters a multiallelic site when that one site has more than three low heteroplasmy alleles, or whether it filters all heteroplasmic sites in a sample that has more than 3 low heteroplasmy sites?
I believe your second explanation is the more accurate one. The argument will filter out all low heteroplasmy sites in a particular sample once more than 3 low heteroplasmy sites are detected. As Genevieve noted, you can increase or decrease the allowed number of sites to change the sensitivity of the argument. I hope this answers your question.
That's helpful. Thanks!
However, I have another question. As we know, the pipeline includes a NuMT filter during mitochondrial variant calling. I would really like to know the principle or rule behind it.
I'm glad to hear that the explanation was helpful. I was able to find this previous forum post with an extensive discussion of the mitochondrial pipeline and the filtering of NuMTs. One of the developers provided the following explanation in the post:
"We use a Poisson distribution taking the median autosomal coverage that was observed and assume that NuMT insertions could have happened multiple times (based on observations we've made in our samples). Then we filter out any site that has a lower number of reads supporting the alternate allele than a cutoff based on the Poisson distribution I just described. This is very broad and will also filter out real low AF mitochondrial variants, so it's a balance as far as how sensitive or precise you'd like to be with these low AF calls."
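As a rough illustration of the quoted description, the Python sketch below derives an alt-read cutoff from a Poisson distribution and applies it. It is only a sketch under stated assumptions, not the tool's actual code: the function names, the assumed_numt_copies value of 4, the 0.99 quantile, and the choice to scale the Poisson mean as coverage times copies are all hypothetical and were chosen just to show the shape of the calculation.

```python
import math


def poisson_quantile(lam, p):
    """Smallest k such that P(X <= k) >= p for X ~ Poisson(lam).

    Requires lam > 0 and 0 < p < 1.
    """
    k = 0
    log_pmf = -lam  # log P(X = 0)
    cdf = math.exp(log_pmf)
    while cdf < p:
        k += 1
        # log P(X = k) = log P(X = k - 1) + log(lam) - log(k)
        log_pmf += math.log(lam) - math.log(k)
        cdf += math.exp(log_pmf)
    return k


def numt_alt_depth_cutoff(median_autosomal_coverage,
                          assumed_numt_copies=4,
                          quantile=0.99):
    """Alt-read depth that NuMT reads alone would rarely exceed.

    Assumption: expected NuMT depth is the median autosomal coverage
    times a guessed number of NuMT insertions.
    """
    expected_numt_depth = median_autosomal_coverage * assumed_numt_copies
    return poisson_quantile(expected_numt_depth, quantile)


def is_possible_numt(alt_read_count, cutoff):
    """A site is filtered when its alt-supporting reads fall below the cutoff."""
    return alt_read_count < cutoff
```

Because mitochondrial coverage is usually far higher than autosomal coverage, a real variant at moderate allele fraction tends to clear such a cutoff easily, while a genuine low AF variant may not, which is the sensitivity versus precision trade-off the developer mentions.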
Additionally, here is the related tool documentation. Is this helpful in understanding the NuMT filtering logic?