Is it recommended to filter out SNPs around INDELs?
Hi, thanks a lot for this amazing tool,
Several variant filtration pipelines I have seen recommend removing or masking SNPs that are located close to INDELs (e.g., within 5-10 bp of the INDEL position), for the following reason:
"Furthermore, we will exclude SNPs that are in the close proximity of indels. That is, because the proper alignment of sequences around indels can often be problematic and produce false positives."
However, I assume this is tool-dependent and, if I understand correctly, HaplotypeCaller has significantly improved read alignment around INDELs compared to previous tools.
Therefore, I am wondering whether this filtration step is still relevant, on top of the other classic filtration parameters (QD, QUAL, FS, MP, etc.)?
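For concreteness, the proximity mask described in the quote above could be sketched as follows in Python. This is only an illustration, not taken from any specific pipeline: the function name `snps_near_indels`, the `(chrom, pos, ref, alt)` tuple layout, and the default 5 bp window are assumptions, and multi-allelic records are not handled.

```python
def snps_near_indels(variants, window=5):
    """Return the set of (chrom, pos) for SNPs lying within `window` bp of an indel.

    `variants` is a list of (chrom, pos, ref, alt) tuples with 1-based positions
    and a single ALT allele per record (hypothetical layout for this sketch).
    """
    # Collect the reference span covered by each indel, grouped by chromosome.
    indel_spans = {}
    for chrom, pos, ref, alt in variants:
        if len(ref) != len(alt):
            indel_spans.setdefault(chrom, []).append((pos, pos + len(ref) - 1))

    flagged = set()
    for chrom, pos, ref, alt in variants:
        # Only single-base substitutions are treated as SNPs here.
        if len(ref) == 1 and len(alt) == 1:
            for start, end in indel_spans.get(chrom, []):
                if start - window <= pos <= end + window:
                    flagged.add((chrom, pos))
                    break
    return flagged


example = [
    ("chr1", 100, "A", "G"),   # SNP 3 bp upstream of the deletion below
    ("chr1", 103, "AT", "A"),  # 1 bp deletion
    ("chr1", 200, "C", "T"),   # SNP far from any indel
]
print(snps_near_indels(example, window=5))
```

The flagged positions could then be masked or marked as filtered rather than deleted, so the decision stays reversible.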
Thank you for your answer
-
Hi Hugo DENIS
HaplotypeCaller uses local assembly and realignment to capture SNPs and INDELs. However, as you mentioned, there may be cases where a single-nucleotide change in an STR region shows up as a false SNP due to misalignment. Such variants are often filtered out because of strand bias or positional bias. If you wish to check further whether those variants are valid, you can look at other metrics such as the inbreeding coefficient, to see whether they appear only as one particular genotype and violate Hardy-Weinberg (HW) equilibrium. If that is the case, they are most likely false positives and should be pruned. To do this, you may need a cohort of samples in your callset. You may also wish to compare those sites against the gnomAD v4.1 calls to see whether the database flags them, especially for HW equilibrium and inbreeding coefficient.
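As a rough illustration of the HW check, here is a minimal Python sketch that computes a count-based inbreeding coefficient, F = 1 - (observed heterozygotes / expected heterozygotes), for one biallelic site. It is a simplification and not GATK's own annotation code; the function name `inbreeding_coefficient` and the idea of passing raw genotype counts are assumptions made for the example.

```python
def inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt):
    """Count-based F for a biallelic site; returns None when it is undefined."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return None
    # Reference allele frequency estimated from the genotype counts.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1 - p
    # Number of heterozygotes expected under Hardy-Weinberg equilibrium.
    expected_het = 2 * p * q * n
    if expected_het == 0:
        return None  # monomorphic site, F is not defined
    return 1 - n_het / expected_het


# A site in HW proportions versus a site where every sample is heterozygous.
print(inbreeding_coefficient(25, 50, 25))
print(inbreeding_coefficient(0, 100, 0))
```

Strongly negative values mean an excess of heterozygotes, which is the pattern expected when a site is called with the same genotype in nearly every sample because of misalignment. With only a few samples the estimate is too noisy to be useful.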
I hope this helps.