How to remove strand bias from a virus-derived single-end NGS dataset
Dear all,
I am analyzing the variant composition of the Japanese Encephalitis Virus using single-end NGS datasets. I used a virus-tuned variant caller called lofreq to generate the VCF files. Upon examining them, I found clearly strand-biased variants (a trail of mutations that appeared only in the forward reads, at a constant ratio over time), and now I am trying to find a tool that removes this kind of bias from the dataset systematically and produces a filtered VCF.
I tried to use FilterMutectCalls, but it did not work as it assumes that the VCF was produced by Mutect2, thus expecting the vcf.stats file to be there. Also, I was not able to use StrandBiasBySample and/or FisherStrand to annotate my VCFs as they are HaplotypeCaller-dependent. Finally, I tried FilterVcf, but it did not work as the VCF does not have a sequence dictionary.
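For the missing sequence dictionary, one possible workaround is to add `##contig` header lines to the VCF before handing it to other tools. The sketch below does this from a samtools-style `.fai` index (tab-separated, with the contig name and length in the first two columns). The function names and the contig name `JEV` in the usage note are made up for illustration, and whether the downstream filter is then useful for lofreq output is not something this sketch guarantees.

```python
def read_fai(path):
    """Read contig names and lengths from a FASTA index (.fai) file."""
    lengths = {}
    with open(path) as handle:
        for row in handle:
            if not row.strip():
                continue
            name, length = row.rstrip("\n").split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def add_contig_lines(vcf_lines, contig_lengths):
    """Insert ##contig header lines just before the #CHROM line.

    Lines are expected without trailing newlines. If the VCF already
    has ##contig lines, it is returned unchanged.
    """
    vcf_lines = list(vcf_lines)
    if any(line.startswith("##contig=") for line in vcf_lines):
        return vcf_lines
    out = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            for name, length in contig_lengths.items():
                out.append(f"##contig=<ID={name},length={length}>")
        out.append(line)
    return out
```

For example, `add_contig_lines(lines, {"JEV": 10976})` would place `##contig=<ID=JEV,length=10976>` directly above the `#CHROM` line; in practice the mapping would come from `read_fai` on the index of the reference used for alignment.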
I need help finding a way to filter the strand-biased variants out of my VCFs.
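One tool-independent option is to filter on the strand information lofreq already writes into the INFO column: `DP4` (ref-forward, ref-reverse, alt-forward, alt-reverse counts) and `SB` (Phred-scaled strand bias). The following is only a minimal sketch of that idea. The thresholds (`max_sb`, `min_alt_strand_frac`, `min_strand_depth`) and the function names are assumptions chosen for illustration, not recommended values, and lofreq's own `filter` subcommand is worth checking before rolling a custom filter.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict (flag-only keys map to True)."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def is_strand_biased(info, max_sb=60, min_alt_strand_frac=0.05,
                     min_strand_depth=10):
    """Heuristic strand-bias check; all thresholds are illustrative assumptions."""
    fields = parse_info(info)
    ref_fwd, ref_rev, alt_fwd, alt_rev = (
        int(x) for x in fields["DP4"].split(",")
    )
    alt_total = alt_fwd + alt_rev
    if alt_total == 0:
        return False
    # Rule 1: the caller's own Phred-scaled strand-bias score is high.
    if int(fields.get("SB", 0)) > max_sb:
        return True
    # Rule 2: alt reads sit almost entirely on one strand even though
    # the other strand is reasonably well covered.
    if alt_fwd <= alt_rev:
        minor_alt, minor_depth = alt_fwd, ref_fwd + alt_fwd
    else:
        minor_alt, minor_depth = alt_rev, ref_rev + alt_rev
    one_sided = minor_alt / alt_total < min_alt_strand_frac
    return one_sided and minor_depth >= min_strand_depth


def filter_strand_bias(vcf_lines, **thresholds):
    """Yield header lines unchanged and only records that pass the check."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
        if not is_strand_biased(info, **thresholds):
            yield line
```

Applied to an open VCF file handle, `filter_strand_bias(handle)` yields lines that can be written straight to a new file. A record with `DP4=450,450,100,0` would be dropped under these defaults, while one with `DP4=450,450,52,48` and a low `SB` would be kept.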
Many thanks
-
Fadi Alnaji, FilterMutectCalls relies not only on the stats file but also on several annotations produced by Mutect2. Is it an option to use Mutect2 instead of lofreq?
-
Thank you very much, David. I am just not sure how well Mutect2 handles viral minority variants and quasispecies. Additionally, I am dealing with a large number of samples and was hoping not to rerun all of them through another pipeline. Nonetheless, since there is no specific solution for this, the best thing to do is to run one or two samples with Mutect2 and compare the results with lofreq.
-
It's worth a try, at least. It can't be that expensive to re-run on a genome that small, I would hope. We don't have much experience with viruses, but if I had to guess I would recommend running Mutect2 in mitochondria mode as the best approximation.
-
Thank you, David; yes, I will try that!