I am analyzing the variant composition of the Japanese Encephalitis Virus using single-end NGS datasets. I used a virus-tuned variant caller called lofreq to generate the VCF files. Upon examining them, I found clearly strand-biased variants (a trail of mutations that appeared only in the forward reads, at a constant ratio over time), and now I am trying to find a tool that systematically removes these variants from the dataset and produces a filtered VCF.
I tried to use FilterMutectCalls, but it did not work as it assumes that the VCF was produced by Mutect2, thus expecting the vcf.stats file to be there. Also, I was not able to use StrandBiasBySample and/or FisherStrand to annotate my VCFs as they are HaplotypeCaller-dependent. Finally, I tried FilterVcf, but it did not work as the VCF does not have a sequence dictionary.
I need help finding a way to filter the strand-biased variants out of my VCFs.
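To make the request concrete, here is a minimal sketch of the kind of filter I have in mind. It is not a tested solution. It assumes that each record's INFO column carries a `DP4` field in the usual order (ref-forward, ref-reverse, alt-forward, alt-reverse). The function names and both thresholds (`min_alt_reads`, `max_strand_fraction`) are hypothetical placeholders that would need tuning on real data.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def is_strand_biased(info, min_alt_reads=10, max_strand_fraction=0.95):
    """Flag a record whose alt reads sit almost entirely on one strand
    while the reference reads are present on both strands.

    Assumes DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse.
    Both thresholds are illustrative defaults, not recommendations.
    """
    dp4 = parse_info(info).get("DP4")
    if dp4 is None or dp4 is True:
        return False  # no strand counts available, so nothing to judge
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4.split(","))
    alt_total = alt_fwd + alt_rev
    if alt_total < min_alt_reads:
        return False  # too few alt reads to call the imbalance meaningful
    if ref_fwd == 0 or ref_rev == 0:
        return False  # site is covered by one strand only
    return max(alt_fwd, alt_rev) / alt_total >= max_strand_fraction


def filter_vcf_lines(lines):
    """Yield header lines unchanged and drop records flagged as biased."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 8 and is_strand_biased(cols[7]):
            continue
        yield line
```

Applied to an uncompressed VCF, this would be used roughly as `out.writelines(filter_vcf_lines(open("input.vcf")))`, where `input.vcf` is a placeholder name. A ratio cutoff like this is cruder than a Fisher's exact test on the four counts, but it needs nothing beyond the counts already present in the file.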