Sequencing depth normalisation for variant discovery (DNA-seq)
Hey Community,
I am really hoping someone can help me with this. I have read several posts and blogs but have not been able to find an answer to my question.
In a genomics sequencing run, some of my samples have low sequencing depth and others have higher depth. I performed variant discovery using GATK, and the number of variants is positively correlated with sequencing depth. I want to compare my experimental and control samples, but I cannot because of the varying sequencing depth. I do not want to subsample my input before variant discovery, because we would then lose a lot of potentially valuable information.
-
This is a fundamental problem when using low- and high-coverage samples in joint genotyping projects. Our main recommendation is to perform joint genotyping on samples of similar depth, because with mixed depths many useful sites may end up being filtered out purely because of missingness.
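To make the missingness point concrete, here is a minimal Python sketch, not anything GATK does internally. The genotype values, the helper names `site_missingness` and `filter_sites`, and the 25% threshold are all made up for illustration. It shows how sites with no call in the low-depth samples drop out once a missingness filter is applied.

```python
# Toy genotype matrix: rows are sites, columns are samples.
# None marks a missing call (for example, no reads at that site
# in a low-depth sample). All values here are invented.
genotypes = [
    ["0/1", "0/0", "0/1", "1/1"],  # called in every sample
    ["0/1", "1/1", None, None],    # missing in two of four samples
    ["0/0", "0/1", None, "0/1"],   # missing in one of four samples
]


def site_missingness(matrix):
    """Return the fraction of samples with no call at each site."""
    return [sum(gt is None for gt in site) / len(site) for site in matrix]


def filter_sites(matrix, max_missing):
    """Keep only sites whose missing fraction is at most max_missing."""
    fractions = site_missingness(matrix)
    return [site for site, frac in zip(matrix, fractions) if frac <= max_missing]


missing = site_missingness(genotypes)
kept = filter_sites(genotypes, max_missing=0.25)  # hypothetical threshold

print("missing fraction per site:", missing)
print("sites kept:", len(kept), "of", len(genotypes))
```

In this toy case the second site is lost even though two samples carry a variant there, which is the kind of loss you would see at scale when low-depth samples are genotyped together with high-depth ones.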
One option is to perform imputation using a population resource to fill in missing sites and clean up genotype errors. This requires a known population allele frequency resource with phased variants. If that is not available, you may need to build one from your own samples as best you can and then perform imputation. That way you can account for missingness at many sites and end up with much cleaner data.
I hope this helps.
Regards.