How would you create a VCF file for the genotypes of embryos from known parental genotypes?
Hi everyone,
Many thanks for creating this forum and allowing us to ask questions.
I was looking at the tools provided by GATK and honestly there are a lot, and I don't know which one to use in my case.
Here's the deal: I work on murine placentas, an organ that contains cells from both the mother and the embryo. I work on a dataset where these cells were sequenced, and I am trying to separate these two types of cells. The mother is from the CBA strain and the father is a DBA. Both of these strains have variants in their genomes, which I have in VCF files. They are genetically "pure" strains, with both alleles of each gene being the same.
In order to separate the cells, I know that the maternal cells of the placentas would have all the CBA variants, and the fetal cells would have a mix of CBA and DBA variants, with one copy of each parental allele. I am trying to generate a VCF file for the embryos. Then I could run a pipeline that separates maternal and embryonic cells using this VCF.
Do you know if there is any GATK tool that I could use to do this? That is, merge two VCF files (the CBA and DBA ones) into one sample that carries one allele from each.
And if there is not, do you have any advice on how to do so?
Many thanks,
Kheira
-
Are you trying to calculate the fraction of embryonic cells per placenta? Based on what you provided here that is the most suitable primary output of such a study. What do you mean by separating maternal and embryonic cells?
-
Hi Gökalp Çelik,
Thanks for answering my post. In my experiments, we gave a treatment targeting the immune system of the mothers before gestation. In my dataset I have all the immune cells isolated from the placentas (maternal + embryonic/fetal), and we want to look at the consequences of the treatment on the maternal ones. In order to do that correctly, I want to identify the fraction of embryonic immune cells and take it out of the analysis.
We want to focus on the maternal immune cells since the treatment targeted them. I hope it is clearer like that.
Have a nice day,
Kheira
-
Hi again.
So my assumption was correct. We don't have a single tool that can do everything you want on its own. However, you may wish to collect all variant sites within the CBA and DBA strains into a single sites-only VCF file. Once the strain VCFs are converted to sites-only VCFs, you can use the MergeVcfs tool to generate a single VCF file. You can then use that file with the GetPileupSummaries tool to collect pileup counts at the SNPs in the mixture of maternal and embryonic cells.
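To make the idea of a merged site list concrete, here is a minimal Python sketch. It is not a GATK tool and does not replace MergeVcfs; it only illustrates the logic. The function names (`read_sites`, `expected_genotypes`) and the toy records are hypothetical. It assumes both strains are fully homozygous, that both VCFs were called against the same reference, that records are biallelic, and that a site missing from a strain's VCF is homozygous reference in that strain.

```python
def read_sites(vcf_text):
    """Return the set of (chrom, pos, ref, alt) keys found in VCF text."""
    sites = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites


def expected_genotypes(cba_sites, dba_sites):
    """Map every site in the union to (maternal, embryo) genotypes.

    Assumption: a site listed for a strain is 1/1 in that strain and an
    unlisted site is 0/0, because the strains are treated as inbred.
    """
    table = {}
    for site in sorted(cba_sites | dba_sites):
        in_cba = site in cba_sites
        in_dba = site in dba_sites
        maternal = "1/1" if in_cba else "0/0"
        # The embryo inherits one allele from each parent, so it is
        # heterozygous wherever the two strains differ.
        embryo = "1/1" if (in_cba and in_dba) else "0/1"
        table[site] = (maternal, embryo)
    return table


# Toy records, invented for illustration only.
HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
cba_vcf = HEADER + "chr1\t100\t.\tA\tG\t.\tPASS\t.\n" + "chr1\t200\t.\tC\tT\t.\tPASS\t.\n"
dba_vcf = HEADER + "chr1\t200\t.\tC\tT\t.\tPASS\t.\n" + "chr1\t300\t.\tG\tA\t.\tPASS\t.\n"

table = expected_genotypes(read_sites(cba_vcf), read_sites(dba_vcf))
for (chrom, pos, ref, alt), (maternal, embryo) in table.items():
    print(chrom, pos, ref, alt, "maternal:", maternal, "embryo:", embryo)
```

Only the sites where the maternal and embryo genotypes differ are informative for telling the two cell populations apart; sites shared by both strains carry no signal.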
Once the pileups are collected, you need to find the deviation of the alt nucleotide counts for the AA, AB and BB genotypes, and from that deviation you can calculate the contribution of DBA variants to the overall nucleotide composition. The deviation from the expected allelic counts should look like the graphs in the paper linked below. (Don't be confused by the paper's focus, as it will not be so different from finding the fetal fraction using SNP profiles.)
https://www.biorxiv.org/content/10.1101/096024v1.full
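As a rough illustration of that calculation, here is a small Python sketch. It is a simplification under stated assumptions, not the method from the linked paper and not a GATK tool. It assumes a mixture in which a fraction f of the material is embryonic, that the mother is homozygous CBA and the embryo carries one CBA and one DBA allele, and that both alleles are represented equally in the reads (which may not hold for expression data). Under those assumptions the paternal (DBA) allele is expected at a fraction f/2 of the reads at informative sites, so f is about twice the pooled paternal allele fraction. The function name and the counts are hypothetical.

```python
def embryonic_fraction(pileups):
    """Estimate the embryonic fraction f from pooled allele counts.

    `pileups` is an iterable of (ref_count, alt_count, paternal_allele).
    paternal_allele is "alt" at sites that are 0/0 in CBA and 1/1 in DBA,
    and "ref" at sites that are 1/1 in CBA and 0/0 in DBA.
    """
    paternal = 0
    total = 0
    for ref_count, alt_count, paternal_allele in pileups:
        # Count only the reads that carry the DBA (paternal) allele.
        paternal += alt_count if paternal_allele == "alt" else ref_count
        total += ref_count + alt_count
    if total == 0:
        raise ValueError("no reads at informative sites")
    # Expected paternal allele fraction is f / 2, hence the factor of 2.
    return 2.0 * paternal / total


# Invented counts for three informative sites.
toy_pileups = [(90, 10, "alt"), (8, 92, "ref"), (45, 5, "alt")]
estimate = embryonic_fraction(toy_pileups)
print(f"estimated embryonic fraction: {estimate:.3f}")
```

Pooling counts across many sites before dividing keeps low-depth sites from dominating the estimate; a per-site estimate followed by a median would be a reasonable alternative.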
That value would most likely give you a good estimate of the embryonic content present. Further assistance may not be possible through this forum, since this is pretty much designing an experimental setup for your research. However, if you encounter issues with these tools, we can provide help.
I hope this helps.
Regards.