If not an error, choose a category for your question(REQUIRED):
a) Will differences between datasets in coverage, read length, and PCR vs. PCR-free library prep affect the results I get from GenomeSTRiP?
I am trying to run an analysis on a cohort of whole-genome sequencing samples with different sequencing depths (30X and >30X). Some samples are PCR-free and for others I'm unsure whether they are PCR-free; some have a read length of 150bp and some 100bp. Will GenomeSTRiP have problems processing this type of dataset, and will the read length and depth affect the results?
Bob Handsaker, here is a GenomeSTRiP question.
Assuming you have enough samples in each subset, it is better to process the samples with similar technical characteristics together and separately from the other samples. Typically you will create a filtered, QCd set of sites and then regenotype each site in each subset to get a genotyped call set for analysis.
You should be able to use the standard 101bp genome mask for both the 100bp and 151bp reads. But the different read lengths may make the most difference in the calling. For read depth analysis, the 100bp reads will actually provide more power and you will be able to get cleaner genotypes on smaller CNVs.
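One way to read the point about read depth power, as a back-of-the-envelope sketch rather than anything GenomeSTRiP itself computes: if both subsets are sequenced to the same nominal coverage, shorter reads mean more read pairs land on a given interval, so a count-based depth estimate is less noisy. The function name, the 5 kb interval, the 30X coverage, and the simple Poisson noise model below are all assumptions for illustration.

```python
import math


def expected_fragments(region_bp, coverage, read_length, paired=True):
    """Rough expected number of fragments overlapping a region.

    Assumes uniform coverage and ignores edge effects, mappability,
    and masking. Hypothetical helper, not part of GenomeSTRiP.
    """
    reads = coverage * region_bp / read_length
    return reads / 2 if paired else reads


def relative_noise(count):
    """Relative standard deviation of a Poisson count (1 / sqrt(n))."""
    return 1.0 / math.sqrt(count)


# Illustrative numbers only: a 5 kb CNV at 30X coverage.
for read_length in (100, 150):
    n = expected_fragments(5000, 30, read_length)
    print(read_length, n, round(relative_noise(n), 4))
```

Under these assumptions the 100bp data gives 1.5 times as many fragments over the same interval as the 150bp data, which is consistent with cleaner genotypes on smaller CNVs.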
PCR+ vs. PCR-free library prep does not tend to have as big an effect in my experience, but it is still good to separate the two subsets if you have enough samples. Typically you would like at least 100 samples in each batch, although using 50 in each batch is not the end of the world. To evaluate how strong the batch effects are, genotype the same CNV site in each subset and then compare the plots from PlotGenotypingResults.
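The batching advice above can be sketched as a small planning script. This is only an illustration of grouping samples by technical characteristics before running anything; it does not call GenomeSTRiP, and the sample record fields (`id`, `read_length`, `pcr_free`) and the status labels are hypothetical. The 100 and 50 thresholds are the batch sizes mentioned above.

```python
from collections import defaultdict


def plan_batches(samples, preferred=100, minimum=50):
    """Group samples by (read_length, pcr_free) and flag small batches.

    `samples` is a list of dicts with hypothetical keys 'id',
    'read_length', and 'pcr_free' (True, False, or None if unknown).
    Returns {batch_key: (sorted_sample_ids, status)}.
    """
    batches = defaultdict(list)
    for sample in samples:
        key = (sample["read_length"], sample["pcr_free"])
        batches[key].append(sample["id"])

    plan = {}
    for key, ids in batches.items():
        if len(ids) >= preferred:
            status = "ok"
        elif len(ids) >= minimum:
            status = "small"
        else:
            status = "too_small"  # consider merging with a similar batch
        plan[key] = (sorted(ids), status)
    return plan


# Made-up cohort: 120 PCR-free 150bp samples, 60 PCR+ 100bp samples,
# and 10 100bp samples whose library prep is unknown.
cohort = (
    [{"id": f"A{i:03d}", "read_length": 150, "pcr_free": True} for i in range(120)]
    + [{"id": f"B{i:03d}", "read_length": 100, "pcr_free": False} for i in range(60)]
    + [{"id": f"C{i:03d}", "read_length": 100, "pcr_free": None} for i in range(10)]
)

for key, (ids, status) in plan_batches(cohort).items():
    print(key, len(ids), status)
```

Each resulting batch would then be regenotyped separately against the same filtered site list, and batches flagged as too small are candidates for merging with the most similar batch.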
Bob Handsaker, thank you for the response, this has really helped me.