Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenomeSTRiP analysis

3 comments

  • Genevieve Brandt (she/her)

    Bob Handsaker here is a GenomeSTRiP question.

  • Bob Handsaker

    Assuming you have enough samples in each subset, it is better to process samples with similar technical characteristics together, separately from the other samples. Typically you will create a filtered, QC'd set of sites and then regenotype each site in each subset to get a genotyped call set for analysis.

    You should be able to use the standard 101bp genome mask for both the 100bp and 151bp reads. The difference in read length, however, may matter most in the calling. For read depth analysis, the 100bp reads will actually provide more power, and you will be able to get cleaner genotypes on smaller CNVs.

    PCR+ vs. PCR-free library prep does not tend to have as big an effect in my experience, but it is still good to separate the two subsets if you have enough samples. Typically you would like at least 100 samples in each batch, although using 50 in each batch is not the end of the world. To evaluate how strong the batch effects are, genotype the same CNV site in each subset and then compare the plots from PlotGenotypingResults.

  • Thandeka

    Bob Handsaker thank you for the response; this has really helped me.

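As a rough illustration of the batching advice in Bob Handsaker's reply, the sketch below groups samples by read length and library prep and checks each batch against the 100-sample and 50-sample guidelines he mentions. It is not part of GenomeSTRiP: the metadata fields (`id`, `read_length`, `pcr_free`), the function names, and the verdict labels are all hypothetical, chosen only for this example.

```python
from collections import defaultdict

# Thresholds taken from the reply above: at least 100 samples per batch is
# preferred, and 50 per batch is described as still workable.
PREFERRED_BATCH_SIZE = 100
MINIMUM_BATCH_SIZE = 50


def group_into_batches(samples):
    """Group sample IDs by (read_length, pcr_free).

    The dictionary keys used here are hypothetical metadata fields,
    not anything defined by GenomeSTRiP.
    """
    batches = defaultdict(list)
    for sample in samples:
        key = (sample["read_length"], sample["pcr_free"])
        batches[key].append(sample["id"])
    return dict(batches)


def assess_batch(size):
    """Label a batch size against the two guideline thresholds."""
    if size >= PREFERRED_BATCH_SIZE:
        return "ok"
    if size >= MINIMUM_BATCH_SIZE:
        return "usable"
    return "too small"


def summarize(samples):
    """Return {batch_key: (batch_size, verdict)} for every batch."""
    return {
        key: (len(ids), assess_batch(len(ids)))
        for key, ids in group_into_batches(samples).items()
    }


if __name__ == "__main__":
    # Made-up cohort purely for demonstration.
    cohort = (
        [{"id": f"A{i}", "read_length": 100, "pcr_free": False} for i in range(120)]
        + [{"id": f"B{i}", "read_length": 151, "pcr_free": True} for i in range(60)]
        + [{"id": f"C{i}", "read_length": 151, "pcr_free": False} for i in range(20)]
    )
    for key, (size, verdict) in sorted(summarize(cohort).items()):
        print(key, size, verdict)
```

A batch labeled "too small" here is one that falls below the 50-sample figure; how to handle such a batch is a judgment call that the thread does not settle.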
