
Updating GenomicsDB workspace with additional low coverage WGS


  • Gökalp Çelik

    Hi Maggie Sudo Pui San

    We do not recommend adding low-coverage samples together with high-coverage samples, as the two groups have different characteristics in terms of variant metrics, call qualities, and so on. We recommend processing high-coverage and low-coverage samples in separate groups and combining them later as a post-processing step, which may also involve phasing and imputation.
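
    As a rough sketch, keeping the groups in separate GenomicsDB workspaces could look like the commands below (workspace paths, the interval list, and the sample maps are placeholders, not tested recommendations):

        # Import high-coverage and low-coverage GVCFs into separate workspaces.
        # A sample map is a tab-separated file: sample_name<TAB>path/to/sample.g.vcf.gz
        gatk GenomicsDBImport \
            --genomicsdb-workspace-path highcov_db \
            --sample-name-map highcov.sample_map \
            -L intervals.list

        gatk GenomicsDBImport \
            --genomicsdb-workspace-path lowcov_db \
            --sample-name-map lowcov.sample_map \
            -L intervals.list

        # Later batches of the same group go into the existing workspace instead
        # (intervals are then taken from the workspace itself, so -L is omitted):
        gatk GenomicsDBImport \
            --genomicsdb-update-workspace-path lowcov_db \
            --sample-name-map new_lowcov.sample_map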

    I hope this helps. 

    Regards. 

  • Maggie Sudo Pui San

    Hi Gökalp Çelik 

    Thank you for the suggestion. My samples vary widely in coverage: some have up to 40X and some as low as 0.5X. In that case, does it make sense to perform a few rounds of joint genotyping in the following arrangement:

    1) First group: joint genotyping of samples above 10X genome coverage (10X to 50X)

    2) Second group: joint genotyping of low-coverage samples (0.5X to 10X)

    Then I will merge their VCFs using Picard GatherVcfs.
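
    Concretely, I picture the two rounds as something like the sketch below (workspace and file names are placeholders for my data), with the merge step then run on the two outputs:

        # Round 1: joint genotyping of the high-coverage group (10X to 50X)
        gatk GenotypeGVCFs \
            -R reference.fasta \
            -V gendb://highcov_db \
            -O highcov.vcf.gz

        # Round 2: joint genotyping of the low-coverage group (0.5X to 10X)
        gatk GenotypeGVCFs \
            -R reference.fasta \
            -V gendb://lowcov_db \
            -O lowcov.vcf.gz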

    Please advise whether I should further split the groups, e.g. splitting the second group into ultra-low coverage (0.5X to 5X) and low coverage (5X to 10X). Thank you!

  • Gökalp Çelik

    Hi again. 

    It is up to you to decide on that. Once you perform joint genotyping with your samples, you may want to check certain parameters, such as Ts/Tv ratios, Mendelian violations, and missing genotypes per sample, to see whether your collection is far off on any of them; that can help you identify additional outliers to remove from either collection. Our recommendation will always be to keep closely matching samples together.
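
    For a quick look at some of those numbers, something like bcftools stats can be used (not a GATK tool, just one convenient option; file names are placeholders):

        # -s - requests per-sample statistics
        bcftools stats -s - cohort.vcf.gz > cohort.stats.txt

        # "TSTV" lines report the ts/tv ratio of the callset
        grep ^TSTV cohort.stats.txt

        # "PSC" (per-sample counts) lines include missing genotypes per sample
        grep ^PSC cohort.stats.txt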

    I hope this helps.

    Regards. 

  • Maggie Sudo Pui San

    Hi there, 

    Thanks for the suggestion! I will look into the parameters as recommended and proceed from there. 

    Best, 

    Maggie. 

  • Gökalp Çelik

    Hi Maggie Sudo Pui San

    One additional comment from our team: if you have a very large number of samples and plan to use GnarlyGenotyper to genotype them all, please do not use it with low-coverage samples, as it is not really compatible with that kind of data. Also, if you wish to use VQSR or VETS for variant recalibration and filtration on high- and low-coverage samples together, you may need to remove DP as an annotation for those tools, because low DP values in the low-coverage samples may cause valuable sites to be removed.
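
    For instance, a VariantRecalibrator run with DP left out of the annotation list might look like the sketch below (the resource line and all file names are placeholders for your own truth/training sets):

        # Note that -an DP is deliberately omitted for a mixed-coverage cohort.
        gatk VariantRecalibrator \
            -R reference.fasta \
            -V cohort.vcf.gz \
            --resource:truthset,known=false,training=true,truth=true,prior=15.0 truthset.vcf.gz \
            -an QD -an FS -an MQ -an MQRankSum -an ReadPosRankSum -an SOR \
            -mode SNP \
            -O cohort.snps.recal \
            --tranches-file cohort.snps.tranches

        gatk ApplyVQSR \
            -R reference.fasta \
            -V cohort.vcf.gz \
            --recal-file cohort.snps.recal \
            --tranches-file cohort.snps.tranches \
            --truth-sensitivity-filter-level 99.5 \
            -mode SNP \
            -O cohort.snps.vqsr.vcf.gz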

    I hope this helps. 

    Regards. 

  • Maggie Sudo Pui San

    Hi Gökalp Çelik 

    Thanks for the advice. I have been following your suggestion to process the high- and low-coverage samples separately. Now I am almost at the stage where I have to combine them. I do not use GnarlyGenotyper. Do the following steps to combine the VCFs for post-processing make sense?

    GenomicsDBImport for different genomic regions -> GenotypeGVCFs to get a vcf.gz per genomic region -> SelectVariants to extract the SNPs -> GatherVcfs to concatenate the per-region vcf.gz files into one vcf.gz -> MergeVcfs to merge the concatenated VCFs (from the different GenomicsDBImport groups) generated by GatherVcfs.
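
    Written out per region for one group, I picture it roughly as below (region names, workspaces, and file names are placeholders; the same commands would be repeated for the low-coverage group):

        # 1) Import one group's GVCFs for one genomic region
        gatk GenomicsDBImport \
            --genomicsdb-workspace-path highcov_region1_db \
            --sample-name-map highcov.sample_map \
            -L region1.interval_list

        # 2) Joint genotyping on that workspace
        gatk GenotypeGVCFs \
            -R reference.fasta \
            -V gendb://highcov_region1_db \
            -O highcov.region1.vcf.gz

        # 3) Keep only SNPs
        gatk SelectVariants \
            -R reference.fasta \
            -V highcov.region1.vcf.gz \
            --select-type-to-include SNP \
            -O highcov.region1.snps.vcf.gz

        # 4) Concatenate the per-region VCFs of this group
        #    (inputs must be supplied in genomic order)
        gatk GatherVcfs \
            -I highcov.region1.snps.vcf.gz \
            -I highcov.region2.snps.vcf.gz \
            -O highcov.snps.vcf.gz

        # 5) ...then the final cross-group merge as described above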

    Is VQSR necessary? I have never done VQSR, only BQSR.

    Thank you in advance.

    Best.

  • Gökalp Çelik

    Hi again. 

    We always perform variant filtration, either by recalibration (VQSR) or with our new VETS workflow. In your case, both are fine.

    VQSR uses multiple metrics to build a model that distinguishes true variants from false ones in a call set.
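
    For reference, the VETS workflow is a three-tool pass in recent GATK 4 releases; a rough sketch follows (the annotations, resource tags, and file names are placeholders, so please check the tool documentation for exact usage):

        # 1) Extract annotations at sites overlapping the training/calibration resource
        gatk ExtractVariantAnnotations \
            -V cohort.vcf.gz \
            -A QD -A FS -A MQ -A ReadPosRankSum -A SOR \
            --resource:truthset,training=true,calibration=true truthset.vcf.gz \
            --mode SNP \
            -O cohort.extract

        # 2) Train a model on the extracted annotations
        gatk TrainVariantAnnotationsModel \
            --annotations-hdf5 cohort.extract.annot.hdf5 \
            --mode SNP \
            -O cohort.model

        # 3) Score all sites with the trained model
        gatk ScoreVariantAnnotations \
            -V cohort.vcf.gz \
            -A QD -A FS -A MQ -A ReadPosRankSum -A SOR \
            --resource:extracted,extracted=true cohort.extract.vcf.gz \
            --model-prefix cohort.model \
            --mode SNP \
            -O cohort.scored.vcf.gz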

    Regards. 

  • Maggie Sudo Pui San

    Hi Gökalp Çelik,

    We do not have well-curated training resources for my species. Can I proceed with hard-filtering the merged VCFs? Is there anything I need to take note of when applying hard filtering to the merged VCFs from the low-coverage and high-coverage databases?

    Best.

  • Gökalp Çelik

    Hi Maggie Sudo Pui San

    Sure, you can do hard filtering on merged variant sets. One thing to pay attention to is that our recommended parameters are tuned toward human samples, so you may need to try different parameters separately, observe their effects, and decide how to fine-tune them for variant filtration. On the other hand, once you have a confident set of variants, you can use it to train a VETS or VQSR model and apply that to the rest of your samples as well.
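
    As a starting point, the commonly cited SNP hard filters look like the sketch below; treat the thresholds as a human-tuned baseline to adjust for your species, and note that no site-level DP filter is included because of the mixed coverage (file names are placeholders):

        # Sites failing an expression are tagged with the filter name, not removed.
        gatk VariantFiltration \
            -R reference.fasta \
            -V merged.snps.vcf.gz \
            --filter-expression "QD < 2.0" --filter-name "QD2" \
            --filter-expression "FS > 60.0" --filter-name "FS60" \
            --filter-expression "MQ < 40.0" --filter-name "MQ40" \
            --filter-expression "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
            --filter-expression "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
            --filter-expression "SOR > 3.0" --filter-name "SOR3" \
            -O merged.snps.hardfiltered.vcf.gz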

    I hope this helps.

    Regards. 

