Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Masked reference genomes

2 comments

  • Eva (Evander)

    Dear Derek Caetano-Anolles,

    thank you for this useful comparison. I have a small question about something I could not find in the text above or on the Clara Parabricks website. Were the variants in the comparison above called on the alt contigs, following the procedure that is described here?

  • Derek Caetano-Anolles

    Hi Eva (Evander) -- Good question. 

    The variant calls were not made in the alt contig regions. They were based on the Illumina masked reference, which is available as a DRAGEN reference in the GCP public data repository.

    Masked reference genomes like this one "mask" regions that are highly similar to other regions in the reference. Alt contigs are one example of sequence that would be masked in the masked reference.

    In this blog post we compare the impact of masked versus unmasked references in a fairly standard variant calling pipeline. The data were not passed through any of the additional post-processing steps described in the tutorial document you linked to.

    I hope that this answers your question!

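As a rough illustration of the masking idea described in the reply above, the sketch below hard-masks chosen regions of a sequence by overwriting them with `N`. It assumes masking means replacing bases with `N` over 0-based, half-open intervals; the function name `mask_reference`, the toy contig, and the interval list are hypothetical and are not taken from the Illumina or DRAGEN reference.

```python
def mask_reference(sequence, intervals):
    """Return a copy of `sequence` with each interval overwritten by 'N'.

    Intervals are (start, end) pairs, 0-based and half-open, as in BED files.
    This is an illustrative assumption about how hard masking works, not the
    procedure used to build any particular published masked reference.
    """
    bases = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(bases):
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        # Overwrite the interval in place so coordinates stay unchanged.
        bases[start:end] = "N" * (end - start)
    return "".join(bases)


# Toy contig with two hypothetical regions that duplicate sequence elsewhere.
contig = "ACGTACGTACGT"
masked = mask_reference(contig, [(2, 5), (8, 10)])
print(masked)
```

Because the masked bases are overwritten rather than deleted, the sequence length and all downstream coordinates are preserved, and an aligner has nothing to match reads against inside the masked regions.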
