Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Data pre-processing for variant discovery


  • Robert Bremel

    It would be good to point out that the FASTQ-to-BAM conversion step needs to include the read group tags, because they are required at the base quality score recalibration stage and in later steps.
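
To illustrate the point about read group tags, here is a minimal Python sketch that assembles a SAM `@RG` header line. The tag names (`ID`, `SM`, `PL`, `LB`, `PU`) come from the SAM format; the function name `build_read_group`, its defaults, and the choice to require both `ID` and `SM` are assumptions made for this illustration and are not part of GATK.

```python
def build_read_group(rg_id, sample, platform="ILLUMINA", library=None, platform_unit=None):
    """Return a SAM @RG header line made of tab-separated TAG:VALUE fields.

    Hypothetical helper: this sketch requires ID and SM and treats the
    remaining tags as optional.
    """
    if not rg_id or not sample:
        raise ValueError("this sketch requires both an ID and an SM value")

    fields = [("ID", rg_id), ("SM", sample), ("PL", platform)]
    if library:
        fields.append(("LB", library))
    if platform_unit:
        fields.append(("PU", platform_unit))

    # Tabs and newlines would break the tab-separated header line.
    for tag, value in fields:
        if "\t" in value or "\n" in value:
            raise ValueError(f"{tag} value must not contain tabs or newlines")

    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields)


if __name__ == "__main__":
    print(build_read_group("rg1", "sampleA", library="lib1"))
```

A line like this is typically supplied when the reads are aligned, for example through an aligner option that accepts a read group header string (such as the `-R` option of `bwa mem`), so that every read in the resulting BAM carries its read group.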

  • Kountay Dwivedi

    I am a junior researcher and need to familiarise myself with the GATK Best Practices, but I cannot find a sample dataset to test data pre-processing with. Please help me out. Thanks.

  • yh guo

    I was recently confused by the sort-order parameters in the SNP-calling workflow. In the main steps you mention "SAM/BAM format sorted by coordinate", and in the WDL script for "Data Pre-processing" the SAM/BAM file is likewise sorted by coordinate (--SORT_ORDER "coordinate"). However, the parameter appears as --ASSUME_SORT_ORDER "queryname" only at MarkDuplicates. This is very likely to mislead users into choosing the wrong parameters if they run the analysis with other software. I hope you can correct this to make it clearer to users. Thank you.
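
One way to avoid guessing which order a file is in is to read the sort order declared in its header. The sketch below parses the `SO` field of the `@HD` line from SAM header text; the `@HD` line and its `SO` values (`unknown`, `unsorted`, `queryname`, `coordinate`) come from the SAM format, while the function name `header_sort_order` and the way the header text is obtained are assumptions made for this illustration.

```python
def header_sort_order(header_text):
    """Return the sort order declared in the @HD line of SAM header text.

    Hypothetical helper: returns "unknown" when there is no @HD line or
    no SO field, which is the default the SAM format assigns.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[len("SO:"):]
            return "unknown"
    return "unknown"


if __name__ == "__main__":
    example = "@HD\tVN:1.6\tSO:queryname\n@SQ\tSN:chr1\tLN:1000"
    print(header_sort_order(example))
```

Note that the header records only what was declared when the file was written; it does not verify that the records are actually in that order.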
