DRAGEN-GATK introduced several changes to GATK: two new tools, changes to the variant-calling steps of our Best Practices pipeline, and the option to replace BWA with DRAGMAP for read alignment. Together, these changes make up our new Best Practices recommendations for germline single-sample short variant discovery.
This article gives example usage for the DRAGEN-GATK steps in the germline single-sample short variant discovery Best Practices pipeline. To easily implement this pipeline, check out our DRAGEN-GATK featured workspace on Terra.
DRAGMAP - alignment step
The DRAGMAP tool can be downloaded from Illumina's DRAGMAP repository on GitHub.
Build a new hash table for a reference file
DRAGMAP does not align against the standard FASTA reference directly; it uses reference hash table files built from the FASTA. You can find the hg38 DRAGEN references at the following URL:
For other reference versions, you can use DRAGMAP to build the reference hash table files.
dragen-os \
    --build-hash-table true \
    --ht-reference reference.fasta \
    --output-directory /home/data/reference/
Map reads to the reference with DRAGMAP
DRAGMAP is an open-source mapper developed by Illumina. A key benefit of using DRAGMAP for mapping is its alt-awareness, as explained in the 'Explaining DRAGMAP' blog post. The mapper aligns reads to the reference using the reference hash table files built in the previous step.
dragen-os \
    -r /home/data/reference/ \
    -1 reads_1.fastq.gz \
    -2 reads_2.fastq.gz > result.sam
The base quality score recalibration (BQSR) step is no longer necessary when running the DRAGEN-GATK pipeline, because the improvements in alignment and variant calling take into account information from the reads that indicates sequencing errors.
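The DRAGMAP command above writes SAM to standard output, while the GATK steps that follow take a BAM (input_reads.bam). A minimal sketch of that conversion, assuming samtools is installed (it is not part of DRAGMAP or GATK) and using a hypothetical helper name, sam_to_sorted_bam, produces a coordinate-sorted, indexed BAM and warns if the header lacks a read group, which HaplotypeCaller requires:

```shell
# Hypothetical helper: turn DRAGMAP's SAM output into a coordinate-sorted,
# indexed BAM for the GATK steps. Assumes samtools is on PATH.
sam_to_sorted_bam() {
  local sam="$1" bam="$2"
  # Sort by coordinate; samtools writes BAM by default.
  samtools sort -o "$bam" "$sam" || return 1
  # Create the BAM index (written as "$bam".bai).
  samtools index "$bam" || return 1
  # Warn if the header has no @RG line; HaplotypeCaller needs read groups.
  if ! samtools view -H "$bam" | grep -q '^@RG'; then
    echo "warning: $bam has no @RG header line" >&2
  fi
}

# Example usage:
#   sam_to_sorted_bam result.sam input_reads.bam
```

Sorting and indexing are kept in one function so that a failed sort stops before an index is written for an incomplete file.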
Create STR Table File for the Reference
The GATK tool ComposeSTRTableFile builds a short tandem repeat (STR) table file for the reference. You can find the hg38 STR table file at the following URL:
This STR file is used to build the DRAGEN STR model for your reads in the next command.
gatk ComposeSTRTableFile \
    -R reference.fasta \
    -O str_table.tsv
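ComposeSTRTableFile, like the later GATK commands that take -R, generally expects a FASTA index (.fai) and a sequence dictionary (.dict) next to the reference FASTA. A minimal sketch that creates them only when they are missing, assuming samtools is available for the index, the FASTA is uncompressed, and the .dict name is formed by replacing the FASTA's file extension; prepare_reference is a hypothetical helper name:

```shell
# Hypothetical helper: make sure reference.fasta.fai and reference.dict exist.
# Assumes an uncompressed FASTA and that samtools and gatk are on PATH.
prepare_reference() {
  local fasta="$1"
  local dict="${fasta%.*}.dict"   # reference.fasta -> reference.dict
  if [ ! -f "${fasta}.fai" ]; then
    samtools faidx "$fasta" || return 1
  fi
  if [ ! -f "$dict" ]; then
    gatk CreateSequenceDictionary -R "$fasta" -O "$dict" || return 1
  fi
}

# Example usage:
#   prepare_reference reference.fasta
```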
Build the DRAGEN STR Model from the Aligned Reads
The CalibrateDragstrModel command uses the STR table from ComposeSTRTableFile, along with your reference FASTA and input BAM, to estimate the parameters for the STR model. The output parameter tables from this command are used by HaplotypeCaller in DRAGEN mode to improve the genotyping model.
gatk CalibrateDragstrModel \
    -R reference.fasta \
    -I input_reads.bam \
    -str str_table.tsv \
    -O dragstr_model.txt
Variant Calling and Filtering
HaplotypeCaller in DRAGEN mode
This step includes three improvements: the DragSTR model, base quality dropout, and foreign read detection.
You can enable all of them with a single option by setting --dragen-mode to true. You also need to pass the DragSTR model generated in the previous step with --dragstr-params-path.
gatk HaplotypeCaller \
    -R ref.fasta \
    -I input_reads.bam \
    -L interval_list \
    -O output_file.vcf \
    --dragen-mode true \
    --dragstr-params-path dragstr_model.txt
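The STR table is built from the reference alone, while the DragSTR model is estimated from each sample's reads, so one table can be reused across samples. A minimal sketch of that pattern, using a hypothetical helper, call_sample, and deriving output names from the BAM file name (an illustrative convention, not a GATK requirement):

```shell
# Hypothetical helper: calibrate a per-sample DragSTR model, then call variants
# in DRAGEN mode with it. The STR table is built once per reference and reused.
call_sample() {
  local ref="$1" str_table="$2" bam="$3" intervals="$4"
  local name
  name=$(basename "$bam" .bam)          # e.g. /data/sample1.bam -> sample1
  local model="${name}.dragstr_model.txt"

  gatk CalibrateDragstrModel \
    -R "$ref" -I "$bam" -str "$str_table" -O "$model" || return 1

  gatk HaplotypeCaller \
    -R "$ref" -I "$bam" -L "$intervals" -O "${name}.vcf" \
    --dragen-mode true \
    --dragstr-params-path "$model"
}

# Example usage (one STR table, many samples):
#   [ -f str_table.tsv ] || gatk ComposeSTRTableFile -R reference.fasta -O str_table.tsv
#   for bam in *.bam; do
#     call_sample reference.fasta str_table.tsv "$bam" interval_list
#   done
```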
Hard Filter Variants
The DRAGEN hardware version applies a hard filter on QUAL as its only variant filtering step. This is possible because the DRAGEN-GATK improvements in HaplotypeCaller make the QUAL score more accurate.
gatk VariantFiltration \
    -V output_file.vcf \
    --filter-expression "QUAL < 10.4139" \
    --filter-name "DRAGENHardQUAL" \
    -O output_filtered.vcf
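Assuming the standard VCF definition of QUAL as a Phred-scaled probability, QUAL = -10·log10(p) where p is the probability that the ALT call is wrong, the 10.4139 threshold can be read as a probability: it equals 10·log10(11) rounded to four decimal places, so the filter flags sites whose stated error probability is above roughly 1/11 (about 0.0909). The awk-based helpers below (hypothetical names) convert in both directions:

```shell
# Hypothetical helpers for the Phred scale used by the VCF QUAL field:
#   QUAL = -10 * log10(p), where p is the probability the ALT call is wrong.
qual_to_prob() {
  awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}
prob_to_qual() {
  # awk's log() is the natural log, so divide by log(10) to get log10.
  awk -v p="$1" 'BEGIN { printf "%.4f\n", -10 * log(p) / log(10) }'
}

qual_to_prob 10.4139      # prints 0.0909 (about 1/11)
prob_to_qual 0.09090909   # prints 10.4139
```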