DRAGEN-GATK introduces several changes to GATK: two new tools, changes to the variant-calling steps of our Best Practices pipeline, and the option to replace BWA with the DRAGMAP aligner. Together, these changes are our new Best Practices recommendations for germline single-sample short variant discovery.
This article gives example usage for the DRAGEN-GATK steps in the germline single-sample short variant discovery Best Practices pipeline. For a ready-to-run implementation of this pipeline, see our DRAGEN-GATK featured workspace on Terra.
DRAGMAP - alignment step
The DRAGMAP tool can be downloaded from Illumina's DRAGMAP repository on GitHub.
Build a new hash table for a reference file
The DRAGMAP alignment command uses reference hash table files rather than a standard FASTA reference. Prebuilt hg38 DRAGEN reference files are available at the following URL:
gs://gcp-public-data--broad-references/hg38/v0/
For other reference versions, you can build the hash table files with DRAGMAP:
dragen-os \
  --build-hash-table true \
  --ht-reference reference.fasta \
  --output-directory /home/data/reference/
Map reads to the reference with DRAGMAP
DRAGMAP is an open-source mapper developed by Illumina. Its main benefit is alt-awareness, as explained in the 'Explaining DRAGMAP' blog post. The mapper aligns reads to the reference using the hash table files built (or downloaded) in the previous step; pass that directory with -r.
dragen-os \
  -r /home/data/reference/ \
  -1 reads_1.fastq.gz \
  -2 reads_2.fastq.gz > result.sam
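The later steps read a BAM file (input_reads.bam), while the command above writes SAM. A minimal sketch of one way to bridge the two, assuming samtools is installed alongside dragen-os, pipes the DRAGMAP output straight into a coordinate-sorted, indexed BAM. The function name align_to_sorted_bam and the DRY_RUN switch are hypothetical helpers for illustration, not part of DRAGMAP or GATK, and the sketch does not mark duplicates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: align paired-end reads with DRAGMAP and write a
# coordinate-sorted, indexed BAM. Assumes dragen-os and samtools are on PATH.
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

align_to_sorted_bam() {
  local ref_dir="$1" r1="$2" r2="$3" out_bam="$4"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "dragen-os -r $ref_dir -1 $r1 -2 $r2 | samtools sort -o $out_bam -"
    echo "samtools index $out_bam"
    return 0
  fi

  # dragen-os writes SAM to stdout; samtools sort reads it from stdin ("-")
  # and writes BAM because the output file name ends in .bam.
  dragen-os -r "$ref_dir" -1 "$r1" -2 "$r2" | samtools sort -o "$out_bam" -
  # Create the .bai index next to the BAM for downstream tools.
  samtools index "$out_bam"
}
```

For example: align_to_sorted_bam /home/data/reference/ reads_1.fastq.gz reads_2.fastq.gz input_reads.bam. Piping avoids writing the intermediate SAM file to disk.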
Data Pre-Processing
BQSR
BQSR is no longer necessary in the DRAGEN-GATK pipeline, because the improved alignment and variant-calling steps already take into account read information that indicates sequencing errors.
Create STR Table File for the Reference
The GATK command ComposeSTRTableFile builds a short tandem repeat (STR) table file for the reference. You can find the hg38 STR table file at the following URL:
gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.str
This STR table is used in the next step to build the DRAGEN STR model for your reads. To build the table for another reference, run:
gatk ComposeSTRTableFile \
  -R reference.fasta \
  -O str_table.tsv
Build the DRAGEN STR Model from the Aligned Reads
The CalibrateDragstrModel command uses the STR table from ComposeSTRTableFile, along with your reference FASTA and input BAM, to estimate the parameters of the STR model. HaplotypeCaller in DRAGEN mode uses the output parameter table from this command to improve its genotyping model.
gatk CalibrateDragstrModel \
  -R reference.fasta \
  -I input_reads.bam \
  -str str_table.tsv \
  -O dragstr_model.txt
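The two STR commands can be chained into one step. A minimal sketch, assuming gatk and samtools are on PATH: GATK tools generally expect a FASTA index (.fai) and a sequence dictionary (.dict) next to the reference, so the sketch creates them if they are missing, reuses an existing STR table (for example the downloaded hg38 file), and then calibrates the model. The function name build_dragstr_model, its arguments, and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the STR table (if needed) and calibrate the
# DragSTR model. Assumes gatk and samtools are on PATH and the BAM is
# sorted and indexed. Set DRY_RUN=1 to print commands instead of running them.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_dragstr_model() {
  local ref="$1" bam="$2" str_table="$3" model="$4"

  # Reference index and dictionary, created only if absent:
  # reference.fasta -> reference.fasta.fai and reference.dict
  [ -f "${ref}.fai" ] || run samtools faidx "$ref"
  [ -f "${ref%.*}.dict" ] || run gatk CreateSequenceDictionary -R "$ref"

  # STR table for the reference; skipped if the file already exists.
  [ -f "$str_table" ] || run gatk ComposeSTRTableFile -R "$ref" -O "$str_table"

  # Estimate the DragSTR parameters from the aligned reads.
  run gatk CalibrateDragstrModel -R "$ref" -I "$bam" -str "$str_table" -O "$model"
}
```

For example: build_dragstr_model reference.fasta input_reads.bam str_table.tsv dragstr_model.txt. Checking for existing files first means the expensive ComposeSTRTableFile run happens once per reference, not once per sample.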
Variant Calling and Filtering
HaplotypeCaller in DRAGEN mode
This step includes three improvements: the DragSTR model, base quality dropout, and foreign read detection.
You can enable all of them with a single option by setting --dragen-mode to true. You also need to pass the DragSTR model generated in the previous step with --dragstr-params-path.
gatk HaplotypeCaller \
  -R reference.fasta \
  -I input_reads.bam \
  -L interval_list \
  -O output_file.vcf \
  --dragen-mode true \
  --dragstr-params-path dragstr_model.txt
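When calling over many regions, HaplotypeCaller is often run once per interval list and the per-interval VCFs are merged afterwards. A minimal sketch of that pattern, assuming gatk is on PATH and the interval lists do not overlap: it reuses the DRAGEN-mode options from the command above and merges the shards with GATK's MergeVcfs. The function name call_by_intervals, the shard file naming, and the DRY_RUN switch are hypothetical, and the loop runs shards one after another rather than in parallel.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run DRAGEN-mode HaplotypeCaller once per interval
# list, then merge the shard VCFs. Assumes gatk is on PATH and the interval
# lists do not overlap. Set DRY_RUN=1 to print commands instead of running them.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

call_by_intervals() {
  local ref="$1" bam="$2" model="$3" out_vcf="$4"
  shift 4
  if [ "$#" -eq 0 ]; then
    echo "call_by_intervals: need at least one interval list" >&2
    return 1
  fi

  local merge_args=() i=0 shard intervals
  for intervals in "$@"; do
    # out.vcf -> out.shard0.vcf, out.shard1.vcf, ...
    shard="${out_vcf%.vcf}.shard${i}.vcf"
    run gatk HaplotypeCaller \
      -R "$ref" -I "$bam" -L "$intervals" -O "$shard" \
      --dragen-mode true \
      --dragstr-params-path "$model"
    merge_args+=(-I "$shard")
    i=$((i + 1))
  done

  # Combine the per-interval calls into a single VCF.
  run gatk MergeVcfs "${merge_args[@]}" -O "$out_vcf"
}
```

For example: call_by_intervals reference.fasta input_reads.bam dragstr_model.txt output_file.vcf chr1.interval_list chr2.interval_list.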
Hard Filter Variants
The DRAGEN hardware version applies a hard filter on QUAL as its only variant-filtering step, because the DRAGEN-GATK improvements in HaplotypeCaller make the QUAL score more accurate.
gatk VariantFiltration \
  -V output_file.vcf \
  --filter-expression "QUAL < 10.4139" \
  --filter-name "DRAGENHardQUAL" \
  -O output_filtered.vcf
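To show what the filter expression does, here is a minimal illustration of the same QUAL threshold written in awk over a plain-text VCF. It is only a sketch of the logic, not a replacement for VariantFiltration: it assumes an uncompressed VCF, overwrites the FILTER column instead of combining it with existing filters, leaves records with a missing QUAL ('.') unchanged, and does not add the ##FILTER header line that VariantFiltration writes. The function name qual_hard_filter is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the QUAL hard filter: records with
# QUAL < threshold get FILTER=DRAGENHardQUAL, all others get PASS.
# Reads an uncompressed VCF on stdin and writes the result to stdout.
set -euo pipefail

qual_hard_filter() {
  local threshold="${1:-10.4139}" name="${2:-DRAGENHardQUAL}"
  awk -v thr="$threshold" -v name="$name" '
    BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }          # header lines pass through unchanged
    $6 == "." { print; next }     # missing QUAL: leave the record alone
    {
      # column 6 is QUAL, column 7 is FILTER
      $7 = ($6 + 0 < thr + 0) ? name : "PASS"
      print
    }
  '
}
```

For example: qual_hard_filter < output_file.vcf > filtered_sketch.vcf. A record with QUAL 5.2 would be labeled DRAGENHardQUAL, and one with QUAL 30.5 would be labeled PASS.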
Comments
Hi, I can't seem to find the STR file. Would you please check the resource bundle? Thanks.
I used this link to search:
gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.str
Hi, is this the complete workflow?
I would perform trimming before mapping.
Also, is Picard's MarkDuplicates no longer necessary?
How do I convert SAM to BAM files?
Can I use the workflow for exome data?
Thank you!
The Google links seem obsolete. The reference hash table files are not there either.
Why not start with how to download the data? This really seems to be written by experts for experts.