Before using VariantFiltration, please read the entirety of the discussion in GitHub that describes VariantFiltration's unintuitive behavior when processing compound expressions.
This tutorial illustrates how to filter on genotype, e.g. heterozygous genotype call. The steps apply to either single-sample or multi-sample callsets.
First, the genotype is annotated with a filter expression using VariantFiltration. Then, the filtered genotypes are made into no-call (./.
) genotypes with SelectVariants so that downstream tools may discount them.
We use example variant record FORMAT fields from trio.vcf
to illustrate.
GT:AD:DP:GQ:PL 0/1:17,15:32:99:399,0,439 0/1:11,12:23:99:291,0,292 1/1:0,30:30:90:948,90,0
1. Annotate genotypes using VariantFiltration
If we want to filter heterozygous genotypes, we use VariantFiltration's --genotype-filter-expression "isHet == 1"
option. We can specify the annotation value for the tool to label the heterozygous genotypes with with the --genotype-filter-name
option. Here, this parameter's value is set to "isHetFilter"
.
gatk VariantFiltration \ -V trio.vcf \ -O trio_VF.vcf \ --genotype-filter-expression "isHet == 1" \ --genotype-filter-name "isHetFilter"GT:AD:DP:FT:GQ:PL 0/1:17,15:32:isHetFilter:99:399,0,439 0/1:11,12:23:isHetFilter:99:291,0,292 1/1:0,30:30:PASS:90:948,90,0
We see that HET (0/1
) genotype calls get a isHetFilter
in the FT
field and other genotype calls get a PASS
in the genotype field.
The VariantFiltration tool document lists the various options to filter on the FORMAT (aka genotype call) field:
We have put in convenience methods so that one can now filter out hets ("isHet == 1"), refs ("isHomRef == 1"), or homs ("isHomVar == 1"). Also available are expressions isCalled, isNoCall, isMixed, and isAvailable, in accordance with the methods of the Genotype object.
2. Transform filtered genotypes to no call
Running SelectVariants with --set-filtered-gt-to-nocall
will further transform the flagged genotypes with a null genotype call. This conversion is necessary because downstream tools do not parse the FORMAT-level filter field.
gatk SelectVariants \ -V trio_VF.vcf \ --set-filtered-gt-to-nocall \ -O trioGGVCF_VF_SV.vcf
The result is that the GT
genotypes of the isHetFiltered
genotype records become null or no call (./.
) as follows.
GT:AD:DP:FT:GQ:PL ./.:17,15:32:isHetFilter:99:399,0,439 ./.:11,12:23:isHetFilter:99:291,0,292 1/1:0,30:30:PASS:90:948,90,0
0 comments
Please sign in to leave a comment.