HOW TO EXCLUDE FILTERED VARIANTS WITH VARIANTFILTRATION?
Hi.
I am using GATK 4.2.6.1.
I want to exclude the variants filtered by VariantFiltration without having to run SelectVariants. My VCF file is 3 TB, and it makes absolutely no sense to have VariantFiltration produce another 3 TB file and only then use SelectVariants to exclude the variants that VariantFiltration marked. This generates huge, useless intermediate files for no reason whatsoever.
The VariantFiltration documentation states "Filtered records will be preserved in the output unless their removal is requested in the command line." but it doesn't bother telling us how to request the removal! This is extremely frustrating!
Can someone tell me what I am missing here?
-
Hello,
It appears that the documentation excerpt you shared is actually inaccurate. VariantFiltration does not have an option to exclude the variants. I'll make sure that documentation gets updated.
Having separate steps for filtering and removing variants is a deliberate choice for maintaining data provenance, as it allows you to keep a record of which variants were removed by which filters. However, I think you make a valid point that it can be less efficient to create that intermediate file if you're not interested in retaining that information.
If you would like to see that feature added to VariantFiltration, you can create a feature request issue on the GATK GitHub repo.
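In the meantime, the removal step itself is simple: VariantFiltration records its verdict in the FILTER column (the seventh column of each VCF record), and the usual second step is SelectVariants with --exclude-filtered. As a rough illustration only, and not a GATK feature, the same idea can be sketched as a streaming Python filter that keeps header lines and records whose FILTER is PASS or "." and drops the rest. The function names here are hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF text stream.

```python
def keep_record(line):
    """Return True for header lines and for records that passed filtering."""
    if line.startswith("#"):
        return True  # keep all meta-information and column header lines
    fields = line.rstrip("\n").split("\t")
    # Column 7 (index 6) of a VCF record is FILTER.
    # Assumption: "PASS" and "." (no filters applied) both count as unfiltered.
    return len(fields) > 6 and fields[6] in ("PASS", ".")


def drop_filtered(lines):
    """Lazily yield only the lines that keep_record accepts."""
    for line in lines:
        if keep_record(line):
            yield line


# Tiny in-memory demonstration with made-up records.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tC\tT\t10\tLowQual\t.\n",
]
for kept in drop_filtered(example):
    print(kept, end="")
```

Because drop_filtered is a generator, it processes one line at a time, so memory use does not depend on the size of the input. To apply it to a real stream, something like sys.stdout.writelines(drop_filtered(sys.stdin)) would work. Note that this discards the provenance information described above, so it only makes sense if you do not need a record of which filter removed each variant.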