VariantFiltration: isAvailable and isNoCall genotype filter expressions do not work as expected
Hi. VariantFiltration with the genotype filter expressions "isAvailable == 0" and "isNoCall == 1" does not mark "./." genotypes with the filter names "isNotAvailable" and "isNoCall", even though I know for certain that my VCF file contains "./." genotypes.
Where can I find information about the isAvailable and isNoCall expressions? I've already checked the latest VariantFiltration documentation and the article on filtering genotypes with VariantFiltration, but had no luck. How can I mark rows with a "./." genotype as filtered?
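Before looking at the filter expressions, it can help to confirm how many fully no-called genotypes each sample actually has. The sketch below is a hypothetical helper (`count_nocalls` is not part of GATK). It reads an uncompressed VCF on stdin and assumes GT is the first FORMAT key, which the VCF specification requires whenever GT is present. A genotype counts as no-call only when every allele is missing ("./.", ".|.", "."), so partial calls such as "./1" are not counted.

```shell
# Hypothetical helper, not a GATK tool: count fully no-called genotypes per
# sample in an uncompressed VCF read from stdin. Assumes GT is the first
# FORMAT key, which the VCF specification requires whenever GT is present.
count_nocalls() {
  awk -F '\t' '
    /^#CHROM/ { for (i = 10; i <= NF; i++) sample[i] = $i; last = NF; next }
    /^#/ { next }
    {
      for (i = 10; i <= NF; i++) {
        split($i, vals, ":")
        g = vals[1]
        # Strip dots and phase/unphased separators; nothing left means no allele was called.
        gsub(/[|.]/, "", g); gsub("/", "", g)
        if (g == "") nocalls[i]++
      }
    }
    END { for (i = 10; i <= last; i++) printf "%s\t%d\n", sample[i], nocalls[i] + 0 }
  '
}
```

For example, `count_nocalls < Undetermined_S0.vcf` prints one "sample, count" line per sample; for a bgzipped file, pipe it through `zcat` first.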
GATK version used:
The Genome Analysis Toolkit (GATK) v4.6.1.0
HTSJDK Version: 4.1.3
Picard Version: 3.3.0
Exact command used:
gatk VariantFiltration \
--genotype-filter-name "isNotAvailable" \
--genotype-filter-expression "isAvailable == 0" \
--genotype-filter-name "isNoCall" \
--genotype-filter-expression "isNoCall == 1" \
--genotype-filter-name "isHomRef" \
--genotype-filter-expression "isHomRef == 1" \
--genotype-filter-name "LowRefPL" \
--genotype-filter-expression "PL[0] < 50.0" \
--filter-name "LowQualByDepth" \
--filter-expression "QD < 10.0" \
--filter-name "BaseQualBiased" \
--filter-expression "BaseQRankSum < 0.0 && BaseQRankSum > 1.65" \
--filter-name "ReadPosBiased" \
--filter-expression "ReadPosRankSum < 0.0 && ReadPosRankSum > 1.65" \
--filter-name "BiasedStrandRatio" \
--filter-expression "SOR > 5.0" \
--variant $annotated_vcf_filepath \
--output $filtered_vcf_filepath
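Separately from the genotype filters, two of the site-level expressions in this command can never be true: no single value is both below 0.0 and above 1.65, so "BaseQRankSum < 0.0 && BaseQRankSum > 1.65" and the matching ReadPosRankSum expression will never flag a record. Assuming the intent was to flag values outside the range 0.0 to 1.65, one option is to give each bound its own named filter. The thresholds below are kept as written; the four filter names are hypothetical. This is a POSIX-shell sketch that only collects the replacement arguments:

```shell
# Hypothetical replacement for the BaseQualBiased and ReadPosBiased filters.
# Each bound is its own named filter, so a record is flagged when either side
# is crossed (roughly what "||" would express), and FILTER shows which bound.
set -- \
  --filter-name "BaseQualBiasedLow"  --filter-expression "BaseQRankSum < 0.0" \
  --filter-name "BaseQualBiasedHigh" --filter-expression "BaseQRankSum > 1.65" \
  --filter-name "ReadPosBiasedLow"   --filter-expression "ReadPosRankSum < 0.0" \
  --filter-name "ReadPosBiasedHigh"  --filter-expression "ReadPosRankSum > 1.65"

# Pass them alongside the unchanged filters, for example:
# gatk VariantFiltration "$@" \
#   --filter-name "BiasedStrandRatio" --filter-expression "SOR > 5.0" \
#   --variant "$annotated_vcf_filepath" --output "$filtered_vcf_filepath"
```

Splitting the bounds also keeps each expression to a single annotation, which tends to make records with a missing BaseQRankSum or ReadPosRankSum easier to reason about.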
Entire program log:
Using GATK jar /home/administrator/tools/gatk/gatk-package-4.6.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/administrator/tools/gatk/gatk-package-4.6.1.0-local.jar VariantFiltration --genotype-filter-name isNotAvailable --genotype-filter-expression isAvailable == 0 --genotype-filter-name isNoCall --genotype-filter-expression isNoCall == 1 --genotype-filter-name isHomRef --genotype-filter-expression isHomRef == 1 --genotype-filter-name LowRefPL --genotype-filter-expression PL[0] < 50.0 --filter-name LowQualByDepth --filter-expression QD < 10.0 --filter-name BaseQualBiased --filter-expression BaseQRankSum < 0.0 && BaseQRankSum > 1.65 --filter-name ReadPosBiased --filter-expression ReadPosRankSum < 0.0 && ReadPosRankSum > 1.65 --filter-name BiasedStrandRatio --filter-expression SOR > 5.0 --variant /home/administrator/varcall_results/deepvariant/annotated/Undetermined_S0.vcf --output /home/administrator/varcall_results/deepvariant/filtered/Undetermined_S0.vcf
10:22:20.394 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/administrator/tools/gatk/gatk-package-4.6.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@755b5f30]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@29bbc63c]
SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@755b5f30]
10:22:20.645 INFO VariantFiltration - ------------------------------------------------------------
10:22:20.648 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.6.1.0
10:22:20.648 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
10:22:20.648 INFO VariantFiltration - Executing as administrator@compute-vm-8-32-500-hdd-1736852799422 on Linux v6.8.0-51-generic amd64
10:22:20.648 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v17.0.14+7-Ubuntu-124.04
10:22:20.649 INFO VariantFiltration - Start Date/Time: February 21, 2025 at 10:22:20 AM ALMT
10:22:20.649 INFO VariantFiltration - ------------------------------------------------------------
10:22:20.649 INFO VariantFiltration - ------------------------------------------------------------
10:22:20.650 INFO VariantFiltration - HTSJDK Version: 4.1.3
10:22:20.650 INFO VariantFiltration - Picard Version: 3.3.0
10:22:20.650 INFO VariantFiltration - Built for Spark Version: 3.5.0
10:22:20.652 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:22:20.652 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:22:20.652 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:22:20.653 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:22:20.653 INFO VariantFiltration - Deflater: IntelDeflater
10:22:20.653 INFO VariantFiltration - Inflater: IntelInflater
10:22:20.653 INFO VariantFiltration - GCS max retries/reopens: 20
10:22:20.653 INFO VariantFiltration - Requester pays: disabled
10:22:20.653 INFO VariantFiltration - Initializing engine
10:22:20.741 INFO FeatureManager - Using codec VCFCodec to read file file:///home/administrator/varcall_results/deepvariant/annotated/Undetermined_S0.vcf
10:22:20.777 INFO VariantFiltration - Done initializing engine
10:22:20.833 INFO ProgressMeter - Starting traversal
10:22:20.834 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
10:22:21.472 INFO ProgressMeter - 7:44002374 0.0 2413 228000.0
10:22:21.472 INFO ProgressMeter - Traversal complete. Processed 2413 total variants in 0.0 minutes.
10:22:21.616 INFO VariantFiltration - Shutting down engine
[February 21, 2025 at 10:22:21 AM ALMT] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=562036736
-
There seems to be a bug with those expressions. I will keep this thread updated with the GitHub issue. Hopefully we will have a solution in a patch release, and we may advise using a branch with the fix before it is pushed to a final release.
Regards.
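Until a fix is released, one possible workaround is to tag no-call genotypes outside GATK. The sketch below is a hypothetical helper (`mark_nocalls` is not a GATK or htsjdk tool). It reads an uncompressed VCF on stdin and writes it to stdout, adding a filter name to the FT FORMAT field of every fully no-called genotype ("./.", ".|.", "."). Several assumptions apply:

- GT is the first FORMAT key.
- FT is appended to FORMAT when it is missing, with other samples padded with ".".
- An FT header line is added if the input lacks one.
- The filter name defaults to "isNoCall" to match the command in the question.

```shell
# Hypothetical workaround sketch, not a GATK feature: add a genotype-level
# filter name to FT for every fully no-called genotype ("./.", ".|.", ".").
# Reads an uncompressed VCF on stdin and writes the result to stdout.
mark_nocalls() {
  awk -F '\t' -v OFS='\t' -v name="${1:-isNoCall}" '
    /^##FORMAT=<ID=FT,/ { has_ft_hdr = 1 }
    /^#CHROM/ {
      # Declare FT just before the column header if the input lacks it.
      if (!has_ft_hdr)
        print "##FORMAT=<ID=FT,Number=1,Type=String,Description=\"Genotype-level filter\">"
      print; next
    }
    /^#/ { print; next }
    NF < 10 { print; next }  # sites-only record: no genotypes to tag
    {
      nfmt = split($9, keys, ":")
      ft = 0
      for (k = 1; k <= nfmt; k++) if (keys[k] == "FT") ft = k
      if (!ft) { $9 = $9 ":FT"; ft = nfmt + 1 }
      for (i = 10; i <= NF; i++) {
        n = split($i, vals, ":")
        for (k = n + 1; k <= ft; k++) vals[k] = "."  # restore dropped trailing keys
        if (n < ft) n = ft
        g = vals[1]
        # No-call if nothing but dots and separators remain in GT.
        gsub(/[|.]/, "", g); gsub("/", "", g)
        if (g == "")
          vals[ft] = (vals[ft] == "." || vals[ft] == "PASS") ? name : vals[ft] ";" name
        out = vals[1]
        for (k = 2; k <= n; k++) out = out ":" vals[k]
        $i = out
      }
      print
    }
  '
}
```

For example, `mark_nocalls isNoCall < in.vcf > out.vcf`. It only edits FT and leaves GT untouched, so the genotypes stay "./." and are simply labeled as filtered.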
-
Thank you!