Genotype filtering still writes "PASS" in the FILTER field
Hey, I am trying to run VariantFiltration on the GQX value. However, the FILTER field for the filtered variants is still "PASS", even though the documentation says:
"-G-filter-name: Names to use for the list of sample/genotype filters (must be a 1-to-1 mapping); this name is put in the FILTER field for variants that get filtered"
How can I use a genotype filter to write the filter name to the FILTER field rather than to the FORMAT field?
I am using: gatk 4.2.0.0
Command: gatk VariantFiltration -V noFilter.vcf -O filterTest.vcf -G-filter 'GQX < 10.0000' -G-filter-name 'LowGQX'
Logs:
16:46:16.326 INFO VariantFiltration - ------------------------------------------------------------
16:46:16.327 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.2.0.0
16:46:16.327 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
16:46:16.327 INFO VariantFiltration - Executing as marissa@x on Linux v5.15.0-56-generic amd64
16:46:16.327 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
16:46:16.327 INFO VariantFiltration - Start Date/Time: March 21, 2023 4:46:15 PM CET
16:46:16.327 INFO VariantFiltration - ------------------------------------------------------------
16:46:16.327 INFO VariantFiltration - ------------------------------------------------------------
16:46:16.327 INFO VariantFiltration - HTSJDK Version: 2.24.0
16:46:16.327 INFO VariantFiltration - Picard Version: 2.25.0
16:46:16.327 INFO VariantFiltration - Built for Spark Version: 2.4.5
16:46:16.327 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:46:16.327 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:46:16.327 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:46:16.327 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:46:16.327 INFO VariantFiltration - Deflater: IntelDeflater
16:46:16.327 INFO VariantFiltration - Inflater: IntelInflater
16:46:16.328 INFO VariantFiltration - GCS max retries/reopens: 20
16:46:16.328 INFO VariantFiltration - Requester pays: disabled
16:46:16.328 INFO VariantFiltration - Initializing engine
16:46:16.556 INFO FeatureManager - Using codec VCFCodec to read file file:///home/marissa/66111_S1/66111_S1.noFilter.vcf
16:46:16.624 INFO VariantFiltration - Done initializing engine
16:46:16.682 INFO ProgressMeter - Starting traversal
16:46:16.682 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:46:17.126 INFO ProgressMeter - chr20:61460338 0.0 8515 1153273.1
16:46:17.126 INFO ProgressMeter - Traversal complete. Processed 8515 total variants in 0.0 minutes.
16:46:17.236 INFO VariantFiltration - Shutting down engine
[March 21, 2023 4:46:17 PM CET] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=527433728
Thank you
-
Hi Marissa Eronika,
This is expected behavior for the tool: genotype filters are recorded per sample in the FORMAT field, which has its own filter attribute (FT), so the site-level FILTER column stays "PASS". If you want to generate a site-level filter based on genotype-level data, the easiest way to do that would be to leverage JEXL: https://gatk.broadinstitute.org/hc/en-us/articles/360035891011-JEXL-filtering-expressions
Your new argument would look something like
--filter-name LowGQX --filter-expression "vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Double.parseDouble(g.getExtendedAttribute('LowGQX')) < 10.0"
I didn't test the above and I'm assuming that your LowGQX annotation is a double since you were comparing to 10.000 above, but that's at least a good place to get you started.
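To make the FORMAT-versus-FILTER distinction concrete, here is a minimal Python sketch (standard library only, not part of GATK) that splits one VCF data line and reports the site-level FILTER column next to each sample's FT value. The record below is made up for illustration, including the GQX value, and the sketch assumes the genotype filter name ends up under the FT key in FORMAT.

```python
def site_and_sample_filters(vcf_line):
    """Return (site-level FILTER, {sample index: FT value}) for one VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    site_filter = cols[6]              # column 7 holds the site-level FILTER
    keys = cols[8].split(":")          # column 9 holds the FORMAT keys
    sample_ft = {}
    for i, sample in enumerate(cols[9:]):
        values = dict(zip(keys, sample.split(":")))
        sample_ft[i] = values.get("FT", ".")   # "." if the sample has no FT entry
    return site_filter, sample_ft


# Hypothetical record: the genotype filter is in FT, FILTER is still PASS.
record = "chr20\t61460338\t.\tA\tG\t50\tPASS\t.\tGT:GQX:FT\t0/1:7:LowGQX"
print(site_and_sample_filters(record))
```

Run on a few lines of the output VCF, this should show "PASS" in the first position and the genotype filter name in the per-sample part, which matches the behavior described above.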
-Laura
-
Hello Laura Gauthier,
Thank you for the reply! Unfortunately, the syntax is wrong and I do not know how to begin adjusting it, because I could not find any documentation on, for example, which functions I should call. The documentation you mentioned above is not very informative for my case, I think. Where can I find information on this? Thank you!
Logs:
Using GATK jar /home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar VariantFiltration -V /home/marissa/66111_S1/66111_S1.noFilter.vcf -O /home/marissa/66111_S1/66111_S1.filterTest.vcf --filter-name LowGQX --filter-expression vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Double.parseDouble(g.getExtendedAttribute('LowGQX')) < 10.0
16:57:32.157 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 30, 2023 4:57:32 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:57:32.239 INFO VariantFiltration - ------------------------------------------------------------
16:57:32.239 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.2.0.0
16:57:32.239 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
16:57:32.239 INFO VariantFiltration - Executing as marissa@moldiag on Linux v5.15.0-56-generic amd64
16:57:32.239 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
16:57:32.239 INFO VariantFiltration - Start Date/Time: March 30, 2023 4:57:32 PM CEST
16:57:32.239 INFO VariantFiltration - ------------------------------------------------------------
16:57:32.239 INFO VariantFiltration - ------------------------------------------------------------
16:57:32.239 INFO VariantFiltration - HTSJDK Version: 2.24.0
16:57:32.239 INFO VariantFiltration - Picard Version: 2.25.0
16:57:32.239 INFO VariantFiltration - Built for Spark Version: 2.4.5
16:57:32.239 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:57:32.239 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:57:32.240 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:57:32.240 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:57:32.240 INFO VariantFiltration - Deflater: IntelDeflater
16:57:32.240 INFO VariantFiltration - Inflater: IntelInflater
16:57:32.240 INFO VariantFiltration - GCS max retries/reopens: 20
16:57:32.240 INFO VariantFiltration - Requester pays: disabled
16:57:32.240 INFO VariantFiltration - Initializing engine
16:57:32.418 INFO FeatureManager - Using codec VCFCodec to read file file:///home/marissa/66111_S1/66111_S1.noFilter.vcf
16:57:32.435 INFO VariantFiltration - Done initializing engine
16:57:32.456 INFO VariantFiltration - Shutting down engine
[March 30, 2023 4:57:32 PM CEST] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=616038400
java.lang.IllegalArgumentException: Argument LowGQXhas a bad value. Invalid expression used (vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Double.parseDouble(g.getExtendedAttribute('LowGQX')) < 10.0). Please see the JEXL docs for correct syntax.
at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:283)
at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:243)
at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:259)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.onTraversalStart(VariantFiltration.java:334)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Is LowGQX a float or an integer? That could be the main problem.
The code that's being called in the expression all comes from htsjdk, mostly the VariantContext class: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/variant/variantcontext/VariantContext.java and the Genotype class: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/variant/variantcontext/Genotype.java
-
The VCF file declares it as Integer, and I think you are missing a ")".
So I changed it to: "vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)"
It still gives the same error message.
I also tried "hasAttribute('GQX')" instead, but that does not work either.
Logs:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar VariantFiltration -V /home/marissa/66111_S1/66111_S1.filterTest.vcf -O /home/marissa/66111_S1/66111_S1.test.vcf --filter-name LowGQX --filter-expression vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)
16:13:54.766 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 31, 2023 4:13:54 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:13:54.871 INFO VariantFiltration - ------------------------------------------------------------
16:13:54.871 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.2.0.0
16:13:54.871 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
16:13:54.871 INFO VariantFiltration - Executing as marissa@moldiag on Linux v5.15.0-69-generic amd64
16:13:54.871 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
16:13:54.871 INFO VariantFiltration - Start Date/Time: March 31, 2023 4:13:54 PM CEST
16:13:54.871 INFO VariantFiltration - ------------------------------------------------------------
16:13:54.871 INFO VariantFiltration - ------------------------------------------------------------
16:13:54.872 INFO VariantFiltration - HTSJDK Version: 2.24.0
16:13:54.872 INFO VariantFiltration - Picard Version: 2.25.0
16:13:54.872 INFO VariantFiltration - Built for Spark Version: 2.4.5
16:13:54.872 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:13:54.872 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:13:54.872 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:13:54.872 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:13:54.872 INFO VariantFiltration - Deflater: IntelDeflater
16:13:54.872 INFO VariantFiltration - Inflater: IntelInflater
16:13:54.872 INFO VariantFiltration - GCS max retries/reopens: 20
16:13:54.872 INFO VariantFiltration - Requester pays: disabled
16:13:54.872 INFO VariantFiltration - Initializing engine
16:13:55.044 INFO FeatureManager - Using codec VCFCodec to read file file:///home/marissa/66111_S1/66111_S1.filterTest.vcf
16:13:55.072 INFO VariantFiltration - Done initializing engine
16:13:55.094 INFO VariantFiltration - Shutting down engine
[March 31, 2023 4:13:55 PM CEST] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=613941248
java.lang.IllegalArgumentException: Argument LowGQXhas a bad value. Invalid expression used (vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)). Please see the JEXL docs for correct syntax.
at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:283)
at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:243)
at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:259)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.onTraversalStart(VariantFiltration.java:334)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
My mistake: it should be hasExtendedAttribute("LowGQX") https://github.com/samtools/htsjdk/blob/6d3fc7bc1f613ecfce1c22d368f3ae17cb86823d/src/main/java/htsjdk/variant/variantcontext/Genotype.java#L459
-
Unfortunately, it is still not working.
Why is it "LowGQX"? The VCF file is not filtered yet and does not have "LowGQX", right? It does not work with just "GQX" either. I also tried the filtered VCF with "LowGQX", and that does not work either.
gatk VariantFiltration -V filterTest.vcf -O test.vcf --filter-name LowGQX --filter-expression "vc.getGenotypes().stream().anyMatch(g -> g.hasExtendedAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)"
Logs:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar VariantFiltration -V /home/marissa/66111_S1/66111_S1.filterTest.vcf -O /home/marissa/66111_S1/66111_S1.test.vcf --filter-name LowGQX --filter-expression vc.getGenotypes().stream().anyMatch(g -> g.hasExtendedAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)
12:43:08.696 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 01, 2023 12:43:08 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
12:43:08.782 INFO VariantFiltration - ------------------------------------------------------------
12:43:08.782 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.2.0.0
12:43:08.782 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
12:43:08.782 INFO VariantFiltration - Executing as marissa@moldiag on Linux v5.15.0-69-generic amd64
12:43:08.782 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
12:43:08.783 INFO VariantFiltration - Start Date/Time: April 1, 2023 12:43:08 PM CEST
12:43:08.783 INFO VariantFiltration - ------------------------------------------------------------
12:43:08.783 INFO VariantFiltration - ------------------------------------------------------------
12:43:08.783 INFO VariantFiltration - HTSJDK Version: 2.24.0
12:43:08.783 INFO VariantFiltration - Picard Version: 2.25.0
12:43:08.783 INFO VariantFiltration - Built for Spark Version: 2.4.5
12:43:08.783 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:43:08.783 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:43:08.783 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:43:08.783 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:43:08.783 INFO VariantFiltration - Deflater: IntelDeflater
12:43:08.783 INFO VariantFiltration - Inflater: IntelInflater
12:43:08.783 INFO VariantFiltration - GCS max retries/reopens: 20
12:43:08.783 INFO VariantFiltration - Requester pays: disabled
12:43:08.783 INFO VariantFiltration - Initializing engine
12:43:08.956 INFO FeatureManager - Using codec VCFCodec to read file file:///home/marissa/66111_S1/66111_S1.filterTest.vcf
12:43:08.987 INFO VariantFiltration - Done initializing engine
12:43:09.008 INFO VariantFiltration - Shutting down engine
[April 1, 2023 12:43:09 PM CEST] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=610795520
java.lang.IllegalArgumentException: Argument LowGQXhas a bad value. Invalid expression used (vc.getGenotypes().stream().anyMatch(g -> g.hasExtendedAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)). Please see the JEXL docs for correct syntax.
at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:283)
at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:243)
at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:259)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.onTraversalStart(VariantFiltration.java:334)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
This is incredibly hard for me to debug without any data.
You're right: the JEXL expression should not reference LowGQX, since that's the filter you're trying to apply. All this is under the assumption that "GQX" is an annotation you've added, since that's not part of any of our standard pipelines. Try the filter with just the hasExtendedAttribute part, which I expect should then filter all sites that have the GQX FORMAT annotation. Once that works, add the value comparison back in. I don't think you need the parseInt call.
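If the JEXL route stays stuck, one possible fallback outside GATK is to post-process the VCF directly. The following is a rough Python sketch (standard library only), not something VariantFiltration does itself: the function name promote_low_gqx, the threshold of 10, and the example records are all assumptions for illustration. It marks a site with LowGQX in the FILTER column when any sample has a GQX value below the threshold.

```python
def promote_low_gqx(line, threshold=10.0, name="LowGQX"):
    """Add `name` to the site-level FILTER column if any sample has GQX < threshold."""
    line = line.rstrip("\n")
    if line.startswith("#"):
        return line                      # header lines pass through unchanged
    cols = line.split("\t")
    if len(cols) < 10:
        return line                      # no FORMAT/sample columns to inspect
    keys = cols[8].split(":")
    if "GQX" not in keys:
        return line                      # site carries no GQX annotation
    idx = keys.index("GQX")
    low = False
    for sample in cols[9:]:
        fields = sample.split(":")
        # Trailing FORMAT fields may be omitted; "." means the value is missing.
        if idx < len(fields) and fields[idx] != ".":
            if float(fields[idx]) < threshold:
                low = True
                break
    if low:
        current = cols[6]
        if current in ("PASS", "."):
            cols[6] = name               # replace PASS or missing with the filter name
        elif name not in current.split(";"):
            cols[6] = current + ";" + name   # keep filters that are already set
    return "\t".join(cols)


# Hypothetical records: one below the threshold, one above it.
low = "chr20\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQX\t0/1:7"
ok = "chr20\t200\t.\tC\tT\t50\tPASS\t.\tGT:GQX\t0/1:30"
for rec in (low, ok):
    print(promote_low_gqx(rec).split("\t")[6])
```

A complete version would also add a matching ##FILTER header line for LowGQX so that the output remains a well-described VCF; that part is left out of the sketch.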