Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Genotype filtering still writes "PASS" in the FILTER field

7 comments

  • Laura Gauthier

    Hi Marissa Eronika,

    This is expected behavior for the tool: genotype-level filters are recorded per sample in the FORMAT field's own filter attribute (FT), not in the site-level FILTER column. If you want to generate a site-level filter based on genotype-level data, the easiest way to do that would be to leverage JEXL: https://gatk.broadinstitute.org/hc/en-us/articles/360035891011-JEXL-filtering-expressions

    Your new argument would look something like:

    --filter-name LowGQX --filter-expression "vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Double.parseDouble(g.getExtendedAttribute('LowGQX')) < 10.0"

    I didn't test the above and I'm assuming that your LowGQX annotation is a double since you were comparing to 10.000 above, but that's at least a good place to get you started.

    -Laura
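
To make the distinction between a per-sample value and the site-level FILTER column concrete, here is a minimal Python sketch. It is not GATK or htsjdk code: it only shows the intended logic on one tab-separated VCF data line. The function name, the example record, the GQX key, and the threshold of 10 are assumptions chosen for illustration.

```python
def apply_site_filter(vcf_line, key="GQX", threshold=10.0, filter_name="LowGQX"):
    """Add filter_name to the FILTER column if any sample's `key` value is
    below `threshold`. Hypothetical helper for illustration, not part of GATK."""
    cols = vcf_line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample...
    if len(cols) < 10:
        return "\t".join(cols)  # no genotype columns to inspect
    format_keys = cols[8].split(":")
    if key not in format_keys:
        return "\t".join(cols)
    idx = format_keys.index(key)
    failed = False
    for sample in cols[9:]:
        values = sample.split(":")
        # Trailing FORMAT values may be omitted; "." marks a missing value
        if idx < len(values) and values[idx] not in (".", ""):
            if float(values[idx]) < threshold:
                failed = True
                break
    if failed:
        existing = cols[6]
        if existing in (".", "PASS"):
            cols[6] = filter_name
        elif filter_name not in existing.split(";"):
            cols[6] = existing + ";" + filter_name
    return "\t".join(cols)


record = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ:GQX\t0/1:30:7"
print(apply_site_filter(record).split("\t")[6])  # prints "LowGQX"
```

The sketch treats the value as a float so that it works whether the annotation is declared as Integer or Float; a real pipeline would read the declared type from the VCF header.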

  • Marissa Eronika

    Hello Laura Gauthier,

    Thank you for the reply! Unfortunately, the expression is rejected as invalid, and I do not know how to adjust it, because I could not find any documentation on, for example, which functions I can call. The documentation you mentioned above is not very informative for my case, I think. Where can I find information on this? Thank you!

    Logs:

    Using GATK jar /home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar VariantFiltration -V /home/marissa/66111_S1/66111_S1.noFilter.vcf -O /home/marissa/66111_S1/66111_S1.filterTest.vcf --filter-name LowGQX --filter-expression vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Double.parseDouble(g.getExtendedAttribute('LowGQX')) < 10.0
    16:57:32.157 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Mar 30, 2023 4:57:32 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:57:32.239 INFO  VariantFiltration - ------------------------------------------------------------
    16:57:32.239 INFO  VariantFiltration - The Genome Analysis Toolkit (GATK) v4.2.0.0
    16:57:32.239 INFO  VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:57:32.239 INFO  VariantFiltration - Executing as marissa@moldiag on Linux v5.15.0-56-generic amd64
    16:57:32.239 INFO  VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
    16:57:32.239 INFO  VariantFiltration - Start Date/Time: March 30, 2023 4:57:32 PM CEST
    16:57:32.239 INFO  VariantFiltration - ------------------------------------------------------------
    16:57:32.239 INFO  VariantFiltration - ------------------------------------------------------------
    16:57:32.239 INFO  VariantFiltration - HTSJDK Version: 2.24.0
    16:57:32.239 INFO  VariantFiltration - Picard Version: 2.25.0
    16:57:32.239 INFO  VariantFiltration - Built for Spark Version: 2.4.5
    16:57:32.239 INFO  VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:57:32.239 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:57:32.240 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:57:32.240 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:57:32.240 INFO  VariantFiltration - Deflater: IntelDeflater
    16:57:32.240 INFO  VariantFiltration - Inflater: IntelInflater
    16:57:32.240 INFO  VariantFiltration - GCS max retries/reopens: 20
    16:57:32.240 INFO  VariantFiltration - Requester pays: disabled
    16:57:32.240 INFO  VariantFiltration - Initializing engine
    16:57:32.418 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/marissa/66111_S1/66111_S1.noFilter.vcf
    16:57:32.435 INFO  VariantFiltration - Done initializing engine
    16:57:32.456 INFO  VariantFiltration - Shutting down engine
    [March 30, 2023 4:57:32 PM CEST] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=616038400
    java.lang.IllegalArgumentException: Argument LowGQXhas a bad value. Invalid expression used (vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Double.parseDouble(g.getExtendedAttribute('LowGQX')) < 10.0). Please see the JEXL docs for correct syntax.
        at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:283)
        at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:243)
        at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:259)
        at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.onTraversalStart(VariantFiltration.java:334)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

  • Laura Gauthier

    Is LowGQX a float or an integer?  That could be the main problem.

    The code that's being called in the expression all comes from htsjdk, mostly the VariantContext class: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/variant/variantcontext/VariantContext.java and the Genotype class: https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/variant/variantcontext/Genotype.java
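
Whether an annotation is a float or an integer is declared on its ##FORMAT line in the VCF header. The following Python sketch looks up that declared type. The helper name and the header lines are illustrative assumptions (the GQX description text is invented for the example), so adapt them to the actual file.

```python
import re


def format_field_type(header_lines, field_id):
    """Return the declared Type of a FORMAT field, or None if it is undeclared.
    Hypothetical helper for illustration only."""
    pattern = re.compile(r"^##FORMAT=<(.*)>\s*$")
    for line in header_lines:
        match = pattern.match(line)
        if not match:
            continue
        # Collect key=value pairs; quoted values may themselves contain commas
        parts = re.findall(r'(\w+)=("[^"]*"|[^,]*)', match.group(1))
        attrs = {k: v.strip('"') for k, v in parts}
        if attrs.get("ID") == field_id:
            return attrs.get("Type")
    return None


header = [
    '##fileformat=VCFv4.2',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=GQX,Number=1,Type=Integer,Description="Example description, illustrative only">',
]
print(format_field_type(header, "GQX"))  # prints "Integer"
```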

  • Marissa Eronika

    The VCF file declares it as an Integer, and I think a closing ")" is missing from your expression.

    So I changed it to: "vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)"

    It still gives the same error message.

    I also tried using "hasAttribute('GQX')" instead, but it still does not work.

    Logs:

    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar VariantFiltration -V /home/marissa/66111_S1/66111_S1.filterTest.vcf -O /home/marissa/66111_S1/66111_S1.test.vcf --filter-name LowGQX --filter-expression vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)
    16:13:54.766 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Mar 31, 2023 4:13:54 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:13:54.871 INFO  VariantFiltration - ------------------------------------------------------------
    16:13:54.871 INFO  VariantFiltration - The Genome Analysis Toolkit (GATK) v4.2.0.0
    16:13:54.871 INFO  VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:13:54.871 INFO  VariantFiltration - Executing as marissa@moldiag on Linux v5.15.0-69-generic amd64
    16:13:54.871 INFO  VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
    16:13:54.871 INFO  VariantFiltration - Start Date/Time: March 31, 2023 4:13:54 PM CEST
    16:13:54.871 INFO  VariantFiltration - ------------------------------------------------------------
    16:13:54.871 INFO  VariantFiltration - ------------------------------------------------------------
    16:13:54.872 INFO  VariantFiltration - HTSJDK Version: 2.24.0
    16:13:54.872 INFO  VariantFiltration - Picard Version: 2.25.0
    16:13:54.872 INFO  VariantFiltration - Built for Spark Version: 2.4.5
    16:13:54.872 INFO  VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:13:54.872 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:13:54.872 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:13:54.872 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:13:54.872 INFO  VariantFiltration - Deflater: IntelDeflater
    16:13:54.872 INFO  VariantFiltration - Inflater: IntelInflater
    16:13:54.872 INFO  VariantFiltration - GCS max retries/reopens: 20
    16:13:54.872 INFO  VariantFiltration - Requester pays: disabled
    16:13:54.872 INFO  VariantFiltration - Initializing engine
    16:13:55.044 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/marissa/66111_S1/66111_S1.filterTest.vcf
    16:13:55.072 INFO  VariantFiltration - Done initializing engine
    16:13:55.094 INFO  VariantFiltration - Shutting down engine
    [March 31, 2023 4:13:55 PM CEST] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=613941248
    java.lang.IllegalArgumentException: Argument LowGQXhas a bad value. Invalid expression used (vc.getGenotypes().stream().anyMatch(g -> g.hasAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)). Please see the JEXL docs for correct syntax.
        at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:283)
        at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:243)
        at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:259)
        at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.onTraversalStart(VariantFiltration.java:334)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

     

  • Marissa Eronika

    Unfortunately, it is still not working.

    Why is the attribute "LowGQX"? The VCF file is not filtered yet, so it does not contain "LowGQX", right? It does not work with just "GQX" either. I also tried the already-filtered VCF with "LowGQX", and that does not work either.

    gatk VariantFiltration -V filterTest.vcf -O test.vcf --filter-name LowGQX --filter-expression "vc.getGenotypes().stream().anyMatch(g -> g.hasExtendedAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)"

    logs:

    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar VariantFiltration -V /home/marissa/66111_S1/66111_S1.filterTest.vcf -O /home/marissa/66111_S1/66111_S1.test.vcf --filter-name LowGQX --filter-expression vc.getGenotypes().stream().anyMatch(g -> g.hasExtendedAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)
    12:43:08.696 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/marissa/NGA_Pipeline/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 01, 2023 12:43:08 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    12:43:08.782 INFO  VariantFiltration - ------------------------------------------------------------
    12:43:08.782 INFO  VariantFiltration - The Genome Analysis Toolkit (GATK) v4.2.0.0
    12:43:08.782 INFO  VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
    12:43:08.782 INFO  VariantFiltration - Executing as marissa@moldiag on Linux v5.15.0-69-generic amd64
    12:43:08.782 INFO  VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
    12:43:08.783 INFO  VariantFiltration - Start Date/Time: April 1, 2023 12:43:08 PM CEST
    12:43:08.783 INFO  VariantFiltration - ------------------------------------------------------------
    12:43:08.783 INFO  VariantFiltration - ------------------------------------------------------------
    12:43:08.783 INFO  VariantFiltration - HTSJDK Version: 2.24.0
    12:43:08.783 INFO  VariantFiltration - Picard Version: 2.25.0
    12:43:08.783 INFO  VariantFiltration - Built for Spark Version: 2.4.5
    12:43:08.783 INFO  VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    12:43:08.783 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    12:43:08.783 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    12:43:08.783 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    12:43:08.783 INFO  VariantFiltration - Deflater: IntelDeflater
    12:43:08.783 INFO  VariantFiltration - Inflater: IntelInflater
    12:43:08.783 INFO  VariantFiltration - GCS max retries/reopens: 20
    12:43:08.783 INFO  VariantFiltration - Requester pays: disabled
    12:43:08.783 INFO  VariantFiltration - Initializing engine
    12:43:08.956 INFO  FeatureManager - Using codec VCFCodec to read file file:///home/marissa/66111_S1/66111_S1.filterTest.vcf
    12:43:08.987 INFO  VariantFiltration - Done initializing engine
    12:43:09.008 INFO  VariantFiltration - Shutting down engine
    [April 1, 2023 12:43:09 PM CEST] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=610795520
    java.lang.IllegalArgumentException: Argument LowGQXhas a bad value. Invalid expression used (vc.getGenotypes().stream().anyMatch(g -> g.hasExtendedAttribute('LowGQX') && Integer.parseInt(g.getExtendedAttribute('LowGQX')) < 10)). Please see the JEXL docs for correct syntax.
        at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:283)
        at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:243)
        at htsjdk.variant.variantcontext.VariantContextUtils.initializeMatchExps(VariantContextUtils.java:259)
        at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.onTraversalStart(VariantFiltration.java:334)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1056)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

  • Laura Gauthier

    This is incredibly hard for me to debug without any data.

    You're right -- the JEXL expression should not reference LowGQX, since that's the filter you're trying to apply. All of this assumes that "GQX" is an annotation you've added, since it's not part of any of our standard pipelines. Try the filter with just the hasAttribute part, which I expect should then filter all sites that have the GQX FORMAT annotation. Once that works, add the value comparison back in. I don't think you need the parseInt call.
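
The incremental approach suggested above (test only for presence first, then add the comparison) can be sketched in plain Python. This models genotypes as dictionaries and is not htsjdk or JEXL code; the sample names and values are made up. One further observation, offered as an assumption rather than a confirmed diagnosis: in every stack trace in this thread the exception is raised in initializeMatchExps during onTraversalStart, which suggests the expression is rejected when it is compiled, before any record is read. If so, the JEXL parser may simply not accept the Java stream and lambda ("->") syntax, and changing the attribute name would not help.

```python
# Made-up genotypes: one sample carries GQX, the other does not.
genotypes = [
    {"sample": "S1", "GT": "0/1", "GQX": 7},
    {"sample": "S2", "GT": "0/0"},
]


def any_has_attribute(gts, key):
    """Step 1: true if at least one genotype carries the annotation."""
    return any(key in g for g in gts)


def any_below(gts, key, threshold):
    """Step 2: presence check plus the value comparison."""
    return any(key in g and g[key] < threshold for g in gts)


print(any_has_attribute(genotypes, "GQX"))   # True
print(any_below(genotypes, "GQX", 10))       # True
print(any_below(genotypes, "LowGQX", 10))    # False: the key never exists before filtering
```

The last call mirrors the point made above: a filter name such as LowGQX is not present in an unfiltered file, so a predicate keyed on it can never match.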
