Hard filters on VariantFiltration following HaplotypeCaller; Filter contains an illegal character
I'm having an issue with VariantFiltration on GATK v4.2.1.0: one of my filters is reported as not matching the required regex, but I have examined the command line and found no issues with it.
Details:
a) GATK version used:
GATK v4.2.1.0
b) Exact command used:
gatk VariantFiltration \
-R <REF> -V <IN_VCF> -O <OUT_VCF> \
--cluster-size 3 --cluster-window-size 35 \
--filter-name "low coverage" --filter-expression "QD < 5.0" \
--filter-name "no reads" --filter-expression "DP < 10" \
--filter-name "failed RPRS" --filter-expression "ReadPosRankSum < -8.0" \
--filter-name "failed MQRS" --filter-expression "MQRankSum < -12.5" \
--filter-name "failed MQ" --filter-expression "MQ < 40.0" \
--filter-name "failed FS" --filter-expression "FS > 60.0"
(NB GATK runs in a singularity container, so "gatk ..." invokes "java -jar gatk.jar ...")
c) Entire program log:
Using GATK jar /gatk/gatk-package-4.2.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.1.0-local.jar IndexFeatureFile -I <IN_VCF>
17:09:23.250 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 27, 2022 5:09:23 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:09:23.381 INFO VariantFiltration - ------------------------------------------------------------
17:09:23.381 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.2.1.0
17:09:23.381 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
17:09:23.381 INFO VariantFiltration - Executing as hfx494@ddy109 on Linux v3.10.0-1160.62.1.el7.x86_64 amd64
17:09:23.381 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
17:09:23.381 INFO VariantFiltration - Start Date/Time: May 27, 2022 5:09:23 PM GMT
17:09:23.381 INFO VariantFiltration - ------------------------------------------------------------
17:09:23.381 INFO VariantFiltration - ------------------------------------------------------------
17:09:23.382 INFO VariantFiltration - HTSJDK Version: 2.24.1
17:09:23.382 INFO VariantFiltration - Picard Version: 2.25.4
17:09:23.382 INFO VariantFiltration - Built for Spark Version: 2.4.5
17:09:23.382 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:09:23.382 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:09:23.382 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:09:23.382 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:09:23.382 INFO VariantFiltration - Deflater: IntelDeflater
17:09:23.382 INFO VariantFiltration - Inflater: IntelInflater
17:09:23.382 INFO VariantFiltration - GCS max retries/reopens: 20
17:09:23.382 INFO VariantFiltration - Requester pays: disabled
17:09:23.382 INFO VariantFiltration - Initializing engine
17:09:23.600 INFO FeatureManager - Using codec VCFCodec to read file <IN_VCF>
17:09:23.625 INFO VariantFiltration - Done initializing engine
17:09:23.669 INFO ProgressMeter - Starting traversal
17:09:23.669 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
17:09:23.684 WARN JexlEngine - ![0,2]: 'QD < 5.0;' undefined variable QD
17:09:23.684 WARN JexlEngine - ![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum
17:09:23.684 WARN JexlEngine - ![0,2]: 'MQ < 40.0;' undefined variable MQ
17:09:23.684 WARN JexlEngine - ![0,9]: 'MQRankSum < -12.5;' undefined variable MQRankSum
17:09:23.684 WARN JexlEngine - ![0,2]: 'FS > 60.0;' undefined variable FS
17:09:23.685 WARN JexlEngine - ![0,2]: 'QD < 5.0;' undefined variable QD
17:09:23.686 WARN JexlEngine - ![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum
17:09:23.686 WARN JexlEngine - ![0,2]: 'MQ < 40.0;' undefined variable MQ
17:09:23.686 WARN JexlEngine - ![0,2]: 'FS > 60.0;' undefined variable FS
17:09:23.686 WARN JexlEngine - ![0,2]: 'QD < 5.0;' undefined variable QD
17:09:23.686 WARN JexlEngine - ![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum
17:09:23.686 WARN JexlEngine - ![0,2]: 'MQ < 40.0;' undefined variable MQ
17:09:23.687 WARN JexlEngine - ![0,2]: 'QD < 5.0;' undefined variable QD
17:09:23.687 WARN JexlEngine - ![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum
17:09:23.687 WARN JexlEngine - ![0,2]: 'MQ < 40.0;' undefined variable MQ
17:09:23.687 WARN JexlEngine - ![0,2]: 'FS > 60.0;' undefined variable FS
17:09:23.688 WARN JexlEngine - ![0,2]: 'QD < 5.0;' undefined variable QD
17:09:23.688 WARN JexlEngine - ![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum
17:09:23.688 WARN JexlEngine - ![0,2]: 'MQ < 40.0;' undefined variable MQ
17:09:23.688 WARN JexlEngine - ![0,2]: 'FS > 60.0;' undefined variable FS
17:09:23.689 WARN JexlEngine - ![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum
17:09:23.689 WARN JexlEngine - ![0,2]: 'MQ < 40.0;' undefined variable MQ
17:09:23.689 WARN JexlEngine - ![0,9]: 'MQRankSum < -12.5;' undefined variable MQRankSum
17:09:23.691 INFO VariantFiltration - Shutting down engine
[May 27, 2022 5:09:23 PM GMT] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2302148608
java.lang.IllegalStateException: Filter 'no reads' contains an illegal character. It must conform to the regex ;'^[!-:<-~]+$
at htsjdk.variant.variantcontext.VariantContext$Validation.validateFilters(VariantContext.java:400)
at htsjdk.variant.variantcontext.VariantContext$Validation.access$300(VariantContext.java:323)
at htsjdk.variant.variantcontext.VariantContext$Validation$3.validate(VariantContext.java:336)
at htsjdk.variant.variantcontext.VariantContext.lambda$validate$0(VariantContext.java:1384)
at java.lang.Iterable.forEach(Iterable.java:75)
at htsjdk.variant.variantcontext.VariantContext.validate(VariantContext.java:1384)
at htsjdk.variant.variantcontext.VariantContext.<init>(VariantContext.java:489)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:647)
at htsjdk.variant.variantcontext.VariantContextBuilder.make(VariantContextBuilder.java:638)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.filter(VariantFiltration.java:422)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.apply(VariantFiltration.java:353)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gatk/gatk-package-4.2.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.1.0-local.jar VariantFiltration -R <FA> -V <IN_VCF> -O <OUT_VCF> --cluster-size 3 --cluster-window-size 35 --filter-name low coverage --filter-expression QD < 5.0 --filter-name no reads --filter-expression DP < 10 --filter-name failed RPRS --filter-expression ReadPosRankSum < -8.0 --filter-name failed MQRS --filter-expression MQRankSum < -12.5 --filter-name failed MQ --filter-expression MQ < 40.0 --filter-name failed FS --filter-expression FS > 60.0
(In the above, I have replaced filenames with generic placeholders)
This command worked with GATK v3.8, and all I've done is change the flags to GATK v4 syntax.
I can't find where the issue is with the filter expression "DP < 10", as it conforms to the regex.
-
Hi Graeme Thorn,
I think the issue is in your no reads filter:
--filter-name "no reads" --filter-expression "DP < 10"
Try switching the 10 to 10.0 so that it is read as a floating-point value.
Let me know if this works.
Best,
Genevieve
-
Hi Genevieve,
I have tried changing the filter expression to "DP < 10.0", but still get the same error.
One thing I did find with switching from GATK3(.8) to GATK4 was that I needed to index the file first using
gatk IndexFeatureFile -I <VCF>
before filtering
gatk VariantFiltration -R <FA> \
-V <VCF_IN> \
-O <VCF_OUT> \
--cluster-size 3 \
--cluster-window-size 35 \
--filter-name "low coverage" --filter-expression "QD < 5.0" \
--filter-name "no reads" --filter-expression "DP < 10.0" \
--filter-name "failed RPRS" --filter-expression "ReadPosRankSum < -8.0" \
--filter-name "failed MQRS" --filter-expression "MQRankSum < -12.5" \
--filter-name "failed MQ" --filter-expression "MQ < 40.0" \
--filter-name "failed FS" --filter-expression "FS > 60.0"
Might this be causing the issue?
-
Hi Graeme Thorn,
Thanks for checking that. I don't think the indexing has to do with this issue. It looks like the spaces in your filter names are throwing this exception. Spaces are not allowed in filter names because of the VCF specification.
I would also recommend keeping the 10.0 instead of 10 because other users have had that come up as a problem before.
Let me know if this fixes it!
Best,
Genevieve
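
For readers who want to check filter names before running the tool, here is a minimal sketch that tests names against the regex quoted in the exception message above (`^[!-:<-~]+$`, which excludes the space and the semicolon). The helper name `is_valid_filter_name` is hypothetical and not part of GATK; it only mirrors the pattern from the log.

```shell
# Return success (0) only if the name matches the regex quoted in the exception.
# is_valid_filter_name is an illustrative helper, not a GATK command.
is_valid_filter_name() {
  # LC_ALL=C keeps the bracket ranges as plain ASCII ranges.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[!-:<-~]+$'
}

# Check two of the original names from the thread plus one underscore variant.
for name in "low coverage" "no reads" "no_reads"; do
  if is_valid_filter_name "$name"; then
    echo "ok:      $name"
  else
    echo "invalid: $name"
  fi
done
```

Any name that contains a space fails the check, which is consistent with the exception naming 'no reads' rather than the expression "DP < 10".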
-
Hi Genevieve,
That seems to have fixed it. I've also pre-emptively changed the other filter names to remove the spaces. Evidently GATK4 handles the strings in filter names differently from GATK3(.8).
Thanks again,
Graeme
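
The thread does not show the final command, so the following is only a sketch of what the fix might look like. The underscore filter names are assumptions chosen for illustration, and `GATK_CMD` is a hypothetical variable added here so the command can be printed with `echo` instead of executed.

```shell
# Sketch of the thread's command with space-free filter names.
# The underscore names and the GATK_CMD variable are illustrative assumptions.
run_variant_filtration() {
  # $1 = reference FASTA, $2 = input VCF, $3 = output VCF
  # GATK_CMD defaults to gatk; GATK_CMD=echo prints the arguments instead.
  ${GATK_CMD:-gatk} VariantFiltration \
    -R "$1" -V "$2" -O "$3" \
    --cluster-size 3 --cluster-window-size 35 \
    --filter-name "low_coverage" --filter-expression "QD < 5.0" \
    --filter-name "no_reads" --filter-expression "DP < 10.0" \
    --filter-name "failed_RPRS" --filter-expression "ReadPosRankSum < -8.0" \
    --filter-name "failed_MQRS" --filter-expression "MQRankSum < -12.5" \
    --filter-name "failed_MQ" --filter-expression "MQ < 40.0" \
    --filter-name "failed_FS" --filter-expression "FS > 60.0"
}

# Dry run: print the arguments for review without invoking GATK.
GATK_CMD=echo run_variant_filtration reference.fa input.vcf filtered.vcf
```

Dropping the `GATK_CMD=echo` prefix would run the real tool, assuming `gatk` is on the PATH and the three file paths exist.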
-
Great! Glad that this is fixed and thank you for posting the solution!
-
I am using GATK version 4.2.1.0 for variant filtration, but it generates the same undefined variable warnings for ReadPosRankSum and MQRankSum.
Command used:
java -jar -Xmx30G gatk.jar VariantFiltration \
-R reference.fa -V A_raw_snps.vcf -O A_filtered_snps.vcf \
--filter-name "QD_filter" -filter "QD < 2.0" \
--filter-name "FS_filter" -filter "FS > 60.0" \
--filter-name "SOR_filter" -filter "SOR > 10.0" \
--filter-name "MQRankSum_filter" -filter "MQRankSum<-12.5" \
--filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum<-8.0" \
--genotype-filter-expression "DP < 10" --genotype-filter-name "DP_filter" \
--genotype-filter-expression "GQ < 10" --genotype-filter-name "GQ_filter"
-
Those warning messages are not a problem. Not every variant context contains all of the annotations you are filtering on, so these warnings are issued for the records that lack them. The tool works as expected.
Regards.
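
To see how many records would trigger such a warning, one option is to count the records whose INFO column lacks a given annotation. This is a rough sketch that assumes an uncompressed, tab-delimited VCF; the helper name `count_missing_info` is hypothetical and not part of GATK.

```shell
# Count VCF records whose INFO column (field 8) lacks a given key.
# count_missing_info is an illustrative helper; it expects an uncompressed VCF.
count_missing_info() {
  # $1 = INFO key (e.g. MQRankSum), $2 = path to the VCF
  awk -F '\t' -v key="$1" '
    /^#/ { next }                                  # skip header lines
    $8 !~ ("(^|;)" key "(=|;|$)") { missing++ }    # key absent from INFO
    END { print missing + 0 }
  ' "$2"
}
```

For example, `count_missing_info MQRankSum A_raw_snps.vcf` would print the number of records without an MQRankSum annotation. A non-zero count would be consistent with the warnings above.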