VariantFiltration with VariantContext expression runs with a warning "unknown, ambiguous or inaccessible method getGenotype"
Hi! My VCF file contains rows with genotypes '0/1', '1/1', and '1/2'. I want to filter out rows all alternative alleles of which have allele fraction (AF) < 0.05. For example, if allele 1 in genotype '1/2' has AF < 0.05, and allele 2 has AF > 0.05, the row is not filtered out. But if both alternative alleles in genotype '1/2' have AF <0.05, the row is filtered out.
I wrote a JEXL expression for a filter using "JEXL filtering expressions" article. But VariantFiltration runs with the warning "unknown, ambiguous or inaccessible method getGenotype", although I used expressions highly similar to those in the article.
Please, help me to write an appropriate expression for filtering on AF, described in the first passage.
GATK version used:
The Genome Analysis Toolkit (GATK) v4.6.1.0
HTSJDK Version: 4.1.3
VariantFiltration - Picard Version: 3.3.0
Exact command used:
gatk VariantFiltration \
--genotype-filter-name "LowAlleleFrac1Var" \
--genotype-filter-expression "(vc.getGenotype().isHet() || vc.getGenotype().isHomVar()) && (vc.getAD().1 / vc.getDP() < 0.05)" \
--genotype-filter-name "LowAlleleFrac2Var" \
--genotype-filter-expression "vc.getGenotype().isHetNonRef() && (vc.getAD().1 / vc.getDP() < 0.05) && (vc.getAD().2 / vc.getDP() < 0.05)" \
--variant input.vcf \
--output output.vcf
Entire program log:
Using GATK jar /home/administrator/tools/gatk/gatk-package-4.6.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/administrator/tools/gatk/gatk-package-4.6.1.0-local.jar VariantFiltration --genotype-filter-name LowAlleleFrac1Var --genotype-filter-expression (vc.getGenotype().isHet() || vc.getGenotype().isHomVar()) && (vc.getAD().1 / vc.getDP() < 0.05) --genotype-filter-name LowAlleleFrac2Var --genotype-filter-expression vc.getGenotype().isHetNonRef() && (vc.getAD().1 / vc.getDP() < 0.05) && (vc.getAD().2 / vc.getDP() < 0.05) --genotype-filter-name LowRefPL --genotype-filter-expression PL[0] < 50.0 --genotype-filter-name LowDP --genotype-filter-expression DP < 18 --genotype-filter-name HighDP --genotype-filter-expression DP > 80 --filter-name LowQualByDepth --filter-expression QD < 10.0 --filter-name BaseQualBiased --filter-expression BaseQRankSum < 0.0 && BaseQRankSum > 1.65 --filter-name ReadPosBiased --filter-expression ReadPosRankSum < 0.0 && ReadPosRankSum > 1.65 --filter-name BiasedStrandRatio --filter-expression SOR > 5.0 --variant /home/administrator/varcall_results/deepvariant/annotated/Undetermined_S0.vcf --output /home/administrator/varcall_results/deepvariant/filtered/Undetermined_S0.vcf
09:40:54.194 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/administrator/tools/gatk/gatk-package-4.6.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@755b5f30]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@29bbc63c]
SLF4J(W): See https://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@755b5f30]
09:40:54.632 INFO VariantFiltration - ------------------------------------------------------------
09:40:54.640 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.6.1.0
09:40:54.640 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
09:40:54.640 INFO VariantFiltration - Executing as administrator@compute-vm-8-32-500-hdd-1736852799422 on Linux v6.8.0-51-generic amd64
09:40:54.640 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v17.0.14+7-Ubuntu-124.04
09:40:54.641 INFO VariantFiltration - Start Date/Time: February 25, 2025 at 9:40:54 AM ALMT
09:40:54.641 INFO VariantFiltration - ------------------------------------------------------------
09:40:54.641 INFO VariantFiltration - ------------------------------------------------------------
09:40:54.649 INFO VariantFiltration - HTSJDK Version: 4.1.3
09:40:54.650 INFO VariantFiltration - Picard Version: 3.3.0
09:40:54.650 INFO VariantFiltration - Built for Spark Version: 3.5.0
09:40:54.661 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:40:54.661 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:40:54.662 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:40:54.662 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:40:54.662 INFO VariantFiltration - Deflater: IntelDeflater
09:40:54.662 INFO VariantFiltration - Inflater: IntelInflater
09:40:54.663 INFO VariantFiltration - GCS max retries/reopens: 20
09:40:54.663 INFO VariantFiltration - Requester pays: disabled
09:40:54.663 INFO VariantFiltration - Initializing engine
09:40:54.901 INFO FeatureManager - Using codec VCFCodec to read file file:///home/administrator/varcall_results/deepvariant/annotated/Undetermined_S0.vcf
09:40:55.046 INFO VariantFiltration - Done initializing engine
09:40:55.242 INFO ProgressMeter - Starting traversal
09:40:55.243 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
09:40:55.305 WARN JexlEngine - ![3,16]: 'vc.getGenotype().isHetNonRef() && (vc.getAD().1 / vc.getDP() < 0.05) && (vc.getAD().2 / vc.getDP() < 0.05);' unknown, ambiguous or inaccessible method getGenotype
09:40:55.308 WARN JexlEngine - ![3,16]: '(vc.getGenotype().isHet() || vc.getGenotype().isHomVar()) && (vc.getAD().1 / vc.getDP() < 0.05);' unknown, ambiguous or inaccessible method getGenotype
09:40:55.680 INFO ProgressMeter - unmapped 0.0 63 8649.9
09:40:55.681 INFO ProgressMeter - Traversal complete. Processed 63 total variants in 0.0 minutes.
09:40:55.731 INFO VariantFiltration - Shutting down engine
[February 25, 2025 at 9:40:55 AM ALMT] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=201326592
-
Hi again.
In order to reach the Genotype attributes using JEXL you need to add the
.getGenotype(samplename)
invocation. Otherwise variant context object can only reach INFO attributes.
Regards.
-
I did as you suggested and now VariantFiltration runs without the warning, thanks! But it behaves not as I expected. I wanted to check if filtering expressions work fine so I rewrote them to mark rows with AF > 0, therefore all rows with requested genotypes. But it marked only 1 row.
The command:
sample='M08260_9_000000000-LFLJ3_1_'
gatk VariantFiltration \
--genotype-filter-name "LowAlleleFrac1Var" \
--genotype-filter-expression "((vc.getGenotype('${sample}').isHet() || vc.getGenotype('${sample}').isHomVar()) && (! vc.getGenotype('${sample}').isHetNonRef())) && (vc.getGenotype('${sample}').getAD().1 / vc.getGenotype('${sample}').getDP() > 0.0)" \
--genotype-filter-name "LowAlleleFrac2Var" \
--genotype-filter-expression "vc.getGenotype('${sample}').isHetNonRef() && (vc.getGenotype('${sample}').getAD().1 / vc.getGenotype('${sample}').getDP() > 0.0) && (vc.getGenotype('${sample}').getAD().2 / vc.getGenotype('${sample}').getDP() > 0.0)" \
--variant $annotated_vcf_filepath \
--output $filtered_vcf_filepathThe row this command marked:
13 32355095 . A G 174 PASS QD=34.80;SOR=1.022 GT:ACN:AD:BSDP:DP:FT:GQ:PL 1/1:0,2:0,5:0,0,5,0:5:LowAlleleFrac1Var:42:174,15,0
Other rows:
13 32337044 . G C,A 255 PASS BaseQRankSum=-1.090;QD=12.14;ReadPosRankSum=-4.327;SOR=0.495;TYPE=MULTISNV GT:ACN:AD:BSDP:DP:GQ:PL 1/2:0,1,1:7,8,6:8,8,4,0:20:99:556,314,417,314,187,417
13 32337048 . G A 64 PASS BaseQRankSum=0.136;QD=0.01;ReadPosRankSum=-19.770;SOR=0.685 GT:ACN:AD:BSDP:DP:GQ:PL 0/1:1,1:9035,24:12,0,95,0:107:64:418,322,3290
13 32337066 . G A 73 PASS BaseQRankSum=0.846;QD=0.01;ReadPosRankSum=-0.429;SOR=1.027 GT:ACN:AD:BSDP:DP:GQ:PL 0/1:1,1:8214,73:12,6,92,0:110:73:626,522,3380
17 43071077 . T C 123 PASS BaseQRankSum=-0.147;QD=6.47;ReadPosRankSum=1.026;SOR=1.265 GT:ACN:AD:BSDP:DP:GQ:PL 0/1:1,1:13,6:0,6,0,12:18:99:209,54,401
17 43106426 . C G 50 PASS BaseQRankSum=-2.617;QD=3.13;ReadPosRankSum=-3.756;SOR=2.774 GT:ACN:AD:BSDP:DP:GQ:PL 0/1:1,1:13,3:0,4,3,0:7:50:103,21,139Please, help me to rewrite this command in the way that will make it output an expected result.
-
FORMAT/AD and FORMAT/DP fields contain integer values which java treats them as regular integers during math operation.
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 Or with bad mates are filtered)">If you wish to get a fraction out of your math operations you need to cast any of those fields to float using an encapsulation. I put an example below. Casting one is enough to get the math operation to return a floating point number.
(double) vc.getGenotype('${sample}').getAD().1
Also you may need to pay attention to your logical operations. Combining multiple logical statements without order will almost always return an unexpected false or true due to unhandled cases. You may need to put your logical operations into parantheses for those that you wish to get evaluated first.
I hope this helps.
Regards.
-
Thank you!
Please sign in to leave a comment.
4 comments