SelectVariants JEXL expression works in somatic matched pair but not germline cohort
Running on one of the matched pairs used in this cohort (command below) works fine. When I run it on the jointcalled cohort, it creates the filtered VCF and index file, but produces an error. Do I need to do something different for cohort VCFs? Can I ignore the error? how do I correct the error?
Thank you.
Matched pair command:
gatk SelectVariants -R /mnt/data/rbueno/ref_genomes/human/BWA-MEM/b37_decoys/hs37d5.fa -L /mnt/data/rbueno/ref_genomes/human/hs37d5_interval_files/hs37d5.genome.exons.gatk-style.intervals -V GNT923.novel.somatic.annotated.vcf --select-type-to-include SNP -select "AC >= 2 && DP >= 30 && AF <= 0.01" -O filtered.GNT923.novel.somatic.annotated.vcf
Cohort command and stack trace:
(REQUIRED) Please provide:
a) GATK version used: Genome Analysis Toolkit (GATK) v4.1.8.0
b) Exact command used:
gatk SelectVariants -R /mnt/data/rbueno/ref_genomes/human/BWA-MEM/b37_decoys/hs37d5.fa -L /mnt/data/rbueno/ref_genomes/human/hs37d5_interval_files/hs37d5.genome.exons.gatk-style.intervals -V germline.Aim1.vcf --select-type-to-include SNP -select "AC >= 2 && DP >= 30 && AF <= 0.01" -O filtered.germline.Aim1.vcf --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
c) Entire error log:
***********************************************************************
A USER ERROR has occurred: Invalid JEXL expression detected for select-0
See https://www.broadinstitute.org/gatk/guide/article?id=1255 for documentation on using JEXL in GATK
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException: Invalid JEXL expression detected for select-0
See https://www.broadinstitute.org/gatk/guide/article?id=1255 for documentation on using JEXL in GATK
at org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants.apply(SelectVariants.java:640)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Caused by: java.lang.IllegalArgumentException: Invalid JEXL expression detected for select-0
at htsjdk.variant.variantcontext.JEXLMap.evaluateExpression(JEXLMap.java:193)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:95)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:15)
at htsjdk.variant.variantcontext.VariantContextUtils.match(VariantContextUtils.java:298)
at org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants.apply(SelectVariants.java:632)
... 21 more
Caused by: org.apache.commons.jexl2.JexlException: ![0,7]: 'AC >= 2 && DP >= 30 && AF <= 0.01;' >= error
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:787)
at org.apache.commons.jexl2.parser.ASTGENode.jjtAccept(ASTGENode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:449)
at org.apache.commons.jexl2.parser.ASTAndNode.jjtAccept(ASTAndNode.java:18)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:449)
at org.apache.commons.jexl2.parser.ASTAndNode.jjtAccept(ASTAndNode.java:18)
at org.apache.commons.jexl2.Interpreter.interpret(Interpreter.java:232)
at org.apache.commons.jexl2.ExpressionImpl.evaluate(ExpressionImpl.java:65)
at htsjdk.variant.variantcontext.JEXLMap.evaluateExpression(JEXLMap.java:186)
... 25 more
Caused by: java.lang.ArithmeticException: Long coercion: java.util.ArrayList:([4, 6])
at org.apache.commons.jexl2.JexlArithmetic.toLong(JexlArithmetic.java:914)
at org.apache.commons.jexl2.JexlArithmetic.compare(JexlArithmetic.java:718)
at org.apache.commons.jexl2.JexlArithmetic.greaterThanOrEqual(JexlArithmetic.java:824)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:785)
... 33 more
-
Hi Mark Godek, are you running these commands any differently, perhaps the cohort command through bash? Also, can you confirm that those annotations are present in both files? It looks like this issue may be caused by your JEXL expression not being correctly passed in your GATK command.
Here is some documentation about JEXL with GATK: https://gatk.broadinstitute.org/hc/en-us/articles/360035891011-JEXL-filtering-expressions
-
I'm following the GATK best practices for somatic and germline SNP discovery.
For cohort command through bash, do you mean using gatk GenotypeGVCFs to call variants of a cohort from a genomics DB?
Here are two variants from GNT923.novel.somatic.annotated.vcf and germline.Aim1.vcf used in the commands above, respectively.
Somatic
1 891292 . T C . base_qual;haplotype;normal_artifact;panel_of_normals;weak_evidence AS_FilterStatus=weak_evidence,base_qual;AS_SB_TABLE=161,5|14,0;DP=189;ECNT=2;FUNCOTATION=[NOC2L|hg19|chr1|891292|891292|INTRON||SNP|T|T|C|g.chr1:891292T>C|ENST00000327044.6|-|||c.e6-11A>G|||0.5960099750623441|CTTCACCCCCTCCCCTCTTAC|KLHL17_ENST00000338591.3_FIVE_PRIME_FLANK/NOC2L_ENST00000487214.1_INTRON|||||||||||||||||||||||||90|biliary_tract(2)_%7C_breast(11)_%7C_central_nervous_system(44)_%7C_large_intestine(11)_%7C_pancreas(22)|||||||BC003555|NM_015658.3|NP_056473.2|HGNC:24517|NOC2_%20_like_%20_nucleolar_%20_associated_%20_transcriptional_%20_repressor|Approved|gene_%20_with_%20_protein_%20_product|protein-coding_%20_gene||"nucleolar_%20_complex_%20_associated_%20_2_%20_homolog_%20_(S._%20_cerevisiae)"|DKFZP564C186_%2C__%20_NET7_%2C__%20_NET15_%2C__%20_NIR_%2C__%20_PPP1R112|"novel_%20_INHAT_%20_repressor"_%2C__%20_"protein_%20_phosphatase_%20_1_%2C__%20_regulatory_%20_subunit_%20_12"|1p36.33|2016-02-15||2016-02-15|AL050019||26155|ENSG00000188976|20123734_%2C__%20_22363708|NM_015658|694|Protein_%20_phosphatase_%20_1_%20_regulatory_%20_subunits|CCDS3|OTTHUMG00000040720|26155|610770|NM_015658|Q9Y3T9|ENSG00000188976|uc001abz.5|hg19|OREG1492698|Type_%3D_TRANSCRIPTION_%20_FACTOR_%20_BINDING_%20_SITE_%7C_Gene_Symbol_%3D_NOC2L_%7C_Gene_ID_%3D_ENSG00000188976_%7C_Gene_Source_%3D_ENSEMBL_%7C_Regulatory_Element_Symbol_%3D_EGR1_%7C_Regulatory_Element_ID_%3D_ENST00000239938_%7C_Regulatory_Element_Source_%3D_ENSEMBL_%7C_PMID_%3D_18971253_%7C_Dataset_%3D_PAZAR|NOC2L_HUMAN||Q5SVA3_%7C_Q9BTN6|Q9Y3T9|apoptotic_%20_process_%20_(GO:0006915)_%7C_cellular_%20_response_%20_to_%20_UV_%20_(GO:0034644)_%7C_chromatin_%20_assembly_%20_(GO:0031497)_%7C_negative_%20_regulation_%20_of_%20_B_%20_cell_%20_apoptotic_%20_process_%20_(GO:0002903)_%7C_negative_%20_regulation_%20_of_%20_histone_%20_acetylation_%20_(GO:0035067)_%7C_negative_%20_regulation_%20_of_%20_intrinsic_%20_apoptotic_%20_signaling_%20_pathway_%20_(GO:2001243)_%7C_negative_%20_regulation_%20_of_%20_transcription_%20_from_%20_RNA_%20_polymerase_%20_II_%20_promoter_%20_(GO:0000122)_%7C_nucleolus_%20_to_%20_nucleoplasm_%20_transport_%20_(GO:0032066)_%7C_transcription_%2C__%20_DNA-templated_%20_(GO:0006351)|nucleolus_%20_(GO:0005730)_%7C_nucleoplasm_%20_(GO:0005654)_%7C_nucleus_%20_(GO:0005634)|chromatin_%20_binding_%20_(GO:0003682)_%7C_histone_%20_binding_%20_(GO:0042393)_%7C_nucleosome_%20_binding_%20_(GO:0031491)_%7C_poly(A)_%20_RNA_%20_binding_%20_(GO:0044822)_%7C_repressing_%20_transcription_%20_factor_%20_binding_%20_(GO:0070491)_%7C_transcription_%20_corepressor_%20_activity_%20_(GO:0003714)|||||||||||||||||||false|false||false|false||false|false|false||false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|||false|false||false||false||false|false|false||false|||false|||];GERMQ=93;MBQ=34,11;MFRL=214,213;MMQ=60,60;MPOS=17;NALOD=-3.866;NLOD=18.17;PON;POPAF=3.91;TLOD=5.15 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|1:72,10:0.087:82:19,1:36,0:0|1:891292_T_C:891292:69,3,10,0 0|0:94,4:0.038:98:12,2:53,0:0|1:891292_T_C:891292:92,2,4,0
Germline
1 179924563 . C G 56.36 PASS AC=1;AF=0.031;AN=32;BaseQRankSum=-0.842;DP=58;ExcessHet=3.0103;FS=0;FUNCOTATION=[CEP350|hg19|chr1|179924563|179924563|INTRON||SNP|C|C|G|g.chr1:179924563C>G|ENST00000367607.3|+|||c.e1+286C>G|||0.5610972568578554|GGTAAGATACCTCTAGCCTTT|RP11-533E19.5_ENST00000567904.1_FIVE_PRIME_FLANK|||||||||||||||||||||||||31|breast(1)_%7C_central_nervous_system(22)_%7C_oesophagus(8)|||||||AL645487|NM_014810.4|NP_055625.4|HGNC:24238|centrosomal_%20_protein_%20_350|Approved|gene_%20_with_%20_protein_%20_product|protein-coding_%20_gene||"centrosomal_%20_protein_%20_350kDa"|KIAA0480_%2C__%20_CAP350|"centrosome_%20_associated_%20_protein_%20_350"|1q25.2|2016-03-30||2016-03-30|AF287356||9857|ENSG00000135837|16314388_%2C__%20_15615782|NM_014810|||CCDS1336|OTTHUMG00000035269|9857||NM_014810|Q5VT06|ENSG00000135837|uc001gnt.4|hg19|OREG1268027|Type_%3D_TRANSCRIPTION_%20_FACTOR_%20_BINDING_%20_SITE_%7C_Gene_Symbol_%3D_CEP350_%7C_Gene_ID_%3D_ENSG00000135837_%7C_Gene_Source_%3D_ENSEMBL_%7C_Regulatory_Element_Symbol_%3D_SMARCA4_%7C_Regulatory_Element_ID_%3D_ENST00000344626_%7C_Regulatory_Element_Source_%3D_ENSEMBL_%7C_PMID_%3D_18971253_%7C_Dataset_%3D_PAZAR|CE350_HUMAN||O75068_%7C_Q8TDK3_%7C_Q8WY20|Q5VT06|microtubule_%20_anchoring_%20_(GO:0034453)|centrosome_%20_(GO:0005813)_%7C_cytoplasm_%20_(GO:0005737)_%7C_membrane_%20_(GO:0016020)_%7C_nucleus_%20_(GO:0005634)||||||||||||||||||||true|false|0.9984_%2C_0.001597|false|false|1|false|false|false|CEP350:9857|false|false|true|true|true|false|false|false|false|false|false|false|false|false|false|false|false|false|false|false|143883644|179924563|false|false|0|false|0|false|0.99684_%2C_0.00315956|false|false|false|SNV|true|0x050000080005040036000100|1|false|134|rs143883644|];InbreedingCoeff=-0.1083;MLEAC=1;MLEAF=0.031;MQ=60;MQRankSum=0;QD=9.39;ReadPosRankSum=0.842;SOR=0.307;VQSLOD=19.42;culprit=MQ GT:AD:DP:GQ:PL 0/0:2,0:2:6:0,6,82 0/0:6,0:6:15:0,15,225 0/0:3,0:3:9:0,9,98 0/0:6,0:6:18:0,18,229 0/0:2,0:2:6:0,6,43 0/0:5,0:5:12:0,12,180 0/0:1,0:1:3:0,3,40 0/0:2,0:2:6:0,6,61 0/0:6,0:6:15:0,15,225 0/0:9,0:9:27:0,27,338 0/0:2,0:2:6:0,6,60 0/0:1,0:1:3:0,3,20 0/0:2,0:2:6:0,6,66 0/0:2,0:2:6:0,6,64 0/0:3,0:3:9:0,9,102 0/1:4,2:6:68:68,0,150
It looks like the somatic variant is lacking an AC (allele count) annotation, but it still finishes properly. Germline has AC=1.
DP (Allele Depth) is 189 for somatic, 58 for germline. There are also DP values for the matched samples in the somatic and cohort in the germline.
AF (allele frequency) is 0.031 in germline and the somatic has PopAF and AF for the samples only.
If anything, I would expect the somatic variant to be the one having trouble. I just ran both commands in interactive mode only changing the input VCF and still the germline is the only one that throws an error.
I've even tried a simpler JEXL expression on a less filtered version of the germline file (command below) to see if it was too filtered or the expression was too complex.
gatk SelectVariants -R /mnt/data/rbueno/ref_genomes/human/BWA-MEM/b37_decoys/hs37d5.fa -V annotated.b37names.germline.vcf -select "AC >= 2" --select-type-to-include SNP -O test.selectgermlinevariants.vcf
The above command still yielded the "Invalid JEXL expression detected for select-0". Is it possible to filter germline cohort VCFs with SelectVariants?
Thanks again.
-
Hi Mark Godek, is there a reason why you are not using VariantFiltration for the filtering step? It seems like these JEXL expressions would be better in that tool, and you could use SelectVariants after for getting just the SNPs.
-
Hi Genevieve Brandt (she/her),
I'm using SelectVariants because it was recommended under heading 6 of the Variant Call Format page of the GATK website.
Actually, rereading it, it does say it works on cohorts, so I still don't know why I'm getting a JEXL expression error with SelectVariants but I will give VariantFiltration a shot and report back.
Thanks.
-
Mark Godek, yes there are a lot of different tools in the GATK! I think VariantFiltration will work better for using multiple different filters, and it has notes in the documentation on how to properly submit the command with the filters you are using.
Let me know if the error pops up there too.
-
Hi Genevieve Brandt (she/her),
I tried VariantFiltration and it works, however it throws a warning as it is running. I'm not sure if this is a big deal because it does create the output VCF and adds the filter name to the FILTER column when my AC is over 1.
Command:
gatk VariantFiltration -R /mnt/data/rbueno/ref_genomes/human/BWA-MEM/b37_decoys/hs37d5.fa -V /data/rbueno/analysis_files/MedGenome_FamilialMPMs/old_pipeline/analysis/germline/novel.germline.annotated.reselect.vcf -O /data/rbueno/analysis_files/MedGenome_FamilialMPMs/old_pipeline/analysis/germline/novel.germline.annotated.reselect.quality_reads.vcf --filter-name "Allele_Depth" --filter-expression "AC > 1"
Warning:
09:01:04.779 WARN JexlEngine - ![0,6]: 'AC > 1;' > error
Output:
#CHROM POS ID REF ALT QUAL FILTER INFO
1 13065 . G A 33.58 VQSRTrancheSNP99.90to100.00 AC=1
1 13302 . C T 124.39 Allele_Depth;VQSRTrancheSNP99.90to100.00 AC=2
1 13418 . G A 42.98 VQSRTrancheSNP99.90to100.00 AC=1
1 13621 . A G 44.56 Allele_Depth;VQSRTrancheSNP99.90to100.00 AC=2Is there a way to exclude filtered variants from the output?
The documentation says "Filtered records will be preserved in the output unless their removal is requested in the command line." but VariantFiltration lacks the --exclude-filtered option that SelectVariants has.
Thanks.
-
I found a post on Biostars on excluding based on SelectVariants and it seems to have worked when I tried it.
gatk SelectVariants -R /mnt/data/rbueno/ref_genomes/human/BWA-MEM/b37_decoys/hs37d5.fa --variant /data/rbueno/analysis_files/MedGenome_FamilialMPMs/old_pipeline/analysis/germline/novel.germline.annotated.reselect.invalidate_old_filters.vcf -O /data/rbueno/analysis_files/MedGenome_FamilialMPMs/old_pipeline/analysis/germline/filtered.vcf -select 'FILTER != "Allele_Depth"' --exclude-filtered true
So SelectVariants will filter my cohort file, but not using JEXL expressions.
Still, if there is a way to do this with VariantFiltration I'd be interested how to do it with a single command.
Thanks again.
-
Mark Godek Thanks for the update about how you are filtering. Yes, usually VariantFiltration and SelectVariants are used together.
However, if another user has found a workaround they can chime into the discussion!
Please sign in to leave a comment.
8 comments