VariantFiltration undefined variable
Can you please provide:
a) GATK version used
4.1.8
b) Exact GATK commands used:
gatk --java-options "-Xmx200g" VariantFiltration \
-R ../reference/SBAPGDGG_D3.fa \
-V ../VCFs/output.scaff_6.vcf \
--filter-name "AB_filter" \
--filter-expression "AB < 0.2" \
--filter-name "MQ0_filter" \
--filter-expression "MQ0 > 50" \
-O ../VCFs/43samples_scaff_6_SNP-INDEL_variants_filt.vcf
c) The entire error log if applicable.
The same warnings are printed to the screen millions of times (this scaffold is 380 Mb):
08:45:59.303 WARN JexlEngine - ![0,3]: 'MQ0 > 50;' undefined variable MQ0
08:45:59.303 WARN JexlEngine - ![0,2]: 'AB < 0.2;' undefined variable AB
08:45:59.303 WARN JexlEngine - ![0,3]: 'MQ0 > 50;' undefined variable MQ0
08:45:59.303 WARN JexlEngine - ![0,2]: 'AB < 0.2;' undefined variable AB
08:45:59.303 WARN JexlEngine - ![0,3]: 'MQ0 > 50;' undefined variable MQ0
08:45:59.303 WARN JexlEngine - ![0,2]: 'AB < 0.2;' undefined variable AB
I have used this same command many times with previous versions of GATK4 within the last year, with no warnings.
I am pulling data from my GenomicsDB database with:
gatk SelectVariants \
-R ../reference/SBAPGDGG_D3.fa \
-V gendb://../Tse_scaff_6_database \
-L scaffold_6 \
-O ../VCFs/output.scaff_6.vcf
I create the GenomicsDB database with:
gatk --java-options "-Xmx200g -Xms200g" GenomicsDBImport \
--genomicsdb-workspace-path Tse_scaff_6_database \
-L scaffold_6 \
--sample-name-map Tse_Scaff_6.sample_map \
--tmp-dir=/data/tmp \
--reader-threads 8
Part of the VCF input to VariantFiltration:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AHP1168 AHP2709 AHP2731 AHP2746 AHP2759 AHP2765 AHP2787 JDC1442 JFP649 MVZ250717 NCSM7464 TP28037 TP28323 TP28331 TP29441 TP29442 TP29715 TP29723 TP29746 TP29750 TP29950 TP29957 TP29959 TP29967 TP29969 TP29971 TP30028 TP30030 TP30036 TP30046 TP30050 TP30051 TP30052 TP30059 TP30072 TP30112 TP30782 TP30784 TP30794 TP30799 TP30819 TP30821 TP30849
scaffold_6 1 . T <NON_REF> . PASS . GT:DP:GQ:MIN_DP:PL ./.:0:0:0:0,0,0 ./.:6:18:6:0,18,251 ./.:1:3:1:0,3,45 ./.:2:6:2:0,6,72 ./.:2:6:2:0,6,57 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,42 ./.:3:9:3:0,9,99 ./.:14:42:14:0,42,566 ./.:1:3:1:0,3,30 ./.:1:3:1:0,3,45 ./.:7:21:7:0,21,246 ./.:9:27:9:0,27,340 ./.:4:12:4:0,12,141 ./.:2:6:2:0,6,57 ./.:1:3:1:0,3,42 ./.:3:9:3:0,9,117 ./.:4:12:4:0,12,141 ./.:1:3:1:0,3,30 ./.:3:9:3:0,9,97 ./.:1:3:1:0,3,32 ./.:6:18:6:0,18,239 ./.:3:9:3:0,9,113 ./.:1:3:1:0,3,15 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,30 ./.:0:0:0:0,0,0 ./.:3:9:3:0,9,113 ./.:2:6:2:0,6,60 ./.:4:12:4:0,12,167 ./.:3:9:3:0,9,114 ./.:2:6:2:0,6,84 ./.:2:6:2:0,6,72 ./.:2:6:2:0,6,87 ./.:6:18:6:0,18,212 ./.:1:3:1:0,3,21 ./.:4:12:4:0,12,155 ./.:1:3:1:0,3,42 ./.:4:12:4:0,12,156 ./.:16:48:16:0,48,607 ./.:2:6:2:0,6,60 ./.:3:9:3:0,9,77 ./.:3:9:3:0,9,123
scaffold_6 2 . A <NON_REF> . PASS . GT:DP:GQ:MIN_DP:PL ./.:0:0:0:0,0,0 ./.:6:18:6:0,18,251 ./.:1:3:1:0,3,45 ./.:2:6:2:0,6,72 ./.:2:6:2:0,6,57 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,42 ./.:3:9:3:0,9,99 ./.:14:42:14:0,42,566 ./.:1:3:1:0,3,30 ./.:1:3:1:0,3,45 ./.:7:21:7:0,21,246 ./.:9:27:9:0,27,340 ./.:4:12:4:0,12,141 ./.:2:6:2:0,6,57 ./.:2:6:2:0,6,57 ./.:3:9:3:0,9,117 ./.:4:12:4:0,12,141 ./.:1:3:1:0,3,30 ./.:3:9:3:0,9,97 ./.:1:3:1:0,3,32 ./.:6:18:6:0,18,239 ./.:3:9:3:0,9,113 ./.:1:3:1:0,3,15 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,30 ./.:0:0:0:0,0,0 ./.:3:9:3:0,9,113 ./.:2:6:2:0,6,60 ./.:4:12:4:0,12,167 ./.:3:9:3:0,9,114 ./.:2:6:2:0,6,84 ./.:2:6:2:0,6,72 ./.:2:6:2:0,6,87 ./.:6:18:6:0,18,212 ./.:1:3:1:0,3,21 ./.:5:15:5:0,15,182 ./.:1:3:1:0,3,42 ./.:4:12:4:0,12,156 ./.:16:48:16:0,48,607 ./.:2:6:2:0,6,60 ./.:3:9:3:0,9,77 ./.:3:9:3:0,9,123
scaffold_6 3 . T <NON_REF> . PASS . GT:DP:GQ:MIN_DP:PL ./.:0:0:0:0,0,0 ./.:7:21:7:0,21,266 ./.:1:3:1:0,3,45 ./.:2:6:2:0,6,72 ./.:2:6:2:0,6,57 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,42 ./.:3:9:3:0,9,99 ./.:14:42:14:0,42,566 ./.:1:3:1:0,3,30 ./.:0:0:0:0,0,0 ./.:7:21:7:0,21,246 ./.:9:27:9:0,27,340 ./.:4:12:4:0,12,141 ./.:2:6:2:0,6,57 ./.:2:6:2:0,6,57 ./.:3:9:3:0,9,117 ./.:4:12:4:0,12,141 ./.:1:3:1:0,3,30 ./.:3:9:3:0,9,97 ./.:1:3:1:0,3,32 ./.:6:18:6:0,18,239 ./.:3:9:3:0,9,113 ./.:1:3:1:0,3,15 ./.:0:0:0:0,0,0 ./.:1:3:1:0,3,30 ./.:0:0:0:0,0,0 ./.:3:9:3:0,9,113 ./.:2:6:2:0,6,60 ./.:4:12:4:0,12,167 ./.:3:9:3:0,9,114 ./.:2:6:2:0,6,84 ./.:2:6:2:0,6,72 ./.:2:6:2:0,6,87 ./.:6:18:6:0,18,212 ./.:1:3:1:0,3,21 ./.:5:15:5:0,15,182 ./.:1:3:1:0,3,42 ./.:4:12:4:0,12,156 ./.:16:48:16:0,48,607 ./.:2:6:2:0,6,60 ./.:3:9:3:0,9,77 ./.:3:9:3:0,9,123
-
Hi wbsimey, could you provide more information so we can determine whether the AB and MQ0 annotations exist in your VCF? Please share the ##INFO lines in the header of your VCF. Also, the lines of your VCF that you shared above are from non-ref blocks (scaffold_6 1 . T <NON_REF>) because you are selecting variants from a GVCF. It would be more helpful to see lines where there are variants (instead of <NON_REF>, you will see the variant allele). Please share an example of those as well, since AB and MQ0 should be calculated at those sites.
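One way to pull a few such lines out of a plain-text VCF is a short awk filter that skips the header and any record whose ALT column holds only the symbolic <NON_REF> allele. This is a minimal sketch, not a GATK tool: the function name show_variant_sites is hypothetical, and it assumes an uncompressed VCF (pipe a .vcf.gz through zcat first).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): print the first N records
# (default 5) whose ALT column contains an allele other than <NON_REF>.
# Assumes an uncompressed, tab-delimited VCF.
show_variant_sites() {
  local vcf="$1" n="${2:-5}"
  awk -F'\t' -v max="$n" '
    /^#/ { next }                       # skip meta-information and header lines
    {
      split($5, alts, ",")              # ALT may list several alleles, e.g. A,<NON_REF>
      for (i in alts) {
        if (alts[i] != "<NON_REF>") {   # found a real alternate allele
          print
          if (++shown >= max) exit      # stop once N records have been printed
          next
        }
      }
    }' "$vcf"
}
```

For example, `show_variant_sites ../VCFs/output.scaff_6.vcf 3` would print up to three records that carry a real alternate allele, which are the sites where AB and MQ0 would be expected.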
-
Hello Genevieve,
It looks like the AB and MQ0 annotations are not in the header. When are these annotations created? I don't think I am doing anything different from what I did with previous GATK4 versions: I am using the same data, and these two annotations are included in my previous VCF files.
I think I figured out the <NON_REF> issue: I had slightly different versions of my reference file and used them interchangeably throughout my pipeline (HaplotypeCaller > GenomicsDBImport > GenotypeGVCFs > VariantFiltration). Here are the ##INFO lines from my VCF header, followed by an example variant record:
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for improved RMS Mapping Quality calculation. Incompatible with deprecated RAW_MQ formulation.">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
scaffold_12 686 . G A 465.24 . AC=7;AF=0.085;AN=82;BaseQRankSum=-2.220e-01;DP=309;ExcessHet=4.2875;FS=0.000;InbreedingCoeff=-0.0094;MLEAC=8;MLEAF=0.098;MQ=55.26;MQRankSum=-9.670e-01;QD=11.63;ReadPosRankSum=0.00;SOR=0.556 GT:AD:DP:GQ:PGT:PID:PL:PS 0/0:1,0:1:3:.:.:0,3,32 0/1:4,4:8:99:.:.:137,0,149 0/0:2,0:2:6:.:.:0,6,78 0/0:6,0:6:18:.:.:0,18,239 0|1:2,2:6:75:0|1:643_C_G:75,0,78:643 0/0:2,0:2:6:.:.:0,6,78 0/0:10,0:10:30:.:.:0,30,391 0/1:4,2:6:34:.:.:34,0,137 0/1:10,3:14:84:.:.:84,0,394 0/0:4,0:4:12:.:.:0,12,155 0/0:2,0:2:6:.:.:0,6,55 0/0:2,0:2:6:.:.:0,6,84 0/0:7,0:7:21:.:.:0,21,266 0/0:7,0:7:21:.:.:0,21,292 0/0:7,0:7:18:.:.:0,18,270 0/0:3,0:3:9:.:.:0,9,99 0/0:2,0:2:0:.:.:0,0,3 0/0:14,0:14:42:.:.:0,42,573 0/0:4,0:4:12:.:.:0,12,141 0/0:5,0:5:15:.:.:0,15,194 ./.:2,0:2:.:.:.:0,0,0 0/0:29,0:29:81:.:.:0,81,1215 0/0:7,0:7:21:.:.:0,21,224 0/0:10,0:10:30:.:.:0,30,372 ./.:0,0:0:.:.:.:0,0,0 0/0:21,0:21:60:.:.:0,60,696 0/0:1,0:1:3:.:.:0,3,29 0/0:6,0:6:18:.:.:0,18,224 0/0:4,0:4:12:.:.:0,12,141 0/0:35,0:35:99:.:.:0,102,1280 0/0:3,0:3:9:.:.:0,9,81 0/0:8,0:8:21:.:.:0,21,315 0/0:9,0:9:24:.:.:0,24,360 0/1:1,2:3:36:.:.:71,0,36 0/0:5,0:5:15:.:.:0,15,175 0|1:4,2:6:72:0|1:643_C_G:72,0,161:643 0/0:5,0:5:15:.:.:0,15,182 0/0:14,0:14:36:.:.:0,36,540 0/0:3,0:3:9:.:.:0,9,85 0/0:17,0:17:51:.:.:0,51,646 0/1:1,1:2:34:.:.:38,0,34 0/0:3,0:3:9:.:.:0,9,104 0/0:3,0:3:0:.:.:0,0,44
-
Hi wbsimey, you mentioned that you did this same workflow before with an older version of GATK. What version was that? Was it with the same data?
I also noticed that you are using SelectVariants to select from your GenomicsDB database and not from the VCF produced by GenotypeGVCFs. Are the MQ0 and AB annotations present in the GenotypeGVCFs output?
Also, can you check whether these annotations exist in the unfiltered VCF produced by HaplotypeCaller? Please check both the output from the older GATK version and the output from the current one.
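To compare the old and new outputs quickly, the ##INFO declarations can be checked directly. The sketch below uses a hypothetical helper, check_info_ids (not part of GATK), and assumes uncompressed VCFs. It reads only the header, up to the #CHROM line, so it stays fast even on a 380 Mb scaffold, and it reports whether each requested INFO ID is declared.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): report whether each INFO ID given
# after the file name is declared in the VCF header.
# Assumes an uncompressed VCF; returns 1 if any requested ID is missing.
check_info_ids() {
  local vcf="$1"
  shift
  local header id missing=0
  # Read only the header: stop at (and include) the #CHROM line so the
  # data records are never scanned.
  header=$(sed '/^#CHROM/q' "$vcf")
  for id in "$@"; do
    # Declarations look like ##INFO=<ID=MQ,Number=1,...>; the trailing comma
    # keeps "MQ" from matching "MQRankSum" or "MQ0".
    if grep -q "^##INFO=<ID=${id}," <<< "$header"; then
      echo "${id}: declared"
    else
      echo "${id}: MISSING"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_info_ids ../VCFs/output.scaff_6.vcf AB MQ0` run against each version's output would show whether AB and MQ0 were declared in one and not the other. A header like the one shared above would report both as MISSING, which matches the "undefined variable" warnings from JexlEngine.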
-
Hello Wbsimey and Genevieve,
I am facing the same problem. Did you find a solution?
Best regards,
Astrinaki Maria
-
astrinaki_maria, if you still have not solved your issue, please go ahead and create a new post and we can help you there. Here is an article with our forum guidelines: https://gatk.broadinstitute.org/hc/en-us/articles/360053845952-Forum-Guidelines