Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

NumberFormatException Error in VariantFiltration

3 comments

  • Genevieve Brandt (she/her)

    Hi Erin C, the whole line of the first error message is this:

    java.lang.NumberFormatException: For input string: "11.22"

    And for the second command, it is this:

    java.lang.NumberFormatException: For input string: "17.26"

    It looks like there may be a problem with the filters looking for a number and finding a string value. Could you find the variants that have this problem and post them here? Also, have you used the same version of GATK for your whole analysis? Can you also validate this VCF input with our ValidateVariants tool?
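
One way to locate the records behind a value quoted in the exception is to search the INFO column for it. The sketch below is only an illustration: `find_info_value` is a hypothetical helper, not a GATK tool, and it assumes a standard tab-delimited VCF in which QUAL is column 6 and INFO is column 8.

```shell
# Print "line_number:CHROM POS QUAL" for every record whose INFO column
# contains KEY=VALUE exactly (for example QD=11.22).
# Usage: find_info_value KEY VALUE FILE
# Hypothetical helper; assumes an uncompressed, tab-delimited VCF.
find_info_value() {
    awk -F'\t' -v key="$1" -v val="$2" '
        /^#/ { next }                          # skip header lines
        {
            n = split($8, info, ";")           # column 8 is INFO
            for (i = 1; i <= n; i++) {
                if (info[i] == (key "=" val)) {
                    print NR ":" $1, $2, $6    # file line number, CHROM, POS, QUAL
                    break
                }
            }
        }
    ' "$3"
}
```

For example, `find_info_value QD 11.22 input.vcf` (with `input.vcf` as a placeholder path) lists every record carrying that QD value, which makes it easy to see whether the same number also shows up in QUAL.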

  • Erin C

    Dear Genevieve,

    Thank you!

    a) Could you find the variants that have this problem and post them here?

    I've copied the variants containing "11.22" (first command) and "17.26" (second command) below. There are several for each. Is there a way to tell exactly which variant is causing the issue? 11.22 appears in several QUAL and QD values, while 17.26 appears only in QD values. This made me realize that the issue might be with my QD filter, so I changed the QD filter expression from QD < 2 to QD < 2.0, which resolved the error. However, I am confused about why commenting out the QUAL filter expression had also resolved the error earlier, while the QD filter still contained an integer, so I want to make sure I am not missing something.

    first command, variants from input file including "11.22" copied here:

    297189:Pf3D7_13_v3 2885878 . G A 3647.45 . AC=14;AF=0.467;AN=30;BaseQRankSum=2.58;DP=475;ExcessHet=10.1934;FS=20.472;InbreedingCoeff=-0.4274;MLEAC=14;MLEAF=0.467;MQ=39.35;MQRankSum=-1.583e+00;QD=11.22;ReadPosRankSum=-2.380e-01;SOR=0.355 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,18:18:54:.:.:580,54,0 0/1:2,13:15:45:.:.:393,0,45 0/1:5,7:12:99:.:.:140,0,114 ./.:0,0:0:.:.:.:0,0,0 0/0:6,0:6:18:.:.:0,18,160 ./.:1,0:1:.:.:.:0,0,0 0/1:1,4:5:9:.:.:165,0,9 0/0:4,0:4:0:.:.:0,0,55 0/1:24,14:38:99:.:.:354,0,857 0/1:54,7:61:25:.:.:25,0,1857 0/1:13,10:23:99:0|1:2885878_G_A:302,0,451 0/0:13,0:13:36:.:.:0,36,540 0/1:21,13:34:99:0|1:2885878_G_A:265,0,742 ./.:6,0:6:.:.:.:0,0,0 0/1:66,10:76:75:.:.:75,0,1993 1/1:2,14:16:4:.:.:583,4,0 0/1:1,3:4:33:0|1:2885878_G_A:123,0,33 0/1:3,20:23:37:.:.:710,0,37

    299664:Pf3D7_13_v3 2915063 . T C 673.35 . AC=3;AF=0.083;AN=36;BaseQRankSum=-1.835e+00;DP=324;ExcessHet=3.7667;FS=1.036;InbreedingCoeff=-0.1264;MLEAC=3;MLEAF=0.083;MQ=36.70;MQRankSum=2.03;QD=11.22;ReadPosRankSum=-1.276e+00;SOR=0.490 GT:AD:DP:GQ:PGT:PID:PL 0/0:13,0:13:11:.:.:0,11,357 0/0:20,0:20:18:.:.:0,18,487 0/1:13,2:15:35:0|1:2915063_T_C:35,0,442 0/0:7,0:7:21:.:.:0,21,188 0/0:2,0:2:6:.:.:0,6,53 0/0:5,0:5:15:.:.:0,15,126 0/0:5,0:5:12:.:.:0,12,180 0/0:13,0:13:12:.:.:0,12,180 0/0:14,0:14:42:.:.:0,42,432 0/1:9,4:13:99:0|1:2915063_T_C:132,0,326 0/0:64,0:64:63:.:.:0,63,1616 0/0:20,0:20:48:.:.:0,48,720 0/0:30,0:30:0:.:.:0,0,695 0/0:13,0:13:27:.:.:0,27,405 0/1:15,17:32:99:0|1:2915063_T_C:558,0,580 0/0:6,0:6:15:.:.:0,15,225 0/0:19,0:19:12:.:.:0,12,475 0/0:32,0:32:93:.:.:0,93,906

    300879:Pf3D7_14_v3 2747 . A T 11.22 . AC=2;AF=0.091;AN=22;DP=113;ExcessHet=0.1296;FS=0.000;InbreedingCoeff=0.2811;MLEAC=1;MLEAF=0.045;MQ=35.61;QD=11.22;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,1:1:3:1|1:2719_A_G:45,3,0 0/0:11,0:11:0:.:.:0,0,127 ./.:0,0:0:.:.:.:0,0,0 0/0:4,0:4:12:.:.:0,12,111 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 ./.:7,0:7:.:.:.:0,0,0 0/0:4,0:4:12:.:.:0,12,113 0/0:13,0:13:11:.:.:0,11,358 0/0:25,0:25:69:.:.:0,69,1035 0/0:10,0:10:30:.:.:0,30,292 0/0:3,0:3:0:.:.:0,0,35 0/0:7,0:7:6:.:.:0,6,90 0/0:1,0:1:3:.:.:0,3,26 ./.:9,0:9:.:.:.:0,0,0 ./.:9,0:9:.:.:.:0,0,0 0/0:7,0:7:21:.:.:0,21,198

    305336:Pf3D7_14_v3 228413 . A T 1313.31 . AC=6;AF=0.176;AN=34;BaseQRankSum=-1.043e+00;DP=414;ExcessHet=0.0955;FS=0.000;InbreedingCoeff=0.5376;MLEAC=6;MLEAF=0.176;MQ=59.94;MQRankSum=0.00;QD=11.22;ReadPosRankSum=0.070;SOR=0.706 GT:AD:DP:GQ:PGT:PID:PL 0/0:28,0:28:53:.:.:0,53,829 0/0:13,0:13:36:.:.:0,36,540 1/1:0,8:8:24:1|1:228391_C_T:360,24,0 0/0:13,0:13:39:.:.:0,39,376 ./.:0,0:0:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,28 0/0:14,0:14:39:.:.:0,39,585 1/1:0,8:8:27:1|1:228391_C_T:393,27,0 0/0:36,0:36:99:.:.:0,99,1177 0/0:33,0:33:84:.:.:0,84,1260 0/0:22,0:22:57:.:.:0,57,855 0/0:4,0:4:12:.:.:0,12,119 0/0:37,0:37:99:.:.:0,99,1312 0/1:15,9:24:99:0|1:228391_C_T:333,0,608 0/0:30,0:30:81:.:.:0,81,1215 0/0:27,0:27:72:.:.:0,72,1080 0/0:39,0:39:99:.:.:0,99,1077 0/1:65,12:77:99:0|1:228391_C_T:305,0,2735

    306967:Pf3D7_14_v3 521885 . T A 2143.23 . AC=8;AF=0.250;AN=32;BaseQRankSum=-3.165e+00;DP=371;ExcessHet=14.1891;FS=72.419;InbreedingCoeff=-0.5334;MLEAC=11;MLEAF=0.344;MQ=50.62;MQRankSum=0.620;QD=11.22;ReadPosRankSum=1.99;SOR=3.648 GT:AD:DP:GQ:PGT:PID:PL 0/0:27,0:27:0:.:.:0,0,229 0/0:13,0:13:0:.:.:0,0,207 0/0:7,0:7:24:.:.:0,24,333 0/0:8,0:8:5:.:.:0,5,184 ./.:1,0:1:.:.:.:0,0,0 ./.:9,0:9:.:.:.:0,0,0 0/0:7,0:7:0:.:.:0,0,151 0/0:11,0:11:0:.:.:0,0,157 0/1:18,14:32:99:0|1:521885_T_A:364,0,715 0/1:11,9:20:99:0|1:521885_T_A:265,0,455 0/1:13,15:28:99:0|1:521885_T_A:430,0,571 0/1:14,6:20:99:0|1:521885_T_A:208,0,1281 0/0:39,0:39:0:.:.:0,0,474 0/1:7,5:12:99:0|1:521885_T_A:175,0,263 0/1:14,13:27:99:0|1:521885_T_A:330,0,592 0/1:14,9:23:99:0|1:521885_T_A:166,0,567 0/0:32,0:32:2:.:.:0,2,772 0/1:14,15:29:99:0|1:521885_T_A:258,0,388

    308570:Pf3D7_14_v3 851672 . T A 2749.89 . AC=4;AF=0.111;AN=36;BaseQRankSum=0.812;DP=732;ExcessHet=3.8134;FS=3.526;InbreedingCoeff=-0.1321;MLEAC=4;MLEAF=0.111;MQ=60.00;MQRankSum=0.00;QD=11.22;ReadPosRankSum=0.160;SOR=0.616 GT:AD:DP:GQ:PL 0/1:35,49:84:99:1342,0,893 0/0:38,0:38:99:0,99,1361 0/1:14,20:34:99:530,0,335 0/0:34,0:34:90:0,90,1350 0/0:2,0:2:6:0,6,53 0/0:13,0:13:33:0,33,495 0/0:37,0:37:99:0,99,1165 0/0:32,0:32:87:0,87,1305 0/0:49,0:49:99:0,114,1621 0/0:35,0:35:99:0,103,1123 0/0:38,0:38:99:0,99,1126 0/0:34,0:34:99:0,102,1098 0/0:46,0:46:99:0,110,1305 0/1:29,22:51:99:513,0,753 0/0:44,0:44:99:0,102,1216 0/0:36,0:36:99:0,99,1069 0/0:41,0:41:99:0,103,1277 0/1:54,22:76:99:424,0,1375

    314019:Pf3D7_14_v3 1873384 . T G 863.73 . AC=1;AF=0.028;AN=36;BaseQRankSum=-5.789e+00;DP=710;ExcessHet=3.0103;FS=4.404;InbreedingCoeff=-0.0319;MLEAC=1;MLEAF=0.028;MQ=60.00;MQRankSum=0.00;QD=11.22;ReadPosRankSum=0.168;SOR=0.527 GT:AD:DP:GQ:PL 0/0:46,0:46:99:0,105,1575 0/0:41,0:41:99:0,102,1192 0/0:36,0:36:90:0,90,1350 0/0:29,0:29:81:0,81,1215 0/0:3,0:3:9:0,9,74 0/0:10,0:10:30:0,30,293 0/0:35,0:35:90:0,90,1350 0/0:30,0:30:81:0,81,1215 0/0:49,0:49:99:0,101,1271 0/1:37,40:77:99:900,0,937 0/0:39,0:39:99:0,99,1262 0/0:39,0:39:99:0,99,1452 0/0:43,0:43:99:0,111,1191 0/0:35,0:35:87:0,87,1305 0/0:47,0:47:99:0,105,1345 0/0:45,0:45:99:0,120,1800 0/0:46,0:46:99:0,99,1234 0/0:59,0:59:99:0,101,1800

    320013:Pf3D7_14_v3 2987600 . G T 347.93 . AC=1;AF=0.029;AN=34;BaseQRankSum=3.28;DP=576;ExcessHet=3.0103;FS=3.256;InbreedingCoeff=-0.0308;MLEAC=1;MLEAF=0.029;MQ=60.00;MQRankSum=0.00;QD=11.22;ReadPosRankSum=0.397;SOR=1.473 GT:AD:DP:GQ:PL 0/0:38,0:38:99:0,99,1055 0/1:16,15:31:99:384,0,365 0/0:26,0:26:72:0,72,1080 0/0:20,0:20:60:0,60,596 ./.:0,0:0:.:0,0,0 0/0:6,0:6:18:0,18,160 0/0:35,0:35:84:0,84,1260 0/0:23,0:23:40:0,40,593 0/0:42,0:42:99:0,99,1267 0/0:43,0:43:99:0,102,1382 0/0:26,0:26:75:0,75,1125 0/0:36,0:36:99:0,99,1184 0/0:49,0:49:99:0,120,1800 0/0:30,0:30:81:0,81,939 0/0:40,0:40:99:0,100,1131 0/0:36,0:36:99:0,99,1163 0/0:38,0:38:96:0,96,1038 0/0:57,0:57:99:0,106,1529

    323250:Pf3D7_14_v3 3278635 . A C 11.22 . AC=2;AF=0.067;AN=30;DP=143;ExcessHet=0.4742;FS=0.000;InbreedingCoeff=0.1202;MLEAC=1;MLEAF=0.033;MQ=23.00;QD=11.22;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 0/0:29,0:29:0:.:.:0,0,684 1/1:0,1:1:3:1|1:3278632_A_G:45,3,0 0/0:7,0:7:0:.:.:0,0,128 0/0:3,0:3:9:.:.:0,9,86 0/0:1,0:1:3:.:.:0,3,24 ./.:0,0:0:.:.:.:0,0,0 0/0:2,0:2:6:.:.:0,6,63 0/0:6,0:6:6:.:.:0,6,90 0/0:7,0:7:21:.:.:0,21,222 0/0:5,0:5:15:.:.:0,15,156 0/0:14,0:14:0:.:.:0,0,322 ./.:3,0:3:.:.:.:0,0,0 0/0:3,0:3:9:.:.:0,9,90 0/0:6,0:6:0:.:.:0,0,67 0/0:44,0:44:90:.:.:0,90,1350 ./.:3,0:3:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,30 0/0:8,0:8:21:.:.:0,21,315

    for second command, variants from input file including "17.26" copied here:

    150831:Pf3D7_14_v3 1768508 . TTATA T,TTATATA,TTA,TTATATATA 5090.42 . AC=7,4,8,3;AF=0.206,0.118,0.235,0.088;AN=34;BaseQRankSum=0.294;DP=528;ExcessHet=12.9095;FS=2.242;InbreedingCoeff=-0.2715;MLEAC=5,4,7,3;MLEAF=0.147,0.118,0.206,0.088;MQ=60.04;MQRankSum=0.126;QD=17.26;ReadPosRankSum=-1.480e-01;SOR=0.854 GT:AD:DP:GQ:PL 0/4:9,0,3,0,7:19:99:330,283,518,120,320,264,283,518,320,518,0,281,132,281,308 3/3:1,0,0,9,0:10:3:199,202,226,202,226,226,3,27,27,0,202,226,226,27,226 0/1:2,2,0,0,0:4:77:79,0,77,85,84,169,85,84,169,169,85,84,169,169,169 0/3:2,2,0,8,0:12:2:204,139,236,204,218,271,0,2,51,14,204,218,271,51,271 ./.:0,0,0,0,0:0:.:0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0/0:5,0,0,0,0:5:6:0,6,90,6,90,90,6,90,90,90,6,90,90,90,90 0/3:2,0,0,6,0:8:35:131,137,190,137,190,190,0,53,53,35,137,190,190,53,190 1/1:0,6,0,0,0:6:19:269,19,0,269,19,269,269,19,269,269,269,19,269,269,269 0/2:33,0,6,2,0:41:25:25,164,1029,0,764,726,119,1015,710,1133,164,1029,764,1015,1029 0/3:9,5,0,30,0:44:99:691,563,1008,712,934,1043,0,138,255,131,712,934,1043,255,1043 1/3:3,11,0,23,0:37:99:832,459,600,798,586,888,198,0,264,160,798,586,888,264,888 0/2:24,0,5,4,0:33:34:34,133,758,0,560,541,44,682,453,747,133,758,560,682,758 0/2:5,0,11,0,3:19:25:315,295,410,0,100,47,295,410,100,410,177,313,25,313,324 2/3:1,0,9,6,0:16:98:301,337,443,98,159,161,223,317,0,355,337,443,159,317,443 1/1:0,0,0,0,0:13:0:225,50,0,50,0,0,50,0,0,0,50,0,0,0,0 4/4:0,0,0,0,3:15:46:385,262,220,262,220,220,262,220,220,220,50,46,46,46,0 0/3:2,0,0,13,0:15:8:270,276,323,276,323,323,0,47,47,8,276,323,323,47,323 0/1:13,11,4,0,0:30:99:396,0,631,339,202,603,444,658,649,1096,444,658,649,1096,1096

    155840:Pf3D7_14_v3 2826567 . TATAA T 120.83 . AC=2;AF=0.056;AN=36;DP=536;ExcessHet=0.0625;FS=0.000;InbreedingCoeff=0.8373;MLEAC=2;MLEAF=0.056;MQ=61.64;QD=17.26;SOR=4.174 GT:AD:DP:GQ:PL 0/0:38,0:38:99:0,102,1345 0/0:22,0:22:45:0,45,675 0/0:21,0:21:51:0,51,765 1/1:0,7:7:20:180,20,0 0/0:1,0:1:3:0,3,27 0/0:10,0:10:24:0,24,360 0/0:12,0:12:21:0,21,315 0/0:13,0:13:33:0,33,495 0/0:58,0:58:99:0,112,1695 0/0:41,0:41:99:0,99,1327 0/0:35,0:35:50:0,50,987 0/0:43,0:43:99:0,99,1341 0/0:41,0:41:99:0,102,1259 0/0:20,0:20:48:0,48,720 0/0:38,0:38:99:0,99,1485 0/0:36,0:36:99:0,99,1485 0/0:30,0:30:78:0,78,1170 0/0:64,0:64:99:0,113,1726
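
The change described in (a), from `QD < 2` to `QD < 2.0`, might look like the following when every threshold is written as a decimal literal. This is a rough sketch rather than the command actually used in this thread: the function name, the filter names, and the QUAL threshold of 30.0 are placeholders, and it assumes the `gatk` wrapper script is on the PATH.

```shell
# Hard-filtering sketch in which each JEXL threshold is a decimal literal,
# so a floating-point annotation such as QD=11.22 is compared against a float.
# Usage: filter_variants INPUT_VCF OUTPUT_VCF
# Filter names and the QUAL threshold are illustrative placeholders.
filter_variants() {
    in_vcf="$1"
    out_vcf="$2"
    gatk VariantFiltration \
        -V "$in_vcf" \
        -O "$out_vcf" \
        --filter-name "QD2" --filter-expression "QD < 2.0" \
        --filter-name "QUAL30" --filter-expression "QUAL < 30.0"
}
```

Giving each expression its own `--filter-name` means the FILTER column of the output records which criterion a variant failed, which helps when checking the result by hand.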

    b) Also, have you used the same version of GATK for your whole analysis?

    Yes - I used GATK 4.0.11.0 for each step in the analysis.

    c) Can you also validate this VCF input with our ValidateVariants tool?

    I believe I validated the variants using the commands copied below, but I am a little confused about the output of the tool. I assume that if there are no error messages and none of the variants are filtered out, they are correctly formatted, but please let me know if I am missing something or if I should use stricter validation criteria.

    ##Validate SNPs
    java -jar /opt/biotools/GenomeAnalysisTK/4.0.11.0/gatk-package-4.0.11.0-local.jar \
    ValidateVariants \
    -V /oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_snpsONLY \
    --warn-on-errors

    ##Validate Indels
    java -jar /opt/biotools/GenomeAnalysisTK/4.0.11.0/gatk-package-4.0.11.0-local.jar \
    ValidateVariants \
    -V /oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_indelsONLY \
    --warn-on-errors

    Here are summaries from the err file

    For snp command:

    08:22:14.804 INFO ProgressMeter - Starting traversal
    08:22:14.804 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    08:22:19.795 INFO ValidateVariants - No variants filtered by: AllowAllVariantsVariantFilter
    08:22:20.230 INFO ProgressMeter - Pf3D7_14_v3:3286165 0.1 324699 3902632.2
    08:22:20.231 INFO ProgressMeter - Traversal complete. Processed 324699 total variants in 0.1 minutes.
    08:22:20.231 INFO ValidateVariants - Shutting down engine

    For indel command:

    08:22:25.192 INFO ValidateVariants - Done initializing engine
    08:22:25.192 INFO ProgressMeter - Starting traversal
    08:22:25.193 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    08:22:28.799 INFO ValidateVariants - No variants filtered by: AllowAllVariantsVariantFilter
    08:22:28.801 INFO ProgressMeter - Pf3D7_14_v3:3251639 0.1 158504 2637337.8
    08:22:28.801 INFO ProgressMeter - Traversal complete. Processed 158504 total variants in 0.1 minutes.
    08:22:28.801 INFO ValidateVariants - Shutting down engine
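
A quick way to confirm that logs like the ones above contain nothing beyond INFO messages is to search them for other log levels. The helper below is a hypothetical sketch, not part of GATK. It assumes that, with `--warn-on-errors`, any problem is written to the same log as a line whose level is WARN or ERROR, in the same `time LEVEL source - message` layout as the INFO lines shown here.

```shell
# Print any WARN or ERROR lines found in a GATK log file.
# Returns 0 when the log has none, 1 when at least one is found.
# Usage: log_is_clean LOG_FILE
# Hypothetical helper; assumes lines start with "HH:MM:SS.mmm LEVEL ".
log_is_clean() {
    if grep -E '^[0-9:.]+ +(WARN|ERROR) ' "$1"; then
        return 1    # grep matched (and printed) at least one line
    fi
    return 0
}
```

Under those assumptions, a zero exit status for both logs would be consistent with the reading that the two files passed validation.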

    Thank you in advance for your help!!
    Erin

  • Genevieve Brandt (she/her)

    Hi Erin C, here are my comments on those points:

    a) JEXL expressions can be tricky. I can't comment on why it worked when you commented out QUAL without seeing the specific command you tried, your system, and your files. You can look into it more on your end but most importantly, I would recommend manually checking that your VariantFiltration command properly filtered the variants as you intended. You may not get errors, but could have written the command in a way that did not work. Glad to hear that you found the source of the error with QD! We also have some handy documentation here

    b) To get the best usage of GATK, we recommend that you submit commands using the GATK wrapper script. I would also recommend updating your version of GATK, since you are using below 4.1. GATK has changed quite a bit!

    c) ValidateVariants does not do any filtering. It provides a detailed error message of any problems in your file, using the verbose mode. You can follow this tutorial to learn more (https://gatk.broadinstitute.org/hc/en-us/articles/360035891231). However, I don't believe your files caused the errors you saw; it looks like it was just a problem with the JEXL expression.
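
For the manual check suggested in (a), one simple approach is to tally the FILTER column of the VariantFiltration output and compare the counts with what the thresholds should produce. The sketch below uses a hypothetical helper, `count_filters`, and assumes an uncompressed, tab-delimited VCF in which FILTER is column 7.

```shell
# Count how many records carry each FILTER value (PASS, a filter name,
# or several names joined by ";"), printed as "value<TAB>count".
# Usage: count_filters FILTERED_VCF
# Hypothetical helper; assumes an uncompressed, tab-delimited VCF.
count_filters() {
    awk -F'\t' '
        !/^#/ { n[$7]++ }                         # column 7 is FILTER
        END   { for (f in n) print f "\t" n[f] }
    ' "$1" | sort
}
```

If every record comes back as PASS, or a filter name never appears even though some variants clearly fall below its threshold, the expression is probably not being evaluated as intended.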
