NumberFormatException Error in VariantFiltration
Hi,
Thanks in advance for your help. We have joint genotyped 18 samples, using HC in ERC mode, followed by CombineGVCFs, GenotypeGVCFs, then separated snps and indels using SelectVariants to generate our input files for VariantFiltration (AMAMBUA18_GT2_raw.snps.indels.vcf_snpsONLY, and AMAMBUA18_GT2_raw.snps.indels.vcf_indelsONLY).
I am getting the following error message from the VariantFiltration tool. I believe the issue is coming from the QUAL filter since commenting out the QUAL filter expression produces filtered vcf files without the error, but I cannot figure out how to fix the QUAL filter. Initially, the error was due to using "QUAL < 500" rather than "QUAL < 500.0" but I have updated the expression as seen below and am still getting the NumberFormatException error.
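A quick way to spot the integer-versus-decimal literal problem described above is to scan the filter expressions before running the tool. The sketch below is a hypothetical helper, not part of GATK: the function name `check_expression` and the list of annotations treated as floating point are assumptions based on the filters used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): flag filter expressions that compare
# a floating-point annotation against an integer literal such as "QD < 2".
# Assumption: the annotations below are the decimal-valued ones used in this post.
FLOAT_ANNOTATIONS='QUAL|QD|FS|MQ|SOR|MQRankSum|ReadPosRankSum'

check_expression() {
    local expr="$1"
    # Match "<annotation> <operator> <digits>" where the digits are not followed
    # by a decimal point, another digit, or an exponent marker.
    if printf '%s\n' "$expr" |
        grep -Eq "(^|[^A-Za-z_])(${FLOAT_ANNOTATIONS})[[:space:]]*(<=|>=|==|!=|<|>)[[:space:]]*-?[0-9]+([^0-9.eE]|\$)"; then
        echo "WARN: integer literal in '$expr'"
    else
        echo "OK: '$expr'"
    fi
}

check_expression "QUAL < 500.0"   # prints: OK: 'QUAL < 500.0'
check_expression "QD < 2"         # prints: WARN: integer literal in 'QD < 2'
check_expression "DP < 7"         # prints: OK: 'DP < 7' (DP is not in the list)
```

Integer-valued fields such as the FORMAT-level DP are deliberately left out of the list, so "DP < 7" is not flagged.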
a) GATK version used: 4.0.11.0
b) Exact command used:
##Filtering SNPs
java -jar /opt/biotools/GenomeAnalysisTK/4.0.11.0/gatk-package-4.0.11.0-local.jar \
VariantFiltration \
-R /home/ecoonahan/platypus_tscc/references/p_fal.fasta \
-V /oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_snpsONLY \
-O /oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_FILTERED8snpsONLY \
--filter-name "LowMQ" \
--filter-expression "MQ < 60.0" \
--filter-name "LowRPRS" \
--filter-expression "ReadPosRankSum < -8.0" \
--filter-name "LowQual" \
--filter-expression "QUAL < 500.0" \
--filter-name "LowQD" \
--filter-expression "QD < 2" \
--filter-name "highSOR" \
--filter-expression "SOR > 4.0" \
--genotype-filter-name "LowFormatDP" \
--genotype-filter-expression "DP < 7" \
--filter-name "highFS" \
--filter-expression "FS > 60.0" \
--filter-name "lowMQRankSum" \
--filter-expression "MQRankSum < -12.5"
##Filtering Indels
java -jar /opt/biotools/GenomeAnalysisTK/4.0.11.0/gatk-package-4.0.11.0-local.jar \
VariantFiltration \
-R /home/ecoonahan/platypus_tscc/references/p_fal.fasta \
-V /oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_indelsONLY \
-O /oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_FILTERED11indelsONLY \
--filter-name "LowRPRS" \
--filter-expression "ReadPosRankSum < -20.0" \
--filter-name "LowQUAL" \
--filter-expression "QUAL < 500.0" \
--filter-name "LowQD" \
--filter-expression "QD < 2" \
--genotype-filter-name "LowFormatDP" \
--genotype-filter-expression "DP < 7" \
--filter-name "highFS" \
--filter-expression "FS > 200.0"
c) Entire error log:
16:52:14.536 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/biotools/GenomeAnalysisTK/4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:52:17.147 INFO VariantFiltration - ------------------------------------------------------------
16:52:17.147 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.0.11.0
16:52:17.147 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
16:52:17.148 INFO VariantFiltration - Executing as ecoonahan@tscc-4-54.sdsc.edu on Linux v3.10.0-1127.8.2.el7.x86_64 amd64
16:52:17.148 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_252-b09
16:52:17.148 INFO VariantFiltration - Start Date/Time: August 7, 2020 4:52:14 PM PDT
16:52:17.148 INFO VariantFiltration - ------------------------------------------------------------
16:52:17.148 INFO VariantFiltration - ------------------------------------------------------------
16:52:17.149 INFO VariantFiltration - HTSJDK Version: 2.16.1
16:52:17.149 INFO VariantFiltration - Picard Version: 2.18.13
16:52:17.149 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:52:17.149 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:52:17.149 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:52:17.149 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:52:17.150 INFO VariantFiltration - Deflater: IntelDeflater
16:52:17.150 INFO VariantFiltration - Inflater: IntelInflater
16:52:17.150 INFO VariantFiltration - GCS max retries/reopens: 20
16:52:17.150 INFO VariantFiltration - Requester pays: disabled
16:52:17.150 INFO VariantFiltration - Initializing engine
16:52:17.636 INFO FeatureManager - Using codec VCFCodec to read file file:///oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_snpsONLY
16:52:17.740 INFO VariantFiltration - Done initializing engine
16:52:17.831 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "vcf_FILTERED8snpsONLY". Defaulting to VCF.
16:52:18.072 INFO ProgressMeter - Starting traversal
16:52:18.072 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:52:18.108 INFO VariantFiltration - Shutting down engine
[August 7, 2020 4:52:18 PM PDT] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=2198339584
java.lang.NumberFormatException: For input string: "11.22"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.commons.jexl2.JexlArithmetic.toLong(JexlArithmetic.java:906)
at org.apache.commons.jexl2.JexlArithmetic.compare(JexlArithmetic.java:718)
at org.apache.commons.jexl2.JexlArithmetic.lessThan(JexlArithmetic.java:774)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:967)
at org.apache.commons.jexl2.parser.ASTLTNode.jjtAccept(ASTLTNode.java:18)
at org.apache.commons.jexl2.Interpreter.interpret(Interpreter.java:232)
at org.apache.commons.jexl2.ExpressionImpl.evaluate(ExpressionImpl.java:65)
at htsjdk.variant.variantcontext.JEXLMap.evaluateExpression(JEXLMap.java:186)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:95)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:15)
at htsjdk.variant.variantcontext.VariantContextUtils.match(VariantContextUtils.java:338)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.matchesFilter(VariantFiltration.java:379)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.filter(VariantFiltration.java:338)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.apply(VariantFiltration.java:298)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:153)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:151)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
16:52:20.115 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/biotools/GenomeAnalysisTK/4.0.11.0/gatk-package-4.0.11.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:52:21.758 INFO VariantFiltration - ------------------------------------------------------------
16:52:21.758 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.0.11.0
16:52:21.758 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
16:52:21.759 INFO VariantFiltration - Executing as ecoonahan@tscc-4-54.sdsc.edu on Linux v3.10.0-1127.8.2.el7.x86_64 amd64
16:52:21.759 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_252-b09
16:52:21.759 INFO VariantFiltration - Start Date/Time: August 7, 2020 4:52:20 PM PDT
16:52:21.759 INFO VariantFiltration - ------------------------------------------------------------
16:52:21.759 INFO VariantFiltration - ------------------------------------------------------------
16:52:21.760 INFO VariantFiltration - HTSJDK Version: 2.16.1
16:52:21.760 INFO VariantFiltration - Picard Version: 2.18.13
16:52:21.760 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:52:21.760 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:52:21.760 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:52:21.760 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:52:21.761 INFO VariantFiltration - Deflater: IntelDeflater
16:52:21.761 INFO VariantFiltration - Inflater: IntelInflater
16:52:21.761 INFO VariantFiltration - GCS max retries/reopens: 20
16:52:21.761 INFO VariantFiltration - Requester pays: disabled
16:52:21.761 INFO VariantFiltration - Initializing engine
16:52:22.219 INFO FeatureManager - Using codec VCFCodec to read file file:///oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_indelsONLY
16:52:22.264 INFO VariantFiltration - Done initializing engine
16:52:22.312 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "vcf_FILTERED11indelsONLY". Defaulting to VCF.
16:52:22.530 INFO ProgressMeter - Starting traversal
16:52:22.531 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
16:52:22.569 INFO VariantFiltration - Shutting down engine
[August 7, 2020 4:52:22 PM PDT] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=2227699712
java.lang.NumberFormatException: For input string: "17.26"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.commons.jexl2.JexlArithmetic.toLong(JexlArithmetic.java:906)
at org.apache.commons.jexl2.JexlArithmetic.compare(JexlArithmetic.java:718)
at org.apache.commons.jexl2.JexlArithmetic.lessThan(JexlArithmetic.java:774)
at org.apache.commons.jexl2.Interpreter.visit(Interpreter.java:967)
at org.apache.commons.jexl2.parser.ASTLTNode.jjtAccept(ASTLTNode.java:18)
at org.apache.commons.jexl2.Interpreter.interpret(Interpreter.java:232)
at org.apache.commons.jexl2.ExpressionImpl.evaluate(ExpressionImpl.java:65)
at htsjdk.variant.variantcontext.JEXLMap.evaluateExpression(JEXLMap.java:186)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:95)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:15)
at htsjdk.variant.variantcontext.VariantContextUtils.match(VariantContextUtils.java:338)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.matchesFilter(VariantFiltration.java:379)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.filter(VariantFiltration.java:338)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.apply(VariantFiltration.java:298)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:153)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:151)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:966)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Choose a category for your question:
a) How do I fix this?
Thank you!
Erin
-
Hi Erin C, the whole line of the first error message is this:
java.lang.NumberFormatException: For input string: "11.22"
And for the second command, it is this:
java.lang.NumberFormatException: For input string: "17.26"
It looks like there may be a problem with the filters looking for a number and finding a string value. Could you find the variants that have this problem and post them here? Also, have you used the same version of GATK for your whole analysis? Can you also validate this VCF input with our ValidateVariants tool?
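One way to locate such records could look like the sketch below. It assumes an uncompressed, tab-delimited VCF and a POSIX `awk`; the function name `find_qd` is hypothetical and not a GATK utility.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the file line number, CHROM, POS and QUAL of every
# record whose INFO column contains exactly QD=<value>.
find_qd() {
    local value="$1" vcf="$2"
    awk -F'\t' -v v="$value" '
        /^#/ { next }                          # skip header lines
        {
            n = split($8, info, ";")           # column 8 is INFO
            for (i = 1; i <= n; i++)
                if (info[i] == ("QD=" v))
                    print NR ": " $1 "\t" $2 "\tQUAL=" $6
        }' "$vcf"
}

# Example call (file name taken from the command in the question):
# find_qd 11.22 AMAMBUA18_GT2_raw.snps.indels.vcf_snpsONLY
```

The reported line numbers count header lines too, so they can be used directly with tools such as `sed -n '<line>p'` to pull out the full record.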
-
Dear Genevieve,
Thank you!
a) Could you find the variants that have this problem and post them here?
I've copied the variants with "11.22" from the first command and "17.26" from the second command here. There are several for each - is there a way to tell which exact variant is causing the issue? 11.22 is found on several QUAL and QD values. 17.26 is found in QD values. This made me realize that maybe the issue is with my QD filter. So, I changed the QD filter expression from QD < 2 to QD < 2.0. This resolved the error message I was getting. However, I am confused about why previously commenting out the QUAL filter expression also resolved the error message while the QD filter still contained an integer...so I want to make sure I am not missing something.
first command, variants from input file including "11.22" copied here:
297189:Pf3D7_13_v3 2885878 . G A 3647.45 . AC=14;AF=0.467;AN=30;BaseQRankSum=2.58;DP=475;ExcessHet=10.1934;FS=20.472;InbreedingCoeff=-0.4274;MLEAC=14;MLEAF=0.467;MQ=39.35;MQRankSum=-1.583e+00;QD=11.22;ReadPosRankSum=-2.380e-01;SOR=0.355 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,18:18:54:.:.:580,54,0 0/1:2,13:15:45:.:.:393,0,45 0/1:5,7:12:99:.:.:140,0,114 ./.:0,0:0:.:.:.:0,0,0 0/0:6,0:6:18:.:.:0,18,160 ./.:1,0:1:.:.:.:0,0,0 0/1:1,4:5:9:.:.:165,0,9 0/0:4,0:4:0:.:.:0,0,55 0/1:24,14:38:99:.:.:354,0,857 0/1:54,7:61:25:.:.:25,0,1857 0/1:13,10:23:99:0|1:2885878_G_A:302,0,451 0/0:13,0:13:36:.:.:0,36,540 0/1:21,13:34:99:0|1:2885878_G_A:265,0,742 ./.:6,0:6:.:.:.:0,0,0 0/1:66,10:76:75:.:.:75,0,1993 1/1:2,14:16:4:.:.:583,4,0 0/1:1,3:4:33:0|1:2885878_G_A:123,0,33 0/1:3,20:23:37:.:.:710,0,37
299664:Pf3D7_13_v3 2915063 . T C 673.35 . AC=3;AF=0.083;AN=36;BaseQRankSum=-1.835e+00;DP=324;ExcessHet=3.7667;FS=1.036;InbreedingCoeff=-0.1264;MLEAC=3;MLEAF=0.083;MQ=36.70;MQRankSum=2.03;QD=11.22;ReadPosRankSum=-1.276e+00;SOR=0.490 GT:AD:DP:GQ:PGT:PID:PL 0/0:13,0:13:11:.:.:0,11,357 0/0:20,0:20:18:.:.:0,18,487 0/1:13,2:15:35:0|1:2915063_T_C:35,0,442 0/0:7,0:7:21:.:.:0,21,188 0/0:2,0:2:6:.:.:0,6,53 0/0:5,0:5:15:.:.:0,15,126 0/0:5,0:5:12:.:.:0,12,180 0/0:13,0:13:12:.:.:0,12,180 0/0:14,0:14:42:.:.:0,42,432 0/1:9,4:13:99:0|1:2915063_T_C:132,0,326 0/0:64,0:64:63:.:.:0,63,1616 0/0:20,0:20:48:.:.:0,48,720 0/0:30,0:30:0:.:.:0,0,695 0/0:13,0:13:27:.:.:0,27,405 0/1:15,17:32:99:0|1:2915063_T_C:558,0,580 0/0:6,0:6:15:.:.:0,15,225 0/0:19,0:19:12:.:.:0,12,475 0/0:32,0:32:93:.:.:0,93,906
300879:Pf3D7_14_v3 2747 . A T 11.22 . AC=2;AF=0.091;AN=22;DP=113;ExcessHet=0.1296;FS=0.000;InbreedingCoeff=0.2811;MLEAC=1;MLEAF=0.045;MQ=35.61;QD=11.22;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,1:1:3:1|1:2719_A_G:45,3,0 0/0:11,0:11:0:.:.:0,0,127 ./.:0,0:0:.:.:.:0,0,0 0/0:4,0:4:12:.:.:0,12,111 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 ./.:0,0:0:.:.:.:0,0,0 ./.:7,0:7:.:.:.:0,0,0 0/0:4,0:4:12:.:.:0,12,113 0/0:13,0:13:11:.:.:0,11,358 0/0:25,0:25:69:.:.:0,69,1035 0/0:10,0:10:30:.:.:0,30,292 0/0:3,0:3:0:.:.:0,0,35 0/0:7,0:7:6:.:.:0,6,90 0/0:1,0:1:3:.:.:0,3,26 ./.:9,0:9:.:.:.:0,0,0 ./.:9,0:9:.:.:.:0,0,0 0/0:7,0:7:21:.:.:0,21,198
305336:Pf3D7_14_v3 228413 . A T 1313.31 . AC=6;AF=0.176;AN=34;BaseQRankSum=-1.043e+00;DP=414;ExcessHet=0.0955;FS=0.000;InbreedingCoeff=0.5376;MLEAC=6;MLEAF=0.176;MQ=59.94;MQRankSum=0.00;QD=11.22;ReadPosRankSum=0.070;SOR=0.706 GT:AD:DP:GQ:PGT:PID:PL 0/0:28,0:28:53:.:.:0,53,829 0/0:13,0:13:36:.:.:0,36,540 1/1:0,8:8:24:1|1:228391_C_T:360,24,0 0/0:13,0:13:39:.:.:0,39,376 ./.:0,0:0:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,28 0/0:14,0:14:39:.:.:0,39,585 1/1:0,8:8:27:1|1:228391_C_T:393,27,0 0/0:36,0:36:99:.:.:0,99,1177 0/0:33,0:33:84:.:.:0,84,1260 0/0:22,0:22:57:.:.:0,57,855 0/0:4,0:4:12:.:.:0,12,119 0/0:37,0:37:99:.:.:0,99,1312 0/1:15,9:24:99:0|1:228391_C_T:333,0,608 0/0:30,0:30:81:.:.:0,81,1215 0/0:27,0:27:72:.:.:0,72,1080 0/0:39,0:39:99:.:.:0,99,1077 0/1:65,12:77:99:0|1:228391_C_T:305,0,2735
306967:Pf3D7_14_v3 521885 . T A 2143.23 . AC=8;AF=0.250;AN=32;BaseQRankSum=-3.165e+00;DP=371;ExcessHet=14.1891;FS=72.419;InbreedingCoeff=-0.5334;MLEAC=11;MLEAF=0.344;MQ=50.62;MQRankSum=0.620;QD=11.22;ReadPosRankSum=1.99;SOR=3.648 GT:AD:DP:GQ:PGT:PID:PL 0/0:27,0:27:0:.:.:0,0,229 0/0:13,0:13:0:.:.:0,0,207 0/0:7,0:7:24:.:.:0,24,333 0/0:8,0:8:5:.:.:0,5,184 ./.:1,0:1:.:.:.:0,0,0 ./.:9,0:9:.:.:.:0,0,0 0/0:7,0:7:0:.:.:0,0,151 0/0:11,0:11:0:.:.:0,0,157 0/1:18,14:32:99:0|1:521885_T_A:364,0,715 0/1:11,9:20:99:0|1:521885_T_A:265,0,455 0/1:13,15:28:99:0|1:521885_T_A:430,0,571 0/1:14,6:20:99:0|1:521885_T_A:208,0,1281 0/0:39,0:39:0:.:.:0,0,474 0/1:7,5:12:99:0|1:521885_T_A:175,0,263 0/1:14,13:27:99:0|1:521885_T_A:330,0,592 0/1:14,9:23:99:0|1:521885_T_A:166,0,567 0/0:32,0:32:2:.:.:0,2,772 0/1:14,15:29:99:0|1:521885_T_A:258,0,388
308570:Pf3D7_14_v3 851672 . T A 2749.89 . AC=4;AF=0.111;AN=36;BaseQRankSum=0.812;DP=732;ExcessHet=3.8134;FS=3.526;InbreedingCoeff=-0.1321;MLEAC=4;MLEAF=0.111;MQ=60.00;MQRankSum=0.00;QD=11.22;ReadPosRankSum=0.160;SOR=0.616 GT:AD:DP:GQ:PL 0/1:35,49:84:99:1342,0,893 0/0:38,0:38:99:0,99,1361 0/1:14,20:34:99:530,0,335 0/0:34,0:34:90:0,90,1350 0/0:2,0:2:6:0,6,53 0/0:13,0:13:33:0,33,495 0/0:37,0:37:99:0,99,1165 0/0:32,0:32:87:0,87,1305 0/0:49,0:49:99:0,114,1621 0/0:35,0:35:99:0,103,1123 0/0:38,0:38:99:0,99,1126 0/0:34,0:34:99:0,102,1098 0/0:46,0:46:99:0,110,1305 0/1:29,22:51:99:513,0,753 0/0:44,0:44:99:0,102,1216 0/0:36,0:36:99:0,99,1069 0/0:41,0:41:99:0,103,1277 0/1:54,22:76:99:424,0,1375
314019:Pf3D7_14_v3 1873384 . T G 863.73 . AC=1;AF=0.028;AN=36;BaseQRankSum=-5.789e+00;DP=710;ExcessHet=3.0103;FS=4.404;InbreedingCoeff=-0.0319;MLEAC=1;MLEAF=0.028;MQ=60.00;MQRankSum=0.00;QD=11.22;ReadPosRankSum=0.168;SOR=0.527 GT:AD:DP:GQ:PL 0/0:46,0:46:99:0,105,1575 0/0:41,0:41:99:0,102,1192 0/0:36,0:36:90:0,90,1350 0/0:29,0:29:81:0,81,1215 0/0:3,0:3:9:0,9,74 0/0:10,0:10:30:0,30,293 0/0:35,0:35:90:0,90,1350 0/0:30,0:30:81:0,81,1215 0/0:49,0:49:99:0,101,1271 0/1:37,40:77:99:900,0,937 0/0:39,0:39:99:0,99,1262 0/0:39,0:39:99:0,99,1452 0/0:43,0:43:99:0,111,1191 0/0:35,0:35:87:0,87,1305 0/0:47,0:47:99:0,105,1345 0/0:45,0:45:99:0,120,1800 0/0:46,0:46:99:0,99,1234 0/0:59,0:59:99:0,101,1800
320013:Pf3D7_14_v3 2987600 . G T 347.93 . AC=1;AF=0.029;AN=34;BaseQRankSum=3.28;DP=576;ExcessHet=3.0103;FS=3.256;InbreedingCoeff=-0.0308;MLEAC=1;MLEAF=0.029;MQ=60.00;MQRankSum=0.00;QD=11.22;ReadPosRankSum=0.397;SOR=1.473 GT:AD:DP:GQ:PL 0/0:38,0:38:99:0,99,1055 0/1:16,15:31:99:384,0,365 0/0:26,0:26:72:0,72,1080 0/0:20,0:20:60:0,60,596 ./.:0,0:0:.:0,0,0 0/0:6,0:6:18:0,18,160 0/0:35,0:35:84:0,84,1260 0/0:23,0:23:40:0,40,593 0/0:42,0:42:99:0,99,1267 0/0:43,0:43:99:0,102,1382 0/0:26,0:26:75:0,75,1125 0/0:36,0:36:99:0,99,1184 0/0:49,0:49:99:0,120,1800 0/0:30,0:30:81:0,81,939 0/0:40,0:40:99:0,100,1131 0/0:36,0:36:99:0,99,1163 0/0:38,0:38:96:0,96,1038 0/0:57,0:57:99:0,106,1529
323250:Pf3D7_14_v3 3278635 . A C 11.22 . AC=2;AF=0.067;AN=30;DP=143;ExcessHet=0.4742;FS=0.000;InbreedingCoeff=0.1202;MLEAC=1;MLEAF=0.033;MQ=23.00;QD=11.22;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 0/0:29,0:29:0:.:.:0,0,684 1/1:0,1:1:3:1|1:3278632_A_G:45,3,0 0/0:7,0:7:0:.:.:0,0,128 0/0:3,0:3:9:.:.:0,9,86 0/0:1,0:1:3:.:.:0,3,24 ./.:0,0:0:.:.:.:0,0,0 0/0:2,0:2:6:.:.:0,6,63 0/0:6,0:6:6:.:.:0,6,90 0/0:7,0:7:21:.:.:0,21,222 0/0:5,0:5:15:.:.:0,15,156 0/0:14,0:14:0:.:.:0,0,322 ./.:3,0:3:.:.:.:0,0,0 0/0:3,0:3:9:.:.:0,9,90 0/0:6,0:6:0:.:.:0,0,67 0/0:44,0:44:90:.:.:0,90,1350 ./.:3,0:3:.:.:.:0,0,0 0/0:1,0:1:3:.:.:0,3,30 0/0:8,0:8:21:.:.:0,21,315
for second command, variants from input file including "17.26" copied here:
150831:Pf3D7_14_v3 1768508 . TTATA T,TTATATA,TTA,TTATATATA 5090.42 . AC=7,4,8,3;AF=0.206,0.118,0.235,0.088;AN=34;BaseQRankSum=0.294;DP=528;ExcessHet=12.9095;FS=2.242;InbreedingCoeff=-0.2715;MLEAC=5,4,7,3;MLEAF=0.147,0.118,0.206,0.088;MQ=60.04;MQRankSum=0.126;QD=17.26;ReadPosRankSum=-1.480e-01;SOR=0.854 GT:AD:DP:GQ:PL 0/4:9,0,3,0,7:19:99:330,283,518,120,320,264,283,518,320,518,0,281,132,281,308 3/3:1,0,0,9,0:10:3:199,202,226,202,226,226,3,27,27,0,202,226,226,27,226 0/1:2,2,0,0,0:4:77:79,0,77,85,84,169,85,84,169,169,85,84,169,169,169 0/3:2,2,0,8,0:12:2:204,139,236,204,218,271,0,2,51,14,204,218,271,51,271 ./.:0,0,0,0,0:0:.:0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 0/0:5,0,0,0,0:5:6:0,6,90,6,90,90,6,90,90,90,6,90,90,90,90 0/3:2,0,0,6,0:8:35:131,137,190,137,190,190,0,53,53,35,137,190,190,53,190 1/1:0,6,0,0,0:6:19:269,19,0,269,19,269,269,19,269,269,269,19,269,269,269 0/2:33,0,6,2,0:41:25:25,164,1029,0,764,726,119,1015,710,1133,164,1029,764,1015,1029 0/3:9,5,0,30,0:44:99:691,563,1008,712,934,1043,0,138,255,131,712,934,1043,255,1043 1/3:3,11,0,23,0:37:99:832,459,600,798,586,888,198,0,264,160,798,586,888,264,888 0/2:24,0,5,4,0:33:34:34,133,758,0,560,541,44,682,453,747,133,758,560,682,758 0/2:5,0,11,0,3:19:25:315,295,410,0,100,47,295,410,100,410,177,313,25,313,324 2/3:1,0,9,6,0:16:98:301,337,443,98,159,161,223,317,0,355,337,443,159,317,443 1/1:0,0,0,0,0:13:0:225,50,0,50,0,0,50,0,0,0,50,0,0,0,0 4/4:0,0,0,0,3:15:46:385,262,220,262,220,220,262,220,220,220,50,46,46,46,0 0/3:2,0,0,13,0:15:8:270,276,323,276,323,323,0,47,47,8,276,323,323,47,323 0/1:13,11,4,0,0:30:99:396,0,631,339,202,603,444,658,649,1096,444,658,649,1096,1096
155840:Pf3D7_14_v3 2826567 . TATAA T 120.83 . AC=2;AF=0.056;AN=36;DP=536;ExcessHet=0.0625;FS=0.000;InbreedingCoeff=0.8373;MLEAC=2;MLEAF=0.056;MQ=61.64;QD=17.26;SOR=4.174 GT:AD:DP:GQ:PL 0/0:38,0:38:99:0,102,1345 0/0:22,0:22:45:0,45,675 0/0:21,0:21:51:0,51,765 1/1:0,7:7:20:180,20,0 0/0:1,0:1:3:0,3,27 0/0:10,0:10:24:0,24,360 0/0:12,0:12:21:0,21,315 0/0:13,0:13:33:0,33,495 0/0:58,0:58:99:0,112,1695 0/0:41,0:41:99:0,99,1327 0/0:35,0:35:50:0,50,987 0/0:43,0:43:99:0,99,1341 0/0:41,0:41:99:0,102,1259 0/0:20,0:20:48:0,48,720 0/0:38,0:38:99:0,99,1485 0/0:36,0:36:99:0,99,1485 0/0:30,0:30:78:0,78,1170 0/0:64,0:64:99:0,113,1726
B) Also, have you used the same version of GATK for your whole analysis?
Yes - I used GATK 4.0.11.0 for each step in the analysis.
C) Can you also validate this VCF input with our ValidateVariants tool?
I believe I validated the variants using the command copied below, but I am a little confused about the output of the tool. I assume that if there are no error messages and none of the variants are filtered out, they are correctly formatted, but please let me know if I am missing something or if I should use stricter validation criteria.
##Validate SNPs
java -jar /opt/biotools/GenomeAnalysisTK/4.0.11.0/gatk-package-4.0.11.0-local.jar \
ValidateVariants \
-V /oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_snpsONLY \
--warn-on-errors
##Validate Indels
java -jar /opt/biotools/GenomeAnalysisTK/4.0.11.0/gatk-package-4.0.11.0-local.jar \
ValidateVariants \
-V /oasis/tscc/scratch/ecoonahan/gvcf/test/AMAMBUA18_GT2_raw.snps.indels.vcf_indelsONLY \
--warn-on-errors
Here are summaries from the err file
For snp command:
08:22:14.804 INFO ProgressMeter - Starting traversal
08:22:14.804 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
08:22:19.795 INFO ValidateVariants - No variants filtered by: AllowAllVariantsVariantFilter
08:22:20.230 INFO ProgressMeter - Pf3D7_14_v3:3286165 0.1 324699 3902632.2
08:22:20.231 INFO ProgressMeter - Traversal complete. Processed 324699 total variants in 0.1 minutes.
08:22:20.231 INFO ValidateVariants - Shutting down engine
For indel command:
08:22:25.192 INFO ValidateVariants - Done initializing engine
08:22:25.192 INFO ProgressMeter - Starting traversal
08:22:25.193 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
08:22:28.799 INFO ValidateVariants - No variants filtered by: AllowAllVariantsVariantFilter
08:22:28.801 INFO ProgressMeter - Pf3D7_14_v3:3251639 0.1 158504 2637337.8
08:22:28.801 INFO ProgressMeter - Traversal complete. Processed 158504 total variants in 0.1 minutes.
08:22:28.801 INFO ValidateVariants - Shutting down engine
Thank you in advance for your help!!
Erin
-
Hi Erin C, here my comments on those points:
a) JEXL expressions can be tricky. I can't comment on why it worked when you commented out QUAL without seeing the specific command you tried, your system, and your files. You can look into it more on your end but most importantly, I would recommend manually checking that your VariantFiltration command properly filtered the variants as you intended. You may not get errors, but could have written the command in a way that did not work. Glad to hear that you found the source of the error with QD! We also have some handy documentation here.
b) To get the best usage of GATK, we recommend that you submit commands using the GATK wrapper script. I would also recommend updating your version of GATK, since you are using below 4.1. GATK has changed quite a bit!
c) ValidateVariants does not do any filtering. It provides a detailed error message for any problems in your file when run in verbose mode. You can follow this tutorial to learn more (https://gatk.broadinstitute.org/hc/en-us/articles/360035891231). However, I don't believe your files are causing the errors you saw; it looks like it was just a problem with the JEXL expression.
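On the open question of why commenting out the QUAL line also made the error disappear, one possible explanation can be sketched in shell. This is an assumption, because the edited command was not posted: if the line was disabled by prefixing it with `#` inside the backslash-continued command, bash treats the rest of that line (including its trailing backslash) as a comment, so the command ends there and the later filters, QD among them, are never passed to the tool. The function `show_args` below is a hypothetical stand-in for the real tool that only echoes the arguments it receives.

```shell
#!/usr/bin/env bash
# Demonstrates (under the assumption stated above) how a '#' line inside a
# backslash-continued command silently cuts the command short.
run_demo() {
    # Run the snippet in a child bash; the "command not found" message produced
    # by the orphaned continuation lines is discarded.
    bash 2>/dev/null <<'EOF' || true
show_args() { printf 'arg: %s\n' "$@"; }   # stand-in for the real tool
show_args \
    --filter-name "LowQual" \
#   --filter-expression "QUAL < 500.0" \
    --filter-name "LowQD" \
    --filter-expression "QD < 2"
EOF
}

run_demo
# Only the arguments before the commented line reach show_args:
#   arg: --filter-name
#   arg: LowQual
```

If that is what happened, the run without errors would also have skipped every filter listed after the commented line, which is one more reason to check the FILTER column of the output by hand. Deleting the line outright, rather than commenting it out, avoids the problem.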