VariantFiltration issue
Hello,
I am running into an issue with my VariantFiltration step: it stops without completing. It appears to start fine and writes 5 SNPs to the output file (all from the first 6 rows of unfiltered SNPs), but then it doesn't filter any of the remaining data. The log says "Shutting down engine" and then reports a Java error: "java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Boolean". I haven't been able to find any other posts about this particular issue. I did try running this with multiple versions (4.2.6.1, 4.2.2.0 and 4.1.4.1). I've copied the log and the command used below. I've been following this pipeline (https://github.com/lindsawi/HybSeq-SNP-Extraction).
Thanks so much for all your help!
a) GATK version used: 4.2.6.1, 4.2.2.0 and 4.1.4.1
b) Exact command used: gatk VariantFiltration -R rorida_quinquenervia_supercontig_reference.fasta -V Rorida_quinquenervia.SNPall.vcf --filter-name "hardfilter" -O Rorida_quinquenervia.snp.filtered.vcf --filter-expression "QD < 5.0 & FS > 60.0 & MQ < 40.0 & MQRankSum < -12.5 & ReadPosRankSum < -8.0"
c) Entire program log:
Using GATK jar /weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar CombineGVCFs -R rorida_quinquenervia_supercontig_reference.fasta --variant samples.list --output Rorida_quinquenervia.cohort.g.vcf
12:50:08.108 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
12:50:08.405 INFO CombineGVCFs - ------------------------------------------------------------
12:50:08.405 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.2.6.1
12:50:08.405 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
12:50:08.408 INFO CombineGVCFs - Executing as theresa.saunders@cn136.mgmt.kamiak.wsu.edu on Linux v3.10.0-693.11.6.el7.x86_64 amd64
12:50:08.408 INFO CombineGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_92-b14
12:50:08.409 INFO CombineGVCFs - Start Date/Time: September 9, 2022 12:50:08 PM PDT
12:50:08.409 INFO CombineGVCFs - ------------------------------------------------------------
12:50:08.409 INFO CombineGVCFs - ------------------------------------------------------------
12:50:08.409 INFO CombineGVCFs - HTSJDK Version: 2.24.1
12:50:08.409 INFO CombineGVCFs - Picard Version: 2.27.1
12:50:08.409 INFO CombineGVCFs - Built for Spark Version: 2.4.5
12:50:08.409 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:50:08.409 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:50:08.410 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:50:08.410 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:50:08.410 INFO CombineGVCFs - Deflater: IntelDeflater
12:50:08.410 INFO CombineGVCFs - Inflater: IntelInflater
12:50:08.410 INFO CombineGVCFs - GCS max retries/reopens: 20
12:50:08.410 INFO CombineGVCFs - Requester pays: disabled
12:50:08.410 INFO CombineGVCFs - Initializing engine
12:50:09.016 INFO FeatureManager - Using codec VCFCodec to read file file:///weka/data/lab/roalson/cleomaceae/slimp_snps/round_1_and_2_and_kew_cola_filtered_brassicales_filtered_paralogs_min_and_filtering/rorida_quinquenervia/158_Rorida_quinquenervia_%3Ffimbriata-g.vcf
12:50:09.086 INFO FeatureManager - Using codec VCFCodec to read file file:///weka/data/lab/roalson/cleomaceae/slimp_snps/round_1_and_2_and_kew_cola_filtered_brassicales_filtered_paralogs_min_and_filtering/rorida_quinquenervia/159_Rorida_quinquenervia_%3Fnoeana-g.vcf
12:50:09.145 INFO FeatureManager - Using codec VCFCodec to read file file:///weka/data/lab/roalson/cleomaceae/slimp_snps/round_1_and_2_and_kew_cola_filtered_brassicales_filtered_paralogs_min_and_filtering/rorida_quinquenervia/160_Rorida_quinquenervia_%3Fnoeana-g.vcf
12:50:09.235 INFO FeatureManager - Using codec VCFCodec to read file file:///weka/data/lab/roalson/cleomaceae/slimp_snps/round_1_and_2_and_kew_cola_filtered_brassicales_filtered_paralogs_min_and_filtering/rorida_quinquenervia/161_Rorida_quinquenervia_%3Fnoeana_brachystyla-g.vcf
12:50:09.290 INFO FeatureManager - Using codec VCFCodec to read file file:///weka/data/lab/roalson/cleomaceae/slimp_snps/round_1_and_2_and_kew_cola_filtered_brassicales_filtered_paralogs_min_and_filtering/rorida_quinquenervia/44_Rorida_quinquenervia_%3Fdolichostyla-g.vcf
12:50:09.317 INFO FeatureManager - Using codec VCFCodec to read file file:///weka/data/lab/roalson/cleomaceae/slimp_snps/round_1_and_2_and_kew_cola_filtered_brassicales_filtered_paralogs_min_and_filtering/rorida_quinquenervia/96_Rorida_quinquenervia_%3Fdolichostyla-g.vcf
12:50:11.927 INFO CombineGVCFs - Done initializing engine
12:50:12.070 INFO ProgressMeter - Starting traversal
12:50:12.070 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:50:12.270 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location 4471_supercontig_158:130 the annotation MLEAC=[0, 0] was not a numerical value and was ignored
12:50:22.447 INFO ProgressMeter - 5257_supercontig_158:522 0.2 29000 167694.7
12:50:32.493 INFO ProgressMeter - 5454_supercontig_158:545 0.3 60000 176271.9
12:50:42.495 INFO ProgressMeter - 6226_supercontig_158:957 0.5 155000 305669.7
12:50:48.363 INFO ProgressMeter - 7602_supercontig_158:932 0.6 249914 413160.7
12:50:48.363 INFO ProgressMeter - Traversal complete. Processed 249914 total variants in 0.6 minutes.
12:50:48.483 INFO CombineGVCFs - Shutting down engine
[September 9, 2022 12:50:48 PM PDT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 0.67 minutes.
Runtime.totalMemory()=2675441664
Using GATK jar /weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenotypeGVCFs -R rorida_quinquenervia_supercontig_reference.fasta -V Rorida_quinquenervia.cohort.g.vcf -O Rorida_quinquenervia.cohort.unfiltered.vcf
12:50:56.573 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
12:50:57.110 INFO GenotypeGVCFs - ------------------------------------------------------------
12:50:57.111 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.6.1
12:50:57.111 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
12:50:57.140 INFO GenotypeGVCFs - Executing as theresa.saunders@cn136.mgmt.kamiak.wsu.edu on Linux v3.10.0-693.11.6.el7.x86_64 amd64
12:50:57.140 INFO GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_92-b14
12:50:57.140 INFO GenotypeGVCFs - Start Date/Time: September 9, 2022 12:50:56 PM PDT
12:50:57.140 INFO GenotypeGVCFs - ------------------------------------------------------------
12:50:57.140 INFO GenotypeGVCFs - ------------------------------------------------------------
12:50:57.141 INFO GenotypeGVCFs - HTSJDK Version: 2.24.1
12:50:57.141 INFO GenotypeGVCFs - Picard Version: 2.27.1
12:50:57.141 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
12:50:57.141 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:50:57.141 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:50:57.141 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:50:57.141 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:50:57.141 INFO GenotypeGVCFs - Deflater: IntelDeflater
12:50:57.141 INFO GenotypeGVCFs - Inflater: IntelInflater
12:50:57.141 INFO GenotypeGVCFs - GCS max retries/reopens: 20
12:50:57.141 INFO GenotypeGVCFs - Requester pays: disabled
12:50:57.141 INFO GenotypeGVCFs - Initializing engine
12:50:57.727 INFO FeatureManager - Using codec VCFCodec to read file file:///weka/data/lab/roalson/cleomaceae/slimp_snps/round_1_and_2_and_kew_cola_filtered_brassicales_filtered_paralogs_min_and_filtering/rorida_quinquenervia/Rorida_quinquenervia.cohort.g.vcf
12:50:57.925 INFO GenotypeGVCFs - Done initializing engine
12:50:58.029 INFO ProgressMeter - Starting traversal
12:50:58.029 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:50:58.382 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position 4471_supercontig_158:146 and possibly subsequent; at least 10 samples must have called genotypes
12:51:08.034 INFO ProgressMeter - 6265_supercontig_158:1253 0.2 106000 635745.7
12:51:09.608 INFO ProgressMeter - 7628_supercontig_158:690 0.2 168073 870919.8
12:51:09.608 INFO ProgressMeter - Traversal complete. Processed 168073 total variants in 0.2 minutes.
12:51:09.616 INFO GenotypeGVCFs - Shutting down engine
[September 9, 2022 12:51:09 PM PDT] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.22 minutes.
Runtime.totalMemory()=2960654336
Using GATK jar /weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar SelectVariants -V Rorida_quinquenervia.cohort.unfiltered.vcf -R rorida_quinquenervia_supercontig_reference.fasta -select-type-to-include SNP -O Rorida_quinquenervia.SNPall.vcf
12:51:12.214 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
12:51:12.358 INFO SelectVariants - ------------------------------------------------------------
12:51:12.358 INFO SelectVariants - The Genome Analysis Toolkit (GATK) v4.2.6.1
12:51:12.358 INFO SelectVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
12:51:12.359 INFO SelectVariants - Executing as theresa.saunders@cn136 on Linux v3.10.0-693.11.6.el7.x86_64 amd64
12:51:12.359 INFO SelectVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_92-b14
12:51:12.359 INFO SelectVariants - Start Date/Time: September 9, 2022 12:51:12 PM PDT
12:51:12.359 INFO SelectVariants - ------------------------------------------------------------
12:51:12.359 INFO SelectVariants - ------------------------------------------------------------
12:51:12.359 INFO SelectVariants - HTSJDK Version: 2.24.1
12:51:12.359 INFO SelectVariants - Picard Version: 2.27.1
12:51:12.359 INFO SelectVariants - Built for Spark Version: 2.4.5
12:51:12.359 INFO SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:51:12.360 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:51:12.360 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:51:12.360 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:51:12.360 INFO SelectVariants - Deflater: IntelDeflater
12:51:12.360 INFO SelectVariants - Inflater: IntelInflater
12:51:12.360 INFO SelectVariants - GCS max retries/reopens: 20
12:51:12.360 INFO SelectVariants - Requester pays: disabled
12:51:12.360 INFO SelectVariants - Initializing engine
12:51:12.805 INFO FeatureManager - Using codec VCFCodec to read file file:///weka/data/lab/roalson/cleomaceae/slimp_snps/round_1_and_2_and_kew_cola_filtered_brassicales_filtered_paralogs_min_and_filtering/rorida_quinquenervia/Rorida_quinquenervia.cohort.unfiltered.vcf
12:51:12.844 INFO SelectVariants - Done initializing engine
12:51:12.881 INFO ProgressMeter - Starting traversal
12:51:12.882 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:51:13.157 INFO ProgressMeter - 6933_supercontig_158:481 0.0 9808 2147737.2
12:51:13.157 INFO ProgressMeter - Traversal complete. Processed 9808 total variants in 0.0 minutes.
12:51:13.163 INFO SelectVariants - Shutting down engine
[September 9, 2022 12:51:13 PM PDT] org.broadinstitute.hellbender.tools.walkers.variantutils.SelectVariants done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2140143616
real 0m3.679s
user 0m9.330s
sys 0m0.801s
Using GATK jar /weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar VariantFiltration -R rorida_quinquenervia_supercontig_reference.fasta -V Rorida_quinquenervia.SNPall.vcf --filter-name hardfilter -O Rorida_quinquenervia.snp.filtered.vcf --filter-expression QD < 5.0 & FS > 60.0 & MQ < 40.0 & MQRankSum < -12.5 & ReadPosRankSum < -8.0
12:51:15.887 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/weka/apps/gatk/4.2.6.1/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
12:51:16.030 INFO VariantFiltration - ------------------------------------------------------------
12:51:16.031 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.2.6.1
12:51:16.031 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
12:51:16.031 INFO VariantFiltration - Executing as theresa.saunders@cn136 on Linux v3.10.0-693.11.6.el7.x86_64 amd64
12:51:16.031 INFO VariantFiltration - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_92-b14
12:51:16.031 INFO VariantFiltration - Start Date/Time: September 9, 2022 12:51:15 PM PDT
12:51:16.031 INFO VariantFiltration - ------------------------------------------------------------
12:51:16.031 INFO VariantFiltration - ------------------------------------------------------------
12:51:16.032 INFO VariantFiltration - HTSJDK Version: 2.24.1
12:51:16.032 INFO VariantFiltration - Picard Version: 2.27.1
12:51:16.032 INFO VariantFiltration - Built for Spark Version: 2.4.5
12:51:16.032 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:51:16.032 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:51:16.032 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:51:16.032 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:51:16.032 INFO VariantFiltration - Deflater: IntelDeflater
12:51:16.032 INFO VariantFiltration - Inflater: IntelInflater
12:51:16.032 INFO VariantFiltration - GCS max retries/reopens: 20
12:51:16.032 INFO VariantFiltration - Requester pays: disabled
12:51:16.032 INFO VariantFiltration - Initializing engine
12:51:16.482 INFO FeatureManager - Using codec VCFCodec to read file file:///weka/data/lab/roalson/cleomaceae/slimp_snps/round_1_and_2_and_kew_cola_filtered_brassicales_filtered_paralogs_min_and_filtering/rorida_quinquenervia/Rorida_quinquenervia.SNPall.vcf
12:51:16.525 INFO VariantFiltration - Done initializing engine
12:51:16.592 INFO ProgressMeter - Starting traversal
12:51:16.592 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:51:16.613 WARN JexlEngine - ![35,44]: 'QD < 5.0 & FS > 60.0 & MQ < 40.0 & MQRankSum < -12.5 & ReadPosRankSum < -8.0;' undefined variable MQRankSum
12:51:16.615 WARN JexlEngine - ![35,44]: 'QD < 5.0 & FS > 60.0 & MQ < 40.0 & MQRankSum < -12.5 & ReadPosRankSum < -8.0;' undefined variable MQRankSum
12:51:16.616 WARN JexlEngine - ![35,44]: 'QD < 5.0 & FS > 60.0 & MQ < 40.0 & MQRankSum < -12.5 & ReadPosRankSum < -8.0;' undefined variable MQRankSum
12:51:16.616 WARN JexlEngine - ![35,44]: 'QD < 5.0 & FS > 60.0 & MQ < 40.0 & MQRankSum < -12.5 & ReadPosRankSum < -8.0;' undefined variable MQRankSum
12:51:16.616 WARN JexlEngine - ![35,44]: 'QD < 5.0 & FS > 60.0 & MQ < 40.0 & MQRankSum < -12.5 & ReadPosRankSum < -8.0;' undefined variable MQRankSum
12:51:16.621 INFO VariantFiltration - Shutting down engine
[September 9, 2022 12:51:16 PM PDT] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2135949312
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Boolean
at htsjdk.variant.variantcontext.JEXLMap.evaluateExpression(JEXLMap.java:186)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:95)
at htsjdk.variant.variantcontext.JEXLMap.get(JEXLMap.java:15)
at htsjdk.variant.variantcontext.VariantContextUtils.match(VariantContextUtils.java:338)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.matchesFilter(VariantFiltration.java:452)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.filter(VariantFiltration.java:406)
at org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration.apply(VariantFiltration.java:353)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Update: The problem seems to be tied to the input file for the VariantFiltration step, in my case Rorida_quinquenervia.SNPall.vcf. VariantFiltration fails as soon as it comes to a SNP in this file that has any value for ReadPosRankSum in the INFO column. The value doesn't matter; it fails even when the value should pass the filter I have set for ReadPosRankSum. If I manually delete "ReadPosRankSum=*" from the input file, the VariantFiltration step can continue. For example, I've copied the first 6 rows of SNPs from my input file below. The SNPs at positions 146, 166, 190, 236, and 269 all pass and are included in the output file; however, the SNP at position 415 does not show up, nor do any after it. If I edit the input file and delete "ReadPosRankSum=1.26;" then this SNP passes and is included in the output file. The SNP at position 415 should have passed anyway, since its ReadPosRankSum is greater than -8. I'm not sure what is going wrong, but I would appreciate any help! Thanks so much!
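One way to check this hypothesis is to list, for each record, which of the five annotations named in the filter expression are absent from INFO. The sketch below is only an illustration and not part of GATK: the function name missing_annotations and the constants are made up for this example, and the two INFO strings are copied from the rows at positions 146 and 415 shown below. In those rows, position 415 is the first record that defines all five annotations, which lines up with the five "undefined variable MQRankSum" warnings in the log (one per earlier record).

```python
# Annotations referenced by the filter expression in the command above.
FILTER_ANNOTATIONS = ("QD", "FS", "MQ", "MQRankSum", "ReadPosRankSum")

# INFO columns copied from the records at positions 146 and 415.
INFO_146 = (
    "AC=6;AF=0.600;AN=10;DP=39;ExcessHet=0.0000;FS=0.000;"
    "MLEAC=5;MLEAF=0.500;MQ=60.00;QD=25.36;SOR=4.615"
)
INFO_415 = (
    "AC=1;AF=0.100;AN=10;BaseQRankSum=-2.326e+00;DP=97;ExcessHet=0.0000;"
    "FS=2.218;MLEAC=1;MLEAF=0.100;MQ=60.00;MQRankSum=0.00;QD=5.22;"
    "ReadPosRankSum=1.26;SOR=0.180"
)


def missing_annotations(info_field):
    """Return the filter annotations that this INFO string does not define."""
    present = {entry.split("=", 1)[0] for entry in info_field.split(";") if entry}
    return [name for name in FILTER_ANNOTATIONS if name not in present]


print(146, missing_annotations(INFO_146))
print(415, missing_annotations(INFO_415))
```

Position 146 lacks MQRankSum and ReadPosRankSum, while position 415 lacks none, so 415 is the first record on which the whole expression can actually be evaluated. Deleting ReadPosRankSum from that record would make one variable undefined again, which may be why the manual edit appeared to help.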
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 158_Rorida_quinquenervia_?fimbriata 159_Rorida_quinquenervia_?noeana 160_Rorida_quinquenervia_?noeana 161_Rorida_quinquenervia_?noeana_brachystyla 44_Rorida_quinquenervia_?dolichostyla 96_Rorida_quinquenervia_?dolichostyla
4471_supercontig_158 146 . A C 357.82 . AC=6;AF=0.600;AN=10;DP=39;ExcessHet=0.0000;FS=0.000;MLEAC=5;MLEAF=0.500;MQ=60.00;QD=25.36;SOR=4.615 GT:AD:DP:GQ:PL 0/0:25,0:25:69:0,69,1035 1/1:0,4:4:12:165,12,0 0/0:5,0:5:15:0,15,205 ./.:0,0:0:0:0,0,0 1/1:0,3:3:9:128,9,0 1/1:0,2:2:6:85,6,0
4471_supercontig_158 166 . G A 1225.88 . AC=8;AF=0.800;AN=10;DP=69;ExcessHet=0.0000;FS=0.000;MLEAC=9;MLEAF=0.900;MQ=60.00;QD=28.73;SOR=3.545 GT:AD:DP:GQ:PL 0/0:35,0:35:99:0,99,1485 1/1:0,11:11:33:411,33,0 1/1:0,10:10:30:379,30,0 ./.:0,0:0:0:0,0,0 1/1:0,6:6:18:250,18,0 1/1:0,5:5:15:196,15,0
4471_supercontig_158 190 . T A 1753.4 . AC=6;AF=0.500;AN=12;DP=95;ExcessHet=0.0000;FS=0.000;MLEAC=6;MLEAF=0.500;MQ=60.00;QD=27.24;SOR=2.093 GT:AD:DP:GQ:PGT:PID:PL:PS 0/0:35,0:35:99:.:.:0,99,1485 1|1:0,22:22:66:1|1:182_A_AT:929,66,0:182 0/0:17,0:17:51:.:.:0,51,580 0/0:1,0:1:3:.:.:0,3,42 1|1:0,15:15:45:1|1:182_A_AT:622,45,0:182 1|1:0,5:5:15:1|1:182_A_AT:225,15,0:182
4471_supercontig_158 236 . G T 3595.17 . AC=10;AF=0.833;AN=12;DP=159;ExcessHet=0.0000;FS=0.000;MLEAC=10;MLEAF=0.833;MQ=60.00;QD=29.47;SOR=1.071 GT:AD:DP:GQ:PL 0/0:35,0:35:99:0,99,1485 1/1:0,39:39:99:1168,117,0 1/1:0,33:33:99:940,99,0 1/1:0,2:2:6:49,6,0 1/1:0,33:33:99:963,99,0 1/1:0,15:15:45:484,45,0
4471_supercontig_158 269 . A G 4466.17 . AC=10;AF=0.833;AN=12;DP=193;ExcessHet=0.0000;FS=0.000;MLEAC=10;MLEAF=0.833;MQ=60.00;QD=28.63;SOR=0.887 GT:AD:DP:GQ:PL 0/0:35,0:35:99:0,99,1485 1/1:0,42:42:99:1183,126,0 1/1:0,44:44:99:1344,132,0 1/1:0,2:2:6:49,6,0 1/1:0,47:47:99:1271,141,0 1/1:0,21:21:63:628,63,0
4471_supercontig_158 415 . A G 88.74 . AC=1;AF=0.100;AN=10;BaseQRankSum=-2.326e+00;DP=97;ExcessHet=0.0000;FS=2.218;MLEAC=1;MLEAF=0.100;MQ=60.00;MQRankSum=0.00;QD=5.22;ReadPosRankSum=1.26;SOR=0.180 GT:AD:DP:GQ:PL 0/0:35,0:35:99:0,99,1485 0/1:11,6:17:97:97,0,347 0/0:29,0:29:78:0,78,1170 ./.:0,0:0:0:0,0,0 0/0:8,0:8:0:0,0,87 0/0:8,0:8:24:0,24,306
-
Well, haha I figured out where everything was going wrong, and it was a really simple fix. In my original command, I just had to replace the & with ||.
Original:
gatk VariantFiltration -R rorida_quinquenervia_supercontig_reference.fasta -V Rorida_quinquenervia.SNPall.vcf --filter-name "hardfilter" -O Rorida_quinquenervia.snp.filtered.vcf --filter-expression "QD < 5.0 & FS > 60.0 & MQ < 40.0 & MQRankSum < -12.5 & ReadPosRankSum < -8.0"
Fixed:
gatk VariantFiltration -R rorida_quinquenervia_supercontig_reference.fasta -V Rorida_quinquenervia.SNPall.vcf --filter-name "hardfilter" -O Rorida_quinquenervia.snp.filtered.vcf --filter-expression "QD < 5.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"
Hopefully this helps someone else eventually :)
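For anyone wondering why this works: in JEXL, the expression language VariantFiltration uses, a single & is the bitwise AND operator, so once every annotation is defined the expression evaluates to an integer rather than true/false. That is consistent with the "Long cannot be cast to Boolean" error. || is logical OR, and && is logical AND. Note that the fix also changes the meaning of the filter: with || a variant is flagged when any one threshold is failed, whereas && would require all five to fail. The Python sketch below only imitates these two readings to make the difference concrete. It is not how GATK evaluates expressions; the names THRESHOLDS, failed_checks, fails_any and fails_all are invented for this example, skipping missing annotations is a simplifying assumption, and the second record is hypothetical.

```python
import operator

# The five comparisons from the filter expression.
THRESHOLDS = (
    ("QD", operator.lt, 5.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
)


def parse_info(info_field):
    """Turn 'KEY=VALUE;KEY=VALUE' into a dict of strings."""
    values = {}
    for entry in info_field.split(";"):
        if entry:
            key, _, value = entry.partition("=")
            values[key] = value
    return values


def failed_checks(info_field):
    """Names of the comparisons that are true for this record.

    Assumption: an annotation missing from INFO is simply skipped.
    """
    values = parse_info(info_field)
    return [
        name
        for name, compare, threshold in THRESHOLDS
        if name in values and compare(float(values[name]), threshold)
    ]


def fails_any(info_field):
    """Reading of the || expression: flag if at least one comparison is true."""
    return len(failed_checks(info_field)) > 0


def fails_all(info_field):
    """Reading of an && expression: flag only if every comparison is true."""
    return len(failed_checks(info_field)) == len(THRESHOLDS)


# INFO column of the record at position 415 from the post above.
INFO_415 = (
    "AC=1;AF=0.100;AN=10;BaseQRankSum=-2.326e+00;DP=97;ExcessHet=0.0000;"
    "FS=2.218;MLEAC=1;MLEAF=0.100;MQ=60.00;MQRankSum=0.00;QD=5.22;"
    "ReadPosRankSum=1.26;SOR=0.180"
)
# Hypothetical record: same values, but QD lowered below the 5.0 threshold.
HYPOTHETICAL_LOW_QD = "FS=2.218;MQ=60.00;MQRankSum=0.00;QD=2.00;ReadPosRankSum=1.26"

print(failed_checks(INFO_415), fails_any(INFO_415))
print(
    failed_checks(HYPOTHETICAL_LOW_QD),
    fails_any(HYPOTHETICAL_LOW_QD),
    fails_all(HYPOTHETICAL_LOW_QD),
)
```

The record at position 415 fails none of the comparisons, so it is kept under either reading. The hypothetical low-QD record is flagged under the OR reading but not under the AND reading. Giving each comparison its own --filter-expression and --filter-name pair is another option if you want the FILTER column to show which threshold a variant failed.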
-
Hi Theresa Saunders,
Thank you for writing to the GATK forum! I’m happy to hear that you were able to identify and fix this issue.
We appreciate the time and effort you took to post in our forum. As you said, we hope it will help others encountering the same problem in the future.
Thank you for being a vital part of the GATK community! If any other issue should arise, please do not hesitate to reach out again.
Best,
Anthony