GATK4: RNAseq short variant discovery (SNPs + Indels)
Hi,
I am using GATK4 (4.1.3) for "RNAseq short variant discovery (SNPs + Indels)", following the GATK Best Practices workflow at: https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-
As suggested there, I am running HaplotypeCaller per sample, and my command is:
gatk --java-options "-Xmx128g" HaplotypeCaller \
-R ${REFDIR}/Homo_sapiens_assembly38.fasta \
-I ${ID}.Aligned.sorted.split.bqsr.bam \
-O ${ID}.Aligned.sorted.split.bqsr.vcf.gz
However, when I try to validate the variants with:
gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' ValidateVariants \
-R ${REFDIR}/Homo_sapiens_assembly38.fasta \
-V ${ID}.Aligned.sorted.split.bqsr.vcf.gz
I get the error:
[April 13, 2020 7:21:31 AM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.70 minutes.
Runtime.totalMemory()=3643801600
Using GATK jar /mnt/beegfs/v1/js/sw/js/Pkgs/GATK/4.1.3.0/gatk-package-4.1.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /mnt/beegfs/v1/js/sw/js/Pkgs/GATK/4.1.3.0/gatk-package-4.1.3.0-local.jar ValidateVariants -R /home/js/data/references/star_references/Homo_sapiens_assembly38.fasta -V TW27.Aligned.sorted.split.bqsr.vcf
07:21:44.635 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/beegfs/v1/js/sw/js/Pkgs/GATK/4.1.3.0/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 13, 2020 7:21:46 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
07:21:46.886 INFO ValidateVariants - ------------------------------------------------------------
07:21:46.888 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.3.0
07:21:46.889 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
07:21:46.891 INFO ValidateVariants - Executing as js@node01.local on Linux v3.10.0-1062.18.1.el7.x86_64 amd64
07:21:46.892 INFO ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_111-b14
07:21:46.894 INFO ValidateVariants - Start Date/Time: April 13, 2020 7:21:44 AM EDT
07:21:46.895 INFO ValidateVariants - ------------------------------------------------------------
07:21:46.896 INFO ValidateVariants - ------------------------------------------------------------
07:21:46.903 INFO ValidateVariants - HTSJDK Version: 2.20.1
07:21:46.904 INFO ValidateVariants - Picard Version: 2.20.5
07:21:46.905 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
07:21:46.907 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
07:21:46.908 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
07:21:46.909 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
07:21:46.910 INFO ValidateVariants - Deflater: IntelDeflater
07:21:46.911 INFO ValidateVariants - Inflater: IntelInflater
07:21:46.913 INFO ValidateVariants - GCS max retries/reopens: 20
07:21:46.914 INFO ValidateVariants - Requester pays: disabled
07:21:46.915 INFO ValidateVariants - Initializing engine
07:21:49.630 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/beegfs/v1/js/home/js/data/rnaseq-snp/bbarwick/SCLC/fastq.trimmed/all_vcf/TW27.Aligned.sorted.split.bqsr.vcf
07:21:49.999 INFO ValidateVariants - Done initializing engine
07:21:50.001 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
07:21:50.002 WARN ValidateVariants - Other possible validations will still be performed
07:21:50.003 INFO ProgressMeter - Starting traversal
07:21:50.005 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
07:22:00.059 INFO ProgressMeter - chr5:131343713 0.2 238000 1421885.9
07:22:09.171 INFO ValidateVariants - Shutting down engine
[April 13, 2020 7:22:09 AM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.41 minutes.
Runtime.totalMemory()=3191865344
***********************************************************************
A USER ERROR has occurred: Input TW27.Aligned.sorted.split.bqsr.vcf fails strict validation: one or more of the ALT allele(s) for the record at position chr11:6479079 are not observed at all in the sample genotypes of type:
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException$FailsStrictValidation: Input TW27.Aligned.sorted.split.bqsr.vcf fails strict validation: one or more of the ALT allele(s) for the record at position chr11:6479079 are not observed at all in the sample genotypes of type:
at org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants.apply(ValidateVariants.java:265)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
Using GATK jar /mnt/beegfs/v1/js/sw/js/Pkgs/GATK/4.1.3.0/gatk-package-4.1.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /mnt/beegfs/v1/js/sw/js/Pkgs/GATK/4.1.3.0/gatk-package-4.1.3.0-local.jar ValidateVariants -R /home/js/data/references/star_references/Homo_sapiens_assembly38.fasta -V TOW0062.Aligned.sorted.split.bqsr.vcf
07:22:22.837 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/beegfs/v1/js/sw/js/Pkgs/GATK/4.1.3.0/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 13, 2020 7:22:25 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
This is the corresponding line in the vcf file:
[all_vcf] % grep 6479079 TW27.Aligned.sorted.split.bqsr.vcf
chr11 6479079 . A G 38.01 . AC=0;AF=0.00;AN=2;DP=48;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;SOR=1.022 GT:AD:DP:GQ:PL 0/0:42,5:47:33:0,33,988
-
Hi bsmith030465
I agree this is confusing and we are looking into fixing this. So thank you for bringing this up.
There are two things going on here:
1. HaplotypeCaller is calling the site's genotype homozygous reference, but there is an alt allele. This should not happen. Can you please try to reproduce this error with the latest version of GATK and let us know if the issue persists?
2. ValidateVariants is doing stricter validation than the VCF spec requires, to try to catch logical errors in data processing. So we should probably add a flag to disable the extra strictness in ValidateVariants. We have created an issue ticket for it here: https://github.com/broadinstitute/gatk/issues/6553
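To see how many records are affected by the first point, the VCF can be scanned for ALT alleles that never appear in the sample's genotype. The following is only a rough sketch, not a GATK tool: the function name list_alt_unobserved is made up for illustration, and it assumes a single-sample, uncompressed VCF in which GT is the first FORMAT key (as in the record quoted above).

```shell
# Hypothetical helper (not part of GATK): for a single-sample VCF, print
# CHROM:POS, REF>ALT and GT for every ALT allele whose index never appears
# in the sample genotype. Reads the file given as $1, or stdin if none.
list_alt_unobserved() {
    awk -F'\t' '
        /^#/ { next }                              # skip header lines
        {
            split($10, sample, ":")                # assumes GT is the first FORMAT key
            ngt  = split(sample[1], gt, "[/|]")    # allele indices in the genotype
            nalt = split($5, alt, ",")             # ALT alleles listed for the record
            for (i = 1; i <= nalt; i++) {
                seen = 0
                for (j = 1; j <= ngt; j++)
                    if (gt[j] == i) seen = 1
                if (!seen)
                    print $1 ":" $2 "\t" $4 ">" alt[i] "\t" sample[1]
            }
        }' "$@"
}
```

For example, list_alt_unobserved TW27.Aligned.sorted.split.bqsr.vcf would be expected to report the chr11:6479079 record shown above; a bgzipped file can be piped in with gzip -cd first.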
-
Bhanu,
I will try the newer version of HaplotypeCaller.
Meanwhile, is there a way to make ValidateVariants list all the variants that fail in a VCF file? Currently, it appears to exit as soon as it encounters the first failing variant or error. Is there a flag I can set for this?
thanks!
-
Hi bsmith030465
Try the --warn-on-errors argument. This will emit a warning for each error instead of terminating the run at the first one.
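As a rough sketch of how this could be used to collect every failing site: the two function names below are made up for illustration, and the log parsing assumes that each warning repeats the "record at position CHR:POS" wording of the error message quoted earlier in this thread. Other kinds of validation failure may be worded differently and would not be picked up.

```shell
# Sketch: run ValidateVariants with --warn-on-errors so it keeps going,
# capturing all output (stdout and stderr) in a log file.
# Arguments: VCF, reference FASTA, log file to write.
validate_keep_going() {
    vcf="$1"; ref="$2"; log="$3"
    gatk ValidateVariants -R "$ref" -V "$vcf" --warn-on-errors > "$log" 2>&1
}

# Assumption: warnings contain "record at position CHR:POS", as the error
# message in this thread does. Prints each failing position once.
failed_positions() {
    grep -o 'record at position [^ ]*' "$1" | awk '{ print $4 }' | sort -u
}
```

Running validate_keep_going on a VCF and then failed_positions on the resulting log should give one CHR:POS per line for the sites that trigger this particular check.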