HaplotypeCaller produces an unfinished VCF but no .idx file (no error)
If you are seeing an error, please provide (REQUIRED):
a) GATK version used:
The Genome Analysis Toolkit (GATK) v4.1.9.0
b) Exact command used:
gatk --java-options "-Xmx8G" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/A157.chr08.g.vcf -L chr08
c) Entire error log:
No error
Hi, I am running GATK on each chromosome with the command above (-L chr$i) for the potato genome (~800G), but I cannot finish the HaplotypeCaller step. I also tried different Java memory settings ('-Xmx10G', '-Xmx30G'), but I am still stuck at this step.
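A per-chromosome run like the one described above can be scripted as a dry run that only prints the commands, so they can be checked before submission. This is a minimal sketch, not the poster's actual script: the function name `hc_commands`, the 12 chromosomes named chr01 to chr12, and the 01_bam/02_gvcf layout are assumptions taken from the commands in this thread.

```bash
# Dry run: print one HaplotypeCaller command per chromosome for one sample.
# hc_commands is a hypothetical helper; assumed layout: chr01..chr12,
# BAMs at 01_bam/<sample>.AR.bam, GVCFs written to 02_gvcf/.
hc_commands() {
  local sample="$1"       # e.g. A157
  local ref="$2"          # reference FASTA
  local mem="${3:-8G}"    # Java heap passed to -Xmx
  local n chrom
  for n in $(seq -w 1 12); do   # 01 02 ... 12
    chrom="chr${n}"
    printf 'gatk --java-options "-Xmx%s" HaplotypeCaller -R %s -ERC GVCF -I 01_bam/%s.AR.bam -O 02_gvcf/%s.%s.g.vcf -L %s\n' \
      "$mem" "$ref" "$sample" "$sample" "$chrom" "$chrom"
  done
}
```

For example, `hc_commands A157 "$ref" 8G | head -n 1` prints the chr01 command; piping the full output to `bash`, or wrapping each line in its own qsub job, would run them.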
1) Per sample BAM: for some samples I can finish 11/12 chromosomes, for others only 5/12, and for some I cannot finish even one chromosome.
2) I also checked my results: for the unfinished chromosomes there is only sample.gvcf and no .idx file. I ran gatk ValidateSamFile on the BAM:
No errors found
[Tue Mar 30 16:52:52 CST 2021] picard.sam.ValidateSamFile done. Elapsed time: 154.89 minutes.
Runtime.totalMemory()=1738014720
Tool returned:
3) Below is the log file for an unfinished chromosome. Please help me figure out this problem.
16:09:43.946 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 30, 2021 4:09:54 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:09:54.285 INFO HaplotypeCaller - ------------------------------------------------------------
16:09:54.285 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.9.0
16:09:54.286 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
16:09:54.286 INFO HaplotypeCaller - Executing as chenglin@comput69 on Linux v2.6.32-504.el6.x86_64 amd64
16:09:54.286 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_251-b08
16:09:54.286 INFO HaplotypeCaller - Start Date/Time: March 30, 2021 4:09:43 PM CST
16:09:54.286 INFO HaplotypeCaller - ------------------------------------------------------------
16:09:54.286 INFO HaplotypeCaller - ------------------------------------------------------------
16:09:54.287 INFO HaplotypeCaller - HTSJDK Version: 2.23.0
16:09:54.287 INFO HaplotypeCaller - Picard Version: 2.23.3
16:09:54.287 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:09:54.287 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:09:54.288 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:09:54.288 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:09:54.288 INFO HaplotypeCaller - Deflater: IntelDeflater
16:09:54.288 INFO HaplotypeCaller - Inflater: IntelInflater
16:09:54.288 INFO HaplotypeCaller - GCS max retries/reopens: 20
16:09:54.288 INFO HaplotypeCaller - Requester pays: disabled
16:09:54.288 INFO HaplotypeCaller - Initializing engine
16:09:54.778 INFO IntervalArgumentCollection - Processing 69236331 bp from intervals
16:09:54.802 INFO HaplotypeCaller - Done initializing engine
16:09:54.804 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
16:09:54.811 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
16:09:54.811 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
16:09:54.854 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
16:09:54.856 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
16:09:54.920 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
16:09:54.921 INFO IntelPairHmm - Available threads: 1
16:09:54.922 INFO IntelPairHmm - Requested threads: 4
16:09:54.922 WARN IntelPairHmm - Using 1 available threads, but 4 were requested
16:09:54.922 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
16:09:55.105 INFO ProgressMeter - Starting traversal
16:09:55.106 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
16:09:58.783 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:09:58.783 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:09:58.785 WARN InbreedingCoeff - InbreedingCoeff will not be calculated; at least 10 samples must have called genotypes
16:09:58.793 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:09:58.803 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:00.338 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:13.644 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:13.645 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:13.657 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:13.658 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:13.659 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:13.660 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:13.668 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:13.669 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:14.563 INFO ProgressMeter - chr04:118087 0.3 400 1233.5
16:10:22.140 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.142 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.142 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.143 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.144 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.145 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.146 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.147 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.148 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.149 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.150 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.152 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.163 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.165 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:10:22.166 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
..........
...........
..........
16:30:23.900 INFO ProgressMeter - chr04:981690 20.5 5630 274.9
16:30:26.905 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.905 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.905 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.905 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.906 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.906 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.906 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.906 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.907 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.907 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.907 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.907 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.907 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.908 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.908 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.908 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.908 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:26.909 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
16:30:38.754 INFO ProgressMeter - chr04:988724 20.7 5670 273.6
16:30:49.046 INFO ProgressMeter - chr04:996899 20.9 5730 274.2
Using GATK jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller -R /vol3/agis/huangsanwen_group/chenglin/work/1_reference/DM_v6.1_all_chr.fa --ERC GVCF -I 01_bam/C382.AR.bam -O 02_gvcf/C382.chr04.g.vcf -L chr04
For chr04, the header gives ##contig=<ID=chr04,length=69236331>, so a huge part of the chromosome is left unprocessed.
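To put a number on how far the run got, the last ProgressMeter locus in the log can be compared with the contig length from the header. A minimal sketch with a hypothetical function `progress_pct`, assuming chromosome names start with `chr`; the progress meter is only approximate, so this is a rough estimate. For the chr04 log above, 996899 of 69236331 bp is about 1.44%.

```bash
# Rough progress estimate from the last ProgressMeter locus in a HaplotypeCaller log.
# progress_pct is hypothetical; assumes loci look like chr04:996899.
progress_pct() {
  local log="$1"          # HaplotypeCaller log file
  local contig_len="$2"   # from ##contig=<ID=...,length=N> in the VCF header
  # Real logs pad the locus column with spaces, hence " +".
  grep -E 'ProgressMeter - +chr' "$log" | tail -n 1 |
    awk -v len="$contig_len" '{
      for (i = 1; i <= NF; i++)
        if ($i ~ /^chr[^:]*:[0-9]+$/) split($i, loc, ":")
      printf "%s %d/%d = %.2f%%\n", loc[1], loc[2], len, 100 * loc[2] / len
    }'
}
```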
Please help me figure out this problem; I would really appreciate your help.
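The missing .idx file is a useful marker: for a plain-text .g.vcf output, the index appears to be written only when the output is closed at the end of a run, which matches the symptom described in point 2. A minimal sketch, with a hypothetical function `missing_index`, that lists per-chromosome GVCFs in a directory that have no companion .idx.

```bash
# List *.g.vcf files that have no .idx next to them (likely unfinished runs).
# missing_index is hypothetical; the default directory 02_gvcf is taken from this thread.
missing_index() {
  local dir="${1:-02_gvcf}" f
  for f in "$dir"/*.g.vcf; do
    [ -e "$f" ] || continue            # the glob matched nothing
    [ -e "$f.idx" ] || printf '%s\n' "$f"
  done
}
```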
-
Hi Lin Cheng,
Is the process still running but very slowly, or is it hung (it has stopped)?
Genevieve
-
Many thanks for your kind help. The process stopped without an error; the last few lines are shown below. For chr04, the total length is ##contig=<ID=chr04,length=69236331>. The process did not finish all positions but exited without an error.
16:30:49.046 INFO ProgressMeter - chr04:996899 20.9 5730 274.2
Using GATK jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller -R /vol3/agis/huangsanwen_group/chenglin/work/1_reference/DM_v6.1_all_chr.fa --ERC GVCF -I 01_bam/C382.AR.bam -O 02_gvcf/C382.chr04.g.vcf -L chr04
-
Hi Lin Cheng,
Are you determining where it stopped from the progress meter? The progress meter does not show exactly where the code is running, but yes, I see there could be an issue because your chromosome is much longer.
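Instead of relying on the progress meter, the GVCF itself can show how far the output reached: the last record's END (for reference blocks) or POS, compared with the length in the matching `##contig` header line. A minimal sketch with a hypothetical function `gvcf_progress`; it reads only the header and the last five lines, and since an interrupted file may end mid-record, it uses the last of those lines that still has at least 8 tab-separated columns.

```bash
# Compare the last position covered by a (possibly truncated) GVCF with its contig length.
# gvcf_progress is hypothetical.
gvcf_progress() {
  local gvcf="$1" last chrom end len
  # CHROM plus END= (reference block) or POS of the last complete-looking record.
  last=$(tail -n 5 "$gvcf" | awk -F'\t' '
    !/^#/ && NF >= 8 {
      c = $1; e = $2
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) if (kv[i] ~ /^END=/) e = substr(kv[i], 5)
    }
    END { if (c != "") print c, e }')
  if [ -z "$last" ]; then echo "no records found in $gvcf" >&2; return 1; fi
  chrom=${last% *}
  end=${last#* }
  # Length from ##contig=<ID=<chrom>,length=N...>; stop reading at the first record.
  len=$(awk -v id="$chrom" '
    !/^#/ { exit }
    index($0, "##contig=<ID=" id ",") == 1 { sub(/.*length=/, ""); sub(/[,>].*/, ""); print; exit }' "$gvcf")
  printf '%s %s/%s\n' "$chrom" "$end" "$len"
}
```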
Could you check your output VCF with ValidateVariants?
Also, could you run this tool with --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' to make sure you are getting all the possible information in the stack trace? Here is more of a description. And here are details of how to include multiple java options on the command line if you are not familiar.
Thank you,
Genevieve
-
Hi Genevieve Brandt ,
Many thanks for your help. I checked the running GATK process and the progress meter in the log file several times; it has indeed stopped in both respects.
Following your suggestion, I ran ValidateVariants on one of my unfinished GVCFs and one of my finished GVCFs for comparison.
Commands:
## unfinished sample A157 chr08
gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' ValidateVariants -R $ref -V 02_gvcf/A157.chr08.g.vcf -gvcf -L chr08
## finished sample A157 chr07
gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' ValidateVariants -R $ref -V 02_gvcf/A157.chr07.g.vcf -gvcf -L chr07
###log file for unfinished sample A157 chr08
Using GATK jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar ValidateVariants -R /vol3/agis/huangsanwen_group/chenglin/work/1_reference/DM_v6.1_all_chr.fa -V 02_gvcf/A157.chr08.g.vcf -gvcf -L chr08
10:01:59.578 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 01, 2021 10:01:59 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:01:59.798 INFO ValidateVariants - ------------------------------------------------------------
10:01:59.799 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.9.0
10:01:59.799 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
10:01:59.799 INFO ValidateVariants - Executing as chenglin@login1 on Linux v2.6.32-431.el6.x86_64 amd64
10:01:59.799 INFO ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_251-b08
10:01:59.799 INFO ValidateVariants - Start Date/Time: April 1, 2021 10:01:59 AM CST
10:01:59.799 INFO ValidateVariants - ------------------------------------------------------------
10:01:59.799 INFO ValidateVariants - ------------------------------------------------------------
10:01:59.800 INFO ValidateVariants - HTSJDK Version: 2.23.0
10:01:59.800 INFO ValidateVariants - Picard Version: 2.23.3
10:01:59.800 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:01:59.800 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:01:59.800 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:01:59.801 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:01:59.801 INFO ValidateVariants - Deflater: IntelDeflater
10:01:59.801 INFO ValidateVariants - Inflater: IntelInflater
10:01:59.801 INFO ValidateVariants - GCS max retries/reopens: 20
10:01:59.801 INFO ValidateVariants - Requester pays: disabled
10:01:59.801 INFO ValidateVariants - Initializing engine
10:02:00.287 INFO FeatureManager - Using codec VCFCodec to read file file:///vol1/agis/huangsanwen_group/chenglin/pan-genome_vol1/02_snp/02_gvcf/A157.chr08.g.vcf
10:02:00.325 INFO IntervalArgumentCollection - Processing 59226000 bp from intervals
10:02:00.330 INFO ValidateVariants - Shutting down engine
[April 1, 2021 10:02:00 AM CST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1067450368
***********************************************************************
A USER ERROR has occurred: Input 02_gvcf/A157.chr08.g.vcf must support random access to enable traversal by intervals. If it's a file, please index it using the bundled tool IndexFeatureFile
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException: Input 02_gvcf/A157.chr08.g.vcf must support random access to enable traversal by intervals. If it's a file, please index it using the bundled tool IndexFeatureFile
at org.broadinstitute.hellbender.engine.FeatureDataSource.setIntervalsForTraversal(FeatureDataSource.java:454)
at org.broadinstitute.hellbender.engine.VariantWalker.onStartup(VariantWalker.java:47)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
###log file for finished sample A157 chr07
Using GATK jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar ValidateVariants -R /vol3/agis/huangsanwen_group/chenglin/work/1_reference/DM_v6.1_all_chr.fa -V 02_gvcf/A157.chr07.g.vcf -gvcf -L chr07
10:01:18.592 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 01, 2021 10:01:19 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:01:19.205 INFO ValidateVariants - ------------------------------------------------------------
10:01:19.205 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.9.0
10:01:19.206 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
10:01:19.206 INFO ValidateVariants - Executing as chenglin@login1 on Linux v2.6.32-431.el6.x86_64 amd64
10:01:19.206 INFO ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_251-b08
10:01:19.206 INFO ValidateVariants - Start Date/Time: April 1, 2021 10:01:18 AM CST
10:01:19.206 INFO ValidateVariants - ------------------------------------------------------------
10:01:19.206 INFO ValidateVariants - ------------------------------------------------------------
10:01:19.207 INFO ValidateVariants - HTSJDK Version: 2.23.0
10:01:19.207 INFO ValidateVariants - Picard Version: 2.23.3
10:01:19.207 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:01:19.207 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:01:19.207 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:01:19.207 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:01:19.207 INFO ValidateVariants - Deflater: IntelDeflater
10:01:19.208 INFO ValidateVariants - Inflater: IntelInflater
10:01:19.208 INFO ValidateVariants - GCS max retries/reopens: 20
10:01:19.208 INFO ValidateVariants - Requester pays: disabled
10:01:19.208 INFO ValidateVariants - Initializing engine
10:01:19.707 INFO FeatureManager - Using codec VCFCodec to read file file:///vol1/agis/huangsanwen_group/chenglin/pan-genome_vol1/02_snp/02_gvcf/A157.chr07.g.vcf
10:01:19.840 INFO IntervalArgumentCollection - Processing 57639317 bp from intervals
10:01:19.845 INFO ValidateVariants - Done initializing engine
10:01:19.846 WARN ValidateVariants - GVCF format is currently incompatible with allele validation. Not validating Alleles.
10:01:19.846 WARN ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
10:01:19.846 WARN ValidateVariants - Other possible validations will still be performed
10:01:19.846 INFO ProgressMeter - Starting traversal
10:01:19.846 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
10:01:29.848 INFO ProgressMeter - chr07:24618694 0.2 1815000 10890000.0
10:01:39.849 INFO ProgressMeter - chr07:53081570 0.3 3846000 11536269.6
10:01:41.364 INFO ProgressMeter - chr07:57637699 0.4 4162763 11607295.3
10:01:41.364 INFO ProgressMeter - Traversal complete. Processed 4162763 total variants in 0.4 minutes.
10:01:41.366 INFO ValidateVariants - Shutting down engine
[April 1, 2021 10:01:41 AM CST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.38 minutes.
Runtime.totalMemory()=2764046336
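The USER ERROR for chr08 concerns the missing index rather than the VCF content: traversal by intervals (`-L`) needs random access, and the unfinished A157.chr08 file has no .idx. A dry-run sketch with a hypothetical function `validate_cmds` that adds the IndexFeatureFile step suggested by the error message only when the index is missing. The `-I` input flag is assumed for GATK 4.1.x (confirm with `gatk IndexFeatureFile --help`), and indexing may itself fail if the last record of an unfinished file was cut off.

```bash
# Print the commands needed to validate one per-chromosome GVCF restricted with -L.
# validate_cmds is hypothetical; -L requires an index next to the GVCF.
validate_cmds() {
  local ref="$1" gvcf="$2" chrom="$3"
  if [ ! -e "$gvcf.idx" ]; then
    # Step suggested by the USER ERROR message (flag name assumed, see above).
    printf 'gatk IndexFeatureFile -I %s\n' "$gvcf"
  fi
  printf 'gatk ValidateVariants -R %s -V %s -gvcf -L %s\n' "$ref" "$gvcf" "$chrom"
}
```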
-
Hi Lin Cheng,
Thank you for running the ValidateVariants command! It is very helpful to verify.
Could you run your HaplotypeCaller command with the -DGATK_STACKTRACE_ON_USER_EXCEPTION=true java option and post the stack trace here?
Thank you,
Genevieve
-
Hi Genevieve Brandt,
Good news: yesterday, more or less by chance, I tried the parameter you mentioned (-DGATK_STACKTRACE_ON_USER_EXCEPTION=true) on one chromosome of a sample, and the program worked perfectly.
gatk --java-options "-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/A157.chr08.g.vcf -L chr08
I thought it might be because of the added parameter, but when I later started running batches, the problem came back (unfinished VCF without an error). I will keep trying to find the cause. (Could running GATK in batches at the same time be the reason?)
The log file you mentioned is below.
09:56:22.492 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 02, 2021 9:56:43 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:56:43.339 INFO HaplotypeCaller - ------------------------------------------------------------
09:56:43.339 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.9.0
09:56:43.339 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
09:56:43.351 INFO HaplotypeCaller - Executing as chenglin@comput69 on Linux v2.6.32-504.el6.x86_64 amd64
09:56:43.352 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_251-b08
09:56:43.352 INFO HaplotypeCaller - Start Date/Time: April 2, 2021 9:56:22 AM CST
09:56:43.352 INFO HaplotypeCaller - ------------------------------------------------------------
09:56:43.352 INFO HaplotypeCaller - ------------------------------------------------------------
09:56:43.353 INFO HaplotypeCaller - HTSJDK Version: 2.23.0
09:56:43.353 INFO HaplotypeCaller - Picard Version: 2.23.3
09:56:43.353 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:56:43.353 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:56:43.354 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:56:43.354 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:56:43.354 INFO HaplotypeCaller - Deflater: IntelDeflater
09:56:43.354 INFO HaplotypeCaller - Inflater: IntelInflater
09:56:43.354 INFO HaplotypeCaller - GCS max retries/reopens: 20
09:56:43.355 INFO HaplotypeCaller - Requester pays: disabled
09:56:43.355 INFO HaplotypeCaller - Initializing engine
09:56:43.875 INFO IntervalArgumentCollection - Processing 59670755 bp from intervals
09:56:43.896 INFO HaplotypeCaller - Done initializing engine
09:56:43.900 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
09:56:43.908 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
09:56:43.908 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
09:56:43.952 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
09:56:43.953 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
09:56:44.020 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
09:56:44.021 INFO IntelPairHmm - Available threads: 1
09:56:44.021 INFO IntelPairHmm - Requested threads: 4
09:56:44.021 WARN IntelPairHmm - Using 1 available threads, but 4 were requested
09:56:44.021 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
09:56:44.269 INFO ProgressMeter - Starting traversal
09:56:44.270 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
09:56:51.697 WARN InbreedingCoeff - InbreedingCoeff will not be calculated; at least 10 samples must have called genotypes
09:56:54.294 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.294 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.296 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.298 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.299 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.311 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.313 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.327 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.338 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.350 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.351 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.352 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.357 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.360 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.363 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.377 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.379 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.386 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.388 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.398 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.408 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.410 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.412 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.413 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:56:54.434 INFO ProgressMeter - chr12:125000 0.2 470 2774.5
09:57:00.597 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:57:00.602 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:57:00.606 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
09:57:00.609 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
...........
...........
...........
15:03:15.392 INFO ProgressMeter - chr12:10886886 306.5 64090 209.1
15:03:27.165 INFO ProgressMeter - chr12:10894514 306.7 64140 209.1
15:03:38.353 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
15:03:38.353 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
Using GATK jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /public/agis/huangsanwen_group/chenglin/softwares/gatk-4.1.9.0/gatk-package-4.1.9.0-local.jar HaplotypeCaller -R /vol3/agis/huangsanwen_group/chenglin/work/1_reference/DM_v6.1_all_chr.fa --ERC GVCF -I 01_bam/C382.AR.bam -O 02_gvcf/C382.chr12.g.vcf -L chr12
-
Hi Lin Cheng,
Great news! Glad it is working for now. There is a possibility it is a memory issue. Here are our recommendations to help if that is the cause:
- Check memory/disk space availability on your end.
- Specify Java memory usage with the Java option -Xmx.
- Run the gatk command with the gatk wrapper script.
- Split your analysis into intervals with the option -L.
- Specify a --tmp-dir that has room for all necessary temporary files.
- Verify whether this issue persists with the latest version of GATK.
- Check the depth of coverage of your sample at the area of interest.
If the issue persists, please let us know and we can continue to look into it.
Best,
Genevieve
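The first recommendation can be checked from the shell before a job starts. A minimal sketch, assuming a Linux node (it reads /proc/meminfo); the function names are hypothetical, and the functions only report numbers, they do not know how much HaplotypeCaller will actually need:

```shell
# Free disk space, in whole GB, on the file system that holds a directory
# (for example the directory passed to --tmp-dir or the -O output directory).
free_disk_gb() {
  local dir="$1"
  # -P: one POSIX-format line per file system; -k: sizes in 1K blocks.
  # Row 2, column 4 is the "Available" column.
  df -Pk "$dir" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Memory available on this node, in whole GB, read from /proc/meminfo (Linux).
# Falls back to MemFree on kernels that do not report MemAvailable.
available_mem_gb() {
  awk '/^MemAvailable:/ { avail = $2 }
       /^MemFree:/      { free = $2 }
       END { kb = (avail != "" ? avail : free); printf "%d\n", kb / 1024 / 1024 }' /proc/meminfo
}
```

For example, `free_disk_gb "$tem"` reports the space left for the --tmp-dir, and `available_mem_gb` can be compared with the value planned for -Xmx. On a shared cluster node these numbers describe the whole node, not the share granted to one job.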
-
Hi Genevieve Brandt,
After trying different parameters, I am still stuck at the HaplotypeCaller step. Here are the commands that I used:
gatk --java-options "-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr05.g.vcf -L chr05 --tmp-dir $tem
gatk --java-options "-Xmx50G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr09.g.vcf -L chr09 --tmp-dir $tem
gatk --java-options "-Xmx10G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr11.g.vcf -L chr11 --tmp-dir $tem
Please help me figure out this problem. Many thanks.
-
Hi Lin Cheng,
How large is your input BAM? And what is the memory and disk space availability on your end?
Genevieve
-
Lin Cheng, I wanted to follow up with a couple more points.
If the HaplotypeCaller run is successful, it will print a line saying that traversal is complete. An example of a successful run is here, in this post:
15:26:56.288 INFO ProgressMeter - Traversal complete. Processed 150454 total regions in 0.4 minutes.
How long does the job run before it stops? Could you give more details about how you are running this? Is it a shared cluster? Are there any other logs you could check that would explain why the job is being stopped?
Unfortunately, there is nothing in the HaplotypeCaller stack trace indicating that the tool itself ended the process.
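The two signs of a finished run discussed in this thread (the "Traversal complete" line in the log, and the .idx file that was missing for unfinished chromosomes) can be checked together. A sketch, assuming an uncompressed .g.vcf output, GATK's default index creation, and a saved log file per run; `hc_run_finished` and its arguments are hypothetical:

```shell
# Return 0 if a per-chromosome HaplotypeCaller run looks complete, 1 otherwise.
# $1: the -O output path (an uncompressed .g.vcf is assumed)
# $2: a saved log of that run (hypothetical; e.g. the job's stdout/stderr file)
hc_run_finished() {
  local gvcf="$1" log_file="$2"
  # The output file itself must exist and be non-empty.
  [ -s "$gvcf" ] || return 1
  # A finished traversal logs a line such as:
  #   INFO ProgressMeter - Traversal complete. Processed 150454 total regions in 0.4 minutes.
  grep -q 'ProgressMeter - Traversal complete' "$log_file" || return 1
  # By default GATK writes <output>.idx next to an uncompressed VCF; the
  # unfinished runs in this thread left the .g.vcf without it.
  [ -s "${gvcf}.idx" ] || return 1
  return 0
}
```

Usage, with a hypothetical log path: `hc_run_finished 02_gvcf/C001.chr05.g.vcf logs/C001.chr05.log || echo "C001 chr05 needs a rerun"`.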
-
Hi Genevieve Brandt,
BAM size: my BAM files range from 36G to 466G.
Memory: I submit my jobs with SGE qsub, so I can request memory via "-Xmx" up to 252G.
Disk space: about 1T is left on my disk.
However, note that one of my unfinished samples, C001, has a BAM file of only about 40G, while one of my finished samples, C134, has a BAM file of about 466G.
How long is the job running before it stops?
About 8 hours for sample C001 (BAM size ~40G), Chr05 (length=55599697)
About 2 hours for sample C001, Chr09 (length=67600300)
About 1 hour for sample C001 (BAM size ~40G), Chr11 (length=46777387)
Commands run on the SGE cluster:
qsub parameter: "-q queue6 -l mem=10G,nodes=1:ppn=1" gatk --java-options "-Xmx8G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr05.g.vcf -L chr05 --tmp-dir $tem
qsub parameter: "-q queue6 -l mem=50G,nodes=1:ppn=1" gatk --java-options "-Xmx50G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr09.g.vcf -L chr09 --tmp-dir $tem
qsub parameter: "-q queue6 -l mem=10G,nodes=1:ppn=2" gatk --java-options "-Xmx10G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr11.g.vcf -L chr11 --tmp-dir $tem
I will try running GATK in my local environment rather than on the shared cluster, and check the related log files.
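In the qsub lines above, the second job requests exactly as much memory from the scheduler (mem=50G) as it gives the Java heap (-Xmx50G), while the first leaves some room (mem=10G, -Xmx8G). The JVM and GATK's native libraries (such as the PairHMM library) also use memory outside the Java heap, so one option is to derive -Xmx from the request with some headroom. A sketch, where the 80% fraction and the function name are assumptions, not GATK guidance:

```shell
# Suggest a -Xmx value that leaves headroom below the memory requested from SGE.
# Assumption: 80% of the request goes to the Java heap; the rest is left for
# memory the JVM and native libraries (e.g. GKL's PairHMM) use outside the heap.
xmx_for_request_gb() {
  local requested_gb="$1"
  local heap_gb=$(( requested_gb * 80 / 100 ))   # integer arithmetic, rounds down
  if [ "$heap_gb" -lt 1 ]; then
    heap_gb=1
  fi
  printf '%s\n' "-Xmx${heap_gb}G"
}
```

For a job submitted with `-l mem=50G`, this would be used as `gatk --java-options "$(xmx_for_request_gb 50) -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller ...`, giving a 40G heap.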
-
Hi Genevieve Brandt ,
After running in my local environment, I found this error in the log:
10:37:38.526 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:37:38.526 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f8e077d454a, pid=6734, tid=0x00007fa83e9ef700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_251-b08) (build 1.8.0_251-b08)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.251-b08 mixed mode linux-amd64 )
# Problematic frame:
# C [libgkl_pairhmm_omp6215386383396637088.so+0x6954a] double compute_full_prob_avxd<double>(testcase*)+0x34a
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /vol1/agis/huangsanwen_group/chenglin/pan-genome_vol1/02_snp/test/hs_err_pid6734.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Then I ran the following commands in my local environment, and that solved it perfectly:
ulimit -c unlimited
gatk --java-options "-Xmx100G -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" HaplotypeCaller -R $ref --ERC GVCF -I 01_bam/$i.AR.bam -O 02_gvcf/C001.chr11.g.vcf -L chr11 --tmp-dir $tem
Many thanks for your kind help. Have a nice day ~~
21:46:52.515 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
21:46:52.726 INFO HaplotypeCaller - 296108 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
0 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
47 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
296155 total reads filtered
21:46:52.726 INFO ProgressMeter - chr11:46775573 616.8 316158 512.6
21:46:52.726 INFO ProgressMeter - Traversal complete. Processed 316158 total regions in 616.8 minutes.
21:46:53.120 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 4.253241578
21:46:53.120 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 2854.067335999
21:46:53.120 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 26878.87 sec
21:46:53.121 INFO HaplotypeCaller - Shutting down engine
[April 6, 2021 9:46:53 PM CST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 616.82 minutes.
Runtime.totalMemory()=38404620288
-
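The crash banner in the local run names the native frame where the JVM died, and the same information is in the hs_err_pid<PID>.log report it points to. A minimal sketch for extracting that frame from either file, assuming the "# Problematic frame:" layout shown in the banner; both function names are hypothetical:

```shell
# Print the "Problematic frame" entry from a JVM crash report (hs_err_pid<PID>.log)
# or from the crash banner captured in a job log.
problematic_frame() {
  local report="$1"
  # The frame is printed on the line right after "# Problematic frame:";
  # strip the leading "# " and stop after the first match.
  awk '/^# Problematic frame:/ { getline; sub(/^# */, ""); print; exit }' "$report"
}

# Return success if the crash happened in GKL's native PairHMM library,
# as in the banner above (libgkl_pairhmm_omp...so).
crashed_in_native_pairhmm() {
  problematic_frame "$1" | grep -q 'libgkl_pairhmm'
}
```

For example, `problematic_frame hs_err_pid6734.log` run in the directory named in the banner would print the `libgkl_pairhmm_omp...` frame.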
Hi Lin Cheng, glad you were able to solve the issue, thank you for sharing the solution!