GATK v4.2.4.0 VariantAnnotator running too slow
AnsweredHi all,
I am running the latest GATK v4.2.4.0 VariantAnnotator, but similar to the problem mentioned in this post, the running speed is very slow (~5000 sites per min). The development team mentioned in that post that a bug has been fixed in v4.1.7.0 so the next version should run VariantAnnotation faster. But it seems not the case. Here is the log file:
Using GATK jar /lustre/Anson/tools/GenomeAnalysisTK-4.2.0/gatk-package-4.2.4.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx8G -jar /lustre/Anson/tools/GenomeAnalysisTK-4.2.0/gatk-package-4.2.4.0-local.jar VariantAnnotator -I /lustre/Anson/working/PGPC_0016/gatk4_hg38/02var/diffRG/PGPC0016.RC.clipper.bam -R /lustre/Anson/ref/hg38/GRCh38.primary_assembly.genome.fa -A BaseQualityRankSumTest -A ClippingRankSumTest -AX Coverage -A FisherStrand -A LikelihoodRankSumTest -A MappingQualityRankSumTest -A MappingQualityZero -A QualByDepth -A ReadPosRankSumTest -A TandemRepeat --dbsnp /lustre/Anson/ref/hg38/Homo_sapiens_assembly38.dbsnp138.vcf --tmp-dir /lustre/Anson/working/PGPC_0016/gatk4_hg38/tmp -V /lustre/Anson/working/PGPC_0016/gatk4_hg38/02var/diffRG/HC_split/0016.raw.g.vcf -L /lustre/Anson/ref/hg38/wgs_calling_regions_split/0016-scattered.interval_list -O /lustre/Anson/working/PGPC_0016/gatk4_hg38/02var/diffRG/refine_split_try/0016.ReAnno.g.vcf
11:03:45.474 WARN GATKAnnotationPluginDescriptor - Disabled annotation (Coverage) is not enabled by this tool
11:03:45.553 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/lustre/Anson/tools/GenomeAnalysisTK-4.2.0/gatk-package-4.2.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 28, 2022 11:03:50 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:03:50.834 INFO VariantAnnotator - ------------------------------------------------------------
11:03:50.834 INFO VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.2.4.0
11:03:50.835 INFO VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
11:03:50.835 INFO VariantAnnotator - Executing as ansonwong@node02.bcc.cpy on Linux v3.10.0-514.el7.x86_64 amd64
11:03:50.835 INFO VariantAnnotator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_172-b11
11:03:50.835 INFO VariantAnnotator - Start Date/Time: January 28, 2022 11:03:45 AM HKT
11:03:50.836 INFO VariantAnnotator - ------------------------------------------------------------
11:03:50.836 INFO VariantAnnotator - ------------------------------------------------------------
11:03:50.837 INFO VariantAnnotator - HTSJDK Version: 2.24.1
11:03:50.837 INFO VariantAnnotator - Picard Version: 2.25.4
11:03:50.837 INFO VariantAnnotator - Built for Spark Version: 2.4.5
11:03:50.837 INFO VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:03:50.837 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:03:50.837 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:03:50.837 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:03:50.838 INFO VariantAnnotator - Deflater: IntelDeflater
11:03:50.838 INFO VariantAnnotator - Inflater: IntelInflater
11:03:50.838 INFO VariantAnnotator - GCS max retries/reopens: 20
11:03:50.838 INFO VariantAnnotator - Requester pays: disabled
11:03:50.838 INFO VariantAnnotator - Initializing engine
11:03:51.878 INFO FeatureManager - Using codec VCFCodec to read file file:///lustre/Anson/ref/hg38/Homo_sapiens_assembly38.dbsnp138.vcf
11:03:52.328 INFO FeatureManager - Using codec VCFCodec to read file file:///lustre/Anson/working/PGPC_0016/gatk4_hg38/02var/diffRG/HC_split/0016.raw.g.vcf
11:03:53.284 INFO FeatureManager - Using codec IntervalListCodec to read file file:///lustre/Anson/ref/hg38/wgs_calling_regions_split/0016-scattered.interval_list
11:03:53.422 INFO IntervalArgumentCollection - Processing 146187273 bp from intervals
11:03:53.499 INFO VariantAnnotator - Done initializing engine
11:03:53.602 INFO ProgressMeter - Starting traversal
11:03:53.603 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
11:04:04.089 INFO ProgressMeter - chr16:33602180 0.2 1000 5722.5
11:04:18.396 INFO ProgressMeter - chr16:33610314 0.4 3000 7260.1
11:04:30.105 INFO ProgressMeter - chr16:33614947 0.6 4000 6575.0
11:04:45.065 INFO ProgressMeter - chr16:33618497 0.9 5000 5829.7
11:04:59.241 INFO ProgressMeter - chr16:33625865 1.1 7000 6398.7
11:05:10.636 INFO ProgressMeter - chr16:33630329 1.3 8000 6231.1
11:05:26.275 INFO ProgressMeter - chr16:33635022 1.5 9000 5827.0
11:05:42.934 INFO ProgressMeter - chr16:33647200 1.8 11000 6036.7
11:05:58.275 INFO ProgressMeter - chr16:33653087 2.1 12000 5775.2
11:06:13.178 INFO ProgressMeter - chr16:33665085 2.3 14000 6018.3
11:06:24.077 INFO ProgressMeter - chr16:33671219 2.5 15000 5981.1
11:06:43.998 INFO ProgressMeter - chr16:33680631 2.8 17000 5986.1
11:07:02.961 INFO ProgressMeter - chr16:33684845 3.2 18000 5703.5
11:07:13.470 INFO ProgressMeter - chr16:33692187 3.3 20000 6004.0
11:07:23.590 INFO ProgressMeter - chr16:33697055 3.5 21000 6000.4
11:07:38.830 INFO ProgressMeter - chr16:33701785 3.8 22000 5860.8
11:07:49.821 INFO ProgressMeter - chr16:33710457 3.9 24000 6096.1
11:08:03.486 INFO ProgressMeter - chr16:33713957 4.2 25000 6002.8
11:08:17.254 INFO ProgressMeter - chr16:33721525 4.4 26000 5916.9
11:08:27.989 INFO ProgressMeter - chr16:33727347 4.6 27000 5904.1
11:08:54.603 INFO ProgressMeter - chr16:33734658 5.0 28000 5581.4
11:09:05.250 INFO ProgressMeter - chr16:33740791 5.2 29000 5583.2
11:09:33.621 INFO ProgressMeter - chr16:33747472 5.7 30000 5293.8
11:10:00.961 INFO ProgressMeter - chr16:33752500 6.1 31000 5063.2
-
Hi Anson Wong,
Thank you for the report! Could you give more details about your VCF? How many samples do you have?
Best,
Genevieve
-
Thank you for your help.
I did HaplotypeCaller for one WGS sample before running VariantAnnotator. I split the wgs_calling_regions.hg38.interval_list (provided by GATK) into 20 lists using GATK SplitIntervals, and ran HaplotypeCaller (HC) in parallel to speed up the process.
It took ~4 hours to complete all 20 runs. The outputs are 20 vcf, namely 0000.raw.g.vcf to 0019.raw.g.vcf. Then I tried to run VariantAnnotator for each of the vcf.
Here is the log file for one of the 20 HC runs (using 0016-scattered.interval_list). Due to the word limitation in comments I skipped the logs between 19:00 and 22:30:
Using GATK jar /lustre/Anson/tools/GenomeAnalysisTK-4.2.0/gatk-package-4.2.4.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4G -jar /lustre/Anson/tools/GenomeAnalysisTK-4.2.0/gatk-package-4.2.4.0-local.jar HaplotypeCaller --native-pair-hmm-threads 16 -I /lustre/Anson/working/PGPC_0016/gatk4_hg38/02var/diffRG/PGPC0016.RC.clipper.bam --sample-name PGPC0016 -ERC GVCF -R /lustre/Anson/ref/hg38/GRCh38.primary_assembly.genome.fa --dbsnp /lustre/Anson/ref/hg38/Homo_sapiens_assembly38.dbsnp138.vcf -stand-call-conf 10.0 --min-base-quality-score 20 --tmp-dir /lustre/Anson/working/PGPC_0016/gatk4_hg38/tmp -L /lustre/Anson/ref/hg38/wgs_calling_regions_split/0016-scattered.interval_list -O /lustre/Anson/working/PGPC_0016/gatk4_hg38/02var/diffRG/HC_split/0016.raw.g.vcf
18:31:22.578 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/lustre/Anson/tools/GenomeAnalysisTK-4.2.0/gatk-package-4.2.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 25, 2022 6:31:37 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
18:31:37.771 INFO HaplotypeCaller - ------------------------------------------------------------
18:31:37.771 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.4.0
18:31:37.772 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
18:31:37.772 INFO HaplotypeCaller - Executing as ansonwong@node04.bcc.cpy on Linux v3.10.0-514.el7.x86_64 amd64
18:31:37.772 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_172-b11
18:31:37.772 INFO HaplotypeCaller - Start Date/Time: January 25, 2022 6:31:22 PM HKT
18:31:37.772 INFO HaplotypeCaller - ------------------------------------------------------------
18:31:37.772 INFO HaplotypeCaller - ------------------------------------------------------------
18:31:37.773 INFO HaplotypeCaller - HTSJDK Version: 2.24.1
18:31:37.773 INFO HaplotypeCaller - Picard Version: 2.25.4
18:31:37.773 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
18:31:37.773 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:31:37.773 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:31:37.773 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:31:37.773 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:31:37.773 INFO HaplotypeCaller - Deflater: IntelDeflater
18:31:37.773 INFO HaplotypeCaller - Inflater: IntelInflater
18:31:37.773 INFO HaplotypeCaller - GCS max retries/reopens: 20
18:31:37.774 INFO HaplotypeCaller - Requester pays: disabled
18:31:37.774 INFO HaplotypeCaller - Initializing engine
18:31:38.290 INFO FeatureManager - Using codec VCFCodec to read file file:///lustre/Anson/ref/hg38/Homo_sapiens_assembly38.dbsnp138.vcf
18:31:38.531 INFO FeatureManager - Using codec IntervalListCodec to read file file:///lustre/Anson/ref/hg38/wgs_calling_regions_split/0016-scattered.interval_list
18:31:38.607 INFO IntervalArgumentCollection - Processing 146187273 bp from intervals
18:31:38.642 INFO HaplotypeCaller - Done initializing engine
18:31:38.644 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
18:31:38.655 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
18:31:38.655 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
18:31:38.674 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/lustre/Anson/tools/GenomeAnalysisTK-4.2.0/gatk-package-4.2.4.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
18:31:38.679 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/lustre/Anson/tools/GenomeAnalysisTK-4.2.0/gatk-package-4.2.4.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
18:31:38.733 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
18:31:38.734 INFO IntelPairHmm - Available threads: 64
18:31:38.734 INFO IntelPairHmm - Requested threads: 16
18:31:38.734 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
18:31:38.783 INFO ProgressMeter - Starting traversal
18:31:38.783 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
18:31:39.368 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position chr16:33589174 and possibly subsequent; at least 10 samples must have called genotypes
18:31:49.042 INFO ProgressMeter - chr16:33645076 0.2 230 1345.3
18:32:00.650 INFO ProgressMeter - chr16:33724784 0.4 520 1426.9
18:32:12.578 INFO ProgressMeter - chr16:33730934 0.6 550 976.5
18:32:39.519 INFO ProgressMeter - chr16:33733369 1.0 570 563.1
18:32:58.232 INFO ProgressMeter - chr16:33734689 1.3 580 438.0
18:33:57.971 INFO ProgressMeter - chr16:33736370 2.3 590 254.3
18:34:47.776 INFO ProgressMeter - chr16:33741049 3.1 620 196.8
18:35:14.284 INFO ProgressMeter - chr16:33742764 3.6 630 175.4
18:35:27.431 INFO ProgressMeter - chr16:33744384 3.8 640 167.9
18:36:17.051 INFO ProgressMeter - chr16:33747951 4.6 660 142.3
18:36:30.048 INFO ProgressMeter - chr16:33838281 4.9 1000 206.0
18:36:40.051 INFO ProgressMeter - chr16:33916544 5.0 1300 258.9
18:36:54.376 INFO ProgressMeter - chr16:34099517 5.3 2030 385.9
18:37:13.688 INFO ProgressMeter - chr16:34119711 5.6 2120 379.8
18:37:33.869 INFO ProgressMeter - chr16:34121312 5.9 2130 359.9
18:38:19.409 INFO ProgressMeter - chr16:34162590 6.7 2300 344.5
18:38:30.450 INFO ProgressMeter - chr16:34169897 6.9 2350 342.5
18:38:54.788 INFO ProgressMeter - chr16:34171634 7.3 2360 324.8
18:39:06.769 INFO ProgressMeter - chr16:34184898 7.5 2440 326.8
18:39:19.738 INFO ProgressMeter - chr16:34195958 7.7 2500 325.4
18:39:29.992 INFO ProgressMeter - chr16:34254091 7.9 2730 347.6
18:40:13.797 INFO ProgressMeter - chr16:34572227 8.6 3540 412.4
18:40:27.249 INFO ProgressMeter - chr16:34573538 8.8 3550 403.1
18:41:34.720 INFO ProgressMeter - chr16:34574857 9.9 3560 358.4
18:44:20.862 INFO ProgressMeter - chr16:34581778 12.7 3580 281.9
18:48:34.016 INFO ProgressMeter - chr16:34583313 16.9 3590 212.2
18:50:48.307 INFO ProgressMeter - chr16:34585037 19.2 3600 187.9
18:53:17.146 INFO ProgressMeter - chr16:34586865 21.6 3610 166.8
18:54:24.227 INFO ProgressMeter - chr16:34588423 22.8 3620 159.1
18:54:34.490 INFO ProgressMeter - chr16:34591652 22.9 3640 158.8
18:55:34.370 INFO ProgressMeter - chr16:34592939 23.9 3650 152.6
18:56:42.948 INFO ProgressMeter - chr16:34594368 25.1 3660 146.0
18:58:16.401 INFO ProgressMeter - chr16:34596104 26.6 3670 137.8
18:58:26.937 INFO ProgressMeter - chr16:34732678 26.8 4230 157.8
18:58:38.093 INFO ProgressMeter - chr16:34814876 27.0 4590 170.1
18:58:48.173 INFO ProgressMeter - chr16:35091646 27.2 5630 207.3
18:58:58.242 INFO ProgressMeter - chr16:35160335 27.3 5940 217.4
18:59:08.318 INFO ProgressMeter - chr16:35443430 27.5 7120 259.0
18:59:18.331 INFO ProgressMeter - chr16:35791060 27.7 8450 305.5
18:59:29.113 INFO ProgressMeter - chr16:36091365 27.8 9600 344.8
18:59:44.587 INFO ProgressMeter - chr16:36099633 28.1 9660 343.8
19:00:01.675 INFO ProgressMeter - chr16:36107901 28.4 9710 342.1
...
22:30:03.596 INFO ProgressMeter - chr18:2662552 238.4 657260 2756.8
22:30:14.006 INFO ProgressMeter - chr18:2877974 238.6 658530 2760.1
22:30:24.063 INFO ProgressMeter - chr18:3158775 238.8 659950 2764.1
22:30:34.098 INFO ProgressMeter - chr18:3375991 238.9 661180 2767.3
22:30:44.146 INFO ProgressMeter - chr18:3623729 239.1 662530 2771.1
22:30:54.182 INFO ProgressMeter - chr18:3979159 239.3 664160 2775.9
22:31:04.205 INFO ProgressMeter - chr18:4237497 239.4 665440 2779.3
22:31:14.216 INFO ProgressMeter - chr18:4492493 239.6 666800 2783.1
22:31:24.234 INFO ProgressMeter - chr18:4713896 239.8 668000 2786.1
22:31:34.241 INFO ProgressMeter - chr18:4971323 239.9 669370 2789.9
22:31:44.310 INFO ProgressMeter - chr18:5213583 240.1 670660 2793.3
22:31:54.360 INFO ProgressMeter - chr18:5470032 240.3 672010 2797.0
22:32:04.363 INFO ProgressMeter - chr18:5791324 240.4 673560 2801.5
22:32:14.380 INFO ProgressMeter - chr18:6104205 240.6 675110 2806.0
22:32:24.457 INFO ProgressMeter - chr18:6504142 240.8 676800 2811.1
22:32:34.501 INFO ProgressMeter - chr18:6764972 240.9 678190 2814.9
22:32:44.541 INFO ProgressMeter - chr18:7055500 241.1 679610 2818.8
22:32:54.861 INFO ProgressMeter - chr18:7376567 241.3 681160 2823.3
22:33:04.903 INFO ProgressMeter - chr18:7725406 241.4 682740 2827.8
22:33:14.949 INFO ProgressMeter - chr18:8045198 241.6 684280 2832.3
22:33:25.019 INFO ProgressMeter - chr18:8330902 241.8 685750 2836.4
22:33:35.069 INFO ProgressMeter - chr18:8560611 241.9 687020 2839.7
22:33:45.100 INFO ProgressMeter - chr18:8810462 242.1 688360 2843.2
22:33:55.122 INFO ProgressMeter - chr18:9027954 242.3 689650 2846.6
22:34:05.176 INFO ProgressMeter - chr18:9375453 242.4 691290 2851.4
22:34:15.232 INFO ProgressMeter - chr18:9644666 242.6 692650 2855.0
22:34:25.233 INFO ProgressMeter - chr18:9893347 242.8 694010 2858.7
22:34:35.288 INFO ProgressMeter - chr18:10226517 242.9 695630 2863.4
22:34:45.335 INFO ProgressMeter - chr18:10414173 243.1 696770 2866.1
22:34:55.454 INFO ProgressMeter - chr18:10618702 243.3 697860 2868.6
22:35:05.535 INFO ProgressMeter - chr18:10806832 243.4 698960 2871.1
22:35:15.556 INFO ProgressMeter - chr18:11063481 243.6 700410 2875.1
22:35:25.592 INFO ProgressMeter - chr18:11397626 243.8 702010 2879.7
22:35:35.631 INFO ProgressMeter - chr18:11619433 243.9 703210 2882.6
22:35:45.700 INFO ProgressMeter - chr18:11906812 244.1 704610 2886.4
22:35:55.702 INFO ProgressMeter - chr18:12053950 244.3 705590 2888.4
22:36:05.716 INFO ProgressMeter - chr18:12260129 244.4 706710 2891.0
22:36:15.763 INFO ProgressMeter - chr18:12524552 244.6 708070 2894.6
22:36:25.831 INFO ProgressMeter - chr18:12777649 244.8 709490 2898.4
22:36:35.915 INFO ProgressMeter - chr18:13186980 245.0 711270 2903.7
22:36:45.924 INFO ProgressMeter - chr18:13485814 245.1 712810 2908.0
22:36:55.930 INFO ProgressMeter - chr18:13743986 245.3 714220 2911.8
22:37:05.945 INFO ProgressMeter - chr18:14029878 245.5 715700 2915.8
22:37:15.975 INFO ProgressMeter - chr18:14248899 245.6 716820 2918.4
22:37:26.019 INFO ProgressMeter - chr18:14346600 245.8 717430 2918.9
22:37:36.083 INFO ProgressMeter - chr18:14494766 246.0 718250 2920.2
22:37:46.123 INFO ProgressMeter - chr18:14650531 246.1 719140 2921.9
22:37:56.508 INFO ProgressMeter - chr18:14780506 246.3 719980 2923.2
22:38:01.232 INFO HaplotypeCaller - 2788113 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
1515085 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
68257 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
4371455 total reads filtered
22:38:01.232 INFO ProgressMeter - chr18:14896384 246.4 720515 2924.5
22:38:01.233 INFO ProgressMeter - Traversal complete. Processed 720515 total regions in 246.4 minutes.
22:38:01.470 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 55.059304404
22:38:01.471 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 3682.4121631350004
22:38:01.472 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 2553.43 sec
22:38:01.472 INFO HaplotypeCaller - Shutting down engine
[January 25, 2022 10:38:01 PM HKT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 246.65 minutes.
Runtime.totalMemory()=4261937152
-
Hi Anson Wong, is VariantAnnotator on its own taking 4 hours or is it HaplotypeCaller and VariantAnnotator?
-
Hi Genevieve-Brandt-she-her, it's HaplotypeCaller that only took 4 hours. Since VariantAnnotator has not been completed after running for > 1 day, I stopped it.
-
Thank you for this clarification. I see why this is an issue, VariantAnnotator should definitely be faster than HaplotypeCaller.
Could you check what is happening while VariantAnnotator is running with jstack and share the jstack output? You can do this by starting a new VariantAnnotator command and once the process slows, run jstack on your machine with the process ID of the VariantAnnotator java process.
One other thing to try would be to double the memory allocation from -Xmx8G to -Xmx16G. Let us know how that goes as well!
-
Thank you for your suggestions. I started again VariantAnnotator commands (same as the one provided above) and tried both -Xmx8G and -Xmx16G, but their speed is the same, both possessing ~5,000 variants per minute according to the log files.
Here is the jstack output of the VariantAnnotator (-Xmx16G) java process:
2022-02-08 18:05:29
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode):"Attach Listener" #26 daemon prio=9 os_prio=0 tid=0x00002b806c001000 nid=0x92fb waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"Service Thread" #23 daemon prio=9 os_prio=0 tid=0x00002b7f606e4000 nid=0x9071 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C1 CompilerThread17" #22 daemon prio=9 os_prio=0 tid=0x00002b7f606e0800 nid=0x9070 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C1 CompilerThread16" #21 daemon prio=9 os_prio=0 tid=0x00002b7f606de800 nid=0x906f waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C1 CompilerThread15" #20 daemon prio=9 os_prio=0 tid=0x00002b7f606dc800 nid=0x906e waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C1 CompilerThread14" #19 daemon prio=9 os_prio=0 tid=0x00002b7f606da800 nid=0x906d waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C1 CompilerThread13" #18 daemon prio=9 os_prio=0 tid=0x00002b7f606d8800 nid=0x906c waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C1 CompilerThread12" #17 daemon prio=9 os_prio=0 tid=0x00002b7f606d6800 nid=0x906b waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread11" #16 daemon prio=9 os_prio=0 tid=0x00002b7f606d4000 nid=0x906a waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread10" #15 daemon prio=9 os_prio=0 tid=0x00002b7f606d2000 nid=0x9069 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread9" #14 daemon prio=9 os_prio=0 tid=0x00002b7f606cf800 nid=0x9068 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread8" #13 daemon prio=9 os_prio=0 tid=0x00002b7f606cd800 nid=0x9067 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread7" #12 daemon prio=9 os_prio=0 tid=0x00002b7f606cb000 nid=0x9066 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread6" #11 daemon prio=9 os_prio=0 tid=0x00002b7f606c9000 nid=0x9065 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread5" #10 daemon prio=9 os_prio=0 tid=0x00002b7f606c7000 nid=0x9064 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread4" #9 daemon prio=9 os_prio=0 tid=0x00002b7f606bc800 nid=0x9063 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00002b7f606ba800 nid=0x9062 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00002b7f606b8000 nid=0x9061 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00002b7f606b6800 nid=0x9060 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00002b7f606b3800 nid=0x905f waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00002b7f606b1800 nid=0x905e runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00002b7f6067e800 nid=0x905d in Object.wait() [0x00002b8005412000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000003c000bb50> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0x00000003c000bb50> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00002b7f6067a000 nid=0x905c in Object.wait() [0x00002b8005311000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000003c0013ae8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00000003c0013ae8> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)"main" #1 prio=5 os_prio=0 tid=0x00002b7f6000a800 nid=0x902f runnable [0x00002b7f5d879000]
java.lang.Thread.State: RUNNABLE
at com.intel.gkl.compression.IntelInflater.inflateNative(Native Method)
at com.intel.gkl.compression.IntelInflater.inflate(IntelInflater.java:174)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:145)
at htsjdk.samtools.util.BlockGunzipper.unzipBlock(BlockGunzipper.java:96)
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:550)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
at java.io.DataInputStream.read(DataInputStream.java:149)
at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:421)
at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:394)
at htsjdk.samtools.util.BinaryCodec.readByteBuffer(BinaryCodec.java:507)
at htsjdk.samtools.util.BinaryCodec.readInt(BinaryCodec.java:518)
at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:270)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:866)
at htsjdk.samtools.BAMFileReader$BAMFileIndexIterator.getNextRecord(BAMFileReader.java:1005)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:840)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:834)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:802)
at htsjdk.samtools.BAMFileReader$BAMQueryFilteringIterator.advance(BAMFileReader.java:1058)
at htsjdk.samtools.BAMFileReader$BAMQueryFilteringIterator.<init>(BAMFileReader.java:1027)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:938)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:412)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:389)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.query(ReadsPathDataSource.java:352)
at org.broadinstitute.hellbender.engine.ReadsContext.iterator(ReadsContext.java:119)
at org.broadinstitute.hellbender.engine.ReadsContext.iterator(ReadsContext.java:102)
at java.lang.Iterable.spliterator(Iterable.java:101)
at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1176)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator.makeLikelihoods(VariantAnnotator.java:251)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator.apply(VariantAnnotator.java:243)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at org.broadinstitute.hellbender.engine.VariantWalker$$Lambda$151/8039120.accept(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)"VM Thread" os_prio=0 tid=0x00002b7f60672000 nid=0x905b runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00002b7f60020000 nid=0x9030 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00002b7f60021800 nid=0x9031 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00002b7f60023800 nid=0x9032 runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00002b7f60025800 nid=0x9033 runnable
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00002b7f60027000 nid=0x9034 runnable
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00002b7f60029000 nid=0x9035 runnable
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00002b7f6002b000 nid=0x9036 runnable
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00002b7f6002c800 nid=0x9037 runnable
"GC task thread#8 (ParallelGC)" os_prio=0 tid=0x00002b7f6002e800 nid=0x9038 runnable
"GC task thread#9 (ParallelGC)" os_prio=0 tid=0x00002b7f60030000 nid=0x9039 runnable
"GC task thread#10 (ParallelGC)" os_prio=0 tid=0x00002b7f60032000 nid=0x903a runnable
"GC task thread#11 (ParallelGC)" os_prio=0 tid=0x00002b7f60034000 nid=0x903b runnable
"GC task thread#12 (ParallelGC)" os_prio=0 tid=0x00002b7f60035800 nid=0x903c runnable
"GC task thread#13 (ParallelGC)" os_prio=0 tid=0x00002b7f60037800 nid=0x903d runnable
"GC task thread#14 (ParallelGC)" os_prio=0 tid=0x00002b7f60039800 nid=0x903e runnable
"GC task thread#15 (ParallelGC)" os_prio=0 tid=0x00002b7f6003b000 nid=0x903f runnable
"GC task thread#16 (ParallelGC)" os_prio=0 tid=0x00002b7f6003d000 nid=0x9040 runnable
"GC task thread#17 (ParallelGC)" os_prio=0 tid=0x00002b7f6003f000 nid=0x9041 runnable
"GC task thread#18 (ParallelGC)" os_prio=0 tid=0x00002b7f60040800 nid=0x9042 runnable
"GC task thread#19 (ParallelGC)" os_prio=0 tid=0x00002b7f60042800 nid=0x9043 runnable
"GC task thread#20 (ParallelGC)" os_prio=0 tid=0x00002b7f60044800 nid=0x9044 runnable
"GC task thread#21 (ParallelGC)" os_prio=0 tid=0x00002b7f60046000 nid=0x9045 runnable
"GC task thread#22 (ParallelGC)" os_prio=0 tid=0x00002b7f60048000 nid=0x9046 runnable
"GC task thread#23 (ParallelGC)" os_prio=0 tid=0x00002b7f60049800 nid=0x9047 runnable
"GC task thread#24 (ParallelGC)" os_prio=0 tid=0x00002b7f6004b800 nid=0x9048 runnable
"GC task thread#25 (ParallelGC)" os_prio=0 tid=0x00002b7f6004d800 nid=0x9049 runnable
"GC task thread#26 (ParallelGC)" os_prio=0 tid=0x00002b7f6004f000 nid=0x904a runnable
"GC task thread#27 (ParallelGC)" os_prio=0 tid=0x00002b7f60051000 nid=0x904b runnable
"GC task thread#28 (ParallelGC)" os_prio=0 tid=0x00002b7f60053000 nid=0x904c runnable
"GC task thread#29 (ParallelGC)" os_prio=0 tid=0x00002b7f60054800 nid=0x904d runnable
"GC task thread#30 (ParallelGC)" os_prio=0 tid=0x00002b7f60056800 nid=0x904e runnable
"GC task thread#31 (ParallelGC)" os_prio=0 tid=0x00002b7f60058800 nid=0x904f runnable
"GC task thread#32 (ParallelGC)" os_prio=0 tid=0x00002b7f6005a000 nid=0x9050 runnable
"GC task thread#33 (ParallelGC)" os_prio=0 tid=0x00002b7f6005c000 nid=0x9051 runnable
"GC task thread#34 (ParallelGC)" os_prio=0 tid=0x00002b7f6005d800 nid=0x9052 runnable
"GC task thread#35 (ParallelGC)" os_prio=0 tid=0x00002b7f6005f800 nid=0x9053 runnable
"GC task thread#36 (ParallelGC)" os_prio=0 tid=0x00002b7f60061800 nid=0x9054 runnable
"GC task thread#37 (ParallelGC)" os_prio=0 tid=0x00002b7f60063000 nid=0x9055 runnable
"GC task thread#38 (ParallelGC)" os_prio=0 tid=0x00002b7f60065000 nid=0x9056 runnable
"GC task thread#39 (ParallelGC)" os_prio=0 tid=0x00002b7f60067000 nid=0x9057 runnable
"GC task thread#40 (ParallelGC)" os_prio=0 tid=0x00002b7f60068800 nid=0x9058 runnable
"GC task thread#41 (ParallelGC)" os_prio=0 tid=0x00002b7f6006a800 nid=0x9059 runnable
"GC task thread#42 (ParallelGC)" os_prio=0 tid=0x00002b7f6006c800 nid=0x905a runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00002b7f606e9000 nid=0x9072 waiting on condition
JNI global references: 719
-
Hi Anson Wong,
Thanks for getting back to us so quickly! It looks like the major slowdown is occurring during the BAM I/O. Could you verify that this is the case by running the same VariantAnnotator without the BAM input to confirm that this is the case?
We think the object that is used to read in the bam has no caching and could result in this slowdown.
Let me know what you find,
Genevieve
-
You are right! Without the BAM input, the processing time is much faster now (> 500,000 variants per minute).
According to the VariantAnnotator's documentation, "-I sample.bam" is an optional argument. May I know then under what condition must I specify the BAM input (e.g. adding a particular type of annotation)? And what is the purpose of having this parameter in VariantAnnotator?
Thanks for your help again!
-
Some annotations might need the BAM file to calculate the value for each variant. So, if you don't include the BAM file you might miss some annotations. It depends on which annotations you are interested in.
In terms of being able to use the BAM file as is, we have already identified that this is an issue and started working on a fix for it a few years ago, though we were not able to finish the work because of other projects. Here is the github PR: https://github.com/broadinstitute/gatk/pull/4902
A workaround for now would be to remove your reference blocks in your input GVCF with SelectVariants --exclude-non-variants true. Then, you should be able to run VariantAnnotator just fine.
Let me know if you have further questions.
-
Hi everyone I am having similar issues while trying to annotate my file with the gnomAD database. I am not using a bam file as additional input.
It takes around 5 hours for one CHR. I am using GATK 4.2.5.0.
The output is below:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx30G -XX:ParallelGCThreads=10 -jar /run/media/riadh/One Touch/Analysis/gatk-4.2.4.1/gatk-package-4.2.5.0-local.jar VariantAnnotator -V PE69_chr3.vcf -R /run/media/riadh/One Touch1/Reference_data_b38/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta --resource:gnomad /run/media/riadh/One Touch1/Reference_data_b38/gnomad.genomes.v3.1.2.sites.chr3.vcf.bgz -E gnomad.nhomalt -E gnomad.AC -E gnomad.AF -O Chr3_gnomad.vcf
09:52:39.525 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/run/media/riadh/One%20Touch/Analysis/gatk-4.2.4.1/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 22, 2022 9:52:39 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:52:39.611 INFO VariantAnnotator - ------------------------------------------------------------
09:52:39.611 INFO VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.2.5.0
09:52:39.611 INFO VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
09:52:39.611 INFO VariantAnnotator - Executing as riadh@ikm-unix-1012.uio.no on Linux v5.16.15-201.fc35.x86_64 amd64
09:52:39.612 INFO VariantAnnotator - Java runtime: OpenJDK 64-Bit Server VM v11.0.14.1+1
09:52:39.612 INFO VariantAnnotator - Start Date/Time: March 22, 2022 at 9:52:39 AM CET
09:52:39.612 INFO VariantAnnotator - ------------------------------------------------------------
09:52:39.612 INFO VariantAnnotator - ------------------------------------------------------------
09:52:39.612 INFO VariantAnnotator - HTSJDK Version: 2.24.1
09:52:39.612 INFO VariantAnnotator - Picard Version: 2.25.4
09:52:39.612 INFO VariantAnnotator - Built for Spark Version: 2.4.5
09:52:39.612 INFO VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:52:39.613 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:52:39.613 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:52:39.613 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:52:39.613 INFO VariantAnnotator - Deflater: IntelDeflater
09:52:39.613 INFO VariantAnnotator - Inflater: IntelInflater
09:52:39.613 INFO VariantAnnotator - GCS max retries/reopens: 20
09:52:39.613 INFO VariantAnnotator - Requester pays: disabled
09:52:39.613 INFO VariantAnnotator - Initializing engine
09:52:39.747 INFO FeatureManager - Using codec VCFCodec to read file file:///run/media/riadh/One%20Touch1/Reference_data_b38/gnomad.genomes.v3.1.2.sites.chr3.vcf.bgz
09:52:40.119 INFO FeatureManager - Using codec VCFCodec to read file file:///run/media/riadh/My%20Book_From%20Eiklid/Analysis/gatk-4.2.4.1/ensembl-vep/PE69_chr3.vcf
09:52:40.195 INFO VariantAnnotator - Done initializing engine
09:52:40.289 INFO ProgressMeter - Starting traversal
09:52:40.289 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
09:59:18.601 INFO ProgressMeter - chr3:4133564 6.6 1000 150.6 -
Hi Riad Hajdarevic,
Thanks for writing in and it does look like VariantAnnotator is running extremely slowly for you (150 sites/minute). How many samples do you have in your VCF? Could you try the suggestion listed above to check what is happening while VariantAnnotator is running with jstack and share the jstack output? Have you tried running the command with a higher memory specification?
Kind regards,
Pamela
-
Hi Pamel
there are 3 samples only (a family). I allocated 30G of memory to it ( I have 32 in total). here is the jstack output. Thank you!
hreads class SMR info:
_java_thread_list=0x00007fb878000c30, length=10, elements={
0x00007fb910017000, 0x00007fb910254800, 0x00007fb910256800, 0x00007fb91025c800,
0x00007fb91025e800, 0x00007fb910260800, 0x00007fb910262800, 0x00007fb910264800,
0x00007fb910297800, 0x00007fb878001000
}"main" #1 prio=5 os_prio=0 cpu=9858.99ms elapsed=16.21s tid=0x00007fb910017000 nid=0x5ce7 runnable [0x00007fb91624a000]
java.lang.Thread.State: RUNNABLE
at htsjdk.variant.vcf.AbstractVCFCodec.parseInfo(AbstractVCFCodec.java:528)
at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:428)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:384)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:328)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:48)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.readNextRecord(TabixFeatureReader.java:173)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:205)
at htsjdk.tribble.TabixFeatureReader$FeatureIterator.next(TabixFeatureReader.java:149)
at org.broadinstitute.hellbender.engine.FeatureCache.fill(FeatureCache.java:142)
at org.broadinstitute.hellbender.engine.FeatureDataSource.refillQueryCache(FeatureDataSource.java:623)
at org.broadinstitute.hellbender.engine.FeatureDataSource.queryAndPrefetch(FeatureDataSource.java:591)
at org.broadinstitute.hellbender.engine.FeatureManager.getFeatures(FeatureManager.java:357)
at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:173)
at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:125)
at org.broadinstitute.hellbender.engine.FeatureContext.getValues(FeatureContext.java:240)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateExpressions(VariantAnnotatorEngine.java:468)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.addInfoAnnotations(VariantAnnotatorEngine.java:347)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:334)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:306)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator.apply(VariantAnnotator.java:243)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:104)
at org.broadinstitute.hellbender.engine.VariantWalker$$Lambda$235/0x00000008405a8840.accept(Unknown Source)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(java.base@11.0.14.1/ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(java.base@11.0.14.1/ReferencePipeline.java:195)
at java.util.stream.ReferencePipeline$2$1.accept(java.base@11.0.14.1/ReferencePipeline.java:177)
at java.util.stream.ReferencePipeline$3$1.accept(java.base@11.0.14.1/ReferencePipeline.java:195)
at java.util.Iterator.forEachRemaining(java.base@11.0.14.1/Iterator.java:133)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(java.base@11.0.14.1/Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(java.base@11.0.14.1/AbstractPipeline.java:484)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(java.base@11.0.14.1/AbstractPipeline.java:474)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(java.base@11.0.14.1/ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(java.base@11.0.14.1/ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(java.base@11.0.14.1/AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(java.base@11.0.14.1/ReferencePipeline.java:497)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:102)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)"Reference Handler" #2 daemon prio=10 os_prio=0 cpu=0.56ms elapsed=16.18s tid=0x00007fb910254800 nid=0x5cee waiting on condition [0x00007fb8ac519000]
java.lang.Thread.State: RUNNABLE
at java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.14.1/Native Method)
at java.lang.ref.Reference.processPendingReferences(java.base@11.0.14.1/Reference.java:241)
at java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.14.1/Reference.java:213)"Finalizer" #3 daemon prio=8 os_prio=0 cpu=0.70ms elapsed=16.18s tid=0x00007fb910256800 nid=0x5cef in Object.wait() [0x00007fb8ac419000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.14.1/Native Method)
- waiting on <0x0000000082017f08> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(java.base@11.0.14.1/ReferenceQueue.java:155)
- waiting to re-lock in wait() <0x0000000082017f08> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(java.base@11.0.14.1/ReferenceQueue.java:176)
at java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.14.1/Finalizer.java:170)"Signal Dispatcher" #4 daemon prio=9 os_prio=0 cpu=0.57ms elapsed=16.18s tid=0x00007fb91025c800 nid=0x5cf0 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"Service Thread" #5 daemon prio=9 os_prio=0 cpu=0.16ms elapsed=16.18s tid=0x00007fb91025e800 nid=0x5cf1 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"C2 CompilerThread0" #6 daemon prio=9 os_prio=0 cpu=1723.19ms elapsed=16.18s tid=0x00007fb910260800 nid=0x5cf2 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task"C1 CompilerThread0" #14 daemon prio=9 os_prio=0 cpu=255.49ms elapsed=16.18s tid=0x00007fb910262800 nid=0x5cf3 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task"Sweeper thread" #18 daemon prio=9 os_prio=0 cpu=0.07ms elapsed=16.18s tid=0x00007fb910264800 nid=0x5cf4 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"Common-Cleaner" #19 daemon prio=8 os_prio=0 cpu=0.81ms elapsed=16.17s tid=0x00007fb910297800 nid=0x5cf7 in Object.wait() [0x00007fb8a53fc000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.14.1/Native Method)
- waiting on <0x0000000082018510> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(java.base@11.0.14.1/ReferenceQueue.java:155)
- waiting to re-lock in wait() <0x0000000082018510> (a java.lang.ref.ReferenceQueue$Lock)
at jdk.internal.ref.CleanerImpl.run(java.base@11.0.14.1/CleanerImpl.java:148)
at java.lang.Thread.run(java.base@11.0.14.1/Thread.java:829)
at jdk.internal.misc.InnocuousThread.run(java.base@11.0.14.1/InnocuousThread.java:161)"Attach Listener" #22 daemon prio=9 os_prio=0 cpu=0.26ms elapsed=0.10s tid=0x00007fb878001000 nid=0x5de9 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE"VM Thread" os_prio=0 cpu=48.05ms elapsed=16.18s tid=0x00007fb910252000 nid=0x5ced runnable
"GC Thread#0" os_prio=0 cpu=805.37ms elapsed=16.20s tid=0x00007fb91002f000 nid=0x5ce8 runnable
"GC Thread#1" os_prio=0 cpu=804.28ms elapsed=14.45s tid=0x00007fb898001000 nid=0x5d09 runnable
"GC Thread#2" os_prio=0 cpu=796.93ms elapsed=14.45s tid=0x00007fb898002000 nid=0x5d0a runnable
"GC Thread#3" os_prio=0 cpu=801.03ms elapsed=14.45s tid=0x00007fb898003800 nid=0x5d0b runnable
"GC Thread#4" os_prio=0 cpu=798.80ms elapsed=14.45s tid=0x00007fb898005800 nid=0x5d0c runnable
"GC Thread#5" os_prio=0 cpu=805.12ms elapsed=14.45s tid=0x00007fb898007000 nid=0x5d0d runnable
"GC Thread#6" os_prio=0 cpu=803.97ms elapsed=14.45s tid=0x00007fb898008800 nid=0x5d0e runnable
"GC Thread#7" os_prio=0 cpu=806.09ms elapsed=14.45s tid=0x00007fb89800a000 nid=0x5d0f runnable
"GC Thread#8" os_prio=0 cpu=799.53ms elapsed=14.45s tid=0x00007fb89800b800 nid=0x5d10 runnable
"GC Thread#9" os_prio=0 cpu=806.70ms elapsed=14.45s tid=0x00007fb89800d000 nid=0x5d11 runnable
"GC Thread#10" os_prio=0 cpu=804.13ms elapsed=14.45s tid=0x00007fb89800e800 nid=0x5d12 runnable
"GC Thread#11" os_prio=0 cpu=801.83ms elapsed=14.45s tid=0x00007fb898010000 nid=0x5d13 runnable
"GC Thread#12" os_prio=0 cpu=710.48ms elapsed=8.30s tid=0x00007fb89803f000 nid=0x5d29 runnable
"G1 Main Marker" os_prio=0 cpu=2.82ms elapsed=16.20s tid=0x00007fb91009f000 nid=0x5ce9 runnable
"G1 Conc#0" os_prio=0 cpu=1642.20ms elapsed=16.20s tid=0x00007fb9100a1000 nid=0x5cea runnable
"G1 Conc#1" os_prio=0 cpu=1642.15ms elapsed=13.12s tid=0x00007fb8a8001000 nid=0x5d14 runnable
"G1 Conc#2" os_prio=0 cpu=1638.49ms elapsed=13.12s tid=0x00007fb8a8002000 nid=0x5d15 runnable
"G1 Refine#0" os_prio=0 cpu=0.24ms elapsed=16.20s tid=0x00007fb910198800 nid=0x5ceb runnable
"G1 Young RemSet Sampling" os_prio=0 cpu=2.50ms elapsed=16.20s tid=0x00007fb91019a000 nid=0x5cec runnable
"VM Periodic Task Thread" os_prio=0 cpu=4.70ms elapsed=16.17s tid=0x00007fb910295000 nid=0x5cf6 waiting on conditionJNI global refs: 24, weak refs: 0
-
Hi Riad Hajdarevic,
Thank you for providing all of this additional information! Could you actually try running this with less memory (maybe 16GB)? It's possible that you're allocating too much of your available memory to the job.
Kind regards,
Pamela
-
Hi Pamela,
thank you for your quick reply. Now it says I am out of memory. with this error message. The gnomAD annotation files are quite big is that maybe the reason?
There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 123178319872 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (os_linux.cpp:3002), pid=26424, tid=26431
#
# JRE version: OpenJDK Runtime Environment 18.9 (11.0.14.1+1) (build 11.0.14.1+1)
# Java VM: OpenJDK 64-Bit Server VM 18.9 (11.0.14.1+1, mixed mode, tiered, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /run/media/riadh/My Book_From Eiklid/Analysis/gatk-4.2.4.1/ensembl-vep/snpEff_latest_core/snpEff/core.26424)
#--------------- S U M M A R Y ------------
Command Line: -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3016G /run/media/riadh/One Touch1/Analysis/gatk-4.2.4.1/gatk-package-4.2.5.0-local.jar VariantAnnotator -V PE69_chr3.vcf -R /run/media/riadh/One Touch/Reference_data_b38/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta --resource:gnomad /run/media/riadh/One Touch/Reference_data_b38/gnomad.genomes.v3.1.2.sites.chr3.vcf.bgz --expression gnomad.AF -O Chr3_gnomad.vcf -L chr3
Host: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz, 16 cores, 31G, Fedora release 35 (Thirty Five)
Time: Tue Mar 22 14:44:55 2022 CET elapsed time: 51.566969 seconds (0d 0h 0m 51s) -
Hi Riad Hajdarevic,
From this output, it looks like you specified 3016Gb of memory rather than 16GB. Was this intentional?
-
Hi
sorry that was a typo I didnt see. I put it on 16G now and It works very slow again. No error so far, but I can see on the generated output file that it goes very slow.
-
Hi Riad Hajdarevic,
Okay, thank you for letting me know. I'm going to talk to some other members of the GATK team to try to figure out what the issue might be and I will get back to you as soon as I can. Please let me know if you see any progress with this current run of VariantAnnotator.
Kind regards,
Pamela
-
Hi Pamela,
thank you for your time an effort. I run VariantAnnotator with a dbSnp file and it went quite fast. It seems that it only slows down when I use gnomAD vcf files for annotation. I hope this helps.
Regards
Riad
-
Its me again, just adding info. So what I did I used the older version of the gnomAD data. The gnomADv2liftover to b38 instead of the gnomADv3 and it does around 7500 variants per minute.
-
Hi Riad Hajdarevic,
Okay, thank you for providing this additional information. I'm glad you were able to run VariantAnnotator quickly with the dbSnp file and the other version of the gnomAD data. That is helpful because it seems that the slowdown is due to the inputs rather the tool. I'm going to continue looking into why this gnomAD file might be running so slowly.
Kind regards,
Pamela
-
Hi Riad Hajdarevic,
I just wanted to update you on my discussion with some other members of the GATK team on this issue. It seems that the long runtime is likely just due to the sheer number of variants in the gnomAD files that the tool is attempting to process, which is why the dpSnp file seems to work faster. Additionally, gnomADv3 has approximately 5X more variants than gnomADv2 which seems to be why the older version works faster. The GATK team is working on looking for a solution to the time it takes for the tool to process all these variants, but it looks like the behavior you are seeing is the expected behavior for now. Please let me know if you have any questions.
Kind regards,
Pamela
Please sign in to leave a comment.
21 comments