How can I build the GATK jar so that the Intel native libraries are available?
When I build the GATK jar from source using
git lfs install && \
git clone --branch 4.2.6.1 https://github.com/broadinstitute/gatk.git && \
cd gatk && \
./gradlew
The package builds successfully.
However, when I run the jar on an Intel machine with AVX, I get the warning below. If I use the same command with the official GATK jar, I don't get this warning and AVX is used.
How can I use Gradle to build the jar from source so that the Intel native libraries are used?
a) GATK version used: 4.2.6.1
b) Exact command used: gatk HaplotypeCaller -I /data/ERR194161.hg38.bam -O /data/test.vcf --reference /data/Homo_sapiens_assembly38.fasta
c) Entire program log:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/build/libs/gatk-package-4.2.6.1-SNAPSHOT-local.jar HaplotypeCaller -I /data/ERR194161.hg38.bam -O /data/test.vcf --reference /data/Homo_sapiens_assembly38.fasta
15:53:27.824 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/libs/gatk-package-4.2.6.1-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:53:28.060 INFO HaplotypeCaller - ------------------------------------------------------------
15:53:28.061 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.6.1-SNAPSHOT
15:53:28.061 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
15:53:28.061 INFO HaplotypeCaller - Executing as root@9fab51bccadc on Linux v5.10.135-122.509.amzn2.x86_64 amd64
15:53:28.061 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v11.0.16.1+9-LTS
15:53:28.061 INFO HaplotypeCaller - Start Date/Time: September 22, 2022 at 3:53:27 PM UTC
15:53:28.061 INFO HaplotypeCaller - ------------------------------------------------------------
15:53:28.061 INFO HaplotypeCaller - ------------------------------------------------------------
15:53:28.061 INFO HaplotypeCaller - HTSJDK Version: 2.24.1
15:53:28.062 INFO HaplotypeCaller - Picard Version: 2.27.1
15:53:28.062 INFO HaplotypeCaller - Built for Spark Version: 2.4.5
15:53:28.062 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:53:28.062 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:53:28.062 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:53:28.062 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:53:28.062 INFO HaplotypeCaller - Deflater: IntelDeflater
15:53:28.062 INFO HaplotypeCaller - Inflater: IntelInflater
15:53:28.062 INFO HaplotypeCaller - GCS max retries/reopens: 20
15:53:28.062 INFO HaplotypeCaller - Requester pays: disabled
15:53:28.062 INFO HaplotypeCaller - Initializing engine
15:53:28.451 INFO HaplotypeCaller - Done initializing engine
15:53:28.574 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
15:53:28.592 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/gatk/build/libs/gatk-package-4.2.6.1-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_utils.so
15:53:28.594 WARN NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/tmp/libgkl_utils10821132658567995603.so: libgomp.so.1: cannot open shared object file: No such file or directory)
15:53:28.594 WARN IntelPairHmm - Intel GKL Utils not loaded
15:53:28.595 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
15:53:28.595 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/gatk/build/libs/gatk-package-4.2.6.1-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_utils.so
15:53:28.596 WARN NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/tmp/libgkl_utils15230342874778145919.so: libgomp.so.1: cannot open shared object file: No such file or directory)
15:53:28.596 WARN IntelPairHmm - Intel GKL Utils not loaded
15:53:28.596 WARN PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
15:53:28.660 INFO ProgressMeter - Starting traversal
15:53:28.661 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
15:53:31.272 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position chr1:10352 and possibly subsequent; at least 10 samples must have called genotypes
15:53:38.692 INFO ProgressMeter - chr1:193563 0.2 880 5264.2
-
Hi Mark Schreiber,
Could you clarify what you mean by the official GATK jar?
"If I use the same command with the official GATK jar I don't get this warning and AVX is used."
Thank you,
Genevieve
-
Sorry, by "official" I mean the one bundled with the latest GATK release, available here: https://github.com/broadinstitute/gatk/releases/tag/4.2.6.1
-
I found the solution. I was running the built GATK jar in a Docker container that didn't contain the libgomp library, which turns out to be a required dependency of the Intel AVX native code in GATK.
Installing libgomp solved the problem.
yum -y install libgomp
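The PairHMM warning about missing AVX support is misleading in this case: the real cause is in the earlier NativeLibraryLoader line, which names the shared object that could not be opened. As a minimal sketch for spotting this kind of failure, the helper below pulls the names of missing shared libraries out of a GATK log. The function name missing_native_deps and the file name gatk.log are hypothetical, and the pattern assumes the dynamic linker's usual "cannot open shared object file" wording.

```shell
# Hypothetical helper: list shared libraries that a log reports as missing.
# Reads the log files given as arguments, or stdin if none are given.
missing_native_deps() {
    # -o prints only the matched text, -h suppresses file name prefixes.
    grep -oh '[A-Za-z0-9_.+-]\{1,\}: cannot open shared object file' "$@" |
        sed 's/: cannot open shared object file$//' |
        sort -u
}

# Example (gatk.log is an assumed file name holding the program log):
#   missing_native_deps gatk.log
```

Run against the log in the question, this should report libgomp.so.1. On a glibc-based image, ldconfig -p | grep libgomp is one way to check whether the library is visible to the dynamic linker before running GATK.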
-
Thank you for posting your solution, Mark Schreiber! I'm glad you were able to solve the issue.