Is the ARM64 (Linux/macOS) architecture officially supported?
Hello,
I'd like to ask whether Linux ARM64 (aarch64) is an officially supported platform?
The only related topic I was able to find is https://gatk.broadinstitute.org/hc/en-us/community/posts/360078197412-Error-running-DenoiseReadCounts-on-arm64-processor where it is said that some of the native libraries (.so) are available only for x86_64 architecture.
At https://app.travis-ci.com/github/broadinstitute/gatk I see that the build and tests run only on AMD64 (x86_64).
Thank you!
Mark
-
Hi Mark,
Thanks for writing into the forum about this! No, Linux ARM64 is not an officially supported platform. Generally any of the basic tools that do not use any additional native libraries will be fine. These would include a lot of our read analysis tools. Our other Python and more complex tools probably won't work well or at all.
Since GATK is written in Java, HaplotypeCaller might work by falling back on Java libraries, but it will be very slow.
Please let us know if you have any other questions.
Best,
Genevieve
-
Thank you for the answer, Genevieve Brandt (she/her)!
I will try to build and run it on Linux ARM64 and report any issues I find in the issue tracker.
I hope I will be able to help fixing them myself!
Mark
-
Thanks Mark! You can post your thoughts here and we will come back and take a look if we have the capacity to prioritize this work as well.
-
How did this go? Mark, did you ever get an ARM build working? I have just tried GATK out on aarch64 and while it runs (with warnings about native libs), it was noticeably slow (particularly SplitNCigarReads), so I was wondering if the lack of native libs has an impact on that algorithm (and also on base recalibration and ApplyBQSR). I was also going to look at the source code, but didn't want to duplicate anything. Any information appreciated.
-
The state of GATK on ARM is complicated and not great. First of all, I'm assuming this is on an M2 Mac and not on some other ARM machine. Second, it's important to understand which version of Java you're running. There are versions of Java released to be M2 native, and versions which are compiled for x64 and run on M2 under the Rosetta emulator. If you run the M2 native version, any dynamic libraries GATK tries to load have to have an ARM64 version, and many do not. This means the Intel GKL will not work at all, and neither will support for native BWA/FERMILITE/HDF5 and probably other things. However, the M2 native JVM should be fast for code that doesn't use those things. Running an Intel x86 version of Java under emulation will allow loading those libraries under emulation. This may be slow but will mostly work. The big gotcha there is that Rosetta doesn't emulate AVX instructions and instead blows up with an unhelpful error if it encounters one. So if you try to run emulated HaplotypeCaller with the optimized PairHMM/SmithWaterman it will crash. There's no performant solution to run HaplotypeCaller on M2 right now.
Also, as a warning, we've had pretty horrible performance running Docker images built for x86 on M2, although that may have improved since the last time I tried.
So the basic summary is: try running using a version of Java which is built for M2. If your tool works, that's great. If it doesn't, try running it under an emulated Java built for x86. If that doesn't work, you're out of luck.
We'd love to improve support for M2 but it's a big task and we don't have the expertise or resources to do so right now.
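For anyone unsure which of the two cases above they are in, here is a minimal sketch for checking what the `java` on your PATH reports. It relies on the standard `-XshowSettings:properties` launcher option and the `os.arch` system property. The function names and the labels they print are made up for this example and are not GATK terminology.

```shell
# Map an os.arch value to a short label (labels are this sketch's own).
describe_jvm_arch() {
  case "$1" in
    aarch64|arm64) echo "arm64-native" ;;
    x86_64|amd64)  echo "x86_64" ;;
    *)             echo "unknown" ;;
  esac
}

# Print the os.arch property of the java found on PATH.
# Prints nothing if java is not installed.
jvm_arch() {
  command -v java >/dev/null 2>&1 || return 0
  java -XshowSettings:properties -version 2>&1 |
    awk -F' = ' '/os\.arch/ {print $2; exit}'
}

arch="$(jvm_arch)"
echo "JVM os.arch: ${arch:-<java not found>} ($(describe_jvm_arch "$arch"))"
echo "Machine:     $(uname -m)"
```

A native ARM JVM should report `aarch64`, while an x86 JVM running under emulation should report `x86_64` (or `amd64` on Linux), so a mismatch between the two lines is a hint that the JVM is being emulated.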
-
Thanks for the quick reply. Actually, I'm not working with the M2. I'm using GATK on AWS Graviton instances. They are generally very fast and cheaper than Intel, so they are very attractive for genomic workloads. That said, the GATK steps have been among the slowest parts of the pipeline I'm running, so I am wondering if something related to the architecture is making performance worse. Is there anything you'd adjust above with this context?
-
Oh, that's interesting. I didn't know people were interested in it on non-OSX ARM. I'm a bit surprised it's working, but java is of course "write once, kind of run anywhere"...
Do Graviton machines have an emulation layer for x86 software like OSX does? Are you running natively or are you running the GATK docker? I know very little about Graviton, so I can only really speculate. You'll probably have the same issues as running natively on M2. Certain tools that require native libraries will just fall over. You also don't get the benefit of our optimized compression/decompression library. That shouldn't be THAT big of a difference though.
Usually I expect the runtime of most variant calling pipelines to be mostly the alignment, followed by HaplotypeCaller / Mutect2, then MarkDuplicates / SortSam, and then a long tail of faster tools. HaplotypeCaller will be very slow without the native acceleration, so if that's what's dominating, that's what I would expect. It very well might be worth moving to an Intel machine with AVX2 / AVX-512 for that step.
I'm definitely interested to hear about your experiences in any case.
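If you want to confirm whether a given Linux machine advertises AVX at all before moving that step, a rough sketch is to look for the flags in `/proc/cpuinfo`. The helper name `avx_flags` is made up for this example, and the flag names checked (`avx`, `avx2`, `avx512f`) are the usual Linux x86 ones. An ARM machine is expected to list none of them.

```shell
# Print the AVX-family flags found in cpuinfo-style text read from stdin,
# as a single space-separated line (empty if there are none).
avx_flags() {
  { grep -o -w -E 'avx|avx2|avx512f' || true; } | sort -u | paste -sd ' ' -
}

# Only meaningful on Linux, where /proc/cpuinfo exists.
if [ -r /proc/cpuinfo ]; then
  found="$(avx_flags < /proc/cpuinfo)"
  echo "AVX flags: ${found:-none}"
fi
```

This only tells you what the CPU reports. Whether a particular GATK tool actually uses the accelerated code path is still best judged from the warnings in its own log.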
-
In case you're interested, one tool that definitely doesn't work on Graviton 2 (c6gd) is GenomicsDBImport 4.3.0.0. Apparently it assumes the platform is x86_64 when loading its native libraries.
13:22:17.524 WARN IntelInflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
13:22:17.524 WARN IntelInflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
13:22:17.528 WARN IntelInflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
13:22:17.972 INFO IntervalArgumentCollection - Processing 50818468 bp from intervals
13:22:18.088 INFO GenomicsDBImport - Done initializing engine
13:22:18.256 INFO GenomicsDBImport - Shutting down engine
[August 22, 2023 1:22:18 PM GMT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=4116185088
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.genomicsdb.GenomicsDBUtils.createTileDBWorkspace(GenomicsDBUtils.java:47)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.overwriteCreateOrCheckWorkspace(GenomicsDBImport.java:1000)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onTraversalStart(GenomicsDBImport.java:636)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1093)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: org.genomicsdb.exception.GenomicsDBException: Could not load genomicsdb native library
at org.genomicsdb.GenomicsDBUtilsJni.<clinit>(GenomicsDBUtilsJni.java:34)
... 10 more
Caused by: java.lang.UnsatisfiedLinkError: /tmp/libtiledbgenomicsdb7948125047406477072.so: /tmp/libtiledbgenomicsdb7948125047406477072.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a AARCH64-bit platform)
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1817)
at java.lang.Runtime.load0(Runtime.java:782)
at java.lang.System.load(System.java:1100)
at org.genomicsdb.GenomicsDBLibLoader.loadLibraryFromJar(GenomicsDBLibLoader.java:156)
at org.genomicsdb.GenomicsDBLibLoader.loadLibrary(GenomicsDBLibLoader.java:55)
at org.genomicsdb.GenomicsDBUtilsJni.<clinit>(GenomicsDBUtilsJni.java:31)
... 10 more
-
That makes sense. Anything that uses native code will not work unless it includes appropriate versions of the library. We package x86 binaries for Linux and Mac but no ARM builds yet. I don't know if there is an ARM-compatible build of GenomicsDB/TileDB.
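If you have a copy of a Linux native library on disk and want to see which architecture it was built for, one rough way is to read the machine field from its ELF header. This is only a sketch: `elf_machine` is a made-up helper, it assumes a little-endian ELF file, and it only recognizes the two values relevant here (0x3e for x86_64 and 0xb7 for AArch64).

```shell
# Report the target architecture of an ELF file by reading the low byte
# of e_machine, a 2-byte field at offset 18 of the ELF header.
elf_machine() {
  magic="$(od -An -tx1 -N4 "$1" | tr -d ' \n')"
  if [ "$magic" != "7f454c46" ]; then
    echo "not ELF"
    return 0
  fi
  code="$(od -An -tx1 -j18 -N1 "$1" | tr -d ' \n')"
  case "$code" in
    3e) echo "x86_64" ;;
    b7) echo "aarch64" ;;
    *)  echo "other (0x$code)" ;;
  esac
}

# Usage (the path is a placeholder, not a real GATK location):
#   elf_machine /path/to/some/library.so
```

A result of `x86_64` on an aarch64 machine would match the "can't load AMD 64-bit .so on a AARCH64-bit platform" hint in the exception above.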
-
Soon ARM compatibility is going to become a bigger issue than supporting macOS.
Windows computer manufacturers will be switching to Qualcomm ARM by default.
AWS is offering ARM servers.
-
Hi,
I'm not sure this comment will help anyone, but it might. I was recently struggling to run HC on an M2 Mac. It was a long journey, but it eventually worked. The main problem was the error message:
env: python: No such file or directory
Working with Miniconda, as recommended on the GitHub page, worked. A cleaner solution (IMO), which I preferred because it avoids additional Python and other installations, is to use a symlink to the CORRECT Python version (the one I assume HC uses in the CommandLineTools):
sudo ln -s /Library/Developer/CommandLineTools/usr/bin/python3 /Library/Developer/CommandLineTools/usr/bin/python
Just remember to remove this symlink later.
Also, it still runs very slowly, and BTW, it runs with Java SE 23.0.1.
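The error message above is what `env` prints when a script asks for a command named `python` and none is on the PATH. Before creating a symlink, a quick check like the sketch below shows which of `python` and `python3` actually resolve. The helper name `resolve_cmd` is made up for this example.

```shell
# Print the path a command name resolves to, or "not found".
resolve_cmd() {
  command -v "$1" 2>/dev/null || echo "not found"
}

for name in python python3; do
  echo "$name -> $(resolve_cmd "$name")"
done
```

If `python3` resolves but `python` does not, that is the situation the symlink (or a Miniconda environment) works around.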
Guy
-
Hi
Since this post first appeared on the forums, we still do not support aarch64 as a native platform for GATK and its companion tools, due to a lack of native library and development support. Currently, ARM-based platforms can run only the pieces of GATK that do not rely on external Python dependencies. Fast HTS file compression and decompression relies on Intel-accelerated libraries, so it will be unavailable to ARM users. You may try running our Docker image on an Apple Silicon Mac using Rosetta. It may help to an extent, but I am not sure if the native libraries would still work, as those may check CPU flags before enabling themselves.
I hope this helps.