Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Is ARM64 (Linux/macOS) architecture officially supported?

10 comments

  • Genevieve Brandt (she/her)

    Hi Mark,

    Thanks for writing in to the forum about this! No, Linux ARM64 is not an officially supported platform. Generally, any of the basic tools that do not use additional native libraries will be fine. These include a lot of our read analysis tools. Our Python-based and other more complex tools probably won't work well, or at all.

    Since GATK is written in Java, HaplotypeCaller might work by falling back on Java libraries, but it will be very slow. 

    Please let us know if you have any other questions.

    Best,

    Genevieve

  • Mark Jens

    Thank you for the answer, Genevieve Brandt (she/her)!

    I will try to build and run it on Linux ARM64 and report any issues I find in the issue tracker.

    I hope I will be able to help fix them myself!

    Mark

  • Genevieve Brandt (she/her)

    Thanks Mark! You can post your thoughts here and we will come back and take a look if we have the capacity to prioritize this work as well.

  • Darren Platt

    How did this go? Mark, did you ever get an ARM build working? I have just tried GATK on aarch64, and while it runs (with warnings about native libs), it was noticeably slow (particularly SplitNCigarReads), so I was wondering whether the lack of native libs affects that tool (and also base recalibration and ApplyBQSR). I was also going to look at the source code, but didn't want to duplicate anything. Any information appreciated.

  • Louis Bergelson

    The state of GATK on ARM is complicated and not great. First of all, I'm assuming this is on an M2 Mac and not on some other ARM machine. Second, it's important to understand which version of Java you're running. There are versions of Java built to run natively on M2, and versions compiled for x86-64 that run on M2 under the Rosetta emulator. If you run the M2-native version, any dynamic library GATK tries to load has to have an ARM64 build, and many do not. This means the Intel GKL will not work at all, and neither will native BWA/FERMILITE/HDF5 support and probably other things. However, the M2-native JVM should be fast for code that doesn't use those libraries. Running an Intel x86 version of Java under emulation will allow loading those libraries under emulation. This may be slow but will mostly work. The big gotcha there is that Rosetta doesn't emulate AVX instructions and instead blows up with an unhelpful error if it encounters one. So if you try to run emulated HaplotypeCaller with the optimized PairHMM/SmithWaterman, it will crash. There's no performant way to run HaplotypeCaller on M2 right now.

    Also, as a warning, we've had pretty horrible performance running Docker images built for x86 on M2, although that may have improved since I last tried.

    So the basic summary is: try running with a version of Java that is built for M2. If your tool works, great. If it doesn't, try running it under an emulated Java built for x86. If that doesn't work, you're out of luck.

    We'd love to improve support for M2 but it's a big task and we don't have the expertise or resources to do so right now.  
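
Since the advice above depends on whether the JVM is an ARM-native or an x86 build, here is a minimal sketch of one way to check. It runs `java -XshowSettings:properties -version` and reads the `os.arch` property. The helper names (`parse_os_arch`, `classify_jvm`, `jvm_family`) and the alias sets are illustrative assumptions, not part of GATK.

```python
import re
import subprocess

# Spellings of os.arch vary by OS and JDK vendor; these sets are an
# assumption and may need extending for other platforms.
ARM64_NAMES = {"aarch64", "arm64"}
X86_64_NAMES = {"amd64", "x86_64"}


def parse_os_arch(settings_text):
    """Return the os.arch value found in JVM property-settings output, or None."""
    match = re.search(r"^\s*os\.arch\s*=\s*(\S+)", settings_text, re.MULTILINE)
    return match.group(1) if match else None


def classify_jvm(arch):
    """Map an os.arch value to a coarse family: arm64, x86_64, or unknown."""
    if arch is None:
        return "unknown"
    arch = arch.lower()
    if arch in ARM64_NAMES:
        return "arm64"
    if arch in X86_64_NAMES:
        return "x86_64"
    return "unknown"


def jvm_family(java="java"):
    """Ask the given java executable which architecture it was built for."""
    result = subprocess.run(
        [java, "-XshowSettings:properties", "-version"],
        capture_output=True, text=True, check=False,
    )
    # The settings are usually written to stderr; search both streams.
    return classify_jvm(parse_os_arch(result.stderr + result.stdout))
```

Calling `jvm_family()` on the machine in question reports which case applies. An `x86_64` result on an ARM Mac would suggest the JVM is running under emulation.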

  • Darren Platt

    Thanks for the quick reply. Actually, I'm not working with the M2. I'm using GATK on AWS Graviton instances. They are generally very fast and cheaper than Intel, so they're very attractive for genomic workloads. That said, the GATK steps have been among the slowest parts of the pipeline I'm running, so I'm wondering whether something related to the architecture is making performance worse. Is there anything you'd adjust in your answer above given this context?

  • Louis Bergelson

    Oh, that's interesting. I didn't know people were interested in it on non-OSX ARM. I'm a bit surprised it's working, but Java is of course "write once, kind of run anywhere"...

    Do Graviton machines have an emulation layer for x86 software like OSX does? Are you running natively, or are you running the GATK Docker image? I know very little about Graviton, so I can only really speculate. You'll probably have the same issues as running natively on M2. Certain tools that require native libraries will just fall over. You also don't get the benefit of our optimized compression/decompression library. That shouldn't be THAT big of a difference though.

    Usually I expect the runtime of most variant calling pipelines to be dominated by alignment, followed by HaplotypeCaller / Mutect2, then MarkDuplicates / SortSam, and then a long tail of faster tools. HaplotypeCaller will be very slow without the native acceleration, so if that's what's dominating, it's what I would expect. It very well might be worth moving to an Intel machine with AVX2 / AVX-512 for that step.

    I'm definitely interested to hear about your experiences in any case.
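
To see whether a given Linux machine advertises the AVX extensions mentioned above, a rough sketch like the following can parse `/proc/cpuinfo`. It assumes the usual layout, where x86 kernels list CPU features on `flags` lines and ARM64 kernels on `Features` lines. The function names are illustrative, and the sketch says nothing about which extensions a particular GATK build actually requires.

```python
def cpu_features(cpuinfo_text):
    """Collect CPU feature names from /proc/cpuinfo-style text.

    x86 kernels use "flags" lines; ARM64 kernels use "Features" lines.
    """
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            features.update(value.split())
    return features


def avx_report(cpuinfo_text):
    """Report which AVX-family extensions the CPU advertises."""
    features = cpu_features(cpuinfo_text)
    return {name: name in features for name in ("avx", "avx2", "avx512f")}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the Linux CPU description; return an empty string if unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""
```

Running `avx_report(read_cpuinfo())` on a Graviton instance would be expected to report every entry as absent, since AVX is an x86 extension.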

  • Martin Pollard

    In case you're interested, one tool that definitely doesn't work on Graviton 2 (c6gd) is GenomicsDBImport 4.3.0.0. Apparently it assumes the platform is x86_64 when loading its native libraries.

    13:22:17.524 WARN  IntelInflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
    13:22:17.524 WARN  IntelInflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
    13:22:17.528 WARN  IntelInflaterFactory - IntelInflater is not supported, using Java.util.zip.Inflater
    13:22:17.972 INFO  IntervalArgumentCollection - Processing 50818468 bp from intervals
    13:22:18.088 INFO  GenomicsDBImport - Done initializing engine
    13:22:18.256 INFO  GenomicsDBImport - Shutting down engine
    [August 22, 2023 1:22:18 PM GMT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=4116185088
    Exception in thread "main" java.lang.ExceptionInInitializerError
            at org.genomicsdb.GenomicsDBUtils.createTileDBWorkspace(GenomicsDBUtils.java:47)
            at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.overwriteCreateOrCheckWorkspace(GenomicsDBImport.java:1000)
            at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onTraversalStart(GenomicsDBImport.java:636)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1093)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Caused by: org.genomicsdb.exception.GenomicsDBException: Could not load genomicsdb native library
            at org.genomicsdb.GenomicsDBUtilsJni.<clinit>(GenomicsDBUtilsJni.java:34)
            ... 10 more
    Caused by: java.lang.UnsatisfiedLinkError: /tmp/libtiledbgenomicsdb7948125047406477072.so: /tmp/libtiledbgenomicsdb7948125047406477072.so: cannot open shared object file: No such file or directory (Possible cause: can't load AMD 64-bit .so on a AARCH64-bit platform)
            at java.lang.ClassLoader$NativeLibrary.load(Native Method)
            at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
            at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1817)
            at java.lang.Runtime.load0(Runtime.java:782)
            at java.lang.System.load(System.java:1100)
            at org.genomicsdb.GenomicsDBLibLoader.loadLibraryFromJar(GenomicsDBLibLoader.java:156)
            at org.genomicsdb.GenomicsDBLibLoader.loadLibrary(GenomicsDBLibLoader.java:55)
            at org.genomicsdb.GenomicsDBUtilsJni.<clinit>(GenomicsDBUtilsJni.java:31)
            ... 10 more

  • Louis Bergelson

    That makes sense. Anything that uses native code will not work unless an appropriate build of the library is included. We package x86 binaries for Linux and Mac but no ARM builds yet. I don't know if there is an ARM-compatible build of GenomicsDB/TileDB.
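
As an illustration of the point about packaged binaries, here is a hypothetical sketch of an architecture-aware lookup that fails with an explicit message when no build exists for the current platform. It is not how GenomicsDB or GATK load their libraries. The `PACKAGED` table, the file names, and the function names are all made up for the example.

```python
import platform

# Hypothetical table of bundled native builds, keyed by (os, architecture).
# Only x86 builds are listed, mirroring the situation described above.
PACKAGED = {
    ("linux", "x86_64"): "libexample.so",
    ("darwin", "x86_64"): "libexample.dylib",
}

# Different tools report the same architecture under different names.
ARCH_ALIASES = {
    "amd64": "x86_64", "x86_64": "x86_64", "x64": "x86_64",
    "arm64": "aarch64", "aarch64": "aarch64",
}


def normalize_arch(machine):
    """Fold architecture aliases into one canonical spelling."""
    machine = machine.lower()
    return ARCH_ALIASES.get(machine, machine)


def select_native_library(packaged=PACKAGED, system=None, machine=None):
    """Return the bundled library for this platform or raise a clear error."""
    system = (system or platform.system()).lower()
    arch = normalize_arch(machine or platform.machine())
    try:
        return packaged[(system, arch)]
    except KeyError:
        raise RuntimeError(
            f"no native library packaged for {system}/{arch}"
        ) from None
```

Checking the platform before extracting and loading a library would turn an `UnsatisfiedLinkError` like the one in the log above into an earlier, more direct message about the missing ARM build.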

  • Layne Sadler

    Soon ARM compatibility is going to become a bigger issue than supporting macOS.

    Windows computer manufacturers will be switching to Qualcomm ARM by default.

    AWS is offering ARM servers.
