BWASpark error while using local cluster
Hi,
I'm using gatk version: v4.6.0.0-5-g59c9c1b-SNAPSHOT
I'm trying to evaluate the performance of GATK BwaSpark (and, if possible, to speed up the run time compared to bwa mem). I have set up a local Apache Spark cluster and am trying to run GATK on it, but I encountered the following error. Please guide me on how to resolve this.
Command Used:
time ./gatk BwaSpark --spark-verbosity ERROR --spark-runner SPARK --spark-master spark://ubuntu-01:1234 --bam-partition-size 4000000 -I /home/username/data/bwa_mem_spark_comparison/final/bwa_spark/sample_file.bam -O /home/username/data/bwa_mem_spark_comparison/final/bwa_spark/sample_file_aligned_july_17_2024.bam -R /home/username/data/reference_genome/human-genome/bwa/human_g1k_v37_decoy.fasta 2> /home/username/data/bwa_mem_spark_comparison/final/bwa_spark/sample_file_july_17_2024.bwalog
Error Encountered:
Using GATK jar /home/username/softwares/gatk/build/libs/gatk-package-4.6.0.0-5-g59c9c1b-SNAPSHOT-spark.jar
Running:
    /opt/spark/bin/spark-submit --master spark://ubuntu-05:7077 --conf spark.kryoserializer.buffer.max=512m --conf spark.driver.maxResultSize=0 --conf spark.driver.userClassPathFirst=false --conf spark.io.compression.codec=lzf --conf spark.executor.memoryOverhead=600 --conf spark.driver.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 --conf spark.executor.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 /home/username/softwares/gatk/build/libs/gatk-package-4.6.0.0-5-g59c9c1b-SNAPSHOT-spark.jar BwaSpark --spark-verbosity ERROR --spark-master spark://ubuntu-05:7077 --bam-partition-size 4000000 -I /home/username/data/bwa_mem_spark_comparison/final/bwa_spark/sample_file.bam -O /home/username/data/bwa_mem_spark_comparison/final/bwa_spark/sample_file_aligned_july_17_2024.bam -R /home/username/data/reference_genome/human-genome/bwa/human_g1k_v37_decoy.fasta
24/07/17 14:05:40 WARN Utils: Your hostname, ubuntu-05 resolves to a loopback address: 127.0.1.1; using 172.17.117.114 instead (on interface wlp2s0)
24/07/17 14:05:40 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
24/07/17 14:05:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (com.intel.gkl.NativeLibraryLoader).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14:05:44.010 INFO BwaSpark - ------------------------------------------------------------
14:05:44.020 INFO BwaSpark - The Genome Analysis Toolkit (GATK) v4.6.0.0-5-g59c9c1b-SNAPSHOT
14:05:44.020 INFO BwaSpark - For support and documentation go to https://software.broadinstitute.org/gatk/
14:05:44.020 INFO BwaSpark - Executing as username@ubuntu-05 on Linux v5.15.0-113-generic amd64
14:05:44.020 INFO BwaSpark - Java runtime: OpenJDK 64-Bit Server VM v17.0.11+9-Ubuntu-120.04.2
14:05:44.020 INFO BwaSpark - Start Date/Time: July 17, 2024 at 2:05:43 PM CEST
14:05:44.020 INFO BwaSpark - ------------------------------------------------------------
14:05:44.021 INFO BwaSpark - ------------------------------------------------------------
14:05:44.021 INFO BwaSpark - HTSJDK Version: 4.1.1
14:05:44.022 INFO BwaSpark - Picard Version: 3.2.0
14:05:44.022 INFO BwaSpark - Built for Spark Version: 3.5.0
14:05:44.022 INFO BwaSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:05:44.022 INFO BwaSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:05:44.022 INFO BwaSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
14:05:44.023 INFO BwaSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:05:44.023 INFO BwaSpark - Deflater: IntelDeflater
14:05:44.023 INFO BwaSpark - Inflater: IntelInflater
14:05:44.023 INFO BwaSpark - GCS max retries/reopens: 20
14:05:44.023 INFO BwaSpark - Requester pays: disabled
14:05:44.023 WARN BwaSpark - !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: BwaSpark is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
14:05:44.024 INFO BwaSpark - Initializing engine
14:05:44.024 INFO BwaSpark - Done initializing engine
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
24/07/17 14:05:44 INFO SparkContext: Running Spark version 3.2.0
24/07/17 14:05:44 INFO ResourceUtils: ==============================================================
24/07/17 14:05:44 INFO ResourceUtils: No custom resources configured for spark.driver.
24/07/17 14:05:44 INFO ResourceUtils: ==============================================================
24/07/17 14:05:44 INFO SparkContext: Submitted application: BwaSpark
24/07/17 14:05:44 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(memoryOverhead -> name: memoryOverhead, amount: 600, script: , vendor: , cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
24/07/17 14:05:44 INFO ResourceProfile: Limiting resource is cpu
24/07/17 14:05:44 INFO ResourceProfileManager: Added ResourceProfile id: 0
24/07/17 14:05:44 INFO SecurityManager: Changing view acls to: username
24/07/17 14:05:44 INFO SecurityManager: Changing modify acls to: username
24/07/17 14:05:44 INFO SecurityManager: Changing view acls groups to:
24/07/17 14:05:44 INFO SecurityManager: Changing modify acls groups to:
24/07/17 14:05:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(username); groups with view permissions: Set(); users with modify permissions: Set(username); groups with modify permissions: Set()
24/07/17 14:05:44 INFO Utils: Successfully started service 'sparkDriver' on port 36945.
24/07/17 14:05:44 INFO SparkEnv: Registering MapOutputTracker
24/07/17 14:05:44 INFO SparkEnv: Registering BlockManagerMaster
24/07/17 14:05:44 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
24/07/17 14:05:44 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
14:05:44.548 INFO BwaSpark - Shutting down engine
[July 17, 2024 at 2:05:44 PM CEST] org.broadinstitute.hellbender.tools.spark.bwa.BwaSpark done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=143654912
Exception in thread "main" java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x707ba651) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x707ba651
	at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
	at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
	at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
	at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
	at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.createSparkContext(SparkContextFactory.java:185)
	at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.getSparkContext(SparkContextFactory.java:117)
	at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:28)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
	at org.broadinstitute.hellbender.Main.main(Main.java:306)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/07/17 14:05:44 INFO ShutdownHookManager: Shutdown hook called
24/07/17 14:05:44 INFO ShutdownHookManager: Deleting directory /tmp/spark-e46f9b06-818d-4814-a7f1-91fa14867302
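For context on the crash itself: the IllegalAccessError is the well-known failure mode of running Spark 3.2.x (which the driver log reports, and which predates Java 17 support added in Spark 3.3.0) on a Java 17 JVM, while the GATK jar was built for Spark 3.5.0. A sketch of the usual workaround, assuming the cluster stays on this Spark/Java combination; the `--conf` pass-through is standard for GATK Spark tools, but the exact flags here are a suggestion, not a tested fix for this setup:

```shell
# Open the sun.nio.ch package to unnamed modules so Spark's StorageUtils can
# reach sun.nio.ch.DirectBuffer (Spark >= 3.3 adds equivalent flags itself).
# Hostname/port and I/O paths are taken from the command in the question.
./gatk BwaSpark \
  --spark-runner SPARK --spark-master spark://ubuntu-01:1234 \
  --conf spark.driver.extraJavaOptions="--add-exports java.base/sun.nio.ch=ALL-UNNAMED" \
  --conf spark.executor.extraJavaOptions="--add-exports java.base/sun.nio.ch=ALL-UNNAMED" \
  --bam-partition-size 4000000 \
  -I /home/username/data/bwa_mem_spark_comparison/final/bwa_spark/sample_file.bam \
  -O /home/username/data/bwa_mem_spark_comparison/final/bwa_spark/sample_file_aligned_july_17_2024.bam \
  -R /home/username/data/reference_genome/human-genome/bwa/human_g1k_v37_decoy.fasta
```

Alternatively, running the cluster itself on Spark 3.3+ (to match the jar's "Built for Spark Version: 3.5.0") or on a Java 11 runtime avoids the module-access problem without extra JVM flags.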
-
This is not the intended use of this tool, and it is generally not possible to make it run faster than the native bwa mem implementation by using Apache Spark. BwaSpark was intended to support other Spark-based tools that need BWA-MEM alignment as part of their own execution.
Faster BWA-MEM alternatives are BWA-MEM2 (which requires AVX2 or newer) and NVIDIA Parabricks v4.0 or later (CUDA/GPU):
https://github.com/bwa-mem2/bwa-mem2
https://docs.nvidia.com/clara/parabricks/latest/index.html
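Since bwa-mem2 is a drop-in replacement for bwa's command line, a minimal sketch of the switch; file names and thread counts below are placeholders, and samtools on the PATH is assumed:

```shell
# Build the bwa-mem2 index once; it is NOT compatible with classic bwa indexes.
bwa-mem2 index human_g1k_v37_decoy.fasta

# Align paired-end reads with 16 threads and sort straight into a BAM,
# mirroring the usual "bwa mem | samtools sort" pipeline.
bwa-mem2 mem -t 16 human_g1k_v37_decoy.fasta reads_R1.fq reads_R2.fq \
  | samtools sort -@ 4 -o sample_aligned.bam -
```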
Regards.