Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MarkDuplicatesSpark crash

9 comments

  • David Gómez-Sánchez

    Sorry, the exact command was: gatk MarkDuplicatesSpark -I 1.bam -O MD_Spark_1.bam

     
  • Derek Caetano-Anolles

    The issue doesn't seem to be originating from MarkDuplicatesSpark; it's coming from Java. You're trying to load a Java class file that is unsupported in the version of Java that you are running.

    If I had to guess, I'd say that the version of the JDK you are running is older than the version MarkDuplicatesSpark was designed to run under.

    Can you please update your dependencies and try again?

  • David Gómez-Sánchez

    Hi Derek. 

    When typing "java --version" on the console I get this:


    openjdk 11.0.5 2019-10-15
    OpenJDK Runtime Environment (build 11.0.5+10-post-Ubuntu-0ubuntu1.118.04)
    OpenJDK 64-Bit Server VM (build 11.0.5+10-post-Ubuntu-0ubuntu1.118.04, mixed mode, sharing)

     

    Which java version do I need to install?

    Thanks,
    David.

  • Derek Caetano-Anolles

    Thank you, David. So the issue is definitely related to your version of Java: it's not that your Java is too old, but that it is too new.

    You're running OpenJDK 11, but you're going to need to try running it under OpenJDK 8 instead. Try installing this version and you should not have the error anymore.
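
A quick way to confirm which Java release is active before launching gatk is to parse the version banner. The snippet below is only a minimal sketch, not part of GATK: the function name java_major_from_banner is made up for illustration, and it assumes the first line of the banner contains the version number. Note that Java 8 only understands the single-dash form, java -version; the double-dash java --version was added in Java 9.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): print the Java major release found
# in a version banner line. Handles the legacy "1.8.0_275" numbering used up
# to Java 8 as well as the newer "11.0.5" numbering used from Java 9 onward.
java_major_from_banner() {
    local banner="$1" version
    # Take the first dotted number in the line, e.g. 1.8.0 or 11.0.5
    version=$(printf '%s\n' "$banner" | grep -oE '[0-9]+(\.[0-9]+)*' | head -n 1)
    case "$version" in
        1.*) version="${version#1.}" ;;   # legacy scheme: drop the leading "1."
    esac
    printf '%s\n' "${version%%.*}"
}

# Only query the local installation if a java executable is on the PATH.
if command -v java >/dev/null 2>&1; then
    # java -version writes its banner to stderr, hence the 2>&1
    major=$(java_major_from_banner "$(java -version 2>&1 | head -n 1)")
    if [ "$major" != "8" ]; then
        echo "Active Java is release $major; this thread suggests OpenJDK 8 for MarkDuplicatesSpark." >&2
    fi
fi
```

If the warning appears even though OpenJDK 8 is installed, another installed Java is probably still selected as the default.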

  • Louis Bergelson

    We have beta support for running with Java 11, but you have to build GATK yourself with some special flags enabled. If you're interested, I can try to walk you through it.

  • Robinn Teoh

    Hi there, 

     

    I am also running MarkDuplicatesSpark with GATK 4.1.9.0 on Ubuntu Linux, and I get an error that looks similar but is probably not the same:

    Command: ./gatk MarkDuplicatesSpark -I Kurumi.fix.bam -M Kurumidedupmetrics.txt -O Kurumi_sorted_dedup_reads.bam


    Using GATK jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar MarkDuplicatesSpark -I Kurumi.fix.bam -M Kurumidedupmetrics.txt -O Kurumi_sorted_dedup_reads.bam
    18:17:11.796 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Dec 07, 2020 6:17:11 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    18:17:11.936 INFO MarkDuplicatesSpark - ------------------------------------------------------------
    18:17:11.936 INFO MarkDuplicatesSpark - The Genome Analysis Toolkit (GATK) v4.1.9.0
    18:17:11.936 INFO MarkDuplicatesSpark - For support and documentation go to https://software.broadinstitute.org/gatk/
    18:17:11.936 INFO MarkDuplicatesSpark - Executing as naika@DESKTOP-TMKR2LT on Linux v4.19.128-microsoft-standard amd64
    18:17:11.936 INFO MarkDuplicatesSpark - Java runtime: OpenJDK 64-Bit Server VM v11.0.8+10-post-Ubuntu-0ubuntu120.04
    18:17:11.936 INFO MarkDuplicatesSpark - Start Date/Time: December 7, 2020 at 6:17:11 PM JST
    18:17:11.936 INFO MarkDuplicatesSpark - ------------------------------------------------------------
    18:17:11.936 INFO MarkDuplicatesSpark - ------------------------------------------------------------
    18:17:11.937 INFO MarkDuplicatesSpark - HTSJDK Version: 2.23.0
    18:17:11.937 INFO MarkDuplicatesSpark - Picard Version: 2.23.3
    18:17:11.937 INFO MarkDuplicatesSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    18:17:11.937 INFO MarkDuplicatesSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    18:17:11.937 INFO MarkDuplicatesSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    18:17:11.937 INFO MarkDuplicatesSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    18:17:11.937 INFO MarkDuplicatesSpark - Deflater: IntelDeflater
    18:17:11.937 INFO MarkDuplicatesSpark - Inflater: IntelInflater
    18:17:11.938 INFO MarkDuplicatesSpark - GCS max retries/reopens: 20
    18:17:11.938 INFO MarkDuplicatesSpark - Requester pays: disabled
    18:17:11.938 INFO MarkDuplicatesSpark - Initializing engine
    18:17:11.938 INFO MarkDuplicatesSpark - Done initializing engine
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    20/12/07 18:17:12 WARN Utils: Your hostname, DESKTOP-TMKR2LT resolves to a loopback address: 127.0.1.1; using 172.17.107.156 instead (on interface eth0)
    20/12/07 18:17:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/mnt/d/Docker/WGS/gatk4/gatk-package-4.1.9.0-local.jar) to method java.nio.Bits.unaligned()
    WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    20/12/07 18:17:13 INFO SparkContext: Running Spark version 2.4.5
    20/12/07 18:17:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    20/12/07 18:17:13 INFO SparkContext: Submitted application: MarkDuplicatesSpark
    20/12/07 18:17:13 INFO SecurityManager: Changing view acls to: naika
    20/12/07 18:17:13 INFO SecurityManager: Changing modify acls to: naika
    20/12/07 18:17:13 INFO SecurityManager: Changing view acls groups to:
    20/12/07 18:17:13 INFO SecurityManager: Changing modify acls groups to:
    20/12/07 18:17:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(naika); groups with view permissions: Set(); users with modify permissions: Set(naika); groups with modify permissions: Set()
    20/12/07 18:17:13 INFO Utils: Successfully started service 'sparkDriver' on port 39739.
    20/12/07 18:17:13 INFO SparkEnv: Registering MapOutputTracker
    20/12/07 18:17:13 INFO SparkEnv: Registering BlockManagerMaster
    20/12/07 18:17:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    20/12/07 18:17:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    20/12/07 18:17:13 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7a2f7933-f98f-4c80-8e46-7a9a089fca2b
    20/12/07 18:17:13 INFO MemoryStore: MemoryStore started with capacity 7.3 GB
    20/12/07 18:17:13 INFO SparkEnv: Registering OutputCommitCoordinator
    20/12/07 18:17:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    20/12/07 18:17:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.107.156:4040
    20/12/07 18:17:13 INFO Executor: Starting executor ID driver on host localhost
    20/12/07 18:17:14 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43453.
    20/12/07 18:17:14 INFO NettyBlockTransferService: Server created on 172.17.107.156:43453
    20/12/07 18:17:14 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    20/12/07 18:17:14 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.107.156, 43453, None)
    20/12/07 18:17:14 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.107.156:43453 with 7.3 GB RAM, BlockManagerId(driver, 172.17.107.156, 43453, None)
    20/12/07 18:17:14 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.107.156, 43453, None)
    20/12/07 18:17:14 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.17.107.156, 43453, None)
    18:17:14.168 INFO MarkDuplicatesSpark - Spark verbosity set to INFO (see --spark-verbosity argument)
    20/12/07 18:17:14 INFO GoogleHadoopFileSystemBase: GHFS version: 1.9.4-hadoop3
    20/12/07 18:17:14 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 172.9 KB, free 7.3 GB)
    20/12/07 18:17:14 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 35.4 KB, free 7.3 GB)
    20/12/07 18:17:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.107.156:43453 (size: 35.4 KB, free: 7.3 GB)
    20/12/07 18:17:15 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at PathSplitSource.java:96
    20/12/07 18:17:15 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 172.9 KB, free 7.3 GB)
    20/12/07 18:17:15 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 172.17.107.156:43453 in memory (size: 35.4 KB, free: 7.3 GB)
    20/12/07 18:17:15 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 35.4 KB, free 7.3 GB)
    20/12/07 18:17:15 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.107.156:43453 (size: 35.4 KB, free: 7.3 GB)
    20/12/07 18:17:15 INFO SparkContext: Created broadcast 1 from newAPIHadoopFile at PathSplitSource.java:96
    20/12/07 18:17:15 INFO FileInputFormat: Total input files to process : 1
    20/12/07 18:17:15 INFO SparkUI: Stopped Spark web UI at http://172.17.107.156:4040
    20/12/07 18:17:15 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    20/12/07 18:17:15 INFO MemoryStore: MemoryStore cleared
    20/12/07 18:17:15 INFO BlockManager: BlockManager stopped
    20/12/07 18:17:15 INFO BlockManagerMaster: BlockManagerMaster stopped
    20/12/07 18:17:15 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    20/12/07 18:17:15 INFO SparkContext: Successfully stopped SparkContext
    18:17:15.391 INFO MarkDuplicatesSpark - Shutting down engine
    [December 7, 2020 at 6:17:15 PM JST] org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark done. Elapsed time: 0.06 minutes.
    Runtime.totalMemory()=843055104
    java.lang.IllegalArgumentException: Unsupported class file major version 55
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237)
    at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:49)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:517)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:500)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:500)
    at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175)
    at org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238)
    at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631)
    at org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:307)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:306)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:306)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2326)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2100)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:990)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:989)
    at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:309)
    at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:171)
    at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:151)
    at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
    at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
    at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
    at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:936)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.sortUsingElementsAsKeys(SparkUtils.java:165)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.sortReadsAccordingToHeader(SparkUtils.java:143)
    at org.broadinstitute.hellbender.utils.spark.SparkUtils.querynameSortReadsIfNecessary(SparkUtils.java:306)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.mark(MarkDuplicatesSpark.java:206)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.mark(MarkDuplicatesSpark.java:270)
    at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.runTool(MarkDuplicatesSpark.java:351)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:546)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    20/12/07 18:17:15 INFO ShutdownHookManager: Shutdown hook called
    20/12/07 18:17:15 INFO ShutdownHookManager: Deleting directory /tmp/spark-3a03e0ca-4832-4244-a083-25d03941a2d1

    I thought the Java version was the problem, since I am running build 11.0.8, but I have already installed JDK 8, and javac reports 1.8.0_275.

    I have no idea what exactly the problem is here; it would be great if you could point me in the right direction.

     

  • Genevieve Brandt (she/her)

    Hi Robinn Teoh, it looks like the same problem, and the solution is to use OpenJDK 8. Have you tried this yet? Please check whether it was successful with java -version.
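
The number in "Unsupported class file major version 55" identifies the Java release that produced the class file being read. For Java 5 and later, the release equals the major version minus 44, so 52 is Java 8 and 55 is Java 11. The stack trace shows the exception coming from the ASM class reader that Spark's ClosureCleaner uses, which suggests that reader does not accept Java 11 class files. The sketch below decodes the number; the function name class_major_to_java is hypothetical and chosen only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a class file major version to the Java release
# that emits it. Assumes major >= 49 (Java 5 or later), where
# release = major - 44.
class_major_to_java() {
    local major="$1"
    if [ "$major" -lt 49 ]; then
        echo "major version $major predates Java 5; the formula does not apply" >&2
        return 1
    fi
    echo $(( major - 44 ))
}

# Extract the trailing number from the exception line in the log above.
error_line='java.lang.IllegalArgumentException: Unsupported class file major version 55'
major=$(printf '%s\n' "$error_line" | grep -oE '[0-9]+$')
echo "Class file major version $major corresponds to Java $(class_major_to_java "$major")"
```

For the error line in this thread, the script reports Java 11, which matches the "Java runtime: OpenJDK 64-Bit Server VM v11.0.8" line near the top of the log.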

  • Robinn Teoh

    Dear Genevieve Brandt,

    I tried installing OpenJDK 8 using:

    sudo apt-get install openjdk-8-jdk

    and I got:

    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    openjdk-8-jdk is already the newest version (8u275-b01-0ubuntu1~20.04).
    0 upgraded, 0 newly installed, 0 to remove and 131 not upgraded.

    but when I checked the version using:

    java --version

    I got:

    openjdk 11.0.8 2020-07-14
    OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu120.04)
    OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu120.04, mixed mode, sharing)

    I was stuck at this point until I remembered that I also needed to switch the active Java version via

    sudo update-alternatives --config java

    Once I did that, gatk ran well. Thanks so much for the pointer!
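
Installing openjdk-8-jdk does not by itself change which java the system runs, as the outputs above show: the system kept running Java 11 until the alternative was switched. The sketch below finds the registered Java 8 entry and prints the non-interactive equivalent of the --config step. It assumes a Debian or Ubuntu system where the install directory contains "java-8-" (for example /usr/lib/jvm/java-8-openjdk-amd64, which may differ on your machine); the helper name pick_java8 is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read alternatives (one path per line) on stdin and
# print the first one that lives in a "java-8-..." directory.
pick_java8() {
    grep -E '/java-8-' | head -n 1
}

# Only inspect the system if update-alternatives is available.
if command -v update-alternatives >/dev/null 2>&1; then
    # --list prints every registered path for the "java" alternative.
    java8=$(update-alternatives --list java 2>/dev/null | pick_java8)
    if [ -n "$java8" ]; then
        # --set selects a path without the interactive menu shown by --config.
        echo "To switch, run: sudo update-alternatives --set java $java8"
    else
        echo "No Java 8 alternative is registered; install openjdk-8-jdk first." >&2
    fi
fi
```

The script only prints the command rather than running it, so nothing on the system changes until you run the suggested line yourself.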

    Separately, I have another problem with setting up the path to ./gatk. Should I post about that in another thread instead?

  • Genevieve Brandt (she/her)

    Hi Robinn Teoh, thanks for the explanation and I am glad you solved this issue! I am sure it will help GATK users in the future.

    For the gatk path question, yes, please make a new post if you don't see one related.
