MarkDuplicatesSpark only supports singleton fragments and pairs. We found the following group with >2 primary reads
REQUIRED for all errors and issues:
a) GATK version used: gatk4-4.3.0.0-0
b) Exact command used:
gatk MarkDuplicatesSpark -I sample3_CNVP.sorted.bam -O sample3_CNVP.dedup.bam -M sample3_CNVP_markdup_metrics.txt --create-output-bam-index true --optical-duplicate-pixel-distance 2500 --tmp-dir .
c) Entire program log:
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:990)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.RDD.collect(RDD.scala:989)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$collectAsMap$1.apply(PairRDDFunctions.scala:743)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$collectAsMap$1.apply(PairRDDFunctions.scala:742)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
at org.apache.spark.rdd.PairRDDFunctions.collectAsMap(PairRDDFunctions.scala:742)
at org.apache.spark.api.java.JavaPairRDD.collectAsMap(JavaPairRDD.scala:661)
at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.saveMetricsRDD(MarkDuplicatesSparkUtils.java:542)
at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.runTool(MarkDuplicatesSpark.java:360)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:546)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Caused by: org.broadinstitute.hellbender.exceptions.UserException$UnimplementedFeature: MarkDuplicatesSpark only supports singleton fragments and pairs. We found the following group with >2 primary reads: ( 4 number of reads).
indexpair[0,chrM-1 chrM:11291-11390]
indexpair[0,chrM-1 chrM:14771-14870]
indexpair[0,chrM-1 chrM:11188-11287]
indexpair[0,chrM-1 chrM:14855-14954].
at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSparkUtils.lambda$transformToDuplicateNames$5ac2632f$1(MarkDuplicatesSparkUtils.java:160)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$3$1.apply(JavaRDDLike.scala:143)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
23/03/02 11:04:56 INFO ShutdownHookManager: Shutdown hook called
23/03/02 11:04:56 INFO ShutdownHookManager: Deleting directory spark-3cfda755-661b-4eb1-a1ee-831a868acd9f
-
Hello rohit satyam, it looks like you have a problem with the BAM file input to MarkDuplicatesSpark. Specifically, it looks like there is a group of reads sharing one read name that contains >2 primary reads (somewhere on chrM). You can see the error message here:
Caused by: org.broadinstitute.hellbender.exceptions.UserException$UnimplementedFeature: MarkDuplicatesSpark only supports singleton fragments and pairs. We found the following group with >2 primary reads: ( 4 number of reads).
I would recommend running a validation tool like Picard ValidateSamFile on your input sample3_CNVP.sorted.bam to make sure there are no problems with your input data. You should expect to see errors related to read names having multiple mates.
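For reference, a minimal invocation might look like the following sketch (SUMMARY mode is just one option; VERBOSE instead prints each problematic record individually):

# Summarize validation problems across the whole BAM
gatk ValidateSamFile \
    -I sample3_CNVP.sorted.bam \
    -MODE SUMMARY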
What sequencing technology are you using? This kind of error can happen when something went wrong in processing upstream of MarkDuplicates, leaving invalid SAM flags on chimeric reads (which MarkDuplicates should normally handle). Alternatively, you could have duplicated read names in your input BAM, which will cause problems for MarkDuplicates. ValidateSamFile should tell you which reads are erroneous, and it is worth investigating what caused your read names to end up like this.
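If you want a quick way to spot duplicated read names yourself, here is a rough sketch using samtools (this assumes the BAM is coordinate-sorted and indexed; chrM is queried only because that is where the failing group was reported):

# Count primary records per read name on chrM;
# -F 0x900 excludes secondary (0x100) and supplementary (0x800) alignments,
# so any name appearing more than twice indicates a bad group like the one above
samtools view -F 0x900 sample3_CNVP.sorted.bam chrM \
    | cut -f1 | sort | uniq -c | awk '$1 > 2' | head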