I have been running the GATK Best Practices workflow in Docker (version 4.1.7.0), and everything was working fine up to now.
BQSRPipelineSpark complained about a missing platform specifier. My BAM file is missing the PL tag, and on further investigation that tracks back to the FASTQ file I got from the sequencing facility: the defline carries only a sequencer ID, @A00275, with no other "platform" information (they also trimmed the adapters, and possibly more).
ValidateSamFile also flags this, and it is the only problem it reports. Here are the first two reads of the FASTQ for reference:
@A00275:174:HGFL7DMXX:1:1101:14136:1016 1:N:0:CTGCTTCC+GATAGATC
CNAGCTTCCCTTGCTCTCTCCCAGCCCCGGCCGAGGCGGCCCTTACCTTGTGCAGCCAGTGCAGGTTCATCTGCTGCCCCACGGCAATGTGACATAGTGCC
+
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF
@A00275:174:HGFL7DMXX:1:1101:18873:1016 1:N:0:CTGCTTCC+GATAGATC
ANGAACCCCTATCCCCGGGTAACCCTGACTCACCGGTGCCATCTGTTGGGCAGCGCTGACACCGCGCGCCCCAGGCCTTGCCGACACTGCAGCAGCAGAGC
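For what it's worth, this is roughly how I confirmed the missing tag (paths are mine, taken from the command below):

# print the @RG header line(s); in my file there is no PL field
samtools view -H mydata/P50513/P50513_1N_bwa_bam_dedup.bam | grep '^@RG'

# Picard-style validation in SUMMARY mode, via the same GATK launcher
./gatk ValidateSamFile --INPUT mydata/P50513/P50513_1N_bwa_bam_dedup.bam --MODE SUMMARY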
- Does the platform (the PL tag) really matter for BQSR?
- Is there a workaround?
- There doesn't seem to be any command-line switch, or combination of switches, that will get around it; the only fix I can think of is to patch the read group myself, sketched below.
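Here is the patch I have in mind, using Picard's AddOrReplaceReadGroups (available through the same GATK Docker image). The RG values other than PL are my guesses reconstructed from the defline (flowcell.lane.barcode for PU), not anything the facility confirmed:

# RG values below, except RGPL, are guesses based on the FASTQ defline
./gatk AddOrReplaceReadGroups \
    --INPUT mydata/P50513/P50513_1N_bwa_bam_dedup.bam \
    --OUTPUT mydata/P50513/P50513_1N_bwa_bam_dedup_rg.bam \
    --RGID A00275.174 \
    --RGLB lib1 \
    --RGPL ILLUMINA \
    --RGPU HGFL7DMXX.1.CTGCTTCC+GATAGATC \
    --RGSM P50513_1N

Re-running BQSRPipelineSpark on the patched BAM would presumably get past the error, but I'd still like to know whether guessing ILLUMINA here could bias the recalibration.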
a) GATK version used: GATK 4.1.7.0 (Docker)
b) Exact GATK command used:
./gatk BQSRPipelineSpark --input mydata/P50513/P50513_1N_bwa_bam_dedup.bam --known-sites mydata/refs/00-common_all.vcf --output mydata/P50513/P50513_1N_bwa_bam_dedup_bqsr.bam --reference mydata/refs/Homo_sapiens_assembly19.fasta
c) The entire error log:
./gatk BQSRPipelineSpark --input mydata/P50513/P50513_1N_bwa_bam_dedup.bam --known-sites mydata/refs/00-common_all.vcf --output mydata/P50513/P50513_1N_bwa_bam_dedup_bqsr.bam --reference mydata/refs/Homo_sapiens_assembly19.fasta
Using GATK jar /gatk/gatk-package-4.1.7.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.7.0-local.jar BQSRPipelineSpark --input mydata/P50513/P50513_1N_bwa_bam_dedup.bam --known-sites mydata/refs/00-common_all.vcf --output mydata/P50513/P50513_1N_bwa_bam_dedup_bqsr.bam --reference mydata/refs/Homo_sapiens_assembly19.fasta
21:01:12.392 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.7.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 27, 2020 9:01:12 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
21:01:12.592 INFO BQSRPipelineSpark - ------------------------------------------------------------
21:01:12.593 INFO BQSRPipelineSpark - The Genome Analysis Toolkit (GATK) v4.1.7.0
21:01:12.593 INFO BQSRPipelineSpark - Executing as root@9839209cb7e2 on Linux v4.19.76-linuxkit amd64
21:01:12.594 INFO BQSRPipelineSpark - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03
21:01:12.594 INFO BQSRPipelineSpark - Start Date/Time: April 27, 2020 9:01:12 PM UTC
21:01:12.594 INFO BQSRPipelineSpark - ------------------------------------------------------------
21:01:12.594 INFO BQSRPipelineSpark - ------------------------------------------------------------
21:01:12.595 INFO BQSRPipelineSpark - HTSJDK Version: 2.21.2
21:01:12.595 INFO BQSRPipelineSpark - Picard Version: 2.21.9
21:01:12.595 INFO BQSRPipelineSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 2
21:01:12.595 INFO BQSRPipelineSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
21:01:12.596 INFO BQSRPipelineSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
21:01:12.596 INFO BQSRPipelineSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
21:01:12.596 INFO BQSRPipelineSpark - Deflater: IntelDeflater
21:01:12.597 INFO BQSRPipelineSpark - Inflater: IntelInflater
21:01:12.597 INFO BQSRPipelineSpark - GCS max retries/reopens: 20
21:01:12.597 INFO BQSRPipelineSpark - Requester pays: disabled
21:01:12.597 WARN BQSRPipelineSpark -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: BQSRPipelineSpark is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
21:01:12.598 INFO BQSRPipelineSpark - Initializing engine
21:01:12.598 INFO BQSRPipelineSpark - Done initializing engine
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/27 21:01:12 INFO SparkContext: Running Spark version 2.4.3
21:01:13.021 WARN NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/04/27 21:01:13 INFO SparkContext: Submitted application: BQSRPipelineSpark
20/04/27 21:01:13 INFO SecurityManager: Changing view acls to: root
20/04/27 21:01:13 INFO SecurityManager: Changing modify acls to: root
20/04/27 21:01:13 INFO SecurityManager: Changing view acls groups to:
20/04/27 21:01:13 INFO SecurityManager: Changing modify acls groups to:
20/04/27 21:01:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/04/27 21:01:13 INFO Utils: Successfully started service 'sparkDriver' on port 39587.
20/04/27 21:01:13 INFO SparkEnv: Registering MapOutputTracker
20/04/27 21:01:13 INFO SparkEnv: Registering BlockManagerMaster
20/04/27 21:01:13 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/04/27 21:01:13 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/04/27 21:01:13 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7d4b56c1-b276-4f87-8351-4aeaed46a164
20/04/27 21:01:13 INFO MemoryStore: MemoryStore started with capacity 10.3 GB
20/04/27 21:01:13 INFO SparkEnv: Registering OutputCommitCoordinator
20/04/27 21:01:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/04/27 21:01:13 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://9839209cb7e2:4040
20/04/27 21:01:13 INFO Executor: Starting executor ID driver on host localhost
20/04/27 21:01:13 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36233.
20/04/27 21:01:13 INFO NettyBlockTransferService: Server created on 9839209cb7e2:36233
20/04/27 21:01:13 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/04/27 21:01:13 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 9839209cb7e2, 36233, None)
20/04/27 21:01:13 INFO BlockManagerMasterEndpoint: Registering block manager 9839209cb7e2:36233 with 10.3 GB RAM, BlockManagerId(driver, 9839209cb7e2, 36233, None)
20/04/27 21:01:13 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 9839209cb7e2, 36233, None)
20/04/27 21:01:13 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 9839209cb7e2, 36233, None)
21:01:13.816 INFO BQSRPipelineSpark - Spark verbosity set to INFO (see --spark-verbosity argument)
20/04/27 21:01:13 INFO GoogleHadoopFileSystemBase: GHFS version: 1.6.3-hadoop2
20/04/27 21:01:14 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 272.8 KB, free 10.3 GB)
20/04/27 21:01:14 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 254.6 KB, free 10.3 GB)
20/04/27 21:01:14 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 9839209cb7e2:36233 (size: 254.6 KB, free: 10.3 GB)
20/04/27 21:01:14 INFO SparkContext: Created broadcast 0 from broadcast at BamSource.java:104
20/04/27 21:01:14 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 239.4 KB, free 10.3 GB)
20/04/27 21:01:14 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 25.4 KB, free 10.3 GB)
20/04/27 21:01:14 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 9839209cb7e2:36233 (size: 25.4 KB, free: 10.3 GB)
20/04/27 21:01:14 INFO SparkContext: Created broadcast 1 from newAPIHadoopFile at PathSplitSource.java:96
20/04/27 21:01:15 INFO SparkContext: Added file file:///gatk/mydata/refs/Homo_sapiens_assembly19.fasta at file:///gatk/mydata/refs/Homo_sapiens_assembly19.fasta with timestamp 1588021275120
20/04/27 21:01:15 INFO Utils: Copying /gatk/mydata/refs/Homo_sapiens_assembly19.fasta to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/Homo_sapiens_assembly19.fasta
20/04/27 21:01:18 INFO SparkContext: Added file file:///gatk/mydata/refs/Homo_sapiens_assembly19.fasta.fai at file:///gatk/mydata/refs/Homo_sapiens_assembly19.fasta.fai with timestamp 1588021278092
20/04/27 21:01:18 INFO Utils: Copying /gatk/mydata/refs/Homo_sapiens_assembly19.fasta.fai to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/Homo_sapiens_assembly19.fasta.fai
20/04/27 21:01:18 INFO SparkContext: Added file file:///gatk/mydata/refs/Homo_sapiens_assembly19.dict at file:///gatk/mydata/refs/Homo_sapiens_assembly19.dict with timestamp 1588021278106
20/04/27 21:01:18 INFO Utils: Copying /gatk/mydata/refs/Homo_sapiens_assembly19.dict to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/Homo_sapiens_assembly19.dict
20/04/27 21:01:18 INFO SparkContext: Added file mydata/refs/00-common_all.vcf at file:/gatk/mydata/refs/00-common_all.vcf with timestamp 1588021278113
20/04/27 21:01:18 INFO Utils: Copying /gatk/mydata/refs/00-common_all.vcf to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/00-common_all.vcf
20/04/27 21:01:28 INFO SparkContext: Added file mydata/refs/00-common_all.vcf.idx at file:/gatk/mydata/refs/00-common_all.vcf.idx with timestamp 1588021288160
20/04/27 21:01:28 INFO Utils: Copying /gatk/mydata/refs/00-common_all.vcf.idx to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/00-common_all.vcf.idx
20/04/27 21:01:28 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 272.8 KB, free 10.3 GB)
20/04/27 21:01:28 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 254.6 KB, free 10.3 GB)
20/04/27 21:01:28 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 9839209cb7e2:36233 (size: 254.6 KB, free: 10.3 GB)
20/04/27 21:01:28 INFO SparkContext: Created broadcast 2 from broadcast at BamSource.java:104
20/04/27 21:01:28 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 239.4 KB, free 10.3 GB)
20/04/27 21:01:28 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 25.4 KB, free 10.3 GB)
20/04/27 21:01:28 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 9839209cb7e2:36233 (size: 25.4 KB, free: 10.3 GB)
20/04/27 21:01:28 INFO SparkContext: Created broadcast 3 from newAPIHadoopFile at PathSplitSource.java:96
21:01:28.351 INFO FileInputFormat - Total input files to process : 1
20/04/27 21:01:28 INFO SparkContext: Starting job: treeAggregate at BaseRecalibratorSparkFn.java:38
20/04/27 21:01:28 INFO DAGScheduler: Registering RDD 15 (treeAggregate at BaseRecalibratorSparkFn.java:38)
20/04/27 21:01:28 INFO DAGScheduler: Registering RDD 18 (treeAggregate at BaseRecalibratorSparkFn.java:38)
20/04/27 21:01:28 INFO DAGScheduler: Registering RDD 21 (treeAggregate at BaseRecalibratorSparkFn.java:38)
20/04/27 21:01:28 INFO DAGScheduler: Registering RDD 24 (treeAggregate at BaseRecalibratorSparkFn.java:38)
20/04/27 21:01:28 INFO DAGScheduler: Got job 0 (treeAggregate at BaseRecalibratorSparkFn.java:38) with 2 output partitions
20/04/27 21:01:28 INFO DAGScheduler: Final stage: ResultStage 4 (treeAggregate at BaseRecalibratorSparkFn.java:38)
20/04/27 21:01:28 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
20/04/27 21:01:28 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 3)
20/04/27 21:01:28 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[15] at treeAggregate at BaseRecalibratorSparkFn.java:38), which has no missing parents
20/04/27 21:01:28 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 99.1 KB, free 10.3 GB)
20/04/27 21:01:28 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 42.2 KB, free 10.3 GB)
20/04/27 21:01:28 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 9839209cb7e2:36233 (size: 42.2 KB, free: 10.3 GB)
20/04/27 21:01:28 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1161
20/04/27 21:01:28 INFO DAGScheduler: Submitting 237 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[15] at treeAggregate at BaseRecalibratorSparkFn.java:38) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
20/04/27 21:01:28 INFO TaskSchedulerImpl: Adding task set 0.0 with 237 tasks
20/04/27 21:01:28 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7943 bytes)
20/04/27 21:01:28 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7943 bytes)
20/04/27 21:01:28 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/04/27 21:01:28 INFO Executor: Fetching file:/gatk/mydata/refs/00-common_all.vcf.idx with timestamp 1588021288160
20/04/27 21:01:28 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
20/04/27 21:01:28 INFO Utils: /gatk/mydata/refs/00-common_all.vcf.idx has been previously copied to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/00-common_all.vcf.idx
20/04/27 21:01:28 INFO Executor: Fetching file:///gatk/mydata/refs/Homo_sapiens_assembly19.fasta with timestamp 1588021275120
20/04/27 21:01:32 INFO Utils: /gatk/mydata/refs/Homo_sapiens_assembly19.fasta has been previously copied to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/Homo_sapiens_assembly19.fasta
20/04/27 21:01:32 INFO Executor: Fetching file:/gatk/mydata/refs/00-common_all.vcf with timestamp 1588021278113
20/04/27 21:01:41 INFO Utils: /gatk/mydata/refs/00-common_all.vcf has been previously copied to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/00-common_all.vcf
20/04/27 21:01:41 INFO Executor: Fetching file:///gatk/mydata/refs/Homo_sapiens_assembly19.fasta.fai with timestamp 1588021278092
20/04/27 21:01:41 INFO Utils: /gatk/mydata/refs/Homo_sapiens_assembly19.fasta.fai has been previously copied to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/Homo_sapiens_assembly19.fasta.fai
20/04/27 21:01:41 INFO Executor: Fetching file:///gatk/mydata/refs/Homo_sapiens_assembly19.dict with timestamp 1588021278106
20/04/27 21:01:41 INFO Utils: /gatk/mydata/refs/Homo_sapiens_assembly19.dict has been previously copied to /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/Homo_sapiens_assembly19.dict
20/04/27 21:01:41 INFO NewHadoopRDD: Input split: file:/gatk/mydata/P50513/P50513_1N_bwa_bam_dedup.bam:33554432+33554432
20/04/27 21:01:41 INFO NewHadoopRDD: Input split: file:/gatk/mydata/P50513/P50513_1N_bwa_bam_dedup.bam:0+33554432
21:01:42.008 INFO FeatureManager - Using codec VCFCodec to read file file:///tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/00-common_all.vcf
21:01:42.008 INFO FeatureManager - Using codec VCFCodec to read file file:///tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/00-common_all.vcf
21:01:42.165 INFO BaseRecalibrationEngine - The covariates being used here:
21:01:42.165 INFO BaseRecalibrationEngine - ReadGroupCovariate
21:01:42.166 INFO BaseRecalibrationEngine - QualityScoreCovariate
21:01:42.166 INFO BaseRecalibrationEngine - ContextCovariate
21:01:42.166 INFO BaseRecalibrationEngine - CycleCovariate
21:01:42.186 INFO BaseRecalibrationEngine - The covariates being used here:
21:01:42.186 INFO BaseRecalibrationEngine - ReadGroupCovariate
21:01:42.186 INFO BaseRecalibrationEngine - QualityScoreCovariate
21:01:42.188 INFO BaseRecalibrationEngine - ContextCovariate
21:01:42.195 INFO BaseRecalibrationEngine - CycleCovariate
20/04/27 21:01:42 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.broadinstitute.hellbender.exceptions.UserException$MalformedRead: Read A00275:174:HGFL7DMXX:1:1461:19117:11334 1:10024-10102 is malformed: The input .bam file contains reads with no platform information. First observed at read with name = A00275:174:HGFL7DMXX:1:1461:19117:11334
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.parsePlatformForRead(RecalUtils.java:506)
at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:124)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$null$0(BaseRecalibratorSparkFn.java:33)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.utils.iterators.CloseAtEndIterator.forEachRemaining(CloseAtEndIterator.java:47)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$apply$6ed74b3e$1(BaseRecalibratorSparkFn.java:33)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/04/27 21:01:42 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.broadinstitute.hellbender.exceptions.UserException$MalformedRead: Read A00275:174:HGFL7DMXX:2:1219:22200:3176 1:6585887-6585987 is malformed: The input .bam file contains reads with no platform information. First observed at read with name = A00275:174:HGFL7DMXX:2:1219:22200:3176
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.parsePlatformForRead(RecalUtils.java:506)
at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:124)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$null$0(BaseRecalibratorSparkFn.java:33)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.utils.iterators.CloseAtEndIterator.forEachRemaining(CloseAtEndIterator.java:47)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$apply$6ed74b3e$1(BaseRecalibratorSparkFn.java:33)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/04/27 21:01:42 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7943 bytes)
20/04/27 21:01:42 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
20/04/27 21:01:42 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7943 bytes)
20/04/27 21:01:42 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
20/04/27 21:01:42 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.broadinstitute.hellbender.exceptions.UserException$MalformedRead: Read A00275:174:HGFL7DMXX:1:1461:19117:11334 1:10024-10102 is malformed: The input .bam file contains reads with no platform information. First observed at read with name = A00275:174:HGFL7DMXX:1:1461:19117:11334
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.parsePlatformForRead(RecalUtils.java:506)
at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:124)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$null$0(BaseRecalibratorSparkFn.java:33)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.utils.iterators.CloseAtEndIterator.forEachRemaining(CloseAtEndIterator.java:47)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$apply$6ed74b3e$1(BaseRecalibratorSparkFn.java:33)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/04/27 21:01:42 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
20/04/27 21:01:42 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): org.broadinstitute.hellbender.exceptions.UserException$MalformedRead: Read A00275:174:HGFL7DMXX:2:1219:22200:3176 1:6585887-6585987 is malformed: The input .bam file contains reads with no platform information. First observed at read with name = A00275:174:HGFL7DMXX:2:1219:22200:3176
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.parsePlatformForRead(RecalUtils.java:506)
at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:124)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$null$0(BaseRecalibratorSparkFn.java:33)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.utils.iterators.CloseAtEndIterator.forEachRemaining(CloseAtEndIterator.java:47)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$apply$6ed74b3e$1(BaseRecalibratorSparkFn.java:33)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/04/27 21:01:42 INFO TaskSchedulerImpl: Cancelling stage 0
20/04/27 21:01:42 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
20/04/27 21:01:42 INFO Executor: Executor is trying to kill task 2.0 in stage 0.0 (TID 2), reason: Stage cancelled
20/04/27 21:01:42 INFO Executor: Executor is trying to kill task 3.0 in stage 0.0 (TID 3), reason: Stage cancelled
20/04/27 21:01:42 INFO TaskSchedulerImpl: Stage 0 was cancelled
20/04/27 21:01:42 INFO DAGScheduler: ShuffleMapStage 0 (treeAggregate at BaseRecalibratorSparkFn.java:38) failed in 13.545 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.broadinstitute.hellbender.exceptions.UserException$MalformedRead: Read A00275:174:HGFL7DMXX:1:1461:19117:11334 1:10024-10102 is malformed: The input .bam file contains reads with no platform information. First observed at read with name = A00275:174:HGFL7DMXX:1:1461:19117:11334
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.parsePlatformForRead(RecalUtils.java:506)
at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:124)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$null$0(BaseRecalibratorSparkFn.java:33)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.utils.iterators.CloseAtEndIterator.forEachRemaining(CloseAtEndIterator.java:47)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$apply$6ed74b3e$1(BaseRecalibratorSparkFn.java:33)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
20/04/27 21:01:42 INFO DAGScheduler: Job 0 failed: treeAggregate at BaseRecalibratorSparkFn.java:38, took 13.687123 s
20/04/27 21:01:42 INFO NewHadoopRDD: Input split: file:/gatk/mydata/P50513/P50513_1N_bwa_bam_dedup.bam:100663296+33554432
20/04/27 21:01:42 INFO SparkUI: Stopped Spark web UI at http://9839209cb7e2:4040
20/04/27 21:01:42 INFO NewHadoopRDD: Input split: file:/gatk/mydata/P50513/P50513_1N_bwa_bam_dedup.bam:67108864+33554432
21:01:42.393 INFO FeatureManager - Using codec VCFCodec to read file file:///tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/00-common_all.vcf
21:01:42.396 INFO FeatureManager - Using codec VCFCodec to read file file:///tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1/userFiles-9c98ae67-732b-4176-9b50-5cde0a35ba0f/00-common_all.vcf
20/04/27 21:01:42 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/04/27 21:01:42 INFO MemoryStore: MemoryStore cleared
20/04/27 21:01:42 INFO BlockManager: BlockManager stopped
20/04/27 21:01:42 INFO BlockManagerMaster: BlockManagerMaster stopped
20/04/27 21:01:42 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/04/27 21:01:42 INFO SparkContext: Successfully stopped SparkContext
21:01:42.512 INFO BQSRPipelineSpark - Shutting down engine
[April 27, 2020 9:01:42 PM UTC] org.broadinstitute.hellbender.tools.spark.pipelines.BQSRPipelineSpark done. Elapsed time: 0.50 minutes.
Runtime.totalMemory()=1968177152
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.broadinstitute.hellbender.exceptions.UserException$MalformedRead: Read A00275:174:HGFL7DMXX:1:1461:19117:11334 1:10024-10102 is malformed: The input .bam file contains reads with no platform information. First observed at read with name = A00275:174:HGFL7DMXX:1:1461:19117:11334
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.parsePlatformForRead(RecalUtils.java:506)
at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:124)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$null$0(BaseRecalibratorSparkFn.java:33)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.utils.iterators.CloseAtEndIterator.forEachRemaining(CloseAtEndIterator.java:47)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$apply$6ed74b3e$1(BaseRecalibratorSparkFn.java:33)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
at org.apache.spark.rdd.RDD$$anonfun$fold$1.apply(RDD.scala:1098)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.fold(RDD.scala:1092)
at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1161)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1137)
at org.apache.spark.api.java.JavaRDDLike$class.treeAggregate(JavaRDDLike.scala:439)
at org.apache.spark.api.java.AbstractJavaRDDLike.treeAggregate(JavaRDDLike.scala:45)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.apply(BaseRecalibratorSparkFn.java:38)
at org.broadinstitute.hellbender.tools.spark.pipelines.BQSRPipelineSpark.runTool(BQSRPipelineSpark.java:116)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:541)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
Caused by: org.broadinstitute.hellbender.exceptions.UserException$MalformedRead: Read A00275:174:HGFL7DMXX:1:1461:19117:11334 1:10024-10102 is malformed: The input .bam file contains reads with no platform information. First observed at read with name = A00275:174:HGFL7DMXX:1:1461:19117:11334
at org.broadinstitute.hellbender.utils.recalibration.RecalUtils.parsePlatformForRead(RecalUtils.java:506)
at org.broadinstitute.hellbender.utils.recalibration.BaseRecalibrationEngine.processRead(BaseRecalibrationEngine.java:124)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$null$0(BaseRecalibratorSparkFn.java:33)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.broadinstitute.hellbender.utils.iterators.CloseAtEndIterator.forEachRemaining(CloseAtEndIterator.java:47)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.broadinstitute.hellbender.tools.spark.transforms.BaseRecalibratorSparkFn.lambda$apply$6ed74b3e$1(BaseRecalibratorSparkFn.java:33)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:153)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/04/27 21:01:42 INFO ShutdownHookManager: Shutdown hook called
20/04/27 21:01:42 INFO ShutdownHookManager: Deleting directory /tmp/spark-3a32b8a7-9ed1-44ed-b9a4-c9cd128407d1