No Pileup Tables
Hi,
I've been having trouble running the Mutect2-GATK pipeline on Terra. The run fails with an error saying that the tumor and normal pileup tables aren't being created. Originally, I supplied only the required tumor files, the reference files, and the normal BAM files. After receiving this error, I guessed that it was linked to the contamination step. So, based on the example inputs for the Mutect2-GATK pipeline on Terra, I supplied this VCF file as the variants-for-contamination input (gs://gatk-best-practices/somatic-hg38/small_exac_common_3.hg38.vcf.gz) and still got the same error. I also tried unzipping the file, but to no avail.
REQUIRED for all errors and issues:
a) GATK version used: Current Terra version
b) Exact command used: 2-Mutect2-GATK4 (4.1.8.1)
c) Entire program log:
2022/05/24 23:29:01 Starting container setup. 2022/05/24 23:29:03 Done container setup. 2022/05/24 23:29:07 Starting localization. 2022/05/24 23:29:28 Localization script execution started... 2022/05/24 23:29:28 Localizing input gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/ea86c7f1-b0d2-4a9a-b8a1-ff1dcc114cdc/Mutect2/d545c369-6733-421f-9a97-c56e2945d5c6/call-M2/shard-49/attempt-6/script -> /cromwell_root/script 2022/05/24 23:29:33 Localization script execution complete. 2022/05/24 23:29:41 Done localization. 2022/05/24 23:29:42 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash broadinstitute/gatk@sha256:21c3cb43b7d11891ed4b63cc7274f20187f00387cfaa0433b3e7991b5be34dbe /cromwell_root/script Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.fb569f50 23:29:52.572 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so 23:29:53.506 INFO GetSampleName - ------------------------------------------------------------ 23:29:53.509 INFO GetSampleName - The Genome Analysis Toolkit (GATK) v4.2.6.1 23:29:53.509 INFO GetSampleName - For support and documentation go to https://software.broadinstitute.org/gatk/ 23:29:53.510 INFO GetSampleName - Executing as root@46c00700357d on Linux v5.10.107+ amd64 23:29:53.510 INFO GetSampleName - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 23:29:53.511 INFO GetSampleName - Start Date/Time: May 24, 2022 11:29:52 PM GMT 23:29:53.512 INFO GetSampleName - ------------------------------------------------------------ 23:29:53.512 INFO GetSampleName - ------------------------------------------------------------ 23:29:53.513 INFO GetSampleName - HTSJDK Version: 2.24.1 23:29:53.513 INFO GetSampleName - Picard Version: 2.27.1 23:29:53.514 INFO GetSampleName - Built for Spark Version: 2.4.5 23:29:53.514 INFO GetSampleName - HTSJDK Defaults.COMPRESSION_LEVEL : 2 23:29:53.514 INFO 
GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 23:29:53.515 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true 23:29:53.515 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 23:29:53.515 INFO GetSampleName - Deflater: IntelDeflater 23:29:53.516 INFO GetSampleName - Inflater: IntelInflater 23:29:53.516 INFO GetSampleName - GCS max retries/reopens: 20 23:29:53.516 INFO GetSampleName - Requester pays: disabled 23:29:53.517 INFO GetSampleName - Initializing engine 23:29:59.659 INFO GetSampleName - Done initializing engine 23:29:59.670 INFO ProgressMeter - Starting traversal 23:29:59.672 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute 23:29:59.676 INFO ProgressMeter - unmapped 0.0 0 NaN 23:29:59.679 INFO ProgressMeter - Traversal complete. Processed 0 total records in 0.0 minutes. 23:29:59.680 INFO GetSampleName - Shutting down engine [May 24, 2022 11:29:59 PM GMT] org.broadinstitute.hellbender.tools.GetSampleName done. Elapsed time: 0.12 minutes. 
Runtime.totalMemory()=248205312 Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar GetSampleName -R gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/cfce2061-efd6-449e-bdc9-a7ff2b633644/PreProcessingForVariantDiscovery_GATK4/b4adf777-4f97-425c-b3e2-b37c9d927667/call-GatherBamFiles/SRR7588418.hg38.bam -O tumor_name.txt -encode Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.fb569f50 23:30:04.462 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so 23:30:04.802 INFO GetSampleName - ------------------------------------------------------------ 23:30:04.803 INFO GetSampleName - The Genome Analysis Toolkit (GATK) v4.2.6.1 23:30:04.803 INFO GetSampleName - For support and documentation go to https://software.broadinstitute.org/gatk/ 23:30:04.803 INFO GetSampleName - Executing as root@46c00700357d on Linux v5.10.107+ amd64 23:30:04.803 INFO GetSampleName - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 23:30:04.803 INFO GetSampleName - Start Date/Time: May 24, 2022 11:30:04 PM GMT 23:30:04.803 INFO GetSampleName - ------------------------------------------------------------ 23:30:04.803 INFO GetSampleName - ------------------------------------------------------------ 23:30:04.806 INFO GetSampleName - HTSJDK Version: 2.24.1 23:30:04.806 INFO GetSampleName - Picard Version: 2.27.1 23:30:04.807 INFO GetSampleName - Built for Spark Version: 2.4.5 23:30:04.807 INFO GetSampleName - HTSJDK Defaults.COMPRESSION_LEVEL : 2 23:30:04.810 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 23:30:04.811 INFO GetSampleName - 
HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true 23:30:04.811 INFO GetSampleName - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 23:30:04.811 INFO GetSampleName - Deflater: IntelDeflater 23:30:04.811 INFO GetSampleName - Inflater: IntelInflater 23:30:04.812 INFO GetSampleName - GCS max retries/reopens: 20 23:30:04.812 INFO GetSampleName - Requester pays: disabled 23:30:04.812 INFO GetSampleName - Initializing engine 23:30:10.182 INFO GetSampleName - Done initializing engine 23:30:10.194 INFO ProgressMeter - Starting traversal 23:30:10.203 INFO ProgressMeter - Current Locus Elapsed Minutes Records Processed Records/Minute 23:30:10.205 INFO ProgressMeter - unmapped 0.0 0 NaN 23:30:10.208 INFO ProgressMeter - Traversal complete. Processed 0 total records in 0.0 minutes. 23:30:10.209 INFO GetSampleName - Shutting down engine [May 24, 2022 11:30:10 PM GMT] org.broadinstitute.hellbender.tools.GetSampleName done. Elapsed time: 0.10 minutes. Runtime.totalMemory()=248201216 Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar GetSampleName -R gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/cfce2061-efd6-449e-bdc9-a7ff2b633644/PreProcessingForVariantDiscovery_GATK4/380dbed8-90e7-42a3-9fb8-10607c1ac950/call-GatherBamFiles/SRR7588413.hg38.bam -O normal_name.txt -encode Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.fb569f50 23:30:15.234 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so 23:30:15.527 INFO Mutect2 - ------------------------------------------------------------ 23:30:15.527 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.2.6.1 
23:30:15.527 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/ 23:30:15.528 INFO Mutect2 - Executing as root@46c00700357d on Linux v5.10.107+ amd64 23:30:15.528 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 23:30:15.528 INFO Mutect2 - Start Date/Time: May 24, 2022 11:30:15 PM GMT 23:30:15.528 INFO Mutect2 - ------------------------------------------------------------ 23:30:15.528 INFO Mutect2 - ------------------------------------------------------------ 23:30:15.529 INFO Mutect2 - HTSJDK Version: 2.24.1 23:30:15.529 INFO Mutect2 - Picard Version: 2.27.1 23:30:15.530 INFO Mutect2 - Built for Spark Version: 2.4.5 23:30:15.530 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2 23:30:15.530 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 23:30:15.531 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true 23:30:15.534 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 23:30:15.535 INFO Mutect2 - Deflater: IntelDeflater 23:30:15.535 INFO Mutect2 - Inflater: IntelInflater 23:30:15.535 INFO Mutect2 - GCS max retries/reopens: 20 23:30:15.536 INFO Mutect2 - Requester pays: disabled 23:30:15.536 INFO Mutect2 - Initializing engine 23:30:23.107 INFO FeatureManager - Using codec IntervalListCodec to read file gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/ea86c7f1-b0d2-4a9a-b8a1-ff1dcc114cdc/Mutect2/d545c369-6733-421f-9a97-c56e2945d5c6/call-SplitIntervals/cacheCopy/glob-0fc990c5ca95eebc97c4c204e3e303e1/0049-scattered.interval_list 23:30:23.807 INFO IntervalArgumentCollection - Processing 64346955 bp from intervals 23:30:23.952 INFO Mutect2 - Done initializing engine 23:30:25.011 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_utils.so 23:30:25.022 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from 
jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so 23:30:25.144 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM 23:30:25.145 INFO IntelPairHmm - Available threads: 1 23:30:25.146 INFO IntelPairHmm - Requested threads: 4 23:30:25.147 WARN IntelPairHmm - Using 1 available threads, but 4 were requested 23:30:25.147 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation 23:30:25.328 INFO ProgressMeter - Starting traversal 23:30:25.328 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute 23:30:35.830 INFO ProgressMeter - chr22_GL383582v2_alt:601 0.2 1370 7827.8 23:30:46.615 INFO ProgressMeter - chr19_KI270887v1_alt:601 0.4 6540 18434.7 23:30:57.601 INFO ProgressMeter - chr3_KI270895v1_alt:2101 0.5 11380 21157.7 23:31:08.547 INFO ProgressMeter - chr6_GL000251v2_alt:1 0.7 17490 24281.5 23:31:19.333 INFO ProgressMeter - chr6_GL000251v2_alt:3002351 0.9 27560 30620.0 23:31:30.010 INFO ProgressMeter - chr11_KI270902v1_alt:2101 1.1 35820 33227.7 23:31:40.353 INFO ProgressMeter - chr11_KI270902v1_alt:86903 1.3 36120 28886.8 23:31:50.353 INFO ProgressMeter - chr15_KI270905v1_alt:247591 1.4 40370 28488.4 23:32:01.084 INFO ProgressMeter - chr15_KI270905v1_alt:2426941 1.6 47800 29951.8 23:32:11.623 INFO ProgressMeter - chr15_KI270905v1_alt:4708372 1.8 55640 31407.2 23:32:21.899 INFO ProgressMeter - chr17_KI270908v1_alt:549309 1.9 61460 31634.2 23:32:32.285 INFO ProgressMeter - chr19_KI270915v1_alt:2401 2.1 69880 33025.9 23:32:42.579 INFO ProgressMeter - chr4_KI270925v1_alt:901 2.3 76120 33277.5 23:32:52.581 INFO ProgressMeter - chr6_GL000252v2_alt:2835312 2.5 87490 35649.1 23:33:02.886 INFO ProgressMeter - chr11_KI270927v1_alt:168024 2.6 94770 36089.8 23:33:14.478 INFO ProgressMeter - chr6_GL000253v2_alt:1 2.8 103380 36670.6 23:33:24.774 INFO ProgressMeter - chr6_GL000253v2_alt:4006085 3.0 116760 39040.4 23:33:34.775 INFO ProgressMeter - 
chr6_GL000254v2_alt:1744395 3.2 129190 40916.1 23:33:46.074 INFO ProgressMeter - chr6_GL000255v2_alt:2101 3.3 143610 42923.1 23:33:56.075 INFO ProgressMeter - chr6_GL000255v2_alt:4003370 3.5 157050 44712.8 23:34:06.732 INFO ProgressMeter - chr6_GL000256v2_alt:1766943 3.7 168880 45766.9 23:34:17.141 INFO ProgressMeter - chr19_GL949752v1_alt:180123 3.9 180100 46615.4 23:34:27.576 INFO ProgressMeter - chrUn_KN707606v1_decoy:1151 4.0 189880 47029.7 23:34:38.576 INFO ProgressMeter - chrUn_KN707614v1_decoy:1 4.2 189990 45013.0 23:34:52.190 INFO ProgressMeter - chrUn_KN707626v1_decoy:757 4.4 190110 42743.6 23:35:02.531 INFO ProgressMeter - chrUn_KN707635v1_decoy:923 4.6 190230 41175.2 23:35:13.418 INFO ProgressMeter - chrUn_KN707642v1_decoy:301 4.8 190320 39637.8 23:35:23.895 INFO ProgressMeter - chrUn_KN707649v1_decoy:597 5.0 190420 38266.9 23:35:35.507 INFO ProgressMeter - chrUn_KN707660v1_decoy:1 5.2 190530 36855.6 23:35:46.033 INFO ProgressMeter - chrUn_KN707667v1_decoy:1376 5.3 190680 35674.0 23:35:56.079 INFO ProgressMeter - chrUn_KN707679v1_decoy:1004 5.5 190980 34644.9 23:36:06.957 INFO ProgressMeter - chrUn_KN707692v1_decoy:1 5.7 191160 33575.1 23:36:17.146 INFO ProgressMeter - chrUn_KN707705v1_decoy:601 5.9 191350 32633.4 23:36:27.623 INFO ProgressMeter - chrUn_KN707719v1_decoy:1201 6.0 191540 31721.2 23:36:38.162 INFO ProgressMeter - chrUn_KN707730v1_decoy:901 6.2 191720 30853.5 23:36:49.384 INFO ProgressMeter - chrUn_KN707744v1_decoy:601 6.4 191990 29994.2 23:36:59.891 INFO ProgressMeter - chrUn_KN707757v1_decoy:563 6.6 192210 29228.9 23:37:09.896 INFO ProgressMeter - chrUn_KN707771v1_decoy:2816 6.7 192380 28531.2 23:37:20.090 INFO ProgressMeter - chrUn_KN707783v1_decoy:1738 6.9 192640 27867.7 23:37:30.626 INFO ProgressMeter - chrUn_KN707798v1_decoy:1 7.1 192840 27205.5 23:37:40.988 INFO ProgressMeter - chrUn_KN707813v1_decoy:2101 7.3 193120 26596.9 23:37:54.894 INFO ProgressMeter - chrUn_KN707828v1_decoy:1117 7.5 193320 25801.0 23:38:05.015 INFO ProgressMeter 
- chrUn_KN707832v1_decoy:192 7.7 193370 25239.4 23:38:16.138 INFO ProgressMeter - chrUn_KN707846v1_decoy:300 7.8 193550 24666.1 23:38:26.721 INFO ProgressMeter - chrUn_KN707860v1_decoy:1030 8.0 193770 24151.2 23:38:38.266 INFO ProgressMeter - chrUn_KN707868v1_decoy:888 8.2 193960 23608.7 23:38:48.549 INFO ProgressMeter - chrUn_KN707879v1_decoy:468 8.4 194140 23147.7 23:38:58.674 INFO ProgressMeter - chrUn_KN707886v1_decoy:721 8.6 194240 22702.9 23:39:09.294 INFO ProgressMeter - chrUn_KN707896v1_decoy:2605 8.7 194390 22259.9 23:39:27.749 INFO ProgressMeter - chrUn_KN707896v1_decoy:8133 9.0 194420 21505.8 23:39:41.680 INFO ProgressMeter - chrUn_KN707896v1_decoy:18785 9.3 194480 20973.8 23:39:56.081 INFO ProgressMeter - chrUn_KN707896v1_decoy:22015 9.5 194500 20446.7 23:40:06.907 INFO ProgressMeter - chrUn_KN707896v1_decoy:30692 9.7 194540 20070.3 23:40:17.266 INFO ProgressMeter - chrUn_KN707904v1_decoy:2734 9.9 194670 19732.2 23:40:27.414 INFO ProgressMeter - chrUn_KN707911v1_decoy:1501 10.0 194810 19413.5 23:40:37.958 INFO ProgressMeter - chrUn_KN707924v1_decoy:1501 10.2 194950 19093.1 23:40:48.247 INFO ProgressMeter - chrUn_KN707937v1_decoy:1201 10.4 195090 18791.2 23:41:00.901 INFO ProgressMeter - chrUn_KN707950v1_decoy:601 10.6 195220 18429.4 Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3000m -jar /root/gatk.jar Mutect2 -R gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/cfce2061-efd6-449e-bdc9-a7ff2b633644/PreProcessingForVariantDiscovery_GATK4/b4adf777-4f97-425c-b3e2-b37c9d927667/call-GatherBamFiles/SRR7588418.hg38.bam -tumor SRR7588418 -I 
gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/cfce2061-efd6-449e-bdc9-a7ff2b633644/PreProcessingForVariantDiscovery_GATK4/380dbed8-90e7-42a3-9fb8-10607c1ac950/call-GatherBamFiles/SRR7588413.hg38.bam -normal SRR7588413 -L gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/ea86c7f1-b0d2-4a9a-b8a1-ff1dcc114cdc/Mutect2/d545c369-6733-421f-9a97-c56e2945d5c6/call-SplitIntervals/cacheCopy/glob-0fc990c5ca95eebc97c4c204e3e303e1/0049-scattered.interval_list -O output.vcf ln: failed to access '/cromwell_root/*normal-pileups.table': No such file or directory ln: failed to access '/cromwell_root/*tumor-pileups.table': No such file or directory 2022/05/24 23:42:47 Starting delocalization. 2022/05/24 23:42:50 Delocalization script execution started... 2022/05/24 23:42:50 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/ea86c7f1-b0d2-4a9a-b8a1-ff1dcc114cdc/Mutect2/d545c369-6733-421f-9a97-c56e2945d5c6/call-M2/shard-49/attempt-6/memory_retry_rc 2022/05/24 23:42:54 Delocalizing output /cromwell_root/rc -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/ea86c7f1-b0d2-4a9a-b8a1-ff1dcc114cdc/Mutect2/d545c369-6733-421f-9a97-c56e2945d5c6/call-M2/shard-49/attempt-6/rc 2022/05/24 23:42:55 Delocalizing output /cromwell_root/stdout -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/ea86c7f1-b0d2-4a9a-b8a1-ff1dcc114cdc/Mutect2/d545c369-6733-421f-9a97-c56e2945d5c6/call-M2/shard-49/attempt-6/stdout 2022/05/24 23:42:57 Delocalizing output /cromwell_root/stderr -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/ea86c7f1-b0d2-4a9a-b8a1-ff1dcc114cdc/Mutect2/d545c369-6733-421f-9a97-c56e2945d5c6/call-M2/shard-49/attempt-6/stderr 2022/05/24 23:42:59 Delocalizing output /cromwell_root/output.vcf.idx -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/ea86c7f1-b0d2-4a9a-b8a1-ff1dcc114cdc/Mutect2/d545c369-6733-421f-9a97-c56e2945d5c6/call-M2/shard-49/attempt-6/output.vcf.idx Required file output '/cromwell_root/output.vcf.idx' does not exist.
Terra submission ID: ea86c7f1-b0d2-4a9a-b8a1-ff1dcc114cdc
-
Hi Wesley,
Thanks for writing in! Since this sounds like an issue on Terra, can you share the workspace where you are seeing it with Terra Support by clicking the Share button in your workspace? The Share option is in the three-dots menu at the top-right.
- Toggle the "Share with support" button to "Yes"
- Click Save
Please provide us with a link to your workspace. We’ll be happy to take a closer look as soon as we can!
Please let me know if you have any questions.
Best,
Josh
-
Hi Josh,
I have toggled the support button.
The workspace link is here.
Thank you!
-
Hi Wesley,
Thanks for getting back to me! I'm going to investigate this for you and I'll let you know once I have any updates.
Best,
Josh
-
Hi Wesley,
I've been looking at the workflow and have two suggestions for likely causes of these errors:
- The workflow may need more memory to create the tables, so I'd suggest running it with more memory.
- I noticed that the variants_for_contamination variable references a bucket outside of this workspace. Could you please confirm that you have access to data in that location? A lack of access could also cause this error.
Please let me know if those two suggestions are helpful or if you have any questions.
Best,
Josh
-
Hi Josh,
I reran the workflow using the rerun-with-more-memory option with a memory retry factor of 1.5, and I am still getting the same error.
I can confirm that Terra has access to this data.
-
Hi Wesley,
Thanks for getting back to me! This workflow might require more than double the base memory to process this data, so I would suggest modifying the mem variable for the M2 task and increasing the memory volume from there.
That should give more memory directly to the task where we are seeing our errors.
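As a rough illustration of why a retry factor of 1.5 may fall short of a directly raised mem value, the sketch below compares the two. It assumes that each retry multiplies the previous memory request by the factor, and that the Java heap is the machine memory minus a 500 MB overhead. The second assumption is consistent with the -Xmx3000m and -Xmx149500m values in the logs in this thread, but it has not been checked against the WDL source. The function names are hypothetical.

```python
def memory_after_retries(base_mb: int, factor: float, retries: int) -> int:
    """Memory (MB) requested after `retries` retries, assuming each retry
    multiplies the previous request by `factor` (an assumption)."""
    return int(base_mb * factor ** retries)


def java_heap_mb(machine_mb: int, overhead_mb: int = 500) -> int:
    """Heap passed to -Xmx, assuming a fixed overhead is subtracted from
    the machine memory (assumed; inferred from the logs in this thread)."""
    return machine_mb - overhead_mb


if __name__ == "__main__":
    base = 3500  # MB; assumed default, consistent with -Xmx3000m in the first log
    for n in range(3):
        machine = memory_after_retries(base, 1.5, n)
        print(f"retry {n}: machine={machine} MB, heap={java_heap_mb(machine)} MB")
    # Setting mem directly (here 150 GB) is consistent with -Xmx149500m in a later log
    print(f"mem=150 GB: heap={java_heap_mb(150 * 1000)} MB")
```

Under these assumptions, two retries at 1.5 only reach about 7.9 GB, so setting the task's mem input directly is the faster way to test whether memory is the limiting factor.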
Please let me know how that goes or if you have any questions.
Best,
Josh
-
Hi Josh,
Modifying the memory setting seems to have solved one problem. Now I'm getting this error for both the normal and tumor pileup tables:
2022/06/06 19:42:33 Starting container setup. 2022/06/06 19:42:35 Done container setup. 2022/06/06 19:42:38 Starting localization. 2022/06/06 19:42:53 Localization script execution started... 2022/06/06 19:42:53 Localizing input gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/script -> /cromwell_root/script 2022/06/06 19:42:57 Localizing input gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict -> /cromwell_root/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict 2022/06/06 19:42:58 Localization script execution complete. 2022/06/06 19:43:05 Done localization. 2022/06/06 19:43:06 Running user action: docker run -v /mnt/local-disk:/cromwell_root -v /mnt/d-1b2e5749db4b0a6439c4895809508e1e:/mnt/af7b5955462dc70f18fa6a82eae18e22:ro --entrypoint=/bin/bash broadinstitute/gatk@sha256:21c3cb43b7d11891ed4b63cc7274f20187f00387cfaa0433b3e7991b5be34dbe /cromwell_root/script Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.a0fd8f31 USAGE: GatherPileupSummaries [arguments] Combine output files from GetPileupSummary in the order defined by a sequence dictionary Version:4.2.6.1 Required Arguments: --I <File> an output of PileupSummaryTable This argument must be specified at least once. Required. --O <File> output Required. --sequence-dictionary <File> sequence dictionary file Required. Optional Arguments: --arguments_file <File> read one or more arguments files and add them to the command line This argument may be specified 0 or more times. Default value: null. --gatk-config-file <String> A configuration file to use with the GATK. Default value: null. --gcs-max-retries,-gcs-retries <Integer> If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection Default value: 20. --gcs-project-for-requester-pays <String> Project to bill when accessing "requester pays" buckets. 
If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed. Default value: . --help,-h <Boolean> display the help message Default value: false. Possible values: {true, false} --QUIET <Boolean> Whether to suppress job-summary info on System.err. Default value: false. Possible values: {true, false} --tmp-dir <GATKPath> Temp directory to use. Default value: null. --use-jdk-deflater,-jdk-deflater <Boolean> Whether to use the JdkDeflater (as opposed to IntelDeflater) Default value: false. Possible values: {true, false} --use-jdk-inflater,-jdk-inflater <Boolean> Whether to use the JdkInflater (as opposed to IntelInflater) Default value: false. Possible values: {true, false} --verbosity <LogLevel> Control verbosity of logging. Default value: INFO. Possible values: {ERROR, WARNING, INFO, DEBUG} --version <Boolean> display the version number for this tool Default value: false. Possible values: {true, false} Advanced Arguments: --showHidden <Boolean> display hidden arguments Default value: false. Possible values: {true, false} *********************************************************************** A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ',SRR7588418.hg38.tsv}' but no positional argument is defined for this tool. *********************************************************************** Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace. 
Using GATK jar /root/gatk.jar defined in environment variable GATK_LOCAL_JAR Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx3500m -jar /root/gatk.jar GatherPileupSummaries --sequence-dictionary /cromwell_root/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict -I -O SRR7588418.hg38.tsv 2022/06/06 19:43:15 Starting delocalization. 2022/06/06 19:43:16 Delocalization script execution started... 2022/06/06 19:43:16 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/memory_retry_rc 2022/06/06 19:43:19 Delocalizing output /cromwell_root/rc -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/rc 2022/06/06 19:43:21 Delocalizing output /cromwell_root/stdout -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/stdout 2022/06/06 19:43:22 Delocalizing output /cromwell_root/stderr -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/stderr 2022/06/06 19:43:24 Delocalizing output /cromwell_root/SRR7588418.hg38.tsv -> gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/8caf4739-6b66-49d6-9af8-85f550cd481c/Mutect2/ca84dbbe-be1d-429a-9dc0-13794510f65d/call-MergeTumorPileups/SRR7588418.hg38.tsv Required file output '/cromwell_root/SRR7588418.hg38.tsv' does not exist.
-
Hi Wesley,
I'm glad we were able to help you resolve the first issue! From what I can see, it appears that the .TSV file from gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict is passing into the commands in a way they don't expect. I'm going to do some more research on my end and let you know once I have an update.
Best,
Josh
-
Hi Wesley Kwong,
Thanks for your patience. I took a look at this issue with Josh and found that your script running GatherPileupSummaries is not configured properly. This error message indicates that you did not provide the arguments as required by the tool:
A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ',SRR7588418.hg38.tsv}' but no positional argument is defined for this tool.
To resolve this error, I would recommend taking another look at your command line and making corrections so that each input is configured properly.
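The command logged in the earlier post ends in "-I -O SRR7588418.hg38.tsv", meaning -I was emitted with no value. That suggests the list of per-shard pileup tables was empty when the command was assembled. A minimal sketch of a guard against that case follows. The helper name gather_pileups_args and the file names in the usage example are hypothetical; the flags are the ones listed in the tool's usage output above.

```python
from typing import List


def gather_pileups_args(tables: List[str], seq_dict: str, output: str) -> List[str]:
    """Build GatherPileupSummaries arguments, one -I per input table.

    Raises ValueError on an empty input list instead of emitting a bare
    `-I`, which is what the failing command in the log looks like.
    """
    if not tables:
        raise ValueError("no pileup tables to gather; check the upstream task")
    args = ["gatk", "GatherPileupSummaries", "--sequence-dictionary", seq_dict]
    for table in tables:
        args += ["-I", table]
    args += ["-O", output]
    return args


if __name__ == "__main__":
    # Hypothetical shard outputs, for illustration only
    shards = ["shard0-tumor-pileups.table", "shard1-tumor-pileups.table"]
    print(" ".join(gather_pileups_args(shards, "ref.dict", "tumor.tsv")))
```

Failing early like this points at the upstream step that produced no tables, rather than at the gather step that merely received nothing.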
Let us know if you have any other questions.
Best,
Genevieve
-
Hi Josh and Genevieve,
After some troubleshooting, I was able to run the Mutect2 pipeline successfully when I added both the PoN and gnomAD files without the variants-for-contamination file. But once I add the variants-for-contamination file (including the index) taken from the GCP gatk-best-practices bucket, I get this error:
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx149500m -jar /root/gatk.jar Mutect2 -R gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta -I gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/cfce2061-efd6-449e-bdc9-a7ff2b633644/PreProcessingForVariantDiscovery_GATK4/b4adf777-4f97-425c-b3e2-b37c9d927667/call-GatherBamFiles/SRR7588418.hg38.bam -tumor SRR7588418 -I gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/cfce2061-efd6-449e-bdc9-a7ff2b633644/PreProcessingForVariantDiscovery_GATK4/380dbed8-90e7-42a3-9fb8-10607c1ac950/call-GatherBamFiles/SRR7588413.hg38.bam -normal SRR7588413 --germline-resource gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/af-only-gnomad.hg38.vcf.gz -pon gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/1000g_pon.hg38.vcf.gz -L gs://fc-d31bc4e7-6d10-4dc4-a585-5895ab2346f3/81583498-648e-4e70-8452-80509b626927/Mutect2/dbb6ef96-ea07-4cfe-9e85-3b133c6d89ea/call-SplitIntervals/cacheCopy/glob-0fc990c5ca95eebc97c4c204e3e303e1/0000-scattered.interval_list -O output.vcf Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/cromwell_root/tmp.c880de1b 21:30:55.896 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so 21:30:55.924 INFO GetPileupSummaries - ------------------------------------------------------------ 21:30:55.925 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.2.6.1 21:30:55.925 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/ 21:30:55.925 INFO GetPileupSummaries - Executing as root@42c5b048ff41 on Linux v5.10.107+ amd64 21:30:55.925 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 21:30:55.925 INFO GetPileupSummaries - Start Date/Time: 
June 22, 2022 9:30:55 PM GMT 21:30:55.925 INFO GetPileupSummaries - ------------------------------------------------------------ 21:30:55.925 INFO GetPileupSummaries - ------------------------------------------------------------ 21:30:55.926 INFO GetPileupSummaries - HTSJDK Version: 2.24.1 21:30:55.926 INFO GetPileupSummaries - Picard Version: 2.27.1 21:30:55.926 INFO GetPileupSummaries - Built for Spark Version: 2.4.5 21:30:55.926 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2 21:30:55.926 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 21:30:55.926 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true 21:30:55.926 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 21:30:55.926 INFO GetPileupSummaries - Deflater: IntelDeflater 21:30:55.926 INFO GetPileupSummaries - Inflater: IntelInflater 21:30:55.926 INFO GetPileupSummaries - GCS max retries/reopens: 20 21:30:55.926 INFO GetPileupSummaries - Requester pays: disabled 21:30:55.927 INFO GetPileupSummaries - Initializing engine 21:30:59.931 INFO FeatureManager - Using codec VCFCodec to read file gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz 21:31:00.474 INFO GetPileupSummaries - Shutting down engine [June 22, 2022 9:31:00 PM GMT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 0.08 minutes. Runtime.totalMemory()=2452094976 *********************************************************************** A USER ERROR has occurred: An index is required but was not found for file gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input. ***********************************************************************
When I looked at the WDL script on GitHub, I noticed that the M2 task accepts both the variants_for_contamination and variants_for_contamination_idx files as inputs. However, unlike variants_for_contamination, I do not see variants_for_contamination_idx used anywhere. Could you look into this?
Thank you!
-
Hi Wesley Kwong,
Can you share the Submission ID so we can take a closer look at the issue?
Best,
Samantha
-
Hi Samantha,
The submission id is 81583498-648e-4e70-8452-80509b626927.
Thank you!
-
Hi Wesley Kwong,
It looks like the error message in that submission is the one Genevieve pointed out in her latest message:
A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ',SRR7588418.hg38.tsv}' but no positional argument is defined for this tool.
As she recommended, to solve this error, you should take another look at your command line and make corrections so that each input is configured properly.
I'm still not sure where you are seeing this error:
A USER ERROR has occurred: An index is required but was not found for file gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.
If you are still encountering this error, please let me know the submission ID so I can take a closer look.
Best,
Samantha
-
Hi Samantha,
I recognize that I am still getting the same error message, where it's asking for the SRR7588418.hg38.tsv file. But I believe the input needed to generate the pileup tables comes from the output of a successful M2 run.
If you look under the M2 logs, you will find the following error, which occurs before the pileup error message.
A USER ERROR has occurred: An index is required but was not found for file gs://bruce-processed-data/Prins_Cloughesy_Neoadjuvant/terra_reference_files/small_exac_common_3.hg38.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.
-
I see. Can you temporarily share the gs://bruce-processed-data bucket with me (svelasqu@broadinstitute.org)?
Even though the index file path isn't explicitly being passed to the GetPileupSummaries command, the path should be inferred automatically.
-
I have added you as admin in the bucket.
-
Hi Samantha (she/her), I work with Wesley on this project. I just wanted to check in on this issue, as we are super blocked here. It seems like there could be a few problems, and I elaborated on them by posting an issue to the GATK GitHub.
We are passing a variants-for-contamination index into the workflow, but it is never used. It seems like one of the following applies:
- The source code to mutect2.wdl has to be changed to actually use the variants_for_contamination_idx workflow variable, if GetPileupSummaries ever supports it as an input argument.
- The source code to mutect2.wdl has to be changed so that it runs IndexFeatureFile on the variants_for_contamination file, prior to calling GetPileupSummaries.
- The variants_for_contamination file should be localized before running?
- There is something wrong with sending a compressed .gz file as input. I believe this is not the issue, since Wesley has tried passing in an uncompressed file and it still failed.
We're super blocked but are willing to try different approaches, please reach out if you or your team can think of anything!
-
Hi all,
The GATK tools look for index files in the same location as the feature file. Hence, the index files are effectively hidden arguments on the command line. The index files appear as arguments in the WDL so that they are localized together with the feature file whenever a task localizes it, and the GATK tool can then find them. Most GATK tools now use NIO to stream files, so localization can be optional.
Your point 2 is a good idea: the WDL could be adjusted to make the index files optional and to run IndexFeatureFile if they are not supplied. However, since GATK tools essentially always create an index file alongside their output, I suppose the workflow assumes that the index file exists and can be supplied as a workflow argument.
Regarding point 3: you can try to localize both variants_for_contamination and variants_for_contamination_idx, but GetPileupSummaries uses NIO, so localization shouldn't be necessary.
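Because the index is found by location rather than passed as an argument, a quick pre-flight check is to confirm that the expected index sits next to the feature file. The sketch below derives the expected index name and looks for it in a listing of object paths. It relies on the usual conventions (a .tbi tabix index for block-compressed VCFs, a .idx index for plain-text VCFs, as with the output.vcf.idx in the first log). The function names and bucket paths are hypothetical, and the listing is assumed to come from whatever tool you use to list the bucket.

```python
from typing import Iterable, Optional


def expected_index_path(feature_path: str) -> Optional[str]:
    """Return the index path expected next to a VCF.

    Block-compressed VCFs (.vcf.gz / .vcf.bgz) use a tabix index (.tbi);
    plain-text VCFs use a .idx index. Returns None for anything else.
    """
    if feature_path.endswith((".vcf.gz", ".vcf.bgz")):
        return feature_path + ".tbi"
    if feature_path.endswith(".vcf"):
        return feature_path + ".idx"
    return None


def index_is_present(feature_path: str, listing: Iterable[str]) -> bool:
    """True if the expected index appears in a listing of object paths."""
    index = expected_index_path(feature_path)
    return index is not None and index in set(listing)


if __name__ == "__main__":
    # Hypothetical bucket contents, for illustration only
    listing = ["gs://my-bucket/refs/common.vcf.gz"]
    vcf = "gs://my-bucket/refs/common.vcf.gz"
    print(expected_index_path(vcf), index_is_present(vcf, listing))
```

If the check fails, copying the index into the same directory as the VCF, or regenerating it with IndexFeatureFile as the error message suggests, would be the next thing to try.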
I see that you are working with hg38. In order to create your own resource of variants for contamination, you can take the publicly available gnomad.v3.1.2 chr1 data set and filter and subset it to AF > 0.05 with
gatk SelectVariants -V gs://gcp-public-data--gnomad/release/3.1.2/vcf/genomes/gnomad.genomes.v3.1.2.sites.chr1.vcf.bgz -select 'AF > 0.05' --restrict-alleles-to BIALLELIC --exclude-filtered true -O variants_for_contamination.vcf.gz
This essentially recreates the best practice resource. SelectVariants also creates an index file as output, which you should supply to the variant calling WDL.
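To make the three selection criteria in the command above concrete, here is a small pure-Python approximation that applies them to VCF text lines. It is only a sketch of the logic, not a substitute for SelectVariants: it assumes tab-separated records with a single AF value in the INFO column, and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def select_common_biallelic(lines: Iterable[str], min_af: float = 0.05) -> Iterator[str]:
    """Yield header lines plus records that are biallelic, unfiltered,
    and have INFO AF greater than `min_af`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header lines
            continue
        fields = line.rstrip("\n").split("\t")
        alt, flt, info = fields[4], fields[6], fields[7]
        if alt == "." or "," in alt:  # zero or several ALT alleles: not biallelic
            continue
        if flt not in ("PASS", "."):  # site failed at least one filter
            continue
        af = None
        for entry in info.split(";"):
            if entry.startswith("AF="):
                af = float(entry[3:])
        if af is not None and af > min_af:
            yield line


if __name__ == "__main__":
    records = [
        "##fileformat=VCFv4.2\n",
        "chr1\t100\t.\tA\tG\t.\tPASS\tAF=0.10\n",
        "chr1\t200\t.\tA\tG,T\t.\tPASS\tAF=0.20,0.01\n",
    ]
    for kept in select_common_biallelic(records):
        print(kept, end="")
```

For real data, the gatk command is the right tool, since it also handles compression and writes the index that the workflow needs.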
If you still don't have luck and it indeed is an issue with the wdl, you can have a look at my updated mutect2 workflow wdls, which I've recently successfully run.
Best,
Philipp
-
Thanks, Philipp Hähnel.
Wesley Kwong - were you able to resolve your issue with Philipp's advice?
Best,
Samantha
-
Hi all,
My apologies for not being able to reply quickly.
The issue was with using the preexisting contamination file. Using Philipp's command to generate my own contamination file solved the problem.
Thank you so much for all your help, guys!
-
Great, glad you were able to solve the issue! Thanks Wesley!