Missing Output File during MarkDuplicates Step
I'm sorry if this is an easy fix; I am very new to this. I am running "PreProcessingForVariantDiscovery" in Terra.bio, along with other steps recommended by best practices.
I initially ran the toy data (NA12878) included with the workspace I am using. After successfully generating aligned .bam and .bai files, I adapted the steps to the sample data I am looking to analyze.
The initial steps for my sample data:
1) .cram-to-.bam
2) .bam-to-.unmapped.bam
3) validate .bam files.
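For reference, a rough local equivalent of these three steps can be sketched as below. This is an assumption about what the steps amount to, not a record of what the Terra workflows actually ran: the tool choices (samtools view, RevertSam, ValidateSamFile), the function name preprocessing_commands and the file names are illustrative. The sketch only builds the command lines and does not execute anything.

```python
from typing import List


def preprocessing_commands(cram: str, reference: str, stem: str) -> List[List[str]]:
    """Build (but do not run) command lines for the three preparation steps.

    Hypothetical helper: the tools and file naming are assumptions made
    for illustration, not taken from the workflows used in Terra.
    """
    bam = f"{stem}.bam"
    unmapped = f"{stem}.unmapped.bam"
    return [
        # 1) CRAM -> BAM; decoding a CRAM needs the reference it was written against
        ["samtools", "view", "-b", "-T", reference, "-o", bam, cram],
        # 2) BAM -> unmapped BAM; RevertSam strips the alignment information
        ["gatk", "RevertSam", "--INPUT", bam, "--OUTPUT", unmapped],
        # 3) Validate the unmapped BAM and print a summary of any problems
        ["gatk", "ValidateSamFile", "--INPUT", unmapped, "--MODE", "SUMMARY"],
    ]


if __name__ == "__main__":
    for cmd in preprocessing_commands("sample.cram", "reference.fasta", "sample"):
        print(" ".join(cmd))
```

Each inner list could be handed to subprocess.run once the paths point at real files.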
I then created a list of the URIs, and passed this as the argument into PreProcessingForVariantDiscovery-GATK4. Each of the samples failed in the MarkDuplicates command, with an error indicating delocalization was unsuccessful because it was missing the output file.
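A minimal sketch of how such a list could be put together, assuming the workflow expects a plain-text file with one unmapped-BAM URI per line. The function name build_uri_list and the bucket paths in the usage example are hypothetical; the checks are only there to catch blank, malformed, or duplicated entries before submission.

```python
from typing import Iterable, List


def build_uri_list(uris: Iterable[str]) -> str:
    """Return the contents of a list file: one gs:// BAM URI per line."""
    cleaned: List[str] = []
    for raw in uris:
        uri = raw.strip()
        if not uri:
            continue  # ignore blank entries
        if not uri.startswith("gs://"):
            raise ValueError(f"not a gs:// URI: {uri!r}")
        if not uri.endswith(".bam"):
            raise ValueError(f"not a .bam file: {uri!r}")
        if uri in cleaned:
            raise ValueError(f"duplicate entry: {uri!r}")
        cleaned.append(uri)
    if not cleaned:
        raise ValueError("no URIs given")
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    # Hypothetical bucket and file names, for illustration only.
    text = build_uri_list([
        "gs://example-bucket/unmapped/readgroup_A.unmapped.bam",
        "gs://example-bucket/unmapped/readgroup_B.unmapped.bam",
    ])
    with open("unmapped_bams.list", "w") as handle:
        handle.write(text)
```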
REQUIRED for all errors and issues:
a) GATK version used: 4.2.6.1
b) Exact command used:
[Tue Dec 05 17:18:32 GMT 2023]
MarkDuplicates --INPUT /cromwell_root/fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MergeBamAlignment/shard-0/HWMFH.1.bam.unmapped.aligned.unsorted.bam --INPUT /cromwell_root/fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MergeBamAlignment/shard-1/HWTFL.1.bam.unmapped.aligned.unsorted.bam --INPUT /cromwell_root/fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MergeBamAlignment/shard-2/attempt-2/HWTWV.1.bam.unmapped.aligned.unsorted.bam --INPUT /cromwell_root/fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MergeBamAlignment/shard-3/HWTYY.1.bam.unmapped.aligned.unsorted.bam --OUTPUT RP-2422_SM-N2OO4_v1_WGS_GCP.hg38.aligned.unsorted.duplicates_marked.bam --METRICS_FILE RP-2422_SM-N2OO4_v1_WGS_GCP.hg38.duplicate_metrics --ASSUME_SORT_ORDER queryname --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --VALIDATION_STRINGENCY SILENT --CREATE_MD5_FILE true --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false 
--COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
c) Entire program log:
The log is extensive, nearly 150k. I will share the last few lines and the error message:
...
INFO 2023-12-05 17:57:26 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2023-12-05 17:57:31 MarkDuplicates Read 477,000,000 records. Elapsed time: 00:38:51s. Time for last 1,000,000: 4s. Last read position: chr15:27,868,283. Last read name: HWTFLDSX5230613:1:2342:6027:9486
INFO 2023-12-05 17:57:31 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2023-12-05 17:57:35 MarkDuplicates Read 478,000,000 records. Elapsed time: 00:38:55s. Time for last 1,000,000: 4s. Last read position: chr1:91,271,677. Last read name: HWTFLDSX5230613:1:2346:14796:22827
INFO 2023-12-05 17:57:35 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2023-12-05 17:57:39 MarkDuplicates Read 479,000,000 records. Elapsed time: 00:39:00s. Time for last 1,000,000: 4s. Last read position: chr18:51,846,241. Last read name: HWTFLDSX5230613:1:2349:22923:19147
INFO 2023-12-05 17:57:39 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
Using GATK jar /gatk/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Dsamjdk.compression_level=5 -Xms126G -jar /gatk/gatk-package-4.2.6.1-local.jar MarkDuplicates --INPUT /cromwell_root/fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MergeBamAlignment/shard-0/HWMFH.1.bam.unmapped.aligned.unsorted.bam --INPUT /cromwell_root/fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MergeBamAlignment/shard-1/HWTFL.1.bam.unmapped.aligned.unsorted.bam --INPUT /cromwell_root/fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MergeBamAlignment/shard-2/attempt-2/HWTWV.1.bam.unmapped.aligned.unsorted.bam --INPUT /cromwell_root/fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MergeBamAlignment/shard-3/HWTYY.1.bam.unmapped.aligned.unsorted.bam --OUTPUT RP-2422_SM-N2OO4_v1_WGS_GCP.hg38.aligned.unsorted.duplicates_marked.bam --METRICS_FILE RP-2422_SM-N2OO4_v1_WGS_GCP.hg38.duplicate_metrics --VALIDATION_STRINGENCY SILENT --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --ASSUME_SORT_ORDER queryname --CREATE_MD5_FILE true
2023/12/05 17:58:00 Starting delocalization.
2023/12/05 17:58:01 Delocalization script execution started...
2023/12/05 17:58:01 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MarkDuplicates/memory_retry_rc
2023/12/05 17:58:04 Delocalizing output /cromwell_root/rc -> gs://fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MarkDuplicates/rc
2023/12/05 17:58:06 Delocalizing output /cromwell_root/stdout -> gs://fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MarkDuplicates/stdout
2023/12/05 17:58:08 Delocalizing output /cromwell_root/stderr -> gs://fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MarkDuplicates/stderr
2023/12/05 17:58:10 Delocalizing output /cromwell_root/RP-2422_SM-N2OO4_v1_WGS_GCP.hg38.aligned.unsorted.duplicates_marked.bam -> gs://fc-secure-2300338b-dae8-43d0-8062-7a76842dfe50/submissions/8e387ee4-6544-4e38-bbef-2ee50c82d46f/PreProcessingForVariantDiscovery_GATK4/79c2c044-d51b-456d-a66d-e7db74d74a6f/call-MarkDuplicates/RP-2422_SM-N2OO4_v1_WGS_GCP.hg38.aligned.unsorted.duplicates_marked.bam
Required file output '/cromwell_root/RP-2422_SM-N2OO4_v1_WGS_GCP.hg38.aligned.unsorted.duplicates_marked.bam' does not exist.
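One way to narrow this down is to pull the missing path and the last progress line out of the log. The excerpt above stops at a "Read ... records" line with no exception and no further tool output, which may mean the command ended before it wrote the BAM; the rc file delocalized above should hold the command's exit status and is probably worth checking. A minimal sketch, assuming the log text is available as a string; the function names are hypothetical and the patterns are written against the lines quoted above.

```python
import re
from typing import Optional

# Patterns based on the log lines quoted in this post.
MISSING_RE = re.compile(r"Required file output '([^']+)' does not exist\.")
PROGRESS_RE = re.compile(r"MarkDuplicates\s+Read\s+([\d,]+)\s+records\.")


def missing_output(log_text: str) -> Optional[str]:
    """Return the path of the first required output reported as missing."""
    match = MISSING_RE.search(log_text)
    return match.group(1) if match else None


def last_record_count(log_text: str) -> Optional[int]:
    """Return the record count from the last MarkDuplicates progress line."""
    counts = PROGRESS_RE.findall(log_text)
    return int(counts[-1].replace(",", "")) if counts else None


if __name__ == "__main__":
    sample = (
        "INFO 2023-12-05 17:57:39 MarkDuplicates Read 479,000,000 records. "
        "Elapsed time: 00:39:00s.\n"
        "Required file output '/cromwell_root/example.bam' does not exist.\n"
    )
    print(missing_output(sample))
    print(last_record_count(sample))
```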
Again, if there is something I missed, I would really appreciate any advice I could get. Thank you for your time!
Steve
-
I was able to resolve this by using an alternate workspace/workflow.