AbstractOpticalDuplicateFinderCommandLineProgram Default READ_NAME_REGEX '<optimized capture of last three ':' separated fields as numeric values>' did not match read name '2hpf_wt_total_SRR870747.42096'.
Answered
GATK version used: GATK4
I tried to run MarkDuplicates like this: picard MarkDuplicates I=$(FILE).sort.bam O=$(FILE).MD.bam M=$(FILE).MD_matrix.txt;
And I got this warning: WARNING 2022-02-06 13:24:09 AbstractOpticalDuplicateFinderCommandLineProgram Default READ_NAME_REGEX '<optimized capture of last three ':' separated fields as numeric values>' did not match read name '2hpf_wt_total_SRR870747.42096'. You may need to specify a READ_NAME_REGEX in order to correctly identify optical duplicates. Note that this message will not be emitted again even if other read names do not match the regex.
Also, only one line came out in the metrics file (the "matrix"). I previously ran the same command on a BAM file and many more lines came out in the metrics file.
Following previous questions in the forum, I also ran:
picard ValidateSamFile I=$(FILE).sort.bam MODE=SUMMARY;
on the file that came out of the sort (before MarkDuplicates).
And I received:
WARNING 2022-02-06 13:16:09 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
and then
What does this mean? How can it be fixed? Is the problem with MarkDuplicates related to this?
I would love some help, thank you so much!
-
Hi Dina Tzur,
Could you share the complete program log from MarkDuplicates?
Thank you,
Genevieve
-
Hi Genevieve Brandt,
Thanks for the attention!
Can you explain to me how I find it?
Thanks again!
Dina
-
Yes! The program log is all the messages printed to your terminal when running the GATK command line.
Here is another post showing the complete program log from CreateReadCountPanelOfNormals: https://gatk.broadinstitute.org/hc/en-us/community/posts/4417744373019-Error-while-running-CreateReadCountPanelOfNormals
-
[bam_sort_core] merging from 10 files and 1 in-memory blocks...
INFO 2022-02-16 21:09:14 MarkDuplicates********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** MarkDuplicates -I /sci/home/dinnatzur12/group/BAM_z/BAM_Lee2013/bam_bai/2.0hpf_wt_total.star.bam.sort.bam -O /sci/home/dinnatzur12/group/BAM_z/BAM_Lee2013/md/2.0hpf_wt_total.star.bam.MD.bam -M /sci/home/dinnatzur12/group/BAM_z/BAM_Lee2013/md/2.0hpf_wt_total.star.bam.MD_matrix.txt
**********
21:09:15.667 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/hurcs/miniconda3/envs/picard-2.26.4/share/picard-2.26.4-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Feb 16 21:09:15 IST 2022] MarkDuplicates INPUT=[/sci/home/dinnatzur12/group/BAM_z/BAM_Lee2013/bam_bai/2.0hpf_wt_total.star.bam.sort.bam] OUTPUT=/sci/home/dinnatzur12/group/BAM_z/BAM_Lee2013/md/2.0hpf_wt_total.star.bam.MD.bam METRICS_FILE=/sci/home/dinnatzur12/group/BAM_z/BAM_Lee2013/md/2.0hpf_wt_total.star.bam.MD_matrix.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Feb 16 21:09:15 IST 2022] Executing as dinnatzur12@glacier-06 on Linux 5.10.79-aufs-1 amd64; OpenJDK 64-Bit Server VM 11.0.9.1-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.4
INFO 2022-02-16 21:09:15 MarkDuplicates Start of doWork freeMemory: 510616376; totalMemory: 519110656; maxMemory: 2075918336
INFO 2022-02-16 21:09:15 MarkDuplicates Reading input file and constructing read end information.
INFO 2022-02-16 21:09:15 MarkDuplicates Will retain up to 7521443 data points before spilling to disk.
WARNING 2022-02-16 21:09:15 AbstractOpticalDuplicateFinderCommandLineProgram Default READ_NAME_REGEX '<optimized capture of last three ':' separated fields as numeric values>' did not match read name '2hpf_wt_total_SRR870747.420964'. You may need to specify a READ_NAME_REGEX in order to correctly identify optical duplicates. Note that this message will not be emitted again even if other read names do not match the regex.
INFO 2022-02-16 21:09:19 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:03s. Time for last 1,000,000: 3s. Last read position: chr4:55,758,217
INFO 2022-02-16 21:09:19 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:21 MarkDuplicates Read 2,000,000 records. Elapsed time: 00:00:05s. Time for last 1,000,000: 1s. Last read position: chr4:77,551,161
INFO 2022-02-16 21:09:21 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:22 MarkDuplicates Read 3,000,000 records. Elapsed time: 00:00:07s. Time for last 1,000,000: 1s. Last read position: chr4:77,557,063
INFO 2022-02-16 21:09:22 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:24 MarkDuplicates Read 4,000,000 records. Elapsed time: 00:00:08s. Time for last 1,000,000: 1s. Last read position: chr4:77,558,095
INFO 2022-02-16 21:09:24 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:26 MarkDuplicates Read 5,000,000 records. Elapsed time: 00:00:10s. Time for last 1,000,000: 1s. Last read position: chr4:77,558,806
INFO 2022-02-16 21:09:26 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:27 MarkDuplicates Read 6,000,000 records. Elapsed time: 00:00:11s. Time for last 1,000,000: 1s. Last read position: chr4:77,559,415
INFO 2022-02-16 21:09:27 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:29 MarkDuplicates Read 7,000,000 records. Elapsed time: 00:00:13s. Time for last 1,000,000: 1s. Last read position: chr4:77,560,710
INFO 2022-02-16 21:09:29 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:31 MarkDuplicates Read 8,000,000 records. Elapsed time: 00:00:15s. Time for last 1,000,000: 1s. Last read position: chr4:77,563,046
INFO 2022-02-16 21:09:31 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:33 MarkDuplicates Read 9,000,000 records. Elapsed time: 00:00:17s. Time for last 1,000,000: 1s. Last read position: chr7:30,627,338
INFO 2022-02-16 21:09:33 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:37 MarkDuplicates Read 10,000,000 records. Elapsed time: 00:00:21s. Time for last 1,000,000: 4s. Last read position: chr12:17,156,783
INFO 2022-02-16 21:09:37 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:39 MarkDuplicates Read 11,000,000 records. Elapsed time: 00:00:23s. Time for last 1,000,000: 2s. Last read position: chr18:3,576,345
INFO 2022-02-16 21:09:39 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:41 MarkDuplicates Read 12,000,000 records. Elapsed time: 00:00:25s. Time for last 1,000,000: 1s. Last read position: chr23:22,507,823
INFO 2022-02-16 21:09:41 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:43 MarkDuplicates Read 13,000,000 records. Elapsed time: 00:00:27s. Time for last 1,000,000: 1s. Last read position: KZ115963.1:573
INFO 2022-02-16 21:09:43 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:44 MarkDuplicates Read 14,000,000 records. Elapsed time: 00:00:28s. Time for last 1,000,000: 1s. Last read position: KZ115098.1:437
INFO 2022-02-16 21:09:44 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:45 MarkDuplicates Read 15,000,000 records. Elapsed time: 00:00:29s. Time for last 1,000,000: 1s. Last read position: KZ115098.1:6,441
INFO 2022-02-16 21:09:45 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:47 MarkDuplicates Read 16,000,000 records. Elapsed time: 00:00:31s. Time for last 1,000,000: 1s. Last read position: KZ115098.1:7,937
INFO 2022-02-16 21:09:47 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:48 MarkDuplicates Read 17,000,000 records. Elapsed time: 00:00:32s. Time for last 1,000,000: 1s. Last read position: KZ115098.1:8,612
INFO 2022-02-16 21:09:48 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:49 MarkDuplicates Read 18,000,000 records. Elapsed time: 00:00:33s. Time for last 1,000,000: 1s. Last read position: KZ115098.1:9,285
INFO 2022-02-16 21:09:49 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:50 MarkDuplicates Read 19,000,000 records. Elapsed time: 00:00:35s. Time for last 1,000,000: 1s. Last read position: KZ115098.1:10,082
INFO 2022-02-16 21:09:50 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:52 MarkDuplicates Read 20,000,000 records. Elapsed time: 00:00:36s. Time for last 1,000,000: 1s. Last read position: KZ115098.1:12,646
INFO 2022-02-16 21:09:52 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:53 MarkDuplicates Read 21,000,000 records. Elapsed time: 00:00:37s. Time for last 1,000,000: 1s. Last read position: KZ114841.1:92,490
INFO 2022-02-16 21:09:53 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:55 MarkDuplicates Read 22,000,000 records. Elapsed time: 00:00:39s. Time for last 1,000,000: 1s. Last read position: 18S_1716nt:946
INFO 2022-02-16 21:09:55 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:56 MarkDuplicates Read 23,000,000 records. Elapsed time: 00:00:40s. Time for last 1,000,000: 1s. Last read position: 18S_1946nt:893
INFO 2022-02-16 21:09:56 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:57 MarkDuplicates Read 24,000,000 records. Elapsed time: 00:00:41s. Time for last 1,000,000: 1s. Last read position: 28S_4252nt:325
INFO 2022-02-16 21:09:57 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:09:58 MarkDuplicates Read 25,000,000 records. Elapsed time: 00:00:42s. Time for last 1,000,000: 1s. Last read position: 28S_4252nt:1,149
INFO 2022-02-16 21:09:58 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:10:00 MarkDuplicates Read 26,000,000 records. Elapsed time: 00:00:44s. Time for last 1,000,000: 1s. Last read position: 28S_4252nt:1,763
INFO 2022-02-16 21:10:00 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:10:01 MarkDuplicates Read 27,000,000 records. Elapsed time: 00:00:45s. Time for last 1,000,000: 1s. Last read position: 28S_4252nt:2,430
INFO 2022-02-16 21:10:01 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:10:02 MarkDuplicates Read 28,000,000 records. Elapsed time: 00:00:46s. Time for last 1,000,000: 1s. Last read position: 28S_4252nt:3,990
INFO 2022-02-16 21:10:02 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:10:03 MarkDuplicates Read 29,000,000 records. Elapsed time: 00:00:48s. Time for last 1,000,000: 1s. Last read position: 28S_4278nt:812
INFO 2022-02-16 21:10:03 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:10:05 MarkDuplicates Read 30,000,000 records. Elapsed time: 00:00:49s. Time for last 1,000,000: 1s. Last read position: 28S_4278nt:1,434
INFO 2022-02-16 21:10:05 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:10:06 MarkDuplicates Read 31,000,000 records. Elapsed time: 00:00:50s. Time for last 1,000,000: 1s. Last read position: 28S_4278nt:2,135
INFO 2022-02-16 21:10:06 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:10:07 MarkDuplicates Read 32,000,000 records. Elapsed time: 00:00:51s. Time for last 1,000,000: 1s. Last read position: 28S_4278nt:3,141
INFO 2022-02-16 21:10:07 MarkDuplicates Tracking 0 as yet unmatched pairs. 0 records in RAM.
INFO 2022-02-16 21:10:08 MarkDuplicates Read 32675474 records. 0 pairs never matched.
INFO 2022-02-16 21:10:09 MarkDuplicates After buildSortedReadEndLists freeMemory: 765922192; totalMemory: 837369856; maxMemory: 2075918336
INFO 2022-02-16 21:10:09 MarkDuplicates Will retain up to 64872448 duplicate indices before spilling to disk.
INFO 2022-02-16 21:10:09 MarkDuplicates Traversing read pair information and detecting duplicates.
INFO 2022-02-16 21:10:09 MarkDuplicates Traversing fragment information and detecting duplicates.
INFO 2022-02-16 21:10:11 MarkDuplicates Sorting list of duplicate records.
INFO 2022-02-16 21:10:12 MarkDuplicates After generateDuplicateIndexes freeMemory: 929069616; totalMemory: 1462607872; maxMemory: 2075918336
INFO 2022-02-16 21:10:12 MarkDuplicates Marking 7276294 records as duplicates.
INFO 2022-02-16 21:10:12 MarkDuplicates Found 0 optical duplicate clusters.
INFO 2022-02-16 21:10:12 MarkDuplicates Reads are assumed to be ordered by: coordinate
INFO 2022-02-16 21:11:21 MarkDuplicates Written 10,000,000 records. Elapsed time: 00:01:09s. Time for last 10,000,000: 69s. Last read position: chr12:17,156,783
INFO 2022-02-16 21:12:29 MarkDuplicates Written 20,000,000 records. Elapsed time: 00:02:17s. Time for last 10,000,000: 67s. Last read position: KZ115098.1:12,646
INFO 2022-02-16 21:13:33 MarkDuplicates Written 30,000,000 records. Elapsed time: 00:03:20s. Time for last 10,000,000: 63s. Last read position: 28S_4278nt:1,434
INFO 2022-02-16 21:13:50 MarkDuplicates Writing complete. Closing input iterator.
INFO 2022-02-16 21:13:50 MarkDuplicates Duplicate Index cleanup.
INFO 2022-02-16 21:13:50 MarkDuplicates Getting Memory Stats.
INFO 2022-02-16 21:13:50 MarkDuplicates Before output close freeMemory: 1446677912; totalMemory: 1462607872; maxMemory: 2075918336
INFO 2022-02-16 21:13:52 MarkDuplicates Closed outputs. Getting more Memory Stats.
INFO 2022-02-16 21:13:52 MarkDuplicates After output close freeMemory: 1352831712; totalMemory: 1368248320; maxMemory: 2075918336
[Wed Feb 16 21:13:52 IST 2022] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 4.62 minutes.
Runtime.totalMemory()=1368248320
Hope I understood correctly what to share.
Thank you!
Dina
-
And the matrix I get:
## htsjdk.samtools.metrics.StringHeader
# MarkDuplicates INPUT=[/group/BAM_z/BAM_Lee2013/bam_bai/2.0hpf_wt_total.star.bam.sort.bam] OUTPUT=/group/BAM_z/BAM_Lee2013/md/2.0hpf_wt_total.star.bam.MD.bam METRICS_FILE=/group/BAM_z/BAM_Lee2013/md/2.0hpf_wt_total.star.bam.MD_matrix.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
## htsjdk.samtools.metrics.StringHeader
# Started on: Wed Feb 16 21:09:15 IST 2022
## METRICS CLASS picard.sam.DuplicationMetrics
LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED SECONDARY_OR_SUPPLEMENTARY_RDS UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
Unknown Library 10503556 0 22171918 0 7276294 0 0 0.692746
-
Hi Dina,
Yes, this is what I was looking for, thank you!
This doesn't look like a problem with MarkDuplicates. Depending on your reads, everything may be fine. The reason you only got one line in the metrics file is that MarkDuplicates is only detecting one library in this BAM. If you are expecting more than one library, then you'll want to go back and make sure your pre-processing steps were done correctly.
The warning you got could indicate that there is an issue with the read names in your BAM file. Here is an article we have about BAM/SAM files; take a look and make sure that your files meet the specifications: SAM or BAM or CRAM - Mapped sequence data formats. There's also a related forum post that could be helpful.
Your metrics output also indicates that all of your reads were evaluated as unpaired. Is this expected? If not, check your file for issues.
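As a worked check of the metrics row posted above, the short Python sketch below recomputes PERCENT_DUPLICATION from the other columns. It assumes the metric is defined as (UNPAIRED_READ_DUPLICATES + 2 x READ_PAIR_DUPLICATES) / (UNPAIRED_READS_EXAMINED + 2 x READ_PAIRS_EXAMINED); the function name `percent_duplication` is hypothetical and this is only an illustration, not Picard's code.

```python
def percent_duplication(unpaired_examined: int, pairs_examined: int,
                        unpaired_dups: int, pair_dups: int) -> float:
    """Fraction of examined reads marked as duplicates.

    Assumed formula (each pair counts as two reads); a sketch only.
    """
    total_reads = unpaired_examined + 2 * pairs_examined
    if total_reads == 0:
        return 0.0
    return (unpaired_dups + 2 * pair_dups) / total_reads


# Values copied from the single metrics row in this thread:
# UNPAIRED_READS_EXAMINED=10503556, READ_PAIRS_EXAMINED=0,
# UNPAIRED_READ_DUPLICATES=7276294, READ_PAIR_DUPLICATES=0
value = percent_duplication(10503556, 0, 7276294, 0)
print(round(value, 6))
```

With READ_PAIRS_EXAMINED at 0, the whole ratio comes from the unpaired columns, which is why the row reads as a single-end dataset.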
Let me know if you have any further questions.
Best,
Genevieve
-
Hi Genevieve,
The truth is that I have already run MarkDuplicates once on the files and got a metrics file (the "matrix") in this style:
## htsjdk.samtools.metrics.StringHeader
# MarkDuplicates INPUT=[/sci/home/dinnatzur12/group/BAM_z/BAM_Pauli2011/bam_bai/2.5hpf.star.bam.sort.bam] OUTPUT=/sci/home/dinnatzur12/group/BAM_z/BAM_Pauli2011/md/2.5hpf.star.bam.MD.bam METRICS_FILE=/sci/home/dinnatzur12/group/BAM_z/BAM_Pauli2011/md/2.5hpf.star.bam.MD_matrix.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
## htsjdk.samtools.metrics.StringHeader
# Started on: Tue Feb 08 12:29:57 IST 2022
## METRICS CLASS picard.sam.DuplicationMetrics
LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED SECONDARY_OR_SUPPLEMENTARY_RDS UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
Unknown Library 0 164830805 46942564 0 0 36215207 47810 0.219711 318172211
## HISTOGRAM java.lang.Double
BIN CoverageMult all_sets optical_sets non_optical_sets
1.0 1.000221 111529041 0 111561454
2.0 1.596031 11186571 47676 11160566
3.0 1.950942 2909758 59 2905831
4.0 2.162354 1167906 4 1166793
5.0 2.288288 586193 1 585659
6.0 2.363304 337103 0 336840
7.0 2.407989 212422 0 212295
8.0 2.434607 144258 0 144126
9.0 2.450463 101000 0 100931
10.0 2.459908 73941 0 73916
11.0 2.465534 56383 0 56347
12.0 2.468885 43983 0 43956
13.0 2.470882 34999 0 34975
14.0 2.472071 28360 0 28344
15.0 2.472779 23364 0 23339
16.0 2.473201 19084 0 19077
17.0 2.473453 16245 0 16245
18.0 2.473602 13759 0 13756
19.0 2.473691 11889 0 11875
20.0 2.473745 10237 0 10219
21.0 2.473776 9066 0 9077
22.0 2.473795 7918 0 7899
23.0 2.473806 7042 0 7045
24.0 2.473813 6031 0 6032
25.0 2.473817 5492 0 5491
26.0 2.473819 5031 0 5024
27.0 2.473821 4349 0 4344
28.0 2.473822 4082 0 4083
29.0 2.473822 3658 0 3651
30.0 2.473822 3315 0 3328
31.0 2.473823 3048 0 3039
32.0 2.473823 2752 0 2760
33.0 2.473823 2450 0 2441
34.0 2.473823 2360 0 2353
35.0 2.473823 2206 0 2217
36.0 2.473823 2004 0 1993
37.0 2.473823 1903 0 1901
38.0 2.473823 1734 0 1733
39.0 2.473823 1566 0 1568
So I expected to get a metrics file of this type, not one with just a single row.
I thought the problem was the read names, so I looked at the two links you provided (thank you very much!).
Following the recommendation in the second link, I edited the read names to look like this:
3:2:2:718:17 4 * 0 0 * * 0 0 NGCTTTTAGGCGGGATTCTGACTTAGAGGCGTTCAGTCATAATCCCGCAG #AAAFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ YT:Z:UU
3:2:2:718:18 4 * 0 0 * * 0 0 NCGGGGCCTATCGGAGATCCGACGGCGCTGCTGTATCGTTGCTTTTAGGC #AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ YT:Z:UU
3:2:2:718:19 4 * 0 0 * * 0 0 NCCGAGGTCTTTTTTTTTTTTTTTAACTTTGCATTTACAGGAACGCTGCC #AAFFJJJJJJJJJJJJJJJJJJJJJ-FJJ---A--7-<A-FAA7AA7A< YT:Z:UU
3:2:2:718:20 4 * 0 0 * * 0 0 NCCGAGGTCTTTTTTTTTTTTTTTAACTTTGCATCTACAGGAACGCTGCC #AAFFJJJJJJJJJJJJJJJJJJJF<<JJJ-<FJ<JF<-<FFFJ7AJ-7A YT:Z:UU
3:2:2:718:21 4 * 0 0 * * 0 0 NGCAGTACGAATGCCCCCGTCTGTCTCTGTTAACCATTACCTCAAGTCCA #AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ YT:Z:UU
3:2:2:718:22 16 chr24 24171147 100 50M * 0 0 GCTGCCGGAGGACCCGAGGAGACGCAGCCTGTGGATGAAGTTTATCGAGN JJF<JFJJJJAAJFJJJJJAJAFJAJJJ7JJJJJJFFJJJJJF-FAAAA# AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49G0 YT:Z:UU ZW:f:1
3:2:2:718:23 256 chr4 77551337 3 50M * 0 0 NTCTGATAAATGCACGCGTCCCCGGGTACCCACCCCCCGCCCCGAGGGGA #AAFFJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJ<JF7FJFF AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0A49 YT:Z:UU ZW:f:0.5
3:2:2:718:23 0 chr4 77562888 3 50M * 0 0 NTCTGATAAATGCACGCGTCCCCGGGTACCCACCCCCCGCCCCGAGGGGA #AAFFJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJ<JF7FJFF AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0A49 YT:Z:UU ZW:f:0.5
3:2:2:718:24 16 chr1 24558245 100 50M * 0 0 AAGTTTTATAGTTGTTTTCTTTTATTTTCCTAATTATTTTACCAAAGCTN JFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFFAA# AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:30A18G0 YT:Z:UU ZW:f:1
3:2:2:718:25 16 chr20 29580097 100 50M * 0 0 ACTGGCTCTCAACTTCTCTGTCTTCTACTATGAGATCCTTAACTCTCCGN JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJFFAA# AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:49G0 YT:Z:UU ZW:f:1
After this I no longer got the warning, but I still only got one line in the metrics file.
So I would love it if you could tell me what exactly MarkDuplicates expects to find in the read name. Even though I tried, I could not figure it out from the first link you gave (the expected format is not described in that much detail).
Thanks for your patience and all the help!
Dina
-
Hi Dina Tzur,
MarkDuplicates does not produce the histogram if there is more than one read group in the file. Here's a biostars post with a good explanation: https://www.biostars.org/p/115044/.
Just in case you are not familiar, read groups are different than read names. We have an explanation in this article here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
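For illustration, read groups are declared on `@RG` lines in the SAM/BAM header, and the library name comes from the `LB` tag of each read group. The Python sketch below lists the tags of every `@RG` line in header text such as that printed by `samtools view -H`. The function name `read_groups` and the example header are hypothetical and are not taken from the files in this thread; the "Unknown Library" fallback simply mirrors the label seen in the metrics files above.

```python
from typing import Dict, List


def read_groups(header_text: str) -> List[Dict[str, str]]:
    """Collect the TAG:value fields of every @RG header line."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        tags = {}
        # Header fields are tab-separated; the first field is the record type.
        for field in line.split("\t")[1:]:
            key, _, value = field.partition(":")
            tags[key] = value
        groups.append(tags)
    return groups


# Hypothetical header, for illustration only.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@RG\tID:rg1\tLB:libA\tSM:sample1\n"
)

for rg in read_groups(example_header):
    # A read group without an LB tag has no library name to report.
    print(rg.get("ID"), rg.get("LB", "Unknown Library"))
```

An empty result from a check like this would be consistent with the "Unknown Library" label in the LIBRARY column, since the reads then carry no library information.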
So the difference you are seeing is not related to the read name warning. And it looks like the read names are matching the specifications now!
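To make the read-name requirement more concrete, here is a minimal Python sketch of the default behaviour as the Picard documentation describes it: when no READ_NAME_REGEX is given, the read name is split on ':' and, for names with 5 or 7 fields, the last three fields are taken as tile, x and y. The function name `parse_tile_x_y` is hypothetical, and this is a simplified illustration under that assumption, not Picard's actual implementation.

```python
from typing import Optional, Tuple


def parse_tile_x_y(read_name: str) -> Optional[Tuple[int, int, int]]:
    """Return (tile, x, y) for a 5- or 7-field colon-separated read name.

    Returns None when the name does not have that shape. Simplified
    sketch only; the real parser may be more lenient.
    """
    fields = read_name.split(":")
    if len(fields) not in (5, 7):
        return None
    last_three = fields[-3:]
    if not all(f.isascii() and f.isdigit() for f in last_three):
        return None
    tile, x, y = (int(f) for f in last_three)
    return tile, x, y


if __name__ == "__main__":
    # Read names copied from this thread: the original and an edited one.
    for name in ("2hpf_wt_total_SRR870747.420964", "3:2:2:718:17"):
        print(name, "->", parse_tile_x_y(name))
```

The original name contains no ':' at all, so it cannot be parsed, while the edited name has five numeric fields. Optical duplicate detection is only meaningful when those fields are real flowcell coordinates; the Picard documentation also describes setting READ_NAME_REGEX=null to disable optical duplicate detection when read names carry no such information.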
Let me know if you have any other future questions.
Best,
Genevieve
-
Hi Genevieve,
Thank you very much!
So it seems that MarkDuplicates works well for me?
Dina
-
Yes, it's working as expected.
-
Thank you!
Dina