Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

MarkDuplicates TAG_DUPLICATE_SET_MEMBERS=true does not add DS and DI tags to the output BAM file


  • Genevieve Brandt (she/her)

    Hi Adeline Morez,

    Thanks for writing in to the forum! This looks like it could be a bug. Could you share the log from your MarkDuplicates command so we can confirm that nothing else unusual is happening?

    Best,

    Genevieve

  • Adeline Morez

    Hi Genevieve,

    Many thanks for your reply. You can find the log below.

    Best,

    Adeline

    INFO 2021-10-27 14:53:43 MarkDuplicates

    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    ********** MarkDuplicates -I input.coordsorted.bam -O output.markduplicates.bam -M metrics.txt -TAG_DUPLICATE_SET_MEMBERS true
    **********


    14:53:44.571 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/nspamore/software/picard-2.26.3/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Oct 27 14:53:44 BST 2021] MarkDuplicates TAG_DUPLICATE_SET_MEMBERS=true INPUT=[input.coordsorted.bam] OUTPUT=output.markduplicates.bam METRICS_FILE=metrics.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Wed Oct 27 14:53:44 BST 2021] Executing as nspamore@genome.jmu.ac.uk on Linux 3.10.0-1160.25.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_312-b07; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.3
    INFO 2021-10-27 14:53:44 MarkDuplicates Start of doWork freeMemory: 2037170912; totalMemory: 2058354688; maxMemory: 28631367680
    INFO 2021-10-27 14:53:44 MarkDuplicates Reading input file and constructing read end information.
    INFO 2021-10-27 14:53:44 MarkDuplicates Will retain up to 103736839 data points before spilling to disk.
    INFO 2021-10-27 14:53:51 MarkDuplicates Read 83777 records. 0 pairs never matched.
    INFO 2021-10-27 14:53:52 MarkDuplicates After buildSortedReadEndLists freeMemory: 1207695960; totalMemory: 2058354688; maxMemory: 28631367680
    INFO 2021-10-27 14:53:52 MarkDuplicates Will retain up to 447365120 duplicate indices before spilling to disk.
    INFO 2021-10-27 14:53:56 MarkDuplicates Traversing read pair information and detecting duplicates.
    INFO 2021-10-27 14:53:56 MarkDuplicates Traversing fragment information and detecting duplicates.
    INFO 2021-10-27 14:53:57 MarkDuplicates Sorting list of duplicate records.
    INFO 2021-10-27 14:53:58 MarkDuplicates After generateDuplicateIndexes freeMemory: 2570584264; totalMemory: 7964983296; maxMemory: 28631367680
    INFO 2021-10-27 14:53:58 MarkDuplicates Marking 56068 records as duplicates.
    INFO 2021-10-27 14:53:58 MarkDuplicates Found 0 optical duplicate clusters.
    INFO 2021-10-27 14:53:58 MarkDuplicates Reads are assumed to be ordered by: coordinate
    INFO 2021-10-27 14:54:00 MarkDuplicates Writing complete. Closing input iterator.
    INFO 2021-10-27 14:54:00 MarkDuplicates Duplicate Index cleanup.
    INFO 2021-10-27 14:54:00 MarkDuplicates Representative read Index cleanup.
    INFO 2021-10-27 14:54:00 MarkDuplicates Getting Memory Stats.
    INFO 2021-10-27 14:54:01 MarkDuplicates Before output close freeMemory: 6149059176; totalMemory: 7964983296; maxMemory: 28631367680
    INFO 2021-10-27 14:54:01 MarkDuplicates Closed outputs. Getting more Memory Stats.
    INFO 2021-10-27 14:54:04 MarkDuplicates After output close freeMemory: 6149432648; totalMemory: 7964983296; maxMemory: 28631367680
    [Wed Oct 27 14:54:04 BST 2021] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.34 minutes.
    Runtime.totalMemory()=7964983296
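One way to confirm the symptom in the title is to count how many records in the output carry the DS and DI tags, alongside how many carry the duplicate flag (0x400). The log above reports 56068 records marked as duplicates, so with TAG_DUPLICATE_SET_MEMBERS=true one would expect at least some records to carry DS and DI. The sketch below is a hypothetical helper (the script name count_ds_di.py and the function count_tagged_records are illustrative, not part of Picard or GATK). It assumes samtools is available to decode the BAM into SAM text, and it only checks for the presence of each tag, not its value.

```python
import sys
from collections import Counter
from typing import Iterable

# Tags that TAG_DUPLICATE_SET_MEMBERS=true is expected to add.
TAGS_OF_INTEREST = ("DS", "DI")
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def count_tagged_records(sam_lines: Iterable[str]) -> Counter:
    """Count alignment records, duplicate-flagged records, and records carrying each tag."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        counts["records"] += 1
        if int(fields[1]) & DUPLICATE_FLAG:
            counts["duplicate_flagged"] += 1
        # Optional fields start at the 12th column (index 11) as TAG:TYPE:VALUE.
        present = {field[:2] for field in fields[11:]}
        for tag in TAGS_OF_INTEREST:
            if tag in present:
                counts[tag] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass a SAM file path, or "-" to read SAM text from stdin.
    source = sys.argv[1]
    if source == "-":
        result = count_tagged_records(sys.stdin)
    else:
        with open(source) as handle:
            result = count_tagged_records(handle)
    for key in ("records", "duplicate_flagged", *TAGS_OF_INTEREST):
        print(f"{key}: {result[key]}")
```

For example, `samtools view output.markduplicates.bam | python count_ds_di.py -` would print the counts. A non-zero duplicate_flagged count together with DS: 0 and DI: 0 would match the behaviour described in this thread.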
  • Genevieve Brandt (she/her)

    Thanks, Adeline Morez.

    I have opened an issue in the Picard repository: https://github.com/broadinstitute/picard/issues/1741. The developers will investigate and post any fix for this bug there.

    Thank you for bringing this to our attention!

    Genevieve
