Assigns all the reads in a file to a single new read-group. This tool accepts BAM and SAM files, or URLs from the Global Alliance for Genomics and Health (GA4GH), as INPUT.
Usage example:
    java -jar picard.jar AddOrReplaceReadGroups \
        I=input.bam \
        O=output.bam \
        RGID=4 \
        RGLB=lib1 \
        RGPL=ILLUMINA \
        RGPU=unit1 \
        RGSM=20
Caveats
The value of each tag must conform (per the SAM specification) to the regex '^[ -~]+$' (one or more characters from the ASCII range 32 through 126). In particular, <Space> is the only non-printing character allowed.
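The character restriction above can be checked before running the tool. The following is a minimal sketch, not part of Picard: the function name `is_valid_read_group_value` is hypothetical, and it uses `fullmatch` rather than the anchored pattern because Python's `$` would also accept a trailing newline.

```python
import re

# Same character class as the regex quoted in the caveat:
# one or more characters in the ASCII range 32 (space) through 126 (~).
READ_GROUP_VALUE = re.compile(r"[ -~]+")


def is_valid_read_group_value(value: str) -> bool:
    """Return True if value is non-empty and contains only ASCII 32-126."""
    return READ_GROUP_VALUE.fullmatch(value) is not None


if __name__ == "__main__":
    # A space is allowed; tabs, empty strings and non-ASCII characters are not.
    for candidate in ["lib1", "unit 1", "tab\there", "", "caf\u00e9"]:
        print(repr(candidate), is_valid_read_group_value(candidate))
```

Running such a check on the values intended for RGID, RGLB, RGPL, RGPU and RGSM catches stray tabs or accented characters before they reach the header.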
The program enables only the wholesale assignment of all the reads in the INPUT to a single read-group. If your file already has reads assigned to multiple read-groups, the original RG value will be lost.
For more information about read-groups, see the GATK Dictionary entry.
Category Read Data Manipulation
Overview
Assigns all the reads in a file to a single new read-group.
Summary
Many tools (Picard and GATK, for example) require or assume the presence of at least one RG tag, defining a "read-group" to which each read can be assigned (as specified in the RG tag in the SAM record).
This tool enables the user to assign all the reads in the INPUT to a single new read-group.
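To make the relationship between header read-groups and per-read RG tags concrete, here is a small sketch that inspects SAM text (for example, the text form of a BAM). It is an illustration only: the function names are hypothetical, and it relies solely on the SAM text layout, in which `@RG` header lines carry tab-separated `TAG:value` fields and optional tags on an alignment line start at the twelfth column.

```python
from typing import Dict, Optional


def read_groups_in_header(header_text: str) -> Dict[str, Dict[str, str]]:
    """Map each read-group ID to the TAG:value pairs of its @RG header line."""
    groups: Dict[str, Dict[str, str]] = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue
        # Split each field on the first colon only; values may contain colons.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "ID" in fields:
            groups[fields["ID"]] = fields
    return groups


def read_group_of_record(sam_line: str) -> Optional[str]:
    """Return the value of the RG:Z: tag on an alignment line, or None."""
    # The first 11 columns are mandatory; optional tags follow.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("RG:Z:"):
            return field[len("RG:Z:"):]
    return None


if __name__ == "__main__":
    header = "@HD\tVN:1.6\n@RG\tID:4\tLB:lib1\tPL:ILLUMINA\tPU:unit1\tSM:20\n"
    record = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:4\n"
    print(read_groups_in_header(header))
    print(read_group_of_record(record))
```

A read whose `RG:Z:` value matches no `@RG` ID in the header, or a header with no `@RG` line at all, is the situation that downstream tools typically reject.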
For more information about read-groups, see the GATK Dictionary entry.
This tool accepts BAM and SAM files, or URLs from the Global Alliance for Genomics and Health (GA4GH), as INPUT.
Usage example:
    java -jar picard.jar AddOrReplaceReadGroups \
        I=input.bam \
        O=output.bam \
        RGID=4 \
        RGLB=lib1 \
        RGPL=ILLUMINA \
        RGPU=unit1 \
        RGSM=20
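When the command is generated from a script, building the argument list programmatically avoids shell-quoting mistakes. The sketch below assembles the same invocation as the usage example; the helper name `build_command` is hypothetical, and it assumes `picard.jar` is in the working directory, as the example does. It only constructs and prints the command; actually running it requires Java and Picard.

```python
import shlex
from typing import List


def build_command(input_path: str, output_path: str, rgid: str, rglb: str,
                  rgpl: str, rgpu: str, rgsm: str,
                  jar: str = "picard.jar") -> List[str]:
    """Build the argv list for the usage example, one KEY=value per element."""
    return [
        "java", "-jar", jar, "AddOrReplaceReadGroups",
        f"I={input_path}",
        f"O={output_path}",
        f"RGID={rgid}",
        f"RGLB={rglb}",
        f"RGPL={rgpl}",
        f"RGPU={rgpu}",
        f"RGSM={rgsm}",
    ]


if __name__ == "__main__":
    cmd = build_command("input.bam", "output.bam", "4", "lib1",
                        "ILLUMINA", "unit1", "20")
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
    # To execute it (requires Java and picard.jar):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the list directly to `subprocess.run` means values containing spaces (which the tag regex permits) need no manual quoting.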
Caveats
The value of each tag must conform (per the SAM specification) to the regex '^[ -~]+$' (one or more characters from the ASCII range 32 through 126). In particular, <Space> is the only non-printing character allowed.
The program enables only the wholesale assignment of all the reads in the INPUT to a single read-group. If your file already has reads assigned to multiple read-groups, the original RG value will be lost.
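Because the original RG values are discarded, it can be worth tallying them before running the tool. The following is a rough pre-flight sketch over SAM text lines (for instance, the output of `samtools view -h`); the function name is hypothetical, and reads with no RG tag are counted under `None`.

```python
from collections import Counter
from typing import Iterable, Optional


def count_reads_per_read_group(sam_lines: Iterable[str]) -> "Counter[Optional[str]]":
    """Count alignment records per RG:Z: value, skipping header lines."""
    counts: "Counter[Optional[str]]" = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        tags = line.rstrip("\n").split("\t")[11:]
        rg = next((t[len("RG:Z:"):] for t in tags if t.startswith("RG:Z:")), None)
        counts[rg] += 1
    return counts


if __name__ == "__main__":
    lines = [
        "@RG\tID:A\tSM:s1",
        "@RG\tID:B\tSM:s1",
        "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:A",
        "r2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:B",
        "r3\t0\tchr1\t9\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:A",
    ]
    counts = count_reads_per_read_group(lines)
    if len(counts) > 1:
        print("More than one read-group present; assignments would be lost:")
    print(dict(counts))
```

If more than one key appears, the per-read assignments cannot be recovered from the output of this tool, so keep the original file.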
AddOrReplaceReadGroups (Picard) specific arguments
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.
Argument name(s) | Default value | Summary
---|---|---
Required Arguments | |
--INPUT -I | null | Input file (BAM or SAM or a GA4GH URL).
--OUTPUT -O | null | Output file (BAM or SAM).
--RGLB -LB | null | Read-Group library
--RGPL -PL | null | Read-Group platform (e.g. ILLUMINA, SOLID)
--RGPU -PU | null | Read-Group platform unit (e.g. run barcode)
--RGSM -SM | null | Read-Group sample name
Optional Tool Arguments | |
--arguments_file | [] | read one or more arguments files and add them to the command line
--help -h | false | display the help message
--RGCN -CN | null | Read-Group sequencing center name
--RGDS -DS | null | Read-Group description
--RGDT -DT | null | Read-Group run date
--RGFO -FO | null | Read-Group flow order
--RGID -ID | 1 | Read-Group ID
--RGKS -KS | null | Read-Group key sequence
--RGPG -PG | null | Read-Group program group
--RGPI -PI | null | Read-Group predicted insert size
--RGPM -PM | null | Read-Group platform model
--SORT_ORDER -SO | null | Optional sort order to output in. If not supplied OUTPUT is in the same order as INPUT.
--version | false | display the version number for this tool
Optional Common Arguments | |
--COMPRESSION_LEVEL | 5 | Compression level for all compressed files created (e.g. BAM and VCF).
--CREATE_INDEX | false | Whether to create a BAM index when writing a coordinate-sorted BAM file.
--CREATE_MD5_FILE | false | Whether to create an MD5 digest for any BAM or FASTQ files created.
--GA4GH_CLIENT_SECRETS | client_secrets.json | Google Genomics API client_secrets.json file path.
--MAX_RECORDS_IN_RAM | 500000 | When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
--QUIET | false | Whether to suppress job-summary info on System.err.
--REFERENCE_SEQUENCE -R | null | Reference sequence file.
--TMP_DIR | [] | One or more directories with space available to be used by this program for temporary storage of working files
--USE_JDK_DEFLATER -use_jdk_deflater | false | Use the JDK Deflater instead of the Intel Deflater for writing compressed output
--USE_JDK_INFLATER -use_jdk_inflater | false | Use the JDK Inflater instead of the Intel Inflater for reading compressed input
--VALIDATION_STRINGENCY | STRICT | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
--VERBOSITY | INFO | Control verbosity of logging.
Advanced Arguments | |
--showHidden | false | display hidden arguments
Argument details
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
--arguments_file / NA
read one or more arguments files and add them to the command line
List[File] []
--COMPRESSION_LEVEL / NA
Compression level for all compressed files created (e.g. BAM and VCF).
int 5 [ [ -∞ ∞ ] ]
--CREATE_INDEX / NA
Whether to create a BAM index when writing a coordinate-sorted BAM file.
Boolean false
--CREATE_MD5_FILE / NA
Whether to create an MD5 digest for any BAM or FASTQ files created.
boolean false
--GA4GH_CLIENT_SECRETS / NA
Google Genomics API client_secrets.json file path.
String client_secrets.json
--help / -h
display the help message
boolean false
--INPUT / -I
Input file (BAM or SAM or a GA4GH url).
R String null
--MAX_RECORDS_IN_RAM / NA
When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
Integer 500000 [ [ -∞ ∞ ] ]
--OUTPUT / -O
Output file (BAM or SAM).
R File null
--QUIET / NA
Whether to suppress job-summary info on System.err.
Boolean false
--REFERENCE_SEQUENCE / -R
Reference sequence file.
File null
--RGCN / -CN
Read-Group sequencing center name
String null
--RGDS / -DS
Read-Group description
String null
--RGDT / -DT
Read-Group run date
Iso8601Date null
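The run date is typed as an ISO 8601 date. As a hedged illustration, the sketch below formats a timezone-aware Python `datetime` as an ISO 8601 date-time string that could be passed as the RGDT value. The helper name `format_run_date` and the sample timestamp are hypothetical, and which ISO 8601 variants the tool's parser accepts is not stated on this page, so treat the exact layout as an assumption to verify.

```python
from datetime import datetime, timedelta, timezone


def format_run_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS+hhmm."""
    if moment.tzinfo is None:
        raise ValueError("run date must carry a timezone")
    return moment.strftime("%Y-%m-%dT%H:%M:%S%z")


if __name__ == "__main__":
    # Arbitrary sample run date at UTC-05:00.
    run_date = datetime(2019, 12, 5, 9, 51, 56,
                        tzinfo=timezone(timedelta(hours=-5)))
    print("RGDT=" + format_run_date(run_date))
```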
--RGFO / -FO
Read-Group flow order
String null
--RGID / -ID
Read-Group ID
String 1
--RGKS / -KS
Read-Group key sequence
String null
--RGLB / -LB
Read-Group library
R String null
--RGPG / -PG
Read-Group program group
String null
--RGPI / -PI
Read-Group predicted insert size
Integer null
--RGPL / -PL
Read-Group platform (e.g. ILLUMINA, SOLID)
R String null
--RGPM / -PM
Read-Group platform model
String null
--RGPU / -PU
Read-Group platform unit (e.g. run barcode)
R String null
--RGSM / -SM
Read-Group sample name
R String null
--showHidden / -showHidden
display hidden arguments
boolean false
--SORT_ORDER / -SO
Optional sort order to output in. If not supplied OUTPUT is in the same order as INPUT.
The --SORT_ORDER argument is an enumerated type (SortOrder), which can have one of the following values:
- unsorted
- queryname
- coordinate
- duplicate
- unknown
SortOrder null
--TMP_DIR / NA
One or more directories with space available to be used by this program for temporary storage of working files
List[File] []
--USE_JDK_DEFLATER / -use_jdk_deflater
Use the JDK Deflater instead of the Intel Deflater for writing compressed output
Boolean false
--USE_JDK_INFLATER / -use_jdk_inflater
Use the JDK Inflater instead of the Intel Inflater for reading compressed input
Boolean false
--VALIDATION_STRINGENCY / NA
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values:
- STRICT
- LENIENT
- SILENT
ValidationStringency STRICT
--VERBOSITY / NA
Control verbosity of logging.
The --VERBOSITY argument is an enumerated type (LogLevel), which can have one of the following values:
- ERROR
- WARNING
- INFO
- DEBUG
LogLevel INFO
--version / NA
display the version number for this tool
boolean false
GATK version 4.1.4.1 built at Thu, 5 Dec 2019 09:51:56 -0500.
4 comments
Hi, the link for GATK dictionary entry is missing.
AddOrReplaceReadGroups doesn't add an @PG program line for the call to this program. MarkDuplicates does. Is there a list of which Picard tools add @PG line for metadata processing and which don't?
Hi, could you tell me how I can get a read-group library? I cannot find it.
Thank you. We are trying to run GermlineCNVCaller and we keep getting this error:
Command: gatk --java-options -Xmx4g GermlineCNVCaller -I input.bam -O output.bam --output-prefix gatk --run-mode COHORT --intervals targets.bed --contig-ploidy-calls 2 -- interval-merging-rule OVERLAPPING_ONLY
Error: java.lang.IllegalArgumentException: The collection is empty: The input header does not contain any read groups. Cannot determine a sample name.
We've checked our BAMs and confirmed with SAMtools that there is a header:
Command: samtools view -H Test.bam | grep '^@RG' | less -S
Output: @RG ID:foo LB:bar PL:illumina SM:Sample1 PU:7
We re-ran AddOrReplaceReadGroups as explained in this article, but the error persists. Any feedback would be much appreciated. Thanks!