Converts a FASTQ file to an unaligned BAM or SAM file.
Output read records contain the original base calls, and the quality scores are translated according to the input's base quality score encoding: FastqSanger, FastqSolexa, or FastqIllumina.
Additional arguments supply values for SAM header and read attributes that are not present in FASTQ (e.g. see RG or SM below).
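For the two offset-based encodings, translating a quality score amounts to shifting each quality character by a fixed amount. The sketch below is a standalone shell helper, not something Picard exposes (the name phred64_to_phred33 is hypothetical): it rewrites a Phred+64 (FastqIllumina) quality string as Phred+33 (FastqSanger), the encoding of the QUAL field in SAM text files. It assumes plain Phred+64 input and does not cover FastqSolexa qualities.

```sh
#!/bin/sh
# Hypothetical helper, not part of Picard: convert a Phred+64 quality
# string to Phred+33. Each character moves down by 31 ASCII codes, so
# '@' (64, Q0) becomes '!' (33, Q0) and 'h' (104, Q40) becomes 'I' (73, Q40).
# Characters below '@' are not valid Phred+64 and are passed through unchanged.
phred64_to_phred33() {
    printf '%s\n' "$1" | LC_ALL=C tr '@-~' '!-_'
}

phred64_to_phred33 'hhhh@'   # prints IIII!
```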
Inputs
One FASTQ file for single-end data, or two for paired-end data. Input files may be gzip-compressed, in which case the file name should end with ".gz".
Alternatively, for larger inputs you can provide a series of FASTQ files numbered by a file name suffix (see USE_SEQUENTIAL_FASTQS below for details).
By default, this tool tries to guess the base quality score encoding. However, you can specify it explicitly using the QUALITY_FORMAT argument.
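Picard's own detection logic is not described on this page. As a rough illustration of how such a guess can work, the sketch below applies a common heuristic (not Picard's algorithm; guess_quality_format is a hypothetical name) that looks at the lowest quality character in the file: only Phred+33 data can contain characters below ';' (ASCII 59), and Phred+64 data never goes below '@' (ASCII 64). A Phred+33 file in which every quality is Q26 or higher can therefore be misclassified, which is the kind of case where setting QUALITY_FORMAT explicitly helps.

```sh
#!/bin/sh
# Hypothetical heuristic, not Picard's detector: guess the quality encoding
# of an uncompressed FASTQ file from the lowest quality character it contains.
# Assumes 4-line records (no wrapped sequence or quality lines).
guess_quality_format() {
    LC_ALL=C awk '
        BEGIN {
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i  # char -> ASCII code
            min = 127
        }
        NR % 4 == 0 {                          # every 4th line is a quality line
            for (i = 1; i <= length($0); i++) {
                c = ord[substr($0, i, 1)]
                if (c < min) min = c
            }
        }
        END {
            if (min < 59)      print "Standard"  # below ";": only Phred+33 goes this low
            else if (min < 64) print "Solexa"    # ";" to "?": not Phred+64, assume Solexa
            else               print "Illumina"  # "@" and above: assume Phred+64
        }' "$1"
}

demo=$(mktemp)
printf '@read1\nACGT\n+\nII#I\n' > "$demo"
guess_quality_format "$demo"   # prints Standard
rm -f "$demo"
```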
Output
A single unaligned BAM or SAM file. By default, the records are sorted by query (read) name.
Usage examples
Example 1:
Single-end sequencing FASTQ file conversion. All reads are annotated as belonging to the "rg0013" read group, which in turn belongs to the sample "sample001".
java -jar picard.jar FastqToSam \
     F1=input_reads.fastq \
     O=unaligned_reads.bam \
     SM=sample001 \
     RG=rg0013
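In SAM terms, the RG value becomes the ID field and the SM value the SM field of an @RG header line. The sketch below is a hypothetical helper, not a Picard feature: it prints the tab-separated line that corresponds to Example 1, optionally with LB and PL fields for LIBRARY_NAME and PLATFORM. Picard writes the real header itself and may order or add fields differently, so treat this only as a picture of where the values end up.

```sh
#!/bin/sh
# Hypothetical helper: build the @RG header line for a read group ID and
# sample name, optionally adding library (LB) and platform (PL) fields.
# SAM header fields are separated by tab characters.
rg_header_line() {
    line=$(printf '@RG\tID:%s\tSM:%s' "$1" "$2")
    if [ -n "${3:-}" ]; then line=$(printf '%s\tLB:%s' "$line" "$3"); fi
    if [ -n "${4:-}" ]; then line=$(printf '%s\tPL:%s' "$line" "$4"); fi
    printf '%s\n' "$line"
}

# Values from Example 1 (RG=rg0013, SM=sample001):
rg_header_line rg0013 sample001
```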
Example 2:
Similar to example 1 above, but for paired-end sequencing.
java -jar picard.jar FastqToSam \
     F1=forward_reads.fastq \
     F2=reverse_reads.fastq \
     O=unaligned_read_pairs.bam \
     SM=sample001 \
     RG=rg0013
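With paired-end input, records in the two files are paired up in order, so the files are expected to contain the same number of records. Before a long conversion it can be worth checking this. The sketch below is a hypothetical pre-flight check, not part of Picard (count_fastq_records and same_record_count are illustrative names); it follows the same ".gz" naming rule for compressed input and assumes four lines per record.

```sh
#!/bin/sh
# Hypothetical pre-flight check, not part of Picard.
# Count FASTQ records, decompressing on the fly when the name ends in .gz.
# Assumes 4 lines per record (no wrapped sequence or quality lines).
count_fastq_records() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { print NR / 4 }'
}

# Exit status 0 only if both mate files hold the same number of records.
same_record_count() {
    [ "$(count_fastq_records "$1")" = "$(count_fastq_records "$2")" ]
}
```

For Example 2 this would be run as: same_record_count forward_reads.fastq reverse_reads.fastq || echo "record counts differ" >&2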
Category: Read Data Manipulation
FastqToSam (Picard) specific arguments
This table summarizes the command-line arguments of this tool. For more details on each argument, see the argument details list below the table.
Argument name(s) | Default value | Summary
---|---|---
Required Arguments | |
--FASTQ / -F1 | null | Input FASTQ file (optionally gzipped) for single-end data, or the first read of paired-end data.
--OUTPUT / -O | null | Output SAM/BAM file.
--SAMPLE_NAME / -SM | null | Sample name to insert into the read group header.
Optional Tool Arguments | |
--ALLOW_AND_IGNORE_EMPTY_LINES | false | Allow (and ignore) empty lines.
--arguments_file | [] | Read one or more arguments files and add them to the command line.
--COMMENT / -CO | [] | Comment(s) to include in the merged output file's header.
--DESCRIPTION / -DS | null | Inserted into the read group header.
--FASTQ2 / -F2 | null | Input FASTQ file (optionally gzipped) for the second read of paired-end data.
--help / -h | false | Display the help message.
--LIBRARY_NAME / -LB | null | The library name to place into the LB attribute in the read group header.
--MAX_Q | 93 | Maximum quality allowed in the input FASTQ. An exception is thrown if a quality is greater than this value.
--MIN_Q | 0 | Minimum quality allowed in the input FASTQ. An exception is thrown if a quality is less than this value.
--PLATFORM / -PL | null | The platform type (e.g. ILLUMINA, SOLID) to insert into the read group header.
--PLATFORM_MODEL / -PM | null | Platform model to insert into the read group header (free-form text providing further details of the platform/technology used).
--PLATFORM_UNIT / -PU | null | The platform unit (often run_barcode.lane) to insert into the read group header.
--PREDICTED_INSERT_SIZE / -PI | null | Predicted median insert size, to insert into the read group header.
--PROGRAM_GROUP / -PG | null | Program group to insert into the read group header.
--QUALITY_FORMAT / -V | null | A value describing how the quality values are encoded in the input FASTQ file: Solexa (phred scaling + 66), Illumina (phred scaling + 64), or Standard (phred scaling + 33). If this value is not specified, the quality format is detected automatically.
--READ_GROUP_NAME / -RG | A | Read group name.
--RUN_DATE / -DT | null | Date the run was produced, to insert into the read group header.
--SEQUENCING_CENTER / -CN | null | The sequencing center from which the data originated.
--SORT_ORDER / -SO | queryname | The sort order for the output SAM/BAM file.
--USE_SEQUENTIAL_FASTQS | false | Use sequential FASTQ files with the suffix _###.fastq or _###.fastq.gz. The files should be named _001., _002., ..., _XYZ., and the base file should be the _001. file. For example, given RUNNAME_S8_L005_R1_001.fastq, RUNNAME_S8_L005_R1_002.fastq, RUNNAME_S8_L005_R1_003.fastq, and RUNNAME_S8_L005_R1_004.fastq, RUNNAME_S8_L005_R1_001.fastq should be provided as FASTQ.
--version | false | Display the version number for this tool.
Optional Common Arguments | |
--COMPRESSION_LEVEL | 5 | Compression level for all compressed files created (e.g. BAM and VCF).
--CREATE_INDEX | false | Whether to create a BAM index when writing a coordinate-sorted BAM file.
--CREATE_MD5_FILE | false | Whether to create an MD5 digest for any BAM or FASTQ files created.
--GA4GH_CLIENT_SECRETS | client_secrets.json | Google Genomics API client_secrets.json file path.
--MAX_RECORDS_IN_RAM | 500000 | When writing files that need to be sorted, the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
--QUIET | false | Whether to suppress job-summary info on System.err.
--REFERENCE_SEQUENCE / -R | null | Reference sequence file.
--TMP_DIR | [] | One or more directories with space available to be used by this program for temporary storage of working files.
--USE_JDK_DEFLATER / -use_jdk_deflater | false | Use the JDK Deflater instead of the Intel Deflater for writing compressed output.
--USE_JDK_INFLATER / -use_jdk_inflater | false | Use the JDK Inflater instead of the Intel Inflater for reading compressed input.
--VALIDATION_STRINGENCY | STRICT | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
--VERBOSITY | INFO | Control verbosity of logging.
Advanced Arguments | |
--showHidden | false | Display hidden arguments.
Deprecated Arguments | |
--STRIP_UNPAIRED_MATE_NUMBER | false | Deprecated (no longer used). If true and this is an unpaired FASTQ, any occurrence of '/1' or '/2' will be removed from the end of a read name.
Argument details
This list describes every argument in the table above, in alphabetical order, including the common arguments that are shared with other Picard tools.
--ALLOW_AND_IGNORE_EMPTY_LINES / NA
Allow (and ignore) empty lines
Boolean false
--arguments_file / NA
read one or more arguments files and add them to the command line
List[File] []
--COMMENT / -CO
Comment(s) to include in the merged output file's header.
List[String] []
--COMPRESSION_LEVEL / NA
Compression level for all compressed files created (e.g. BAM and VCF).
int 5
--CREATE_INDEX / NA
Whether to create a BAM index when writing a coordinate-sorted BAM file.
Boolean false
--CREATE_MD5_FILE / NA
Whether to create an MD5 digest for any BAM or FASTQ files created.
boolean false
--DESCRIPTION / -DS
Inserted into the read group header
String null
--FASTQ / -F1
Input FASTQ file (optionally gzipped) for single-end data, or the first read of paired-end data.
File null (required)
--FASTQ2 / -F2
Input FASTQ file (optionally gzipped) for the second read of paired-end data.
File null
--GA4GH_CLIENT_SECRETS / NA
Google Genomics API client_secrets.json file path.
String client_secrets.json
--help / -h
display the help message
boolean false
--LIBRARY_NAME / -LB
The library name to place into the LB attribute in the read group header
String null
--MAX_Q / NA
Maximum quality allowed in the input fastq. An exception will be thrown if a quality is greater than this value.
int 93
--MAX_RECORDS_IN_RAM / NA
When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
Integer 500000
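As a rough model (an assumption, not Picard's exact bookkeeping), sorting spills one temporary file to disk for every MAX_RECORDS_IN_RAM records, so the number of files the final merge has to keep open is about the record count divided by this value, rounded up. For paired-end input each read pair contributes two records. The hypothetical helper below just does that arithmetic.

```sh
#!/bin/sh
# Hypothetical estimate: temporary spill files created while sorting,
# assuming one file per MAX_RECORDS_IN_RAM records (ceiling division).
# Inputs that fit entirely in RAM may in practice need no spill file at all.
estimate_spill_files() {
    records=$1
    max_in_ram=${2:-500000}      # default MAX_RECORDS_IN_RAM shown above
    echo $(( (records + max_in_ram - 1) / max_in_ram ))
}

# 100 million read pairs = 200 million records
estimate_spill_files 200000000            # prints 400
estimate_spill_files 200000000 2000000    # prints 100
```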
--MIN_Q / NA
Minimum quality allowed in the input fastq. An exception will be thrown if a quality is less than this value.
int 0
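The MAX_Q default of 93 is the highest score a printable Phred+33 character can carry ('~' is ASCII 126, and 126 - 33 = 93). To see in advance whether a Phred+33 file stays within MIN_Q and MAX_Q, the hypothetical helper below (not part of Picard) reports the lowest and highest score it contains; it assumes uncompressed input with four lines per record.

```sh
#!/bin/sh
# Hypothetical check, not part of Picard: print the lowest and highest
# Phred score in an uncompressed Phred+33 FASTQ file, for comparison with
# MIN_Q (default 0) and MAX_Q (default 93). Assumes 4-line records.
phred33_range() {
    LC_ALL=C awk '
        BEGIN {
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i  # char -> ASCII code
            lo = 999; hi = -1
        }
        NR % 4 == 0 {                           # quality line of each record
            for (i = 1; i <= length($0); i++) {
                q = ord[substr($0, i, 1)] - 33  # Phred+33 decoding
                if (q < lo) lo = q
                if (q > hi) hi = q
            }
        }
        END { print lo, hi }' "$1"
}

demo=$(mktemp)
printf '@read1\nACGT\n+\nII#I\n' > "$demo"
phred33_range "$demo"   # prints 2 40
rm -f "$demo"
```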
--OUTPUT / -O
Output SAM/BAM file.
File null (required)
--PLATFORM / -PL
The platform type (e.g. ILLUMINA, SOLID) to insert into the read group header
String null
--PLATFORM_MODEL / -PM
Platform model to insert into the read group header (free-form text providing further details of the platform/technology used)
String null
--PLATFORM_UNIT / -PU
The platform unit (often run_barcode.lane) to insert into the read group header
String null
--PREDICTED_INSERT_SIZE / -PI
Predicted median insert size, to insert into the read group header
Integer null
--PROGRAM_GROUP / -PG
Program group to insert into the read group header.
String null
--QUALITY_FORMAT / -V
A value describing how the quality values are encoded in the input FASTQ file. Either Solexa (phred scaling + 66), Illumina (phred scaling + 64) or Standard (phred scaling + 33). If this value is not specified, the quality format will be detected automatically.
The --QUALITY_FORMAT argument is an enumerated type (FastqQualityFormat), which can have one of the following values:
- Solexa
- Illumina
- Standard
FastqQualityFormat null
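For the two Phred-based offsets listed above (33 for Standard, 64 for Illumina), decoding a quality character c and turning the score into an error probability is a short calculation. The worked example below uses the standard Phred definition; Solexa is left out.

```latex
\begin{aligned}
Q &= \operatorname{ASCII}(c) - \text{offset}, \qquad P_{\text{err}} = 10^{-Q/10} \\
\text{Standard: } c = \texttt{I} &\;\Rightarrow\; Q = 73 - 33 = 40,\quad P_{\text{err}} = 10^{-4} \\
\text{Illumina: } c = \texttt{h} &\;\Rightarrow\; Q = 104 - 64 = 40,\quad P_{\text{err}} = 10^{-4}
\end{aligned}
```

The same character means very different scores under the two encodings: 'I' read as Illumina (Phred+64) would decode to Q9 instead of Q40, which is why a wrong QUALITY_FORMAT matters.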
--QUIET / NA
Whether to suppress job-summary info on System.err.
Boolean false
--READ_GROUP_NAME / -RG
Read group name
String A
--REFERENCE_SEQUENCE / -R
Reference sequence file.
File null
--RUN_DATE / -DT
Date the run was produced, to insert into the read group header
Iso8601Date null
--SAMPLE_NAME / -SM
Sample name to insert into the read group header
String null (required)
--SEQUENCING_CENTER / -CN
The sequencing center from which the data originated
String null
--showHidden / -showHidden
display hidden arguments
boolean false
--SORT_ORDER / -SO
The sort order for the output sam/bam file.
The --SORT_ORDER argument is an enumerated type (SortOrder), which can have one of the following values:
- unsorted
- queryname
- coordinate
- duplicate
- unknown
SortOrder queryname
--STRIP_UNPAIRED_MATE_NUMBER / NA
Deprecated (No longer used). If true and this is an unpaired fastq any occurrence of '/1' or '/2' will be removed from the end of a read name.
Boolean false
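The option is no longer used, but the transformation it described is easy to reproduce outside Picard if needed. The sketch below is a hypothetical stand-in, not the removed Picard code: it drops a trailing /1 or /2 from each header line of an uncompressed FASTQ with four lines per record.

```sh
#!/bin/sh
# Hypothetical stand-in for the deprecated option: remove a trailing /1 or
# /2 from the header (first) line of each 4-line FASTQ record. Header lines
# that carry a comment after the read name are left unchanged.
strip_mate_number() {
    awk 'NR % 4 == 1 { sub(/\/[12]$/, "") } { print }' "$1"
}
```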
--TMP_DIR / NA
One or more directories with space available to be used by this program for temporary storage of working files
List[File] []
--USE_JDK_DEFLATER / -use_jdk_deflater
Use the JDK Deflater instead of the Intel Deflater for writing compressed output
Boolean false
--USE_JDK_INFLATER / -use_jdk_inflater
Use the JDK Inflater instead of the Intel Inflater for reading compressed input
Boolean false
--USE_SEQUENTIAL_FASTQS / NA
Use sequential FASTQ files with the suffix _###.fastq or _###.fastq.gz. The files should be named _001., _002., ..., _XYZ., and the base file should be the _001. file.
For example, given the files:
RUNNAME_S8_L005_R1_001.fastq
RUNNAME_S8_L005_R1_002.fastq
RUNNAME_S8_L005_R1_003.fastq
RUNNAME_S8_L005_R1_004.fastq
RUNNAME_S8_L005_R1_001.fastq should be provided as FASTQ.
boolean false
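To make the naming rule concrete, the hypothetical helper below (not part of Picard) takes the _001 base file that would be passed as FASTQ and lists it together with the files that follow it. It assumes the series ends at the first missing number; Picard's own file discovery may differ in detail.

```sh
#!/bin/sh
# Hypothetical helper, not part of Picard: given the _001 base file that is
# passed as FASTQ, list it and the files that follow it (_002, _003, ...).
# Assumption: the series ends at the first missing number.
list_sequential_fastqs() {
    dir=$(dirname "$1")
    name=$(basename "$1")
    case "$name" in
        *_001.fastq.gz) prefix=${name%_001.fastq.gz}; ext=fastq.gz ;;
        *_001.fastq)    prefix=${name%_001.fastq};    ext=fastq ;;
        *) echo "base file must end in _001.fastq or _001.fastq.gz" >&2; return 1 ;;
    esac
    n=1
    while :; do
        file=$(printf '%s/%s_%03d.%s' "$dir" "$prefix" "$n" "$ext")
        [ -e "$file" ] || break          # stop at the first gap in the numbering
        printf '%s\n' "$file"
        n=$((n + 1))
    done
}
```

Run in the directory holding the four example files, list_sequential_fastqs RUNNAME_S8_L005_R1_001.fastq would print ./RUNNAME_S8_L005_R1_001.fastq through ./RUNNAME_S8_L005_R1_004.fastq.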
--VALIDATION_STRINGENCY / NA
Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values:
- STRICT
- LENIENT
- SILENT
ValidationStringency STRICT
--VERBOSITY / NA
Control verbosity of logging.
The --VERBOSITY argument is an enumerated type (LogLevel), which can have one of the following values:
- ERROR
- WARNING
- INFO
- DEBUG
LogLevel INFO
--version / NA
display the version number for this tool
boolean false
GATK version 4.1.3.0 built at Sat, 23 Nov 2019 16:20:54 -0500.
Comments
I am using Docker on Windows with the latest GATK installed. I am getting errors like these:
no main manifest attribute, in picard.jar
and
Error: Unable to access jarfile picard.jar
Where can I find out how to fix these issues while I try to use FastqToSam?
I assume that, for paired-end data with the --USE_SEQUENTIAL_FASTQS option, the syntax would be something like:
RUNNAME_S8_L005_R1_001.fastq RUNNAME_S8_L005_R2_001.fastq RUNNAME_S8_L005_R1_002.fastq RUNNAME_S8_L005_R2_002.fastq
would that be correct?