Denoises read counts to produce denoised copy ratios
Category: Copy Number Variant Discovery
Overview
Denoises read counts to produce denoised copy ratios. Typically, a panel of normals produced by CreateReadCountPanelOfNormals is provided as input. The input counts are then standardized by 1) transforming to fractional coverage, 2) performing optional explicit GC-bias correction (if the panel contains GC-content annotated intervals), 3) filtering intervals to those contained in the panel, 4) dividing by interval medians contained in the panel, 5) dividing by the sample median, and 6) transforming to log2 copy ratio. The result is then denoised by subtracting the projection onto the specified number of principal components from the panel.
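The panel-based steps can be sketched in plain Python. This is an illustration of the arithmetic described above, not the tool's implementation: the function names, the dictionary layout of the panel medians, and the `EPSILON` floor for zero-coverage intervals are all assumptions made for the example, and explicit GC-bias correction is left out.

```python
import math
from statistics import median

# Assumption: a small floor keeps log2 defined for zero-coverage intervals.
EPSILON = 1e-10


def standardize_with_panel(counts, panel_interval_medians):
    """Return (intervals, log2 copy ratios) for one case sample.

    counts: dict mapping interval -> integer read count.
    panel_interval_medians: dict mapping interval -> median fractional
        coverage of that interval across the panel (hypothetical layout).
    """
    total = sum(counts.values())
    # 1) Transform to fractional coverage.
    fractional = {iv: c / total for iv, c in counts.items()}
    # 2) Explicit GC-bias correction is omitted in this sketch.
    # 3) Keep only intervals contained in the panel.
    kept = [iv for iv in counts if iv in panel_interval_medians]
    # 4) Divide by the panel's interval medians.
    ratios = [fractional[iv] / panel_interval_medians[iv] for iv in kept]
    # 5) Divide by the sample median.
    sample_median = median(ratios)
    ratios = [r / sample_median for r in ratios]
    # 6) Transform to log2 copy ratio.
    return kept, [math.log2(max(r, EPSILON)) for r in ratios]


def denoise(standardized, components):
    """Subtract the projection of `standardized` onto each component.

    components: list of orthonormal vectors, each as long as `standardized`.
    """
    denoised = list(standardized)
    for pc in components:
        coefficient = sum(x * v for x, v in zip(standardized, pc))
        denoised = [d - coefficient * v for d, v in zip(denoised, pc)]
    return denoised
```

Passing fewer vectors in `components` removes less of the panel-derived structure, which is the sense in which the number of principal components controls the amount of denoising.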
If no panel is provided, then the input counts are instead standardized by 1) transforming to fractional coverage, 2) performing optional explicit GC-bias correction (if GC-content annotated intervals are provided), 3) dividing by the sample median, and 4) transforming to log2 copy ratio. No denoising is performed, so the denoised result is simply taken to be identical to the standardized result.
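For comparison, the no-panel path reduces to three arithmetic steps. The sketch below is illustrative only: the function name is hypothetical, GC-bias correction is omitted, and it assumes every interval has a positive count so that log2 is defined.

```python
import math
from statistics import median


def standardize_without_panel(counts):
    """Return (standardized, denoised) log2 copy ratios for a list of counts.

    Assumes all counts are positive; GC-bias correction is omitted.
    """
    total = sum(counts)
    # 1) Transform to fractional coverage.
    fractional = [c / total for c in counts]
    # 3) Divide by the sample median.
    sample_median = median(fractional)
    # 4) Transform to log2 copy ratio.
    standardized = [math.log2(f / sample_median) for f in fractional]
    # No denoising without a panel: the denoised result is a copy.
    denoised = list(standardized)
    return standardized, denoised
```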
If performed, explicit GC-bias correction is done by GCBiasCorrector.
Note that the number of principal components from the input panel used for denoising is set by number-of-eigensamples; if fewer are available in the panel, then all of them will be used. This parameter can thus be used to control the amount of denoising, which will ultimately affect the sensitivity of the analysis.
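The selection rule for the number of components can be written as a one-line helper. The function name is hypothetical; the behavior follows the description of number-of-eigensamples on this page (use all available components when the argument is unset or larger than what the panel holds).

```python
def effective_eigensamples(requested, available):
    """Number of principal components actually used for denoising.

    requested: value of --number-of-eigensamples, or None if unset.
    available: number of eigensamples stored in the panel.
    """
    if requested is None:
        return available
    return min(requested, available)
```

For example, requesting 20 components from a panel that holds 12 yields 12, while requesting 5 yields 5.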
See comments for CreateReadCountPanelOfNormals regarding coverage on sex chromosomes. If sex chromosomes are not excluded from coverage collection, it is strongly recommended that case samples are denoised only with panels containing only individuals of the same sex as the case samples.
Inputs
- Counts TSV or HDF5 file from CollectReadCounts.
- (Optional) Panel-of-normals from CreateReadCountPanelOfNormals. If provided, it will be used to standardize and denoise the input counts. This may include explicit GC-bias correction if annotated intervals were used to create the panel.
- (Optional) GC-content annotated-intervals from AnnotateIntervals. This can be provided in place of a panel of normals to perform explicit GC-bias correction.
Outputs
- Standardized-copy-ratios file. This is a tab-separated values (TSV) file with a SAM-style header containing a read group sample name, a sequence dictionary, a row specifying the column headers contained in CopyRatioCollection.CopyRatioTableColumn, and the corresponding entry rows.
- Denoised-copy-ratios file. This is a tab-separated values (TSV) file with a SAM-style header containing a read group sample name, a sequence dictionary, a row specifying the column headers contained in CopyRatioCollection.CopyRatioTableColumn, and the corresponding entry rows.
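Both output files share the same layout, so one reader covers them. The sketch below separates the SAM-style header lines (those starting with `@`) from the tab-separated table. The function name is hypothetical, and the column names used in the usage note (`CONTIG`, `START`, `END`, `LOG2_COPY_RATIO`) are an assumption about CopyRatioCollection.CopyRatioTableColumn; check them against the header row of a real file.

```python
import csv


def read_copy_ratios(handle):
    """Split a copy-ratio TSV into SAM-style header lines and table rows.

    handle: any iterable of text lines, such as an open file.
    Returns (header_lines, rows), where rows is a list of dicts keyed
    by the column-header row.
    """
    header_lines = []
    table_lines = []
    for line in handle:
        if line.startswith("@"):
            header_lines.append(line.rstrip("\n"))
        elif line.strip():
            table_lines.append(line)
    # The first non-header line holds the column names.
    rows = list(csv.DictReader(table_lines, delimiter="\t"))
    return header_lines, rows
```

Typical use would be `with open("sample.denoisedCR.tsv") as f: header, rows = read_copy_ratios(f)`, followed by converting the copy-ratio column to `float`.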
Usage examples
gatk DenoiseReadCounts \
    -I sample.counts.hdf5 \
    --count-panel-of-normals panel_of_normals.pon.hdf5 \
    --standardized-copy-ratios sample.standardizedCR.tsv \
    --denoised-copy-ratios sample.denoisedCR.tsv

gatk DenoiseReadCounts \
    -I sample.counts.hdf5 \
    --annotated-intervals annotated_intervals.tsv \
    --standardized-copy-ratios sample.standardizedCR.tsv \
    --denoised-copy-ratios sample.denoisedCR.tsv

gatk DenoiseReadCounts \
    -I sample.counts.hdf5 \
    --standardized-copy-ratios sample.standardizedCR.tsv \
    --denoised-copy-ratios sample.denoisedCR.tsv
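When the tool is driven from a script, the three invocation patterns above can be assembled by a small helper. This is a sketch: the function and its parameter names are hypothetical, and it uses only flags listed on this page. Because annotated intervals are ignored when a panel is supplied, the helper passes them only when no panel is given.

```python
def build_denoise_command(input_counts, standardized_out, denoised_out,
                          panel=None, annotated_intervals=None,
                          number_of_eigensamples=None):
    """Build an argument list for `gatk DenoiseReadCounts`."""
    command = ["gatk", "DenoiseReadCounts", "-I", input_counts]
    if panel is not None:
        command += ["--count-panel-of-normals", panel]
    elif annotated_intervals is not None:
        # Only meaningful without a panel; the tool ignores it otherwise.
        command += ["--annotated-intervals", annotated_intervals]
    if number_of_eigensamples is not None:
        command += ["--number-of-eigensamples", str(number_of_eigensamples)]
    command += ["--standardized-copy-ratios", standardized_out,
                "--denoised-copy-ratios", denoised_out]
    return command
```

The returned list can be handed to `subprocess.run(command, check=True)` on a machine where `gatk` is on the PATH.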
DenoiseReadCounts specific arguments
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.
Argument name(s) | Default value | Summary
---|---|---
Required Arguments | |
--denoised-copy-ratios | null | Output file for denoised copy ratios.
--input / -I | null | Input TSV or HDF5 file containing integer read counts in genomic intervals for a single case sample (output of CollectReadCounts).
--standardized-copy-ratios | null | Output file for standardized copy ratios. GC-bias correction will be performed if annotations for GC content are provided.
Optional Tool Arguments | |
--annotated-intervals | null | Input file containing annotations for GC content in genomic intervals (output of AnnotateIntervals). Intervals must be identical to and in the same order as those in the input read-counts file. If a panel of normals is provided, this input will be ignored.
--arguments_file | [] | read one or more arguments files and add them to the command line
--count-panel-of-normals | null | Input HDF5 file containing the panel of normals (output of CreateReadCountPanelOfNormals).
--gcs-max-retries / -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--gcs-project-for-requester-pays | "" | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.
--help / -h | false | display the help message
--number-of-eigensamples | null | Number of eigensamples to use for denoising. If not specified or if the number of eigensamples available in the panel of normals is smaller than this, all eigensamples will be used.
--version | false | display the version number for this tool
Optional Common Arguments | |
--gatk-config-file | null | A configuration file to use with the GATK.
--QUIET | false | Whether to suppress job-summary info on System.err.
--tmp-dir | null | Temp directory to use.
--use-jdk-deflater / -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater / -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity | INFO | Control verbosity of logging.
Advanced Arguments | |
--showHidden | false | display hidden arguments
Argument details
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
--annotated-intervals / NA
Input file containing annotations for GC content in genomic intervals (output of AnnotateIntervals). Intervals must be identical to and in the same order as those in the input read-counts file. If a panel of normals is provided, this input will be ignored.
File null
--arguments_file / NA
read one or more arguments files and add them to the command line
List[File] []
--count-panel-of-normals / NA
Input HDF5 file containing the panel of normals (output of CreateReadCountPanelOfNormals).
File null
--denoised-copy-ratios / NA
Output file for denoised copy ratios.
R File null
--gatk-config-file / NA
A configuration file to use with the GATK.
String null
--gcs-max-retries / -gcs-retries
If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
int 20 [ [ -∞ ∞ ] ]
--gcs-project-for-requester-pays / NA
Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.
String ""
--help / -h
display the help message
boolean false
--input / -I
Input TSV or HDF5 file containing integer read counts in genomic intervals for a single case sample (output of CollectReadCounts).
R File null
--number-of-eigensamples / NA
Number of eigensamples to use for denoising. If not specified or if the number of eigensamples available in the panel of normals is smaller than this, all eigensamples will be used.
Integer null
--QUIET / NA
Whether to suppress job-summary info on System.err.
Boolean false
--showHidden / -showHidden
display hidden arguments
boolean false
--standardized-copy-ratios / NA
Output file for standardized copy ratios. GC-bias correction will be performed if annotations for GC content are provided.
R File null
--tmp-dir / NA
Temp directory to use.
GATKPathSpecifier null
--use-jdk-deflater / -jdk-deflater
Whether to use the JdkDeflater (as opposed to IntelDeflater)
boolean false
--use-jdk-inflater / -jdk-inflater
Whether to use the JdkInflater (as opposed to IntelInflater)
boolean false
--verbosity / -verbosity
Control verbosity of logging.
The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:
- ERROR
- WARNING
- INFO
- DEBUG
LogLevel INFO
--version / NA
display the version number for this tool
boolean false
GATK version 4.1.6.0-SNAPSHOT built at Thu, 2 Apr 2020 14:54:17 -0400.