Calculate the fraction of reads coming from cross-sample contamination
Category Diagnostics and Quality Control
Overview
Calculates the fraction of reads coming from cross-sample contamination, given results from GetPileupSummaries. The resulting contamination table is used with FilterMutectCalls.
This tool is featured in the Somatic Short Mutation calling Best Practice Workflow. See Tutorial#11136 for a step-by-step description of the workflow and Article#11127 for an overview of what traditional somatic calling entails. For the latest pipeline scripts, see the Mutect2 WDL scripts directory.
This tool borrows from ContEst by Cibulskis et al. the idea of estimating contamination from reference reads at homozygous-alternate sites. However, ContEst uses a probabilistic model that assumes a diploid genotype with no copy number variation and independent contaminating reads. That is, ContEst assumes that each contaminating read is drawn randomly and independently from a different human. This tool uses a simpler estimate of contamination that relaxes these assumptions. In particular, it works in the presence of copy number variations and with an arbitrary number of contaminating samples. In addition, this tool is designed to work well without matched normal data. However, one can run GetPileupSummaries on a matched normal BAM file and input the result to this tool.
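To make the underlying idea concrete, here is a simplified back-of-the-envelope version of such an estimate. It is an illustration under stated assumptions, not the tool's exact estimator: it ignores sequencing error, and the symbols $c$, $r_i$, $d_i$ and $f_i$ are introduced here for the example only. Suppose site $i$ is homozygous alternate in the sample, has total depth $d_i$ with $r_i$ reference reads, and the alternate allele has population frequency $f_i$. A contaminating read then shows the reference allele with probability $1 - f_i$, which gives:

```latex
% Expected reference reads at a hom-alt site under contamination fraction c
\mathbb{E}[r_i] \approx c \,(1 - f_i)\, d_i
\quad\Longrightarrow\quad
\hat{c} \approx \frac{\sum_i r_i}{\sum_i (1 - f_i)\, d_i}

% Worked example (hypothetical numbers): 100 sites, each with d_i = 100 and
% f_i = 0.2, and 160 reference reads in total
\hat{c} \approx \frac{160}{0.8 \times 100 \times 100} = \frac{160}{8000} = 0.02
```

Because the estimate pools reads across sites rather than modelling each contaminating read separately, it does not depend on how many individuals contributed the contaminating reads.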
Usage examples
Tumor-only mode
gatk CalculateContamination \
    -I pileups.table \
    -O contamination.table
Matched normal mode
gatk CalculateContamination \
    -I tumor-pileups.table \
    -matched normal-pileups.table \
    -O contamination.table
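The two modes differ only in whether -matched is supplied, so a wrapper script can assemble the command conditionally. The sketch below is a dry run that prints the command rather than executing it; it uses only flags listed in the argument table on this page (-I, -O, -matched, -segments). All file names are hypothetical placeholders, and the variable names are chosen for this example.

```shell
# Hypothetical file names; replace them with your own paths.
tumor_pileups="tumor-pileups.table"
normal_pileups="normal-pileups.table"   # set to "" for tumor-only mode
segments_out="segments.table"           # set to "" to skip the segmentation output

# Build the argument list in the positional parameters (POSIX sh compatible).
set -- gatk CalculateContamination -I "$tumor_pileups" -O contamination.table
if [ -n "$normal_pileups" ]; then
    set -- "$@" -matched "$normal_pileups"
fi
if [ -n "$segments_out" ]; then
    set -- "$@" -segments "$segments_out"
fi

# Dry run: show the assembled command instead of executing it.
cmd_line="$*"
echo "$cmd_line"
```

To execute the command instead of printing it, replace the last two lines with `"$@"`.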
The resulting table provides the contamination fraction, one line per sample, as two tab-separated fields: the sample ID followed by the contamination estimate. The file has no header.
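A table in this layout is easy to post-process with standard tools. The sketch below flags samples whose estimate exceeds a cutoff. It assumes the two-column, tab-separated layout described above; the sample names, values and the 0.05 cutoff are made up for illustration and are not GATK defaults. Lines whose second field is not numeric are skipped, so a header line would be ignored if your GATK version writes one.

```shell
# Hypothetical stand-in for a real contamination.table:
# sample ID, a tab, then the contamination fraction.
table="$(mktemp)"
printf 'tumor_A\t0.0123\n' > "$table"
printf 'tumor_B\t0.0871\n' >> "$table"

threshold=0.05   # hypothetical cutoff chosen for this example

# Print the sample ID of every row whose numeric second field exceeds the cutoff.
flagged="$(awk -F '\t' -v t="$threshold" \
    '$2 ~ /^[0-9.eE+-]+$/ && $2 + 0 > t + 0 { print $1 }' "$table")"

echo "Samples above ${threshold}: ${flagged}"
rm -f "$table"
```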
CalculateContamination specific arguments
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the Argument details list below the table.
Argument name(s) | Default value | Summary
---|---|---
**Required Arguments** | |
--input -I | | The input table
--output -O | | The output table
**Optional Tool Arguments** | |
--arguments_file | | read one or more arguments files and add them to the command line
--gcs-max-retries -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--gcs-project-for-requester-pays | | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed.
--help -h | false | display the help message
--high-coverage-ratio-threshold | 3.0 | The maximum coverage relative to the mean.
--low-coverage-ratio-threshold | 0.5 | The minimum coverage relative to the median.
--matched-normal -matched | | The matched normal input table
--tumor-segmentation -segments | | The output table containing segmentation of the tumor by minor allele fraction
--version | false | display the version number for this tool
**Optional Common Arguments** | |
--gatk-config-file | | A configuration file to use with the GATK.
--QUIET | false | Whether to suppress job-summary info on System.err.
--tmp-dir | | Temp directory to use.
--use-jdk-deflater -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity | INFO | Control verbosity of logging.
**Advanced Arguments** | |
--showHidden | false | display hidden arguments
Argument details
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.
--arguments_file
read one or more arguments files and add them to the command line
List[File] []
--gatk-config-file
A configuration file to use with the GATK.
String null
--gcs-max-retries / -gcs-retries
If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
int 20 [ [ -∞ ∞ ] ]
--gcs-project-for-requester-pays
Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed.
String ""
--help / -h
display the help message
boolean false
--high-coverage-ratio-threshold
The maximum coverage relative to the mean.
double 3.0 [ [ -∞ ∞ ] ]
--input / -I
The input table
R File null
--low-coverage-ratio-threshold
The minimum coverage relative to the median.
double 0.5 [ [ -∞ ∞ ] ]
--matched-normal / -matched
The matched normal input table
File null
--output / -O
The output table
R File null
--QUIET
Whether to suppress job-summary info on System.err.
Boolean false
--showHidden / -showHidden
display hidden arguments
boolean false
--tmp-dir
Temp directory to use.
GATKPath null
--tumor-segmentation / -segments
The output table containing segmentation of the tumor by minor allele fraction
File null
--use-jdk-deflater / -jdk-deflater
Whether to use the JdkDeflater (as opposed to IntelDeflater)
boolean false
--use-jdk-inflater / -jdk-inflater
Whether to use the JdkInflater (as opposed to IntelInflater)
boolean false
--verbosity / -verbosity
Control verbosity of logging.
The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:
- ERROR
- WARNING
- INFO
- DEBUG
LogLevel INFO
--version
display the version number for this tool
boolean false
GATK version 4.2.5.0-SNAPSHOT built at Mon, 7 Feb 2022 11:18:01 -0500.