Train a CNN model for filtering variants
Category Variant Filtering
Overview
Train a Convolutional Neural Network (CNN) for filtering variants. This tool requires training data generated by CNNVariantWriteTensors.
Inputs
- input-tensor-dir The training data created by CNNVariantWriteTensors.
- The --tensor-type argument determines what types of tensors the model will expect. Set it to "reference" for 1D tensors or "read_tensor" for 2D tensors.
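Both inputs can be checked before a training run is started. The following is a minimal sketch and not part of GATK: the helper name `check_train_inputs` is hypothetical, and it assumes only what is stated above, namely that the tensor directory must already exist and that the tensor type is either `reference` (1D) or `read_tensor` (2D).

```shell
# Hypothetical helper (not part of GATK): validate inputs before training.
# Usage: check_train_inputs <tensor_dir> <tensor_type>
check_train_inputs() {
    tensor_dir=$1
    tensor_type=$2

    # The training tensors must already exist on disk.
    if [ ! -d "$tensor_dir" ]; then
        echo "error: tensor directory not found: $tensor_dir" >&2
        return 1
    fi

    # Only the two documented tensor types are accepted.
    case "$tensor_type" in
        reference)   echo "1D" ;;
        read_tensor) echo "2D" ;;
        *)
            echo "error: unknown tensor type: $tensor_type" >&2
            return 1
            ;;
    esac
}
```

Calling `check_train_inputs my_tensor_folder read_tensor` prints the tensor dimensionality when the directory exists, and returns a non-zero status otherwise.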
Outputs
- output-dir The model weights file and semantic configuration JSON are saved here. This defaults to the current working directory.
- model-name The name for your model.
Usage example
Train a 1D CNN on Reference Tensors
gatk CNNVariantTrain \
    --tensor-type reference \
    --input-tensor-dir my_tensor_folder \
    --model-name my_1d_model
Train a 2D CNN on Read Tensors
gatk CNNVariantTrain \
    --input-tensor-dir my_tensor_folder \
    --tensor-type read_tensor \
    --model-name my_2d_model
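The training schedule and output locations can be set in the same command. The sketch below is a dry run: it only prints the command unless `DRY_RUN=0` is set. The directory names (`my_tensor_folder`, `my_models`, `my_plots`), the function name `train_2d_model`, and the step counts are placeholder values chosen for illustration; every flag is taken from the argument table below.

```shell
# Placeholder values; adjust to your own data.
TENSOR_DIR="my_tensor_folder"
OUTPUT_DIR="my_models"
IMAGE_DIR="my_plots"

train_2d_model() {
    # Build the command as positional parameters so paths with spaces survive.
    set -- gatk CNNVariantTrain \
        --input-tensor-dir "$TENSOR_DIR" \
        --tensor-type read_tensor \
        --model-name my_2d_model \
        --output-dir "$OUTPUT_DIR" \
        --image-dir "$IMAGE_DIR" \
        --epochs 20 \
        --training-steps 50 \
        --validation-steps 5

    # Always show the command that would be run.
    echo "$*"

    # Execute it only when DRY_RUN=0 is set in the environment.
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

train_2d_model
```

Printing the command first makes it easy to confirm the flags before committing to a long training run.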
CNNVariantTrain specific arguments
This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.
Argument name(s) | Default value | Summary
---|---|---
Required Arguments | |
--input-tensor-dir | null | Directory containing the training tensors.
Optional Tool Arguments | |
--arguments_file | [] | read one or more arguments files and add them to the command line
--epochs | 10 | Maximum number of training epochs.
--gcs-max-retries -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--gcs-project-for-requester-pays | "" | Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.
--help -h | false | display the help message
--image-dir | null | Path where plots and figures are saved.
--model-name | variant_filter_model | Name of the model to be trained.
--output-dir | ./ | Directory where models will be saved; defaults to the current working directory.
--tensor-type | reference | Type of tensors to train on: reference for 1D reference tensors, read_tensor for 2D read tensors.
--training-steps | 10 | Number of training steps per epoch.
--validation-steps | 2 | Number of validation steps per epoch.
--version | false | display the version number for this tool
Optional Common Arguments | |
--gatk-config-file | null | A configuration file to use with the GATK.
--QUIET | false | Whether to suppress job-summary info on System.err.
--tmp-dir | null | Temp directory to use.
--use-jdk-deflater -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity | INFO | Control verbosity of logging.
Advanced Arguments | |
--annotation-set | best_practices | Which set of annotations to use.
--channels-last | true | Store the channels in the last axis of tensors (true for TensorFlow, false for Theano).
--showHidden | false | display hidden arguments
Argument details
Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see the Optional Common Arguments in the table above.
--annotation-set / -annotation-set
Which set of annotations to use.
String best_practices
--arguments_file / NA
read one or more arguments files and add them to the command line
List[File] []
--channels-last / -channels-last
Store the channels in the last axis of tensors (true for TensorFlow, false for Theano).
boolean true
--epochs / -epochs
Maximum number of training epochs.
int 10 [ [ 0 ∞ ] ]
--gatk-config-file / NA
A configuration file to use with the GATK.
String null
--gcs-max-retries / -gcs-retries
If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
int 20 [ [ -∞ ∞ ] ]
--gcs-project-for-requester-pays / NA
Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed.
String ""
--help / -h
display the help message
boolean false
--image-dir / -image-dir
Path where plots and figures are saved.
String null
--input-tensor-dir / -input-tensor-dir
Directory containing the training tensors created by CNNVariantWriteTensors.
R String null
--model-name / -model-name
Name of the model to be trained.
String variant_filter_model
--output-dir / -output-dir
Directory where models will be saved; defaults to the current working directory.
String ./
--QUIET / NA
Whether to suppress job-summary info on System.err.
Boolean false
--showHidden / -showHidden
display hidden arguments
boolean false
--tensor-type / -tensor-type
Type of tensors to train on: reference for 1D reference tensors, read_tensor for 2D read tensors.
The --tensor-type argument is an enumerated type (TensorType), which can have one of the following values:
- reference
- read_tensor
TensorType reference
--tmp-dir / NA
Temp directory to use.
String null
--training-steps / -training-steps
Number of training steps per epoch.
int 10 [ [ 0 ∞ ] ]
--use-jdk-deflater / -jdk-deflater
Whether to use the JdkDeflater (as opposed to IntelDeflater)
boolean false
--use-jdk-inflater / -jdk-inflater
Whether to use the JdkInflater (as opposed to IntelInflater)
boolean false
--validation-steps / -validation-steps
Number of validation steps per epoch.
int 2 [ [ 0 ∞ ] ]
--verbosity / -verbosity
Control verbosity of logging.
The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:
- ERROR
- WARNING
- INFO
- DEBUG
LogLevel INFO
--version / NA
display the version number for this tool
boolean false
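Because --epochs is a maximum and the step arguments are counted per epoch, the three values together bound the length of a run. As a small worked example using the default values listed above (the variable names are illustrative only):

```shell
# Defaults taken from the argument list above.
EPOCHS=10
TRAINING_STEPS=10
VALIDATION_STEPS=2

# --epochs is a maximum, so these products are upper bounds.
MAX_TRAINING_STEPS=$((EPOCHS * TRAINING_STEPS))
MAX_VALIDATION_STEPS=$((EPOCHS * VALIDATION_STEPS))

echo "at most $MAX_TRAINING_STEPS training steps"
echo "at most $MAX_VALIDATION_STEPS validation steps"
```

With the defaults this gives at most 100 training steps and 20 validation steps per run.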
GATK version 4.0.10.0 built at 25-30-2019 05:30:15.