Make a panel of normals for use with Mutect2
Category: Variant Filtering
Overview
Create a panel of normals (PoN) containing germline and artifactual sites for use with Mutect2. The tool takes multiple normal-sample callsets produced by Mutect2's tumor-only mode and collates them into a single variant call format (VCF) file of false-positive calls. The PoN captures common artifactual and germline variant sites, and Mutect2 then uses it to filter variants at the site level.
This contrasts with the GATK3 workflow, which uses CombineVariants to retain variant sites called in at least two samples and then uses Picard MakeSitesOnlyVcf to simplify the callset for use as a PoN.
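To make the GATK3 rule concrete, here is a minimal shell sketch that keeps sites called in at least two normal VCFs. It illustrates the counting rule only and is not CombineVariants or CreateSomaticPanelOfNormals. The function name `sites_in_at_least` is hypothetical, the inputs are assumed to be gzip- or bgzip-compressed VCFs, a site is assumed to be identified by CHROM, POS, REF and ALT, and a multi-allelic ALT field is compared as a single string.

```shell
# Hypothetical helper, for illustration only: print the sites (CHROM, POS, REF, ALT)
# that appear in at least MIN of the given compressed VCFs.
# Usage: sites_in_at_least MIN normal1_for_pon.vcf.gz normal2_for_pon.vcf.gz ...
sites_in_at_least() {
  min=$1
  shift
  for vcf in "$@"; do
    # gzip -dc also reads bgzip output. Skip header lines and de-duplicate
    # within each file so a VCF counts a given site at most once.
    gzip -dc "$vcf" | awk -F'\t' '!/^#/ { print $1 "\t" $2 "\t" $4 "\t" $5 }' | sort -u
  done | sort | uniq -c | awk -v min="$min" '$1 >= min { print $2 "\t" $3 "\t" $4 "\t" $5 }'
}
```

For example, `sites_in_at_least 2 normal1_for_pon.vcf.gz normal2_for_pon.vcf.gz normal3_for_pon.vcf.gz` prints one tab-separated line per site shared by two or more of the normals.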
Examples
Step 1. Run Mutect2 in tumor-only mode for each normal sample.
gatk Mutect2 \
    -R ref_fasta.fa \
    -I normal1.bam \
    -tumor normal1_sample_name \
    --germline-resource af-only-gnomad.vcf.gz \
    -O normal1_for_pon.vcf.gz
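With many normals, Step 1 is usually run once per sample. A minimal sketch, assuming a hypothetical tab-separated manifest with two columns (BAM path, sample name) and a hypothetical `DRY_RUN` switch that prints the commands instead of running them. Each output is named after its BAM, so normal1.bam produces normal1_for_pon.vcf.gz as in the command above.

```shell
# Hypothetical helper: run Mutect2 in tumor-only mode for each normal in a
# two-column, tab-separated manifest (BAM path, sample name).
# Set DRY_RUN=1 to print each command instead of running it.
# Usage: call_normals_for_pon manifest.tsv ref_fasta.fa af-only-gnomad.vcf.gz
call_normals_for_pon() {
  manifest=$1
  ref=$2
  gnomad=$3
  tab=$(printf '\t')
  # "|| [ -n "$bam" ]" also handles a final line without a trailing newline.
  while IFS="$tab" read -r bam sample || [ -n "$bam" ]; do
    [ -n "$bam" ] || continue  # skip blank lines
    out="$(basename "$bam" .bam)_for_pon.vcf.gz"
    set -- gatk Mutect2 -R "$ref" -I "$bam" -tumor "$sample" \
      --germline-resource "$gnomad" -O "$out"
    # </dev/null keeps the tool from reading the rest of the manifest on stdin.
    if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@" < /dev/null; fi
  done < "$manifest"
}
```

Running `DRY_RUN=1 call_normals_for_pon manifest.tsv ref_fasta.fa af-only-gnomad.vcf.gz` first is a cheap way to check the generated commands before launching the real jobs.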
Step 2. Create a file with the .args extension that lists the paths to the VCFs from step 1, one per line. This step is optional. The tool fails if the list file has an extension other than .args.
normal1_for_pon.vcf.gz
normal2_for_pon.vcf.gz
normal3_for_pon.vcf.gz
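The list file can be written directly from the Step 1 outputs. A minimal sketch with a hypothetical function `write_pon_args`; it refuses any file name that does not end in .args, since the tool only treats files with that extension as VCF lists.

```shell
# Hypothetical helper: write VCF paths, one per line, to an .args file.
# Fails (and writes nothing) if the file name does not end in .args.
# Usage: write_pon_args normals_for_pon_vcf.args normal*_for_pon.vcf.gz
write_pon_args() {
  args_file=$1
  shift
  case "$args_file" in
    *.args) ;;
    *) echo "error: '$args_file' must end in .args" >&2; return 1 ;;
  esac
  [ "$#" -gt 0 ] || { echo "error: no VCFs given" >&2; return 1; }
  printf '%s\n' "$@" > "$args_file"
}
```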
Step 3. Combine the normal calls using CreateSomaticPanelOfNormals.
gatk CreateSomaticPanelOfNormals \
    -vcfs normals_for_pon_vcf.args \
    -O pon.vcf.gz
Alternatively, provide each normal's VCF as separate arguments.
gatk CreateSomaticPanelOfNormals \
    -vcfs normal1_for_pon.vcf.gz \
    -vcfs normal2_for_pon.vcf.gz \
    -vcfs normal3_for_pon.vcf.gz \
    -O pon.vcf.gz
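When the VCFs come from a glob or a variable list, the repeated -vcfs flags can be generated. A minimal POSIX shell sketch; the function name `make_pon` and the `DRY_RUN` switch are hypothetical, and the first argument is taken to be the output file.

```shell
# Hypothetical helper: give each input its own -vcfs flag, then run
# CreateSomaticPanelOfNormals. Set DRY_RUN=1 to print the command only.
# Usage: make_pon pon.vcf.gz normal1_for_pon.vcf.gz normal2_for_pon.vcf.gz ...
make_pon() {
  out=$1
  shift
  [ "$#" -gt 0 ] || { echo "error: no VCFs given" >&2; return 1; }
  # Rotate the positional parameters into: -vcfs v1 -vcfs v2 ...
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" -vcfs "$1"
    shift
    n=$((n - 1))
  done
  set -- gatk CreateSomaticPanelOfNormals "$@" -O "$out"
  if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}
```

For example, `make_pon pon.vcf.gz normal*_for_pon.vcf.gz` expands to the same shape of command as above. Positional parameters are used instead of a bash array so the function also runs under a plain POSIX sh.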
The tool also accepts multiple .args files. Pass each in with the -vcfs option.
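One way to produce several .args files is to split a long master list of VCF paths into fixed-size batches. This is a sketch under assumptions: the function name `split_pon_list`, the master list file, and the `<prefix><n>.args` naming are hypothetical.

```shell
# Hypothetical helper: split a list of VCF paths (one per line) into .args files
# of at most SIZE lines each, named PREFIX1.args, PREFIX2.args, ...
# Prints the names of the files it wrote, one per line. Blank lines are skipped.
# Usage: split_pon_list all_normals.list 50 batch_
split_pon_list() {
  list=$1
  size=$2
  prefix=$3
  awk -v size="$size" -v prefix="$prefix" '
    NF {
      if (count % size == 0) { if (out) close(out); out = prefix (++batch) ".args" }
      print > out
      count++
    }
    END { for (i = 1; i <= batch; i++) print prefix i ".args" }
  ' "$list"
}
```

With two batches, the resulting files would then be passed as `gatk CreateSomaticPanelOfNormals -vcfs batch_1.args -vcfs batch_2.args -O pon.vcf.gz`.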
By default, the tool fails if more than one VCF has the same sample name. To change this, set --duplicate-sample-strategy to ALLOW_ALL to keep all duplicates, or to CHOOSE_FIRST to use only the first VCF with a given sample name.
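Before running the tool, duplicate sample names can be detected up front. A minimal sketch with a hypothetical function `duplicate_samples`; it assumes gzip- or bgzip-compressed VCFs and reads sample names from the #CHROM header line, where sample columns start at column 10.

```shell
# Hypothetical pre-flight check: print each sample name that occurs more than
# once across the given VCFs. Prints nothing when all names are unique.
# Usage: duplicate_samples normal1_for_pon.vcf.gz normal2_for_pon.vcf.gz ...
duplicate_samples() {
  for vcf in "$@"; do
    # gzip -dc also reads bgzip output; stop at the first #CHROM header line.
    gzip -dc "$vcf" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
  done | sort | uniq -d
}
```

If this prints any names, either fix the sample names in the inputs or pick a --duplicate-sample-strategy value that matches how the duplicates should be treated.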
See the Mutect2 documentation for examples of how to use the PoN.
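For orientation, a sketch of the downstream step: a tumor-normal Mutect2 call that passes the PoN with --panel-of-normals. The wrapper name `call_with_pon`, the `DRY_RUN` switch, and the file names ref_fasta.fa, af-only-gnomad.vcf.gz and somatic.vcf.gz are placeholders; check the Mutect2 documentation for your GATK version before relying on the exact flags.

```shell
# Hypothetical wrapper: call somatic variants with a tumor and matched normal,
# filtering against the panel of normals. File names are placeholders.
# Set DRY_RUN=1 to print the command instead of running it.
# Usage: call_with_pon tumor.bam tumor_sample normal.bam normal_sample pon.vcf.gz
call_with_pon() {
  set -- gatk Mutect2 -R ref_fasta.fa \
    -I "$1" -tumor "$2" \
    -I "$3" -normal "$4" \
    --germline-resource af-only-gnomad.vcf.gz \
    --panel-of-normals "$5" \
    -O somatic.vcf.gz
  if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}
```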
CreateSomaticPanelOfNormals specific arguments
This table summarizes the command-line arguments for this tool, grouped into tool-specific and common arguments. For more details on each argument, see the Argument details list below the table.
Argument name(s) | Default value | Summary
---|---|---
Required Arguments | |
--output / -O | null | Output VCF
--vcfs | [] | VCFs for samples to include. May be specified either one at a time, or as one or more .args files containing multiple VCFs, one per line.
Optional Tool Arguments | |
--arguments_file | [] | Read one or more arguments files and add them to the command line
--duplicate-sample-strategy | THROW_ERROR | How to handle duplicate samples: THROW_ERROR to fail, CHOOSE_FIRST to use the first VCF with each sample name, ALLOW_ALL to use all samples regardless of duplicate sample names.
--gcs-max-retries / -gcs-retries | 20 | If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--help / -h | false | Display the help message
--version | false | Display the version number for this tool
Optional Common Arguments | |
--gatk-config-file | null | A configuration file to use with the GATK.
--QUIET | false | Whether to suppress job-summary info on System.err.
--TMP_DIR | [] | Undocumented option
--use-jdk-deflater / -jdk-deflater | false | Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater / -jdk-inflater | false | Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity | INFO | Control verbosity of logging.
Advanced Arguments | |
--showHidden | false | Display hidden arguments
Argument details
This list describes each argument in the table above. The arguments under Optional Common Arguments are shared with other GATK tools.
--arguments_file / NA
read one or more arguments files and add them to the command line
List[File] []
--duplicate-sample-strategy / NA
How to handle duplicate samples: THROW_ERROR to fail, CHOOSE_FIRST to use the first vcf with each sample name, ALLOW_ALL to use all samples regardless of duplicate sample names.
The --duplicate-sample-strategy argument is an enumerated type (DuplicateSampleStrategy), which can have one of the following values:
- THROW_ERROR
- CHOOSE_FIRST
- ALLOW_ALL
DuplicateSampleStrategy THROW_ERROR
--gatk-config-file / NA
A configuration file to use with the GATK.
String null
--gcs-max-retries / -gcs-retries
If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
int 20 [ [ -∞ ∞ ] ]
--help / -h
display the help message
boolean false
--output / -O
Output vcf
R File null
--QUIET / NA
Whether to suppress job-summary info on System.err.
Boolean false
--showHidden / -showHidden
display hidden arguments
boolean false
--TMP_DIR / NA
Undocumented option
List[File] []
--use-jdk-deflater / -jdk-deflater
Whether to use the JdkDeflater (as opposed to IntelDeflater)
boolean false
--use-jdk-inflater / -jdk-inflater
Whether to use the JdkInflater (as opposed to IntelInflater)
boolean false
--vcfs / -vcfs
VCFs for samples to include. May be specified either one at a time, or as one or more .args file containing multiple VCFs, one per line.
The VCFs can be input either as one or more .args files containing one VCF per line, or specified explicitly on the command line.
R Set[File] []
--verbosity / -verbosity
Control verbosity of logging.
The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:
- ERROR
- WARNING
- INFO
- DEBUG
LogLevel INFO
--version / NA
display the version number for this tool
boolean false
GATK version 4.0.0.0 built at 27-36-2019 11:36:13.