Error running the GtcToVcf command
REQUIRED for all errors and issues:
a) GATK version used: v4.4.0.0
b) Exact command used: GtcToVcf -INPUT 206307460092_R07C01.gtc -REFERENCE_SEQUENCE hg19_v0_Homo_sapiens_assembly19.fasta -OUTPUT 206307460092_R07C01.vcf -EXTENDED_ILLUMINA_MANIFEST /home/user3/Documents/ASA-24v1-0_E1.extended.csv -CLUSTER_FILE /home/user3/Documents/4th_Run.egt -ILLUMINA_BEAD_POOL_MANIFEST_FILE /home/user3/Documents/ASA-24v1-0_E1.bpm -SAMPLE_ALIAS my_sample_alias
c) Entire program log:
java -jar /usr/local/picard.jar GtcToVcf INPUT=206307460092_R07C01.gtc REFERENCE_SEQUENCE=hg19_v0_Homo_sapiens_assembly19.fasta OUTPUT=206307460092_R07C01.vcf EXTENDED_ILLUMINA_MANIFEST=/home/user3/Documents/ASA-24v1-0_E1.extended.csv CLUSTER_FILE=/home/user3/Documents/4th_Run.egt ILLUMINA_BEAD_POOL_MANIFEST_FILE=/home/user3/Documents/ASA-24v1-0_E1.bpm SAMPLE_ALIAS=my_sample_alias
INFO 2023-03-29 13:22:25 GtcToVcf
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
**********
https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** GtcToVcf -INPUT 206307460092_R07C01.gtc -REFERENCE_SEQUENCE hg19_v0_Homo_sapiens_assembly19.fasta -OUTPUT 206307460092_R07C01.vcf -EXTENDED_ILLUMINA_MANIFEST /home/user3/Documents/ASA-24v1-0_E1.extended.csv -CLUSTER_FILE /home/user3/Documents/4th_Run.egt -ILLUMINA_BEAD_POOL_MANIFEST_FILE /home/user3/Documents/ASA-24v1-0_E1.bpm -SAMPLE_ALIAS my_sample_alias
**********
13:22:25.572 WARN LegacyCommandLineArgumentParser - Hidden arguments are always printed in LegacyCommandLineArgumentParser
USAGE: GtcToVcf [options]
Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#GtcToVcf
GtcToVcf takes an Illumina GTC file and converts it to a VCF file using several supporting files. A GTC file is an
Illumina-specific file containing called genotypes in AA/AB/BB format.
(https://github.com/Illumina/BeadArrayFiles/blob/develop/docs/GTC_File_Format_v5.pdf) A VCF, aka Variant Calling Format,
is a text file for storing how a sequenced sample differs from the reference genome.
(http://software.broadinstitute.org/software/igv/book/export/html/184)
Usage example:
java -jar picard.jar GtcToVcf \
INPUT=input.gtc \
REFERENCE_SEQUENCE=reference.fasta \
OUTPUT=output.vcf \
EXTENDED_ILLUMINA_MANIFEST=chip_name.extended.csv \
CLUSTER_FILE=chip_name.egt \
ILLUMINA_BEAD_POOL_MANIFEST_FILE=chip_name.bpm \
SAMPLE_ALIAS=my_sample_alias \
Version: 3.0.0-1-g62ec81c-SNAPSHOT
Options:
--help
-h Displays options specific to this tool.
--stdhelp
-H Displays options specific to this tool AND options common to all Picard command line
tools.
--version Displays program version.
INPUT=File
I=File GTC file to be converted Required.
OUTPUT=File
O=File The output VCF file to write. Required.
EXTENDED_ILLUMINA_MANIFEST=File
MANIFEST=File An Extended Illumina Manifest file (csv). This is an extended version of the Illumina
manifest it contains additional reference-specific fields Required.
CLUSTER_FILE=File
CF=File An Illumina cluster file (egt) Required.
ILLUMINA_BEAD_POOL_MANIFEST_FILE=File
BPM_FILE=File The Illumina Bead Pool Manifest (.bpm) file Required.
EXPECTED_GENDER=String
E_GENDER=String The expected gender for this sample. Default value: null.
SAMPLE_ALIAS=String The sample alias Required.
PIPELINE_VERSION=String The version of the pipeline used to generate this VCF Default value: null.
ANALYSIS_VERSION_NUMBER=Integer
The analysis version of the data used to generate this VCF Default value: null.
GENDER_GTC=File
G_GTC=File An optional GTC file that was generated by calling the chip using a cluster file designed
to optimize gender calling. Default value: null.
FINGERPRINT_GENOTYPES_VCF_FILE=File
FP_VCF=File The fingerprint VCF for this sample Default value: null.
DO_NOT_ALLOW_CALLS_ON_ZEROED_OUT_ASSAYS=Boolean
Causes the program to fail if it finds a case where there is a call on an assay that is
flagged as 'zeroed-out' in the Illumina cluster file. Default value: false. This option
can be set to 'null' to clear the default value. Possible values: {true, false}
TMP_DIR=File One or more directories with space available to be used by this program for temporary
storage of working files Default value: null. This option may be specified 0 or more
times.
VERBOSITY=LogLevel Control verbosity of logging. Default value: INFO. This option can be set to 'null' to
clear the default value. Possible values: {ERROR, WARNING, INFO, DEBUG}
QUIET=Boolean Whether to suppress job-summary info on System.err. Default value: false. This option can
be set to 'null' to clear the default value. Possible values: {true, false}
VALIDATION_STRINGENCY=ValidationStringency
Validation stringency for all SAM files read by this program. Setting stringency to
SILENT can improve performance when processing a BAM file in which variable-length data
(read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. This
option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT,
SILENT}
COMPRESSION_LEVEL=Integer Compression level for all compressed files created (e.g. BAM and VCF). Default value: 5.
This option can be set to 'null' to clear the default value.
MAX_RECORDS_IN_RAM=Integer When writing files that need to be sorted, this will specify the number of records stored
in RAM before spilling to disk. Increasing this number reduces the number of file handles
needed to sort the file, and increases the amount of RAM needed. Default value: 500000.
This option can be set to 'null' to clear the default value.
CREATE_INDEX=Boolean Whether to create an index when writing VCF or coordinate sorted BAM output. Default
value: false. This option can be set to 'null' to clear the default value. Possible
values: {true, false}
CREATE_MD5_FILE=Boolean Whether to create an MD5 digest for any BAM or FASTQ files created. Default value:
false. This option can be set to 'null' to clear the default value. Possible values:
{true, false}
REFERENCE_SEQUENCE=File
R=File Reference sequence file. Required.
USE_JDK_DEFLATER=Boolean
USE_JDK_DEFLATER=Boolean Use the JDK Deflater instead of the Intel Deflater for writing compressed output Default
value: false. This option can be set to 'null' to clear the default value. Possible
values: {true, false}
USE_JDK_INFLATER=Boolean
USE_JDK_INFLATER=Boolean Use the JDK Inflater instead of the Intel Inflater for reading compressed input Default
value: false. This option can be set to 'null' to clear the default value. Possible
values: {true, false}
OPTIONS_FILE=File File of OPTION_NAME=value pairs. No positional parameters allowed. Unlike command-line
options, unrecognized options are ignored. A single-valued option set in an options file
may be overridden by a subsequent command-line option. A line starting with '#' is
considered a comment. Required.
The selected reference sequence ('true') is not supported. This tool is currently only implemented to support NCBI Build 37 / HG19 Reference Sequence.
-
Hi GATK Team,
I am currently converting a GTC file to a VCF file, but an error appeared while running the command. Kindly advise me on this.
-
Hi Nur Adlina Binti Mohd Affian,
It looks like the assembly (AS) tag in the sequence dictionary (i.e., the ".dict" file) for your fasta reference has the value "true". This particular tool requires the assembly tag to have the value "GRCh37".
Regards,
David
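For anyone who wants to confirm this before rerunning, here is a minimal Python sketch that counts the AS values on the @SQ lines of a sequence dictionary. It assumes the ".dict" file is a plain-text SAM-style header with tab-separated TAG:VALUE fields; the function name `assembly_tags` is made up for this illustration and is not part of GATK or Picard.

```python
from collections import Counter
from pathlib import Path


def assembly_tags(dict_path):
    """Count the AS (assembly) values on the @SQ lines of a sequence dictionary.

    Hypothetical helper for illustration only; not a GATK or Picard API.
    """
    counts = Counter()
    for line in Path(dict_path).read_text().splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header record types
        # After the record type, fields are tab-separated TAG:VALUE pairs.
        # Split on the first colon only, since values such as UR may contain colons.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        counts[fields.get("AS", "<missing>")] += 1
    return counts
```

Calling `assembly_tags(...)` on the ".dict" file that sits next to the fasta should, in the situation described above, show "true" as the AS value rather than "GRCh37".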
-
Dear David,
I have successfully run the conversion. However, I found a lot of duplicates in the VCF file. May I know how I can remove them? Is there a command I can use to do so?
-
Hi Nur Adlina Binti Mohd Affian,
I believe that bcftools can do this -- specifically the "bcftools norm" command as discussed here: https://www.biostars.org/p/420990/
Regards,
David
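To make clear what "removing duplicates" could mean here, the following is a rough Python sketch that keeps only the first record for each (CHROM, POS, REF, ALT) combination in an uncompressed VCF. That definition of a duplicate is an assumption, since the thread does not say which kind of duplicates appeared, and the function name `drop_duplicate_records` is hypothetical. It is not how bcftools works internally, and bcftools remains the better choice for real data.

```python
def drop_duplicate_records(lines):
    """Yield VCF lines, keeping the first record per (CHROM, POS, REF, ALT).

    Hypothetical helper for illustration; assumes tab-separated, uncompressed VCF text.
    """
    seen = set()
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        if not line.strip():
            continue  # ignore blank lines
        # The first five VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        key = (chrom, pos, ref, alt)
        if key not in seen:
            seen.add(key)
            yield line
```

Records at the same position with a different ALT allele are kept by this sketch, as are records that differ only in ID after the first one is seen; whether that is the desired behaviour depends on how the array manifest defines its assays.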