Problem generating germline resource file for Mutect2
REQUIRED for all errors and issues:
a) GATK version used: 4.01.1
b) Exact command used:
c) Entire program log:
A few years ago, I used the off-label workflow for Mutect2 to find somatic differences in clonally derived plant tissue with RNAseq. I had really good success with it and I am now running this pipeline again. Because the species is not a model organism, I have to generate my own germline resource file to run Mutect2, and that requires manual edits. However, when I attempt to use ValidateVariants to turn my manually modified VCF into a working one with an index file, I get one of the following errors.
Error - 1. "No suitable codecs found" OR Error - 2. "The provided VCF file is malformed at approximately line number 2561: unparsable vcf record with allele A"
Steps I used to create these files were:
1. Convert vcf from 1st pass of Mutect2 to a table
/home/jschwoch/stow/gatk-18.104.22.168/gatk VariantsToTable -V /scratch/jschwoch/1RG_MT.vcf -F CHROM -F POS -F ID -F REF -F ALT -F TLOD -GF QUAL -GF AF -O /scratch/jschwoch/1RG_MT.table
2. Modify in excel
=CONCATENATE("TLOD=",F2," ; AF=",H2)
Add # in front of CHROM
Copy the new column and paste it in place as values
Copy contents to a text editor and remove trailing white spaces
Save as a text file. For example: "Aaf.table"
3. Create a new header then manually edit
cat /scratch/jschwoch/1RG_MT.vcf | grep '##' > /vol/share/cruzan_lab/Jaime/1RG_MT.header
In nano, manually change FORMAT to INFO
Use CTRL + K to delete unnecessary lines
4. Concatenate the edited header to the edited table
cat /vol/share/cruzan_lab/Jaime/clonal_rna/Mutect2_passes/VCF_headers/1RG_MT.header /vol/share/cruzan_lab/Jaime/clonal_rna/Mutect2_passes/vcf_table/1RG_MT.table > /vol/share/cruzan_lab/Jaime/clonal_rna/Mutect2_passes/vcf_table/1_manmod.vcf
5. ValidateVariants to create an index file
/home/jschwoch/stow/gatk-22.214.171.124/gatk ValidateVariants -V /scratch/jschwoch/clonal_rna/1_manmod.vcf
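For reference, steps 2 to 4 could probably be scripted instead of done by hand in Excel and nano. The following is only a rough sketch under several assumptions: the table is tab-delimited with a single sample, so its columns are CHROM, POS, ID, REF, ALT, TLOD, sample QUAL, sample AF (matching F2 and H2 in the Excel formula); the header lines worth keeping are fileformat, contig, the TLOD INFO line and the AF FORMAT line; and QUAL and FILTER can be left as ".". The function names are made up for illustration. Note that it joins the INFO entries with a bare ";" because the VCF format does not allow whitespace inside the INFO column.

```shell
# Hypothetical helper: keep a minimal header and turn the AF FORMAT
# definition into an INFO definition (assumed to be what the nano edit does).
make_resource_header() {
  grep -E '^##(fileformat|contig|INFO=<ID=TLOD,|FORMAT=<ID=AF,)' |
    sed 's/^##FORMAT=<ID=AF,/##INFO=<ID=AF,/'
}

# Hypothetical helper: convert the VariantsToTable output into the #CHROM
# line plus 8-column VCF records.
# Assumed input columns: CHROM POS ID REF ALT TLOD <sample>.QUAL <sample>.AF
table_to_vcf_body() {
  awk 'BEGIN { FS = OFS = "\t" }
       { sub(/\r$/, "") }   # drop Windows line endings left by Excel or an editor
       NR == 1 { print "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"; next }
       { print $1, $2, $3, $4, $5, ".", ".", "TLOD=" $6 ";AF=" $8 }'
}
```

With these, something like `{ make_resource_header < 1RG_MT.vcf; table_to_vcf_body < 1RG_MT.table; } > 1_manmod.vcf` would stand in for the manual edits and the final cat, with the file paths adjusted as needed.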
I can't seem to figure out why these manual edits all used to work but no longer do.
When I open the VCF and go to the line it reports as an unparsable VCF record, the only thing I notice is that it is the first SNP in the file that has two alternate alleles.
- Error - 2. "The provided VCF file is malformed at approximately line number 2561: unparsable vcf record with allele A"
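To narrow down what is wrong with the reported line, a quick structural check of the hand-built file might help. This is a minimal sketch, not a replacement for ValidateVariants: it only looks for three things that hand editing can plausibly introduce (fewer than the 8 tab-separated columns a VCF record needs, Windows line endings, and spaces inside the INFO column). The function name is hypothetical, and it reads the VCF on standard input.

```shell
# Hypothetical helper: report structural problems in VCF records by line number.
check_vcf_records() {
  awk 'BEGIN { FS = "\t" }
       /^#/   { next }      # skip header lines
       NF < 8 { printf "line %d: only %d tab-separated field(s), expected at least 8\n", NR, NF; next }
       /\r$/  { printf "line %d: Windows line ending\n", NR }
       $8 ~ / / { printf "line %d: whitespace in INFO: %s\n", NR, $8 }'
}
```

Running `check_vcf_records < 1_manmod.vcf` prints one message per problem found, so an empty result only means that none of these three issues is present.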
For example: It seems to be having a problem with variants shown below