FilterVariantTranches Error: 'no variants with INFO score key "CNN_2D"'
Hello GATK staff and community,
I have encountered an issue while working with FilterVariantTranches as it does not detect the INFO score key "CNN_2D" in the input vcf. Please see below for more details:
If you are seeing an error, please provide(REQUIRED) :
a) GATK version used: gatk-package-4.2.2.0-local in Docker
b) Exact command used:
# Run docker and set up environmental variables
docker run --rm -v ${main_dir}:/my_data -it broadinstitute/gatk:4.2.2.0
cd /my_data/scripts
source input_VC.sh
# Call HaplotypeCaller
cd /my_data/bam/clean_bam
for file in $(ls *.sorted.marked_duplicates.bam)
do
echo "Working on variant calling for ${file}"
gatk HaplotypeCaller \
-R /my_data/human_genome/${genome} \
-I /my_data/bam/clean_bam/${file} \
-L /my_data/intervals/${interval_list} \
-O /my_data/vcf/haplotypecaller/raw/${file%%.*}.vcf.gz \
--native-pair-hmm-threads ${threads} \
-ERC BP_RESOLUTION \
-GQB 10 -GQB 20 -GQB 30 -GQB 40 -GQB 50 -GQB 60 -GQB 70 -GQB 80 -GQB 90 # Exclusive upper bounds for reference confidence GQ bands
done
cd /my_data/vcf/haplotypecaller/raw
for sample in $(ls *.vcf.gz | rev | cut -c 8- | rev | uniq)
do
echo "Working on ${sample}"
gatk CNNScoreVariants \
-I /my_data/bam/clean_bam/${sample}.sorted.marked_duplicates.bam \
-V /my_data/vcf/haplotypecaller/raw/${sample}.vcf.gz \
-R /my_data/human_genome/${genome} \
-O /my_data/vcf/haplotypecaller/clean/CNN/${sample}.annotated.vcf.gz \
--inter-op-threads ${threads} \
--intra-op-threads ${threads} \
--tensor-type read_tensor
done
cd /my_data/vcf/haplotypecaller/clean/CNN
for sample in $(ls *.annotated.vcf.gz | rev | cut -c 18- | rev | uniq)
do
echo "Working on ${sample}.annotated.vcf.gz"
gatk FilterVariantTranches \
-V /my_data/vcf/haplotypecaller/clean/CNN/${sample}.annotated.vcf.gz \
--resource /my_data/genomic_prior/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
--resource /my_data/genomic_prior/hapmap_3.3.hg38.vcf.gz \
--resource /my_data/genomic_prior/1000G_phase1.snps.high_confidence.hg38.vcf.gz \
--info-key CNN_2D \
--snp-tranche 99.95 \
--indel-tranche 99.4 \
--invalidate-previous-filters \
-O /my_data/vcf/haplotypecaller/clean/CNN_filtered/${sample}.vcf.gz \
--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
done
c) Entire error log:
Working on Sample13_S13_L001.annotated.vcf.gz
Using GATK jar /gatk/gatk-package-4.2.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /gatk/gatk-package-4.2.2.0-local.jar FilterVariantTranches -V /my_data/vcf/haplotypecaller/clean/CNN/Sample13_S13_L001.annotated.vcf.gz --resource /my_data/genomic_prior/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --resource /my_data/genomic_prior/hapmap_3.3.hg38.vcf.gz --resource /my_data/genomic_prior/1000G_phase1.snps.high_confidence.hg38.vcf.gz --info-key CNN_2D --snp-tranche 99.95 --indel-tranche 99.4 --invalidate-previous-filters -O /my_data/vcf/haplotypecaller/clean/CNN_filtered/Sample13_S13_L001.vcf.gz
03:31:14.162 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 13, 2021 3:31:14 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
03:31:14.307 INFO FilterVariantTranches - ------------------------------------------------------------
03:31:14.308 INFO FilterVariantTranches - The Genome Analysis Toolkit (GATK) v4.2.2.0
03:31:14.308 INFO FilterVariantTranches - For support and documentation go to https://software.broadinstitute.org/gatk/
03:31:14.308 INFO FilterVariantTranches - Executing as root@7b3a0705db5c on Linux v5.10.60.1-microsoft-standard-WSL2 amd64
03:31:14.308 INFO FilterVariantTranches - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
03:31:14.308 INFO FilterVariantTranches - Start Date/Time: December 13, 2021 3:31:14 AM GMT
03:31:14.308 INFO FilterVariantTranches - ------------------------------------------------------------
03:31:14.308 INFO FilterVariantTranches - ------------------------------------------------------------
03:31:14.309 INFO FilterVariantTranches - HTSJDK Version: 2.24.1
03:31:14.309 INFO FilterVariantTranches - Picard Version: 2.25.4
03:31:14.309 INFO FilterVariantTranches - Built for Spark Version: 2.4.5
03:31:14.309 INFO FilterVariantTranches - HTSJDK Defaults.COMPRESSION_LEVEL : 2
03:31:14.309 INFO FilterVariantTranches - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
03:31:14.309 INFO FilterVariantTranches - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
03:31:14.309 INFO FilterVariantTranches - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
03:31:14.309 INFO FilterVariantTranches - Deflater: IntelDeflater
03:31:14.309 INFO FilterVariantTranches - Inflater: IntelInflater
03:31:14.310 INFO FilterVariantTranches - GCS max retries/reopens: 20
03:31:14.310 INFO FilterVariantTranches - Requester pays: disabled
03:31:14.310 INFO FilterVariantTranches - Initializing engine
03:31:14.627 INFO FeatureManager - Using codec VCFCodec to read file file:///my_data/genomic_prior/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
03:31:14.873 INFO FeatureManager - Using codec VCFCodec to read file file:///my_data/genomic_prior/hapmap_3.3.hg38.vcf.gz
03:31:15.070 INFO FeatureManager - Using codec VCFCodec to read file file:///my_data/genomic_prior/1000G_phase1.snps.high_confidence.hg38.vcf.gz
03:31:15.262 INFO FeatureManager - Using codec VCFCodec to read file file:///my_data/vcf/haplotypecaller/clean/CNN/Sample13_S13_L001.annotated.vcf.gz
03:31:15.418 INFO FilterVariantTranches - Done initializing engine
03:31:15.493 INFO ProgressMeter - Starting traversal
03:31:15.493 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
03:31:15.494 INFO FilterVariantTranches - Starting pass 0 through the variants
03:31:17.194 WARN IntelInflater - Zero Bytes Written : 0
03:31:17.200 INFO FilterVariantTranches - Finished pass 0 through the variants
03:31:17.201 INFO FilterVariantTranches - Found 0 SNPs and 0 indels with INFO score key:CNN_2D.
03:31:17.201 INFO FilterVariantTranches - Found 0 SNPs and 344 indels in the resources.
03:31:17.201 INFO FilterVariantTranches - Filtered 0 SNPs out of 0 and filtered 0 indels out of 0 with INFO score: CNN_2D.
03:31:17.224 INFO FilterVariantTranches - Shutting down engine
[December 13, 2021 3:31:17 AM GMT] org.broadinstitute.hellbender.tools.walkers.vqsr.FilterVariantTranches done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=2586836992
***********************************************************************
A USER ERROR has occurred: Bad input: VCF contains no variants or no variants with INFO score key "CNN_2D"
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException$BadInput: Bad input: VCF contains no variants or no variants with INFO score key "CNN_2D"
at org.broadinstitute.hellbender.tools.walkers.vqsr.FilterVariantTranches.afterFirstPass(FilterVariantTranches.java:211)
at org.broadinstitute.hellbender.engine.TwoPassVariantWalker.afterNthPass(TwoPassVariantWalker.java:29)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:44)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Here is the printed tail of my vcf file which clearly has the INFO score key "CNN_2D":
(gatk) root@7b3a0705db5c:/my_data/vcf/haplotypecaller/clean/CNN# tail /my_data/vcf/haplotypecaller/clean/CNN/Sample14_S14_L001.annotated.vcf
chrX 38421436 . A <NON_REF> . . CNN_2D=-7.863 GT:AD:DP:GQ:PL 0/0:315,4:319:99:0,120,1800
chrX 38421437 . C <NON_REF> . . CNN_2D=-7.842 GT:AD:DP:GQ:PL 0/0:316,1:317:99:0,120,1800
chrX 38421438 . C <NON_REF> . . CNN_2D=-7.775 GT:AD:DP:GQ:PL 0/0:313,1:314:99:0,120,1800
chrX 38421439 . C <NON_REF> . . CNN_2D=-7.871 GT:AD:DP:GQ:PL 0/0:309,1:310:99:0,120,1800
chrX 38421440 . A <NON_REF> . . CNN_2D=-8.512 GT:AD:DP:GQ:PL 0/0:304,2:306:99:0,120,1800
chrX 38421441 . T <NON_REF> . . CNN_2D=-8.095 GT:AD:DP:GQ:PL 0/0:305,0:305:99:0,120,1800
chrX 38421442 . G <NON_REF> . . CNN_2D=-8.224 GT:AD:DP:GQ:PL 0/0:304,1:305:99:0,120,1800
chrX 38421443 . C <NON_REF> . . CNN_2D=-8.169 GT:AD:DP:GQ:PL 0/0:304,0:304:99:0,120,1800
chrX 38421444 . T <NON_REF> . . CNN_2D=-8.260 GT:AD:DP:GQ:PL 0/0:304,0:304:99:0,120,1800
chrX 38421445 . A <NON_REF> . . CNN_2D=-8.195 GT:AD:DP:GQ:PL 0/0:302,1:303:99:0,120,1800
Thank you in advance for your attention and help.
-
Hi Cher Wei Yuan,
Hmmm, interesting. I'm not exactly sure what's going on here, but I'll help to figure it out! All the lines you shared from your VCF file are from non-variant reference blocks. Could you share any lines that have variants with the CNN_2D key?
Best,
Genevieve
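
For readers who want to make the same check on their own file, here is a minimal sketch that counts how many records carry only the symbolic <NON_REF> allele and how many list a real ALT allele. The function name summarize_alt is hypothetical, and the sketch assumes a tab-delimited VCF on standard input with ALT in column 5.

```shell
# Summarise a VCF/GVCF read from stdin: records whose ALT is only <NON_REF>
# (reference sites or blocks) versus records listing at least one real ALT allele.
summarize_alt() {
    awk -F'\t' '
        /^#/ { next }                            # skip header lines
        $5 == "<NON_REF>" { ref_only++; next }   # reference-only record
        { variant++ }                            # anything else has a real ALT
        END { printf "variant=%d reference_only=%d\n", variant, ref_only }
    '
}

# Example (not run here):
#   zcat Sample13_S13_L001.annotated.vcf.gz | summarize_alt
```

A file dominated by reference_only records is a strong hint that it is a GVCF rather than a plain VCF.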
-
Hi Genevieve
Thank you so much. Here are two lines with variants:
chr7 96317809 . TA T,<NON_REF> 798.04 . BaseQRankSum=0.929;CNN_2D=-10.974;DP=189;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=0.000;RAW_MQandDP=680400,189;ReadPosRankSum=2.347 GT:AD:DP:GQ:PL:SB 0/1:97,59,0:156:99:853,0,3018,1121,3297,4596:41,56,24,24
chr7 96317810 . AT *,A,TT,ATT,<NON_REF> 0 . BaseQRankSum=-0.177;CNN_2D=-13.423;DP=191;ExcessHet=3.0103;MLEAC=1,0,0,0,0;MLEAF=0.500,0.00,0.00,0.00,0.00;MQRankSum=0.000;RAW_MQandDP=687600,191;ReadPosRankSum=0.910 GT:AD:DP:GQ:PL:SB 0/1:70,60,3,9,2,0:144:99:2341,0,2585,2335,1988,4355,1412,2283,3152,3772,2563,1936,4190,3226,4630,2525,2380,4333,3641,4456,4723:29,41,34,40
Please let me know if more information is required.
Best Regards
Cher Wei Yuan
-
Hi, I am receiving the same error using GATK-4.2.2, but with CNN_1D, e.g.:
A USER ERROR has occurred: Bad input: VCF contains no variants or no variants with INFO score key "CNN_1D"
This is despite having checked the VCF file and finding the CNN_1D tag both in the header and INFO fields of most variants.
Thank you
-
Thanks for providing the information Amatta Mirandari!
Is your file also a GVCF file? I'm trying to piece together why this is occurring.
-
Hi Cher Wei Yuan,
I discussed this issue with the developer team who works on CNNScoreVariants. The input to CNNScoreVariants is supposed to be a VCF file. We have not tested CNNScoreVariants on GVCFs, so this may be why you are getting the error message. You can genotype your GVCF with GenotypeGVCFs or you can go back to the HaplotypeCaller step and remove the -ERC BP_RESOLUTION argument.
Please let me know if for some reason you are still having this error with a VCF file or if you have further questions.
Best,
Genevieve
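
As a rough sketch of the GenotypeGVCFs route described above: the helper below checks whether a bgzipped file lists <NON_REF> among its ALT alleles and, if so, genotypes it into a plain VCF. The function names and the GATK_BIN variable are hypothetical (GATK_BIN only exists so the command can be dry-run with echo). The sketch also assumes the result would then be passed through CNNScoreVariants again before FilterVariantTranches, since the scores in this thread were computed on the GVCF.

```shell
# Exit status 0 if any record read from stdin lists <NON_REF> among its ALT alleles.
looks_like_gvcf() {
    awk -F'\t' '
        !/^#/ && index($5, "<NON_REF>") { found = 1; exit }
        END { exit !found }
    '
}

# Hypothetical wrapper: $1 = reference FASTA, $2 = input .vcf.gz, $3 = output VCF.
# Runs GenotypeGVCFs only when the input looks like a GVCF.
genotype_if_gvcf() {
    if gzip -dc "$2" | looks_like_gvcf; then
        "${GATK_BIN:-gatk}" GenotypeGVCFs -R "$1" -V "$2" -O "$3"
    else
        echo "$2 does not look like a GVCF; nothing to do" >&2
    fi
}

# Example (not run here):
#   genotype_if_gvcf /my_data/human_genome/${genome} in.g.vcf.gz out.vcf.gz
```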
-
Hello Genevieve Brandt (she/her)
I did not realize my HaplotypeCaller parameters output a GVCF. I removed "-ERC BP_RESOLUTION -GQB 10 -GQB 20 -GQB 30 -GQB 40 -GQB 50 -GQB 60 -GQB 70 -GQB 80 -GQB 90" and re-ran HaplotypeCaller and the remaining workflow. Everything works now! Thank you for your help.
Best Regards
Wei Yuan
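
For reference, the HaplotypeCaller step with those arguments removed might look like the sketch below. It reuses the paths and the genome, interval_list, and threads variables from the original command. The function name call_variants_vcf and the GATK_BIN override are hypothetical, added only so the command can be dry-run by setting GATK_BIN=echo.

```shell
# Assumed to be defined already, as in the original script (input_VC.sh):
#   genome, interval_list, threads
# Without -ERC (and the -GQB bands that go with it), HaplotypeCaller writes
# a plain VCF instead of a GVCF.
call_variants_vcf() {
    # $1 = BAM file name inside /my_data/bam/clean_bam
    "${GATK_BIN:-gatk}" HaplotypeCaller \
        -R "/my_data/human_genome/${genome}" \
        -I "/my_data/bam/clean_bam/$1" \
        -L "/my_data/intervals/${interval_list}" \
        -O "/my_data/vcf/haplotypecaller/raw/${1%%.*}.vcf.gz" \
        --native-pair-hmm-threads "${threads}"
}

# Example loop (not run here); a glob avoids parsing the output of ls:
#   cd /my_data/bam/clean_bam
#   for file in *.sorted.marked_duplicates.bam; do
#       call_variants_vcf "$file"
#   done
```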
-
Glad to hear we could solve the issue! Thanks for the update.