Empty PON VCF file
Hi,
I am creating a panel of normals (PON) from 50 normal samples, but the final pon.vcf.gz file is completely empty.
GATK version used: 4.2.2.0
Command used for GenomicsDBImport:
gatk GenomicsDBImport --genomicsdb-shared-posixfs-optimizations true --merge-input-intervals -R ${Reference_genome} -L ${CaptureKitFile} --genomicsdb-workspace-path /gpfs/project/yasinl/PON_50_final -V ${Output_directory}/sample1.vcf.gz .......... -V ${Output_directory}/sample50.vcf.gz
INFO: Failed to detect whether we are running on Google Compute Engine.
15:01:27.478 INFO GenomicsDBImport - ------------------------------------------------------------
15:01:27.478 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.2.0
15:01:27.478 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
15:01:27.478 INFO GenomicsDBImport - Executing as on Linux v3.10.0-1160.36.2.el7.x86_64 amd64
15:01:27.479 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
15:01:27.479 INFO GenomicsDBImport - Start Date/Time: November 3, 2021 3:01:27 PM CET
15:01:27.479 INFO GenomicsDBImport - ------------------------------------------------------------
15:01:27.479 INFO GenomicsDBImport - ------------------------------------------------------------
15:01:27.479 INFO GenomicsDBImport - HTSJDK Version: 2.24.1
15:01:27.479 INFO GenomicsDBImport - Picard Version: 2.25.4
15:01:27.479 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
15:01:27.479 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:01:27.479 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:01:27.479 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:01:27.479 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:01:27.479 INFO GenomicsDBImport - Deflater: IntelDeflater
15:01:27.479 INFO GenomicsDBImport - Inflater: IntelInflater
15:01:27.479 INFO GenomicsDBImport - GCS max retries/reopens: 20
15:01:27.480 INFO GenomicsDBImport - Requester pays: disabled
15:01:27.480 INFO GenomicsDBImport - Initializing engine
15:01:33.059 INFO FeatureManager - Using codec BEDCodec to read file file:///gpfs/project/yasinl/V5_split/Target_region.Hg38_V5_UTRs.bed
15:01:33.726 INFO IntervalArgumentCollection - Processing 74660147 bp from intervals
15:01:34.058 INFO GenomicsDBImport - Done initializing engine
15:01:34.357 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.1-d59e886
15:01:34.361 INFO GenomicsDBImport - Vid Map JSON file will be written to /gpfs/project/yasinl/PON_50_final/vidmap.json
15:01:34.361 INFO GenomicsDBImport - Callset Map JSON file will be written to /gpfs/project/yasinl/PON_50_final/callset.json
15:01:34.361 INFO GenomicsDBImport - Complete VCF Header will be written to /gpfs/project/yasinl/PON_50_final/vcfheader.vcf
15:01:34.361 INFO GenomicsDBImport - Importing to workspace - /gpfs/project/yasinl/PON_50_final
15:01:38.062 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:02:05.346 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:02:22.678 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:02:37.423 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:02:49.158 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:03:01.415 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:03:16.595 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:03:30.690 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:03:40.734 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:03:51.029 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:04:02.946 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:04:18.854 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:04:33.806 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:04:39.932 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:04:50.392 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:04:59.691 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:05:11.333 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:05:24.528 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:05:29.317 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:05:43.448 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:05:52.131 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:05:57.914 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:06:04.980 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:06:12.030 INFO GenomicsDBImport - Importing batch 1 with 50 samples
15:06:12.316 INFO GenomicsDBImport - Done importing batch 1/1
15:06:12.332 INFO GenomicsDBImport - Import completed!
15:06:12.332 INFO GenomicsDBImport - Shutting down engine
[November 3, 2021 3:06:12 PM CET] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 4.75 minutes.
Runtime.totalMemory()=2076049408
Tool returned:
true
These are the output files:
-rwx------ 1 ss ggg 5089 Nov 3 15:06 callset.json
drwx------ 4 ss ggg 4096 Nov 3 15:06 Y$283740$57194223
drwx------ 4 ss ggg 4096 Nov 3 15:06 X$283740$156007703
drwx------ 4 ss ggg 4096 Nov 3 15:05 22$10703471$50782418
drwx------ 4 ss ggg 4096 Nov 3 15:05 21$5011786$46665088
drwx------ 4 ss ggg 4096 Nov 3 15:05 20$87619$64327960
drwx------ 4 ss ggg 4096 Nov 3 15:05 19$107094$58572634
drwx------ 4 ss ggg 4096 Nov 3 15:05 18$116845$80247546
drwx------ 4 ss ggg 4096 Nov 3 15:05 17$137516$83135856
drwx------ 4 ss ggg 4096 Nov 3 15:05 16$14307$90094834
drwx------ 4 ss ggg 4096 Nov 3 15:04 15$19964734$101976341
drwx------ 4 ss ggg 4096 Nov 3 15:04 14$18657639$106874948
drwx------ 4 ss ggg 4096 Nov 3 15:04 13$18267236$114327342
drwx------ 4 ss ggg 4096 Nov 3 15:04 12$14736$133236122
drwx------ 4 ss ggg 4096 Nov 3 15:04 11$193025$134986787
drwx------ 4 ss ggg 4096 Nov 3 15:03 10$48892$133659612
drwx------ 4 ss ggg 4096 Nov 3 15:03 9$14732$138177307
drwx------ 4 ss ggg 4096 Nov 3 15:03 8$232714$145055519
drwx------ 4 ss ggg 4096 Nov 3 15:03 7$193047$159233261
drwx------ 4 ss ggg 4096 Nov 3 15:03 6$203389$170584721
drwx------ 4 ss ggg 4096 Nov 3 15:02 5$140232$181261114
drwx------ 4 ss ggg 4096 Nov 3 15:02 4$53280$190060412
drwx------ 4 ss ggg 4096 Nov 3 15:02 3$197668$198043692
drwx------ 4 ss ggg 4096 Nov 3 15:02 2$38784$242095067
drwx------ 4 ss ggg 4096 Nov 3 15:01 1$14621$248919935
-rwx------ 1 ss ggg 5949 Nov 3 15:01 vidmap.json
-rwx------ 1 ss ggg 12901 Nov 3 15:01 vcfheader.vcf
-rwx------ 1 ss ggg 0 Nov 3 15:01 __tiledb_workspace.tdb
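The workspace holds one directory per contig, which is consistent with --merge-input-intervals collapsing each contig's capture intervals into a single span. Below is a minimal shell sketch of that collapsing, assuming a tab-separated BED input. The function name bed_spanning_intervals is hypothetical, and this is an illustration rather than GATK's own code:

```shell
# Hypothetical helper: print one spanning interval per contig from a BED file.
# BED starts are 0-based, so 1 is added to report a 1-based inclusive start.
bed_spanning_intervals() {
  awk -F '\t' '
    /^#/ || NF < 3 { next }                 # skip comments and malformed lines
    {
      start = $2 + 1; end = $3 + 0
      if (!($1 in lo)) {                    # first interval seen on this contig
        order[++n] = $1; lo[$1] = start; hi[$1] = end
      }
      if (start < lo[$1]) lo[$1] = start    # extend the span to the left
      if (end > hi[$1]) hi[$1] = end        # extend the span to the right
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s\t%d\t%d\n", order[i], lo[order[i]], hi[order[i]]
    }
  ' "$1"
}

# Example call (the BED path is the one set in ${CaptureKitFile}):
# bed_spanning_intervals "${CaptureKitFile}"
```

Comparing the printed spans with the directory names in the listing is one way to see how more than 20K capture intervals turn into a couple of dozen imported regions.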
The command used for the next step:
gatk CreateSomaticPanelOfNormals -R ${Reference_genome} --germline-resource ${afgnomad} -V gendb:///path/PON_50_final/ -O /output/PON_output/pon.vcf.gz
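A quick check before this step is whether the workspace registered all 50 normals. The sketch below counts the samples listed in callset.json, assuming that file holds one "sample_name" key per imported sample. That layout is an assumption and is not shown in this thread; count_callset_samples is a hypothetical helper name.

```shell
# Hypothetical helper: count samples registered in a GenomicsDB workspace.
# Assumes callset.json contains one "sample_name" key per imported sample.
count_callset_samples() {
  # gsub returns the number of matches on each line; sum them over the file.
  awk '{ n += gsub(/"sample_name"/, "&") } END { print n + 0 }' "$1/callset.json"
}

# Example call with the workspace path from the import log:
# count_callset_samples /gpfs/project/yasinl/PON_50_final
```

If the assumption about the file layout holds, a result other than 50 would point at the import step rather than at CreateSomaticPanelOfNormals.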
The output log:
14:45:41.686 INFO CreateSomaticPanelOfNormals - Initializing engine
14:45:42.241 INFO FeatureManager - Using codec VCFCodec to read file file:///path/af-gnomad/af-only-gnomad.hg38_modified.vcf.bgz
14:45:43.289 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.1-d59e886
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field AS_UNIQ_ALT_READ_COUNT - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field CONTQ - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field ECNT - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field GERMQ - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field MBQ - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field MFRL - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field MMQ - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field MPOS - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field NALOD - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field NCount - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field NLOD - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field OCM - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field PON - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field POPAF - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field ROQ - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field RPA - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field RU - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field SEQQ - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field STR - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field STRANDQ - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field STRQ - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.360 info NativeGenomicsDB - pid=27626 tid=27627 No valid combination operation found for INFO field TLOD - the field will NOT be part of INFO fields in the generated VCF records
14:45:43.556 INFO CreateSomaticPanelOfNormals - Done initializing engine
14:45:43.587 INFO ProgressMeter - Starting traversal
14:45:43.595 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
I have two questions:
1) Is it still OK to use the --merge-input-intervals option to merge the intervals of the exome capture kit? Without it, the import does not finish, because the file has more than 20K intervals.
2) More importantly, why is the resulting PON VCF file empty? What could be the reason?
Many thanks for any input!
-
Hi Lait,
The CreateSomaticPanelOfNormals tool is still in beta, and we are aware of a bug that creates an empty PON file in certain instances. Could you please review the following post and follow the troubleshooting steps outlined there?
https://gatk.broadinstitute.org/hc/en-us/community/posts/360076166452-Empty-final-PON-vcf-file-from-7-samples
Your arguments appear to be fine, so I need a little more information before knowing how to proceed.
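One piece of information that may help is what "empty" means exactly: a zero-byte file, or a valid VCF with a header and no records. Below is a minimal shell sketch for counting non-header records in a gzip- or bgzip-compressed VCF. The helper name count_vcf_records is hypothetical, and this check is not taken from the linked post.

```shell
# Hypothetical helper: count non-header, non-blank lines in a compressed VCF.
count_vcf_records() {
  gzip -dc -- "$1" | awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Example calls with the paths from the commands in this thread:
# count_vcf_records /output/PON_output/pon.vcf.gz
# count_vcf_records "${Output_directory}/sample1.vcf.gz"
```

Running it on the PON and on a few of the per-sample normal VCFs shows whether the inputs already lack records or whether they are lost in a later step.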