Panel of Normals creation error: GenomicsDBImport does not support GVCFs with MNPs
REQUIRED for all errors and issues:
a) GATK version used: 4.3.0.0
b) Exact command used:
Command to Create VCF:
/home/hp/Desktop/GenomicsApps/gatk*/gatk Mutect2 -R /home/hp/Desktop/reference_genome/hg38/hg38.fa -I /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_no_dup_bam/84-B.pon.no-dup.bam -O /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/84-B.subPoN.vcf.gz
/home/hp/Desktop/GenomicsApps/gatk*/gatk Mutect2 -R /home/hp/Desktop/reference_genome/hg38/hg38.fa -I /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_no_dup_bam/85-B.pon.no-dup.bam -O /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/85-B.subPoN.vcf.gz
/home/hp/Desktop/GenomicsApps/gatk*/gatk Mutect2 -R /home/hp/Desktop/reference_genome/hg38/hg38.fa -I /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_no_dup_bam/86-B.pon.no-dup.bam -O /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/86-B.subPoN.vcf.gz
/home/hp/Desktop/GenomicsApps/gatk*/gatk Mutect2 -R /home/hp/Desktop/reference_genome/hg38/hg38.fa -I /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_no_dup_bam/88-B.pon.no-dup.bam -O /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/88-B.subPoN.vcf.gz
Command to create PON:
/home/hp/Desktop/GenomicsApps/gatk*/gatk GenomicsDBImport -R /home/hp/Desktop/reference_genome/hg38/hg38.fa -L /home/hp/Desktop/reference_genome/hg38/hg38.genome.interval_list --genomicsdb-workspace-path /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/PoN_db \
-V /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/84-B.subPoN.vcf.gz \
-V /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/85-B.subPoN.vcf.gz \
-V /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/86-B.subPoN.vcf.gz \
-V /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/88-B.subPoN.vcf.gz
c) Entire program log:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/hp/Desktop/GenomicsApps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar GenomicsDBImport -R /home/hp/Desktop/reference_genome/hg38/hg38.fa -L /home/hp/Desktop/reference_genome/hg38/hg38.genome.interval_list --genomicsdb-workspace-path /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/PoN_db -V /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/84-B.subPoN.vcf.gz -V /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/85-B.subPoN.vcf.gz -V /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/86-B.subPoN.vcf.gz -V /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/88-B.subPoN.vcf.gz
14:02:52.745 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/hp/Desktop/GenomicsApps/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:02:52.916 INFO GenomicsDBImport - ------------------------------------------------------------
14:02:52.916 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.3.0.0
14:02:52.916 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
14:02:52.916 INFO GenomicsDBImport - Executing as hp@HP-Z8-G4-Workstation on Linux v5.15.0-47-generic amd64
14:02:52.916 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v11.0.16+8-post-Ubuntu-0ubuntu122.04
14:02:52.917 INFO GenomicsDBImport - Start Date/Time: 2 November 2022 at 2:02:52 PM IST
14:02:52.917 INFO GenomicsDBImport - ------------------------------------------------------------
14:02:52.917 INFO GenomicsDBImport - ------------------------------------------------------------
14:02:52.917 INFO GenomicsDBImport - HTSJDK Version: 3.0.1
14:02:52.918 INFO GenomicsDBImport - Picard Version: 2.27.5
14:02:52.918 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
14:02:52.918 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:02:52.918 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:02:52.918 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:02:52.918 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:02:52.918 INFO GenomicsDBImport - Deflater: IntelDeflater
14:02:52.918 INFO GenomicsDBImport - Inflater: IntelInflater
14:02:52.918 INFO GenomicsDBImport - GCS max retries/reopens: 20
14:02:52.918 INFO GenomicsDBImport - Requester pays: disabled
14:02:52.918 INFO GenomicsDBImport - Initializing engine
14:02:52.959 WARN IntelInflater - Zero Bytes Written : 0
14:02:52.999 WARN IntelInflater - Zero Bytes Written : 0
14:02:53.024 WARN IntelInflater - Zero Bytes Written : 0
14:02:53.048 WARN IntelInflater - Zero Bytes Written : 0
14:02:53.233 INFO FeatureManager - Using codec IntervalListCodec to read file file:///home/hp/Desktop/reference_genome/hg38/hg38.genome.interval_list
14:02:53.303 INFO IntervalArgumentCollection - Processing 3044953910 bp from intervals
14:02:53.305 WARN GenomicsDBImport - A large number of intervals were specified. Using more than 100 intervals in a single import is not recommended and can cause performance to suffer. If GVCF data only exists within those intervals, performance can be improved by aggregating intervals with the merge-input-intervals argument.
14:02:53.342 INFO GenomicsDBImport - Done initializing engine
14:02:53.624 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.3-6069e4a
14:02:53.625 INFO GenomicsDBImport - Vid Map JSON file will be written to /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/PoN_db/vidmap.json
14:02:53.625 INFO GenomicsDBImport - Callset Map JSON file will be written to /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/PoN_db/callset.json
14:02:53.625 INFO GenomicsDBImport - Complete VCF Header will be written to /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/PoN_db/vcfheader.vcf
14:02:53.625 INFO GenomicsDBImport - Importing to workspace - /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/PoN_db
14:02:53.837 WARN IntelInflater - Zero Bytes Written : 0
14:02:53.850 WARN IntelInflater - Zero Bytes Written : 0
14:02:53.860 WARN IntelInflater - Zero Bytes Written : 0
14:02:53.871 WARN IntelInflater - Zero Bytes Written : 0
14:02:53.874 INFO GenomicsDBImport - Importing batch 1 with 4 samples
14:02:54.077 INFO GenomicsDBImport - Shutting down engine
[2 November 2022 at 2:02:54 PM IST] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2634022912
***********************************************************************
A USER ERROR has occurred: Bad input: GenomicsDBImport does not support GVCFs with MNPs. MNP found at chr1:8330429 in VCF /home/hp/Desktop/WEAP_output_TEST_Pipeline_TOM/output/PON/pon_vcf/88-B.subPoN.vcf.gz
$ grep 8330429 88-B.subPoN.vcf
chr1 8330429 . GT CG,<NON_REF> . . AS_SB_TABLE=0,0|0,0|0,0;DP=2;ECNT=4;MBQ=37,11,0;MFRL=166,166,0;MMQ=60,60,60;MPOS=17,50;POPAF=7.30,7.30;TLOD=-4.257e-02,-4.257e-02 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|1|2:1,1,0:0.333,0.333:2:0,0,0:0,0,0:0,0,0:0|1:8330429_GT_CG:8330429:0,1,1,0
chr1 8330439 . G C,<NON_REF> . . AS_SB_TABLE=0,0|0,0|0,0;DP=2;ECNT=4;MBQ=37,11,0;MFRL=166,166,0;MMQ=60,60,60;MPOS=7,50;POPAF=7.30,7.30;TLOD=-4.257e-02,-4.257e-02 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|1|2:1,1,0:0.333,0.333:2:0,0,0:0,0,0:0,0,0:0|1:8330429_GT_CG:8330429:0,1,1,0
chr1 8330443 . T A,<NON_REF> . . AS_SB_TABLE=0,0|0,0|0,0;DP=2;ECNT=4;MBQ=37,11,0;MFRL=166,166,0;MMQ=60,60,60;MPOS=3,50;POPAF=7.30,7.30;TLOD=-4.257e-02,-4.257e-02 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|1|2:1,1,0:0.333,0.333:2:0,0,0:0,0,0:0,0,0:0|1:8330429_GT_CG:8330429:0,1,1,0
I am seeking help in this regard.
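The grep above shows only the first record that triggered the error. To list every candidate MNP in a VCF, something like the following sketch could be used. It assumes a simple definition (REF and one ALT allele have the same length, greater than one base, and the ALT is not symbolic); GATK's own check may differ in detail. The function name find_mnps is made up for illustration.

```shell
# Sketch: print CHROM:POS and REF>ALT for records that look like MNPs.
# Assumption: an MNP is a record where REF and some non-symbolic ALT allele
# have equal length greater than 1. Reads an uncompressed VCF on stdin.
find_mnps() {
  awk -F'\t' '
    /^#/ { next }                               # skip header lines
    {
      n = split($5, alts, ",")                  # ALT may hold several alleles
      for (i = 1; i <= n; i++) {
        a = alts[i]
        if (a ~ /^</ || a == "*" || a == ".") continue   # skip <NON_REF> etc.
        if (length($4) > 1 && length(a) == length($4)) {
          print $1 ":" $2 "\t" $4 ">" a
          break                                 # report each record once
        }
      }
    }'
}

# Usage (file name taken from the post; adjust the path as needed):
#   gzip -dc 88-B.subPoN.vcf.gz | find_mnps
```

The script expects tab-separated VCF columns, as in the file on disk; the records pasted above have had their tabs flattened to spaces.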
-
Hi Ranjan J. Sarma,
Thank you for posting on the forum! One of our developers solved this issue here: https://gatk.broadinstitute.org/hc/en-us/community/posts/360071895952-Mutect2-s-support-for-MNP-in-GATK-4-1-8-1-
Also, please make sure that your PON has more than 40 samples. If it does not, we recommend using our publicly available PON instead.
Please let us know if you have any further questions.
Best,
Genevieve
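Note: the linked thread is not reproduced here. For reference, the GATK documentation for CreateSomaticPanelOfNormals runs Mutect2 on each normal with --max-mnp-distance 0, so that adjacent substitutions are emitted as separate SNVs rather than merged into MNPs. Below is a minimal sketch of regenerating the four VCFs from the question that way. The short relative paths, the DRY_RUN switch, and the run helper are assumptions for illustration and are not part of the original commands. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: re-run Mutect2 per normal with MNP merging disabled.
# All paths below are placeholders (assumptions); substitute the real ones.
set -eu

GATK="${GATK:-gatk}"                   # assumed to be on PATH
REF="${REF:-hg38.fa}"
BAM_DIR="${BAM_DIR:-pon_no_dup_bam}"
VCF_DIR="${VCF_DIR:-pon_vcf}"
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands, anything else = execute

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

for sample in 84-B 85-B 86-B 88-B; do
  run "$GATK" Mutect2 \
    -R "$REF" \
    -I "$BAM_DIR/$sample.pon.no-dup.bam" \
    --max-mnp-distance 0 \
    -O "$VCF_DIR/$sample.subPoN.vcf.gz"
done
```

Setting DRY_RUN=0 would execute the commands. After the VCFs are regenerated, the GenomicsDBImport command from the question can be repeated; the PoN_db workspace left behind by the failed import may need to be removed first.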