Mutect2 PoN - GenomicsDBImport creates empty DB
AnsweredI am trying to follow GATK 4.2.0 best-practice guidelines for Mutect2 PoN creation. I called variants in my samples as recommended with:
gatk Mutect2 \
-R ${REF} \
-L ${EXOME_INPUT_INTERVALS} \
-I ${BAM} \
--sequence-dictionary ${DICT} \
--max-mnp-distance 0 \
-O ${SAMPLE_NAME}.mutect2.vcf
but I see that the tool is unable to create a proper GenomicsDB
through the GenomicsDBImport command. Even focusing the analysis on a little interval in which I know I have variants in the Mutect2 generated VCFs, here the SelectVariants
output from one of the VCF I'll use in the GenomicsDBImport
command:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT XXX
1 883625 . A G . . AS_SB_TABLE=0,0|12,41;DP=54;ECNT=1;MBQ=0,33;MFRL=0,260;MMQ=60,60;MPOS=31;POPAF=7.30;TLOD=182.40 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:0,53:0.981:53:0,26:0,26:0,0,12,41
and here the command to generate the DB:
gatk \
--java-options "-Djava.io.tmpdir=/nfs/projects/CNV_WGS/CHIP-PON-DB/TMP-DIR" \
GenomicsDBImport \
-R $REF \
-L 1:883600-883650 \
--genomicsdb-workspace-path $OUT \
--tmp-dir /nfs/projects/CNV_WGS/CHIP-PON-DB/TMP-DIR \
-V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0003D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0020D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0008D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq48f19.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq138f52.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0052D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0019D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0051D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq186f69.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq126f48.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq303f113.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq251f92.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq319f119.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro221f102.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq382f144.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0084D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0096D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0075D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq509f190.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq537f200.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0082D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0068D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq665f245.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq673f249.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0113D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0126D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0114D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro551f306.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq575f215.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq879f322.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq917f335.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq920f336.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0146D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0148D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0147D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0151D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1070f254.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1000f365.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1092f397.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1098f399.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1102f400.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0155D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0165D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1239f446.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro989f551.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0214D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0217D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0221D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1703f610.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1153f417.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1383f497.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1388f499.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1394f501.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1413f507.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1438f516.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1508f540.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1741f622.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1746f624.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0159D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0166D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0169D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0170D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0173D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0175D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0182D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0192D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0193D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0197D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0202D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0206D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0210D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro1089f613.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro741f416.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/ocd4005001236yy3.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0015D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq264f96.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq436f163.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro337f175.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq530f196.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq608f226.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq621f231.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq639f237.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro442f249.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq944f346.Roche-M.mutect2.vcf
I get an EMPTY GenomicsDB!!!
gatk SelectVariants --java-options "-Djava.io.tmpdir=/nfs/projects/CNV_WGS/CHIP-PON-DB/TMP-DIR" \
-R $REF
-V gendb://$OUT \
-L 1:883600-883650 \
-O output.chr1_883600_883650.vcf
### OUTPUT: empty VCF (only header)
Does anybody had this problem before or have a clue why? I'm losing my mind! I also googled and searched around and found similar issues like this but none explained why this should happen with my data!
Here the full logs from the GenomicsDBImport command:
Using GATK jar /home/ec3408/GATK-4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Djava.io.tmpdir=/nfs/projects/CNV_WGS/CHIP-PON-DB/TMP-DIR -jar /home/ec3408/GATK-4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar GenomicsDBImport -R /nfs/seqscratch09/AZ-IPF/reference/hs37d5.fa -L 1:883600-883650 --genomicsdb-workspace-path /nfs/projects/CNV_WGS/CHIP-PON-DB/ROCHE-M-n84-PON_DB --tmp-dir /nfs/projects/CNV_WGS/CHIP-PON-DB/TMP-DIR -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0003D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0020D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0008D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq48f19.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq138f52.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0052D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0019D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0051D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq186f69.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq126f48.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq303f113.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq251f92.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq319f119.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro221f102.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq382f144.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0084D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0096D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0075D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq509f190.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq537f200.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0082D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0068D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq665f245.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq673f249.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0113D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0126D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0114D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro551f306.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq575f215.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq879f322.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq917f335.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq920f336.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0146D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0148D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0147D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0151D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1070f254.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1000f365.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1092f397.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1098f399.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1102f400.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0155D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0165D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1239f446.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro989f551.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0214D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0217D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0221D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1703f610.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1153f417.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1383f497.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1388f499.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1394f501.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1413f507.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1438f516.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1508f540.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1741f622.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq1746f624.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0159D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0166D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0169D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0170D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0173D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0175D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0182D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0192D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0193D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0197D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0202D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0206D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0210D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro1089f613.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro741f416.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/ocd4005001236yy3.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0015D.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq264f96.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq436f163.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro337f175.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq530f196.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq608f226.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq621f231.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq639f237.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/neuro442f249.Roche-M.mutect2.vcf -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/diagseq944f346.Roche-M.mutect2.vcf
11:20:35.249 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/ec3408/GATK-4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 13, 2021 11:20:35 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:20:35.481 INFO GenomicsDBImport - ------------------------------------------------------------
11:20:35.482 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.0.0
11:20:35.482 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
11:20:35.482 INFO GenomicsDBImport - Executing as ec3408@dev2.igm.cumc.columbia.edu on Linux v3.10.0-957.27.2.el7.x86_64 amd64
11:20:35.482 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_222-b10
11:20:35.482 INFO GenomicsDBImport - Start Date/Time: July 13, 2021 11:20:35 AM EDT
11:20:35.482 INFO GenomicsDBImport - ------------------------------------------------------------
11:20:35.482 INFO GenomicsDBImport - ------------------------------------------------------------
11:20:35.483 INFO GenomicsDBImport - HTSJDK Version: 2.24.0
11:20:35.483 INFO GenomicsDBImport - Picard Version: 2.25.0
11:20:35.483 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
11:20:35.483 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:20:35.483 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:20:35.484 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:20:35.484 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:20:35.484 INFO GenomicsDBImport - Deflater: IntelDeflater
11:20:35.484 INFO GenomicsDBImport - Inflater: IntelInflater
11:20:35.484 INFO GenomicsDBImport - GCS max retries/reopens: 20
11:20:35.484 INFO GenomicsDBImport - Requester pays: disabled
11:20:35.484 INFO GenomicsDBImport - Initializing engine
11:20:37.469 INFO IntervalArgumentCollection - Processing 51 bp from intervals
11:20:37.507 INFO GenomicsDBImport - Done initializing engine
11:20:38.460 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
11:20:38.471 INFO GenomicsDBImport - Vid Map JSON file will be written to /nfs/projects/CNV_WGS/CHIP-PON-DB/ROCHE-M-n84-PON_DB/vidmap.json
11:20:38.471 INFO GenomicsDBImport - Callset Map JSON file will be written to /nfs/projects/CNV_WGS/CHIP-PON-DB/ROCHE-M-n84-PON_DB/callset.json
11:20:38.471 INFO GenomicsDBImport - Complete VCF Header will be written to /nfs/projects/CNV_WGS/CHIP-PON-DB/ROCHE-M-n84-PON_DB/vcfheader.vcf
11:20:38.471 INFO GenomicsDBImport - Importing to workspace - /nfs/projects/CNV_WGS/CHIP-PON-DB/ROCHE-M-n84-PON_DB
11:20:38.471 INFO ProgressMeter - Starting traversal
11:20:38.472 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
11:20:39.746 INFO GenomicsDBImport - Importing batch 1 with 84 samples
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
[E::vcf_parse_format] Invalid character '.' in 'AF' FORMAT field at 1:883625
11:20:40.460 INFO GenomicsDBImport - Done importing batch 1/1
11:20:40.463 INFO ProgressMeter - unmapped 0.0 1 30.2
11:20:40.463 INFO ProgressMeter - Traversal complete. Processed 1 total batches in 0.0 minutes.
11:20:40.463 INFO GenomicsDBImport - Import completed!
11:20:40.464 INFO GenomicsDBImport - Shutting down engine
[July 13, 2021 11:20:40 AM EDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=2418016256
Tool returned:
true
-
Hi Enrico Cocchi,
It looks like there may be a problem with your VCF files that you are inputting to GenomicsDBImport. Did you take the VCF files directly from Mutect2 or did you modify them at all? Could you please try running ValidateVariants on your files?
Kind regards,
Pamela
-
Hi Pamela Bretscher, thank you so much for your reply! The VCF files are direct output from Mutect2 command (first command in my original post above), I did not modify them at all.
I also tried ValidateVariants on my Mutect2 output VCF files, as you suggested, and here is the full command and LOG (I never used it so I'm unable to judge it properly):
### COMMAND ###
REF='/nfs/seqscratch09/AZ-IPF/reference/hs37d5.fa'
DBSNP='/nfs/projects/CNV_WGS/mutect2/dependencies_hg37/Homo_sapiens_assembly19.dbsnp138.vcf'
VCF='/nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0217D.Roche-M.mutect2.vcf'
gatk ValidateVariants \
--java-options "-Djava.io.tmpdir=/nfs/projects/CNV_WGS/CHIP-PON-DB/TMP-DIR" \
-R $REF \
-V $VCF \
--dbsnp $DBSNP
### OUTPUT ###
Using GATK jar /home/ec3408/GATK-4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Djava.io.tmpdir=/nfs/projects/CNV_WGS/CHIP-PON-DB/TMP-DIR -jar /home/ec3408/GATK-4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar ValidateVariants -R /nfs/seqscratch09/AZ-IPF/reference/hs37d5.fa -V /nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0217D.Roche-M.mutect2.vcf --dbsnp /nfs/projects/CNV_WGS/mutect2/dependencies_hg37/Homo_sapiens_assembly19.dbsnp138.vcf
09:42:45.493 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/ec3408/GATK-4.2.0.0/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 15, 2021 9:42:45 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:42:45.809 INFO ValidateVariants - ------------------------------------------------------------
09:42:45.809 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.2.0.0
09:42:45.809 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
09:42:45.810 INFO ValidateVariants - Executing as ec3408@dev2.igm.cumc.columbia.edu on Linux v3.10.0-957.27.2.el7.x86_64 amd64
09:42:45.810 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_222-b10
09:42:45.810 INFO ValidateVariants - Start Date/Time: July 15, 2021 9:42:45 AM EDT
09:42:45.810 INFO ValidateVariants - ------------------------------------------------------------
09:42:45.810 INFO ValidateVariants - ------------------------------------------------------------
09:42:45.811 INFO ValidateVariants - HTSJDK Version: 2.24.0
09:42:45.811 INFO ValidateVariants - Picard Version: 2.25.0
09:42:45.811 INFO ValidateVariants - Built for Spark Version: 2.4.5
09:42:45.811 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:42:45.811 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:42:45.811 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:42:45.811 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:42:45.812 INFO ValidateVariants - Deflater: IntelDeflater
09:42:45.812 INFO ValidateVariants - Inflater: IntelInflater
09:42:45.812 INFO ValidateVariants - GCS max retries/reopens: 20
09:42:45.812 INFO ValidateVariants - Requester pays: disabled
09:42:45.812 INFO ValidateVariants - Initializing engine
09:42:46.345 INFO FeatureManager - Using codec VCFCodec to read file file:///nfs/projects/CNV_WGS/mutect2/dependencies_hg37/Homo_sapiens_assembly19.dbsnp138.vcf
09:42:46.602 INFO FeatureManager - Using codec VCFCodec to read file file:///nfs/projects/CNV_WGS/Mutetc2-PON-OUT/Roche-M/fetal0217D.Roche-M.mutect2.vcf
09:42:46.641 INFO ValidateVariants - Done initializing engine
09:42:46.642 INFO ProgressMeter - Starting traversal
09:42:46.642 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
09:42:58.543 INFO ProgressMeter - 1:112020224 0.2 5000 25218.6
09:43:11.075 INFO ProgressMeter - 2:31571241 0.4 11000 27012.6
09:43:21.590 INFO ProgressMeter - 2:176945012 0.6 15000 25753.3
09:43:31.769 INFO ProgressMeter - 3:49004723 0.8 19000 25262.0
09:43:43.291 INFO ProgressMeter - 3:196869518 0.9 23000 24361.0
09:43:53.697 INFO ProgressMeter - 4:122824210 1.1 26000 23264.8
09:44:06.431 INFO ProgressMeter - 5:107716774 1.3 29000 21807.8
09:44:18.578 INFO ProgressMeter - 6:87796161 1.5 36000 23494.9
09:44:29.349 INFO ProgressMeter - 7:44611077 1.7 39000 22783.3
09:44:40.436 INFO ProgressMeter - 8:17930772 1.9 43000 22672.5
09:44:51.111 INFO ProgressMeter - 9:18889544 2.1 46000 22174.2
09:45:01.820 INFO ProgressMeter - 10:28945223 2.3 50000 22193.1
09:45:14.084 INFO ProgressMeter - 11:57582908 2.5 55000 22381.8
09:45:24.269 INFO ProgressMeter - 12:56726518 2.6 60000 22838.9
09:45:36.038 INFO ProgressMeter - 14:23450548 2.8 64000 22668.8
09:45:47.023 INFO ProgressMeter - 15:79058298 3.0 69000 22951.4
09:45:58.477 INFO ProgressMeter - 17:39341473 3.2 76000 23770.4
09:46:08.710 INFO ProgressMeter - 19:16339715 3.4 82000 24348.4
09:46:19.635 INFO ProgressMeter - 21:46047633 3.5 89000 25071.2
09:46:28.756 INFO ProgressMeter - X:139099962 3.7 93314 25207.2
09:46:28.756 INFO ProgressMeter - Traversal complete. Processed 93314 total variants in 3.7 minutes.
09:46:28.756 INFO ValidateVariants - Shutting down engine
[July 15, 2021 9:46:28 AM EDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 3.72 minutes.
Runtime.totalMemory()=5972164608 -
Hi Enrico Cocchi,
There are some other members of the GATK team looking into why you are receiving this error. I will update you when they come to a solution.
Kind regards,
Pamela
-
ok Pamela Bretscher, what error exactly are you talking about? The ValidateVariants output or the GenomicsDBImport empty DB?
Looking forward to receive this answer to be able to proceed with my analysis
-
Hi Enrico Cocchi,
Sorry, I should have clarified. We are looking into the reason for the empty DB and output of GenomicsDBImport. The output from ValidateVariants looks good.
-
Hi Enrico Cocchi,
I spoke with some members of the GATK team today and they were looking at the "Invalid Character" errors when you run GenomicsDBImport. Could you check the VCF file and let me know what the type attribute for the AF field is in the VCF header?
Thanks,
Pamela
-
Hi Pamela Bretscher, here you go:
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
-
Hi Enrico Cocchi,
Thank you for providing this information. After confirming that the AF is a float field and talking with the GATK team, we think this may be some sort of bug with the GenomicsDB tool. I have created a GitHub ticket for the GenomicsDB engineers to look into and hopefully help solve your issue. Here is a link to the ticket: https://github.com/broadinstitute/gatk/issues/7362. I will reach out when I have any updates and please let me know if there is anything else I can do to help.
Kind regards,
Pamela
-
ok, thank you Pamela Bretscher, what can I do to have it working in the meanwhile? Use a different version?
-
Hello Enrico Cocchi, would it be possible to include a small vcf file highlighting the issue?
-
Hi Nalini Ganapati. You can see an example VCF here, but I'd like to ask again: this is taking much longer than expected, how can I solve this while you guys investigate the bug? Use a different GATK version etc ?
-
Using GenomicsDBImport on the test vcf provided threw the following exception
%: ./gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' GenomicsDBImport -L 1:883600-883650 -V ex001.mutect2-output.vcf --genomicsdb-workspace-path try
Caused by: htsjdk.tribble.TribbleException: An index is required, but none found., for input source: file://ex001.mutect2-output.vcfGot the example vcf to work by first block compressing and indexing or indexing directly -
%: ~/bcftools/bcftools view -Oz -o sample.vcf.gz ex001.mutect2-output.vcf
%: ~/bcftools/bcftools index -t sample.vcf.gz
OR
%: ./gatk IndexFeatureFile -I ex001.mutect2-output.vcfAnd then running GenomicsDBImport/SelectVariants on the resultant sample/workspace:
%: ./gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' GenomicsDBImport -L 1:883600-883650 -V sample.vcf.gz --genomicsdb-workspace-path try
%: ./gatk SelectVariants -V gendb://try -O test.vcf -R ~/vcfs/reference/hs37d5.faDoes this help?
-
Hi Enrico Cocchi,
Thank you for your patience while the GATK team has worked to solve your problem with GenomicsDB. I hope the solution posted by Nalini allows you to continue with your analysis and please let me know if you have any additional questions.
Kind regards,
Pamela
-
Hi Pamela Bretscher and Nalini Ganapati, thank you for your suggestion but this is not the issue. GATK Mutect2 automatically creates both a .idx and a .stats file for the generated VCF. I never got an "index-missing" error.
In fact, if I repeat your passages on the isolated VCF (1. indexing with GATK IndexFeatureFile, 2. GenomicsDBImport and 3. SelectVariants) I keep get an empty file.
-
Enrico Cocchi, we are not able to reproduce the issue. We do not get the [E::vcf_parse_format] errors and genomicsdb seems to be correctly generated.
./gatk --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' GenomicsDBImport -L 1:883600-883650 -V ~/gatk/mutect2_bug/ex001.mutect2-output.vcf --genomicsdb-workspace-path try
Using GATK jar /nfs/home/nalini/gatk_4.2/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /nfs/home/nalini/gatk_4.2/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar GenomicsDBImport -L 1:883600-883650 -V /nfs/home/nalini/gatk/mutect2_bug/ex001.mutect2-output.vcf --genomicsdb-workspace-path try
08:44:53.865 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/nfs/home/nalini/gatk_4.2/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 27, 2021 8:44:54 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:44:54.014 INFO GenomicsDBImport - ------------------------------------------------------------
08:44:54.014 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.2.0.0
08:44:54.014 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
08:44:54.014 INFO GenomicsDBImport - Executing as nalini on Linux v3.10.0-1160.31.1.el7.x86_64 amd64
08:44:54.015 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_292-b10
08:44:54.015 INFO GenomicsDBImport - Start Date/Time: July 27, 2021 8:44:53 AM PDT
08:44:54.015 INFO GenomicsDBImport - ------------------------------------------------------------
08:44:54.015 INFO GenomicsDBImport - ------------------------------------------------------------
08:44:54.016 INFO GenomicsDBImport - HTSJDK Version: 2.24.0
08:44:54.016 INFO GenomicsDBImport - Picard Version: 2.25.0
08:44:54.016 INFO GenomicsDBImport - Built for Spark Version: 2.4.5
08:44:54.016 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:44:54.016 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:44:54.016 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:44:54.016 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:44:54.016 INFO GenomicsDBImport - Deflater: IntelDeflater
08:44:54.017 INFO GenomicsDBImport - Inflater: IntelInflater
08:44:54.017 INFO GenomicsDBImport - GCS max retries/reopens: 20
08:44:54.017 INFO GenomicsDBImport - Requester pays: disabled
08:44:54.017 INFO GenomicsDBImport - Initializing engine
08:44:54.374 INFO IntervalArgumentCollection - Processing 51 bp from intervals
08:44:54.375 INFO GenomicsDBImport - Done initializing engine
08:44:54.630 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.2-e18fa63
08:44:54.631 INFO GenomicsDBImport - Vid Map JSON file will be written to /nfs/home/nalini/gatk_4.2/gatk-4.2.0.0/try/vidmap.json
08:44:54.631 INFO GenomicsDBImport - Callset Map JSON file will be written to /nfs/home/nalini/gatk_4.2/gatk-4.2.0.0/try/callset.json
08:44:54.631 INFO GenomicsDBImport - Complete VCF Header will be written to /nfs/home/nalini/gatk_4.2/gatk-4.2.0.0/try/vcfheader.vcf
08:44:54.631 INFO GenomicsDBImport - Importing to workspace - /nfs/home/nalini/gatk_4.2/gatk-4.2.0.0/try
08:44:54.631 INFO ProgressMeter - Starting traversal
08:44:54.631 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
08:44:54.782 INFO GenomicsDBImport - Importing batch 1 with 1 samples
08:44:54.908 INFO GenomicsDBImport - Done importing batch 1/1
08:44:54.909 INFO ProgressMeter - unmapped 0.0 1 216.6
08:44:54.910 INFO ProgressMeter - Traversal complete. Processed 1 total batches in 0.0 minutes.
08:44:54.910 INFO GenomicsDBImport - Import completed!
08:44:54.910 INFO GenomicsDBImport - Shutting down engine
[July 27, 2021 8:44:54 AM PDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2311585792
Tool returned:
trueTail of output vcf from running SelectVariants:
##source=GenomicsDBImport
##source=SelectVariants
##tumor_sample=ex001
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ex001
1 883625 . A G . . AS_SB_TABLE=0,0|12,41;DP=54 GT:AD:AF:F1R2:F2R1:SB:DP ./.:
0,53:0.981:0,26:0,26:0,0,12,41:53The [E::vcf_parse_format] errors that you encounter are from htslib, so wondering if you have a libhtslib.so installed and if that is interfering with GenomicsDB. Some quick ways to check -
- ldconfig -p | grep libhts
- locate libhts
- Check $LD_LIBRARY_PATH for libhts.so
If any of these yield results, you may have to try a fresh, docker image of gatk - https://hub.docker.com/r/broadinstitute/gatk/ instead of a simple zip download of gatk while we continue to look for some solutions.
Please sign in to leave a comment.
15 comments