GenotypeGVCFs : VariantQueryProcessorException
REQUIRED for all errors and issues:
a) GATK version used: 4.3.0.0
Java runtime: OpenJDK 64-Bit Server VM v11.0.16+8-post-Debian-1deb11u1
b) Exact command used:
/usr/local/bin/gatk-4.3.0.0/gatk --java-options "-Xmx2G -Xms2G -XX:ParallelGCThreads=2" GenotypeGVCFs -R /home/NGS/Ressources/bwa_reference/hs38DH.fa -V gendb://../recordsDB/records_db_13_1 -O records_chr13_1_raw.vcf.gz --tmp-dir TMP 2> GenotypeGVCFs_chr13_1.log &
c) Tail of the program log:
hjg@spitzberg:/data/records_wgs/variants/raw_vcfs$ tail GenotypeGVCFs_chr13_1_backup.log
17:26:40.139 INFO ProgressMeter - chr13:22944237 281.2 6744000 23982.3
17:26:51.649 INFO ProgressMeter - chr13:22950240 281.4 6750000 23987.3
17:27:02.214 INFO ProgressMeter - chr13:22956240 281.6 6756000 23993.6
17:27:15.769 INFO ProgressMeter - chr13:22962246 281.8 6762000 23995.7
17:27:28.601 INFO ProgressMeter - chr13:22964250 282.0 6764000 23984.5
17:27:38.968 INFO ProgressMeter - chr13:22967280 282.2 6767000 23980.5
17:27:49.210 INFO ProgressMeter - chr13:22971290 282.4 6771000 23980.2
17:28:01.306 INFO ProgressMeter - chr13:22978290 282.6 6778000 23987.8
terminate called after throwing an instance of 'VariantQueryProcessorException'
what(): VariantQueryProcessorException : Unhandled overlapping variants at columns 2100026489 and 2100026490 for row 147
Additional information :
Rebuilding the records_db_13_1 store without the g.vcf sample corresponding to row 147 and running the above command again worked well.
Here is the "wrong" g.vcf around the last position called by the log file:
hjg@spitzberg:/data/records_wgs/variants/gVCFs/norm1$ bcftools view -Hr chr13:22977900-22978600 C002DUA.norm.g.vcf.gz
chr13 22977896 . A <NON_REF> . . END=22977928 GT:DP:GQ:MIN_DP:PL 0/0:38:99:36:0,100,1236
chr13 22977929 . A <NON_REF> . . END=22977929 GT:DP:GQ:MIN_DP:PL 0/0:43:80:43:0,80,1376
chr13 22977930 . G <NON_REF> . . END=22977968 GT:DP:GQ:MIN_DP:PL 0/0:48:99:43:0,115,1420
chr13 22977969 . A G,<NON_REF> 1635.06 . DP=50;ExcessHet=3.0103;MBQ=0,30,0;MFRL=0,373,0;MLEAC=2,0;MLEAF=1,0;MMQ=60,60,60;MPOS=35,50;RAW_MQandDP=180000,50 GT:AD:DP:GQ:PL:SB 1/1:0,48,0:48:99:1649,144,0,1649,144,1649:0,0,25,23
chr13 22977970 . C <NON_REF> . . END=22978174 GT:DP:GQ:MIN_DP:PL 0/0:45:99:38:0,102,1233
chr13 22978175 . T <NON_REF> . . END=22978175 GT:DP:GQ:MIN_DP:PL 0/0:41:89:41:0,89,1356
chr13 22978176 . A <NON_REF> . . END=22978337 GT:DP:GQ:MIN_DP:PL 0/0:46:99:41:0,110,1436
chr13 22978338 . T <NON_REF> . . END=22978338 GT:DP:GQ:MIN_DP:PL 0/0:47:92:47:0,92,1505
chr13 22978339 . T <NON_REF> . . END=22978362 GT:DP:GQ:MIN_DP:PL 0/0:44:99:42:0,106,1415
chr13 22978363 . T <NON_REF> . . END=22978363 GT:DP:GQ:MIN_DP:PL 0/0:44:97:44:0,97,1430
chr13 22978364 . G <NON_REF> . . END=22978389 GT:DP:GQ:MIN_DP:PL 0/0:45:99:43:0,110,1414
chr13 22978390 . C <NON_REF> . . END=22978390 GT:DP:GQ:MIN_DP:PL 0/0:48:75:48:0,75,1500
chr13 22978391 . C <NON_REF> . . END=22978416 GT:DP:GQ:MIN_DP:PL 0/0:45:99:41:0,109,1356
chr13 22978417 . G <NON_REF> . . END=22978417 GT:DP:GQ:MIN_DP:PL 0/0:41:98:41:0,98,1215
chr13 22978418 . T <NON_REF> . . END=22978454 GT:DP:GQ:MIN_DP:PL 0/0:41:99:37:0,102,1286
chr13 22978455 . A <NON_REF> . . END=22978455 GT:DP:GQ:MIN_DP:PL 0/0:36:94:36:0,94,1182
chr13 22978456 . T <NON_REF> . . END=22978465 GT:DP:GQ:MIN_DP:PL 0/0:36:99:34:0,99,1182
chr13 22978466 . C <NON_REF> . . END=22978473 GT:DP:GQ:MIN_DP:PL 0/0:37:90:36:0,90,1350
chr13 22978474 . G <NON_REF> . . END=22978483 GT:DP:GQ:MIN_DP:PL 0/0:33:84:31:0,84,1260
chr13 22978484 . G <NON_REF> . . END=22978488 GT:DP:GQ:MIN_DP:PL 0/0:32:90:32:0,90,1350
chr13 22978489 . G <NON_REF> . . END=22978494 GT:DP:GQ:MIN_DP:PL 0/0:34:99:33:0,99,1137
chr13 22978495 . A <NON_REF> . . END=22978495 GT:DP:GQ:MIN_DP:PL 0/0:32:96:32:0,96,1092
chr13 22978496 . C <NON_REF> . . END=22978496 GT:DP:GQ:MIN_DP:PL 0/0:33:99:33:0,99,1137
chr13 22978497 . C <NON_REF> . . END=22978497 GT:DP:GQ:MIN_DP:PL 0/0:33:96:33:0,96,1440
chr13 22978498 . T <NON_REF> . . END=22978498 GT:DP:GQ:MIN_DP:PL 0/0:33:85:33:0,85,1098
chr13 22978499 . G <NON_REF> . . END=22978499 GT:DP:GQ:MIN_DP:PL 0/0:33:96:33:0,96,1440
chr13 22978500 . T <NON_REF> . . END=22978500 GT:DP:GQ:MIN_DP:PL 0/0:32:82:32:0,82,1043
chr13 22978501 . A <NON_REF> . . END=22978512 GT:DP:GQ:MIN_DP:PL 0/0:35:90:32:0,90,1092
chr13 22978513 . A <NON_REF> . . END=22978527 GT:DP:GQ:MIN_DP:PL 0/0:31:81:30:0,81,1215
chr13 22978528 . C <NON_REF> . . END=22978551 GT:DP:GQ:MIN_DP:PL 0/0:28:75:26:0,75,1125
chr13 22978552 . A <NON_REF> . . END=22978572 GT:DP:GQ:MIN_DP:PL 0/0:25:60:23:0,60,769
chr13 22978573 . G <NON_REF> . . END=22978573 GT:DP:GQ:MIN_DP:PL 0/0:22:52:22:0,52,695
chr13 22978574 . G <NON_REF> . . END=22978578 GT:DP:GQ:MIN_DP:PL 0/0:22:63:22:0,63,744
chr13 22978579 . G <NON_REF> . . END=22978579 GT:DP:GQ:MIN_DP:PL 0/0:22:52:22:0,52,706
chr13 22978580 . G <NON_REF> . . END=22978591 GT:DP:GQ:MIN_DP:PL 0/0:24:66:22:0,66,704
chr13 22978592 . G <NON_REF> . . END=22978592 GT:DP:GQ:MIN_DP:PL 0/0:24:47:24:0,47,745
chr13 22978593 . T <NON_REF> . . END=22978602 GT:DP:GQ:MIN_DP:PL 0/0:23:66:23:0,66,759
Note that this sample (C002DUA.g.vcf.gz) worked well for the rest of the genome.
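A quick way to confirm whether a listing like the one above contains overlapping records is to compare each record's start with the furthest END seen so far on the same contig. The following is a minimal sketch, not part of GATK: the function names are hypothetical, lines are split on whitespace (which assumes no spaces inside VCF fields), and a record without an END tag is assumed to span the length of its REF allele.

```python
import re
from typing import Iterable, List, Tuple

# Matches an END=<int> entry inside the INFO column.
END_RE = re.compile(r"(?:^|;)END=(\d+)(?:;|$)")


def record_span(line: str) -> Tuple[str, int, int]:
    """Return (contig, start, end) for one gVCF data line, 1-based inclusive."""
    fields = line.split()  # assumption: no whitespace inside any VCF field
    contig, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
    match = END_RE.search(info)
    # Reference blocks carry END; variant records are assumed to span len(REF).
    end = int(match.group(1)) if match else pos + len(ref) - 1
    return contig, pos, end


def find_overlaps(lines: Iterable[str]) -> List[Tuple[str, int, int, int]]:
    """List (contig, furthest_end_so_far, start, end) for each overlapping record."""
    problems = []
    last_contig, last_end = None, 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        contig, start, end = record_span(line)
        if contig == last_contig and start <= last_end:
            problems.append((contig, last_end, start, end))
        if contig != last_contig:
            last_contig, last_end = contig, end
        else:
            last_end = max(last_end, end)
    return problems
```

The input can be any iterable of lines, for example the saved output of the `bcftools view -H` command shown above. An empty result only means that no positional overlap was found in the lines provided; it does not rule out other causes of the exception.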
I am working on a set of 701 WGS. Apart from this sample with this region of chr13, I have a similar issue of "VariantQueryProcessorException : Unhandled overlapping variants at columns ...." for two other samples, both for chr1:100,000,001-150,000,000.
In anticipation, many thanks for your help.
Henri-Jean
-
Hi Henri-Jean,
Thank you for writing in to the GATK forum about this issue! It does seem like a strange error message; I haven't seen it before. What tool did you use to create these GVCFs? If it was HaplotypeCaller, could you provide the command you used?
Thank you,
Genevieve
-
Hi Genevieve,
Thank you for your feedback.
Here is the HaplotypeCaller command, retrieved from the header of the sample VCF file:
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --standard-min-confidence-threshold-for-calling 0.0 --emit-ref-confidence GVCF --output /ccc/scratch/cont007/fg0166/fg0166/projet_RECORDS_747/ANALYSE/ANALYSE_R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me/gatk4/R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me_01.g.vcf.gz --intervals /ccc/scratch/cont007/fg0166/fg0166/projet_RECORDS_747/ANALYSE/ANALYSE_R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me/varscope_tmp/genomeRegionSplits/01_overlap.bed --input /ccc/scratch/cont007/fg0166/fg0166/projet_RECORDS_747/ANALYSE/ANALYSE_R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me/recalibration/BQSR.01_R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me.bam --reference /ccc/work/cont007/fg/fg/biobank/by-taxonid/9606/hs38me/hs38me_all_chr.fasta --annotation Coverage --annotation ChromosomeCounts --annotation BaseQuality --annotation FragmentLength --annotation MappingQuality --annotation ReadPosition --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 
--gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --floor-blocks false --indel-size-to-eliminate-in-ref-model 10 --disable-optimizations false --just-determine-active-regions false --dont-genotype false --do-not-run-physical-phasing false --do-not-correct-overlapping-quality false --use-filtered-reads-for-annotations false --adaptive-pruning false --do-not-recover-dangling-branches false --recover-dangling-heads false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --max-mnp-distance 0 
--force-call-filtered-alleles false --allele-informative-reads-overlap-margin 2 --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-reads-per-alignment-start 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false --allow-old-rms-mapping-quality-annotation-data false",Version="4.1.8.0",Date="January 10, 2022 1:54:45 PM CET">
Thank you for your help
Henri-Jean
-
I brought this issue up with the developers to see if they had any insight. We were wondering if you concatenated your GVCFs at all? And if you could check your bed file that you used for intervals to see if there is any overlap?
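For the interval question, one way to check a BED file for overlapping intervals is to sort the intervals per contig and compare each start with the furthest end seen so far. This is a minimal sketch with a hypothetical function name. It assumes standard BED coordinates (0-based start, exclusive end), so intervals that merely touch are not reported.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]


def bed_overlaps(lines: Iterable[str]) -> Dict[str, List[Tuple[Interval, Interval]]]:
    """Map each contig to the pairs of BED intervals that overlap on it."""
    by_contig = defaultdict(list)
    for line in lines:
        # Skip blank lines, comments, and optional track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.split()[:3]
        by_contig[contig].append((int(start), int(end)))

    overlaps = {}
    for contig, intervals in by_contig.items():
        intervals.sort()
        hits = []
        prev = intervals[0]  # interval with the furthest end seen so far
        for cur in intervals[1:]:
            if cur[0] < prev[1]:  # half-open: touching intervals do not overlap
                hits.append((prev, cur))
            if cur[1] > prev[1]:
                prev = cur
        if hits:
            overlaps[contig] = hits
    return overlaps
```

Calling `bed_overlaps(open(path))` on the file passed to `--intervals` returns an empty dictionary when no two intervals on the same contig overlap.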
We would be able to determine the coordinates of this error if you can share the vidmap.json file (in your GenomicsDB workspace) with us. Here are the instructions for how to submit this file: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671
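As a rough illustration of how a column number from the error message could be translated back into a genomic coordinate, the sketch below looks the column up in the contig table of the vid mapping JSON. Everything about the layout is an assumption here: a top-level `contigs` list whose entries carry `name`, `length`, and `tiledb_column_offset`, and 0-based column numbering. Please check these against your own file before trusting the result. The function names are hypothetical.

```python
import json
from typing import Tuple


def load_vidmap(path: str) -> dict:
    """Read the vid mapping JSON from a GenomicsDB workspace."""
    with open(path) as handle:
        return json.load(handle)


def column_to_locus(vidmap: dict, column: int) -> Tuple[str, int]:
    """Translate a flattened column index into (contig, 1-based position).

    Assumes each contig entry has 'name', 'length' and 'tiledb_column_offset',
    and that columns are 0-based offsets into the concatenated contigs.
    """
    for contig in vidmap["contigs"]:
        offset = contig["tiledb_column_offset"]
        if offset <= column < offset + contig["length"]:
            return contig["name"], column - offset + 1
    raise ValueError(f"column {column} is outside every contig in the mapping")
```

For the error above, this would be called with the two reported columns, e.g. `column_to_locus(load_vidmap("vidmap.json"), 2100026489)`.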
Best,
Genevieve
-
Hi Genevieve,
I apologize for my late reply.
I just uploaded the vidmap.json to the indicated FTP site. The filename is "GenotypeGVCFs_VQPE_vidmap.json".
Best regards
Henri-Jean
-
Hi Genevieve,
I forgot to address the first question you raised in your comment.
The GVCFs were not concatenated. I am not sure which BED file you are referring to.
Best regards
Henri-Jean
-
Hello,
I am wondering if this issue was ever resolved? I am running into the same error message originating from a single sample across multiple DB intervals. Is the only way to fix it to remove the sample causing the issues?
Thank you!
Joanna
-
I've worked with GenomicsDB a lot, but I've never seen this error message before. I have one suggestion you can try before we pass this issue on to the GenomicsDB development team. The GATK tool ReblockGVCF is very particular about gaps and overlaps in GVCFs and will correct overlapping reference blocks as well as reference blocks that overlap variants. Since you only have the one sample, you can process it in maybe an hour with that tool, and hopefully it will fix the issue.
The main purpose of the tool is to compress reference blocks by combining similar quality scores. This will only affect hom-ref genotypes, so in your single-sample case it may not make a difference at all. The default parameters will lead to a significant decrease in file size by using a low-quality [0,20) GQ band and a high-quality [20,99] GQ band. (See the GVCF doc on the forum if this is unfamiliar.) There's an example command here: https://github.com/broadinstitute/warp/blob/e3d65dba8f9e3682fc709d9bc29dfe077981e75c/tasks/broad/GermlineVariantDiscovery.wdl#L217 That's our internal best practice for compression, but if you want to replicate the HaplotypeCaller reference blocks exactly, you can copy the gvcf-gq-bands arguments from the HaplotypeCaller command header at the top of the GVCF.
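Copying the GQ bands by hand is tedious when there are dozens of them, so here is a minimal sketch that pulls the `--gvcf-gq-bands` values out of the `##GATKCommandLine` header text and assembles a ReblockGVCF argument list from them. The helper names and the file names in the usage note are hypothetical. The sketch only builds the argument list and does not run GATK, and whether ReblockGVCF accepts every band value that HaplotypeCaller wrote should be checked against the tool documentation for your GATK version.

```python
import re
from typing import List


def gq_bands_from_header(header_text: str) -> List[int]:
    """Collect every '--gvcf-gq-bands N' value, in order of appearance."""
    return [int(value) for value in re.findall(r"--gvcf-gq-bands\s+(\d+)", header_text)]


def reblock_args(reference: str, gvcf_in: str, gvcf_out: str, bands: List[int]) -> List[str]:
    """Build a ReblockGVCF argument list; nothing is executed here."""
    args = ["gatk", "ReblockGVCF", "-R", reference, "-V", gvcf_in, "-O", gvcf_out]
    for band in bands:
        args += ["--gvcf-gq-bands", str(band)]
    return args
```

With the header text in hand, something like `subprocess.run(reblock_args("ref.fa", "sample.g.vcf.gz", "sample.reblocked.g.vcf.gz", gq_bands_from_header(header_text)), check=True)` would launch the tool, assuming `gatk` is on the PATH.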
I hope that helps!
Laura
-
Hi Laura,
Thank you so much for your help and the suggestion! I actually had 96 samples in my dataset. Since you mentioned this seems to be a rare error, I went back and redid some steps (i.e. created new GVCFs) with more genome intervals (200 instead of 50). This seems to have fixed the issue entirely.
Thank you again!
Joanna