Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenotypeGVCFs : VariantQueryProcessorException

8 comments

  • Genevieve Brandt (she/her)

    Hi Henri-Jean,

    Thank you for writing to the GATK forum about this issue! It does seem like a strange error message; I haven't seen it before. What tool did you use to create these GVCFs? If it was HaplotypeCaller, could you provide the command you used?

    Thank you,

    Genevieve

  • Henri-Jean Garchon

    Hi Genevieve,

    Thank you for your feedback.

    Here is the HaplotypeCaller command, retrieved from the header of the sample's GVCF file:

    ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --standard-min-confidence-threshold-for-calling 0.0 --emit-ref-confidence GVCF --output /ccc/scratch/cont007/fg0166/fg0166/projet_RECORDS_747/ANALYSE/ANALYSE_R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me/gatk4/R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me_01.g.vcf.gz --intervals /ccc/scratch/cont007/fg0166/fg0166/projet_RECORDS_747/ANALYSE/ANALYSE_R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me/varscope_tmp/genomeRegionSplits/01_overlap.bed --input /ccc/scratch/cont007/fg0166/fg0166/projet_RECORDS_747/ANALYSE/ANALYSE_R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me/recalibration/BQSR.01_R747_C002DUA_HL32CDSX2-1-2-3-4-DUAL285_hs38me.bam --reference /ccc/work/cont007/fg/fg/biobank/by-taxonid/9606/hs38me/hs38me_all_chr.fasta --annotation Coverage --annotation ChromosomeCounts --annotation BaseQuality --annotation FragmentLength --annotation MappingQuality --annotation ReadPosition --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 
--gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --floor-blocks false --indel-size-to-eliminate-in-ref-model 10 --disable-optimizations false --just-determine-active-regions false --dont-genotype false --do-not-run-physical-phasing false --do-not-correct-overlapping-quality false --use-filtered-reads-for-annotations false --adaptive-pruning false --do-not-recover-dangling-branches false --recover-dangling-heads false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --max-mnp-distance 0 
--force-call-filtered-alleles false --allele-informative-reads-overlap-margin 2 --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-reads-per-alignment-start 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false --allow-old-rms-mapping-quality-annotation-data false",Version="4.1.8.0",Date="January 10, 2022 1:54:45 PM CET">
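    Laura's later suggestion in this thread is to copy the gvcf-gq-bands arguments from this header. The following Python sketch pulls them out of a ##GATKCommandLine line; the function names are hypothetical, and it assumes each band appears as a separate `--gvcf-gq-bands N` pair, as it does in the header above.

```python
from __future__ import annotations

import re

# Matches one "--gvcf-gq-bands N" pair and captures N (assumed header layout).
BAND_RE = re.compile(r"--gvcf-gq-bands (\d+)")


def gq_bands_from_header(header_line: str) -> list[int]:
    """Return the --gvcf-gq-bands values, in order, from a ##GATKCommandLine header line."""
    return [int(value) for value in BAND_RE.findall(header_line)]


def bands_to_args(bands: list[int]) -> str:
    """Rebuild the repeated command-line arguments from a list of band boundaries."""
    return " ".join(f"--gvcf-gq-bands {band}" for band in bands)


if __name__ == "__main__":
    # Shortened toy header; a real one carries every argument on a single line.
    header = ('##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller '
              '--gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 99 '
              '--floor-blocks false",Version="4.1.8.0">')
    bands = gq_bands_from_header(header)
    print(bands)  # [1, 2, 99]
    print(bands_to_args(bands))
```

    Applied to the full header above, the first function would return 1 through 60 followed by 70, 80, 90 and 99.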

    Thank you for your help

    Henri-Jean

  • Genevieve Brandt (she/her)

    Hi Henri-Jean Garchon,

    I brought this issue up with the developers to see whether they had any insight. We were wondering whether you concatenated your GVCFs at all. Could you also check the BED file you used for intervals to see whether any of its intervals overlap?
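    Checking a BED file for overlaps can be scripted. The following minimal Python sketch (the function name is hypothetical) assumes a whitespace-delimited BED file with the contig, 0-based start and exclusive end in the first three columns. It sorts the intervals and reports each one that starts before the furthest end seen so far on the same contig.

```python
from __future__ import annotations

import sys
from typing import Iterable, Iterator, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 0-based half-open


def find_bed_overlaps(lines: Iterable[str]) -> Iterator[tuple[Interval, Interval]]:
    """Yield (earlier, current) pairs of overlapping BED intervals on the same contig."""
    intervals = []
    for line in lines:
        # Skip blank lines and BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    intervals.sort()
    widest = None  # interval with the largest end so far on the current contig
    for cur in intervals:
        if widest is not None and widest[0] == cur[0] and cur[1] < widest[2]:
            yield widest, cur
        if widest is None or widest[0] != cur[0] or cur[2] > widest[2]:
            widest = cur


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as bed:
        for earlier, cur in find_bed_overlaps(bed):
            print(f"overlap: {earlier} and {cur}")
```

    Intervals that merely touch (one ends exactly where the next starts) are not reported, because BED intervals are half-open.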

    We could determine the coordinates of this error if you share the vidmap.json file (in your GenomicsDB workspace) with us. Here are the instructions for submitting this file: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671
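    As far as we understand the format, GenomicsDB lays all contigs end to end in a single column space, and vidmap.json records each contig's name, length and starting column (`tiledb_column_offset`). Under that assumption, a column index, for example one quoted in an error report, can be translated back to a genomic position. The sketch below is hypothetical: the key names, the script name in the usage comment, and the "column = offset + position - 1" convention are all assumptions to check against your own vidmap.json.

```python
from __future__ import annotations

import bisect
import json
import sys


def load_contigs(vidmap: dict) -> tuple[list[dict], list[int]]:
    """Return contigs sorted by starting column, plus the list of starting columns.

    Assumes a top-level "contigs" list whose entries carry "name", "length"
    and "tiledb_column_offset" keys.
    """
    contigs = sorted(vidmap["contigs"], key=lambda c: c["tiledb_column_offset"])
    offsets = [c["tiledb_column_offset"] for c in contigs]
    return contigs, offsets


def column_to_locus(column: int, contigs: list[dict], offsets: list[int]):
    """Map a column to (contig, 1-based position), or None if it is outside every contig.

    Assumes column = tiledb_column_offset + (position - 1).
    """
    i = bisect.bisect_right(offsets, column) - 1
    if i < 0:
        return None
    contig = contigs[i]
    pos0 = column - contig["tiledb_column_offset"]
    if pos0 >= contig["length"]:
        return None  # past the end of this contig
    return contig["name"], pos0 + 1


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical script name): python column_to_locus.py vidmap.json COLUMN
    with open(sys.argv[1]) as fh:
        contigs, offsets = load_contigs(json.load(fh))
    print(column_to_locus(int(sys.argv[2]), contigs, offsets))
```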

    Best,

    Genevieve

  • Henri-Jean Garchon

    Hi Genevieve,

    I apologize for my late reply.

    I have just uploaded vidmap.json to the indicated FTP site. The filename is "GenotypeGVCFs_VQPE_vidmap.json".

    Best regards

    Henri-Jean

  • Henri-Jean Garchon

    Hi Genevieve,

    I forgot to address the first question you raised in your comment.

    The GVCFs were not concatenated. I am not sure which BED file you are referring to.

    Best regards

    Henri-Jean

  • Joanna Griffiths

    Hello,

    I am wondering whether this issue was ever resolved. I am running into the same error message, originating from a single sample across multiple GenomicsDB intervals. Is the only fix to remove the sample causing the issue?

    Thank you!

    Joanna
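    When the error traces back to one sample, one quick check for the kind of problem Laura describes in the next comment (overlapping reference blocks) is to scan that sample's GVCF directly. The following Python sketch is hypothetical and is not a GATK tool. It assumes GATK4-style reference blocks (ALT of exactly `<NON_REF>` with an `END=` INFO key), reports records that go backwards within a contig and reference blocks that start before the previous block ends, and, to keep it simple, does not check variant records against reference blocks.

```python
from __future__ import annotations

import gzip
import sys
from typing import Iterable, Iterator


def info_end(info: str) -> int | None:
    """Return the END value from a VCF INFO column, or None if absent."""
    for field in info.split(";"):
        if field.startswith("END="):
            return int(field[4:])
    return None


def check_gvcf(lines: Iterable[str]) -> Iterator[str]:
    """Yield warnings for out-of-order records and overlapping reference blocks."""
    prev_contig, prev_pos, prev_block_end = None, 0, 0
    for line in lines:
        if line.startswith("#"):
            continue  # header lines
        chrom, pos, _id, _ref, alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        pos = int(pos)
        if chrom != prev_contig:
            prev_contig, prev_pos, prev_block_end = chrom, 0, 0
        if pos < prev_pos:
            yield f"{chrom}:{pos} comes after {chrom}:{prev_pos} (unsorted)"
        end = info_end(info)
        # Assumed reference-block convention: ALT is only <NON_REF> and END= is set.
        is_block = alt == "<NON_REF>" and end is not None
        if is_block and pos <= prev_block_end:
            yield f"{chrom}:{pos}-{end} overlaps a reference block ending at {prev_block_end}"
        if is_block:
            prev_block_end = max(prev_block_end, end)
        prev_pos = pos


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for warning in check_gvcf(handle):
            print(warning)
```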

  • Laura Gauthier

    Hi Joanna Griffiths,

    I've worked with GenomicsDB a lot, but I've never seen this error message before. I have one suggestion you can try before we pass this issue on to the GenomicsDB development team. The GATK tool ReblockGVCF is very particular about gaps and overlaps in GVCFs and will correct overlapping reference blocks, as well as reference blocks that overlap variants. Since only the one sample is affected, you can probably process it with that tool in about an hour, and hopefully that will fix the issue.

    The main purpose of the tool is to compress reference blocks by combining similar quality scores. This only affects hom-ref genotypes, so in your single-sample case it may not make a difference at all. With the default parameters, the file size decreases significantly because the tool uses a low-quality [0,20) GQ band and a high-quality [20,99] GQ band. (See the GVCF doc on the forum if this is unfamiliar.)

    There's an example command here: https://github.com/broadinstitute/warp/blob/e3d65dba8f9e3682fc709d9bc29dfe077981e75c/tasks/broad/GermlineVariantDiscovery.wdl#L217
    That reflects our internal best practices for compression, but if you want to replicate the HaplotypeCaller reference blocks exactly, you can copy the gvcf-gq-bands arguments from the HaplotypeCaller command in the header at the top of the GVCF.
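    To illustrate what "compress reference blocks by combining similar quality scores" means, here is a minimal Python sketch that merges adjacent hom-ref sites whose GQ falls in the same band, using the two default bands described above. It is not ReblockGVCF's implementation: the class and function names are hypothetical, and reporting the minimum GQ of each merged block is an assumption chosen as the conservative option.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RefBlock:
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive (what END= would hold)
    min_gq: int  # lowest GQ among the merged sites


def band_index(gq: int, boundaries: tuple[int, ...]) -> int:
    """Return which band gq falls in; boundaries are exclusive upper bounds."""
    return bisect_right(boundaries, gq)


def merge_into_blocks(sites: Iterable[tuple[str, int, int]],
                      boundaries: tuple[int, ...] = (20,)) -> list[RefBlock]:
    """Merge sorted hom-ref sites (contig, pos, gq) into blocks that share a GQ band.

    A new block starts on a contig change, a gap in positions, or a band change.
    The default (20,) gives the two bands [0,20) and [20,99].
    """
    blocks: list[RefBlock] = []
    current_band = None
    for contig, pos, gq in sites:
        band = band_index(gq, boundaries)
        last = blocks[-1] if blocks else None
        if (last is not None and last.contig == contig
                and last.end + 1 == pos and band == current_band):
            last.end = pos
            last.min_gq = min(last.min_gq, gq)
        else:
            blocks.append(RefBlock(contig, pos, pos, gq))
            current_band = band
    return blocks


if __name__ == "__main__":
    toy_sites = [("chr1", 1, 30), ("chr1", 2, 45), ("chr1", 3, 10), ("chr1", 4, 12)]
    for block in merge_into_blocks(toy_sites):
        print(block)
```

    Passing the HaplotypeCaller boundaries instead (the gvcf-gq-bands values from the GVCF header) gives the finer bands that argument describes; GATK documents those values as exclusive upper bounds, which is the convention `band_index` follows.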

    I hope that helps!

    Laura

  • Joanna Griffiths

    Hi Laura, 

    Thank you so much for your help and the suggestion! I actually had 96 samples in my dataset. Since you mentioned that this seems to be a rare error, I went back and redid some steps (i.e., created new GVCFs) using more genome intervals (200 instead of 50). This seems to have fixed the issue entirely.

    Thank you again!

    Joanna
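    Joanna does not say how she generated the 200 intervals; GATK's SplitIntervals tool is one standard option. For illustration only, the following hypothetical Python helper shows the basic idea of cutting a genome into at most n shards of roughly equal total length, given contig lengths (for example from the second column of the reference's .fai index).

```python
from __future__ import annotations

import math


def split_genome(contigs: list[tuple[str, int]], n: int) -> list[list[tuple[str, int, int]]]:
    """Split contigs into at most n shards of roughly equal total length.

    Each shard is a list of 0-based, half-open BED intervals; a contig may be
    cut across shards.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total = sum(length for _, length in contigs)
    target = max(1, math.ceil(total / n))  # bases per shard
    shards: list[list[tuple[str, int, int]]] = [[]]
    room = target  # bases still free in the current shard
    for name, length in contigs:
        start = 0
        while start < length:
            if room == 0:
                shards.append([])
                room = target
            end = min(length, start + room)
            shards[-1].append((name, start, end))
            room -= end - start
            start = end
    return shards


if __name__ == "__main__":
    # Toy contig lengths; real ones could come from the reference's .fai index.
    for i, shard in enumerate(split_genome([("chr1", 100), ("chr2", 50)], 3)):
        print(i, shard)
```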
