Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenotypeGVCFs: java.lang.NullPointerException

  • sahuno

    I am also facing a similar issue. I ran HaplotypeCaller in GVCF mode with GATK 4.1.0.0 on the cloud (Terra), then ran GenomicsDBImport on our cluster with GATK 4.1.3.0, followed by GenotypeGVCFs (also GATK 4.1.3.0) with the dbSNP file downloaded from the Broad's public data GCP bucket: `--dbsnp hg38_resources/Homo_sapiens_assembly38.dbsnp138.vcf`.

    Question: I got warning messages similar to Anze Staric's (pasted below), but I still got a final output.vcf with variants. Is this something I should be worried about?

    I have also attached the header of the final VCF, which contains all the commands used, for your perusal.

    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),2.3133200000000008E-4,Cpu time(s),2.21003E-4
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),6.935560000000002E-4,Cpu time(s),6.748520000000001E-4
    WARNING: No valid combination operation found for INFO field DS - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    WARNING: No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
    GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),2.71984E-4,Cpu time(s),2.6861999999999996E-4
    10:26:55.578 INFO  GenotypeGVCFs - Shutting down engine
    [March 30, 2020 10:26:55 AM EDT] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 4.00 minutes.
    Runtime.totalMemory()=1815609344
    java.lang.IllegalStateException: There are no sources based on those query parameters
            at org.genomicsdb.reader.GenomicsDBFeatureIterator.<init>(GenomicsDBFeatureIterator.java:132)
            at org.genomicsdb.reader.GenomicsDBFeatureReader.query(GenomicsDBFeatureReader.java:144)
            at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.queryNextInterval(FeatureIntervalIterator.java:135)
            at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.loadNextFeature(FeatureIntervalIterator.java:92)
            at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.loadNextNovelFeature(FeatureIntervalIterator.java:74)
            at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.next(FeatureIntervalIterator.java:62)
            at org.broadinstitute.hellbender.engine.FeatureIntervalIterator.next(FeatureIntervalIterator.java:24)
            at java.util.Iterator.forEachRemaining(Iterator.java:116)
            at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
            at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
            at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
            at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
            at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
            at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
            at java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:423)
            at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:134)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
            at org.broadinstitute.hellbender.Main.main(Main.java:291)

    VCF header:

    ##fileformat=VCFv4.2
    ##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
    ##FILTER=<ID=LowQual,Description="Low quality">
    ##FILTER=<ID=PASS,Description="All filters passed">
    ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
    ##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
    ##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
    ##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
    ##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
    ##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
    ##GATKCommandLine=<ID=GenomicsDBImport,CommandLine="GenomicsDBImport --genomicsdb-workspace-path colon_Germline41_database --variant gvcfs/CC10008_Germline.hg38.g.vcf.gz --variant gvcfs/CC17136_Germline.hg38.g.vcf.gz --variant gvcfs/CC8422_Germline.hg38.g.vcf.gz --variant gvcfs/CC8540_Germline.hg38.g.vcf.gz --variant gvcfs/CC10010_Germline.hg38.g.vcf.gz --variant gvcfs/CC17137_Germline.hg38.g.vcf.gz --variant gvcfs/CC8425_Germline.hg38.g.vcf.gz --variant gvcfs/CC9209_Germline.hg38.g.vcf.gz --variant gvcfs/CC17006_Germline.hg38.g.vcf.gz --variant gvcfs/CC17156_Germline.hg38.g.vcf.gz --variant gvcfs/CC8437_Germline.hg38.g.vcf.gz --variant gvcfs/CC9216_Germline.hg38.g.vcf.gz --variant gvcfs/CC17015_Germline.hg38.g.vcf.gz --variant gvcfs/CC17162_Germline.hg38.g.vcf.gz --variant gvcfs/CC8441_Germline.hg38.g.vcf.gz --variant gvcfs/CC9222_Germline.hg38.g.vcf.gz --variant gvcfs/CC17035_Germline.hg38.g.vcf.gz --variant gvcfs/CC17167_Germline.hg38.g.vcf.gz --variant gvcfs/CC8458_Germline.hg38.g.vcf.gz --variant gvcfs/CC9264_Germline.hg38.g.vcf.gz --variant gvcfs/CC17056_Germline.hg38.g.vcf.gz --variant gvcfs/CC17184_Germline.hg38.g.vcf.gz --variant gvcfs/CC8471_Germline.hg38.g.vcf.gz --variant gvcfs/CC9337_Germline.hg38.g.vcf.gz --variant gvcfs/CC17057_Germline.hg38.g.vcf.gz --variant gvcfs/CC17202_Germline.hg38.g.vcf.gz --variant gvcfs/CC8473_Germline.hg38.g.vcf.gz --variant gvcfs/CC9394_Germline.hg38.g.vcf.gz --variant gvcfs/CC17070_Germline.hg38.g.vcf.gz --variant gvcfs/CC8206_Germline.hg38.g.vcf.gz --variant gvcfs/CC8479_Germline.hg38.g.vcf.gz --variant gvcfs/CC9535_Germline.hg38.g.vcf.gz --variant gvcfs/CC8339_Germline.hg38.g.vcf.gz --variant gvcfs/CC8497_Germline.hg38.g.vcf.gz --variant gvcfs/CC17110_Germline.hg38.g.vcf.gz --variant gvcfs/CC8348_Germline.hg38.g.vcf.gz --variant gvcfs/CC8504_Germline.hg38.g.vcf.gz --variant gvcfs/CC17122_Germline.hg38.g.vcf.gz --variant gvcfs/CC8349_Germline.hg38.g.vcf.gz --variant gvcfs_extra/CC8509_Germline.hg38.g.vcf.gz --variant gvcfs_extra/CC17091_Germline.hg38.g.vcf.gz --intervals ../Exome-Agilent_V6_UTR.bed --genomicsdb-segment-size 1048576 --genomicsdb-vcf-buffer-size 16384 --overwrite-existing-genomicsdb-workspace false --batch-size 0 --consolidate false --validate-sample-name-map false --merge-input-intervals false --reader-threads 1 --max-num-intervals-to-import-in-parallel 1 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 0 --cloud-index-prefetch-buffer 0 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false",Version="4.1.3.0",Date="March 30, 2020 7:49:11 AM EDT">
    ##GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="GenotypeGVCFs --output output.JointGenotype.Germline41samples.colon.vcf.gz --dbsnp /sc/hydra/projects/canan/colon_cancer10312019/joint_calling_colon/hg38_resources/Homo_sapiens_assembly38.dbsnp138.vcf --variant gendb://colon_Germline41_database --intervals ../Exome-Agilent_V6_UTR.bed --reference /sc/hydra/projects/canan/colon_cancer10312019/intervals_preprocessing/Homo_sapiens_assembly38.fasta --include-non-variant-sites false --merge-input-intervals false --input-is-somatic false --tumor-lod-to-emit 3.5 --allele-fraction-error 0.001 --keep-combined-raw-annotations false --use-new-qual-calculator true --use-old-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --only-output-calls-starting-in-intervals false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false --allow-old-rms-mapping-quality-annotation-data false",Version="4.1.3.0",Date="March 30, 2020 10:22:59 AM EDT">
    ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --emit-ref-confidence GVCF --contamination-fraction-to-filter 0.0 --output CC10008_Germline.hg38.g.vcf.gz --intervals /cromwell_root/fc-1f2cec15-4c83-4842-8fde-c79a1131b2dd/scattered_calling_intervals/0011-scattered.interval_list --input gs://fc-1f2cec15-4c83-4842-8fde-c79a1131b2dd/b258dbac-459b-479e-81f0-bfdc79b110f2/PreProcessingForVariantDiscovery_GATK4/67847d90-640d-45c0-93b7-b62521f4b693/call-GatherBamFiles/CC10008_Germline.hg38.bam --reference /cromwell_root/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --disable-optimizations false --just-determine-active-regions false --dont-genotype false --max-mnp-distance 0 --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --adaptive-pruning false --do-not-recover-dangling-branches false --recover-dangling-heads false --consensus false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 1.0 --max-unpruned-variants 100 --debug-graph-transformations false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --debug false --use-filtered-reads-for-annotations false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --capture-assembly-failure-bam false --error-correct-reads false --do-not-run-physical-phasing false --min-base-quality-score 10 --smith-waterman JAVA --correct-overlapping-quality false --use-new-qual-calculator true --use-old-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --genotyping-mode DISCOVERY --genotype-filtered-alleles false --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version=4.1.0.0,Date="March 24, 2020 1:55:05 AM UTC">
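
The ##GATKCommandLine lines in a header like this one record which GATK version ran each step (here HaplotypeCaller 4.1.0.0, and GenomicsDBImport and GenotypeGVCFs 4.1.3.0), which helps when comparing versions across a pipeline. The following is a minimal Python sketch, using only the standard library, for listing the tool ID and Version recorded in each of those lines. The helper names `gatk_versions` and `gatk_versions_from_file` are hypothetical, and the regular expression assumes the `ID=...,CommandLine="...",Version=...` layout shown above rather than a formally specified format.

```python
import gzip
import re
from typing import Dict, Iterable, List

# Assumed layout (taken from the header above): ID comes first, Version comes
# after CommandLine, and Version may be quoted ("4.1.3.0") or unquoted (4.1.0.0).
_CMDLINE_RE = re.compile(r'^##GATKCommandLine=<ID=([^,]+),.*,Version="?([^",>]+)"?')


def gatk_versions(header_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each GATK tool ID to the version(s) recorded in the VCF header."""
    versions: Dict[str, List[str]] = {}
    for line in header_lines:
        if line.startswith("#CHROM"):
            break  # the column header line ends the meta-information section
        match = _CMDLINE_RE.match(line.strip())
        if match:
            tool, version = match.group(1), match.group(2)
            versions.setdefault(tool, []).append(version)
    return versions


def gatk_versions_from_file(path: str) -> Dict[str, List[str]]:
    """Read a .vcf or (b)gzipped .vcf.gz file and return gatk_versions() for it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return gatk_versions(handle)
```

For example, `gatk_versions_from_file("output.JointGenotype.Germline41samples.colon.vcf.gz")` should return one entry per tool that wrote a ##GATKCommandLine line into that file's header, which makes a version mismatch between steps easy to spot.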

  • Bhanu Gandham

    Hi Anze Staric,

    Here are some options for you to try:

    1. Use GenotypeGVCFs from GATK v4.1.6.0 with the --genomicsdb-use-vcf-codec argument.
    2. Did you use the same version for both GenomicsDBImport and GenotypeGVCFs? If not, which versions did you use?
    3. If options 1 and 2 don't work, try going back to GATK v4.1.4.1 and see if that works.

    Please keep us posted on the progress. We would love to know which option worked and which didn't.

  • Bhanu Gandham

    sahuno, you don't have to worry about the warnings.

  • sahuno

    Thanks for the reply, Bhanu Gandham, but the results (variants) in the final output.vcf I was talking about were actually truncated. The file contains only a few variants, up to chr1:6519807; results from the rest of chr1 and from chr2 through chrX and chrY were not present.
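
One way to check how far a joint-genotyped VCF got is to summarize its data lines per contig. The sketch below is a minimal, standard-library Python example; `contig_summary` and `contig_summary_from_file` are hypothetical helper names, and it assumes a plain or gzip/bgzip-compressed VCF whose first two tab-separated columns are CHROM and POS.

```python
import gzip
from typing import Dict, Iterable, Tuple


def contig_summary(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each contig to (number of records, last POS seen), in file order."""
    summary: Dict[str, Tuple[int, int]] = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the #CHROM line, and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        count, _ = summary.get(chrom, (0, 0))
        summary[chrom] = (count + 1, int(pos))
    return summary


def contig_summary_from_file(path: str) -> Dict[str, Tuple[int, int]]:
    """Read a .vcf or (b)gzipped .vcf.gz file and summarize its records per contig."""
    opener = gzip.open if path.endswith(".gz") else open  # gzip also reads BGZF files
    with opener(path, "rt") as handle:
        return contig_summary(handle)
```

For an output like the one described here, the summary would be expected to list only chr1, with a last position of 6519807, while a complete exome callset should list every targeted contig.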

  • Anze Staric

    Bhanu Gandham, thanks for your suggestions.

    1) Works. Should we always use this flag when calling GenotypeGVCFs on GenomicsDB inputs?
    2, 3)
    All tools from 4.1.4.1: works
    All tools from 4.1.5: NullPointerException
    All tools from 4.1.6: NullPointerException
    HaplotypeCaller + GenomicsDBImport 4.1.4.1, GenotypeGVCFs 4.1.6: NullPointerException
    HaplotypeCaller + GenomicsDBImport 4.1.6, GenotypeGVCFs 4.1.4.1: works

  • Bhanu Gandham

    sahuno,

    The issue you are facing is different from the one submitted by Anze. Could you please repost your question in a new thread to make it easier to manage? Thank you so much!

  • Bhanu Gandham

    Hi Anze Staric,

    I am glad to see that the solution worked. I am going to check with the dev team and get back to you about suggested guidelines for using that argument. It might take a while for me to get back to you, though, since there are other things on my priority list that I am working on.

  • Bhanu Gandham

    Hi Anze Staric,

    Would you be willing to share a subset of your data so we can reproduce the NullPointerException you experienced? This will help us troubleshoot and suggest guidelines for using the --genomicsdb-use-vcf-codec argument.

    Here is how you can share your data with us: https://gatk.zendesk.com/hc/en-us/articles/360035889671

  • Bhanu Gandham

    We have created an issue ticket for this and you can follow its progress here: https://github.com/broadinstitute/gatk/issues/6548
