Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

GenotypeGVCFs error IndexOutOfBoundsException

Answered
0

14 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi Quentin Chartreux,

    Thanks for posting this on the forum! This does look like a bug, so I have created an issue ticket so that the developers can take a look and make changes to the tools. Do you know if this issue occurs with GATK 4.2.1.0?

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Quentin Chartreux

    Hi Genevieve Brandt (she/her)

    I don't try on 4.2 1.0 but it's work on 4.2.0.0 even if this, to my opinion, not a good deal to run all on gatk 4.2.2.0 eexcept genotypegvcf... 

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Yes, we definitely want to fix this issue! I am asking so that we can narrow in on what change caused this issue.

    0
    Comment actions Permalink
  • Avatar
    Karoliina Salenius

    Hi! I have the same issue with versions 4.2.2.0 & 4.2.1.0. Works fine with 4.2.0.0. For the prior steps (preprocessing, haplotypecaller & genomicsDBImport) I used the latest (4.2.2.0).

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Karoliina Salenius thank you for providing extra information! 

    We are looking into what is causing this bug and are wondering if the java.lang.IndexOutOfBoundsException comes up when certain annotations are missing.

    Could you paste a few lines from your VCF using SelectVariants, Karoliina Salenius or Quentin Chartreux?

    Thank you!

    0
    Comment actions Permalink
  • Avatar
    Quentin Chartreux

    Genevieve Brandt (she/her)

    I don't know what is useful for you so this is first the begining of the header of the single sample gvcf: 

     

    ##fileformat=VCFv4.2
    ##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT">
    ##FILTER=<ID=LowQual,Description="Low quality">
    ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
    ##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not int
    ended to describe called alleles">
    ##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
    ##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
    ##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
    ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --gvcf-gq-bands 10 --gvcf-gq-bands 20 --gvcf-gq-bands 30 --gvcf-gq-bands 40 --gvcf-gq-bands 50 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq
    -bands 80 --gvcf-gq-bands 90 --pcr-indel-model NONE --emit-ref-confidence GVCF --output /shared/projects/gentaumix/processed_data/05_VCF/A_gVCF/G632_DA_C001KVY_1_HYH2CDSXX.DUAL2.trimmed.align.filtered.dedup.fi
    xed.recal.bam/temp_gVCF/0011.G632_DA_C001KVY_1_HYH2CDSXX.DUAL2.trimmed.align.filtered.dedup.fixed.recal.g.vcf.gz --intervals /shared/projects/gentaumix/processed_data/05_VCF/A_gVCF/G632_DA_C001KVY_1_HYH2CDSXX.
    DUAL2.trimmed.align.filtered.dedup.fixed.recal.bam/interval/temp_0011_of_50/scattered.interval_list --input /shared/projects/gentaumix/processed_data/03_Alignment/D_recal/G632_DA_C001KVY_1_HYH2CDSXX.DUAL2.trim
    med.align.filtered.dedup.fixed.recal.bam --reference /shared/projects/gentaumix/Ressources/grch38_BWA_2/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa --annotation-group StandardAnnotation --annota
    tion-group AS_StandardAnnotation --annotation-group StandardHCAnnotation --use-posteriors-to-calculate-qual false --dont-use-dragstr-priors false --use-new-qual-calculator true --annotate-with-num-discovered-a
    lleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-
    ploidy 2 --num-reference-samples-if-no-call 0 --genotype-assignment-method USE_PLS_TO_ASSIGN --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --floor-blocks false --
    indel-size-to-eliminate-in-ref-model 10 --disable-optimizations false --dragen-mode false --apply-bqd false --apply-frd false --disable-spanning-event-genotyping false --transform-dragen-mapping-quality false
    --mapping-quality-threshold-for-genotyping 20 --max-effective-depth-adjustment-for-frd 0 --just-determine-active-regions false --dont-genotype false --do-not-run-physical-phasing false --do-not-correct-overlap
    ping-quality false --use-filtered-reads-for-annotations false --adaptive-pruning false --do-not-recover-dangling-branches false --recover-dangling-heads false --kmer-size 10 --kmer-size 25 --dont-increase-kmer
    -sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2
    --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --pruning-seeding-lod-threshold 9.210340371976184 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-art
    ificial-haplotype-recovery false --enable-legacy-graph-cycle-detection false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --num-matching-bases-in-dangling-end
    -to-recover -1 --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --base-quality-score-threshold 18 --dragst
    r-het-hom-ratio 2 --dont-use-dragstr-pair-hmm-scores false --pair-hmm-gap-continuation-penalty 10 --expected-mismatch-rate-for-read-disqualification 0.02 --pair-hmm-implementation FASTEST_AVAILABLE --phred-sca
    led-global-read-mismapping-rate 45 --disable-symmetric-hmm-normalizing false --disable-cap-base-qualities-to-map-quality false --enable-dynamic-read-disqualification-for-genotyping false --dynamic-read-disqual
    ification-threshold 1.0 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-wate
    rman JAVA --max-mnp-distance 0 --force-call-filtered-alleles false --soft-clip-low-quality-ends false --allele-informative-reads-overlap-margin 2 --min-assembly-region-size 50 --max-assembly-region-size 300 --
    active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-exten
    sion-into-assembly-region-padding-legacy 25 --max-reads-per-alignment-start 50 --enable-legacy-assembly-region-trimming false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --int
    erval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --
    create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40
    --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jd
    k-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations
    false --allow-old-rms-mapping-quality-annotation-data false",Version="4.2.2.0",Date="4 septembre 2021 07:24:25 CEST">
    ##GVCFBlock0-10=minGQ=0(inclusive),maxGQ=10(exclusive)
    ##GVCFBlock10-20=minGQ=10(inclusive),maxGQ=20(exclusive)
    ##GVCFBlock20-30=minGQ=20(inclusive),maxGQ=30(exclusive)
    ##GVCFBlock30-40=minGQ=30(inclusive),maxGQ=40(exclusive)
    ##GVCFBlock40-50=minGQ=40(inclusive),maxGQ=50(exclusive)
    ##GVCFBlock50-60=minGQ=50(inclusive),maxGQ=60(exclusive)
    ##GVCFBlock60-70=minGQ=60(inclusive),maxGQ=70(exclusive)
    ##GVCFBlock70-80=minGQ=70(inclusive),maxGQ=80(exclusive)
    ##GVCFBlock80-90=minGQ=80(inclusive),maxGQ=90(exclusive)
    ##GVCFBlock90-100=minGQ=90(inclusive),maxGQ=100(exclusive)
    ##INFO=<ID=AS_InbreedingCoeff,Number=A,Type=Float,Description="Allele-specific inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation"
    >
    ##INFO=<ID=AS_QD,Number=A,Type=Float,Description="Allele-specific Variant Confidence/Quality by Depth">
    ##INFO=<ID=AS_RAW_BaseQRankSum,Number=1,Type=String,Description="raw data for allele specific rank sum test of base qualities">
    ##INFO=<ID=AS_RAW_MQ,Number=1,Type=String,Description="Allele-specfic raw data for RMS Mapping Quality">
    ##INFO=<ID=AS_RAW_MQRankSum,Number=1,Type=String,Description="Allele-specfic raw data for Mapping Quality Rank Sum">
    ##INFO=<ID=AS_RAW_ReadPosRankSum,Number=1,Type=String,Description="allele specific raw data for rank sum test of read position bias">
    ##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests. Includes the reference and alleles separated by |.">
    ##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
    ##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
    ##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
    ##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
    ##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
    ##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for improved RMS Mapping Quality calculation. Incompatible with deprecated RAW_MQ formulation.">
    ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
    ##contig=<ID=chr1,length=248956422>
    ##contig=<ID=chr2,length=242193529>
    ##contig=<ID=chr3,length=198295559>
    ##contig=<ID=chr4,length=190214555>

     

    Then there is the rest of the contig. 

    Exemple of one variant from the gvcf: 

    chr1 59193 . T G,<NON_REF> 380.64 . AS_RAW_BaseQRankSum=|-2.0,1|NaN;AS_RAW_MQ=39500.00|19339.00|0.00;AS_RAW_MQRankSum=|-0.7,1|NaN;AS_RAW_ReadPosRankSum=|0.8,1|NaN;AS_SB_TABLE=15,17|10,6|0,0;BaseQRankSum=-1.964;DP=48;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-0.645;RAW_MQandDP=58839,48;ReadPosRankSum=0.810 GT:AD:DP:GQ:PL:SB 0/1:32,16,0:48:99:388,0,930,484,978,1463:15,17,10,6

     

    here is the begining of the header of the multi sample vcf obtained from the genomicsdb with selectvariant : 

    ##fileformat=VCFv4.2
    ##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT">
    ##FILTER=<ID=LowQual,Description="Low quality">
    ##FILTER=<ID=PASS,Description="All filters passed">
    ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
    ##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
    ##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
    ##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
    ##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
    ##GATKCommandLine=<ID=GenomicsDBImport,CommandLine="GenomicsDBImport --genomicsdb-workspace-path /tmp/tmp.PKmeTjbuoM/vcf_database/Interval_1 --batch-size 74 --sample-name-map /shared/projects/gentaumix/processed_data/05_VCF/A_gVCF/gentaumix_cohort.sample_map --reader-threads 2 --genomicsdb-shared-posixfs-optimizations true --intervals /shared/projects/gentaumix/Ressources/interval_genomicsdbi/temp_1/interval.interval_list --tmp-dir /tmp/tmp.PKmeTjbuoM --genomicsdb-segment-size 1048576 --genomicsdb-vcf-buffer-size 16384 --overwrite-existing-genomicsdb-workspace false --consolidate false --validate-sample-name-map false --merge-input-intervals false --max-num-intervals-to-import-in-parallel 1 --merge-contigs-into-num-partitions 0 --genomicsdb-use-gcs-hdfs-connector false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 0 --cloud-index-prefetch-buffer 0 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false",Version="4.2.2.0",Date="10 septembre 2021 09:28:45 CEST">
    ##GATKCommandLine=<ID=SelectVariants,CommandLine="SelectVariants --output /shared/projects/gentaumix/processed_data/script/script_test/selectvariant/interval1.vcf.gz --variant gendb:///shared/projects/gentaumix/processed_data/05_VCF/B_vcf_database/Interval_1 --reference /shared/projects/gentaumix/Ressources/grch38_BWA_2/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa --invertSelect false --exclude-non-variants false --exclude-filtered false --preserve-alleles false --remove-unused-alternates false --restrict-alleles-to ALL --keep-original-ac false --keep-original-dp false --mendelian-violation false --invert-mendelian-violation false --mendelian-violation-qual-threshold 0.0 --select-random-fraction 0.0 --remove-fraction-genotypes 0.0 --fully-decode false --max-indel-size 2147483647 --min-indel-size 0 --max-filtered-genotypes 2147483647 --min-filtered-genotypes 0 --max-fraction-filtered-genotypes 1.0 --min-fraction-filtered-genotypes 0.0 --max-nocall-number 2147483647 --max-nocall-fraction 1.0 --set-filtered-gt-to-nocall false --allow-nonoverlapping-command-line-samples false --suppress-reference-path false --genomicsdb-use-bcf-codec false --genomicsdb-shared-posixfs-optimizations false --genomicsdb-use-gcs-hdfs-connector false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false",Version="4.2.2.0",Date="21 septembre 2021 10:37:20 CEST">
    ##GVCFBlock0-10=minGQ=0(inclusive),maxGQ=10(exclusive)
    ##GVCFBlock10-20=minGQ=10(inclusive),maxGQ=20(exclusive)
    ##GVCFBlock20-30=minGQ=20(inclusive),maxGQ=30(exclusive)
    ##GVCFBlock30-40=minGQ=30(inclusive),maxGQ=40(exclusive)
    ##GVCFBlock40-50=minGQ=40(inclusive),maxGQ=50(exclusive)
    ##GVCFBlock50-60=minGQ=50(inclusive),maxGQ=60(exclusive)
    ##GVCFBlock60-70=minGQ=60(inclusive),maxGQ=70(exclusive)
    ##GVCFBlock70-80=minGQ=70(inclusive),maxGQ=80(exclusive)
    ##GVCFBlock80-90=minGQ=80(inclusive),maxGQ=90(exclusive)
    ##GVCFBlock90-100=minGQ=90(inclusive),maxGQ=100(exclusive)
    ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
    ##INFO=<ID=AS_InbreedingCoeff,Number=A,Type=Float,Description="Allele-specific inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
    ##INFO=<ID=AS_QD,Number=A,Type=Float,Description="Allele-specific Variant Confidence/Quality by Depth">
    ##INFO=<ID=AS_RAW_BaseQRankSum,Number=1,Type=String,Description="raw data for allele specific rank sum test of base qualities">
    ##INFO=<ID=AS_RAW_MQ,Number=1,Type=String,Description="Allele-specfic raw data for RMS Mapping Quality">
    ##INFO=<ID=AS_RAW_MQRankSum,Number=1,Type=String,Description="Allele-specfic raw data for Mapping Quality Rank Sum">
    ##INFO=<ID=AS_RAW_ReadPosRankSum,Number=1,Type=String,Description="allele specific raw data for rank sum test of read position bias">
    ##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests. Includes the reference and alleles separated by |.">
    ##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
    ##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
    ##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
    ##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
    ##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
    ##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for improved RMS Mapping Quality calculation. Incompatible with deprecated RAW_MQ formulation.">
    ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
    ##contig=<ID=chr1,length=248956422,assembly=GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa>
    ##contig=<ID=chr2,length=242193529,assembly=GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa>
    ##contig=<ID=chr3,length=198295559,assembly=GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa>
    ##contig=<ID=chr4,length=190214555,assembly=GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa>
    ##contig=<ID=chr5,length=181538259,assembly=GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa>
    ##contig=<ID=chr6,length=170805979,assembly=GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa>
    ##contig=<ID=chr7,length=159345973,assembly=GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fa>

     

    and a variant: 

     

    chr1 13868 . A G,<NON_REF> . . AS_RAW_BaseQRankSum=|-4.000,1,-3.000,14,-2.000,19,-1.000,10,-0.000,1,1.000,2,4.000,1|;AS_RAW_MQ=466387.000|245200.000|0.000;AS_RAW_MQRankSum=|-1.000,8,-0.000,17,1.000,9,2.000,12,3.000,2|;AS_RAW_ReadPosRankSum=|-1.000,3,0.000,30,1.000,13,2.000,2|;AS_SB_TABLE=117,611|53,338|0,0;BaseQRankSum=-2.483;DP=1670;ExcessHet=3.0103;MQRankSum=0.917;RAW_MQandDP=724138,1141;ReadPosRankSum=0.429 GT:AD:GQ:MIN_DP:PL:SB:DP ./.:.:81:28:0,81,1215,81,1215,1215:.:30 ./.:.:90:32:0,90,1247,90,1247,1247:.:40 ./.:.:30:11:0,30,450,30,450,450:.:14 ./.:22,4,0:38:.:38,0,523,104,535,639:2,20,0,4:26 ./.:.:81:30:0,81,1215,81,1215,1215:.:30 ./.:11,4,0:70:.:70,0,285,103,297,400:1,10,0,4:15 ./.:15,11,0:99:.:236,0,344,281,377,658:0,15,3,8:26 ./.:.:90:32:0,90,1350,90,1350,1350:.:37 ./.:9,7,0:99:.:165,0,209,192,230,422:0,9,0,7:16 ./.:.:72:26:0,72,992,72,992,992:.:29 ./.:5,11,0:89:.:266,0,89,281,122,403:0,5,0,11:16 ./.:.:21:7:0,21,253,21,253,253:.:10 ./.:.:93:33:0,93,1317,93,1317,1317:.:34 ./.:9,12,0:99:.:288,0,182,315,218,533:0,9,2,10:21 ./.:11,8,0:99:.:196,0,251,229,275,504:0,11,0,8:19 ./.:4,9,0:71:.:222,0,71,234,98,332:0,4,1,8:13 ./.:.:72:24:0,72,923,72,923,923:.:26 ./.:16,6,0:99:.:104,0,390,152,408,560:3,13,1,5:22 ./.:8,2,0:26:.:26,0,195,50,201,251:0,8,1,1:10 ./.:.:12:6:0,12,180,12,180,180:.:6 ./.:.:0:0:0,0,0,0,0,0:.:0 ./.:.:42:14:0,42,480,42,480,480:.:16 ./.:10,14,0:99:.:350,0,208,381,250,631:0,10,0,14:24 ./.:7,2,0:29:.:29,0,168,50,174,225:0,7,1,1:9 ./.:6,5,0:99:.:115,0,134,133,149,282:0,6,0,5:11 ./.:17,2,0:0:.:0,0,463,51,469,520:8,9,0,2:19 ./.:.:51:19:0,51,643,51,643,643:.:19 ./.:.:81:29:0,81,1215,81,1215,1215:.:30 ./.:.:18:7:0,18,270,18,270,270:.:8 ./.:18,4,0:48:.:48,0,447,102,459,561:6,12,0,4:22 ./.:5,21,0:67:.:536,0,67,551,130,681:0,5,3,18:26 ./.:4,3,0:67:.:67,0,86,79,95,174:0,4,0,3:7 ./.:.:42:15:0,42,630,42,630,630:.:17 ./.:41,6,0:33:.:33,0,1077,155,1095,1251:13,28,0,6:47 ./.:.:84:34:0,84,1260,84,1260,1260:.:34 ./.:21,9,0:99:.:166,0,528,229,555,784:4,17,0,9:30 ./.:39,7,0:63:.:63,0,983,179,1004,1184:8,31,1,6:46 ./.:11,7,0:99:.:147,0,232,179,253,433:0,11,2,5:18 ./.:.:63:22:0,63,861,63,861,861:.:22 ./.:31,6,0:58:.:58,0,771,150,789,940:6,25,2,4:37 ./.:36,3,0:27:.:0,27,969,107,978,1059:11,25,0,3:39 ./.:14,7,0:99:.:130,0,347,172,368,541:0,14,1,6:21 ./.:3,9,0:44:.:219,0,44,228,71,299:0,3,1,8:12 ./.:.:42:14:0,42,537,42,537,537:.:17 ./.:.:93:31:0,93,1115,93,1115,1115:.:33 ./.:27,5,0:53:.:53,0,690,134,705,839:2,25,0,5:32 ./.:13,4,0:63:.:63,0,343,102,355,457:0,13,1,3:17
    ./.:6,6,0:99:.:136,0,129,154,147,302:0,6,2,4:12 ./.:5,29,0:41:.:717,0,41,732,128,859:0,5,5,24:34 ./.:.:90:33:0,90,1185,90,1185,1185:.:35 ./.:21,19,0:99:.:442,0,439,504,496,1000:0,21,4,15:40 ./.:.:12:5:0,12,180,12,180,180:.:6 ./.:36,9,0:99:.:130,0,903,238,930,1168:11,25,2,7:45 ./.:11,8,0:99:.:178,0,238,211,262,473:0,11,1,7:19 ./.:12,15,0:99:.:349,0,254,385,299,684:0,12,2,13:27 ./.:2,5,0:34:.:118,0,34,124,49,173:0,2,1,4:7 ./.:8,12,0:99:.:281,0,166,305,202,507:2,6,4,8:20 ./.:25,5,0:55:.:55,0,625,130,640,769:2,23,0,5:30 ./.:19,22,0:99:.:511,0,436,568,502,1070:3,16,5,17:41 ./.:4,8,0:57:.:182,0,57,193,80,273:0,4,1,7:12 ./.:12,4,0:68:.:68,0,289,104,301,405:0,12,0,4:16 ./.:1,6,0:7:.:160,0,7,164,25,188:0,1,2,4:7 ./.:11,2,0:19:.:19,0,253,51,259,311:1,10,0,2:13 ./.:17,7,0:99:.:123,0,413,174,434,608:0,17,2,5:24 ./.:5,9,0:94:.:233,0,94,248,121,368:0,5,0,9:14 ./.:.:60:22:0,60,900,60,900,900:.:23 ./.:39,11,0:99:.:161,0,999,278,1031,1310:17,22,0,11:50 ./.:15,6,0:99:.:103,0,363,148,381,530:1,14,1,5:21 ./.:34,6,0:55:.:55,0,851,157,869,1025:11,23,0,6:40 ./.:.:30:10:0,30,378,30,378,378:.:12 ./.:.:42:14:0,42,475,42,475,475:.:14 ./.:21,6,0:91:.:91,0,545,154,563,717:5,16,0,6:27 ./.:11,8,0:99:.:178,0,252,211,276,487:0,11,1,7:19 ./.:.:90:31:0,90,1350,90,1350,1350:.:32

     

    and for info in the log of the selectvariant i get this : 

     

    10:37:16.657 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.1-d59e886
    10:37:18.433 info NativeGenomicsDB - pid=13958 tid=13959 No valid combination operation found for INFO field AS_InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    10:37:18.434 info NativeGenomicsDB - pid=13958 tid=13959 No valid combination operation found for INFO field AS_QD - the field will NOT be part of INFO fields in the generated VCF records
    10:37:18.434 info NativeGenomicsDB - pid=13958 tid=13959 No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
    10:37:18.434 info NativeGenomicsDB - pid=13958 tid=13959 No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
    10:37:18.434 info NativeGenomicsDB - pid=13958 tid=13959 No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records

     

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Thank you so much Quentin Chartreux! It looks like the lines you shared with me contain the AS_SB_TABLE, which is the annotation in question. Could you double check that the variant records all contain this annotation using grep -v for AS_SB_TABLE? If any variant lines are output, then they do not contain that annotation.

    0
    Comment actions Permalink
  • Avatar
    Karoliina Salenius

    Hi, can you please elaborate what is the issue when the variants don't contain the AS_SB_TABLE annotation. At which step would this occur and how to fix it? I'm following the allele-specific annotation and filtering of germline short variants and although the GenotypeGVCFs would run with the 4.2.0.0 -version, there were some issues with too many genotypes:

    Chromosome chr19 position 1654996 (TileDB column 2656066283) has too many genotypes in the combined VCF record : 1128 : current limit : 1024 (num_alleles, ploidy) = (47, 2). Fields, such as  PL, with length equal to the number of genotypes will NOT be added for this sample for this location.

    Is this related to the variants missing the AS_SB_TABLE annotations or a whole different issue altogether? Thanks!

    0
    Comment actions Permalink
  • Avatar
    Quentin Chartreux

    Genevieve Brandt (she/her) for which file ? 

    The raw gvcf (the output of gatk 4.2.2.0 haplotypecaller) didn't contain any Line without these annotations. 

    The vcf (output of gatk 4.2.0.0 genotypegvcf) contain a lot, if not all, of variants without these annotation. 

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Karoliina Salenius I think the issue with too many genotypes is an unrelated warning. You can find some forum posts from other users with the same issue for more information. I am still diagnosing the issue causing the java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 so I was just checking if there was something strange with that annotation. 

    Quentin Chartreux I was wondering about the raw GVCF, thank you for checking.

    I'll follow up again when I have more information.

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Quentin Chartreux, could you upload a test GVCF so that we can look into the AS_SB_TABLE more closely? The last position in the progress meter before the exception in GenotypeGVCFs is chr2:60039632. So, if you can find a subset of the file in that region that replicates this issue, that would be ideal. Here are the instructions for uploading bug reports: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671

    Thank you!

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Quentin Chartreux,

    We released a new GATK version 4.2.3.0 containing a patch that we believe will solve your issue. Could you test out the newest GATK version and let us know if it solves the issue?

    Best,

    Genevieve

    1
    Comment actions Permalink
  • Avatar
    Quentin Chartreux

    Hi Genevieve Brandt (she/her)

    I just tried. It seems to be working. I just had an error on 2 interval of type out of memory so I will try to increase the memory (currently I do it in parallel on 50 intervals and I put 5 GB per interval for the job including 4 GB for the java Xmx and Xms)

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Ok, let me know if you have more issues. For now I'll mark it as solved.

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk