multiple errors and warnings with GenotypeGVCFs
Hi GATK team,
I'd like to thank all of you for the continuous support.
I got the errors below and would like to know whether they will affect the downstream analysis. I have found some similar cases raised here, but I didn't understand them clearly.
a) GATK version used
load gatk-4.2.2.0-gcc-8.4.1-ig3isjv
b) Exact GATK commands used
Below are the steps from HaplotypeCaller to GenotypeGVCFs. Sorry, it's long.
1- ### run HaplotypeCaller in parallel, below is an example for read_group8###
#module load gatk-4.2.2.0-gcc-8.4.1-ig3isjv
#module load parallel/20200522
#module load samtools/1.13
#module load java/11.0.2
### Loop of individuals
#FILES=($(for i in /mnt/ursus/GROUP-sbifh3/c1845371/whole_genome/data_dog/align_out/read_group8/*.bam_dedup.bam
#do
# echo $(basename ${i%.bam_dedup.bam})
#done))
### Identify the reference (this needs to be exported to be used in parallel)
#REF=/mnt/ursus/GROUP-sbifh3/c1845371/whole_genome/C.lupus.familiaris_genome/GCF_014441545.1_ROS_Cfam_1.0_genomic.fna
#export REF
#IN_PATH=/mnt/ursus/GROUP-sbifh3/c1845371/whole_genome/data_dog/align_out/read_group8
#export IN_PATH
### Function to run HaplotypeCaller
#function parallel_call {
# gatk --java-options "-Xmx4g" HaplotypeCaller \
# -R $REF \
# -I $IN_PATH/${1}.bam_dedup.bam \
# -O /mnt/ursus/GROUP-sbifh3/c1845371/whole_genome/data_dog/align_out/read_group8/${1}.${2}.g.vcf.gz \
# -ERC GVCF \
# -L $2
#}
#export -f parallel_call
### USED: parallel_call indv region
#function bam_chromosomes {
# samtools idxstats $1 | cut -f 1 | grep -v '*'
#}
### USED: bam_chromosomes file.bam
### Apply the function in a loop
#for FILE in ${FILES[@]}
# do
# BAM=$IN_PATH/${FILE}.bam_dedup.bam
# chrom_set=`bam_chromosomes $BAM`
# parallel --verbose -j 70 parallel_call $FILE ::: ${chrom_set}
#done
########### then concatenate the x.g.vcf.gz files for each sample in each read_group and generate index (x.g.vcf.gz.tbi) and the final (x.g.vcf.gz)
### below is an example for read_group8###
module load bcftools/1.14
module load gatk-4.2.2.0-gcc-8.4.1-ig3isjv
FILES=($(for i in /mnt/ursus/GROUP-sbifh3/c1845371/whole_genome/data_dog/align_out/read_group6/*NC_051805.1.g.vcf.gz
do
echo $(basename ${i%.NC_051805.1.g.vcf.gz})
done))
for FILE in ${FILES[@]}
do
IN_PATH=/mnt/ursus/GROUP-sbifh3/c1845371/whole_genome/data_dog/align_out/read_group6
OUT=/mnt/ursus/GROUP-sbifh3/c1845371/whole_genome/data_dog/align_out/read_group/${FILE}.g.vcf.gz ### save outputs to the same place as the script
bcftools concat $IN_PATH/${FILE}*gz --threads 64 -Oz -o $OUT
gatk IndexFeatureFile -I $OUT
done
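A note on the concatenation loop above: a shell glob such as `${FILE}*gz` expands in lexicographic order, which is not guaranteed to match the contig order of the reference, and it could also pick up files from another sample whose name starts with the same prefix. The following is a minimal sketch, not the poster's method, that lists one sample's per-contig gVCFs in the order of the reference `.fai` index so the list can be handed to `bcftools concat -f`. The function name `ordered_gvcfs` and the file naming pattern `<dir>/<sample>.<contig>.g.vcf.gz` are assumptions based on the HaplotypeCaller step in this post.

```shell
# Sketch only: print one sample's per-contig gVCFs in reference (.fai) order.
# Assumes files are named <dir>/<sample>.<contig>.g.vcf.gz (hypothetical layout
# taken from the HaplotypeCaller step); contigs without a file are skipped.
ordered_gvcfs() {
    fai=$1
    dir=$2
    sample=$3
    # Column 1 of a samtools faidx index holds contig names in reference order.
    cut -f 1 "$fai" | while read -r contig; do
        f="$dir/$sample.$contig.g.vcf.gz"
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Hypothetical usage (variable names reused from the loop above):
# ordered_gvcfs "$REF.fai" "$IN_PATH" "$FILE" > "$FILE.list"
# bcftools concat -f "$FILE.list" --threads 64 -Oz -o "$OUT"
```

The explicit list also makes it easy to see when a contig is missing for a sample before the files are merged.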
2-#module load gatk-4.2.2.0-gcc-8.4.1-ig3isjv
#gatk --java-options "-Xmx5g -Xms4g" GenomicsDBImport \
# --genomicsdb-shared-posixfs-optimizations \ ##### I added this flag to overcome the error ([TileDB::FileSystem] Error: (write_to_file) C$
# -V read_group/123RG.g.vcf.gz \
# -V read_group/145RG.g.vcf.gz \
# -V read_group/199RG.g.vcf.gz \
# -V read_group/375RG.g.vcf.gz \
# -V read_group/376RG.g.vcf.gz \
# -V read_group/383RG.g.vcf.gz \
# -V read_group/ERR5417968RG.g.vcf.gz \
# -V read_group/ERR5417974RG.g.vcf.gz \
# -V read_group/SRR14750349RG.g.vcf.gz \
# -V read_group/SRR14750511RG.g.vcf.gz \
# -V read_group/SRR5328110RG.g.vcf.gz \
# -V read_group/UAE2RG.g.vcf.gz \
# -V read_group/VvAL09RG.g.vcf.gz \
# -V read_group/VvLY02RG.g.vcf.gz \
# --genomicsdb-workspace-path my_database14 \
# -L read_group/GCF_014441545.1_ROS_Cfam_1.0_assembly_report.bed
3- # module load gatk-4.2.2.0-gcc-8.4.1-ig3isjv
#gatk --java-options "-Xmx4g" GenotypeGVCFs \
# -R /mnt/ursus/GROUP-sbifh3/c1845371/C.lupus.familiaris_genome/GCF_014441545.1_ROS_Cfam_1.0_genomic.fna \
# -V gendb://my_database14 \
# -O allsamples14_nonfiltered.vcf
### re-run it again with "-Xmx4g -XX:ParallelGCThreads=95" and save the output as gz
module load gatk-4.2.2.0-gcc-8.4.1-ig3isjv
gatk --java-options "-Xmx4g -XX:ParallelGCThreads=95" GenotypeGVCFs \
-R /mnt/ursus/GROUP-sbifh3/c1845371/C.lupus.familiaris_genome/GCF_014441545.1_ROS_Cfam_1.0_genomic.fna \
-V gendb://my_database14 \
-O allsamples14_nonfiltered.vcf.gz
Both previous commands produced the same errors.
c) The entire error log if applicable.
Below are three different types of error; the job is still running.
#Sample/Callset 123( TileDB row idx 0) at Chromosome NC_051806.1 position 79844557 (TileDB column 203158495) has too many genotypes in the combined VCF record : 1081 : current limit : 1024 (num_alleles, ploidy) = (46, 2). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
#Chromosome NC_051831.1 position 9073645 (TileDB column 1803800565) has too many alleles in the combined VCF record : 51 : current limit : 50. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
#15:45:13.931 WARN MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location NC_051831.1:9073645
-
Hi Ali Basuony,
Thank you for including all of the above detail!
I have the opportunity to meet with our developers today. Can you provide me with your entire program log, including all errors/warnings/exceptions? It is essential to see everything to move ahead with troubleshooting properly.
I look forward to hearing back from you!
Best,
Anthony
-
Dear Anthony,
Below is part of the log file; I can't post all of it because of its size.
I pasted the header, examples of the errors/warnings, and the end.
Thanks
Ali
19:26:33.948 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/trinity/shared/apps/spack/opt/spack/iago/gcc-8.4.1/gatk-4.2.2.0-ig3isjvlmuyjpezxojwdlkqr3arfgnen/bin/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Nov 01, 2022 7:26:34 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
19:26:34.333 INFO GenotypeGVCFs - ------------------------------------------------------------
19:26:34.334 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.2.0
19:26:34.334 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
19:26:34.334 INFO GenotypeGVCFs - Executing as c1845371@f03-24.cluster on Linux v4.18.0-348.23.1.el8_5.x86_64 amd64
19:26:34.334 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
19:26:34.335 INFO GenotypeGVCFs - Start Date/Time: 01 November 2022 19:26:33 GMT
19:26:34.335 INFO GenotypeGVCFs - ------------------------------------------------------------
19:26:34.335 INFO GenotypeGVCFs - ------------------------------------------------------------
19:26:34.336 INFO GenotypeGVCFs - HTSJDK Version: 2.24.1
19:26:34.336 INFO GenotypeGVCFs - Picard Version: 2.25.4
19:26:34.336 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
19:26:34.336 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:26:34.336 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:26:34.336 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:26:34.336 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:26:34.336 INFO GenotypeGVCFs - Deflater: IntelDeflater
19:26:34.336 INFO GenotypeGVCFs - Inflater: IntelInflater
19:26:34.337 INFO GenotypeGVCFs - GCS max retries/reopens: 20
19:26:34.337 INFO GenotypeGVCFs - Requester pays: disabled
19:26:34.337 INFO GenotypeGVCFs - Initializing engine
19:26:35.328 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.4.1-d59e886
19:26:40.770 info NativeGenomicsDB - pid=3504469 tid=3504470 No valid combination operation found for INFO field InbreedingCoeff - the field will NOT be part of INFO fields in the generated VCF records
19:26:40.771 info NativeGenomicsDB - pid=3504469 tid=3504470 No valid combination operation found for INFO field MLEAC - the field will NOT be part of INFO fields in the generated VCF records
19:26:40.771 info NativeGenomicsDB - pid=3504469 tid=3504470 No valid combination operation found for INFO field MLEAF - the field will NOT be part of INFO fields in the generated VCF records
19:26:42.844 INFO GenotypeGVCFs - Done initializing engine
19:26:42.931 INFO ProgressMeter - Starting traversal
19:26:42.931 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
19:26:58.735 INFO ProgressMeter - NC_051805.1:8839 0.3 11000 41766.9
19:27:08.907 INFO ProgressMeter - NC_051805.1:122036 0.4 110000 254080.7
19:27:19.005 INFO ProgressMeter - NC_051805.1:250356 0.6 211000 350955.0
19:27:29.120 INFO ProgressMeter - NC_051805.1:353179 0.8 306000 397497.2
19:27:39.168 INFO ProgressMeter - NC_051805.1:482364 0.9 414000 441702.1
19:27:49.295 INFO ProgressMeter - NC_051805.1:564305 1.1 487000 440299.0
19:27:59.528 INFO ProgressMeter - NC_051805.1:679021 1.3 595000 466081.8
19:28:09.546 INFO ProgressMeter - NC_051805.1:781842 1.4 692000 479362.7
19:28:19.593 INFO ProgressMeter - NC_051805.1:899114 1.6 803000 498437.9
19:28:29.821 INFO ProgressMeter - NC_051805.1:981012 1.8 880000 493965.8
19:28:39.884 INFO ProgressMeter - NC_051805.1:1106443 1.9 997000 511487.5
19:28:50.825 INFO ProgressMeter - NC_051805.1:1221551 2.1 1102000 516994.7
19:29:00.846 INFO ProgressMeter - NC_051805.1:1337174 2.3 1211000 526846.2
19:29:13.275 INFO ProgressMeter - NC_051805.1:1448565 2.5 1316000 525199.0
19:29:23.315 INFO ProgressMeter - NC_051805.1:1568619 2.7 1429000 534592.0
19:29:33.929 INFO ProgressMeter - NC_051805.1:1681133 2.8 1531000 537199.3
19:29:43.953 INFO ProgressMeter - NC_051805.1:1806357 3.0 1640000 543580.3
19:29:56.519 INFO ProgressMeter - NC_051805.1:1919347 3.2 1745000 540839.3
19:30:06.669 INFO ProgressMeter - NC_051805.1:2035728 3.4 1855000 546292.5
19:30:17.706 INFO ProgressMeter - NC_051805.1:2150552 3.6 1962000 548108.5
19:30:27.739 INFO ProgressMeter - NC_051805.1:2270324 3.7 2075000 553805.9
19:30:40.137 INFO ProgressMeter - NC_051805.1:2380422 4.0 2179000 551166.5
19:30:50.163 INFO ProgressMeter - NC_051805.1:2503177 4.1 2293000 556481.4
19:31:01.174 INFO ProgressMeter - NC_051805.1:2618161 4.3 2400000 557616.5
19:31:11.186 INFO ProgressMeter - NC_051805.1:2731516 4.5 2507000 560735.1
19:31:24.027 INFO ProgressMeter - NC_051805.1:2848293 4.7 2617000 558601.2
19:31:34.104 INFO ProgressMeter - NC_051805.1:2975404 4.9 2730000 562554.1
19:31:44.918 INFO ProgressMeter - NC_051805.1:3086961 5.0 2835000 563269.3
19:31:54.978 INFO ProgressMeter - NC_051805.1:3199254 5.2 2941000 565491.7
19:32:07.694 INFO ProgressMeter - NC_051805.1:3316637 5.4 3052000 563857.3
19:32:17.811 INFO ProgressMeter - NC_051805.1:3442474 5.6 3169000 567785.5
19:32:28.243 INFO ProgressMeter - NC_051805.1:3553268 5.8 3272000 568529.3
19:32:38.358 INFO ProgressMeter - NC_051805.1:3678216 5.9 3386000 571616.7
19:32:50.954 INFO ProgressMeter - NC_051805.1:3801569 6.1 3495000 569801.3
19:33:01.015 INFO ProgressMeter - NC_051805.1:3929851 6.3 3611000 573047.3
19:33:11.504 INFO ProgressMeter - NC_051805.1:4039482 6.5 3715000 573637.4
19:33:21.553 INFO ProgressMeter - NC_051805.1:4156779 6.6 3826000 575883.9
19:33:33.846 INFO ProgressMeter - NC_051805.1:4266788 6.8 3930000 573842.7
19:33:43.888 INFO ProgressMeter - NC_051805.1:4391835 7.0 4047000 576829.9
19:33:54.515 INFO ProgressMeter - NC_051805.1:4506662 7.2 4154000 577500.6
19:34:04.558 INFO ProgressMeter - NC_051805.1:4630306 7.4 4269000 579991.7
19:34:16.836 INFO ProgressMeter - NC_051805.1:4742709 7.6 4374000 578182.7
19:34:26.845 INFO ProgressMeter - NC_051805.1:4861684 7.7 4486000 580193.7
19:34:37.983 INFO ProgressMeter - NC_051805.1:4976380 7.9 4594000 580231.2
19:34:48.084 INFO ProgressMeter - NC_051805.1:5094662 8.1 4705000 581879.5
19:34:58.110 INFO ProgressMeter - NC_051805.1:5213792 8.3 4813000 583183.1
19:35:08.159 INFO ProgressMeter - NC_051805.1:5300866 8.4 4893000 581084.2
19:35:18.195 INFO ProgressMeter - NC_051805.1:5421381 8.6 5004000 582692.7
19:35:28.204 INFO ProgressMeter - NC_051805.1:5527649 8.8 5103000 582896.9
19:35:38.221 INFO ProgressMeter - NC_051805.1:5645911 8.9 5213000 584318.8
19:35:48.291 INFO ProgressMeter - NC_051805.1:5730923 9.1 5292000 582220.9
19:35:58.381 INFO ProgressMeter - NC_051805.1:5852033 9.3 5405000 583850.9
19:36:08.431 INFO ProgressMeter - NC_051805.1:5956883 9.4 5503000 583872.7
19:36:18.495 INFO ProgressMeter - NC_051805.1:6080440 9.6 5618000 585651.6
19:36:28.990 INFO ProgressMeter - NC_051805.1:6167973 9.8 5700000 583560.0
19:36:39.126 INFO ProgressMeter - NC_051805.1:6286890 9.9 5810000 584708.0
19:36:50.500 INFO ProgressMeter - NC_051805.1:6407338 10.1 5921000 584723.7
19:37:00.607 INFO ProgressMeter - NC_051805.1:6525912 10.3 6030000 585744.0
19:37:10.778 INFO ProgressMeter - NC_051805.1:6650051 10.5 6140000 586768.1
19:37:20.901 INFO ProgressMeter - NC_051805.1:6738603 10.6 6216000 584604.3
19:37:30.970 INFO ProgressMeter - NC_051805.1:6868366 10.8 6324000 585520.3
19:37:41.008 INFO ProgressMeter - NC_051805.1:6969147 11.0 6417000 585069.2
19:37:51.045 INFO ProgressMeter - NC_051805.1:7096420 11.1 6526000 586067.6
19:38:01.050 INFO ProgressMeter - NC_051805.1:7177819 11.3 6602000 584145.3
19:38:11.171 INFO ProgressMeter - NC_051805.1:7300275 11.5 6715000 585406.3
19:38:21.271 INFO ProgressMeter - NC_051805.1:7404605 11.6 6812000 585273.6
19:38:31.347 INFO ProgressMeter - NC_051805.1:7525881 11.8 6921000 586181.0
19:38:44.154 INFO ProgressMeter - NC_051805.1:7641953 12.0 7029000 584756.7
19:38:54.263 INFO ProgressMeter - NC_051805.1:7759402 12.2 7139000 585699.2
19:39:05.471 INFO ProgressMeter - NC_051805.1:7873101 12.4 7245000 585423.0
19:39:15.484 INFO ProgressMeter - NC_051805.1:7991077 12.5 7355000 586403.9
19:39:28.200 INFO ProgressMeter - NC_051805.1:8110016 12.8 7466000 585362.8
19:39:38.332 INFO ProgressMeter - NC_051805.1:8233886 12.9 7581000 586612.6
19:39:49.390 INFO ProgressMeter - NC_051805.1:8349212 13.1 7688000 586527.7
19:39:59.418 INFO ProgressMeter - NC_051805.1:8467567 13.3 7799000 587504.9
19:40:12.468 INFO ProgressMeter - NC_051805.1:8588682 13.5 7910000 586261.0
19:40:22.523 INFO ProgressMeter - NC_051805.1:8707323 13.7 8021000 587194.6
19:40:33.859 INFO ProgressMeter - NC_051805.1:8823957 13.8 8130000 587054.5
19:40:43.890 INFO ProgressMeter - NC_051805.1:8942068 14.0 8239000 587828.9
19:40:53.935 INFO ProgressMeter - NC_051805.1:9065884 14.2 8349000 588646.6
19:41:03.972 INFO ProgressMeter - NC_051805.1:9156337 14.4 8428000 587289.1
19:41:13.983 INFO ProgressMeter - NC_051805.1:9272703 14.5 8536000 587978.7
19:41:24.104 INFO ProgressMeter - NC_051805.1:9374319 14.7 8631000 587693.9
19:41:34.220 INFO ProgressMeter - NC_051805.1:9496274 14.9 8739000 588294.0
19:41:44.237 INFO ProgressMeter - NC_051805.1:9578597 15.0 8816000 586881.7
19:41:54.264 INFO ProgressMeter - NC_051805.1:9692651 15.2 8923000 587469.1
19:42:04.364 INFO ProgressMeter - NC_051805.1:9792544 15.4 9015000 587020.4
19:42:14.406 INFO ProgressMeter - NC_051805.1:9906315 15.5 9121000 587520.4
19:42:24.484 INFO ProgressMeter - NC_051805.1:10020656 15.7 9227000 587986.6
19:42:34.503 INFO ProgressMeter - NC_051805.1:10100616 15.9 9302000 586524.2
19:42:44.610 INFO ProgressMeter - NC_051805.1:10218147 16.0 9409000 587036.4
19:42:54.627 INFO ProgressMeter - NC_051805.1:10319036 16.2 9502000 586726.7
19:43:04.679 INFO ProgressMeter - NC_051805.1:10431416 16.4 9606000 587075.3
19:43:14.824 INFO ProgressMeter - NC_051805.1:10513584 16.5 9679000 585486.5
19:43:24.839 INFO ProgressMeter - NC_051805.1:10632780 16.7 9786000 586041.8
19:43:34.852 INFO ProgressMeter - NC_051805.1:10748405 16.9 9891000 586468.7
19:43:44.947 INFO ProgressMeter - NC_051805.1:10847245 17.0 9983000 586076.9
19:43:54.994 INFO ProgressMeter - NC_051805.1:10966221 17.2 10093000 586767.1
19:44:05.040 INFO ProgressMeter - NC_051805.1:11046568 17.4 10168000 585428.2
19:44:15.102 INFO ProgressMeter - NC_051805.1:11165379 17.5 10274000 585874.3
19:44:25.207 INFO ProgressMeter - NC_051805.1:11271945 17.7 10370000 585723.5
19:44:35.227 INFO ProgressMeter - NC_051805.1:11391362 17.9 10478000 586293.3
19:44:46.236 INFO ProgressMeter - NC_051805.1:11481176 18.1 10561000 584932.2
19:44:56.291 INFO ProgressMeter - NC_051805.1:11600519 18.2 10673000 585699.7
19:45:07.674 INFO ProgressMeter - NC_051805.1:11716482 18.4 10782000 585584.2
19:45:17.675 INFO ProgressMeter - NC_051805.1:11832917 18.6 10890000 586143.5
19:45:27.758 INFO ProgressMeter - NC_051805.1:11949663 18.7 10997000 586596.9
19:45:37.797 INFO ProgressMeter - NC_051805.1:12031825 18.9 11073000 585425.9
19:45:47.822 INFO ProgressMeter - NC_051805.1:12149829 19.1 11179000 585854.9
19:45:57.924 INFO ProgressMeter - NC_051805.1:12257316 19.2 11274000 585665.9
19:46:07.981 INFO ProgressMeter - NC_051805.1:12376401 19.4 11382000 586172.3
19:46:18.033 INFO ProgressMeter - NC_051805.1:12458521 19.6 11459000 585089.6
19:46:28.140 INFO ProgressMeter - NC_051805.1:12580482 19.8 11567000 585567.6
19:46:38.861 INFO ProgressMeter - NC_051805.1:12687493 19.9 11666000 585285.1
19:46:48.929 INFO ProgressMeter - NC_051805.1:12810828 20.1 11776000 585871.6
19:46:59.007 INFO ProgressMeter - NC_051805.1:12928227 20.3 11886000 586443.6
19:47:09.026 INFO ProgressMeter - NC_051805.1:13011188 20.4 11963000 585419.6
19:47:19.080 INFO ProgressMeter - NC_051805.1:13124027 20.6 12069000 585803.2
19:47:29.134 INFO ProgressMeter - NC_051805.1:13223427 20.8 12163000 585602.8
19:47:39.192 INFO ProgressMeter - NC_051805.1:13341141 20.9 12272000 586120.7
19:47:49.265 INFO ProgressMeter - NC_051805.1:13421376 21.1 12347000 585012.0
19:47:59.314 INFO ProgressMeter - NC_051805.1:13541217 21.3 12457000 585576.6
19:48:09.429 INFO ProgressMeter - NC_051805.1:13643837 21.4 12554000 585496.4
19:48:19.500 INFO ProgressMeter - NC_051805.1:13763328 21.6 12667000 586177.8
19:48:32.096 INFO ProgressMeter - NC_051805.1:13869364 21.8 12767000 585121.1
19:48:42.150 INFO ProgressMeter - NC_051805.1:13992653 22.0 12883000 585937.6
19:48:52.876 INFO ProgressMeter - NC_051805.1:14103827 22.2 12988000 585949.5
19:49:02.888 INFO ProgressMeter - NC_051805.1:14216208 22.3 13094000 586317.8
19:49:12.901 INFO ProgressMeter - NC_051805.1:14330762 22.5 13201000 586724.1
19:49:22.989 INFO ProgressMeter - NC_051805.1:14411398 22.7 13276000 585680.9
19:49:33.031 INFO ProgressMeter - NC_051805.1:14531062 22.8 13388000 586293.0
19:49:43.047 INFO ProgressMeter - NC_051805.1:14637729 23.0 13486000 586298.5
19:49:53.059 INFO ProgressMeter - NC_051805.1:14753210 23.2 13592000 586651.0
19:50:03.074 INFO ProgressMeter - NC_051805.1:14830403 23.3 13665000 585583.0
19:50:13.137 INFO ProgressMeter - NC_051805.1:14953315 23.5 13773000 585999.5
19:50:23.196 INFO ProgressMeter - NC_051805.1:15052352 23.7 13865000 585735.8
19:50:33.219 INFO ProgressMeter - NC_051805.1:15172673 23.8 13978000 586371.4
19:50:45.827 INFO ProgressMeter - NC_051805.1:15286007 24.0 14085000 585697.5
19:50:55.853 INFO ProgressMeter - NC_051805.1:15407212 24.2 14199000 586363.2
19:51:06.824 INFO ProgressMeter - NC_051805.1:15518299 24.4 14304000 586272.4
19:51:16.876 INFO ProgressMeter - NC_051805.1:15633268 24.6 14410000 586589.1
19:51:26.975 INFO ProgressMeter - NC_051805.1:15748100 24.7 14517000 586923.3
19:51:37.123 INFO ProgressMeter - NC_051805.1:15831755 24.9 14594000 586029.1
19:51:47.238 INFO ProgressMeter - NC_051805.1:15944024 25.1 14699000 586277.0
19:51:57.314 INFO ProgressMeter - NC_051805.1:16042889 25.2 14791000 586020.8
19:52:07.427 INFO ProgressMeter - NC_051805.1:16156030 25.4 14897000 586305.2
19:52:17.509 INFO ProgressMeter - NC_051805.1:16235551 25.6 14971000 585346.6
19:52:27.540 INFO ProgressMeter - NC_051805.1:16348167 25.7 15074000 585546.2
19:52:37.582 INFO ProgressMeter - NC_051805.1:16458382 25.9 15178000 585777.8
19:52:47.606 INFO ProgressMeter - NC_051805.1:16556305 26.1 15270000 585552.9
19:52:57.618 INFO ProgressMeter - NC_051805.1:16675758 26.2 15377000 585906.9
19:53:07.624 INFO ProgressMeter - NC_051805.1:16751616 26.4 15449000 584933.5
19:53:17.631 INFO ProgressMeter - NC_051805.1:16857426 26.6 15549000 585025.4
19:53:27.662 INFO ProgressMeter - NC_051805.1:16950038 26.7 15636000 584621.3
19:53:37.752 INFO ProgressMeter - NC_051805.1:17059472 26.9 15738000 584758.7
19:53:48.939 INFO ProgressMeter - NC_051805.1:17153082 27.1 15826000 583982.4
19:53:58.994 INFO ProgressMeter - NC_051805.1:17264851 27.3 15928000 584134.3
19:54:08.995 INFO ProgressMeter - NC_051805.1:17374433 27.4 16031000 584339.4
19:54:19.019 INFO ProgressMeter - NC_051805.1:17468046 27.6 16119000 583990.7
19:54:29.136 INFO ProgressMeter - NC_051805.1:17577732 27.8 16223000 584189.8
19:54:39.156 INFO ProgressMeter - NC_051805.1:17656092 27.9 16297000 583346.5
19:54:49.241 INFO ProgressMeter - NC_051805.1:17762612 28.1 16398000 583451.8
19:54:59.334 INFO ProgressMeter - NC_051805.1:17856713 28.3 16487000 583128.3
19:55:09.385 INFO ProgressMeter - NC_051805.1:17964329 28.4 16589000 583279.7
19:55:21.542 INFO ProgressMeter - NC_051805.1:18063894 28.6 16683000 582435.5
19:55:31.616 INFO ProgressMeter - NC_051805.1:18175345 28.8 16788000 582685.7
19:55:41.616 INFO ProgressMeter - NC_051805.1:18282188 29.0 16888000 582785.3
19:55:51.735 INFO ProgressMeter - NC_051805.1:18383074 29.1 16983000 582672.5
19:56:01.737 INFO ProgressMeter - NC_051805.1:18494052 29.3 17088000 582940.9
19:56:11.815 INFO ProgressMeter - NC_051805.1:18574255 29.5 17164000 582197.6
19:56:21.935 INFO ProgressMeter - NC_051805.1:18681273 29.7 17265000 582292.1
19:56:32.011 INFO ProgressMeter - NC_051805.1:18786003 29.8 17361000 582232.2
19:56:42.084 INFO ProgressMeter - NC_051805.1:18902210 30.0 17470000 582607.5
19:56:52.826 INFO ProgressMeter - NC_051805.1:18993645 30.2 17556000 582000.6
19:57:02.917 INFO ProgressMeter - NC_051805.1:19116000 30.3 17667000 582433.1
19:57:14.103 INFO ProgressMeter - NC_051805.1:19229118 30.5 17774000 582381.1
#Sample/Callset 123( TileDB row idx 0) at Chromosome NC_051806.1 position 79844557 (TileDB column 203158495) has too many genotypes in the combined VCF record : 1081 : current limit : 1024 (num_alleles, ploidy) = (46, 2). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
#Chromosome NC_051831.1 position 9073645 (TileDB column 1803800565) has too many alleles in the combined VCF record : 51 : current limit : 50. Fields, such as PL, with length equal to the number of genotypes will NOT be added for this location.
#15:45:13.931 WARN MinimalGenotypingEngine - Attempting to genotype more than 50 alleles. Site will be skipped at location NC_051831.1:9073645
04:51:26.622 INFO ProgressMeter - NW_024010775.1:1225 3444.7 2129757000 618265.6
04:51:36.704 INFO ProgressMeter - NW_024010778.1:99257 3444.9 2129838000 618259.0
GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),96013.77429798454,Cpu time(s),93259.58436876263
04:51:37.455 INFO ProgressMeter - NW_024010778.1:109746 3444.9 2129843767 618258.4
04:51:37.455 INFO ProgressMeter - Traversal complete. Processed 2129843767 total variants in 3444.9 minutes.
04:51:37.585 INFO GenotypeGVCFs - Shutting down engine
[04 November 2022 04:51:37 GMT] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 3,445.06 minutes.
Runtime.totalMemory()=3689938944
Using GATK jar /trinity/shared/apps/spack/opt/spack/iago/gcc-8.4.1/gatk-4.2.2.0-ig3isjvlmuyjpezxojwdlkqr3arfgnen/bin/gatk-package-4.2.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -XX:ParallelGCThreads=95 -jar /trinity/shared/apps/spack/opt/spack/iago/gcc-8.4.1/gatk-4.2.2.0-ig3isjvlmuyjpezxojwdlkqr3arfgnen/bin/gatk-package-4.2.2.0-local.jar GenotypeGVCFs -R /mnt/ursus/GROUP-sbifh3/c1845371/C.lupus.familiaris_genome/GCF_014441545.1_ROS_Cfam_1.0_genomic.fna -V gendb://my_database14 -O allsamples14_nonfiltered.vcf.gz
-
Hi Ali Basuony,
Thank you for your much-appreciated patience while running some diagnostics! I have some feedback from our developers. The following is the exact advice I received from the developers. It is complex, so I kept their language unaltered. Sorry for the long read!
- "First, it seems the root cause of the messages seen by the user are coming from some highly variable regions in the genome they're studying. In particular, there are just warnings about those (two, it seems) sites where the computations became too difficult so some corners were cut at those locations alone. I'll say more about those later, but my inclination is to say that if the user is intending to use this data for whole genome analysis, it's probably safe to ignore the misbehaving at just those sites and still use the overall data set. But it's up to the user to decide if it makes scientific sense for their application to ignore those 2 sites and document that carefully going forward.
- The two sites have slightly different extremes that caused the warnings to appear. I'll link to some source code for the version of GATK they're using when appropriate. Both cases seem to stem from a site where there are too many possible alternate alleles across the samples being combined (which makes me believe these are "noisy" regions across the population).
- In the first warning, at Chromosome NC_051806.1 position 79844557, we're told (num_alleles, ploidy) = (46, 2). Adding in a ref allele, we have 47 options, so 47 choose 2 ( = 47 * 46 / 2 ) = 1081 options, which tracks with the number of possible genotypes at the site. The default for the tool is 1024 according to the docs (https://gatk.broadinstitute.org/hc/en-us/articles/4405451397659-GenotypeGVCFs#--max-genotype-count). As a consequence, it seems the tool is going to skip sample-specific format fields besides genotypes here, like PL, but it seems to imply it'll still write genotype info for each sample, so not too much info is lost.
- The other site, at NC_051831.1:9073645, seems to hit another upper limit in the tool. In this case, there were 51 alt alleles across the samples, which just crosses over the limit of 50 for the tool. The messages seems to imply this is a stronger limit, and the entire site is skipped, including genotypes. So technically I'd guess the output VCF wouldn't have an entry there (though the user could confirm), so the user should be aware there was lots of variation in their population at that site despite being blank. This one actually isn't controlled by the max-alternate-alleles flag here (https://gatk.broadinstitute.org/hc/en-us/articles/4405451397659-GenotypeGVCFs#--max-alternate-alleles), but rather deeper in the codebase in htsjdk (here https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/variant/variantcontext/GenotypeLikelihoods.java#L61). That makes me think this is probably tough to access as an end user as it's not clearly exposed at the top level.
- In the end, I still recommend for most use cases to safely ignore those two sites but make a note in their analysis as to what happened in the logs. In the former case, they could try tweaking that max-genotype-count to go above 1081, but this seems to be highly recommended against in the docs. In the latter case, it's not clear to me that toggling the max-alternate-allele flag would actually bump their limit over 50 since this seems to be hardcoded deeper in htsjdk."
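To make the suggested note of what happened in the logs, the affected sites can be pulled out of the GenotypeGVCFs log. The following is a minimal sketch under stated assumptions: the function name `flagged_sites` is hypothetical, and the patterns are copied from the three message types quoted in this thread, so they may need adjusting for other GATK or GenomicsDB versions.

```shell
# Sketch only: list the unique contig:position pairs named in the three
# message types quoted in this thread. Patterns are copied from those log
# lines and are an assumption for any other GATK/GenomicsDB version.
flagged_sites() {
    log=$1
    {
        # GenomicsDB: "... Chromosome NC_051806.1 position 79844557 (TileDB column ..."
        grep -E 'has too many (genotypes|alleles)' "$log" |
            sed -E 's/.*Chromosome ([^ ]+) position ([0-9]+).*/\1:\2/'
        # GATK: "... Site will be skipped at location NC_051831.1:9073645"
        grep -F 'Attempting to genotype more than' "$log" |
            sed -E 's/.*at location ([^ ]+).*/\1/'
    } | sort -u
}

# Hypothetical usage (the log file name is a placeholder):
# flagged_sites genotypegvcfs.log > flagged_sites.txt
# To check whether a skipped site has a record in the output, assuming the
# VCF is bgzipped and indexed:
# bcftools view -H -r NC_051831.1:9073645 allsamples14_nonfiltered.vcf.gz
```

If the `bcftools view` check prints nothing for a site, the output has no record there, which matches the developers' expectation for the site that was skipped.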
I hope this helps! If additional clarity is needed, please do not hesitate to reach back out. Thank you again for being such a valued member of the GATK community.
Best,
Anthony
-
Dear Anthony,
Thanks very much for the detailed response.
I actually ignored all of those warning and error messages, as I expected them to have little effect beyond losing a few variants.
Kind regards,
Ali
-
Hi Ali Basuony,
Thank you for following up! I'm glad you found a workaround that allowed you to continue forward.
Please do not hesitate to reach out for anything else in the future!
Best,
Anthony
-
Hello Anthony DiCi, I am having the same problem as Ali at many sites. I am working with merged samples (samtools merge), the product of 2 RADseq runs of the same libraries, for 352 tetraploid samples. We merged them because we needed to increase the read depth.
Sample/Callset 016( TileDB row idx 0) at Chromosome Chromosome_1 position 5490482 (TileDB column 1610426421) has too many genotypes in the combined VCF record : 1820 : current limit : 1024 (num_alleles, ploidy) = (13, 4). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
Sample/Callset 018( TileDB row idx 1) at Chromosome Chromosome_1 position 5490482 (TileDB column 1610426421) has too many genotypes in the combined VCF record : 1820 : current limit : 1024 (num_alleles, ploidy) = (13, 4). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
Sample/Callset 021( TileDB row idx 2) at Chromosome Chromosome_1 position 5490482 (TileDB column 1610426421) has too many genotypes in the combined VCF record : 1820 : current limit : 1024 (num_alleles, ploidy) = (13, 4). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
Sample/Callset 022( TileDB row idx 3) at Chromosome Chromosome_1 position 5490482 (TileDB column 1610426421) has too many genotypes in the combined VCF record : 1820 : current limit : 1024 (num_alleles, ploidy) = (13, 4). Fields, such as PL, with length equal to the number of genotypes will NOT be added for this sample for this location.
I found a partial answer here https://gatk.broadinstitute.org/hc/en-us/community/posts/360072168712-GenomicsDBImport-Attempting-to-genotype-more-than-50-alleles?page=1#community_comment_360012343671, but I still don't understand the error. I even have a site with 7315 genotypes from (19, 4), even though the samples are tetraploid. I don't understand where the 7315 comes from, or whether I should try another germline tool for genotype calling...
Thanks!
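On where numbers such as 1820 and 7315 come from: the counts in these messages match the number of unordered genotypes that can be formed from n alleles at ploidy p, which is C(n + p - 1, p). That reading assumes num_alleles includes the reference allele. It is inferred from the figures quoted in this thread rather than taken from the GATK source, and the function name `genotype_count` is hypothetical. A minimal sketch of the arithmetic:

```shell
# Sketch only: number of unordered genotypes for n alleles at ploidy p,
# i.e. C(n + p - 1, p). Assumes n (num_alleles in the log) counts the
# reference allele. The ( ) function body runs in a subshell, so the
# working variables do not leak into the caller.
genotype_count() (
    n=$1
    p=$2
    r=1
    i=1
    while [ "$i" -le "$p" ]; do
        # After step i, r equals C(n + i - 1, i); the division is exact.
        r=$(( r * (n + i - 1) / i ))
        i=$(( i + 1 ))
    done
    echo "$r"
)

genotype_count 46 2   # diploid site from the first post: 1081
genotype_count 13 4   # tetraploid site at Chromosome_1:5490482: 1820
genotype_count 19 4   # tetraploid site with 19 alleles: 7315
```

All three values exceed the limit of 1024 shown in the messages. For the diploid case the result equals the "47 choose 2" figure quoted earlier in the thread. With tetraploid samples, the count passes 1024 at 12 alleles (C(15, 4) = 1365), so highly multi-allelic sites reach the limit much sooner than in diploids.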
-
Hi Paula Andrea Espitia Buitrago,
We provided an answer to your question in the other topic. You can check it there.