CombineGVCFs
I noticed I was having trouble combining my g.vcf files: CombineGVCFs reports that an index is out of bounds. My HaplotypeCaller command seemed to work fine, and all of these commands work fine when I use amplicons as my reference, which leads me to believe the index is indeed the issue. Since I'm working with barley, where some chromosomes are close to 700 Mb, is there a maximum length that CreateSequenceDictionary can handle? If so, is there an option to extend it, like the -c option in samtools index? Maybe there is some other issue I'm not aware of.
GATK v4.2.3.0
Code used:
ref=CI5791
project=ExomeSNPs
java -jar /mnt/d/picard.jar CreateSequenceDictionary R=${ref}.fasta O=${ref}.dict
ls *.markdup.bam | parallel --eta -j 20 "gatk --java-options '-Xmx4g' HaplotypeCaller -R ${ref}.fasta -ploidy 2 -I {} -O {.}.g.vcf -ERC GVCF"
ls -d -1 $PWD/*.g.vcf > gvcf.list
gatk CombineGVCFs --java-options "-Xmx20g" -R ${ref}.fasta --variant gvcf.list -OVI -O ${project}.g.vcf.gz
Error log from indexing:
Using GATK jar /home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar CreateSequenceDictionary -R CI5791.fasta
INFO 2022-01-11 16:18:07 CreateSequenceDictionary Output dictionary will be written in CI5791.dict
16:18:07.216 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Jan 11 16:18:07 PST 2022] CreateSequenceDictionary --REFERENCE CI5791.fasta --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Jan 11, 2022 4:18:07 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Tue Jan 11 16:18:07 PST 2022] Executing as barlex@DESKTOP-VDKQE0G on Linux 4.4.0-19041-Microsoft amd64; OpenJDK 64-Bit Server VM 11.0.11+9-Ubuntu-0ubuntu2.20.04; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.2.3.0
[Tue Jan 11 16:18:07 PST 2022] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2155872256
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
picard.PicardException: /mnt/d/Data_Shaun/CI9819_Tifang_Exome/CI5791.dict already exists. Delete this file and try again, or specify a different output file.
at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:220)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
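Note on the log above: the exception only reports that CI5791.dict already exists, so this run by itself does not point to a length limit in CreateSequenceDictionary. A minimal sketch of a guard that reuses an existing dictionary instead of failing; `ensure_dict` is a hypothetical helper name, and it assumes `gatk` is on the PATH:

```shell
# Hypothetical helper: build <prefix>.dict only when it does not exist yet.
# $1 is the reference prefix, e.g. CI5791 for CI5791.fasta / CI5791.dict.
ensure_dict() {
    if [ -e "$1.dict" ]; then
        # CreateSequenceDictionary refuses to overwrite, so reuse the file.
        echo "$1.dict already exists; reusing it"
    else
        gatk CreateSequenceDictionary -R "$1.fasta" -O "$1.dict"
    fi
}

# Usage: ensure_dict CI5791
```

To force a rebuild instead, delete the old .dict first, as the error message suggests.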
Error log from CombineGVCFs:
Using GATK jar /home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx12g -jar /home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar CombineGVCFs -R CI5791.fasta --variant gvcf.list -OVI -O ExomeSNPs.g.vcf.gz
10:24:42.447 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 11, 2022 10:24:42 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:24:42.598 INFO CombineGVCFs - ------------------------------------------------------------
10:24:42.598 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.2.3.0
10:24:42.599 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
10:24:42.600 INFO CombineGVCFs - Executing as barlex@DESKTOP-VDKQE0G on Linux v4.4.0-19041-Microsoft amd64
10:24:42.600 INFO CombineGVCFs - Java runtime: OpenJDK 64-Bit Server VM v11.0.11+9-Ubuntu-0ubuntu2.20.04
10:24:42.601 INFO CombineGVCFs - Start Date/Time: January 11, 2022 at 10:24:42 AM PST
10:24:42.601 INFO CombineGVCFs - ------------------------------------------------------------
10:24:42.602 INFO CombineGVCFs - ------------------------------------------------------------
10:24:42.603 INFO CombineGVCFs - HTSJDK Version: 2.24.1
10:24:42.603 INFO CombineGVCFs - Picard Version: 2.25.4
10:24:42.603 INFO CombineGVCFs - Built for Spark Version: 2.4.5
10:24:42.606 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:24:42.607 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:24:42.607 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:24:42.608 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:24:42.609 INFO CombineGVCFs - Deflater: IntelDeflater
10:24:42.609 INFO CombineGVCFs - Inflater: IntelInflater
10:24:42.610 INFO CombineGVCFs - GCS max retries/reopens: 20
10:24:42.610 INFO CombineGVCFs - Requester pays: disabled
10:24:42.611 INFO CombineGVCFs - Initializing engine
10:24:42.758 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CI9819_Tifang_Exome/687.markdup.g.vcf
10:24:42.885 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CI9819_Tifang_Exome/741.markdup.g.vcf
10:24:42.931 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CI9819_Tifang_Exome/797.markdup.g.vcf
10:24:42.987 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CI9819_Tifang_Exome/805.markdup.g.vcf
10:24:43.027 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CI9819_Tifang_Exome/855.markdup.g.vcf
10:24:43.074 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CI9819_Tifang_Exome/CI9819.markdup.g.vcf
10:24:43.122 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CI9819_Tifang_Exome/Tifang.markdup.g.vcf
10:24:44.031 INFO CombineGVCFs - Done initializing engine
10:24:44.094 INFO ProgressMeter - Starting traversal
10:24:44.095 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
10:24:44.182 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location Chr1:15932 the annotation MLEAC=[1, 0] was not a numerical value and was ignored
10:24:54.103 INFO ProgressMeter - Chr1:10412521 0.2 382000 2290167.9
10:25:04.117 INFO ProgressMeter - Chr1:34979984 0.3 961000 2879976.0
10:25:14.124 INFO ProgressMeter - Chr1:68240645 0.5 1572000 3140963.7
10:25:24.138 INFO ProgressMeter - Chr1:101442963 0.7 2192000 3284469.2
10:25:34.152 INFO ProgressMeter - Chr1:143849857 0.8 2814000 3372954.8
10:25:44.153 INFO ProgressMeter - Chr1:187793920 1.0 3443000 3439675.0
10:25:54.159 INFO ProgressMeter - Chr1:229011678 1.2 4096000 3507650.1
10:26:04.172 INFO ProgressMeter - Chr1:269802217 1.3 4733000 3546380.9
10:26:14.185 INFO ProgressMeter - Chr1:305531363 1.5 5357000 3567765.6
10:26:24.193 INFO ProgressMeter - Chr1:336536937 1.7 5936000 3558148.6
10:26:34.199 INFO ProgressMeter - Chr1:365788383 1.8 6564000 3576981.8
10:26:44.202 INFO ProgressMeter - Chr1:394287316 2.0 7182000 3587800.9
10:26:54.206 INFO ProgressMeter - Chr1:421227692 2.2 7796000 3595084.2
10:27:04.214 INFO ProgressMeter - Chr1:447733724 2.3 8445000 3616211.9
10:27:14.229 INFO ProgressMeter - Chr1:467581320 2.5 9104000 3638349.7
10:27:24.244 INFO ProgressMeter - Chr1:485451299 2.7 9746000 3651349.7
10:27:34.251 INFO ProgressMeter - Chr2:11269587 2.8 10362000 3653823.6
10:27:44.263 INFO ProgressMeter - Chr2:30804733 3.0 11012000 3667243.9
10:27:54.270 INFO ProgressMeter - Chr2:56582991 3.2 11646000 3674300.0
10:28:04.274 INFO ProgressMeter - Chr2:84467144 3.3 12248000 3671114.4
10:28:14.282 INFO ProgressMeter - Chr2:113217763 3.5 12854000 3669304.0
10:28:24.294 INFO ProgressMeter - Chr2:144915964 3.7 13442000 3662686.9
10:28:34.306 INFO ProgressMeter - Chr2:173148897 3.8 14026000 3655602.9
10:28:44.314 INFO ProgressMeter - Chr2:203171617 4.0 14614000 3650184.4
10:28:54.313 INFO ProgressMeter - Chr2:245246776 4.2 15245000 3655612.3
10:29:04.315 INFO ProgressMeter - Chr2:286852990 4.3 15896000 3665206.4
10:29:14.320 INFO ProgressMeter - Chr2:331479585 4.5 16549000 3674493.5
10:29:24.320 INFO ProgressMeter - Chr2:365795597 4.7 17164000 3675046.8
10:29:34.335 INFO ProgressMeter - Chr2:396385179 4.8 17749000 3669170.3
10:29:44.336 INFO ProgressMeter - Chr2:430947367 5.0 18357000 3668453.0
10:29:54.342 INFO ProgressMeter - Chr2:459482982 5.2 18933000 3661534.2
10:30:04.352 INFO ProgressMeter - Chr2:486890195 5.3 19524000 3657812.3
10:30:14.363 INFO ProgressMeter - Chr2:515708914 5.5 20109000 3653215.0
10:30:23.366 INFO CombineGVCFs - Shutting down engine
[January 11, 2022 at 10:30:23 AM PST] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 5.68 minutes.
Runtime.totalMemory()=5450498048
java.lang.ArrayIndexOutOfBoundsException: Index 32770 out of bounds for length 32770
at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:142)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeIndex(TabixIndexCreator.java:129)
at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:177)
at htsjdk.variant.variantcontext.writer.VCFWriter.close(VCFWriter.java:233)
at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.closeTool(CombineGVCFs.java:514)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1091)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
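The trace above fails inside htsjdk's TabixIndexCreator while the index for the output file is being finalized, not in the sequence dictionary. The tabix (.tbi) format can address at most 2^29 = 536,870,912 bp per contig, and the last progress line before the failure was Chr2:515708914. A minimal sketch for listing which contigs in a .dict exceed that limit; the function name `contigs_over_tbi_limit` is hypothetical, and it assumes the tab-separated @SQ lines with SN: and LN: fields that CreateSequenceDictionary writes:

```shell
# Tabix (.tbi) indexes address at most 2^29 bp per contig.
TBI_MAX=$((1 << 29))   # 536870912

# Hypothetical helper: print "name<TAB>length" for every contig in a
# sequence dictionary ($1) that is longer than the .tbi limit.
contigs_over_tbi_limit() {
    awk -v max="$TBI_MAX" -F'\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)   # contig name
                if ($i ~ /^LN:/) len  = substr($i, 4)   # contig length
            }
            if (len + 0 > max) print name "\t" len
        }' "$1"
}

# Usage: contigs_over_tbi_limit CI5791.dict
```

Any contig listed here is a candidate for triggering the index error when a bgzipped, tabix-indexed output is requested.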
-
Hi Shaun Clare,
We have seen this issue before when creating the index for VCF output on references with long chromosomes. Another user found a workaround: work with uncompressed VCF files. Try making your input and output files uncompressed, because the .tbi index is then not required.
Let me know if this works for you or you have any further questions.
Best,
Genevieve
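A dry-run sketch of the workaround described above, assuming the same file names as in the original post. The helper `combine_gvcfs_cmd` is hypothetical: it only strips a trailing `.gz` from the output name and prints the resulting command, so that GATK writes a plain-text VCF and no .tbi index is needed:

```shell
# Hypothetical helper: print a CombineGVCFs command with an uncompressed output.
# $1 = reference prefix, $2 = list of input gVCFs, $3 = requested output name.
combine_gvcfs_cmd() {
    # ${3%.gz} drops a trailing ".gz", e.g. ExomeSNPs.g.vcf.gz -> ExomeSNPs.g.vcf
    printf 'gatk --java-options -Xmx20g CombineGVCFs -R %s.fasta --variant %s -O %s\n' \
        "$1" "$2" "${3%.gz}"
}

# Print the command for review; pipe the output to `sh` to actually run it.
combine_gvcfs_cmd CI5791 gvcf.list ExomeSNPs.g.vcf.gz
```

The inputs listed in gvcf.list were already uncompressed .g.vcf files in the original command, so only the output name changes here.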
-
Thank you, that seems to have worked!
-
Great! Thanks for the update!
-
I do have another issue on another project. This one uses exactly the same commands, but on 96 samples called against a much smaller amplicon-based reference.
gatk CombineGVCFs --java-options "-Xmx20g" -R ${ref}.fasta --variant gvcf.list -O ${project}.g.vcf.gz
This time, though, it never gets past initializing the engine. It just reads all of the samples as shown below and then stalls (I only copied in the first few, but it reads all 96):
Using GATK jar /home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx20g -jar /home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar CombineGVCFs -R 50k.fasta --variant gvcf.list -O CIxGP_SNPs.g.vcf.gz
09:11:30.941 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jan 14, 2022 9:11:31 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:11:31.089 INFO CombineGVCFs - ------------------------------------------------------------
09:11:31.089 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.2.3.0
09:11:31.089 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
09:11:31.090 INFO CombineGVCFs - Executing as barlex@DESKTOP-VDKQE0G on Linux v4.4.0-19041-Microsoft amd64
09:11:31.090 INFO CombineGVCFs - Java runtime: OpenJDK 64-Bit Server VM v11.0.11+9-Ubuntu-0ubuntu2.20.04
09:11:31.090 INFO CombineGVCFs - Start Date/Time: January 14, 2022 at 9:11:30 AM PST
09:11:31.090 INFO CombineGVCFs - ------------------------------------------------------------
09:11:31.090 INFO CombineGVCFs - ------------------------------------------------------------
09:11:31.091 INFO CombineGVCFs - HTSJDK Version: 2.24.1
09:11:31.091 INFO CombineGVCFs - Picard Version: 2.25.4
09:11:31.091 INFO CombineGVCFs - Built for Spark Version: 2.4.5
09:11:31.091 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:11:31.091 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:11:31.091 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:11:31.092 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:11:31.092 INFO CombineGVCFs - Deflater: IntelDeflater
09:11:31.092 INFO CombineGVCFs - Inflater: IntelInflater
09:11:31.092 INFO CombineGVCFs - GCS max retries/reopens: 20
09:11:31.092 INFO CombineGVCFs - Requester pays: disabled
09:11:31.092 INFO CombineGVCFs - Initializing engine
09:11:32.042 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_1_S50_R1_001.sorted.g.vcf
09:11:32.507 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_2_S40_R1_001.sorted.g.vcf
09:11:33.163 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_3_S18_R1_001.sorted.g.vcf
09:11:33.434 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_4_S21_R1_001.sorted.g.vcf
09:11:33.793 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_5_S94_R1_001.sorted.g.vcf
09:11:34.125 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A10_S10_R1_001.sorted.g.vcf
09:11:34.473 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A11_S11_R1_001.sorted.g.vcf
09:11:34.884 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A12_S12_R1_001.sorted.g.vcf
09:11:35.125 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A1_S1_R1_001.sorted.g.vcf
09:11:35.502 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A2_S2_R1_001.sorted.g.vcf
-
Shaun Clare, could you create a new post for this, since it's not the same issue as the original thread?