Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CombineGVCFs


  • Genevieve Brandt (she/her)

    Hi Shaun Clare,

    We have seen this issue before with long chromosomes, when creating the index for the VCF output. Another user found a workaround, which is to work with unzipped VCF files. So, try using unzipped (plain .vcf) input and output files, because then a .tbi index is not required.

    Let me know if this works for you or you have any further questions.

    Best,

    Genevieve

  • Shaun Clare

    Genevieve Brandt (she/her)

    Thank you, that seems to have worked!

  • Genevieve Brandt (she/her)

    Great! Thanks for the update!

  • Shaun Clare

    I have another issue on a different project. It uses exactly the same code, but this time on 96 samples called against a much smaller, amplicon-based reference.

    gatk CombineGVCFs --java-options "-Xmx20g" -R ${ref}.fasta --variant gvcf.list -O ${project}.g.vcf.gz

    This time, though, it never gets past initializing the engine: it reads all of the samples, as shown below, and then stalls (I've only copied in the first few, but it reads all 96):

    Using GATK jar /home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx20g -jar /home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar CombineGVCFs -R 50k.fasta --variant gvcf.list -O CIxGP_SNPs.g.vcf.gz
    09:11:30.941 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/barlex/miniconda3/share/gatk4-4.2.3.0-0/gatk-package-4.2.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jan 14, 2022 9:11:31 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    09:11:31.089 INFO CombineGVCFs - ------------------------------------------------------------
    09:11:31.089 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.2.3.0
    09:11:31.089 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    09:11:31.090 INFO CombineGVCFs - Executing as barlex@DESKTOP-VDKQE0G on Linux v4.4.0-19041-Microsoft amd64
    09:11:31.090 INFO CombineGVCFs - Java runtime: OpenJDK 64-Bit Server VM v11.0.11+9-Ubuntu-0ubuntu2.20.04
    09:11:31.090 INFO CombineGVCFs - Start Date/Time: January 14, 2022 at 9:11:30 AM PST
    09:11:31.090 INFO CombineGVCFs - ------------------------------------------------------------
    09:11:31.090 INFO CombineGVCFs - ------------------------------------------------------------
    09:11:31.091 INFO CombineGVCFs - HTSJDK Version: 2.24.1
    09:11:31.091 INFO CombineGVCFs - Picard Version: 2.25.4
    09:11:31.091 INFO CombineGVCFs - Built for Spark Version: 2.4.5
    09:11:31.091 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    09:11:31.091 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    09:11:31.091 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    09:11:31.092 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    09:11:31.092 INFO CombineGVCFs - Deflater: IntelDeflater
    09:11:31.092 INFO CombineGVCFs - Inflater: IntelInflater
    09:11:31.092 INFO CombineGVCFs - GCS max retries/reopens: 20
    09:11:31.092 INFO CombineGVCFs - Requester pays: disabled
    09:11:31.092 INFO CombineGVCFs - Initializing engine
    09:11:32.042 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_1_S50_R1_001.sorted.g.vcf
    09:11:32.507 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_2_S40_R1_001.sorted.g.vcf
    09:11:33.163 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_3_S18_R1_001.sorted.g.vcf
    09:11:33.434 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_4_S21_R1_001.sorted.g.vcf
    09:11:33.793 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/CI_5_S94_R1_001.sorted.g.vcf
    09:11:34.125 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A10_S10_R1_001.sorted.g.vcf
    09:11:34.473 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A11_S11_R1_001.sorted.g.vcf
    09:11:34.884 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A12_S12_R1_001.sorted.g.vcf
    09:11:35.125 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A1_S1_R1_001.sorted.g.vcf
    09:11:35.502 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/Data_Shaun/CIxGP_F2_Mapping/Genotyping/F2-A2_S2_R1_001.sorted.g.vcf
  • Genevieve Brandt (she/her)

    Shaun Clare, could you create a new post for this? It's not the same issue as the original thread.

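The unzipped-VCF workaround from the first exchange most likely matters when a contig is longer than a tabix (.tbi) index can address: the TBI binning scheme only covers positions up to 2^29 - 1 (536,870,911 bp), which some long chromosomes exceed. As a minimal sketch, assuming the samtools-style FASTA index (${ref}.fasta.fai) that GATK already requires next to the reference, the hypothetical helper long_contigs below lists any contigs over that limit before you run CombineGVCFs.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report contigs too long for a tabix (.tbi) index.
# The TBI binning scheme can only address positions up to 2^29 - 1.
TBI_MAX=$(( (1 << 29) - 1 ))   # 536,870,911 bp

# Usage: long_contigs ref.fasta.fai
# In a .fai file, column 1 is the contig name and column 2 its length in bp.
# Prints "name<TAB>length" for every contig longer than TBI_MAX.
long_contigs() {
  awk -F '\t' -v max="$TBI_MAX" '$2 > max { print $1 "\t" $2 }' "$1"
}

# Example (hypothetical path): long_contigs "${ref}.fasta.fai"
```

If it prints any contigs, the workaround from this thread applies: write the output as ${project}.g.vcf rather than ${project}.g.vcf.gz so that no .tbi index has to be built.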

