Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Combine GVCFs

Answered

6 comments

  • Genevieve Brandt (she/her)

    The issue you found with ValidateVariants should not cause issues in CombineGVCFs. The warning in CombineGVCFs is also not a problem.

    I'm still looking into the GATKException in CombineGVCFs at position chr5:180055863. Is there a reason you are using CombineGVCFs instead of GenomicsDBImport? How did you create these GVCFs?

  • Genevieve Brandt (she/her)

    Priyadarshini Thirunavukkarasu, we have identified that the issue comes from an MNP at chr5:180055862 in your BRA_54-51_S23.genome.vcf.gz file. CombineGVCFs does not support MNPs. If you want to run CombineGVCFs without MNPs, you can remove them with the following command:

    bcftools view --exclude-types mnps in.vcf -o out.vcf
  • Priyadarshini Thirunavukkarasu

    Thank you. I removed the MNPs from all the GVCF files. This time, when I try to combine the GVCFs, I get another error, which says the GVCF files are not gzipped. Please find the command and error message below:

    gatk CombineGVCFs \
    -R /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/reference/hg19.fa \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-8_S5.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-7_S4.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-6_S3.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-5_S2.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-52_S24.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-51_S23.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-48_S22.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-47_S21.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-45_S20.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-43_S19.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-41_S18.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-40_S17.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-39_S16.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-38_S15.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-26_S14.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-22_S13.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-21_S12.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-1_S1.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-18_S11.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-17_S10.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-15_S9.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-14_S8.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-11_S7.genome.vcf.gz \
    --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-10_S6.genome.vcf.gz \
    -O /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/cohort.genome.vcf.gz
    12:04:04.583 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scicore/soft/apps/GATK/4.1.2.0-foss-2018b-Java-1.8/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Nov 19, 2021 12:04:04 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    12:04:04.729 INFO CombineGVCFs - ------------------------------------------------------------
    12:04:04.729 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.1.2.0
    12:04:04.729 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    12:04:04.729 INFO CombineGVCFs - Executing as thirun0000@shi101.cluster.bc2.ch on Linux v3.10.0-1160.el7.x86_64 amd64
    12:04:04.730 INFO CombineGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-b03
    12:04:04.730 INFO CombineGVCFs - Start Date/Time: November 19, 2021 12:04:04 PM CET
    12:04:04.730 INFO CombineGVCFs - ------------------------------------------------------------
    12:04:04.730 INFO CombineGVCFs - ------------------------------------------------------------
    12:04:04.730 INFO CombineGVCFs - HTSJDK Version: 2.19.0
    12:04:04.730 INFO CombineGVCFs - Picard Version: 2.19.0
    12:04:04.730 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    12:04:04.730 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    12:04:04.730 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    12:04:04.730 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    12:04:04.730 INFO CombineGVCFs - Deflater: IntelDeflater
    12:04:04.730 INFO CombineGVCFs - Inflater: IntelInflater
    12:04:04.730 INFO CombineGVCFs - GCS max retries/reopens: 20
    12:04:04.730 INFO CombineGVCFs - Requester pays: disabled
    12:04:04.730 INFO CombineGVCFs - Initializing engine
    12:04:05.096 INFO FeatureManager - Using codec VCFCodec to read file file:///scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-8_S5.genome.vcf.gz
    12:04:05.101 INFO CombineGVCFs - Shutting down engine
    [November 19, 2021 12:04:05 PM CET] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=491257856
    org.broadinstitute.hellbender.exceptions.GATKException: Error initializing feature reader for path /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-8_S5.genome.vcf.gz
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:353)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getFeatureReader(FeatureDataSource.java:305)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.<init>(FeatureDataSource.java:256)
    at org.broadinstitute.hellbender.engine.FeatureManager.addToFeatureSources(FeatureManager.java:234)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$initializeDrivingVariants$0(MultiVariantWalker.java:73)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.initializeDrivingVariants(MultiVariantWalker.java:63)
    at org.broadinstitute.hellbender.engine.VariantWalkerBase.initializeFeatures(VariantWalkerBase.java:55)
    at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:697)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.onStartup(MultiVariantWalker.java:46)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:137)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Caused by: htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Not in GZIP format, for input source: /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-8_S5.genome.vcf.gz
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:263)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:102)
    at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:127)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:120)
    at org.broadinstitute.hellbender.engine.FeatureDataSource.getTribbleFeatureReader(FeatureDataSource.java:350)
    ... 16 more
    Caused by: java.util.zip.ZipException: Not in GZIP format
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:257)
    ... 20 more
    Using GATK jar /scicore/soft/apps/GATK/4.1.2.0-foss-2018b-Java-1.8/gatk-package-4.1.2.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /scicore/soft/apps/GATK/4.1.2.0-foss-2018b-Java-1.8/gatk-package-4.1.2.0-local.jar CombineGVCFs -R /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/reference/hg19.fa --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-8_S5.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-7_S4.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-6_S3.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-5_S2.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-52_S24.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-51_S23.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-48_S22.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-47_S21.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-45_S20.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-43_S19.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-41_S18.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-40_S17.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-39_S16.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-38_S15.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-26_S14.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-22_S13.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-21_S12.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-1_S1.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-18_S11.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-17_S10.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-15_S9.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-14_S8.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-11_S7.genome.vcf.gz --variant /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/BRA_54-10_S6.genome.vcf.gz -O /scicore/home/cichon/thirun0000/HAE_panel/Illumina-panel-HAE/family_54/gvcf/gvcf_files/exclude_mnps/cohort.genome.vcf.gz
  • Genevieve Brandt (she/her)

    Are your files truly gzipped? Or are they uncompressed files that simply carry the .vcf.gz extension?

  • Priyadarshini Thirunavukkarasu

    Hello,
    These GVCF files were generated by the Illumina software (MiniSeq), so I am not sure whether they are actually gzipped or just end with the extension .vcf.gz.

  • Genevieve Brandt (she/her)

    You'll need to figure that out so that GATK can read the file correctly.

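
The open question in the thread, whether the .vcf.gz files are really gzip-compressed, can be checked directly on the files. The sketch below is not from the thread: is_gzip and report_not_gzip are hypothetical helper names, and the check relies only on the fact that every gzip stream (including the BGZF blocks written by bgzip) starts with the two bytes 1f 8b. It assumes head and od are available and does not distinguish plain gzip from BGZF.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic
# bytes 1f 8b, which plain gzip and bgzip (BGZF) output both share.
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Print one line for every given file whose content is not gzip data,
# whatever its file name says.
report_not_gzip() {
    for f in "$@"; do
        if ! is_gzip "$f"; then
            echo "not gzip: $f"
        fi
    done
}

# Usage (path is a placeholder):
# report_not_gzip gvcf_files/exclude_mnps/*.genome.vcf.gz
```

A file reported here is most likely plain-text VCF with a misleading name, which would be consistent with the "Not in GZIP format" exception in the stack trace above.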

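If a file turns out to be plain-text VCF with a .vcf.gz name, one possible cause (an assumption, since the thread does not show the exact filtering command that was run) is that, depending on the bcftools version, bcftools view -o out.vcf.gz writes uncompressed VCF unless -O z is also given. A minimal sketch of a repair follows, assuming bgzip and tabix from HTSlib are on PATH; recompress_vcf is a hypothetical helper name, and the input must really be uncompressed VCF.

```shell
# Hypothetical helper: turn a plain-text VCF that is misnamed *.vcf.gz
# into a bgzip-compressed file with a tabix index next to it.
# Assumes HTSlib's bgzip and tabix are installed.
recompress_vcf() {
    src=$1
    tmp="$src.tmp"
    # bgzip -c writes BGZF-compressed data to stdout
    bgzip -c "$src" > "$tmp" || { rm -f "$tmp"; return 1; }
    mv "$tmp" "$src" || return 1
    # -p vcf selects the VCF preset; -f overwrites a stale index if present
    tabix -f -p vcf "$src"
}

# Usage (path is a placeholder):
# recompress_vcf gvcf_files/exclude_mnps/BRA_54-8_S5.genome.vcf.gz
```

When re-running the MNP filter instead, adding -O z to the bcftools view command requests bgzip-compressed output explicitly, so the .vcf.gz name and the file content agree.
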