GenotypeGVCFs Outputs Only 1 Locus
GATK version used: 4.1.7.0
Hello,
I have a problem genotyping my GVCFs. As you can see from the error below, only 1 locus was genotyped and written to the VCF. I have checked my GVCFs and found them to be in order (with all the chromosome positions present).
gatk --java-options "-Xmx32g" GenotypeGVCFs \
    -R /home/himawari/REF/GRCh37/B37/human_g1k_v37.fasta \
    -V /home/himawari/SCD/OUTPUT/SCDO_3_X5/SCDO_3_X5.g.vcf.gz \
    -O /home/himawari/SCD/OUTPUT/SCDO_3_X5/SCDO_3_X5_test.vcf.gz
Using GATK jar /home/himawari/Downloads/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx32g -jar /home/himawari/Downloads/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar GenotypeGVCFs -R /home/himawari/REF/GRCh37/B37/human_g1k_v37.fasta -V /home/himawari/SCD/OUTPUT/SCDO_3_X5/SCDO_3_X5.g.vcf.gz -O /home/himawari/SCD/OUTPUT/SCDO_3_X5/SCDO_3_X5_test.vcf.gz
18:53:00.022 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/himawari/Downloads/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 01, 2020 6:53:00 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
18:53:00.115 INFO GenotypeGVCFs - ------------------------------------------------------------
18:53:00.115 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.1.7.0
18:53:00.115 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
18:53:00.115 INFO GenotypeGVCFs - Executing as himawari@ncgm001 on Linux v4.15.0-101-generic amd64
18:53:00.115 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v11.0.1-internal+0-adhoc..src
18:53:00.116 INFO GenotypeGVCFs - Start Date/Time: 1 June 2020 at 18:52:59 CST
18:53:00.116 INFO GenotypeGVCFs - ------------------------------------------------------------
18:53:00.116 INFO GenotypeGVCFs - ------------------------------------------------------------
18:53:00.116 INFO GenotypeGVCFs - HTSJDK Version: 2.21.2
18:53:00.116 INFO GenotypeGVCFs - Picard Version: 2.21.9
18:53:00.116 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:53:00.116 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:53:00.116 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:53:00.116 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:53:00.116 INFO GenotypeGVCFs - Deflater: IntelDeflater
18:53:00.116 INFO GenotypeGVCFs - Inflater: IntelInflater
18:53:00.116 INFO GenotypeGVCFs - GCS max retries/reopens: 20
18:53:00.116 INFO GenotypeGVCFs - Requester pays: disabled
18:53:00.116 INFO GenotypeGVCFs - Initializing engine
18:53:00.201 INFO FeatureManager - Using codec VCFCodec to read file file:///home/himawari/SCD/OUTPUT/SCDO_3_X5/SCDO_3_X5.g.vcf.gz
18:53:00.279 INFO GenotypeGVCFs - Done initializing engine
18:53:00.301 INFO ProgressMeter - Starting traversal
18:53:00.302 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
18:53:00.365 WARN InbreedingCoeff - InbreedingCoeff will not be calculated; at least 10 samples must have called genotypes
18:53:10.303 INFO ProgressMeter - 1:227192693 0.2 1197000 7181281.9
18:53:11.090 INFO GenotypeGVCFs - Shutting down engine
[1 June 2020 at 18:53:11 CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.18 minutes.
Runtime.totalMemory()=1275068416
java.lang.IllegalArgumentException: Invalid interval. Contig:1 start:249213232 end:41385
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:733)
at org.broadinstitute.hellbender.utils.SimpleInterval.validatePositions(SimpleInterval.java:59)
at org.broadinstitute.hellbender.utils.SimpleInterval.<init>(SimpleInterval.java:35)
at org.broadinstitute.hellbender.utils.SimpleInterval.<init>(SimpleInterval.java:47)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$0(VariantLocusWalker.java:134)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:502)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:132)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
I am unsure what I am doing wrong.
By the way, I used CombineGVCFs to consolidate my GVCFs instead of GenomicsDBImport because I do not know how to use GenomicsDBImport properly: it took too long to consolidate, and I am also unsure how to split the work up. I don't think I understand it well.
Himawari.
-
Hi Himawari,
Using CombineGVCFs to combine your GVCFs is perfectly fine.
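Since you mentioned not knowing how to split the work for GenomicsDBImport: a common approach is one GenomicsDB workspace per chromosome, selected with `-L`, followed by GenotypeGVCFs reading that workspace through a `gendb://` path. Below is a minimal bash sketch, assuming `gatk` is on your PATH; the function name, output prefix and workspace naming are hypothetical. Each chromosome is independent, so on a cluster you could submit one job per chromosome instead of looping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: consolidate GVCFs with GenomicsDBImport one chromosome at a
# time, then genotype each chromosome's workspace with GenotypeGVCFs.
# Usage: joint_genotype_by_contig REF OUT_PREFIX "CONTIGS" GVCF [GVCF ...]
#   e.g. joint_genotype_by_contig human_g1k_v37.fasta cohort "1 2 3" a.g.vcf.gz b.g.vcf.gz
joint_genotype_by_contig() {
    local ref=$1 prefix=$2 contigs=$3
    shift 3
    local inputs=() g c
    # GenomicsDBImport takes one -V per input GVCF.
    for g in "$@"; do
        inputs+=(-V "$g")
    done
    # $contigs is intentionally unquoted so "1 2 3" splits into separate contigs.
    for c in $contigs; do
        # The workspace directory should not already exist before the import.
        gatk GenomicsDBImport "${inputs[@]}" \
            --genomicsdb-workspace-path "${prefix}_db_${c}" \
            -L "$c" || return 1
        # Genotype this chromosome directly from its workspace.
        gatk GenotypeGVCFs -R "$ref" \
            -V "gendb://${prefix}_db_${c}" \
            -O "${prefix}_${c}.vcf.gz" || return 1
    done
}
```

The per-chromosome VCFs can afterwards be merged back into one file, for example with GatherVcfs, giving them in reference order.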
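Even if you have already checked the order of your GVCF, I advise you to check again. The error reports an interval on contig 1 whose start (249213232) is larger than its end (41385), so most likely a record near the end of chromosome 1 is malformed (for example, a reference block whose INFO END value is smaller than its POS). Look at the first and the last entries for chromosome 1 (contig 1), and skim through all the chromosome 1 positions in your GVCF.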
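One way to do that check without reading millions of lines by eye is a small awk scan. This is a sketch under a few assumptions: the GVCF is bgzipped or plain text, the contig is named `1` as in the b37 reference, and the function name is hypothetical. It prints the first and last records on the contig and flags records whose POS goes backwards or whose INFO END is smaller than their POS, which is the kind of record that would produce the "Invalid interval" error above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan one contig of a (b)gzipped or plain GVCF and report
#   - the first and last records on that contig,
#   - records whose POS is smaller than the previous record's POS (out of order),
#   - records whose INFO END is smaller than their POS (an impossible interval).
# Usage: check_contig_records SCDO_3_X5.g.vcf.gz 1
check_contig_records() {
    local gvcf=$1 contig=${2:-1}
    # gzip -dcf decompresses bgzip output and passes plain text through unchanged.
    gzip -dcf "$gvcf" | awk -F'\t' -v c="$contig" '
        /^#/ || $1 != c { next }          # skip header lines and other contigs
        {
            n++
            if (n == 1) print "first: " $1 ":" $2
            last = $1 ":" $2
            if (n > 1 && $2 + 0 < prev) print "out of order: " $1 ":" $2 " after " prev
            prev = $2 + 0
            # Reference blocks store their end as END=<pos> in INFO (column 8).
            info = ";" $8
            if (match(info, /;END=[0-9]+/)) {
                end = substr(info, RSTART + 5, RLENGTH - 5)
                if (end + 0 < $2 + 0) print "END before POS: " $1 ":" $2 " END=" end
            }
        }
        END {
            if (n) print "last: " last
            print "records: " n + 0
        }'
}
```

A clean contig prints only the first/last/records lines; any "out of order" or "END before POS" line points at a record worth posting here.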
Additionally, look at the GVCF header (`bcftools view -O v -h <your.g.vcf.gz>`) and grep it for entries starting with `##contig=<ID=`. You might find an erroneous entry for chromosome 1.
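If bcftools is not at hand, the same header check can be done with gzip and awk, and the contig length can be compared against the reference's `.fai` index (column 2 is the sequence length). A minimal sketch, assuming the reference has been indexed with `samtools faidx` so that `human_g1k_v37.fasta.fai` exists; the function name is hypothetical. A missing, duplicated, or differently sized `##contig` line for `1` is reported as a mismatch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the ##contig header line for one contig in a GVCF
# with that contig's length in the reference .fai index (column 2).
# Usage: compare_contig_length SCDO_3_X5.g.vcf.gz human_g1k_v37.fasta.fai 1
compare_contig_length() {
    local gvcf=$1 fai=$2 contig=$3 hdr hdr_len ref_len
    # Read only the header; awk stops at the first data record.
    hdr=$(gzip -dcf "$gvcf" | awk -v c="$contig" '
        !/^#/ { exit }
        index($0, "##contig=<ID=" c ",") == 1 || index($0, "##contig=<ID=" c ">") == 1 { print }')
    hdr_len=$(printf '%s\n' "$hdr" | sed -n 's/.*,length=\([0-9][0-9]*\).*/\1/p')
    ref_len=$(awk -v c="$contig" '$1 == c { print $2 }' "$fai")
    echo "header:    ${hdr:-<no ##contig line for $contig>}"
    echo "reference: ${ref_len:-<contig $contig not in $fai>}"
    if [ -n "$hdr_len" ] && [ "$hdr_len" = "$ref_len" ]; then
        echo "OK: contig $contig length $hdr_len matches the reference"
    else
        # Also triggers when the contig appears twice in the header.
        echo "MISMATCH: header length '${hdr_len}' vs reference length '${ref_len}'"
        return 1
    fi
}
```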
If nothing turns up, run GATK ValidateVariants (https://gatk.broadinstitute.org/hc/en-us/articles/360037057272-ValidateVariants) with --validate-GVCF and --reference <reference_genome.fa> (the same reference used for genotyping). It might give a hint about what is wrong with your GVCF. If you find anything strange, please post the full GVCF header, the first 10 records of your VCF (not the header), and any warnings and full error logs here.
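To gather everything requested here in one go, a sketch along these lines could work. It assumes `gatk` is on your PATH and uses a hypothetical function name and output directory; the ValidateVariants arguments (`-R`, `-V`, `--validate-GVCF`) are the ones mentioned above, and its stdout and stderr are captured together so warnings and the full error log end up in one file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather the requested diagnostics in one directory:
#   header.txt    the full GVCF header
#   first10.txt   the first 10 data records (no header)
#   validate.log  ValidateVariants output (stdout + stderr) in GVCF mode
# Usage: collect_gvcf_diagnostics human_g1k_v37.fasta SCDO_3_X5.g.vcf.gz diag_SCDO_3_X5
collect_gvcf_diagnostics() {
    local ref=$1 gvcf=$2 out=$3
    mkdir -p "$out" || return 1
    # Split the stream: header lines to one file, the first 10 records to another,
    # then stop reading so the rest of a large GVCF is not decompressed.
    gzip -dcf "$gvcf" | awk -v h="$out/header.txt" -v r="$out/first10.txt" '
        /^#/ { print > h; next }
        { print > r; if (++n == 10) exit }
    '
    # Validate in GVCF mode against the same reference used for GenotypeGVCFs.
    gatk ValidateVariants -R "$ref" -V "$gvcf" --validate-GVCF > "$out/validate.log" 2>&1
}
```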