Error when using the output of ScatterIntervalsByNs as input to SplitIntervals
Hi there,
When I build an interval file with ScatterIntervalsByNs on the hg38 reference and feed the output to SplitIntervals, it fails with the error: "A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "@HD VN:1.6 SO:coordinate"is not valid for this input." I want to create the intervals file for use with BaseRecalibrator, as recommended by the Intel GATK4 performance guide.
Can you help?
a) GATK version used
The Genome Analysis Toolkit (GATK) v4.1.6.0
HTSJDK Version: 2.21.2
Picard Version: 2.21.9
b) Exact GATK commands used
ref_fasta="/home/zyto/unger/GATK_ref/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta"
interval_list="/home/zyto/unger/GATK_ref/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta.intervals.list"
interval_list_folder="/home/zyto/unger/GATK_ref/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta.intervals.list.folder"
singularity run /home/zyto/unger/gatk_latest.sif gatk --java-options "-Xmx4G -XX:+UseParallelGC -XX:ParallelGCThreads=4" \
ScatterIntervalsByNs \
--OUTPUT $interval_list \
--OUTPUT_TYPE=N \
--REFERENCE $ref_fasta
singularity run /home/zyto/unger/gatk_latest.sif gatk --java-options "-Xmx4G -XX:+UseParallelGC -XX:ParallelGCThreads=4" \
SplitIntervals \
--reference $ref_fasta \
--intervals $interval_list \
--scatter-count 4 \
--output $interval_list_folder
c) The entire error log if applicable.
[unger@frontser GATK_Exome_Lisa_HD]$ singularity run /home/zyto/unger/gatk_latest.sif gatk --java-options "-Xmx4G -XX:+UseParallelGC -XX:ParallelGCThreads=4" \
> SplitIntervals \
> --reference $ref_fasta \
> --intervals $interval_list \
> --scatter-count 4 \
> --output $interval_list_folder
Using GATK jar /gatk/gatk-package-4.1.6.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4G -XX:+UseParallelGC -XX:ParallelGCThreads=4 -jar /gatk/gatk-package-4.1.6.0-local.jar SplitIntervals --reference /home/zyto/unger/GATK_ref/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta --intervals /home/zyto/unger/GATK_ref/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta.intervals.list --scatter-count 4 --output /home/zyto/unger/GATK_ref/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta.intervals.list.folder
20:33:59.963 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 07, 2020 8:34:00 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
20:34:00.180 INFO SplitIntervals - ------------------------------------------------------------
20:34:00.180 INFO SplitIntervals - The Genome Analysis Toolkit (GATK) v4.1.6.0
20:34:00.180 INFO SplitIntervals - For support and documentation go to https://software.broadinstitute.org/gatk/
20:34:00.180 INFO SplitIntervals - Executing as unger@frontser on Linux v3.10.0-957.el7.x86_64 amd64
20:34:00.180 INFO SplitIntervals - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03
20:34:00.181 INFO SplitIntervals - Start Date/Time: July 7, 2020 8:33:59 PM UTC
20:34:00.181 INFO SplitIntervals - ------------------------------------------------------------
20:34:00.181 INFO SplitIntervals - ------------------------------------------------------------
20:34:00.181 INFO SplitIntervals - HTSJDK Version: 2.21.2
20:34:00.181 INFO SplitIntervals - Picard Version: 2.21.9
20:34:00.181 INFO SplitIntervals - HTSJDK Defaults.COMPRESSION_LEVEL : 2
20:34:00.181 INFO SplitIntervals - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
20:34:00.182 INFO SplitIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
20:34:00.182 INFO SplitIntervals - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
20:34:00.182 INFO SplitIntervals - Deflater: IntelDeflater
20:34:00.182 INFO SplitIntervals - Inflater: IntelInflater
20:34:00.182 INFO SplitIntervals - GCS max retries/reopens: 20
20:34:00.182 INFO SplitIntervals - Requester pays: disabled
20:34:00.182 INFO SplitIntervals - Initializing engine
20:34:00.512 INFO SplitIntervals - Shutting down engine
[July 7, 2020 8:34:00 PM UTC] org.broadinstitute.hellbender.tools.walkers.SplitIntervals done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2174746624
***********************************************************************
A USER ERROR has occurred: Badly formed genome unclippedLoc: Query interval "@HD VN:1.6 SO:coordinate"is not valid for this input.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
-
Hi Kristian Unger, it seems that you are supplying an interval, "@HD VN:1.6 SO:coordinate", that cannot be used. Please check all your intervals and make sure they are valid.
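A possible explanation, offered as an assumption and not a confirmed diagnosis for this run: GATK picks the interval parser from the file extension. A file ending in .interval_list is read as a Picard-style interval list with a SAM header, while a file ending in .list or .intervals is read as a plain list with one interval per line, so the "@HD" header line would be parsed as an interval. The sketch below checks for that mismatch. The function name suggest_interval_name is hypothetical and not part of GATK.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): print the name an interval file
# should have so that its extension matches its contents.
# Assumption: a first line starting with "@HD" means the file is a
# Picard-style interval list, which GATK expects to end in .interval_list.
suggest_interval_name() {
    local path="$1"

    # No SAM header on the first line: treat it as a plain list, keep the name.
    if ! head -n 1 "$path" | grep -q '^@HD'; then
        printf '%s\n' "$path"
        return 0
    fi

    case "$path" in
        # Header present and extension already Picard-style: nothing to change.
        *.interval_list) printf '%s\n' "$path" ;;
        # Header present but a different extension: append the Picard-style one.
        *) printf '%s\n' "${path}.interval_list" ;;
    esac
}
```

With the variables from the question, `suggest_interval_name "$interval_list"` would print the original path with .interval_list appended if the file starts with "@HD". Rerunning ScatterIntervalsByNs with an --OUTPUT path that ends in .interval_list, or renaming the existing file and passing the new name to --intervals, may then let SplitIntervals read it.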