Error Creating Dictionary for genome
Answered
java -jar picard.jar CreateSequenceDictionary R=Bos_taurus_UMD_3.1.1_genome/Bos_taurus_NCBI_UMD_3.1.1/Bos_taurus/NCBI/UMD_3.1.1/Sequence/BWAIndex/version0.6.0/genome.fa O=Bos_taurus_UMD_3.1.1_genome/Bos_taurus_NCBI_UMD_3.1.1/Bos_taurus/NCBI/UMD_3.1.1/Sequence/BWAIndex/version0.6.0/genome.dict
#Error
INFO 2020-12-14 19:02:40 CreateSequenceDictionary
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** CreateSequenceDictionary -R /media/shazia/Data/Bison_Project1_2/Bison_Project_2/Bos_taurus_UMD_3.1.1_genome/Bos_taurus_NCBI_UMD_3.1.1/Bos_taurus/NCBI/UMD_3.1.1/Sequence/BWAIndex/version0.6.0/genome.fa -O /media/shazia/Data/Bison_Project1_2/Bison_Project_2/Bos_taurus_UMD_3.1.1_genome/Bos_taurus_NCBI_UMD_3.1.1/Bos_taurus/NCBI/UMD_3.1.1/Sequence/BWAIndex/version0.6.0/genome.dict
**********
19:02:41.819 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/shazia/Software/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Dec 14 19:02:41 CET 2020] CreateSequenceDictionary OUTPUT=/media/shazia/Data/Bison_Project1_2/Bison_Project_2/Bos_taurus_UMD_3.1.1_genome/Bos_taurus_NCBI_UMD_3.1.1/Bos_taurus/NCBI/UMD_3.1.1/Sequence/BWAIndex/version0.6.0/genome.dict REFERENCE=/media/shazia/Data/Bison_Project1_2/Bison_Project_2/Bos_taurus_UMD_3.1.1_genome/Bos_taurus_NCBI_UMD_3.1.1/Bos_taurus/NCBI/UMD_3.1.1/Sequence/BWAIndex/version0.6.0/genome.fa TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Mon Dec 14 19:02:41 CET 2020] Executing as shazia@shazia-Lin on Linux 4.15.0-128-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.7-SNAPSHOT
[Mon Dec 14 19:02:41 CET 2020] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1011351552
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot write file: /media/shazia/Data/Bison_Project1_2/Bison_Project_2/Bos_taurus_UMD_3.1.1_genome/Bos_taurus_NCBI_UMD_3.1.1/Bos_taurus/NCBI/UMD_3.1.1/Sequence/BWAIndex/version0.6.0/genome.dict. Neither file nor parent directory exist.
at htsjdk.samtools.util.IOUtil.assertFileIsWritable(IOUtil.java:554)
at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:223)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:303)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
I am getting this error when I try to create the dictionary.
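The message "Neither file nor parent directory exist" refers to the fully resolved output path shown in the trace, so one thing worth checking is whether the directory meant to hold genome.dict exists and is writable from the directory where the command is run. A minimal sketch of such a pre-flight check follows. `can_write_output` is a hypothetical helper, not part of Picard, and it only approximates the check Picard performs.

```shell
# Hypothetical pre-flight check, not part of Picard: succeeds only if the
# output file (or, when it does not exist yet, its parent directory) is writable.
can_write_output() {
  out=$1
  parent=$(dirname "$out")
  if [ -e "$out" ]; then
    # An existing output must be a regular, writable file.
    [ -f "$out" ] && [ -w "$out" ]
  else
    # Otherwise the parent directory must exist and be writable.
    [ -d "$parent" ] && [ -w "$parent" ]
  fi
}

# Example use before launching the Java tool (the path is a placeholder):
# if ! can_write_output "path/to/genome.dict"; then
#   echo "output directory is missing or not writable" >&2
# fi
```

If the check fails for a relative path, running `pwd` shows which directory the relative path is being resolved against.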
-
Hi Abrish, were you able to solve this issue? Your question was posted while our GATK Team was Out of Office and we did not get to your question.
If you still would like support from the GATK Team, please repost your question with [Repost] in the title and we will get to it as soon as possible.
-
Hello
I have exactly the same problem with the hg38 reference genome downloaded from the Broad Institute's Google resources. I would be thankful if you could let us know whether you found a solution.
Best
-
zdr j, could you share your complete command and stack trace?
-
Hi Genevieve,
Thank you for your help. This is the command I run and the error I get:
java -jar ~/picard-2.25.7/picard.jar CreateSequenceDictionary \
R=~/my_project/Homo_sapiens_assembly38.fasta\
O=~/test3.dict
INFO 2021-08-03 08:54:03 CreateSequenceDictionary
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
**********
********** The command line looks like this in the new syntax:
**********
********** CreateSequenceDictionary -R ~/my_project/Homo_sapiens_assembly38.fasta -O ~/test3.dict
**********
08:54:09.163 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/z/picard-2.25.7/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Tue Aug 03 08:54:10 IRDT 2021] CreateSequenceDictionary OUTPUT=~/test3.dict REFERENCE=~/my_project/Homo_sapiens_assembly38.fasta TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Aug 03 08:54:10 IRDT 2021] Executing as z@Zs-MacBook-Pro.local on Mac OS X 11.4 x86_64; Java HotSpot(TM) 64-Bit Server VM 16.0.2+7-67; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.7
[Tue Aug 03 08:54:10 IRDT 2021] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=272629760
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Cannot write file: /Users/z/~/test3.dict. Neither file nor parent directory exist.
at htsjdk.samtools.util.IOUtil.assertFileIsWritable(IOUtil.java:554)
at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:223)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
-
zdr j It looks like the way you are providing the file path is incorrect because the command is looking for /Users/z/~/test3.dict. You can provide the argument with -O ~/test3.dict following the syntax in this document: https://gatk.broadinstitute.org/hc/en-us/articles/360057439791-CreateSequenceDictionary-Picard-
Let me know if that works!
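A likely explanation for the /Users/z/~/test3.dict path is that the shell passed the `~` in `O=~/test3.dict` through unexpanded: a tilde is only guaranteed to expand at the start of a word, and zsh (the default shell on recent macOS) does not expand it after `=` in an argument by default. Picard then treated `~/test3.dict` as a relative path. The sketch below shows the expansion done by hand. `expand_tilde` is a hypothetical helper for illustration only.

```shell
# Hypothetical helper: turn a leading "~" into $HOME by hand, for cases where
# the shell does not expand "~" in the middle of a KEY=VALUE argument.
expand_tilde() {
  case $1 in
    "~")   printf '%s\n' "$HOME" ;;
    "~/"*) printf '%s/%s\n' "$HOME" "${1#??}" ;;  # ${1#??} drops the leading "~/"
    *)     printf '%s\n' "$1" ;;                  # anything else is left untouched
  esac
}

# The tilde stays literal when quoted, which mimics what Picard received.
expand_tilde '~/test3.dict'
```

With the legacy KEY=VALUE syntax, writing the argument as `O="$HOME/test3.dict"` avoids relying on tilde expansion at all.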
-
Thank you Genevieve,
Picard worked and made the dict file. I used samtools to make the fai file. Now, running Mutect2, I still get this error, which is similar to the one I got before when I made the dict file using samtools:
-R hg38_1.fasta \
> -I tumor.hg38.sam \
> -I ctrl.hg38.sam \
> -normal UCR_1 \
> -L resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list\
> --sequence-dictionary hg38_1_2.dict\
> --germline-resource somatic-hg38_af-only-gnomad.hg38.vcf.gz \
> --panel-of-normals somatic-hg38_1000g_pon.hg38.vcf.gz\
> --ignore-itr-artifacts\
> --output somatic.vcf.gz\
>
Using GATK jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar Mutect2 -R hg38_1.fasta -I tumor.hg38.sam -I ctrl.hg38.sam -normal UCR_1 -L resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list --sequence-dictionary hg38_1_2.dict --germline-resource somatic-hg38_af-only-gnomad.hg38.vcf.gz --panel-of-normals somatic-hg38_1000g_pon.hg38.vcf.gz --ignore-itr-artifacts --output somatic.vcf.gz
10:04:43.271 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 06, 2021 10:04:43 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
10:04:43.403 INFO Mutect2 - ------------------------------------------------------------
10:04:43.404 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.9.0-SNAPSHOT
10:04:43.404 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
10:04:43.404 INFO Mutect2 - Executing as root@389f0de5f95d on Linux v5.4.39-linuxkit amd64
10:04:43.404 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
10:04:43.405 INFO Mutect2 - Start Date/Time: August 6, 2021 10:04:43 AM GMT
10:04:43.405 INFO Mutect2 - ------------------------------------------------------------
10:04:43.405 INFO Mutect2 - ------------------------------------------------------------
10:04:43.406 INFO Mutect2 - HTSJDK Version: 2.23.0
10:04:43.406 INFO Mutect2 - Picard Version: 2.23.3
10:04:43.406 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:04:43.407 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:04:43.407 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:04:43.407 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:04:43.407 INFO Mutect2 - Deflater: IntelDeflater
10:04:43.408 INFO Mutect2 - Inflater: IntelInflater
10:04:43.408 INFO Mutect2 - GCS max retries/reopens: 20
10:04:43.408 INFO Mutect2 - Requester pays: disabled
10:04:43.408 INFO Mutect2 - Initializing engine
10:04:43.816 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/somatic-hg38_1000g_pon.hg38.vcf.gz
10:04:44.024 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/somatic-hg38_af-only-gnomad.hg38.vcf.gz
10:04:44.138 INFO FeatureManager - Using codec IntervalListCodec to read file file:///gatk/my_data/resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list
10:04:44.175 INFO Mutect2 - Shutting down engine
[August 6, 2021 10:04:44 AM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=185073664
***********************************************************************
A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary
I would be thankful if you could help.
I am stuck at this stage and have made no progress for a while because of the reference issue.
Best
-
zdr j This looks like a reference mismatch issue with one of your files. You will need to make sure that the naming of your chromosomes and the reference versions used are all the same. You can find more information about which specific file is causing the problem by running this command with --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'. Here is more info on how to do it: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax
Here is another forum post with someone who had a similar issue and solved their problem: https://gatk.broadinstitute.org/hc/en-us/community/posts/360077577711-Badly-formed-genome-unclippedLoc-Contig-NC-007605-given-as-location-but-this-contig-isn-t-present-in-the-Fasta-sequence-dictionary
Hope this helps!
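One way to narrow down which file disagrees is to compare the contig names used in the interval list with the `@SQ SN:` entries of the sequence dictionary. The sketch below is a hypothetical helper built on plain awk, not a GATK tool. It assumes both files are tab-separated text with `@`-prefixed header lines, as .dict and .interval_list files normally are.

```shell
# Print contigs that appear in interval records but have no @SQ SN: entry in
# the sequence dictionary. Hypothetical helper, not a GATK tool.
# usage: missing_contigs reference.dict regions.interval_list
missing_contigs() {
  awk -F '\t' '
    FNR == 1 { file++ }                 # count which input file we are in
    file == 1 {                         # first file: collect dictionary names
      if ($1 == "@SQ")
        for (i = 2; i <= NF; i++)
          if ($i ~ /^SN:/) known[substr($i, 4)] = 1
      next
    }
    # second file: report each unknown contig once, skipping header lines
    NF && $1 !~ /^@/ && !($1 in known) && !seen[$1]++ { print $1 }
  ' "$1" "$2"
}
```

Empty output means every contig in the interval records is present in the dictionary. A name such as chr1 in the output would match the kind of mismatch reported in the error above.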
-
Hi Genevieve,
Thank you for your reply. I still have the problem. I think the problem is the hg38 reference file that I am using, which I got from the Broad Institute Google bundle. It looks like it does not have contig1 of chr1? I am new to genomics and would be very grateful if you could help me find a good hg38 reference file for my Mutect2 analysis.
Here is my complete story:
My reads are sliced reads from cancer genome sequences in ICGC, downloaded by score-client5 from the aligned WGS reads of ICGC, which according to ICGC were originally aligned against GRCh37 (hg19). I had to realign them against hg38 using CrossMap or Galaxy. Files obtained by both methods gave me an error in Mutect2 about contig1, chr1.... which you can see in my last post above.
After your last post above:
I ran Picard ReorderSam on my input tumor and control files as suggested by a former user in the post you mentioned:
java -jar ~/picard-2.25.7/picard.jar ReorderSam \
I=/Users/z/my_project/ctrl.hg38.sam \
O=/Users/z/my_project/reorderedctrl2.bam \
SD=/Users/z/my_project/hg38_1_2.dict
But this also gave an error:
picard.PicardException: New reference sequence does not contain a matching contig for 1
I then used this command trying to bypass contig 1:
java -jar ~/picard-2.25.7/picard.jar ReorderSam \
I=/Users/z/my_project/ctrl.hg38.sam \
O=/Users/z/my_project/reorderedctrl2.bam \
SD=/Users/z/my_project/hg38_1_2.dict \
S=TRUE
Now I get a reordered BAM file, but when I run GATK Mutect2 I get the same error as before:
gatk Mutect2 \
-R hg38_1.fasta \
-I reordered.bam \
-I reorderedctrl.bam \
-normal UCR_1 \
-L resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list \
--sequence-dictionary hg38_1_2.dict \
--germline-resource somatic-hg38_af-only-gnomad.hg38.vcf.gz \
--panel-of-normals somatic-hg38_1000g_pon.hg38.vcf.gz \
--ignore-itr-artifacts \
--output somatic.vcf.gz \
--java-options -DGATK_STACKTRACE_ON_USER_EXCEPTION=true
The log and the error:
> -R hg38_1.fasta \
> -I reordered.bam \
> -I reorderedctrl.bam \
> -normal UCR_1 \
> -L resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list\
> --sequence-dictionary hg38_1_2.dict\
> --germline-resource somatic-hg38_af-only-gnomad.hg38.vcf.gz \
> --panel-of-normals somatic-hg38_1000g_pon.hg38.vcf.gz\
> --ignore-itr-artifacts\
> --output somatic.vcf.gz\
> --java-options -DGATK_STACKTRACE_ON_USER_EXCEPTION=true
Using GATK jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar Mutect2 -R hg38_1.fasta -I reordered.bam -I reorderedctrl.bam -normal UCR_1 -L resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list --sequence-dictionary hg38_1_2.dict --germline-resource somatic-hg38_af-only-gnomad.hg38.vcf.gz --panel-of-normals somatic-hg38_1000g_pon.hg38.vcf.gz --ignore-itr-artifacts --output somatic.vcf.gz
08:47:57.823 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 21, 2021 8:47:57 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
08:47:58.000 INFO Mutect2 - ------------------------------------------------------------
08:47:58.001 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.9.0-SNAPSHOT
08:47:58.001 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
08:47:58.001 INFO Mutect2 - Executing as root@f6ed04a13e19 on Linux v5.4.39-linuxkit amd64
08:47:58.001 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
08:47:58.001 INFO Mutect2 - Start Date/Time: August 21, 2021 8:47:57 AM GMT
08:47:58.002 INFO Mutect2 - ------------------------------------------------------------
08:47:58.002 INFO Mutect2 - ------------------------------------------------------------
08:47:58.002 INFO Mutect2 - HTSJDK Version: 2.23.0
08:47:58.002 INFO Mutect2 - Picard Version: 2.23.3
08:47:58.002 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:47:58.003 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:47:58.003 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:47:58.003 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:47:58.003 INFO Mutect2 - Deflater: IntelDeflater
08:47:58.003 INFO Mutect2 - Inflater: IntelInflater
08:47:58.004 INFO Mutect2 - GCS max retries/reopens: 20
08:47:58.004 INFO Mutect2 - Requester pays: disabled
08:47:58.004 INFO Mutect2 - Initializing engine
08:47:58.552 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/somatic-hg38_1000g_pon.hg38.vcf.gz
08:47:58.767 INFO FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/somatic-hg38_af-only-gnomad.hg38.vcf.gz
08:47:58.927 INFO FeatureManager - Using codec IntervalListCodec to read file file:///gatk/my_data/resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list
08:47:59.378 INFO Mutect2 - Shutting down engine
[August 21, 2021 8:47:59 AM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=220200960
***********************************************************************
A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary
***********************************************************************
org.broadinstitute.hellbender.exceptions.UserException$MalformedGenomeLoc: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary
at org.broadinstitute.hellbender.utils.GenomeLocParser.getContigInfo(GenomeLocParser.java:107)
at org.broadinstitute.hellbender.utils.GenomeLocParser.getContigIndex(GenomeLocParser.java:119)
at org.broadinstitute.hellbender.utils.GenomeLocParser.createGenomeLoc(GenomeLocParser.java:150)
at org.broadinstitute.hellbender.utils.GenomeLocParser.createGenomeLoc(GenomeLocParser.java:396)
at org.broadinstitute.hellbender.utils.IntervalUtils.featureFileToIntervals(IntervalUtils.java:359)
at org.broadinstitute.hellbender.utils.IntervalUtils.parseIntervalArguments(IntervalUtils.java:318)
at org.broadinstitute.hellbender.utils.IntervalUtils.loadIntervals(IntervalUtils.java:238)
at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.parseIntervals(IntervalArgumentCollection.java:200)
at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getTraversalParameters(IntervalArgumentCollection.java:180)
at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getIntervals(IntervalArgumentCollection.java:111)
at org.broadinstitute.hellbender.engine.GATKTool.initializeIntervals(GATKTool.java:516)
at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:711)
at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:79)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Shall I share the input reads that I have to use with Mutect2, so that you could kindly try them with your hg38 reference dictionary in Mutect2? If possible, I would really appreciate it. If you agree, here is the link to my input sequences:
https://drive.google.com/drive/folders/12_Qry7felNyjwljka9Ozqf0uX2pok8dq
These are tumor and control sliced reads downloaded from ICGC (aligned to GRCh37 (hg19) according to ICGC): Tumor_0a2055597b75c8cb09569f964bf87dc0.1_10925410-10925611 and ctrl_0a2055597b75c8cb09569f964bf87dc0.1_10925410-10925611
The second set of files was realigned by me to hg38 using CrossMap (but I am not sure about my hg38 reference file):
tumor.hg38.sam
ctrl.hg38.sam
I would be grateful if you could try both with mutect2 on your system.
Thank you so much
Best Regards
-
Maybe I should not have used a liftover tool (CrossMap) and should instead have realigned my reads to hg38? Like reverting the BAM to FASTQ and then running BWA MEM and ...?
-
Hi zdr j,
I think instead of the roundabout steps you have described here, you will want to verify that you are using the same reference file for Mutect2 that you used when you did your mapping step (most likely with BWA). You will need to use a consistent reference the entire time. If you are doing so, then you shouldn't have this issue.
The hg38 reference does have a chr1 in the file, so you may just want to re-download it in case your file is malformed.
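A quick way to confirm that a downloaded FASTA really contains a given contig is to scan its header lines. The sketch below is a hypothetical helper using plain awk. It takes the contig name to be the text between ">" and the first whitespace, which matches Picard's default of truncating names at whitespace (TRUNCATE_NAMES_AT_WHITESPACE=true in the logs above). It assumes an uncompressed FASTA.

```shell
# Succeeds if the FASTA has a record whose name (the text between ">" and the
# first whitespace) equals the requested contig. Hypothetical helper.
# usage: fasta_has_contig reference.fasta chr1
fasta_has_contig() {
  awk -v want="$2" '
    /^>/ { if (substr($1, 2) == want) { found = 1; exit } }
    END  { exit !found }   # exit status 0 only when the contig was seen
  ' "$1"
}

# Example (file name taken from the thread):
# fasta_has_contig hg38_1.fasta chr1 && echo "chr1 present" || echo "chr1 missing"
```

If chr1 is present in the FASTA but still reported as missing, the .dict or .fai next to it may have been built from a different file and would be worth regenerating.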
If you are continuing to have issues, could you create a new post? Your issue doesn't seem to be the same as the one above.
Best,
Genevieve
-
Hi Genevieve,
Well, as I explained before, I am not taking any roundabout steps. The problem is that the ICGC data I am downloading were originally mapped (by ICGC or Illumina?) to GRCh37 (hg19), and GATK Mutect2 does not support mutant calling with reference to this assembly; it only supports mutant calling for reads aligned to hg38. So I have to remap the mapped sequences I download from ICGC to hg38, which I did using CrossMap. Maybe the problem is using a liftover tool instead of mapping from the beginning. It will probably be easier to forget Mutect2 and switch to another mutant calling tool like Strelka, which supports GRCh37 (hg19).
-
zdr j Yeah, I'm wondering if something went wrong during your CrossMap step. chr1 does exist in hg38, so there must be something wrong that you are not able to see yet in your error messages. Like I said, go ahead and make a new post and we can continue to try to troubleshoot if you want! I'm just not an expert on CrossMap.
-
Good day,
I am getting the same error message as the one stated above.
Not sure that it matters but I am using GATK4 through Ubuntu/Docker Desktop.
The commands are:
gatk CreateSequenceDictionary -R /mnt/f/BeerGenomes/s288c_R64_genomic.fa -O /mnt/f/BeerGenomes/s288c_R64_genomic.dict
The entire message is as follows:
Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.3.0-local.jar CreateSequenceDictionary -R /mnt/f/BeerGenomes/s288c_R64_genomic.fa -O /mnt/f/BeerGenomes/s288c_R64_genomic.dict
23:25:28.261 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Dec 15 23:25:28 UTC 2022] CreateSequenceDictionary --OUTPUT /mnt/f/BeerGenomes/s288c_R64_genomic.dict --REFERENCE /mnt/f/BeerGenomes/s288c_R64_genomic.fa --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Dec 15, 2022 11:25:28 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Thu Dec 15 23:25:28 UTC 2022] Executing as root@70512af252c1 on Linux 5.15.79.1-microsoft-standard-WSL2 amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.3.0
[Thu Dec 15 23:25:28 UTC 2022] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=620756992
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
picard.PicardException: /mnt/f/BeerGenomes/s288c_R64_genomic.dict already exists. Delete this file and try again, or specify a different output file.
at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:193)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
Initially it claims that it has created a .dict file but it is not there:
[Thu Dec 15 23:25:28 UTC 2022] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
Thanks in advance.
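The exception says the tool can already see a file at the output path, so it may help to check that path from the same environment the tool runs in (here, inside the Docker container) and move any existing file aside before rerunning. The sketch below is a hypothetical helper, not a GATK or Picard feature. Renaming instead of deleting is simply a cautious choice made for this example.

```shell
# Hypothetical helper: if the output already exists, rename it to *.bak so
# CreateSequenceDictionary can write a fresh file. Prints what it did.
move_aside() {
  out=$1
  if [ -e "$out" ]; then
    mv "$out" "$out.bak" && echo "moved $out to $out.bak"
  else
    echo "no existing file at $out"
  fi
}

# Example, run from inside the container (path taken from the thread):
# move_aside /mnt/f/BeerGenomes/s288c_R64_genomic.dict
```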
-
Hi
I am running into a similar problem. I bisulfite-sequenced some samples and then called variants with cgmaptools from sorted BAM files created with bwa-meth and with Bismark.
Since the Bismark author is the same as for cgmaptools, I called this the truth VCF and the VCF from bwa-meth the eval. When I tried to compare them with gatk Concordance, I got a dictionary error.
So I tried to create the dictionary with Picard, and I got a file that doesn't work at all. I got this:
java -Xmx300G -jar /home/juaguila/appz/picard/build/libs/picard.jar CreateSequenceDictionary -R Bvos.fasta -O Bvos.dict
15:32:54.084 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/juaguila/appz/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Jun 12 15:32:54 EDT 2024] CreateSequenceDictionary --OUTPUT Bvos.dict --REFERENCE Bvos.fasta --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Wed Jun 12 15:32:54 EDT 2024] Executing as juaguila@u05.panther.net on Linux 3.10.0-1160.105.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 20.0.1+9-29; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:3.0.0-1-g62ec81c-SNAPSHOT
[Wed Jun 12 15:32:55 EDT 2024] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.02 minutes.
Then, when I try to use the dictionary for the Concordance analysis, I get this error:
gatk Concordance -R ../Bvos.fasta -eval V00001.mrkdup.vcf --truth ../CGMAP_feng/bayes.vcf --summary summary.tsv
Using GATK jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar Concordance -R ../Bvos.fasta -eval V00001.mrkdup.vcf --truth ../CGMAP_feng/bayes.vcf --summary summary.tsv
23:52:18.414 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
23:52:19.405 INFO Concordance - ------------------------------------------------------------
23:52:19.457 INFO Concordance - The Genome Analysis Toolkit (GATK) v4.5.0.0
23:52:19.457 INFO Concordance - For support and documentation go to https://software.broadinstitute.org/gatk/
23:52:19.458 INFO Concordance - Executing as juaguila@u05.panther.net on Linux v3.10.0-1160.105.1.el7.x86_64 amd64
23:52:19.459 INFO Concordance - Java runtime: Java HotSpot(TM) 64-Bit Server VM v20.0.1+9-29
23:52:19.459 INFO Concordance - Start Date/Time: June 12, 2024, 11:52:17 PM EDT
23:52:19.459 INFO Concordance - ------------------------------------------------------------
23:52:19.460 INFO Concordance - ------------------------------------------------------------
23:52:19.461 INFO Concordance - HTSJDK Version: 4.1.0
23:52:19.461 INFO Concordance - Picard Version: 3.1.1
23:52:19.461 INFO Concordance - Built for Spark Version: 3.5.0
23:52:19.462 INFO Concordance - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:52:19.462 INFO Concordance - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:52:19.463 INFO Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:52:19.463 INFO Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:52:19.463 INFO Concordance - Deflater: IntelDeflater
23:52:19.464 INFO Concordance - Inflater: IntelInflater
23:52:19.464 INFO Concordance - GCS max retries/reopens: 20
23:52:19.464 INFO Concordance - Requester pays: disabled
23:52:19.465 INFO Concordance - Initializing engine
23:52:19.473 INFO Concordance - Shutting down engine
[June 12, 2024, 11:52:19 PM EDT] org.broadinstitute.hellbender.tools.walkers.validation.Concordance done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=285212672
***********************************************************************
A USER ERROR has occurred: Fasta dict file file:///home/juaguila/BombusMethylSeq/Rec-5/mrkdup/../Bvos.dict for reference file:///home/juaguila/BombusMethylSeq/Rec-5/mrkdup/../Bvos.fasta does not exist. Please see https://gatk.broadinstitute.org/hc/articles/360035531652-FASTA-Reference-genome-format for help creating it.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Is there anything I need to check to find out what is wrong with the dictionary?
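The error message shows GATK looking for the dictionary next to the FASTA, with the FASTA extension replaced by .dict (../Bvos.dict for ../Bvos.fasta). The sketch below derives that path so it can be checked from the directory where Concordance is run. `expected_dict_path` is a hypothetical helper. The list of extensions it handles and the fallback for other names are assumptions of this sketch, not a statement of GATK's full rule.

```shell
# Derive the dictionary path GATK is assumed to look for: same directory and
# base name as the FASTA, with the FASTA extension replaced by ".dict".
expected_dict_path() {
  case $1 in
    *.fasta) printf '%s.dict\n' "${1%.fasta}" ;;
    *.fa)    printf '%s.dict\n' "${1%.fa}" ;;
    *.fna)   printf '%s.dict\n' "${1%.fna}" ;;
    *)       printf '%s.dict\n' "$1" ;;   # fallback chosen for this sketch only
  esac
}

# Example, run from the directory where the gatk command is launched:
# ls -l "$(expected_dict_path ../Bvos.fasta)"
```

If `ls` reports no such file, the dictionary was probably written somewhere other than the directory that holds Bvos.fasta.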
-
Good morning,
In response to Juan: this was the command that seemed to fix my problem. It worked, anyway:
/gatk/my_data# gatk CreateSequenceDictionary -R s288c_R64_1_1.fa.gz -O s288c
(gatk) root@e8f7793000eb:/gatk/my_data# gatk CreateSequenceDictionary -R s288c_R64_1_1.fa.gz -O s288c_R64_genomic.dict
Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
-
Your command line assigns 300G of heap space to the tool; however, CreateSequenceDictionary does not need that much heap. Also, there does not seem to be any success code at the end of the tool run, so it might be better to reduce the heap size to 4G and try running it again. If the issue persists, please share the full message thrown by the tool so that we can debug what is causing it.