Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Error Creating Dictionary for genome

  • Genevieve Brandt (she/her)

    Hi Abrish, were you able to solve this issue? Your question was posted while the GATK team was out of the office, and we did not get to it.

    If you would still like support from the GATK team, please repost your question with [Repost] in the title and we will get to it as soon as possible.

  • zdr j

    Hello

    I have exactly the same problem with the hg38 reference genome downloaded from the Broad Institute's Google resource bundle. I would be grateful if you could let us know whether you found a solution.

    Best

  • Genevieve Brandt (she/her)

    zdr j, could you share your complete command and stack trace?

  • zdr j

    Hi Genevieve,

    Thank you for your help. This is the command I ran and the error I got:

    java -jar ~/picard-2.25.7/picard.jar CreateSequenceDictionary \
          R=~/my_project/Homo_sapiens_assembly38.fasta \
          O=~/test3.dict

    INFO 2021-08-03 08:54:03 CreateSequenceDictionary

    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    **********    CreateSequenceDictionary -R ~/my_project/Homo_sapiens_assembly38.fasta -O ~/test3.dict
    **********

    08:54:09.163 INFO  NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/z/picard-2.25.7/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
    [Tue Aug 03 08:54:10 IRDT 2021] CreateSequenceDictionary OUTPUT=~/test3.dict REFERENCE=~/my_project/Homo_sapiens_assembly38.fasta    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Tue Aug 03 08:54:10 IRDT 2021] Executing as z@Zs-MacBook-Pro.local on Mac OS X 11.4 x86_64; Java HotSpot(TM) 64-Bit Server VM 16.0.2+7-67; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.7
    [Tue Aug 03 08:54:10 IRDT 2021] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=272629760
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" htsjdk.samtools.SAMException: Cannot write file: /Users/z/~/test3.dict. Neither file nor parent directory exist.
        at htsjdk.samtools.util.IOUtil.assertFileIsWritable(IOUtil.java:554)
        at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:223)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

  • Genevieve Brandt (she/her)

    zdr j, it looks like the file path is being provided incorrectly, because the command is looking for /Users/z/~/test3.dict (the ~ was not expanded to your home directory). You can provide the argument as -O ~/test3.dict, following the syntax in this document: https://gatk.broadinstitute.org/hc/en-us/articles/360057439791-CreateSequenceDictionary-Picard-

    Let me know if that works!
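    A minimal Python sketch of how the output path went wrong, assuming the command ran in zsh (the default shell on macOS 11), which does not expand `~` when it follows `O=`. Picard then received the literal string `~/test3.dict` (the log echoes `OUTPUT=~/test3.dict`) and resolved it, like any relative path, against the working directory. The snippet only manipulates path strings; it does not call Picard.

```python
import os

# Path string exactly as Picard received it (the log shows OUTPUT=~/test3.dict).
literal = "~/test3.dict"

# A path that does not start with "/" is resolved against the current working
# directory, which is how "/Users/z/~/test3.dict" ends up in the error message.
as_resolved_by_tool = os.path.abspath(literal)

# Expanding "~" first (what the shell would do at the start of a word)
# points the path at the home directory instead.
expanded = os.path.abspath(os.path.expanduser(literal))

print("without expansion:", as_resolved_by_tool)
print("with expansion:   ", expanded)
```

    On the command line, `O=$HOME/test3.dict`, or the new-style `-O ~/test3.dict` with `~` starting its own word, lets the shell supply the home directory.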

  • zdr j

    Thank you, Genevieve.

    Picard worked to make the .dict file, and I used samtools to make the .fai file. Now, when running Mutect2, I still get this error, which is similar to the error I got earlier when I made the .dict file with samtools:

    gatk Mutect2 \
         -R hg38_1.fasta \
         -I tumor.hg38.sam \
         -I ctrl.hg38.sam \
         -normal UCR_1 \
         -L resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list \
         --sequence-dictionary hg38_1_2.dict \
         --germline-resource somatic-hg38_af-only-gnomad.hg38.vcf.gz \
         --panel-of-normals somatic-hg38_1000g_pon.hg38.vcf.gz \
         --ignore-itr-artifacts \
         --output somatic.vcf.gz

    Using GATK jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar Mutect2 -R hg38_1.fasta -I tumor.hg38.sam -I ctrl.hg38.sam -normal UCR_1 -L resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list --sequence-dictionary hg38_1_2.dict --germline-resource somatic-hg38_af-only-gnomad.hg38.vcf.gz --panel-of-normals somatic-hg38_1000g_pon.hg38.vcf.gz --ignore-itr-artifacts --output somatic.vcf.gz
    10:04:43.271 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Aug 06, 2021 10:04:43 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    10:04:43.403 INFO  Mutect2 - ------------------------------------------------------------
    10:04:43.404 INFO  Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.9.0-SNAPSHOT
    10:04:43.404 INFO  Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
    10:04:43.404 INFO  Mutect2 - Executing as root@389f0de5f95d on Linux v5.4.39-linuxkit amd64
    10:04:43.404 INFO  Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
    10:04:43.405 INFO  Mutect2 - Start Date/Time: August 6, 2021 10:04:43 AM GMT
    10:04:43.405 INFO  Mutect2 - ------------------------------------------------------------
    10:04:43.405 INFO  Mutect2 - ------------------------------------------------------------
    10:04:43.406 INFO  Mutect2 - HTSJDK Version: 2.23.0
    10:04:43.406 INFO  Mutect2 - Picard Version: 2.23.3
    10:04:43.406 INFO  Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    10:04:43.407 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    10:04:43.407 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    10:04:43.407 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    10:04:43.407 INFO  Mutect2 - Deflater: IntelDeflater
    10:04:43.408 INFO  Mutect2 - Inflater: IntelInflater
    10:04:43.408 INFO  Mutect2 - GCS max retries/reopens: 20
    10:04:43.408 INFO  Mutect2 - Requester pays: disabled
    10:04:43.408 INFO  Mutect2 - Initializing engine
    10:04:43.816 INFO  FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/somatic-hg38_1000g_pon.hg38.vcf.gz
    10:04:44.024 INFO  FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/somatic-hg38_af-only-gnomad.hg38.vcf.gz
    10:04:44.138 INFO  FeatureManager - Using codec IntervalListCodec to read file file:///gatk/my_data/resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list
    10:04:44.175 INFO  Mutect2 - Shutting down engine
    [August 6, 2021 10:04:44 AM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=185073664
    ***********************************************************************

    A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary

    I would be thankful for any help.

    I have been stuck at this stage with no progress for a while because of this reference issue.

    Best

  • Genevieve Brandt (she/her)

    zdr j, this looks like a reference mismatch issue with one of your files. You will need to make sure that the chromosome naming and the reference version are the same across all of your files. You can find out which specific file is causing the problem by running the command with --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'. Here is more info on how to do it: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax

    Here is another forum post with someone who had a similar issue and solved their problem: https://gatk.broadinstitute.org/hc/en-us/community/posts/360077577711-Badly-formed-genome-unclippedLoc-Contig-NC-007605-given-as-location-but-this-contig-isn-t-present-in-the-Fasta-sequence-dictionary

    Hope this helps!
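    One way to narrow down which file disagrees is to compare contig names directly. The Python sketch below is a hypothetical diagnostic, not part of GATK: it lists contigs that appear in the body of an .interval_list but not in a .dict. It assumes the SAM-style `@SQ` header that Picard writes for both file types; the toy inputs (names, lengths, coordinates) are invented for illustration.

```python
from typing import Iterable, List, Set


def dict_contigs(lines: Iterable[str]) -> List[str]:
    """Contig names from the SN: tag of @SQ header lines (.dict files and the
    header of a Picard-style .interval_list both use this SAM-style header)."""
    names = []
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def interval_contigs(lines: Iterable[str]) -> Set[str]:
    """Contigs named in the body of an .interval_list (first tab-separated column)."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("@")
    }


def missing_from_dict(dict_lines: Iterable[str], interval_lines: Iterable[str]) -> List[str]:
    """Interval contigs that the sequence dictionary does not contain."""
    known = set(dict_contigs(dict_lines))
    return sorted(interval_contigs(interval_lines) - known)


# Toy inputs: a dictionary without "chr" prefixes and intervals that use them.
example_dict = ["@HD\tVN:1.6\n", "@SQ\tSN:1\tLN:1000\n", "@SQ\tSN:2\tLN:500\n"]
example_intervals = ["@HD\tVN:1.6\n", "chr1\t1\t100\t+\t.\n", "chr2\t1\t50\t+\t.\n"]
print(missing_from_dict(example_dict, example_intervals))  # ['chr1', 'chr2']
```

    With real files, open `hg38_1_2.dict` and the `.interval_list` and pass the file objects. A non-empty result names the contigs that would trigger the "isn't present in the Fasta sequence dictionary" error; an empty result suggests the mismatch is in another input.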

  • zdr j

    Hi Genevieve,

    Thank you for your reply. I still have the problem. I think the problem is the hg38 reference file I am using, which I got from the Broad Institute Google bundle. It looks like it does not have contig 1 (chr1)? I am new to genomics, and I would be very grateful if you could help me find a good hg38 reference file for my Mutect2 analysis.

    Here is the full background:

    My reads are sliced reads from cancer genome sequences in ICGC, downloaded with score-client5 from ICGC's aligned WGS reads, which according to ICGC were originally aligned against GRCh37 (hg19). I had to convert them to hg38 using CrossMap or Galaxy. Files obtained by both methods gave me the Mutect2 error about contig 1 / chr1 that you can see in my last post above.

    After your last reply:

    I ran Picard ReorderSam on my tumor and control input files, as suggested by another user in the post you mentioned:

    java -jar ~/picard-2.25.7/picard.jar ReorderSam \
         I=/Users/z/my_project/ctrl.hg38.sam \
         O=/Users/z/my_project/reorderedctrl2.bam \
         SD=/Users/z/my_project/hg38_1_2.dict

    But this also gave an error:

    picard.PicardException: New reference sequence does not contain a matching contig for 1
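    The ReorderSam error above says the input file names a contig `1`, while the Broad hg38 files use `chr`-prefixed names such as `chr1`. A hypothetical Python helper that classifies each `@SQ` name from an input header against a dictionary: present as-is, present only with a `chr` prefix, or absent. The toy headers are invented; `MT` versus `chrM` illustrates that a prefix rule alone is not a complete mapping.

```python
from typing import Dict, Iterable, List


def sq_names(header_lines: Iterable[str]) -> List[str]:
    """SN: values of @SQ lines from a SAM header or a .dict file."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def classify_contigs(input_names: Iterable[str], ref_names: Iterable[str]) -> Dict[str, str]:
    """For each input contig: 'match' if the reference has it, 'chr-prefix only'
    if it would match only after adding 'chr', otherwise 'missing'."""
    ref = set(ref_names)
    result = {}
    for name in input_names:
        if name in ref:
            result[name] = "match"
        elif "chr" + name in ref:
            result[name] = "chr-prefix only"
        else:
            result[name] = "missing"
    return result


# Toy headers: the input uses GRCh37-style names, the reference uses chr-style names.
sam_header = ["@SQ\tSN:1\tLN:1000\n", "@SQ\tSN:MT\tLN:100\n"]
ref_dict = ["@SQ\tSN:chr1\tLN:1000\n", "@SQ\tSN:chrM\tLN:100\n"]
print(classify_contigs(sq_names(sam_header), sq_names(ref_dict)))
# {'1': 'chr-prefix only', 'MT': 'missing'}
```

    A `chr-prefix only` result shows a naming mismatch, but renaming contigs does not turn GRCh37 coordinates into hg38 coordinates, so renaming is only meaningful if the reads really sit on hg38 positions.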

    I then used this command to try to bypass contig 1:

    java -jar ~/picard-2.25.7/picard.jar ReorderSam \
         I=/Users/z/my_project/ctrl.hg38.sam \
         O=/Users/z/my_project/reorderedctrl2.bam \
         SD=/Users/z/my_project/hg38_1_2.dict \
         S=TRUE

    Now I get a reordered BAM file, but running GATK Mutect2 gives me the same error as before:

    gatk Mutect2 \
         -R hg38_1.fasta \
         -I reordered.bam \
         -I reorderedctrl.bam \
         -normal UCR_1 \
         -L resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list \
         --sequence-dictionary hg38_1_2.dict \
         --germline-resource somatic-hg38_af-only-gnomad.hg38.vcf.gz \
         --panel-of-normals somatic-hg38_1000g_pon.hg38.vcf.gz \
         --ignore-itr-artifacts \
         --output somatic.vcf.gz \
         --java-options -DGATK_STACKTRACE_ON_USER_EXCEPTION=true

    The log and the error:

    Using GATK jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar Mutect2 -R hg38_1.fasta -I reordered.bam -I reorderedctrl.bam -normal UCR_1 -L resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list --sequence-dictionary hg38_1_2.dict --germline-resource somatic-hg38_af-only-gnomad.hg38.vcf.gz --panel-of-normals somatic-hg38_1000g_pon.hg38.vcf.gz --ignore-itr-artifacts --output somatic.vcf.gz
    08:47:57.823 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.9.0-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Aug 21, 2021 8:47:57 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    08:47:58.000 INFO  Mutect2 - ------------------------------------------------------------
    08:47:58.001 INFO  Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.9.0-SNAPSHOT
    08:47:58.001 INFO  Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
    08:47:58.001 INFO  Mutect2 - Executing as root@f6ed04a13e19 on Linux v5.4.39-linuxkit amd64
    08:47:58.001 INFO  Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
    08:47:58.001 INFO  Mutect2 - Start Date/Time: August 21, 2021 8:47:57 AM GMT
    08:47:58.002 INFO  Mutect2 - ------------------------------------------------------------
    08:47:58.002 INFO  Mutect2 - ------------------------------------------------------------
    08:47:58.002 INFO  Mutect2 - HTSJDK Version: 2.23.0
    08:47:58.002 INFO  Mutect2 - Picard Version: 2.23.3
    08:47:58.002 INFO  Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    08:47:58.003 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    08:47:58.003 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    08:47:58.003 INFO  Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    08:47:58.003 INFO  Mutect2 - Deflater: IntelDeflater
    08:47:58.003 INFO  Mutect2 - Inflater: IntelInflater
    08:47:58.004 INFO  Mutect2 - GCS max retries/reopens: 20
    08:47:58.004 INFO  Mutect2 - Requester pays: disabled
    08:47:58.004 INFO  Mutect2 - Initializing engine
    08:47:58.552 INFO  FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/somatic-hg38_1000g_pon.hg38.vcf.gz
    08:47:58.767 INFO  FeatureManager - Using codec VCFCodec to read file file:///gatk/my_data/somatic-hg38_af-only-gnomad.hg38.vcf.gz
    08:47:58.927 INFO  FeatureManager - Using codec IntervalListCodec to read file file:///gatk/my_data/resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list
    08:47:59.378 INFO  Mutect2 - Shutting down engine
    [August 21, 2021 8:47:59 AM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=220200960
    ***********************************************************************

    A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary

    ***********************************************************************
    org.broadinstitute.hellbender.exceptions.UserException$MalformedGenomeLoc: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary
        at org.broadinstitute.hellbender.utils.GenomeLocParser.getContigInfo(GenomeLocParser.java:107)
        at org.broadinstitute.hellbender.utils.GenomeLocParser.getContigIndex(GenomeLocParser.java:119)
        at org.broadinstitute.hellbender.utils.GenomeLocParser.createGenomeLoc(GenomeLocParser.java:150)
        at org.broadinstitute.hellbender.utils.GenomeLocParser.createGenomeLoc(GenomeLocParser.java:396)
        at org.broadinstitute.hellbender.utils.IntervalUtils.featureFileToIntervals(IntervalUtils.java:359)
        at org.broadinstitute.hellbender.utils.IntervalUtils.parseIntervalArguments(IntervalUtils.java:318)
        at org.broadinstitute.hellbender.utils.IntervalUtils.loadIntervals(IntervalUtils.java:238)
        at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.parseIntervals(IntervalArgumentCollection.java:200)
        at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getTraversalParameters(IntervalArgumentCollection.java:180)
        at org.broadinstitute.hellbender.cmdline.argumentcollections.IntervalArgumentCollection.getIntervals(IntervalArgumentCollection.java:111)
        at org.broadinstitute.hellbender.engine.GATKTool.initializeIntervals(GATKTool.java:516)
        at org.broadinstitute.hellbender.engine.GATKTool.onStartup(GATKTool.java:711)
        at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.onStartup(AssemblyRegionWalker.java:79)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

    Could I share the input reads I need to use with Mutect2, so that you could try them with your hg38 reference and dictionary? I would really appreciate it. If you agree, here is the link to my input sequences:

    https://drive.google.com/drive/folders/12_Qry7felNyjwljka9Ozqf0uX2pok8dq

    These are the tumor and control sliced reads downloaded from ICGC (aligned to GRCh37 (hg19), according to ICGC): Tumor_0a2055597b75c8cb09569f964bf87dc0.1_10925410-10925611 and ctrl_0a2055597b75c8cb09569f964bf87dc0.1_10925410-10925611

    The second pair of files are the ones I converted to hg38 using CrossMap (but I am not sure about my hg38 reference file):

    tumor.hg38.sam

    ctrl.hg38.sam

    I would be grateful if you could try both with Mutect2 on your system.

    Thank you so much

    Best Regards

  • zdr j

    Maybe I cannot use a liftover tool (CrossMap) and have to realign my reads to hg38 instead? For example, revert the BAM to FASTQ, then run BWA-MEM, and so on?

  • Genevieve Brandt (she/her)

    Hi zdr j,

    I think instead of the roundabout steps you have described here, you will want to verify that you are using the same reference file for Mutect2 that you used when you did your mapping step (most likely with BWA). You will need to use a consistent reference the entire time. If you are doing so, then you shouldn't have this issue.
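    Using one consistent reference can be checked mechanically: dictionaries made from the same FASTA should list the same contigs with the same lengths, and Picard's CreateSequenceDictionary also records an MD5 (`M5` tag) for each sequence. A hypothetical Python helper that compares two .dict files, for example one made from the FASTA used for alignment and one made from the FASTA passed to Mutect2. Contig order, which GATK may also validate, is deliberately not compared in this sketch; the toy entries are invented.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Entry = Tuple[int, Optional[str]]  # (sequence length, MD5 or None if absent)


def parse_dict(lines: Iterable[str]) -> Dict[str, Entry]:
    """Map contig name -> (LN, M5) from the @SQ lines of a .dict file."""
    contigs: Dict[str, Entry] = {}
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        # Split each TAG:value field once, so values containing ":" (e.g. UR:file:/...) survive.
        tags = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        contigs[tags["SN"]] = (int(tags["LN"]), tags.get("M5"))
    return contigs


def compare_dicts(a: Dict[str, Entry], b: Dict[str, Entry]) -> List[str]:
    """Human-readable differences; an empty list means names, lengths and
    (where both files record them) MD5s agree."""
    problems = []
    for name in sorted(set(a) | set(b)):
        if name not in b:
            problems.append(f"{name}: only in first dictionary")
        elif name not in a:
            problems.append(f"{name}: only in second dictionary")
        else:
            (len_a, md5_a), (len_b, md5_b) = a[name], b[name]
            if len_a != len_b:
                problems.append(f"{name}: length {len_a} vs {len_b}")
            elif md5_a and md5_b and md5_a != md5_b:
                problems.append(f"{name}: MD5 differs")
    return problems


first = parse_dict(["@SQ\tSN:chr1\tLN:1000\tM5:aaa\n", "@SQ\tSN:chr2\tLN:500\n"])
second = parse_dict(["@SQ\tSN:chr1\tLN:1000\tM5:bbb\n", "@SQ\tSN:chrX\tLN:800\n"])
print(compare_dicts(first, second))
```

    With real files, pass open file objects (for example `parse_dict(open("hg38_1_2.dict"))`); any reported line points at a contig where the two references disagree.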

    The hg38 reference does have a chr1 in the file, so you may just want to re-download it in case your file is malformed.
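    To check this on a local copy, a short Python sketch that lists the sequence names in a plain or gzip-compressed FASTA, cutting each header at the first whitespace as Picard did here (`TRUNCATE_NAMES_AT_WHITESPACE=true` appears in the earlier CreateSequenceDictionary log). The helper names are hypothetical and the toy input is invented.

```python
import gzip
from typing import Iterable, List


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Sequence names from FASTA header lines, truncated at the first whitespace."""
    names = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # skip malformed headers with no name at all
                names.append(parts[0])
    return names


def open_fasta(path: str):
    """Open a plain or gzip-compressed FASTA file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


toy = [">chr1 description text\n", "ACGT\n", ">chr2\n", "GGCC\n"]
print(fasta_names(toy))  # ['chr1', 'chr2']
```

    For example, `with open_fasta("hg38_1.fasta") as fh: print("chr1" in fasta_names(fh))`. Scanning a full human FASTA line by line takes a while; the first column of the `.fai` index made with samtools lists the same names much faster.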

    If you are continuing to have issues, could you create a new post? Your issue doesn't seem to be the same as the one above.

    Best,

    Genevieve

  • zdr j

    Hi Genevieve, 

    Well, as I explained before, I am not taking any roundabout steps. The problem is that the ICGC data I am downloading were originally mapped (by ICGC or Illumina?) to GRCh37 (hg19), and GATK Mutect2 does not support mutation calling against this assembly; it only supports mutation calling for reads aligned to hg38. So I have to remap the already-mapped sequences I download from ICGC to hg38, which I did using CrossMap. Maybe the problem is using a liftover tool instead of mapping from the beginning. It will probably be easier to forget Mutect2 and switch to another mutation caller, like Strelka, which supports GRCh37 (hg19).

  • Genevieve Brandt (she/her)

    zdr j, yes, I'm wondering whether something went wrong during your CrossMap step. Since chr1 does exist in hg38, something must be wrong that you are not able to see yet in your error messages. Like I said, go ahead and make a new post and we can continue troubleshooting if you want! I'm just not an expert on CrossMap.

  • Chrsitopher Eskiw

    Good day,

    I am getting the same error message as the one stated above.

    I'm not sure whether it matters, but I am using GATK4 through Ubuntu/Docker Desktop.

    The commands are:

    gatk CreateSequenceDictionary -R /mnt/f/BeerGenomes/s288c_R64_genomic.fa -O /mnt/f/BeerGenomes/s288c_R64_genomic.dict

    The entire message is as follows:

    Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.3.0-local.jar CreateSequenceDictionary -R /mnt/f/BeerGenomes/s288c_R64_genomic.fa -O /mnt/f/BeerGenomes/s288c_R64_genomic.dict
    23:25:28.261 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Dec 15 23:25:28 UTC 2022] CreateSequenceDictionary  --OUTPUT /mnt/f/BeerGenomes/s288c_R64_genomic.dict --REFERENCE /mnt/f/BeerGenomes/s288c_R64_genomic.fa  --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    Dec 15, 2022 11:25:28 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    [Thu Dec 15 23:25:28 UTC 2022] Executing as root@70512af252c1 on Linux 5.15.79.1-microsoft-standard-WSL2 amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.3.0
    [Thu Dec 15 23:25:28 UTC 2022] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
    Runtime.totalMemory()=620756992
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    picard.PicardException: /mnt/f/BeerGenomes/s288c_R64_genomic.dict already exists.  Delete this file and try again, or specify a different output file.
            at picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:193)
            at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
            at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
            at org.broadinstitute.hellbender.Main.main(Main.java:291)

    Initially it claims that it has created a .dict file but it is not there:

    [Thu Dec 15 23:25:28 UTC 2022] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.00 minutes.
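    Picard's message says the output path already exists even though the file does not appear to be there. A small Python check (a hypothetical helper, standard library only) reports what the filesystem itself says about that exact path, including zero-byte files, and can move an existing file aside as the error message suggests ("Delete this file and try again, or specify a different output file"). It cannot explain why the file was not visible elsewhere; run it in the same environment (container or host) in which GATK ran, since that is the filesystem Picard saw.

```python
from pathlib import Path


def prepare_dict_output(dict_path: str, move_aside: bool = False) -> str:
    """Report on an intended .dict output path; optionally rename an existing
    file to '<name>.bak' so CreateSequenceDictionary can write a fresh one."""
    p = Path(dict_path)
    if not p.exists():
        return f"{p} does not exist; CreateSequenceDictionary can write it"
    if move_aside:
        backup = p.with_name(p.name + ".bak")
        p.rename(backup)
        return f"moved existing {p} to {backup}"
    size = p.stat().st_size
    return f"{p} already exists ({size} bytes); delete it or choose another -O path"


print(prepare_dict_output("/mnt/f/BeerGenomes/s288c_R64_genomic.dict"))
```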

    Thanks in advance.

  • Juan Pablo Aguilar Cabezas

    Hi

    I am running into a similar problem. I bisulfite-sequenced some samples and then called variants from sorted BAM files created with bwa-meth and with Bismark, with variants called using cgmaptools.

    Since the Bismark author is the same as the cgmaptools author, I treated that VCF as the truth VCF and the VCF from bwa-meth as the eval, but when I tried to compare them with GATK Concordance, I got a dictionary error.
    So I tried to create the dictionary with Picard, and I got a file that doesn't work at all.

    I got this:

    java -Xmx300G -jar /home/juaguila/appz/picard/build/libs/picard.jar CreateSequenceDictionary -R Bvos.fasta -O Bvos.dict
    15:32:54.084 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/juaguila/appz/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Jun 12 15:32:54 EDT 2024] CreateSequenceDictionary --OUTPUT Bvos.dict --REFERENCE Bvos.fasta --TRUNCATE_NAMES_AT_WHITESPACE true --NUM_SEQUENCES 2147483647 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    [Wed Jun 12 15:32:54 EDT 2024] Executing as juaguila@u05.panther.net on Linux 3.10.0-1160.105.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 20.0.1+9-29; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:3.0.0-1-g62ec81c-SNAPSHOT
    [Wed Jun 12 15:32:55 EDT 2024] picard.sam.CreateSequenceDictionary done. Elapsed time: 0.02 minutes.

    Then, when I try to use the dictionary for the Concordance analysis, I get this error:

     gatk Concordance -R ../Bvos.fasta -eval V00001.mrkdup.vcf --truth ../CGMAP_feng/bayes.vcf  --summary summary.tsv
    Using GATK jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar Concordance -R ../Bvos.fasta -eval V00001.mrkdup.vcf --truth ../CGMAP_feng/bayes.vcf --summary summary.tsv
    23:52:18.414 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/juaguila/.conda/envs/gatk4/share/gatk4-4.5.0.0-0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    23:52:19.405 INFO  Concordance - ------------------------------------------------------------
    23:52:19.457 INFO  Concordance - The Genome Analysis Toolkit (GATK) v4.5.0.0
    23:52:19.457 INFO  Concordance - For support and documentation go to https://software.broadinstitute.org/gatk/
    23:52:19.458 INFO  Concordance - Executing as juaguila@u05.panther.net on Linux v3.10.0-1160.105.1.el7.x86_64 amd64
    23:52:19.459 INFO  Concordance - Java runtime: Java HotSpot(TM) 64-Bit Server VM v20.0.1+9-29
    23:52:19.459 INFO  Concordance - Start Date/Time: June 12, 2024, 11:52:17 PM EDT
    23:52:19.459 INFO  Concordance - ------------------------------------------------------------
    23:52:19.460 INFO  Concordance - ------------------------------------------------------------
    23:52:19.461 INFO  Concordance - HTSJDK Version: 4.1.0
    23:52:19.461 INFO  Concordance - Picard Version: 3.1.1
    23:52:19.461 INFO  Concordance - Built for Spark Version: 3.5.0
    23:52:19.462 INFO  Concordance - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    23:52:19.462 INFO  Concordance - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    23:52:19.463 INFO  Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    23:52:19.463 INFO  Concordance - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    23:52:19.463 INFO  Concordance - Deflater: IntelDeflater
    23:52:19.464 INFO  Concordance - Inflater: IntelInflater
    23:52:19.464 INFO  Concordance - GCS max retries/reopens: 20
    23:52:19.464 INFO  Concordance - Requester pays: disabled
    23:52:19.465 INFO  Concordance - Initializing engine
    23:52:19.473 INFO  Concordance - Shutting down engine
    [June 12, 2024, 11:52:19 PM EDT] org.broadinstitute.hellbender.tools.walkers.validation.Concordance done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=285212672
    ***********************************************************************

    A USER ERROR has occurred: Fasta dict file file:///home/juaguila/BombusMethylSeq/Rec-5/mrkdup/../Bvos.dict for reference file:///home/juaguila/BombusMethylSeq/Rec-5/mrkdup/../Bvos.fasta does not exist. Please see https://gatk.broadinstitute.org/hc/articles/360035531652-FASTA-Reference-genome-format for help creating it.

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    Is there anything I should check to find out what is wrong with the dictionary?
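    The Concordance error names the exact file GATK looked for: `../Bvos.dict`, next to `../Bvos.fasta`, with the FASTA extension replaced by `.dict`. A minimal Python sketch of that naming rule, assuming the extension list below (modeled on the FASTA extensions htsjdk recognises; check it against your version). It only computes and checks a path; it does not call GATK.

```python
from pathlib import Path

# Assumed FASTA extensions; a name like "ref.fa.gz" matches ".fa.gz", not ".fa".
FASTA_EXTENSIONS = (".fasta.gz", ".fa.gz", ".fna.gz", ".fasta", ".fa", ".fna")


def expected_dict_path(fasta_path: str) -> Path:
    """Dictionary path expected next to a FASTA: same directory, same base name, '.dict'."""
    p = Path(fasta_path)
    for ext in FASTA_EXTENSIONS:
        if p.name.endswith(ext):
            return p.with_name(p.name[: -len(ext)] + ".dict")
    raise ValueError(f"unrecognised FASTA extension: {p.name}")


expected = expected_dict_path("../Bvos.fasta")
print(expected.as_posix(), "exists" if expected.exists() else "is missing")
```

    Run from the mrkdup directory, `expected.resolve()` gives the absolute location shown in the error message. If Bvos.dict was written somewhere else (for example, wherever Picard was run with the relative `-O Bvos.dict`), it needs to be moved or regenerated next to Bvos.fasta.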

  • Chrsitopher Eskiw

    Good morning,

    In response to Juan: this was the command that seemed to fix my problem. It worked, anyway:

    (gatk) root@e8f7793000eb:/gatk/my_data# gatk CreateSequenceDictionary -R s288c_R64_1_1.fa.gz -O s288c_R64_genomic.dict
    Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar

  • Gökalp Çelik

    Hi Juan Pablo Aguilar Cabezas

    Your command line assigns 300G of heap space to the tool, but CreateSequenceDictionary does not need a heap that large. Also, there does not seem to be any success code at the end of the tool run, so it might be better to reduce the heap size to 4G and try running it again. If the issue persists, please share the full message the tool prints so that we can debug what is causing it.

