Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

.GenomicsDBException: Duplicate sample name found:

17 comments

  • Charles H. Langley

    We upgraded to GATK v4.1.8.1.

    But the same error appears: "org.genomicsdb.exception.GenomicsDBException: Duplicate sample"

    Thanks for any help.

    Cheers,

    Chuck

     

  • Genevieve Brandt (she/her)

    Charles H. Langley are you running this in parallel? Could you explain which commands are running at the same time?

  • Charles H. Langley

    Hello Genevieve:

    "Running this in parallel"?

    I am not sure which level of parallelization you are referring to.

    The previous commands used multithreading, as in "--reader-threads 12 \".

    They also included "--batch-size 24 \", but did not involve MPI or Spark.

    No other GATK job was running on the system.

    ----------

    I have further stripped down the command (defaults for --reader-threads and --batch-size; see below).

    The same error still occurs.

    We look forward to hearing further from you and your colleagues with ideas about what may be wrong here.

    Cheers,

    Chuck

    ________________________________________________________________________________
    | => gatk --java-options "-Xmx16g -Xms16g" GenomicsDBImport \
    | => --genomicsdb-update-workspace-path /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1 \
    | => --intervals chr1:118739963-147510543 \
    | => --verbosity DEBUG \
    | => -V /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
    Using GATK jar /afs/genomecenter.ucdavis.edu/software/gatk/4.1.8.1/static/gatk-package-4.1.8.1-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx16g -Xms16g -jar /afs/genomecenter.ucdavis.edu/software/gatk/4.1.8.1/static/gatk-package-4.1.8.1-local.jar GenomicsDBImport --genomicsdb-update-workspace-path /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1 --intervals chr1:118739963-147510543 --verbosity DEBUG -V /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
    11:12:45.223 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.1.8.1/static/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    11:12:45.275 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /tmp/libgkl_compression8725280501251879565.so
    Sep 03, 2020 11:12:45 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    11:12:45.569 INFO  GenomicsDBImport - ------------------------------------------------------------
    11:12:45.570 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.8.1
    11:12:45.570 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    11:12:45.570 INFO  GenomicsDBImport - Executing as chuck@rooted3 on Linux v4.15.0-66-generic amd64
    11:12:45.570 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-8u265-b01-0ubuntu2~16.04-b01
    11:12:45.571 INFO  GenomicsDBImport - Start Date/Time: September 3, 2020 11:12:45 AM PDT
    11:12:45.571 INFO  GenomicsDBImport - ------------------------------------------------------------
    11:12:45.571 INFO  GenomicsDBImport - ------------------------------------------------------------
    11:12:45.572 INFO  GenomicsDBImport - HTSJDK Version: 2.23.0
    11:12:45.572 INFO  GenomicsDBImport - Picard Version: 2.22.8
    11:12:45.574 INFO  GenomicsDBImport - HTSJDK Defaults.BUFFER_SIZE : 131072
    11:12:45.574 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    11:12:45.575 INFO  GenomicsDBImport - HTSJDK Defaults.CREATE_INDEX : false
    11:12:45.575 INFO  GenomicsDBImport - HTSJDK Defaults.CREATE_MD5 : false
    11:12:45.575 INFO  GenomicsDBImport - HTSJDK Defaults.CUSTOM_READER_FACTORY :
    11:12:45.575 INFO  GenomicsDBImport - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
    11:12:45.575 INFO  GenomicsDBImport - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
    11:12:45.575 INFO  GenomicsDBImport - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
    11:12:45.575 INFO  GenomicsDBImport - HTSJDK Defaults.REFERENCE_FASTA : null
    11:12:45.575 INFO  GenomicsDBImport - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
    11:12:45.575 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    11:12:45.576 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    11:12:45.576 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    11:12:45.576 INFO  GenomicsDBImport - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
    11:12:45.576 DEBUG ConfigFactory - Configuration file values:
    11:12:45.580 DEBUG ConfigFactory - gcsMaxRetries = 20
    11:12:45.580 DEBUG ConfigFactory - gcsProjectForRequesterPays =
    11:12:45.581 DEBUG ConfigFactory - gatk_stacktrace_on_user_exception = false
    11:12:45.581 DEBUG ConfigFactory - samjdk.use_async_io_read_samtools = false
    11:12:45.581 DEBUG ConfigFactory - samjdk.use_async_io_write_samtools = true
    11:12:45.581 DEBUG ConfigFactory - samjdk.use_async_io_write_tribble = false
    11:12:45.581 DEBUG ConfigFactory - samjdk.compression_level = 2
    11:12:45.581 DEBUG ConfigFactory - spark.kryoserializer.buffer.max = 512m
    11:12:45.581 DEBUG ConfigFactory - spark.driver.maxResultSize = 0
    11:12:45.581 DEBUG ConfigFactory - spark.driver.userClassPathFirst = true
    11:12:45.581 DEBUG ConfigFactory - spark.io.compression.codec = lzf
    11:12:45.582 DEBUG ConfigFactory - spark.executor.memoryOverhead = 600
    11:12:45.582 DEBUG ConfigFactory - spark.driver.extraJavaOptions =
    11:12:45.582 DEBUG ConfigFactory - spark.executor.extraJavaOptions =
    11:12:45.582 DEBUG ConfigFactory - codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
    11:12:45.582 DEBUG ConfigFactory - read_filter_packages = [org.broadinstitute.hellbender.engine.filters]
    11:12:45.582 DEBUG ConfigFactory - annotation_packages = [org.broadinstitute.hellbender.tools.walkers.annotator]
    11:12:45.582 DEBUG ConfigFactory - cloudPrefetchBuffer = 40
    11:12:45.582 DEBUG ConfigFactory - cloudIndexPrefetchBuffer = -1
    11:12:45.582 DEBUG ConfigFactory - createOutputBamIndex = true
    11:12:45.583 INFO  GenomicsDBImport - Deflater: IntelDeflater
    11:12:45.583 INFO  GenomicsDBImport - Inflater: IntelInflater
    11:12:45.583 INFO  GenomicsDBImport - GCS max retries/reopens: 20
    11:12:45.583 INFO  GenomicsDBImport - Requester pays: disabled
    11:12:45.583 INFO  GenomicsDBImport - Initializing engine
    11:12:45.794 WARN  GenomicsDBImport - genomicsdb-update-workspace-path was set, so ignoring specified intervals.The tool will use the intervals specified by the initial import
    11:12:46.188 INFO  GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
    11:12:46.651 DEBUG GenomeLocParser - Prepared reference sequence contig dictionary
    11:12:46.651 DEBUG GenomeLocParser -  chr1 (248956422 bp)
    11:12:46.652 DEBUG GenomeLocParser -  chr2 (242193529 bp)
    11:12:46.652 DEBUG GenomeLocParser -  chr3 (198295559 bp)
    11:12:46.652 DEBUG GenomeLocParser -  chr4 (190214555 bp)
    11:12:46.652 DEBUG GenomeLocParser -  chr5 (181538259 bp)
    11:12:46.652 DEBUG GenomeLocParser -  chr6 (170805979 bp)
    11:12:46.652 DEBUG GenomeLocParser -  chr7 (159345973 bp)
    11:12:46.652 DEBUG GenomeLocParser -  chr8 (145138636 bp)
    11:12:46.653 DEBUG GenomeLocParser -  chr9 (138394717 bp)
    11:12:46.653 DEBUG GenomeLocParser -  chr10 (133797422 bp)
    11:12:46.653 DEBUG GenomeLocParser -  chr11 (135086622 bp)
    11:12:46.653 DEBUG GenomeLocParser -  chr12 (133275309 bp)
    11:12:46.653 DEBUG GenomeLocParser -  chr13 (114364328 bp)
    11:12:46.653 DEBUG GenomeLocParser -  chr14 (107043718 bp)
    11:12:46.653 DEBUG GenomeLocParser -  chr15 (101991189 bp)
    11:12:46.653 DEBUG GenomeLocParser -  chr16 (90338345 bp)
    11:12:46.654 DEBUG GenomeLocParser -  chr17 (83257441 bp)
    11:12:46.654 DEBUG GenomeLocParser -  chr18 (80373285 bp)
    11:12:46.654 DEBUG GenomeLocParser -  chr19 (58617616 bp)
    11:12:46.654 DEBUG GenomeLocParser -  chr20 (64444167 bp)
    11:12:46.654 DEBUG GenomeLocParser -  chr21 (46709983 bp)
    11:12:46.655 DEBUG GenomeLocParser -  chr22 (50818468 bp)
    11:12:46.655 DEBUG GenomeLocParser -  chrX (156040895 bp)
    11:12:46.655 DEBUG GenomeLocParser -  chrY (57227415 bp)
    11:12:46.655 DEBUG GenomeLocParser -  chrM (16569 bp)
    11:12:46.655 DEBUG GenomeLocParser -  chr1_KI270706v1_random (175055 bp)
    11:12:46.655 DEBUG GenomeLocParser -  chr1_KI270707v1_random (32032 bp)
    11:12:46.655 DEBUG GenomeLocParser -  chr1_KI270708v1_random (127682 bp)
    11:12:46.655 DEBUG GenomeLocParser -  chr1_KI270709v1_random (66860 bp)
    11:12:46.656 DEBUG GenomeLocParser -  chr1_KI270710v1_random (40176 bp)
    11:12:46.656 DEBUG GenomeLocParser -  chr1_KI270711v1_random (42210 bp)
    11:12:46.656 DEBUG GenomeLocParser -  chr1_KI270712v1_random (176043 bp)
    11:12:46.656 DEBUG GenomeLocParser -  chr1_KI270713v1_random (40745 bp)
    11:12:46.656 DEBUG GenomeLocParser -  chr1_KI270714v1_random (41717 bp)
    11:12:46.656 DEBUG GenomeLocParser -  chr2_KI270715v1_random (161471 bp)
    11:12:46.656 DEBUG GenomeLocParser -  chr2_KI270716v1_random (153799 bp)
    11:12:46.656 DEBUG GenomeLocParser -  chr3_GL000221v1_random (155397 bp)
    11:12:46.656 DEBUG GenomeLocParser -  chr4_GL000008v2_random (209709 bp)
    11:12:46.657 DEBUG GenomeLocParser -  chr5_GL000208v1_random (92689 bp)
    11:12:46.657 DEBUG GenomeLocParser -  chr9_KI270717v1_random (40062 bp)
    11:12:46.657 DEBUG GenomeLocParser -  chr9_KI270718v1_random (38054 bp)
    11:12:46.657 DEBUG GenomeLocParser -  chr9_KI270719v1_random (176845 bp)
    11:12:46.657 DEBUG GenomeLocParser -  chr9_KI270720v1_random (39050 bp)
    11:12:46.657 DEBUG GenomeLocParser -  chr11_KI270721v1_random (100316 bp)
    11:12:46.657 DEBUG GenomeLocParser -  chr14_GL000009v2_random (201709 bp)
    11:12:46.657 DEBUG GenomeLocParser -  chr14_GL000225v1_random (211173 bp)
    11:12:46.658 DEBUG GenomeLocParser -  chr14_KI270722v1_random (194050 bp)
    11:12:46.658 DEBUG GenomeLocParser -  chr14_GL000194v1_random (191469 bp)
    11:12:46.658 DEBUG GenomeLocParser -  chr14_KI270723v1_random (38115 bp)
    11:12:46.658 DEBUG GenomeLocParser -  chr14_KI270724v1_random (39555 bp)
    11:12:46.658 DEBUG GenomeLocParser -  chr14_KI270725v1_random (172810 bp)
    11:12:46.658 DEBUG GenomeLocParser -  chr14_KI270726v1_random (43739 bp)
    11:12:46.658 DEBUG GenomeLocParser -  chr15_KI270727v1_random (448248 bp)
    11:12:46.658 DEBUG GenomeLocParser -  chr16_KI270728v1_random (1872759 bp)
    11:12:46.658 DEBUG GenomeLocParser -  chr17_GL000205v2_random (185591 bp)
    11:12:46.659 DEBUG GenomeLocParser -  chr17_KI270729v1_random (280839 bp)
    11:12:46.659 DEBUG GenomeLocParser -  chr17_KI270730v1_random (112551 bp)
    11:12:46.659 DEBUG GenomeLocParser -  chr22_KI270731v1_random (150754 bp)
    11:12:46.659 DEBUG GenomeLocParser -  chr22_KI270732v1_random (41543 bp)
    11:12:46.659 DEBUG GenomeLocParser -  chr22_KI270733v1_random (179772 bp)
    11:12:46.659 DEBUG GenomeLocParser -  chr22_KI270734v1_random (165050 bp)
    11:12:46.659 DEBUG GenomeLocParser -  chr22_KI270735v1_random (42811 bp)
    11:12:46.659 DEBUG GenomeLocParser -  chr22_KI270736v1_random (181920 bp)
    11:12:46.659 DEBUG GenomeLocParser -  chr22_KI270737v1_random (103838 bp)
    11:12:46.660 DEBUG GenomeLocParser -  chr22_KI270738v1_random (99375 bp)
    11:12:46.660 DEBUG GenomeLocParser -  chr22_KI270739v1_random (73985 bp)
    11:12:46.660 DEBUG GenomeLocParser -  chrY_KI270740v1_random (37240 bp)
    11:12:46.660 DEBUG GenomeLocParser -  chrUn_KI270302v1 (2274 bp)
    11:12:46.661 DEBUG GenomeLocParser -  chrUn_KI270304v1 (2165 bp)
    11:12:46.661 DEBUG GenomeLocParser -  chrUn_KI270303v1 (1942 bp)
    ...
    lots of unassembled scaffolds and decoys
    ...
    11:12:46.793 DEBUG GenomeLocParser -  HLA-DRB1*15:03:01:02 (11569 bp)
    11:12:46.793 DEBUG GenomeLocParser -  HLA-DRB1*16:02:01 (11005 bp)
    11:12:46.812 INFO  IntervalArgumentCollection - Processing 28770581 bp from intervals
    11:12:46.814 INFO  GenomicsDBImport - Done initializing engine
    11:12:46.814 INFO  GenomicsDBImport - Callset Map JSON file will be re-written to /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1/callset.json
    11:12:46.814 INFO  GenomicsDBImport - Incrementally importing to workspace - /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1
    11:12:46.814 INFO  ProgressMeter - Starting traversal
    11:12:46.815 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
    11:12:47.254 INFO  GenomicsDBImport - Shutting down engine
    [September 3, 2020 11:12:47 AM PDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=16464216064
    org.genomicsdb.exception.GenomicsDBException: Duplicate sample name found: SSC00007. Sample was originally in /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
    at org.genomicsdb.importer.extensions.CallSetMapExtensions.checkDuplicateCallsetsForIncrementalImport(CallSetMapExtensions.java:270)
    at org.genomicsdb.importer.extensions.CallSetMapExtensions.mergeCallsetsForIncrementalImport(CallSetMapExtensions.java:241)
    at org.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:252)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.traverse(GenomicsDBImport.java:745)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
    ________________________________________________________________________________

  • Genevieve Brandt (she/her)

    Charles H. Langley could you submit a bug report?

    Please upload or provide notes for where to obtain:

    • workspace you are updating
    • reference
    • sample
    • interval to test

  • Melvin Lathara

    Hello Charles H. Langley,

    Is it possible that this "workspace update" process failed with a different error before? If so, some metadata within the workspace may be in an inconsistent state. Can you list the contents of the workspace? I'm specifically interested in any files that end in *.json or *.inc.backup.

  • Charles H. Langley

    contents of DB_chr1/:

    -rwxrwx--- 1 sasha radusr 0 2020-04-20-14:02 __tiledb_workspace.tdb
    -rwxrwx--- 1 sasha radusr 308K 2020-04-20-14:02 vcfheader.vcf
    -rwxrwx--- 1 sasha radusr 286K 2020-04-20-14:02 vidmap.json
    drwxrwx--- 282 sasha radusr 284 2020-05-12-10:04 chr1$118739963$147510543/
    -rwxrwx--- 1 chuck chuck 825K 2020-08-28-13:38 callset.json
    -rwx------ 1 chuck chuck 19K 2020-09-03-11:12 callset.json.fragmentlist
    -rwx------ 1 chuck chuck 825K 2020-09-03-11:12 callset.json.inc.backup

     

    Shall I upload certain of these?

     

    Cheers,

    Chuck

  • Melvin Lathara

    Not yet -- can you do a search/grep for the duplicate sample name within callset.json and callset.json.inc.backup? So something like:

    grep SSC00007 callset.json

    and

    grep SSC00007 callset.json.inc.backup

    Assuming the sample name it complained about is SSC00007

    Offhand, they look identical, which makes me wonder if the update was tried (and failed) multiple times. Also, do you have a backup of this workspace?
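
    For anyone following along who prefers a scripted check, the two grep commands above can be approximated in Python. This is only a minimal sketch, not part of GATK: the function name count_matching_lines is made up for illustration, and the script assumes it is run from inside the workspace directory that holds callset.json and callset.json.inc.backup.

```python
from pathlib import Path


def count_matching_lines(path, needle):
    """Return how many lines of the file at `path` contain `needle` (like `grep -c`)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if needle in line)


if __name__ == "__main__":
    sample = "SSC00007"  # sample name taken from the error message
    for name in ("callset.json", "callset.json.inc.backup"):
        if Path(name).exists():
            hits = count_matching_lines(name, sample)
            print(f"{name}: {hits} matching line(s)")
        else:
            print(f"{name}: not found in the current directory")
```

    A count of zero for both files would mean the name does not appear in the metadata as plain text; if the JSON is stored on a single line, any non-zero result will simply be 1.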

  • Charles H. Langley

    "SSC00007" is not in either callset.json or callset.json.inc.backup.

    Thanks,

    Chuck

  • Melvin Lathara

    Ah - interesting. Yes, if you don't mind uploading the callset.json and callset.json.fragmentlist as part of the bug report that would be useful.

    If callset.json.inc.backup is different from callset.json, please include that as well.

  • Charles H. Langley

    Bug Report    Re:

    GenomicsDBException: Duplicate sample name found:

    CH Langley      2Sept2020

     

    Ah - interesting. Yes, if you don't mind uploading the callset.json and callset.json.fragmentlist as part of the bug report that would be useful.

     

    The command_log file and the two requested files are in

          Genomic_DBImport_bug_report.zip  at    ftp.broadinstitute.org

     

     

    If callset.json.inc.backup is different from callset.json, please include that as well.

    diff reported NO differences between the two files.

     

    Thanks for the help.

    Cheers,

    Chuck

     

  • Charles H. Langley

    One more item: the directory listing from the DB (note the permissions, in case they matter).

    | => ll
    total 2.7M
    -rwxrwx--- 1 sasha radusr 0 2020-04-20-14:02 __tiledb_workspace.tdb
    -rwxrwx--- 1 sasha radusr 308K 2020-04-20-14:02 vcfheader.vcf
    -rwxrwx--- 1 sasha radusr 286K 2020-04-20-14:02 vidmap.json
    drwxrwx--- 282 sasha radusr 284 2020-05-12-10:04 chr1$118739963$147510543/
    -rwxrwx--- 1 chuck chuck 825K 2020-08-28-13:38 callset.json
    -rwx------ 1 chuck chuck 19K 2020-09-03-11:12 callset.json.fragmentlist
    -rwx------ 1 chuck chuck 825K 2020-09-03-11:12 callset.json.inc.backup
    __________________________________________________________________________

  • Genevieve Brandt (she/her)

    Charles H. Langley thank you for uploading the file, we will work on it and let you know if we have any questions or updates.

  • Melvin Lathara

    Genevieve Brandt (she/her) thanks for passing the file along!

    Charles H. Langley -- the callset.json file you uploaded did have SSC00007 in there...any chance you looked at (or uploaded) the wrong file? Or is it possible that you searched using the letter O instead of the number zero (0)?

    In any case, the metadata for the datastore indicates it already has data for that sample. If you want to figure out which samples the metadata thinks are part of the datastore currently, you could try a command like this:

    python -m json.tool callset.json |grep sample_name|cut -d\" -f4

    Keep in mind that the metadata might be in an inconsistent state due to a workspace update failure. However, it is unlikely that SSC00007 is in there because of such a failure, since it is one of the earlier samples in the callset.json file you sent.
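
    The same listing can be sketched in Python without depending on how json.tool lays out its output. The code below walks the parsed JSON and collects every string stored under a "sample_name" key, which is the key the grep in the command above targets; nothing else about the layout of callset.json is assumed. The helper name collect_sample_names is hypothetical, and the script expects to be run from the workspace directory.

```python
import json
from pathlib import Path


def collect_sample_names(node):
    """Recursively gather every string stored under a "sample_name" key."""
    names = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "sample_name" and isinstance(value, str):
                names.append(value)
            else:
                names.extend(collect_sample_names(value))
    elif isinstance(node, list):
        for item in node:
            names.extend(collect_sample_names(item))
    return names


if __name__ == "__main__":
    path = Path("callset.json")
    if path.exists():
        with open(path, encoding="utf-8") as handle:
            names = collect_sample_names(json.load(handle))
        print(f"{len(names)} sample name(s) recorded in {path}")
        query = "SSC00007"  # the digits are zeros, not the letter O
        print(f"{query} present: {query in names}")
    else:
        print(f"{path} not found in the current directory")
```

    Comparing against the parsed names rather than eyeballing grep output also makes an O-versus-0 mix-up easier to spot, because the query string sits in one place in the script.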

     

  • Nicholas Bailey

    Hello GATK Team,

    I am having this same issue using GATK 4.1.7.0 and found that the samples I intend to add to the database are in fact present in the callset.json file and respective backup file, but I know they have not actually been fully added to the database because most of the chromosome directories have not been updated (when viewing with something like ls -l). If I'm understanding the output correctly, it seems GenomicsDBImport ended before finishing and attempted to restart, so that would explain why these samples are only partially added. 

    Reading this thread has helped me come to the above conclusion (thank you Genevieve and Melvin) but I am still unsure how to fix the issue. Is there any way to remove a sample from the database and try again?

  • Genevieve Brandt (she/her)

    Hi Nicholas Bailey,

    Unfortunately there is no way to remove the samples from the GenomicsDB workspace. This is why we recommend that users create a backup of the GenomicsDB workspace before updating.

    Here is a ticket where the developers are discussing this: https://github.com/broadinstitute/gatk/issues/6558

    Best,

    Genevieve
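
    One way to follow the backup advice above is to copy the whole workspace directory before each update. The following is a minimal sketch, assuming the workspace is an ordinary directory on a local or network filesystem with enough free space for a second copy. The function backup_workspace and the timestamped naming scheme are illustrative choices, not a GATK feature.

```python
import shutil
import time
from pathlib import Path


def backup_workspace(workspace, backup_root=None):
    """Copy the workspace directory to a timestamped copy and return the new path."""
    source = Path(workspace).resolve()
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    # By default the copy is placed next to the original workspace.
    parent = Path(backup_root) if backup_root else source.parent
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = parent / f"{source.name}.backup-{stamp}"
    # copytree refuses to overwrite an existing directory, so an earlier
    # backup with the same name is never clobbered.
    shutil.copytree(source, target)
    return target
```

    A call such as backup_workspace("DB_chr1") would run just before the update command; if the update fails partway, the damaged workspace can be set aside and the copy moved back into place.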

  • Nicholas Bailey

    Hi Genevieve,

    Thank you for getting back to me and posting that link. I figured that may not be an option. No harm, I haven't added too many samples yet and I'll be sure to back up the DB I'm generating now.

  • Genevieve Brandt (she/her)

    Oh good! Glad you found it out on the earlier side. Thanks for writing in to the forum!
