Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Importing GVCFs with GenomicsDBImport one chromosome at a time in parallel jobs triggers a Duplicate Sample Name error

  • Charles H. Langley

    Hello: 

    I just posted and a cryptic message flitted by complaining about the size perhaps - I could not read it before it disappeared.  So I'll repost with some abridging to keep the size down.

    We have run into the GenomicsDBImport error: "org.genomicsdb.exception.GenomicsDBException: Duplicate sample name found" on our attempts to update DBs.  Below is (first) a command, followed by the output.

    Any help will be appreciated.

    Cheers,

    Chuck

     

    ==========  command  ==============

    module load gatk/4.1.6.0

    gatk --java-options "-Xmx16g -Xms16g" GenomicsDBImport \
    --batch-size 24 \
    --reader-threads 12 \
    --genomicsdb-update-workspace-path /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1 \
    --intervals chr1:118739963-147510543 \
    --verbosity DEBUG \
    -V /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz

    ============================

    =======  output    =============

    | phase2_CPRs @ rooted3 (chuck)
    | => gatk --java-options "-Xmx16g -Xms16g" GenomicsDBImport \
    | => --batch-size 24 \
    | => --reader-threads 12 \
    | => --genomicsdb-update-workspace-path /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1 \
    | => --intervals chr1:118739963-147510543 \
    | => --verbosity DEBUG \
    | => -V /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
    Using GATK jar /afs/genomecenter.ucdavis.edu/software/gatk/4.1.6.0/lssc0-linux/gatk-package-4.1.6.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx16g -Xms16g -jar /afs/genomecenter.ucdavis.edu/software/gatk/4.1.6.0/lssc0-linux/gatk-package-4.1.6.0-local.jar GenomicsDBImport --batch-size 24 --reader-threads 12 --genomicsdb-update-workspace-path /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1 --intervals chr1:118739963-147510543 --verbosity DEBUG -V /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
    16:16:35.954 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.1.6.0/lssc0-linux/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    16:16:36.003 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /tmp/libgkl_compression5245166187604030095.so
    Aug 28, 2020 4:16:36 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    16:16:36.284 INFO  GenomicsDBImport - ------------------------------------------------------------
    16:16:36.285 INFO  GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.6.0
    16:16:36.285 INFO  GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    16:16:36.285 INFO  GenomicsDBImport - Executing as chuck@rooted3 on Linux v4.15.0-66-generic amd64
    16:16:36.285 INFO  GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-8u265-b01-0ubuntu2~16.04-b01
    16:16:36.286 INFO  GenomicsDBImport - Start Date/Time: August 28, 2020 4:16:35 PM PDT
    16:16:36.286 INFO  GenomicsDBImport - ------------------------------------------------------------
    16:16:36.286 INFO  GenomicsDBImport - ------------------------------------------------------------
    16:16:36.287 INFO  GenomicsDBImport - HTSJDK Version: 2.21.2
    16:16:36.287 INFO  GenomicsDBImport - Picard Version: 2.21.9
    16:16:36.289 INFO  GenomicsDBImport - HTSJDK Defaults.BUFFER_SIZE : 131072
    16:16:36.289 INFO  GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    16:16:36.289 INFO  GenomicsDBImport - HTSJDK Defaults.CREATE_INDEX : false
    16:16:36.289 INFO  GenomicsDBImport - HTSJDK Defaults.CREATE_MD5 : false
    16:16:36.289 INFO  GenomicsDBImport - HTSJDK Defaults.CUSTOM_READER_FACTORY :
    16:16:36.289 INFO  GenomicsDBImport - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
    16:16:36.289 INFO  GenomicsDBImport - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
    16:16:36.289 INFO  GenomicsDBImport - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
    16:16:36.290 INFO  GenomicsDBImport - HTSJDK Defaults.REFERENCE_FASTA : null
    16:16:36.290 INFO  GenomicsDBImport - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
    16:16:36.290 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    16:16:36.290 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    16:16:36.290 INFO  GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    16:16:36.290 INFO  GenomicsDBImport - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
    16:16:36.290 DEBUG ConfigFactory - Configuration file values:
    16:16:36.295 DEBUG ConfigFactory - gcsMaxRetries = 20
    16:16:36.295 DEBUG ConfigFactory - gcsProjectForRequesterPays =
    16:16:36.295 DEBUG ConfigFactory - gatk_stacktrace_on_user_exception = false
    16:16:36.296 DEBUG ConfigFactory - samjdk.use_async_io_read_samtools = false
    16:16:36.296 DEBUG ConfigFactory - samjdk.use_async_io_write_samtools = true
    16:16:36.296 DEBUG ConfigFactory - samjdk.use_async_io_write_tribble = false
    16:16:36.296 DEBUG ConfigFactory - samjdk.compression_level = 2
    16:16:36.296 DEBUG ConfigFactory - spark.kryoserializer.buffer.max = 512m
    16:16:36.296 DEBUG ConfigFactory - spark.driver.maxResultSize = 0
    16:16:36.296 DEBUG ConfigFactory - spark.driver.userClassPathFirst = true
    16:16:36.296 DEBUG ConfigFactory - spark.io.compression.codec = lzf
    16:16:36.296 DEBUG ConfigFactory - spark.executor.memoryOverhead = 600
    16:16:36.297 DEBUG ConfigFactory - spark.driver.extraJavaOptions =
    16:16:36.297 DEBUG ConfigFactory - spark.executor.extraJavaOptions =
    16:16:36.297 DEBUG ConfigFactory - codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
    16:16:36.297 DEBUG ConfigFactory - read_filter_packages = [org.broadinstitute.hellbender.engine.filters]
    16:16:36.297 DEBUG ConfigFactory - annotation_packages = [org.broadinstitute.hellbender.tools.walkers.annotator]
    16:16:36.297 DEBUG ConfigFactory - cloudPrefetchBuffer = 40
    16:16:36.297 DEBUG ConfigFactory - cloudIndexPrefetchBuffer = -1
    16:16:36.297 DEBUG ConfigFactory - createOutputBamIndex = true
    16:16:36.298 INFO  GenomicsDBImport - Deflater: IntelDeflater
    16:16:36.298 INFO  GenomicsDBImport - Inflater: IntelInflater
    16:16:36.298 INFO  GenomicsDBImport - GCS max retries/reopens: 20
    16:16:36.298 INFO  GenomicsDBImport - Requester pays: disabled
    16:16:36.298 INFO  GenomicsDBImport - Initializing engine
    16:16:36.523 WARN  GenomicsDBImport - genomicsdb-update-workspace-path was set, so ignoring specified intervals.The tool will use the intervals specified by the initial import
    16:16:37.372 DEBUG GenomeLocParser - Prepared reference sequence contig dictionary
    16:16:37.372 DEBUG GenomeLocParser -  chr1 (248956422 bp)
    16:16:37.373 DEBUG GenomeLocParser -  chr2 (242193529 bp)
    16:16:37.373 DEBUG GenomeLocParser -  chr3 (198295559 bp)
    16:16:37.373 DEBUG GenomeLocParser -  chr4 (190214555 bp)
    16:16:37.373 DEBUG GenomeLocParser -  chr5 (181538259 bp)
    16:16:37.373 DEBUG GenomeLocParser -  chr6 (170805979 bp)
    16:16:37.373 DEBUG GenomeLocParser -  chr7 (159345973 bp)
    16:16:37.374 DEBUG GenomeLocParser -  chr8 (145138636 bp)
    16:16:37.374 DEBUG GenomeLocParser -  chr9 (138394717 bp)
    16:16:37.374 DEBUG GenomeLocParser -  chr10 (133797422 bp)
    16:16:37.374 DEBUG GenomeLocParser -  chr11 (135086622 bp)
    16:16:37.374 DEBUG GenomeLocParser -  chr12 (133275309 bp)
    16:16:37.374 DEBUG GenomeLocParser -  chr13 (114364328 bp)
    16:16:37.375 DEBUG GenomeLocParser -  chr14 (107043718 bp)
    16:16:37.375 DEBUG GenomeLocParser -  chr15 (101991189 bp)
    16:16:37.375 DEBUG GenomeLocParser -  chr16 (90338345 bp)
    16:16:37.375 DEBUG GenomeLocParser -  chr17 (83257441 bp)
    16:16:37.376 DEBUG GenomeLocParser -  chr18 (80373285 bp)
    16:16:37.376 DEBUG GenomeLocParser -  chr19 (58617616 bp)
    16:16:37.376 DEBUG GenomeLocParser -  chr20 (64444167 bp)
    16:16:37.376 DEBUG GenomeLocParser -  chr21 (46709983 bp)
    16:16:37.376 DEBUG GenomeLocParser -  chr22 (50818468 bp)
    16:16:37.377 DEBUG GenomeLocParser -  chrX (156040895 bp)
    16:16:37.377 DEBUG GenomeLocParser -  chrY (57227415 bp)
    16:16:37.377 DEBUG GenomeLocParser -  chrM (16569 bp)
    16:16:37.377 DEBUG GenomeLocParser -  chr1_KI270706v1_random (175055 bp)
    16:16:37.377 DEBUG GenomeLocParser -  chr1_KI270707v1_random (32032 bp)
    16:16:37.377 DEBUG GenomeLocParser -  chr1_KI270708v1_random (127682 bp)
    16:16:37.378 DEBUG GenomeLocParser -  chr1_KI270709v1_random (66860 bp)
    16:16:37.378 DEBUG GenomeLocParser -  chr1_KI270710v1_random (40176 bp)
    16:16:37.378 DEBUG GenomeLocParser -  chr1_KI270711v1_random (42210 bp)
    16:16:37.378 DEBUG GenomeLocParser -  chr1_KI270712v1_random (176043 bp)
    16:16:37.379 DEBUG GenomeLocParser -  chr1_KI270713v1_random (40745 bp)
    16:16:37.379 DEBUG GenomeLocParser -  chr1_KI270714v1_random (41717 bp)
    16:16:37.379 DEBUG GenomeLocParser -  chr2_KI270715v1_random (161471 bp)
    16:16:37.379 DEBUG GenomeLocParser -  chr2_KI270716v1_random (153799 bp)
    16:16:37.379 DEBUG GenomeLocParser -  chr3_GL000221v1_random (155397 bp)
    16:16:37.379 DEBUG GenomeLocParser -  chr4_GL000008v2_random (209709 bp)
    16:16:37.380 DEBUG GenomeLocParser -  chr5_GL000208v1_random (92689 bp)
    16:16:37.380 DEBUG GenomeLocParser -  chr9_KI270717v1_random (40062 bp)
    16:16:37.380 DEBUG GenomeLocParser -  chr9_KI270718v1_random (38054 bp)
    16:16:37.380 DEBUG GenomeLocParser -  chr9_KI270719v1_random (176845 bp)
    16:16:37.380 DEBUG GenomeLocParser -  chr9_KI270720v1_random (39050 bp)
    16:16:37.380 DEBUG GenomeLocParser -  chr11_KI270721v1_random (100316 bp)
    16:16:37.381 DEBUG GenomeLocParser -  chr14_GL000009v2_random (201709 bp)
    16:16:37.381 DEBUG GenomeLocParser -  chr14_GL000225v1_random (211173 bp)
    16:16:37.381 DEBUG GenomeLocParser -  chr14_KI270722v1_random (194050 bp)
    16:16:37.381 DEBUG GenomeLocParser -  chr14_GL000194v1_random (191469 bp)
    16:16:37.381 DEBUG GenomeLocParser -  chr14_KI270723v1_random (38115 bp)
    16:16:37.381 DEBUG GenomeLocParser -  chr14_KI270724v1_random (39555 bp)
    16:16:37.382 DEBUG GenomeLocParser -  chr14_KI270725v1_random (172810 bp)
    16:16:37.382 DEBUG GenomeLocParser -  chr14_KI270726v1_random (43739 bp)
    16:16:37.382 DEBUG GenomeLocParser -  chr15_KI270727v1_random (448248 bp)
    16:16:37.382 DEBUG GenomeLocParser -  chr16_KI270728v1_random (1872759 bp)
    16:16:37.382 DEBUG GenomeLocParser -  chr17_GL000205v2_random (185591 bp)
    16:16:37.382 DEBUG GenomeLocParser -  chr17_KI270729v1_random (280839 bp)
    16:16:37.383 DEBUG GenomeLocParser -  chr17_KI270730v1_random (112551 bp)
    16:16:37.383 DEBUG GenomeLocParser -  chr22_KI270731v1_random (150754 bp)
    16:16:37.383 DEBUG GenomeLocParser -  chr22_KI270732v1_random (41543 bp)
    16:16:37.383 DEBUG GenomeLocParser -  chr22_KI270733v1_random (179772 bp)
    16:16:37.383 DEBUG GenomeLocParser -  chr22_KI270734v1_random (165050 bp)
    16:16:37.384 DEBUG GenomeLocParser -  chr22_KI270735v1_random (42811 bp)
    16:16:37.384 DEBUG GenomeLocParser -  chr22_KI270736v1_random (181920 bp)
    16:16:37.384 DEBUG GenomeLocParser -  chr22_KI270737v1_random (103838 bp)
    16:16:37.384 DEBUG GenomeLocParser -  chr22_KI270738v1_random (99375 bp)
    16:16:37.384 DEBUG GenomeLocParser -  chr22_KI270739v1_random (73985 bp)
    16:16:37.385 DEBUG GenomeLocParser -  chrY_KI270740v1_random (37240 bp)
    16:16:37.385 DEBUG GenomeLocParser -  chrUn_KI270302v1 (2274 bp)
    16:16:37.385 DEBUG GenomeLocParser -  chrUn_KI270304v1 (2165 bp)
    16:16:37.385 DEBUG GenomeLocParser -  chrUn_KI270303v1 (1942 bp)
    16:16:37.385 DEBUG GenomeLocParser -  chrUn_KI270305v1 (1472 bp)
    16:16:37.385 DEBUG GenomeLocParser -  chrUn_KI270322v1 (21476 bp)
    16:16:37.386 DEBUG GenomeLocParser -  chrUn_KI270320v1 (4416 bp)
    16:16:37.386 DEBUG GenomeLocParser -  chrUn_KI270310v1 (1201 bp)
    16:16:37.386 DEBUG GenomeLocParser -  chrUn_KI270316v1 (1444 bp)
    16:16:37.386 DEBUG GenomeLocParser -  chrUn_KI270315v1 (2276 bp)
    16:16:37.386 DEBUG GenomeLocParser -  chrUn_KI270312v1 (998 bp)
    16:16:37.387 DEBUG GenomeLocParser -  chrUn_KI270311v1 (12399 bp)
    16:16:37.387 DEBUG GenomeLocParser -  chrUn_KI270317v1 (37690 bp)
    ... many reference scaffolds ...
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQA1*04:02 (6210 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQA1*05:01:01:01 (5806 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQA1*05:01:01:02 (6529 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQA1*05:03 (6121 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQA1*05:05:01:01 (6593 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQA1*05:05:01:02 (6597 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQA1*05:05:01:03 (6393 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQA1*05:11 (6589 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQA1*06:01:01 (5878 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*02:01:01 (7480 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*02:02:01 (7471 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*03:01:01:01 (7231 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*03:01:01:02 (7230 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*03:01:01:03 (7231 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*03:02:01 (7126 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*03:03:02:01 (7126 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*03:03:02:02 (7126 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*03:03:02:03 (6800 bp)
    16:16:37.515 DEBUG GenomeLocParser -  HLA-DQB1*03:05:01 (6934 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DQB1*05:01:01:01 (7090 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DQB1*05:01:01:02 (7090 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DQB1*05:03:01:01 (7089 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DQB1*05:03:01:02 (7089 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DQB1*06:01:01 (7111 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DQB1*06:02:01 (7102 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DQB1*06:03:01 (7103 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DQB1*06:09:01 (7102 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DRB1*01:01:01 (10741 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DRB1*01:02:01 (11229 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DRB1*03:01:01:01 (13908 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DRB1*03:01:01:02 (13426 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DRB1*04:03:01 (15246 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DRB1*07:01:01:01 (16110 bp)
    16:16:37.516 DEBUG GenomeLocParser -  HLA-DRB1*07:01:01:02 (16120 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*08:03:02 (13562 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*09:21 (16039 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*10:01:01 (13501 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*11:01:01 (13921 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*11:01:02 (13931 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*11:04:01 (13919 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*12:01:01 (13404 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*12:17 (11260 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*13:01:01 (13935 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*13:02:01 (13941 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*14:05:01 (13933 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*14:54:01 (13936 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*15:01:01:01 (11080 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*15:01:01:02 (11571 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*15:01:01:03 (11056 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*15:01:01:04 (11056 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*15:02:01 (10313 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*15:03:01:01 (11567 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*15:03:01:02 (11569 bp)
    16:16:37.524 DEBUG GenomeLocParser -  HLA-DRB1*16:02:01 (11005 bp)
    16:16:37.546 INFO  IntervalArgumentCollection - Processing 28770581 bp from intervals
    16:16:37.548 INFO  GenomicsDBImport - Done initializing engine
    16:16:37.548 INFO  GenomicsDBImport - Callset Map JSON file will be re-written to /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1/callset.json
    16:16:37.548 INFO  GenomicsDBImport - Incrementally importing to array - /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1/genomicsdb_array
    16:16:37.549 INFO  ProgressMeter - Starting traversal
    16:16:37.550 INFO  ProgressMeter -        Current Locus  Elapsed Minutes     Batches Processed   Batches/Minute
    16:16:38.061 INFO  GenomicsDBImport - Shutting down engine
    [August 28, 2020 4:16:38 PM PDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.04 minutes.
    Runtime.totalMemory()=16464216064
    org.genomicsdb.exception.GenomicsDBException: Duplicate sample name found: SSC00007. Sample was originally in /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
    at org.genomicsdb.importer.extensions.CallSetMapExtensions.checkDuplicateCallsetsForIncrementalImport(CallSetMapExtensions.java:270)
    at org.genomicsdb.importer.extensions.CallSetMapExtensions.mergeCallsetsForIncrementalImport(CallSetMapExtensions.java:241)
    at org.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:222)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.traverse(GenomicsDBImport.java:743)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)
    ________________________________________________________________________________
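When no sample name map is supplied, GenomicsDBImport takes the sample name from the GVCF header, so the name reported in the exception (SSC00007 here) can be confirmed directly from the file. The following is a minimal sketch of that check; the function name `vcf_sample_names` is hypothetical, and it relies only on the VCF convention that sample columns start at field 10 of the `#CHROM` header line.

```shell
# Print the sample name(s) declared in a VCF/GVCF header, one per line.
# Reads gzip/bgzip-compressed input (and, with GNU gzip -f, plain text too).
# Sample columns start at field 10 of the tab-separated #CHROM line.
vcf_sample_names() {
    gzip -cdf -- "$1" \
        | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}
```

For example, `vcf_sample_names SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz` would be expected to print the single sample name stored in that GVCF.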

     

  • Genevieve Brandt (she/her)

    Hi Yangyxt and Charles H. Langley, it looks like this issue comes from having multiple VCFs with the same sample name, which would show up in the sample name map. You can use this command to check whether a sample name appears twice in your map file:

    cat sample_name_map_file | awk '{ print $1; }' | sort | uniq -c
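The command above prints a count for every sample, so duplicates still have to be spotted by eye. A small variant that prints only the repeated names might look like the sketch below. It assumes the map is tab-separated with the sample name in the first column, and the function name `duplicate_samples` is hypothetical.

```shell
# Print every sample name that occurs more than once in a tab-separated
# sample name map (column 1: sample name, column 2: path to the GVCF).
# Blank lines are ignored; output is sorted and empty when all names are unique.
duplicate_samples() {
    awk -F'\t' 'NF > 0 { seen[$1]++ }
                END { for (s in seen) if (seen[s] > 1) print s }' "$1" | sort
}
```

Running `duplicate_samples sample_name_map_file` and getting no output indicates that the map itself contains no repeated sample names.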

  • Charles H. Langley

    Thanks for the quick response.

    This is not likely to be our issue, since we are not now using the sample_name_map_file option.

    We have a large DB created in a single run.  Now we are trying to add new VCFs, but every one of them seems to elicit the same error.

    Could our problem be addressed by the recent 4.1.8 update?

    thanks for the help.

    Cheers,

    Chuck

     

  • Genevieve Brandt (she/her)

    Hi Charles H. Langley, could you then open a different post so that we can look into your issue separately?

  • Charles H. Langley

    OK

  • Yangyxt

    Dear Brandt,

    I also checked and confirmed that the sample map does not have multiple VCFs with the same sample name. 

    I wonder whether it is possible to import into GenomicsDB one chromosome at a time.

    Imagine I use this sample map:

    A    A.vcf.gz

    B    B.vcf.gz

    C    C.vcf.gz

    I set the GenomicsDBImport parameters like this:

    --tmp-dir /paedwy/disk1/yangyxt/test_tmp \
    --genomicsdb-update-workspace-path ${probe_dir}/genomicdbimport_chr${1} \
    -R ${ref_gen}/ucsc.hg19.fasta \
    --batch-size 0 \
    --sample-name-map ${gvcf}/batch_cohort.sample_map \
    --reader-threads 5 \
    --intervals chr1      <-------- import a different chromosome from the same VCF files in each job

    Each job imports the variants of only one chromosome, and the jobs run in parallel. Does GenomicsDB support simultaneous write operations? Will I hit the duplicate sample name error when one job tries to import variants on chr2 while another GenomicsDBImport job is finishing the import of chr1 variants from the same VCF file?
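One way to sidestep the question of concurrent writes is to give every chromosome its own workspace, so that parallel jobs never touch the same directory. The sketch below is only an illustration under stated assumptions: the function name `genomicsdb_import_args`, the `genomicsdb_<chrom>` naming scheme, and the use of `callset.json` as the "workspace already exists" test are all hypothetical choices. It picks `--genomicsdb-workspace-path` for a first import and `--genomicsdb-update-workspace-path` for a later one, and omits `--intervals` in the update case because, as the WARN line in the log earlier in this thread shows, GATK ignores the specified intervals on update and reuses those of the initial import.

```shell
# Build the GenomicsDBImport workspace arguments for one chromosome.
# $1 = chromosome (e.g. chr1)
# $2 = parent directory holding one workspace per chromosome (hypothetical layout)
# $3 = sample name map
genomicsdb_import_args() {
    if [ -f "$2/genomicsdb_$1/callset.json" ]; then
        # Workspace already populated: incremental update, no --intervals.
        printf '%s\n' "--genomicsdb-update-workspace-path $2/genomicsdb_$1 --sample-name-map $3"
    else
        # No workspace yet: initial import restricted to this chromosome.
        printf '%s\n' "--genomicsdb-workspace-path $2/genomicsdb_$1 --intervals $1 --sample-name-map $3"
    fi
}
```

A job could then run `gatk GenomicsDBImport $(genomicsdb_import_args chr1 /path/to/workspaces batch_cohort.sample_map)`, provided the paths contain no spaces. In the update case the sample map would presumably need to list only samples that are not yet in the workspace, since re-importing an existing sample is what the duplicate sample name error appears to report.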

  • Genevieve Brandt (she/her)

    Yangyxt, could you submit a bug report?

    Please upload or provide notes for where to obtain:

    • workspace you are updating
    • reference
    • sample map
    • interval to test
  • Melvin Lathara

    Yangyxt, it may not be multiple VCFs within the current import, but is it possible that the sample you are trying to import already exists in the workspace you are updating? That is what the error is suggesting.

    Alternatively, did the update workspace process fail with a different error earlier? In that case, it is possible that the metadata related to the workspace is inconsistent. Can you search the callset.json file in the workspace for A130489?
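The search suggested above can be scripted. The sketch below assumes that `callset.json` stores each sample as an entry of the form `"sample_name": "<name>"`; that layout is an assumption about the workspace metadata, and both function names are hypothetical. It lists the samples recorded in a workspace and tests whether a given sample is among them.

```shell
# List the sample names recorded in a workspace's callset.json, one per line.
# Assumes entries of the form "sample_name": "<name>" (assumed file layout).
workspace_samples() {
    grep -Eo '"sample_name"[[:space:]]*:[[:space:]]*"[^"]*"' "$1/callset.json" \
        | sed -E 's/^.*:[[:space:]]*"([^"]*)"$/\1/'
}

# Succeed (exit status 0) if sample $1 is already present in workspace $2.
# The name is compared as a fixed, whole-line string, not as a pattern.
sample_in_workspace() {
    workspace_samples "$2" | grep -Fxq -- "$1"
}
```

For instance, `sample_in_workspace SSC00007 /path/to/DB_chr1 && echo "already imported"` would flag a sample that an incremental import is about to add a second time.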

     
