Importing GVCFs with GenomicsDBImport one chromosome at a time in parallel jobs hits a Duplicate Sample Name error
If you are seeing an error, please provide(REQUIRED) :
a) GATK version used: 4.1.8.1
b) Exact command used:
time ${gatk} --java-options "-Xmx8g -Xms2g" GenomicsDBImport \
--tmp-dir /paedwy/disk1/yangyxt/test_tmp \
--genomicsdb-update-workspace-path ${probe_dir}/genomicdbimport_chr${1} \
-R ${ref_gen}/ucsc.hg19.fasta \
--batch-size 0 \
--sample-name-map ${gvcf}/batch_cohort.sample_map \
--reader-threads 5 \
--intervals chr${1}
c) Entire error log:
01:07:01.704 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/yangyxt/software/gatk-4.1.8.1/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Aug 29, 2020 1:07:01 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
01:07:02.001 INFO GenomicsDBImport - ------------------------------------------------------------
01:07:02.002 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.8.1
01:07:02.002 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
01:07:02.002 INFO GenomicsDBImport - Executing as yangyxt@paedwy01 on Linux v3.10.0-957.10.1.el7.x86_64 amd64
01:07:02.002 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v11.0.1+13-LTS
01:07:02.003 INFO GenomicsDBImport - Start Date/Time: August 29, 2020 at 1:07:01 AM HKT
01:07:02.003 INFO GenomicsDBImport - ------------------------------------------------------------
01:07:02.003 INFO GenomicsDBImport - ------------------------------------------------------------
01:07:02.004 INFO GenomicsDBImport - HTSJDK Version: 2.23.0
01:07:02.005 INFO GenomicsDBImport - Picard Version: 2.22.8
01:07:02.005 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:07:02.005 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:07:02.005 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:07:02.005 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:07:02.005 INFO GenomicsDBImport - Deflater: IntelDeflater
01:07:02.005 INFO GenomicsDBImport - Inflater: IntelInflater
01:07:02.006 INFO GenomicsDBImport - GCS max retries/reopens: 20
01:07:02.006 INFO GenomicsDBImport - Requester pays: disabled
01:07:02.006 INFO GenomicsDBImport - Initializing engine
01:07:02.331 WARN GenomicsDBImport - genomicsdb-update-workspace-path was set, so ignoring specified intervals.The tool will use the intervals specified by the initial import
01:07:02.702 INFO GenomicsDBLibLoader - GenomicsDB native library version : 1.3.0-e701905
01:07:02.868 INFO IntervalArgumentCollection - Processing 135534747 bp from intervals
01:07:02.869 INFO GenomicsDBImport - Done initializing engine
01:07:02.870 INFO GenomicsDBImport - Callset Map JSON file will be re-written to /paedwy/disk1/yangyxt/wes/healthy_bams_for_CNV/using_v6_probe/genomicdbimport_chr10/callset.json
01:07:02.870 INFO GenomicsDBImport - Incrementally importing to workspace - /paedwy/disk1/yangyxt/wes/healthy_bams_for_CNV/using_v6_probe/genomicdbimport_chr10
01:07:02.871 INFO ProgressMeter - Starting traversal
01:07:02.871 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
01:07:03.006 INFO GenomicsDBImport - Shutting down engine
[August 29, 2020 at 1:07:03 AM HKT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2147483648
org.genomicsdb.exception.GenomicsDBException: Duplicate sample name found: A130489. Sample was originally in /paedwy/disk1/yangyxt/wes/batch11_13/gvcfs/A130489.HC.g.vcf.gz
at org.genomicsdb.importer.extensions.CallSetMapExtensions.checkDuplicateCallsetsForIncrementalImport(CallSetMapExtensions.java:270)
at org.genomicsdb.importer.extensions.CallSetMapExtensions.mergeCallsetsForIncrementalImport(CallSetMapExtensions.java:241)
at org.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:252)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.traverse(GenomicsDBImport.java:745)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1049)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Hello:
I just posted and a cryptic message flitted by complaining about the size perhaps - I could not read it before it disappeared. So I'll repost with some abridging to keep the size down.
We have run into the GenomicsDBImport error "org.genomicsdb.exception.GenomicsDBException: Duplicate sample name found" in our attempts to update existing DBs. Below is the command, followed by its output.
Any help will be appreciated.
Cheers,
Chuck
========== command ==============
module load gatk/4.1.6.0
gatk --java-options "-Xmx16g -Xms16g" GenomicsDBImport \
--batch-size 24 \
--reader-threads 12 \
--genomicsdb-update-workspace-path /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1 \
--intervals chr1:118739963-147510543 \
--verbosity DEBUG \
-V /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
============================
======= output =============
| phase2_CPRs @ rooted3 (chuck)
| => gatk --java-options "-Xmx16g -Xms16g" GenomicsDBImport \
| => --batch-size 24 \
| => --reader-threads 12 \
| => --genomicsdb-update-workspace-path /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1 \
| => --intervals chr1:118739963-147510543 \
| => --verbosity DEBUG \
| => -V /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
Using GATK jar /afs/genomecenter.ucdavis.edu/software/gatk/4.1.6.0/lssc0-linux/gatk-package-4.1.6.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx16g -Xms16g -jar /afs/genomecenter.ucdavis.edu/software/gatk/4.1.6.0/lssc0-linux/gatk-package-4.1.6.0-local.jar GenomicsDBImport --batch-size 24 --reader-threads 12 --genomicsdb-update-workspace-path /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1 --intervals chr1:118739963-147510543 --verbosity DEBUG -V /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
16:16:35.954 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/afs/genomecenter.ucdavis.edu/software/gatk/4.1.6.0/lssc0-linux/gatk-package-4.1.6.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:16:36.003 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /tmp/libgkl_compression5245166187604030095.so
Aug 28, 2020 4:16:36 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:16:36.284 INFO GenomicsDBImport - ------------------------------------------------------------
16:16:36.285 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.1.6.0
16:16:36.285 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
16:16:36.285 INFO GenomicsDBImport - Executing as chuck@rooted3 on Linux v4.15.0-66-generic amd64
16:16:36.285 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-8u265-b01-0ubuntu2~16.04-b01
16:16:36.286 INFO GenomicsDBImport - Start Date/Time: August 28, 2020 4:16:35 PM PDT
16:16:36.286 INFO GenomicsDBImport - ------------------------------------------------------------
16:16:36.286 INFO GenomicsDBImport - ------------------------------------------------------------
16:16:36.287 INFO GenomicsDBImport - HTSJDK Version: 2.21.2
16:16:36.287 INFO GenomicsDBImport - Picard Version: 2.21.9
16:16:36.289 INFO GenomicsDBImport - HTSJDK Defaults.BUFFER_SIZE : 131072
16:16:36.289 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:16:36.289 INFO GenomicsDBImport - HTSJDK Defaults.CREATE_INDEX : false
16:16:36.289 INFO GenomicsDBImport - HTSJDK Defaults.CREATE_MD5 : false
16:16:36.289 INFO GenomicsDBImport - HTSJDK Defaults.CUSTOM_READER_FACTORY :
16:16:36.289 INFO GenomicsDBImport - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
16:16:36.289 INFO GenomicsDBImport - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
16:16:36.289 INFO GenomicsDBImport - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
16:16:36.290 INFO GenomicsDBImport - HTSJDK Defaults.REFERENCE_FASTA : null
16:16:36.290 INFO GenomicsDBImport - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
16:16:36.290 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:16:36.290 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:16:36.290 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:16:36.290 INFO GenomicsDBImport - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
16:16:36.290 DEBUG ConfigFactory - Configuration file values:
16:16:36.295 DEBUG ConfigFactory - gcsMaxRetries = 20
16:16:36.295 DEBUG ConfigFactory - gcsProjectForRequesterPays =
16:16:36.295 DEBUG ConfigFactory - gatk_stacktrace_on_user_exception = false
16:16:36.296 DEBUG ConfigFactory - samjdk.use_async_io_read_samtools = false
16:16:36.296 DEBUG ConfigFactory - samjdk.use_async_io_write_samtools = true
16:16:36.296 DEBUG ConfigFactory - samjdk.use_async_io_write_tribble = false
16:16:36.296 DEBUG ConfigFactory - samjdk.compression_level = 2
16:16:36.296 DEBUG ConfigFactory - spark.kryoserializer.buffer.max = 512m
16:16:36.296 DEBUG ConfigFactory - spark.driver.maxResultSize = 0
16:16:36.296 DEBUG ConfigFactory - spark.driver.userClassPathFirst = true
16:16:36.296 DEBUG ConfigFactory - spark.io.compression.codec = lzf
16:16:36.296 DEBUG ConfigFactory - spark.executor.memoryOverhead = 600
16:16:36.297 DEBUG ConfigFactory - spark.driver.extraJavaOptions =
16:16:36.297 DEBUG ConfigFactory - spark.executor.extraJavaOptions =
16:16:36.297 DEBUG ConfigFactory - codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
16:16:36.297 DEBUG ConfigFactory - read_filter_packages = [org.broadinstitute.hellbender.engine.filters]
16:16:36.297 DEBUG ConfigFactory - annotation_packages = [org.broadinstitute.hellbender.tools.walkers.annotator]
16:16:36.297 DEBUG ConfigFactory - cloudPrefetchBuffer = 40
16:16:36.297 DEBUG ConfigFactory - cloudIndexPrefetchBuffer = -1
16:16:36.297 DEBUG ConfigFactory - createOutputBamIndex = true
16:16:36.298 INFO GenomicsDBImport - Deflater: IntelDeflater
16:16:36.298 INFO GenomicsDBImport - Inflater: IntelInflater
16:16:36.298 INFO GenomicsDBImport - GCS max retries/reopens: 20
16:16:36.298 INFO GenomicsDBImport - Requester pays: disabled
16:16:36.298 INFO GenomicsDBImport - Initializing engine
16:16:36.523 WARN GenomicsDBImport - genomicsdb-update-workspace-path was set, so ignoring specified intervals.The tool will use the intervals specified by the initial import
16:16:37.372 DEBUG GenomeLocParser - Prepared reference sequence contig dictionary
16:16:37.372 DEBUG GenomeLocParser - chr1 (248956422 bp)
16:16:37.373 DEBUG GenomeLocParser - chr2 (242193529 bp)
16:16:37.373 DEBUG GenomeLocParser - chr3 (198295559 bp)
16:16:37.373 DEBUG GenomeLocParser - chr4 (190214555 bp)
16:16:37.373 DEBUG GenomeLocParser - chr5 (181538259 bp)
16:16:37.373 DEBUG GenomeLocParser - chr6 (170805979 bp)
16:16:37.373 DEBUG GenomeLocParser - chr7 (159345973 bp)
16:16:37.374 DEBUG GenomeLocParser - chr8 (145138636 bp)
16:16:37.374 DEBUG GenomeLocParser - chr9 (138394717 bp)
16:16:37.374 DEBUG GenomeLocParser - chr10 (133797422 bp)
16:16:37.374 DEBUG GenomeLocParser - chr11 (135086622 bp)
16:16:37.374 DEBUG GenomeLocParser - chr12 (133275309 bp)
16:16:37.374 DEBUG GenomeLocParser - chr13 (114364328 bp)
16:16:37.375 DEBUG GenomeLocParser - chr14 (107043718 bp)
16:16:37.375 DEBUG GenomeLocParser - chr15 (101991189 bp)
16:16:37.375 DEBUG GenomeLocParser - chr16 (90338345 bp)
16:16:37.375 DEBUG GenomeLocParser - chr17 (83257441 bp)
16:16:37.376 DEBUG GenomeLocParser - chr18 (80373285 bp)
16:16:37.376 DEBUG GenomeLocParser - chr19 (58617616 bp)
16:16:37.376 DEBUG GenomeLocParser - chr20 (64444167 bp)
16:16:37.376 DEBUG GenomeLocParser - chr21 (46709983 bp)
16:16:37.376 DEBUG GenomeLocParser - chr22 (50818468 bp)
16:16:37.377 DEBUG GenomeLocParser - chrX (156040895 bp)
16:16:37.377 DEBUG GenomeLocParser - chrY (57227415 bp)
16:16:37.377 DEBUG GenomeLocParser - chrM (16569 bp)
16:16:37.377 DEBUG GenomeLocParser - chr1_KI270706v1_random (175055 bp)
16:16:37.377 DEBUG GenomeLocParser - chr1_KI270707v1_random (32032 bp)
16:16:37.377 DEBUG GenomeLocParser - chr1_KI270708v1_random (127682 bp)
16:16:37.378 DEBUG GenomeLocParser - chr1_KI270709v1_random (66860 bp)
16:16:37.378 DEBUG GenomeLocParser - chr1_KI270710v1_random (40176 bp)
16:16:37.378 DEBUG GenomeLocParser - chr1_KI270711v1_random (42210 bp)
16:16:37.378 DEBUG GenomeLocParser - chr1_KI270712v1_random (176043 bp)
16:16:37.379 DEBUG GenomeLocParser - chr1_KI270713v1_random (40745 bp)
16:16:37.379 DEBUG GenomeLocParser - chr1_KI270714v1_random (41717 bp)
16:16:37.379 DEBUG GenomeLocParser - chr2_KI270715v1_random (161471 bp)
16:16:37.379 DEBUG GenomeLocParser - chr2_KI270716v1_random (153799 bp)
16:16:37.379 DEBUG GenomeLocParser - chr3_GL000221v1_random (155397 bp)
16:16:37.379 DEBUG GenomeLocParser - chr4_GL000008v2_random (209709 bp)
16:16:37.380 DEBUG GenomeLocParser - chr5_GL000208v1_random (92689 bp)
16:16:37.380 DEBUG GenomeLocParser - chr9_KI270717v1_random (40062 bp)
16:16:37.380 DEBUG GenomeLocParser - chr9_KI270718v1_random (38054 bp)
16:16:37.380 DEBUG GenomeLocParser - chr9_KI270719v1_random (176845 bp)
16:16:37.380 DEBUG GenomeLocParser - chr9_KI270720v1_random (39050 bp)
16:16:37.380 DEBUG GenomeLocParser - chr11_KI270721v1_random (100316 bp)
16:16:37.381 DEBUG GenomeLocParser - chr14_GL000009v2_random (201709 bp)
16:16:37.381 DEBUG GenomeLocParser - chr14_GL000225v1_random (211173 bp)
16:16:37.381 DEBUG GenomeLocParser - chr14_KI270722v1_random (194050 bp)
16:16:37.381 DEBUG GenomeLocParser - chr14_GL000194v1_random (191469 bp)
16:16:37.381 DEBUG GenomeLocParser - chr14_KI270723v1_random (38115 bp)
16:16:37.381 DEBUG GenomeLocParser - chr14_KI270724v1_random (39555 bp)
16:16:37.382 DEBUG GenomeLocParser - chr14_KI270725v1_random (172810 bp)
16:16:37.382 DEBUG GenomeLocParser - chr14_KI270726v1_random (43739 bp)
16:16:37.382 DEBUG GenomeLocParser - chr15_KI270727v1_random (448248 bp)
16:16:37.382 DEBUG GenomeLocParser - chr16_KI270728v1_random (1872759 bp)
16:16:37.382 DEBUG GenomeLocParser - chr17_GL000205v2_random (185591 bp)
16:16:37.382 DEBUG GenomeLocParser - chr17_KI270729v1_random (280839 bp)
16:16:37.383 DEBUG GenomeLocParser - chr17_KI270730v1_random (112551 bp)
16:16:37.383 DEBUG GenomeLocParser - chr22_KI270731v1_random (150754 bp)
16:16:37.383 DEBUG GenomeLocParser - chr22_KI270732v1_random (41543 bp)
16:16:37.383 DEBUG GenomeLocParser - chr22_KI270733v1_random (179772 bp)
16:16:37.383 DEBUG GenomeLocParser - chr22_KI270734v1_random (165050 bp)
16:16:37.384 DEBUG GenomeLocParser - chr22_KI270735v1_random (42811 bp)
16:16:37.384 DEBUG GenomeLocParser - chr22_KI270736v1_random (181920 bp)
16:16:37.384 DEBUG GenomeLocParser - chr22_KI270737v1_random (103838 bp)
16:16:37.384 DEBUG GenomeLocParser - chr22_KI270738v1_random (99375 bp)
16:16:37.384 DEBUG GenomeLocParser - chr22_KI270739v1_random (73985 bp)
16:16:37.385 DEBUG GenomeLocParser - chrY_KI270740v1_random (37240 bp)
16:16:37.385 DEBUG GenomeLocParser - chrUn_KI270302v1 (2274 bp)
16:16:37.385 DEBUG GenomeLocParser - chrUn_KI270304v1 (2165 bp)
16:16:37.385 DEBUG GenomeLocParser - chrUn_KI270303v1 (1942 bp)
16:16:37.385 DEBUG GenomeLocParser - chrUn_KI270305v1 (1472 bp)
16:16:37.385 DEBUG GenomeLocParser - chrUn_KI270322v1 (21476 bp)
16:16:37.386 DEBUG GenomeLocParser - chrUn_KI270320v1 (4416 bp)
16:16:37.386 DEBUG GenomeLocParser - chrUn_KI270310v1 (1201 bp)
16:16:37.386 DEBUG GenomeLocParser - chrUn_KI270316v1 (1444 bp)
16:16:37.386 DEBUG GenomeLocParser - chrUn_KI270315v1 (2276 bp)
16:16:37.386 DEBUG GenomeLocParser - chrUn_KI270312v1 (998 bp)
16:16:37.387 DEBUG GenomeLocParser - chrUn_KI270311v1 (12399 bp)
16:16:37.387 DEBUG GenomeLocParser - chrUn_KI270317v1 (37690 bp)
... many reference scaffolds ...
16:16:37.515 DEBUG GenomeLocParser - HLA-DQA1*04:02 (6210 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQA1*05:01:01:01 (5806 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQA1*05:01:01:02 (6529 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQA1*05:03 (6121 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQA1*05:05:01:01 (6593 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQA1*05:05:01:02 (6597 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQA1*05:05:01:03 (6393 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQA1*05:11 (6589 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQA1*06:01:01 (5878 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*02:01:01 (7480 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*02:02:01 (7471 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*03:01:01:01 (7231 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*03:01:01:02 (7230 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*03:01:01:03 (7231 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*03:02:01 (7126 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*03:03:02:01 (7126 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*03:03:02:02 (7126 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*03:03:02:03 (6800 bp)
16:16:37.515 DEBUG GenomeLocParser - HLA-DQB1*03:05:01 (6934 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DQB1*05:01:01:01 (7090 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DQB1*05:01:01:02 (7090 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DQB1*05:03:01:01 (7089 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DQB1*05:03:01:02 (7089 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DQB1*06:01:01 (7111 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DQB1*06:02:01 (7102 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DQB1*06:03:01 (7103 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DQB1*06:09:01 (7102 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DRB1*01:01:01 (10741 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DRB1*01:02:01 (11229 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DRB1*03:01:01:01 (13908 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DRB1*03:01:01:02 (13426 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DRB1*04:03:01 (15246 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DRB1*07:01:01:01 (16110 bp)
16:16:37.516 DEBUG GenomeLocParser - HLA-DRB1*07:01:01:02 (16120 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*08:03:02 (13562 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*09:21 (16039 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*10:01:01 (13501 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*11:01:01 (13921 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*11:01:02 (13931 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*11:04:01 (13919 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*12:01:01 (13404 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*12:17 (11260 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*13:01:01 (13935 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*13:02:01 (13941 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*14:05:01 (13933 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*14:54:01 (13936 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*15:01:01:01 (11080 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*15:01:01:02 (11571 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*15:01:01:03 (11056 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*15:01:01:04 (11056 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*15:02:01 (10313 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*15:03:01:01 (11567 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*15:03:01:02 (11569 bp)
16:16:37.524 DEBUG GenomeLocParser - HLA-DRB1*16:02:01 (11005 bp)
16:16:37.546 INFO IntervalArgumentCollection - Processing 28770581 bp from intervals
16:16:37.548 INFO GenomicsDBImport - Done initializing engine
16:16:37.548 INFO GenomicsDBImport - Callset Map JSON file will be re-written to /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1/callset.json
16:16:37.548 INFO GenomicsDBImport - Incrementally importing to array - /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/CPRs_100_proto/DB_chr1/genomicsdb_array
16:16:37.549 INFO ProgressMeter - Starting traversal
16:16:37.550 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
16:16:38.061 INFO GenomicsDBImport - Shutting down engine
[August 28, 2020 4:16:38 PM PDT] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=16464216064
org.genomicsdb.exception.GenomicsDBException: Duplicate sample name found: SSC00007. Sample was originally in /rooted3/langley/work/home/chuck/rad/SFARI/SSC_hg38/WGS/phase2_CPRs/SSC00007_CPR/SSC00007.haplotypeCalls.CPR.er.raw.vcf.gz
at org.genomicsdb.importer.extensions.CallSetMapExtensions.checkDuplicateCallsetsForIncrementalImport(CallSetMapExtensions.java:270)
at org.genomicsdb.importer.extensions.CallSetMapExtensions.mergeCallsetsForIncrementalImport(CallSetMapExtensions.java:241)
at org.genomicsdb.importer.GenomicsDBImporter.<init>(GenomicsDBImporter.java:222)
at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.traverse(GenomicsDBImport.java:743)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
________________________________________________________________________________
-
Hi Yangyxt and Charles H. Langley, it looks like this error comes from multiple VCFs sharing the same sample name, which would show up in the sample name map. You can use this command to check whether a sample name appears twice in your map file:
cat sample_name_map_file | awk '{ print $1; }' | sort | uniq -c
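On a large map, the per-name counts from `uniq -c` can be tedious to scan. A minimal variant, assuming the map is tab-separated with the sample name in the first column, prints only the names that occur more than once. The function name `dup_samples` and the file names in the demo are made up for illustration.

```shell
# Hypothetical helper: print sample names that appear more than once
# in the first (tab-separated) column of a sample map.
dup_samples() {
    cut -f1 "$1" | sort | uniq -d
}

# Demo with a throwaway map in which sample B is listed twice.
map=$(mktemp)
printf 'A\tA.vcf.gz\nB\tB.vcf.gz\nB\tB_rerun.vcf.gz\n' > "$map"
dups=$(dup_samples "$map")
echo "duplicated: ${dups:-none}"
rm -f "$map"
```

An empty result means every name in the map is unique, which points the investigation at the workspace rather than the map.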
-
Thanks for the quick response.
This is not likely to be our issue, since we are not using the sample_name_map_file option.
We have a large DB that was created in a single run. Now we are trying to add new VCFs, but each one elicits the same error.
Could our problem be addressed by the recent 4.1.8 update?
Thanks for the help.
Cheers,
Chuck
-
Hi Charles H. Langley, could you then open a different post so that we can look into your issue separately?
-
OK
-
Dear Brandt,
I also checked and confirmed that the sample map does not contain multiple VCFs with the same sample name.
I wonder whether it is possible to import into GenomicsDB one chromosome at a time.
Imagine I use this sample map:
A A.vcf.gz
B B.vcf.gz
C C.vcf.gz
I set the GenomicsDBImport parameters like this:
--tmp-dir /paedwy/disk1/yangyxt/test_tmp \
--genomicsdb-update-workspace-path ${probe_dir}/genomicdbimport_chr${1} \
-R ${ref_gen}/ucsc.hg19.fasta \
--batch-size 0 \
--sample-name-map ${gvcf}/batch_cohort.sample_map \
--reader-threads 5 \
--intervals chr1 <-------- import a different chromosome from the same VCF files in each job
The idea is to use one job per chromosome and run the jobs in parallel. Does GenomicsDB support simultaneous write operations? Will I hit the duplicate sample name error when I try to import the chr2 variants while GenomicsDBImport finishes importing the chr1 variants from the same VCF file?
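For context, GenomicsDBImport has two workspace options: `--genomicsdb-workspace-path` creates a new workspace, while `--genomicsdb-update-workspace-path` adds samples to one that already exists. The sketch below is not a confirmed fix for this thread. It only illustrates a per-chromosome wrapper that picks the option by whether the workspace directory exists and prints the command instead of running it. The function name `import_cmd` and the demo paths are placeholders.

```shell
# Hypothetical wrapper: print (do not run) the import command for one chromosome.
#   workspace directory missing -> --genomicsdb-workspace-path (first import)
#   workspace directory present -> --genomicsdb-update-workspace-path (incremental)
import_cmd() {
    chrom=$1
    workspace_root=$2
    sample_map=$3
    ws="$workspace_root/genomicdbimport_$chrom"
    if [ -d "$ws" ]; then
        flag="--genomicsdb-update-workspace-path"
    else
        flag="--genomicsdb-workspace-path"
    fi
    echo "gatk GenomicsDBImport $flag $ws --sample-name-map $sample_map --intervals $chrom"
}

# Demo: pretend chr1 was imported earlier and chr2 was not.
root=$(mktemp -d)
mkdir "$root/genomicdbimport_chr1"
cmd_chr1=$(import_cmd chr1 "$root" batch_cohort.sample_map)
cmd_chr2=$(import_cmd chr2 "$root" batch_cohort.sample_map)
printf '%s\n' "$cmd_chr1" "$cmd_chr2"
rm -rf "$root"
```

Two caveats follow from the logs in this thread: an update run ignores `--intervals` (see the WARN line), and it rejects any sample that the workspace already records, so a map passed to an update would presumably need to list only new samples.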
-
Yangyxt, could you submit a bug report?
Please upload or provide notes for where to obtain:
- workspace you are updating
- reference
- sample map
- interval to test
-
Yangyxt, the cause may not be multiple VCFs within the current import. Is it possible that the sample you are trying to import already exists in the workspace you are updating? That is what the error suggests.
Alternatively, did an earlier workspace update fail with a different error? In that case, the metadata for the workspace may be inconsistent. Can you search the callset.json file in the workspace for A130489?
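One way to run that search for the whole map at once is sketched below. It assumes callset.json stores each name in a `"sample_name": "<name>"` field and that the sample map is tab-separated; both should be checked against the actual files. The function name, the sample `NEW001`, and the demo paths are invented for illustration.

```shell
# Hypothetical helper: list sample names found in BOTH the sample map ($1)
# and the workspace's callset.json ($2).
# Assumption: callset.json records names as "sample_name": "<name>".
samples_already_imported() {
    names=$(mktemp)
    cut -f1 "$1" | sort -u > "$names"
    grep -o '"sample_name"[[:space:]]*:[[:space:]]*"[^"]*"' "$2" \
        | sed 's/.*"\([^"]*\)"$/\1/' \
        | sort -u \
        | grep -Fx -f "$names"
    rm -f "$names"
}

# Demo with throwaway files: A130489 is already in the workspace, NEW001 is not.
work=$(mktemp -d)
printf 'A130489\t/path/A130489.HC.g.vcf.gz\nNEW001\t/path/NEW001.HC.g.vcf.gz\n' > "$work/map"
printf '{"callsets":[{"sample_name":"A130489","row_idx":0}]}\n' > "$work/callset.json"
overlap=$(samples_already_imported "$work/map" "$work/callset.json")
echo "already in workspace: ${overlap:-none}"
rm -rf "$work"
```

Any name printed here is one that an update run would be expected to reject as a duplicate.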