Cohort_denoising.py parameter issue in GermlineCNVCaller
I am getting errors in the cohort_denoising script about unrecognized parameters that are set by GermlineCNVCaller. How do I correct these parameters?
REQUIRED for all errors and issues:
a) GATK version used:
The Genome Analysis Toolkit (GATK) v4.2.5.0
HTSJDK Version: 2.24.1
Picard Version: 2.25.4
b) Exact command used:
gatk GermlineCNVCaller \
--run-mode COHORT \
-L grch38.preprocessed.interval_list \
-I CVG/GDN101BL.tsv -I CVG/GDN123BL.tsv -I CVG/GDN23BL.tsv -I CVG/GDN41BL.tsv -I CVG/MCDBOSE206BRWGS1.tsv -I CVG/MCDBOSE363BRWGS1.tsv \
-I CVG/GDN111BL.tsv -I CVG/GDN131BL.tsv -I CVG/GDN31BL.tsv -I CVG/GDN51BL.tsv -I CVG/MCDBOSE284BRWGS1.tsv -I CVG/MCDGG13004BR.tsv \
-I CVG/GDN121BL.tsv -I CVG/GDN21BL.tsv -I CVG/GDN32BL.tsv -I CVG/GDN71BL.tsv -I CVG/MCDBOSE297BRWGS1.tsv -I CVG/SHPCRFREEWGS.tsv \
-I CVG/GDN122BL.tsv -I CVG/GDN22BL.tsv -I CVG/GDN33BL.tsv -I CVG/GDN91BL.tsv -I CVG/MCDBOSE304BR.tsv -I CVG/UTH0002BR.tsv \
--contig-ploidy-calls ploidy-model \
--annotated-intervals grch38.annotated.tsv \
--interval-merging-rule OVERLAPPING_ONLY \
--output CNV \
--output-prefix CNV \
--verbosity DEBUG
c) Entire program log:
09:28:32.731 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/nas/longleaf/apps/gatk/4.2.5.0/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
09:28:32.768 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /tmp/libgkl_compression3479089345337976951.so
Sep 12, 2022 9:28:33 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:28:33.244 INFO GermlineCNVCaller - ------------------------------------------------------------
09:28:33.244 INFO GermlineCNVCaller - The Genome Analysis Toolkit (GATK) v4.2.5.0
09:28:33.244 INFO GermlineCNVCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
09:28:33.244 INFO GermlineCNVCaller - Executing as sguirale@c0314.ll.unc.edu on Linux v4.18.0-348.2.1.el8_5.x86_64 amd64
09:28:33.244 INFO GermlineCNVCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
09:28:33.244 INFO GermlineCNVCaller - Start Date/Time: September 12, 2022 9:28:32 AM EDT
09:28:33.244 INFO GermlineCNVCaller - ------------------------------------------------------------
09:28:33.244 INFO GermlineCNVCaller - ------------------------------------------------------------
09:28:33.245 INFO GermlineCNVCaller - HTSJDK Version: 2.24.1
09:28:33.245 INFO GermlineCNVCaller - Picard Version: 2.25.4
09:28:33.245 INFO GermlineCNVCaller - Built for Spark Version: 2.4.5
09:28:33.246 INFO GermlineCNVCaller - HTSJDK Defaults.BUFFER_SIZE : 131072
09:28:33.246 INFO GermlineCNVCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:28:33.246 INFO GermlineCNVCaller - HTSJDK Defaults.CREATE_INDEX : false
09:28:33.246 INFO GermlineCNVCaller - HTSJDK Defaults.CREATE_MD5 : false
09:28:33.246 INFO GermlineCNVCaller - HTSJDK Defaults.CUSTOM_READER_FACTORY :
09:28:33.246 INFO GermlineCNVCaller - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
09:28:33.246 INFO GermlineCNVCaller - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
09:28:33.246 INFO GermlineCNVCaller - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
09:28:33.246 INFO GermlineCNVCaller - HTSJDK Defaults.REFERENCE_FASTA : null
09:28:33.247 INFO GermlineCNVCaller - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
09:28:33.247 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:28:33.247 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:28:33.247 INFO GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:28:33.247 INFO GermlineCNVCaller - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
09:28:33.247 DEBUG ConfigFactory - Configuration file values:
09:28:33.251 DEBUG ConfigFactory - gcsMaxRetries = 20
09:28:33.251 DEBUG ConfigFactory - gcsProjectForRequesterPays =
09:28:33.251 DEBUG ConfigFactory - codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
09:28:33.251 DEBUG ConfigFactory - gatk_stacktrace_on_user_exception = false
09:28:33.251 DEBUG ConfigFactory - samjdk.use_async_io_read_samtools = false
09:28:33.251 DEBUG ConfigFactory - samjdk.use_async_io_write_samtools = true
09:28:33.251 DEBUG ConfigFactory - samjdk.use_async_io_write_tribble = false
09:28:33.251 DEBUG ConfigFactory - samjdk.compression_level = 2
09:28:33.251 DEBUG ConfigFactory - spark.kryoserializer.buffer.max = 512m
09:28:33.251 DEBUG ConfigFactory - spark.driver.maxResultSize = 0
09:28:33.251 DEBUG ConfigFactory - spark.driver.userClassPathFirst = true
09:28:33.251 DEBUG ConfigFactory - spark.io.compression.codec = lzf
09:28:33.252 DEBUG ConfigFactory - spark.executor.memoryOverhead = 600
09:28:33.252 DEBUG ConfigFactory - spark.driver.extraJavaOptions =
09:28:33.252 DEBUG ConfigFactory - spark.executor.extraJavaOptions =
09:28:33.252 DEBUG ConfigFactory - read_filter_packages = [org.broadinstitute.hellbender.engine.filters]
09:28:33.253 DEBUG ConfigFactory - annotation_packages = [org.broadinstitute.hellbender.tools.walkers.annotator]
09:28:33.253 DEBUG ConfigFactory - cloudPrefetchBuffer = 40
09:28:33.253 DEBUG ConfigFactory - cloudIndexPrefetchBuffer = -1
09:28:33.253 DEBUG ConfigFactory - createOutputBamIndex = true
09:28:33.253 INFO GermlineCNVCaller - Deflater: IntelDeflater
09:28:33.253 INFO GermlineCNVCaller - Inflater: IntelInflater
09:28:33.253 INFO GermlineCNVCaller - GCS max retries/reopens: 20
09:28:33.253 INFO GermlineCNVCaller - Requester pays: disabled
09:28:33.253 INFO GermlineCNVCaller - Initializing engine
09:28:33.258 DEBUG ScriptExecutor - Executing:
09:28:33.258 DEBUG ScriptExecutor - python
09:28:33.258 DEBUG ScriptExecutor - -c
09:28:33.258 DEBUG ScriptExecutor - import gcnvkernel
09:28:51.239 DEBUG ScriptExecutor - Result: 0
09:28:51.239 INFO GermlineCNVCaller - Done initializing engine
09:28:51.273 INFO GermlineCNVCaller - Intervals specified...
09:28:54.004 DEBUG GenomeLocParser - Prepared reference sequence contig dictionary
09:28:54.004 DEBUG GenomeLocParser - chr1 (248956422 bp)
09:28:54.004 DEBUG GenomeLocParser - chr2 (242193529 bp)
09:28:54.004 DEBUG GenomeLocParser - chr3 (198295559 bp)
09:28:54.004 DEBUG GenomeLocParser - chr4 (190214555 bp)
09:28:54.016 DEBUG GenomeLocParser - chr5 (181538259 bp)
09:28:54.017 DEBUG GenomeLocParser - chr6 (170805979 bp)
09:28:54.017 DEBUG GenomeLocParser - chr7 (159345973 bp)
09:28:54.017 DEBUG GenomeLocParser - chr8 (145138636 bp)
09:28:54.017 DEBUG GenomeLocParser - chr9 (138394717 bp)
09:28:54.017 DEBUG GenomeLocParser - chr10 (133797422 bp)
09:28:54.017 DEBUG GenomeLocParser - chr11 (135086622 bp)
09:28:54.017 DEBUG GenomeLocParser - chr12 (133275309 bp)
09:28:54.017 DEBUG GenomeLocParser - chr13 (114364328 bp)
09:28:54.017 DEBUG GenomeLocParser - chr14 (107043718 bp)
09:28:54.017 DEBUG GenomeLocParser - chr15 (101991189 bp)
09:28:54.017 DEBUG GenomeLocParser - chr16 (90338345 bp)
09:28:54.017 DEBUG GenomeLocParser - chr17 (83257441 bp)
09:28:54.017 DEBUG GenomeLocParser - chr18 (80373285 bp)
09:28:54.017 DEBUG GenomeLocParser - chr19 (58617616 bp)
09:28:54.017 DEBUG GenomeLocParser - chr20 (64444167 bp)
09:28:54.017 DEBUG GenomeLocParser - chr21 (46709983 bp)
09:28:54.017 DEBUG GenomeLocParser - chr22 (50818468 bp)
09:28:54.017 DEBUG GenomeLocParser - chrX (156040895 bp)
09:28:54.018 DEBUG GenomeLocParser - chrY (57227415 bp)
09:28:54.018 DEBUG GenomeLocParser - chrM (16569 bp)
09:29:19.538 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN101BL.tsv (1 / 24)
09:29:27.046 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN123BL.tsv (2 / 24)
09:29:33.162 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN23BL.tsv (3 / 24)
09:29:43.769 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN41BL.tsv (4 / 24)
09:29:52.459 INFO GermlineCNVCaller - Aggregating read-count file CVG/MCDBOSE206BRWGS1.tsv (5 / 24)
09:30:01.431 INFO GermlineCNVCaller - Aggregating read-count file CVG/MCDBOSE363BRWGS1.tsv (6 / 24)
09:30:11.619 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN111BL.tsv (7 / 24)
09:30:21.895 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN131BL.tsv (8 / 24)
09:30:31.971 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN31BL.tsv (9 / 24)
09:30:37.719 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN51BL.tsv (10 / 24)
09:30:47.006 INFO GermlineCNVCaller - Aggregating read-count file CVG/MCDBOSE284BRWGS1.tsv (11 / 24)
09:30:56.920 INFO GermlineCNVCaller - Aggregating read-count file CVG/MCDGG13004BR.tsv (12 / 24)
09:31:06.147 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN121BL.tsv (13 / 24)
09:31:15.264 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN21BL.tsv (14 / 24)
09:31:24.545 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN32BL.tsv (15 / 24)
09:31:33.801 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN71BL.tsv (16 / 24)
09:31:44.082 INFO GermlineCNVCaller - Aggregating read-count file CVG/MCDBOSE297BRWGS1.tsv (17 / 24)
09:31:53.296 INFO GermlineCNVCaller - Aggregating read-count file CVG/SHPCRFREEWGS.tsv (18 / 24)
09:32:02.175 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN122BL.tsv (19 / 24)
09:32:12.809 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN22BL.tsv (20 / 24)
09:32:21.137 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN33BL.tsv (21 / 24)
09:32:30.558 INFO GermlineCNVCaller - Aggregating read-count file CVG/GDN91BL.tsv (22 / 24)
09:32:38.633 INFO GermlineCNVCaller - Aggregating read-count file CVG/MCDBOSE304BR.tsv (23 / 24)
09:32:48.020 INFO GermlineCNVCaller - Aggregating read-count file CVG/UTH0002BR.tsv (24 / 24)
09:32:57.076 DEBUG ScriptExecutor - Executing:
09:32:57.077 DEBUG ScriptExecutor - python
09:32:57.077 DEBUG ScriptExecutor - /tmp/cohort_denoising_calling.1524673573792793253.py
09:32:57.077 DEBUG ScriptExecutor - --ploidy_calls_path=/pine/scr/s/g/sguirale/06012022_Heinzen/07082022/ploidy-model
09:32:57.077 DEBUG ScriptExecutor - --output_calls_path=/pine/scr/s/g/sguirale/06012022_Heinzen/07082022/CNV/CNV-calls
09:32:57.077 DEBUG ScriptExecutor - --output_tracking_path=/pine/scr/s/g/sguirale/06012022_Heinzen/07082022/CNV/CNV-tracking
09:32:57.077 DEBUG ScriptExecutor - --random_seed=1984
09:32:57.077 DEBUG ScriptExecutor - --modeling_interval_list=/tmp/intervals6871353887598513013.tsv
09:32:57.077 DEBUG ScriptExecutor - --output_model_path=/pine/scr/s/g/sguirale/06012022_Heinzen/07082022/CNV/CNV-model
09:32:57.077 DEBUG ScriptExecutor - --enable_explicit_gc_bias_modeling=True
09:32:57.077 DEBUG ScriptExecutor - --read_count_tsv_files
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN101BL.rc939332260208444009.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN123BL.rc8514608064846487864.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN23BL.rc570553589287621379.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN41BL.rc8275632420839712632.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/MCDBOSE206BRWGS1.rc2448158046520843728.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/MCDBOSE363BRWGS1.rc2609004411066286088.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN111BL.rc2056931346217190821.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN131BL.rc1884122845946184972.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN31BL.rc6151636762102992540.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN51BL.rc7169375844784094007.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/MCDBOSE284BRWGS1.rc3977521076516359766.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/MCDGG13004BR.rc7878872231401778622.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN121BL.rc6850619861466530291.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN21BL.rc6145045931939917532.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN32BL.rc4615805100580127937.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN71BL.rc4012810838920452652.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/MCDBOSE297BRWGS1.rc3928254119206504253.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/SHPCRFREEWGS.rc8440016932528190545.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN122BL.rc1749868918973928278.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN22BL.rc6801229383337316226.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN33BL.rc5763783794453172115.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/GDN91BL.rc108521236918828388.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/MCDBOSE304BR.rc8307060646251751152.tsv
09:32:57.077 DEBUG ScriptExecutor - /tmp/UTH0002BR.rc2725315300290664508.tsv
09:32:57.077 DEBUG ScriptExecutor - --psi_s_scale=1.000000e-04
09:32:57.077 DEBUG ScriptExecutor - --mapping_error_rate=1.000000e-02
09:32:57.077 DEBUG ScriptExecutor - --depth_correction_tau=1.000000e+04
09:32:57.077 DEBUG ScriptExecutor - --q_c_expectation_mode=hybrid
09:32:57.077 DEBUG ScriptExecutor - --num_samples_copy_ratio_approx=200
09:32:57.077 DEBUG ScriptExecutor - --max_bias_factors=5
09:32:57.077 DEBUG ScriptExecutor - --psi_t_scale=1.000000e-03
09:32:57.077 DEBUG ScriptExecutor - --log_mean_bias_std=1.000000e-01
09:32:57.077 DEBUG ScriptExecutor - --init_ard_rel_unexplained_variance=1.000000e-01
09:32:57.077 DEBUG ScriptExecutor - --num_gc_bins=20
09:32:57.077 DEBUG ScriptExecutor - --gc_curve_sd=1.000000e+00
09:32:57.077 DEBUG ScriptExecutor - --active_class_padding_hybrid_mode=50000
09:32:57.077 DEBUG ScriptExecutor - --enable_bias_factors=True
09:32:57.077 DEBUG ScriptExecutor - --disable_bias_factors_in_active_class=False
09:32:57.077 DEBUG ScriptExecutor - --p_alt=1.000000e-06
09:32:57.077 DEBUG ScriptExecutor - --cnv_coherence_length=1.000000e+04
09:32:57.077 DEBUG ScriptExecutor - --max_copy_number=5
09:32:57.077 DEBUG ScriptExecutor - --p_active=0.010000
09:32:57.077 DEBUG ScriptExecutor - --class_coherence_length=10000.000000
09:32:57.077 DEBUG ScriptExecutor - --learning_rate=1.000000e-02
09:32:57.077 DEBUG ScriptExecutor - --adamax_beta1=9.000000e-01
09:32:57.077 DEBUG ScriptExecutor - --adamax_beta2=9.900000e-01
09:32:57.077 DEBUG ScriptExecutor - --log_emission_samples_per_round=50
09:32:57.077 DEBUG ScriptExecutor - --log_emission_sampling_rounds=10
09:32:57.077 DEBUG ScriptExecutor - --log_emission_sampling_median_rel_error=5.000000e-03
09:32:57.077 DEBUG ScriptExecutor - --max_advi_iter_first_epoch=5000
09:32:57.077 DEBUG ScriptExecutor - --max_advi_iter_subsequent_epochs=200
09:32:57.077 DEBUG ScriptExecutor - --min_training_epochs=10
09:32:57.077 DEBUG ScriptExecutor - --max_training_epochs=50
09:32:57.077 DEBUG ScriptExecutor - --initial_temperature=1.500000e+00
09:32:57.077 DEBUG ScriptExecutor - --num_thermal_advi_iters=2500
09:32:57.077 DEBUG ScriptExecutor - --convergence_snr_averaging_window=500
09:32:57.077 DEBUG ScriptExecutor - --convergence_snr_trigger_threshold=1.000000e-01
09:32:57.077 DEBUG ScriptExecutor - --convergence_snr_countdown_window=10
09:32:57.077 DEBUG ScriptExecutor - --max_calling_iters=10
09:32:57.077 DEBUG ScriptExecutor - --caller_update_convergence_threshold=1.000000e-03
09:32:57.077 DEBUG ScriptExecutor - --caller_internal_admixing_rate=7.500000e-01
09:32:57.077 DEBUG ScriptExecutor - --caller_external_admixing_rate=1.000000e+00
09:32:57.077 DEBUG ScriptExecutor - --disable_caller=false
09:32:57.077 DEBUG ScriptExecutor - --disable_sampler=false
09:32:57.077 DEBUG ScriptExecutor - --disable_annealing=false
usage: cohort_denoising_calling.1524673573792793253.py [-h]
[--console_log_level {INFO,WARNING,DEBUG}]
[--logfile_log_level {INFO,WARNING,DEBUG}]
[--logfile str]
--modeling_interval_list
str
--read_count_tsv_files
str [str ...]
--ploidy_calls_path str
--output_model_path str
--output_calls_path str
--output_tracking_path
str
[--output_opt_path str]
[--input_model_path str]
[--input_calls_path str]
[--input_opt_path str]
[--max_bias_factors int]
[--mapping_error_rate float]
[--psi_t_scale float]
[--psi_s_scale float]
[--depth_correction_tau float]
[--log_mean_bias_std float]
[--init_ard_rel_unexplained_variance float]
[--num_gc_bins int]
[--gc_curve_sd float]
[--q_c_expectation_mode {map,exact,hybrid}]
[--active_class_padding_hybrid_mode int]
[--enable_bias_factors str_to_bool]
[--enable_explicit_gc_bias_modeling str_to_bool]
[--disable_bias_factors_in_active_class str_to_bool]
[--p_alt float]
[--p_active float]
[--cnv_coherence_length float]
[--class_coherence_length float]
[--max_copy_number int]
[--num_calling_processes int]
[--learning_rate float]
[--adamax_beta1 float]
[--adamax_beta2 float]
[--log_emission_samples_per_round int]
[--log_emission_sampling_median_rel_error float]
[--log_emission_sampling_rounds int]
[--max_advi_iter_first_epoch int]
[--max_advi_iter_subsequent_epochs int]
[--min_training_epochs int]
[--max_training_epochs int]
[--initial_temperature float]
[--num_thermal_advi_iters int]
[--convergence_snr_averaging_window int]
[--convergence_snr_trigger_threshold float]
[--convergence_snr_countdown_window int]
[--max_calling_iters int]
[--caller_update_convergence_threshold float]
[--caller_internal_admixing_rate float]
[--caller_external_admixing_rate float]
[--disable_sampler str_to_bool]
[--disable_caller str_to_bool]
[--disable_annealing str_to_bool]
cohort_denoising_calling.1524673573792793253.py: error: unrecognized arguments: --random_seed=1984 --num_samples_copy_ratio_approx=200
09:33:08.137 INFO GermlineCNVCaller - Shutting down engine
[September 12, 2022 9:33:08 AM EDT] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 4.59 minutes.
Runtime.totalMemory()=10474749952
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 2
Command Line: python /tmp/cohort_denoising_calling.1524673573792793253.py --ploidy_calls_path=/pine/scr/s/g/sguirale/06012022_Heinzen/07082022/ploidy-model --output_calls_path=/pine/scr/s/g/sguirale/06012022_Heinzen/07082022/CNV/CNV-calls --output_tracking_path=/pine/scr/s/g/sguirale/06012022_Heinzen/07082022/CNV/CNV-tracking --random_seed=1984 --modeling_interval_list=/tmp/intervals6871353887598513013.tsv --output_model_path=/pine/scr/s/g/sguirale/06012022_Heinzen/07082022/CNV/CNV-model --enable_explicit_gc_bias_modeling=True --read_count_tsv_files /tmp/GDN101BL.rc939332260208444009.tsv /tmp/GDN123BL.rc8514608064846487864.tsv /tmp/GDN23BL.rc570553589287621379.tsv /tmp/GDN41BL.rc8275632420839712632.tsv /tmp/MCDBOSE206BRWGS1.rc2448158046520843728.tsv /tmp/MCDBOSE363BRWGS1.rc2609004411066286088.tsv /tmp/GDN111BL.rc2056931346217190821.tsv /tmp/GDN131BL.rc1884122845946184972.tsv /tmp/GDN31BL.rc6151636762102992540.tsv /tmp/GDN51BL.rc7169375844784094007.tsv /tmp/MCDBOSE284BRWGS1.rc3977521076516359766.tsv /tmp/MCDGG13004BR.rc7878872231401778622.tsv /tmp/GDN121BL.rc6850619861466530291.tsv /tmp/GDN21BL.rc6145045931939917532.tsv /tmp/GDN32BL.rc4615805100580127937.tsv /tmp/GDN71BL.rc4012810838920452652.tsv /tmp/MCDBOSE297BRWGS1.rc3928254119206504253.tsv /tmp/SHPCRFREEWGS.rc8440016932528190545.tsv /tmp/GDN122BL.rc1749868918973928278.tsv /tmp/GDN22BL.rc6801229383337316226.tsv /tmp/GDN33BL.rc5763783794453172115.tsv /tmp/GDN91BL.rc108521236918828388.tsv /tmp/MCDBOSE304BR.rc8307060646251751152.tsv /tmp/UTH0002BR.rc2725315300290664508.tsv --psi_s_scale=1.000000e-04 --mapping_error_rate=1.000000e-02 --depth_correction_tau=1.000000e+04 --q_c_expectation_mode=hybrid --num_samples_copy_ratio_approx=200 --max_bias_factors=5 --psi_t_scale=1.000000e-03 --log_mean_bias_std=1.000000e-01 --init_ard_rel_unexplained_variance=1.000000e-01 --num_gc_bins=20 --gc_curve_sd=1.000000e+00 --active_class_padding_hybrid_mode=50000 --enable_bias_factors=True --disable_bias_factors_in_active_class=False 
--p_alt=1.000000e-06 --cnv_coherence_length=1.000000e+04 --max_copy_number=5 --p_active=0.010000 --class_coherence_length=10000.000000 --learning_rate=1.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.900000e-01 --log_emission_samples_per_round=50 --log_emission_sampling_rounds=10 --log_emission_sampling_median_rel_error=5.000000e-03 --max_advi_iter_first_epoch=5000 --max_advi_iter_subsequent_epochs=200 --min_training_epochs=10 --max_training_epochs=50 --initial_temperature=1.500000e+00 --num_thermal_advi_iters=2500 --convergence_snr_averaging_window=500 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=10 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=1.000000e+00 --disable_caller=false --disable_sampler=false --disable_annealing=false
at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:351)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /nas/longleaf/apps/gatk/4.2.5.0/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /nas/longleaf/apps/gatk/4.2.5.0/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar GermlineCNVCaller --run-mode COHORT -L grch38.preprocessed.interval_list -I CVG/GDN101BL.tsv -I CVG/GDN123BL.tsv -I CVG/GDN23BL.tsv -I CVG/GDN41BL.tsv -I CVG/MCDBOSE206BRWGS1.tsv -I CVG/MCDBOSE363BRWGS1.tsv -I CVG/GDN111BL.tsv -I CVG/GDN131BL.tsv -I CVG/GDN31BL.tsv -I CVG/GDN51BL.tsv -I CVG/MCDBOSE284BRWGS1.tsv -I CVG/MCDGG13004BR.tsv -I CVG/GDN121BL.tsv -I CVG/GDN21BL.tsv -I CVG/GDN32BL.tsv -I CVG/GDN71BL.tsv -I CVG/MCDBOSE297BRWGS1.tsv -I CVG/SHPCRFREEWGS.tsv -I CVG/GDN122BL.tsv -I CVG/GDN22BL.tsv -I CVG/GDN33BL.tsv -I CVG/GDN91BL.tsv -I CVG/MCDBOSE304BR.tsv -I CVG/UTH0002BR.tsv --contig-ploidy-calls ploidy-model --annotated-intervals grch38.annotated.tsv --interval-merging-rule OVERLAPPING_ONLY --output CNV --output-prefix CNV --verbosity DEBUG
-
Hi Sayal Guirales,
Thank you for writing to the GATK forum! I hope that we can help you sort this out.
I forwarded the issue you are encountering to our developers; they have some initial thoughts on its origin. There may be a mismatch between the python environment and the GATK jar versions.
The first thing you could try is ensuring that the GATK version your Python environment is using is synced with the version of GATK that you are using overall. Both should be consistent for GermlineCNVCaller (4.2.5.0) to work.
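To illustrate the kind of mismatch described here, the following is a minimal sketch of how argparse reports "unrecognized arguments" when a caller passes flags that the parser on the Python side does not define. The helper names (build_parser, unrecognized) and the shortened flag lists are made up for illustration; this is not the actual gcnvkernel parser, only a stand-in that reproduces the symptom.

```python
import argparse


def build_parser(known_flags):
    """Build a parser that knows only the given flags (hypothetical stand-in)."""
    parser = argparse.ArgumentParser(prog="cohort_denoising_calling.py")
    for flag in known_flags:
        parser.add_argument(flag, type=str)
    return parser


def unrecognized(known_flags, argv):
    """Return the arguments the parser does not recognize, without exiting."""
    _, unknown = build_parser(known_flags).parse_known_args(argv)
    return unknown


# A few flags the Python side accepts (subset of the usage text in the log).
older_script_flags = ["--max_bias_factors", "--p_alt", "--max_copy_number"]

# A few arguments the Java launcher passes (subset of the command line in the log).
launcher_argv = [
    "--random_seed=1984",
    "--max_bias_factors=5",
    "--num_samples_copy_ratio_approx=200",
    "--p_alt=1.000000e-06",
]

if __name__ == "__main__":
    # Lists the two flags that the parser has no definition for.
    print(unrecognized(older_script_flags, launcher_argv))
```

With parse_args instead of parse_known_args, the same input would make the script exit with status 2, which matches the "python exited with 2" line in the log.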
I hope this helps! Please let me know what you find. If any other questions come up in the meantime, please do not hesitate to reach out.
Best,
Anthony
-
Hi Sayal Guirales,
We haven’t heard from you in a while so we will be closing out your ticket in our system. If you still require assistance, you need only respond to this thread, and we’ll create a follow-up ticket to pick up where we left off.
Thank you again for contributing to our GATK forum!
Best,
Anthony
-
Hi Anthony,
I have updated both the GATK version and the Python environment to the latest versions and the same error persists when running GermlineCNVCaller.
-
Hi Sayal Guirales,
I’m sorry to hear that you are still having trouble! I brought this issue back to our developers, and I have some next steps to try out.
Firstly, could you please clarify what exactly you did to update your Python environment? Please provide the exact command(s) you used to switch/update Python environments.
If you haven’t already, we recommend creating the environment from the Conda .yml file. Exact instructions are in the README.md on our GitHub; the easiest way to find them is to use Command/Control+F and search for "Python Dependencies."
Please give this a try! If you are still having trouble after updating with the Conda command, please respond with the method you used to update, and we will figure out our next steps.
Best,
Anthony
-
Anthony,
1. Installed GATK4 using gatk-4.2.6.1.zip from release archive (https://github.com/broadinstitute/gatk/releases)
2. Created conda environment using: conda env create -f gatkcondaenv.yml
3. Added gcnvkernel to run GermlineCNVCaller using: conda install -c bioconda gcnvkernel
Python version in gatk conda environment is 3.6.10. Error persists even with these updates.
-
Hi Sayal Guirales,
I see two issues here:
- You do not appear to be actually activating the GATK conda environment via: conda activate gatk
- You should not be installing the gcnvkernel from an external source like bioconda -- that will almost certainly mismatch the GATK version you're using! The official GATK environment comes with the gcnvkernel. You don't need to install it from a third-party source.
Hope this helps,
David
-
Hi David,
Sorry for the confusion. I did activate the gatk conda environment prior to running the GermlineCNVCaller.
Also, the gatk conda environment included in the gatkcondaenv.yml file does not include the gcnvkernel. I ran the program prior to obtaining the gcnvkernel from bioconda and received this message:
"java.lang.RuntimeException: A required Python package ("gcnvkernel") could not be imported into the Python environment. This tool requires that the GATK Python environment is properly established and activated."
This led me to trying to obtain the gcnvkernel externally.
-Sayal
-
Hi Sayal Guirales,
After activating the GATK conda environment (the one in gatkcondaenv.yml), please run the command:
pip install gatkPythonPackageArchive.zip
This zip file is distributed with GATK, and contains the gcnvkernel and other Python packages that are part of GATK.
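Before installing, it can help to confirm what the archive actually contains. The sketch below lists the top-level entries of a zip file using the standard zipfile module. It assumes gatkPythonPackageArchive.zip sits in the current directory, and the helper name top_level_names is made up for this example.

```python
import os
import zipfile


def top_level_names(zip_path):
    """Return the sorted top-level files and directories inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted({name.split("/")[0] for name in archive.namelist() if name})


if __name__ == "__main__":
    # Assumed location; adjust to wherever the GATK release was unpacked.
    archive_path = "gatkPythonPackageArchive.zip"
    if os.path.exists(archive_path):
        print(top_level_names(archive_path))
    else:
        print("archive not found:", archive_path)
```

If gcnvkernel does not appear in the listing, the archive being installed is probably not the one shipped with the GATK release.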
Please let me know if that resolves your issue!
David
-
Hi David,
This did not solve the issue. I created a new conda environment from the gatkcondaenv.yml file. The stdout from the environment creation states that gatkPythonPackageArchive.zip is used. To be certain, I also ran the pip install of that zip file. I again get the error that gcnvkernel cannot be found or is not available.
-Sayal
-
Sayal Guirales, sorry you're having these issues with the conda env. As mentioned above, the original error message you referenced at the start of this thread definitely indicates that the Python code you were running was not in sync with the GATK Java code you were running.
From your most recent message though, it sounds like you're in a state where you get a message saying `gcnvkernel` is not available. If so, I would suggest trying to run python (just type "python" at the command prompt, from within the same activated gatk conda environment that you run gatk), and then type "import gcnvkernel" and then enter/return at the python prompt, and see if you get the same message.
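The same check can be run as a small script, which is convenient on a cluster where an interactive prompt is awkward. This is only a sketch: try_import is a made-up helper name, and it reports the interpreter path alongside the import result so it is clear which Python was actually used.

```python
import importlib
import sys
import traceback


def try_import(module_name):
    """Return (True, module file) on success or (False, traceback text) on failure."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any failure while importing is returned as text instead of raised.
        return False, traceback.format_exc()
    return True, getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    ok, detail = try_import("gcnvkernel")
    print("import succeeded" if ok else "import failed")
    print(detail)
```

Run it with the same "python" that is on the PATH when gatk is launched, from inside the activated environment.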
Also, can you let us know what platform (OS and hardware) you're running on?
-
Chris, this is the error I received following your steps. This is being done on a university's high-performance computing cluster running Red Hat 8.
Python 3.6.10 | packaged by conda-forge | (default, Apr 24 2020, 16:44:11)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gcnvkernel
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/__init__.py", line 1, in <module>
from pymc3 import __version__ as pymc3_version
File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/pymc3/__init__.py", line 5, in <module>
from .distributions import *
File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/pymc3/distributions/__init__.py", line 1, in <module>
from . import timeseries
File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/pymc3/distributions/timeseries.py", line 5, in <module>
from .continuous import get_tau_sd, Normal, Flat
File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/pymc3/distributions/continuous.py", line 12, in <module>
from scipy import stats
File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/scipy/stats/__init__.py", line 345, in <module>
from .morestats import *
File "/nas/longleaf/home/sguirale/miniconda3/envs/gatk/lib/python3.6/site-packages/scipy/stats/morestats.py", line 12, in <module>
from numpy.testing.decorators import setastest
ModuleNotFoundError: No module named 'numpy.testing.decorators'
-
Ok, that looks like the wrong version of numpy is present, or maybe some underlying dependency has changed out from under us. Can you try (from within python, within the GATK conda env):
import numpy
print(numpy.__version__)
import scipy
print(scipy.__version__)
and let us know what versions are displayed.
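As a script, the version check might look like the sketch below. The helper name module_version is made up for this example; it returns None when a package cannot be imported at all, so a broken install shows up as a missing version rather than a crash.

```python
import importlib


def module_version(module_name):
    """Return the module's __version__, "unknown" if it has none, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("numpy", "scipy"):
        print(name, module_version(name))
```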
-
Chris,
(gatk) python
Python 3.6.10 | packaged by conda-forge | (default, Apr 24 2020, 16:44:11)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> print(numpy.__version__)
1.19.5
>>> import scipy
>>> print(scipy.__version__)
1.0.0
>>>
-
Ok, thanks. I think that illustrates the problem - you're running version 1.19.5 of numpy, which is newer than the version GATK requires (you can see in the gatkcondaenv.yml file, where 1.17.5 is specified, with a comment saying newer versions don't work):
- conda-forge::numpy=1.17.5 # do not update, this will break scipy=1.0.0
It's hard for me to speculate about why you have the wrong version. The gatk conda environment should have 1.17.5. Are you absolutely certain that you're running in a pure/unmodified gatk conda environment, and that nothing has been installed over it?
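The comparison against the pinned version can be sketched in a few lines. The parser below handles only simple dependency lines of the form quoted above (optional channel prefix, single "=" pin, trailing comment). It is an illustration with made-up helper names (parse_pin, matches_pin), not a general conda spec parser.

```python
def parse_pin(line):
    """Parse a line like '- conda-forge::numpy=1.17.5  # note' into (name, version)."""
    spec = line.split("#", 1)[0].strip()      # drop the trailing comment
    if spec.startswith("-"):
        spec = spec[1:].strip()               # drop the YAML list marker
    spec = spec.split("::", 1)[-1]            # drop an optional channel prefix
    name, _, version = spec.partition("=")
    version = version.lstrip("=").split("=")[0].strip()  # ignore any build string
    return name.strip(), version or None


def matches_pin(installed_version, pinned_version):
    """Return True only when the installed version is exactly the pinned one."""
    return installed_version == pinned_version


if __name__ == "__main__":
    pin_line = "- conda-forge::numpy=1.17.5 # do not update, this will break scipy=1.0.0"
    name, pinned = parse_pin(pin_line)
    # 1.19.5 is the version reported from the Python prompt earlier in the thread.
    print(name, pinned, matches_pin("1.19.5", pinned))
```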
-
Chris,
You are correct. Within the conda environment I do have:
- numpy 1.17.5 py36h2aa4a07_1 conda-forge
Outside of the conda environment I have numpy 1.19.5 in my normal python environment. The program seems to be using my python environment instead of the conda python environment.
I was able to get around this whole issue by using the Docker image; the program then ran successfully.
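For anyone hitting the same symptom without Docker, one way to see which installation wins is to check whether the imported numpy lives inside the active environment. The sketch below is hedged: the thread does not establish why the outside numpy was picked up, the helper name is_under is made up, and the script assumes CONDA_PREFIX is set by "conda activate" (falling back to sys.prefix otherwise).

```python
import os
import sys


def is_under(path, prefix):
    """Return True if path is located inside the directory tree rooted at prefix."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


if __name__ == "__main__":
    env_prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    print("interpreter:", sys.executable)
    print("environment prefix:", env_prefix)
    try:
        import numpy
    except ImportError:
        print("numpy is not importable from this interpreter")
    else:
        print("numpy", numpy.__version__, "from", numpy.__file__)
        print("inside environment:", is_under(numpy.__file__, env_prefix))
```

If the last line prints False, the interpreter is importing a numpy from outside the environment, for example from a per-user or system-wide install.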