Can't generate ploidy-calls directory and ploidy-calls/SAMPLE_0 when using DetermineGermlineContigPloidy
According to the tutorial "(How to) Call common and rare germline copy number variants – GATK (broadinstitute.org)", running the DetermineGermlineContigPloidy command should generate two directories, ploidy-calls and ploidy-model. However, I only get the ploidy-model directory, which contains two files, contig_ploidy_prior.tsv and interval_list.tsv. I don't know why the other files are missing, and because of this I can't use the PostprocessGermlineCNVCalls command to get my results.
"A USER ERROR has occurred: Bad input: Could not read the sample name text file at /CNV/gatk_gCNV/POF_igenetech/temp/baseline/baseline-calls/SAMPLE_0/sample_name.txt."
Does anyone know what's wrong?
REQUIRED for all errors and issues:
a) GATK version used:4.2.1
b) Exact command used:
# I use Perl to assemble the input arguments:
my $ploidy = "$outputdir/contig_ploidy.tsv";  # I generated this file following the instruction document.
my $command = get_array();
sub get_array {
    my %bam = get_bam($inputdir);
    my @sample_array;
    # Add one "-I <read-count file>" argument for each sample that has a counts file.
    foreach my $sample (sort { $a cmp $b } keys %bam) {
        if (-e "$rd/$sample.tsv") {
            my $sample_list = "-I\t$rd/$sample.tsv";
            push(@sample_array, $sample_list);
        }
    }
    my $command = join "\t", @sample_array;
    print "$command\n";  # print before returning; a statement placed after return never runs
    return $command;
}
system "gatk DetermineGermlineContigPloidy $command --interval-merging-rule OVERLAPPING_ONLY --contig-ploidy-priors $ploidy -L target.gcfiltered.interval_list -O ploidy_calls --output-prefix ploidy --verbosity DEBUG";
# I run this in my conda environment. I installed GATK with the following commands, and the environment includes the gcnvkernel module.
wget https://github.com/broadinstitute/gatk/releases/download/4.2.1.0/gatk-4.2.1.0.zip
unzip gatk-4.2.1.0.zip
conda env create -n gatk -f gatkcondaenv.yml
source activate gatk
My ploidy file:
CONTIG PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
1 0.01 0.01 0.97 0.01
2 0.01 0.01 0.97 0.01
3 0.01 0.01 0.97 0.01
4 0.01 0.01 0.97 0.01
5 0.01 0.01 0.97 0.01
6 0.01 0.01 0.97 0.01
7 0.01 0.01 0.97 0.01
8 0.01 0.01 0.97 0.01
9 0.01 0.01 0.97 0.01
10 0.01 0.01 0.97 0.01
11 0.01 0.01 0.97 0.01
12 0.01 0.01 0.97 0.01
13 0.01 0.01 0.97 0.01
14 0.01 0.01 0.97 0.01
15 0.01 0.01 0.97 0.01
16 0.01 0.01 0.97 0.01
17 0.01 0.01 0.97 0.01
18 0.01 0.01 0.97 0.01
19 0.01 0.01 0.97 0.01
20 0.01 0.01 0.97 0.01
21 0.01 0.01 0.97 0.01
22 0.01 0.01 0.97 0.01
X 0.01 0.49 0.49 0.01
Y 0.50 0.50 0.00 0.00
I tried changing the CONTIG_NAME header to CONTIG, but it didn't work.
c) Entire program log:
20:06:19.183 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/gatk-4.2.1.0/gatk-package-4.2.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Mar 17, 2023 8:06:19 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
20:06:19.360 INFO DetermineGermlineContigPloidy - ------------------------------------------------------------
20:06:19.360 INFO DetermineGermlineContigPloidy - The Genome Analysis Toolkit (GATK) v4.2.1.0
20:06:19.361 INFO DetermineGermlineContigPloidy - For support and documentation go to https://software.broadinstitute.org/gatk/
20:06:19.361 INFO DetermineGermlineContigPloidy - Executing as liushuang@ZhangLab on Linux v3.10.0-957.27.2.el7.x86_64 amd64
20:06:19.361 INFO DetermineGermlineContigPloidy - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_332-b09
20:06:19.361 INFO DetermineGermlineContigPloidy - Start Date/Time: March 17, 2023 8:06:19 PM CST
20:06:19.361 INFO DetermineGermlineContigPloidy - ------------------------------------------------------------
20:06:19.361 INFO DetermineGermlineContigPloidy - ------------------------------------------------------------
20:06:19.361 INFO DetermineGermlineContigPloidy - HTSJDK Version: 2.24.1
20:06:19.361 INFO DetermineGermlineContigPloidy - Picard Version: 2.25.4
20:06:19.362 INFO DetermineGermlineContigPloidy - Built for Spark Version: 2.4.5
20:06:19.362 INFO DetermineGermlineContigPloidy - HTSJDK Defaults.COMPRESSION_LEVEL : 2
20:06:19.362 INFO DetermineGermlineContigPloidy - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
20:06:19.362 INFO DetermineGermlineContigPloidy - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
20:06:19.362 INFO DetermineGermlineContigPloidy - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
20:06:19.362 INFO DetermineGermlineContigPloidy - Deflater: IntelDeflater
20:06:19.362 INFO DetermineGermlineContigPloidy - Inflater: IntelInflater
20:06:19.362 INFO DetermineGermlineContigPloidy - GCS max retries/reopens: 20
20:06:19.362 INFO DetermineGermlineContigPloidy - Requester pays: disabled
20:06:19.362 INFO DetermineGermlineContigPloidy - Initializing engine
20:06:23.778 INFO DetermineGermlineContigPloidy - Done initializing engine
20:06:23.798 INFO DetermineGermlineContigPloidy - No contig-ploidy model was provided, running in cohort mode...
20:06:23.798 INFO DetermineGermlineContigPloidy - Intervals specified...
20:06:24.513 INFO FeatureManager - Using codec IntervalListCodec to read file file:///CNV/gatk_gCNV/POF_igenetech/temp/target.gcfiltered.interval_list
20:06:25.636 INFO IntervalArgumentCollection - Processing 199136917 bp from intervals
20:06:25.897 INFO DetermineGermlineContigPloidy - Validating and aggregating coverage per contig from input read-count files...
20:06:25.971 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1002.tsv (1 / 1000)
20:06:26.338 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1003.tsv (2 / 1000)
20:06:26.707 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1004.tsv (3 / 1000)
20:06:27.120 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1005.tsv (4 / 1000)
20:06:27.577 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1007.tsv (5 / 1000)
20:06:28.119 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1008.tsv (6 / 1000)
20:06:28.583 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1009.tsv (7 / 1000)
20:06:28.928 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1010.tsv (8 / 1000)
20:06:29.339 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1011.tsv (9 / 1000)
20:06:29.682 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1013.tsv (10 / 1000)
20:06:30.065 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1014.tsv (11 / 1000)
20:06:30.463 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1015.tsv (12 / 1000)
20:06:30.871 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1017.tsv (13 / 1000)
20:06:31.281 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1019.tsv (14 / 1000)
20:06:31.678 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1023.tsv (15 / 1000)
20:06:32.004 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1025.tsv (16 / 1000)
20:06:32.379 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1027.tsv (17 / 1000)
20:06:32.766 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1028.tsv (18 / 1000)
20:06:33.252 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1029.tsv (19 / 1000)
20:06:33.712 INFO DetermineGermlineContigPloidy - Aggregating read-count file /CNV/gatk_gCNV/POF_igenetech/rd/POF1030.tsv (20 / 1000)
...complete 1000 samples
20:12:00.838 INFO DetermineGermlineContigPloidy - Shutting down engine
[March 17, 2023 8:12:00 PM CST] org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy done. Elapsed time: 5.69 minutes.
Runtime.totalMemory()=1930952704
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 1
Command Line: python /tmp/cohort_determine_ploidy_and_depth.6069775880962909508.py --sample_coverage_metadata=/tmp/samples-by-coverage-per-contig2810132599048685450.tsv --output_calls_path=/CNV/gatk_gCNV/POF_igenetech/temp/ploidy_calls/ploidy-calls --mapping_error_rate=1.000000e-02 --psi_s_scale=1.000000e-04 --mean_bias_sd=1.000000e-02 --psi_j_scale=1.000000e-03 --learning_rate=5.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.990000e-01 --log_emission_samples_per_round=2000 --log_emission_sampling_rounds=100 --log_emission_sampling_median_rel_error=5.000000e-04 --max_advi_iter_first_epoch=1000 --max_advi_iter_subsequent_epochs=1000 --min_training_epochs=20 --max_training_epochs=100 --initial_temperature=2.000000e+00 --num_thermal_advi_iters=5000 --convergence_snr_averaging_window=5000 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=1 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=7.500000e-01 --disable_caller=false --disable_sampler=false --disable_annealing=false --interval_list=/tmp/intervals766529649720358643.tsv --contig_ploidy_prior_table=/CNV/gatk_gCNV/POF_igenetech/contig_ploidy.tsv --output_model_path=/CNV/gatk_gCNV/POF_igenetech/temp/ploidy_calls/ploidy-model
Stdout: 20:12:00.475 INFO cohort_determine_ploidy_and_depth - THEANO_FLAGS environment variable has been set to: device=cpu,floatX=float64,optimizer=fast_run,compute_test_value=ignore,openmp=true,blas.ldflags=-lmkl_rt,openmp_elemwise_minsize=10
Stderr: Traceback (most recent call last):
File "/tmp/cohort_determine_ploidy_and_depth.6069775880962909508.py", line 101, in <module>
args.contig_ploidy_prior_table)
File "/home/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_ploidy.py", line 188, in get_contig_ploidy_prior_map_from_tsv_file
delimiter=delimiter)
File "/home/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_commons.py", line 57, in read_csv
assert_mandatory_columns(dtypes_dict_keys_set, found_columns_set, input_file)
File "/home/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_commons.py", line 599, in assert_mandatory_columns
"cannot continue: {1}".format(input_tsv_file, not_found_set)
AssertionError: The following mandatory columns could not be found in "/CNV/gatk_gCNV/POF_igenetech/contig_ploidy.tsv"; cannot continue: {'CONTIG_NAME'}
at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:112)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:193)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:168)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:139)
at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.executeDeterminePloidyAndDepthPythonScript(DetermineGermlineContigPloidy.java:424)
at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.doWork(DetermineGermlineContigPloidy.java:321)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Hi Chipmunks, can you double check the formatting of your contig-ploidy priors table file? For instance, please ensure that it is tab separated. The first column header should also be `CONTIG_NAME` rather than `CONTIG` (I see that you "tried to edit the CONTIG_NAME to CONTIG", but note that `CONTIG_NAME` is indeed correct).
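Both points above (tab separation and a `CONTIG_NAME` first header) can be checked before launching the tool. The following is only a minimal sketch of such a check: `check_priors_table` is a hypothetical helper written for this thread, not part of GATK or gcnvkernel, and it assumes the table has no comment lines and that every column after the first holds a numeric prior.

```python
REQUIRED_FIRST_COLUMN = "CONTIG_NAME"  # header named in the AssertionError in the log


def check_priors_table(text):
    """Return a list of problems found in the text of a contig-ploidy priors table.

    Hypothetical helper for illustration only; it checks tab separation,
    the first header name, and that the prior values parse as numbers.
    """
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]

    header = lines[0].split("\t")
    if header[0] != REQUIRED_FIRST_COLUMN:
        problems.append(
            "first header field is %r, expected %r" % (header[0], REQUIRED_FIRST_COLUMN)
        )
    if len(header) < 2:
        problems.append("header has a single field; the file is probably not tab-separated")

    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(
                "line %d: expected %d tab-separated fields, found %d"
                % (number, len(header), len(fields))
            )
            continue
        try:
            [float(value) for value in fields[1:]]
        except ValueError:
            problems.append("line %d: non-numeric prior value" % number)
    return problems
```

Passing the contents of the priors file to this function and printing the returned list gives a quick report; an empty list means none of these particular problems were found, not that the file is guaranteed to be accepted by the tool.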
You might find other helpful discussion if you search the forum, e.g. https://gatk.broadinstitute.org/hc/en-us/community/posts/360074399831-What-is-contig-ploidy-priors-table-and-how-to-make-it-
Hope this helps!
-
Hi Samuel Lee, thanks for your suggestions.
You are right, my contig ploidy file was not tab-separated. I changed the spaces to tabs and it works!
Thank you very much!
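For anyone hitting the same error, the fix described in this thread (spaces replaced by tabs, first header set to `CONTIG_NAME`) can be sketched in a few lines of Python. This is an illustrative sketch only: `to_tab_separated` is a hypothetical name, and it assumes that no field in the table contains whitespace and that the first non-empty line is the header.

```python
def to_tab_separated(text, first_header="CONTIG_NAME"):
    """Rewrite a whitespace-delimited priors table as tab-delimited text.

    Hypothetical helper for illustration. Runs of spaces or tabs between
    fields are collapsed to a single tab, blank lines are dropped, and the
    first field of the header line is set to `first_header`.
    """
    out = []
    for line in text.splitlines():
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue  # skip blank lines
        if not out:
            fields[0] = first_header  # first non-empty line is treated as the header
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

Reading the original file, passing its contents through this function, and writing the result to a new file leaves the original untouched, which makes it easy to compare the two before rerunning DetermineGermlineContigPloidy.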