Assistance Required with DetermineGermlineContigPloidy in GATK 4.6
Dear GATK Team,
I am currently using the GATK pipeline for germline copy number variant analysis and have encountered an issue during the third step, specifically with the DetermineGermlineContigPloidy tool.
I was previously advised to use version 4.6, and I noticed a similar issue mentioned in another post, but for an earlier version. You suggested adjusting the hyperparameters, but I am uncertain about which specific changes to make. I would greatly appreciate any guidance you could provide to help resolve this issue.
16:48:10.463 DEBUG ScriptExecutor - Executing:
16:48:10.463 DEBUG ScriptExecutor - python
16:48:10.463 DEBUG ScriptExecutor - /tmp/cohort_determine_ploidy_and_depth.5504500954945608088.py
16:48:10.463 DEBUG ScriptExecutor - --sample_coverage_metadata=/tmp/samples-by-coverage-per-contig6012447752332097455.tsv
16:48:10.463 DEBUG ScriptExecutor - --output_calls_path=/gpfs0/biores/CNV_Plates/Plates_9_24/ploidy_Plates924-calls
16:48:10.463 DEBUG ScriptExecutor - --mapping_error_rate=3.000000e-01
16:48:10.463 DEBUG ScriptExecutor - --psi_s_scale=1.000000e-04
16:48:10.463 DEBUG ScriptExecutor - --mean_bias_sd=1.000000e+00
16:48:10.463 DEBUG ScriptExecutor - --psi_j_scale=1.000000e-03
16:48:10.463 DEBUG ScriptExecutor - --learning_rate=5.000000e-02
16:48:10.463 DEBUG ScriptExecutor - --adamax_beta1=9.000000e-01
16:48:10.463 DEBUG ScriptExecutor - --adamax_beta2=9.990000e-01
16:48:10.463 DEBUG ScriptExecutor - --log_emission_samples_per_round=2000
16:48:10.463 DEBUG ScriptExecutor - --log_emission_sampling_rounds=100
16:48:10.463 DEBUG ScriptExecutor - --log_emission_sampling_median_rel_error=5.000000e-04
16:48:10.463 DEBUG ScriptExecutor - --max_advi_iter_first_epoch=1000
16:48:10.463 DEBUG ScriptExecutor - --max_advi_iter_subsequent_epochs=1000
16:48:10.463 DEBUG ScriptExecutor - --min_training_epochs=20
16:48:10.463 DEBUG ScriptExecutor - --max_training_epochs=100
16:48:10.463 DEBUG ScriptExecutor - --initial_temperature=2.000000e+00
16:48:10.463 DEBUG ScriptExecutor - --num_thermal_advi_iters=5000
16:48:10.463 DEBUG ScriptExecutor - --convergence_snr_averaging_window=5000
16:48:10.463 DEBUG ScriptExecutor - --convergence_snr_trigger_threshold=1.000000e-01
16:48:10.463 DEBUG ScriptExecutor - --convergence_snr_countdown_window=10
16:48:10.463 DEBUG ScriptExecutor - --max_calling_iters=1
16:48:10.463 DEBUG ScriptExecutor - --caller_update_convergence_threshold=1.000000e-03
16:48:10.463 DEBUG ScriptExecutor - --caller_internal_admixing_rate=7.500000e-01
16:48:10.463 DEBUG ScriptExecutor - --caller_external_admixing_rate=7.500000e-01
16:48:10.463 DEBUG ScriptExecutor - --disable_caller=false
16:48:10.463 DEBUG ScriptExecutor - --disable_sampler=false
16:48:10.463 DEBUG ScriptExecutor - --disable_annealing=false
16:48:10.463 DEBUG ScriptExecutor - --interval_list=/tmp/intervals3379844300857658485.tsv
16:48:10.463 DEBUG ScriptExecutor - --contig_ploidy_prior_table=/gpfs0/biores/CNV_Plates/CNV_tools/contig_ploidy_priors.tsv
16:48:10.463 DEBUG ScriptExecutor - --output_model_path=/gpfs0/biores/projects/CNV_Plates/Plates_9_24/ploidy_Plates924-model
Traceback (most recent call last):
File "/tmp/cohort_determine_ploidy_and_depth.5504500954945608088.py", line 125, in <module>
ploidy_task.engage()
File "/gpfs0/biores/users/gatk/lib/python3.6/site-packages/gcnvkernel/tasks/inference_task_base.py", line 346, in engage
converged_continuous = self._update_continuous_posteriors()
File "/gpfs0/biores/users/gatk/lib/python3.6/site-packages/gcnvkernel/tasks/inference_task_base.py", line 403, in _update_continuous_posteriors
raise ConvergenceError
gcnvkernel.tasks.inference_task_base.ConvergenceError
16:50:48.557 DEBUG ScriptExecutor - Result: 1
16:50:48.558 INFO DetermineGermlineContigPloidy - Shutting down engine
[September 21, 2024 at 4:50:48 PM IDT] org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy done. Elapsed time: 8.57 minutes.
Runtime.totalMemory()=1224736768
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 1
Command Line: python /tmp/cohort_determine_ploidy_and_depth.5504500954945608088.py --sample_coverage_metadata=/tmp/samples-by-coverage-per-contig6012447752332097455.tsv --output_calls_path=/gpfs0/biores/CNV_Plates/Plates_9_24/ploidy_Plates924-calls --mapping_error_rate=3.000000e-01 --psi_s_scale=1.000000e-04 --mean_bias_sd=1.000000e+00 --psi_j_scale=1.000000e-03 --learning_rate=5.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.990000e-01 --log_emission_samples_per_round=2000 --log_emission_sampling_rounds=100 --log_emission_sampling_median_rel_error=5.000000e-04 --max_advi_iter_first_epoch=1000 --max_advi_iter_subsequent_epochs=1000 --min_training_epochs=20 --max_training_epochs=100 --initial_temperature=2.000000e+00 --num_thermal_advi_iters=5000 --convergence_snr_averaging_window=5000 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=1 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=7.500000e-01 --disable_caller=false --disable_sampler=false --disable_annealing=false --interval_list=/tmp/intervals3379844300857658485.tsv --contig_ploidy_prior_table=/gpfs0/biores/CNV_Plates/CNV_tools/contig_ploidy_priors.tsv --output_model_path=/gpfs0/biores/CNV_Plates/ploidy_Plates924-model
at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:112)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:193)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:168)
at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:139)
at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.executeDeterminePloidyAndDepthPythonScript(DetermineGermlineContigPloidy.java:427)
at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.doWork(DetermineGermlineContigPloidy.java:324)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:166)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:209)
at org.broadinstitute.hellbender.Main.main(Main.java:306)
Using GATK jar /gpfs0/biores/users/gatk/installation_files/gatk-4.6.0.0/gatk-4.6.0.0/gatk-package-4.6.0.0-local.jar
Thank you for your support.
Best, Shai
-
Hi Shai Casif
Before any hyperparameter adjustments, have you performed any QC on your samples to see if you have any outliers in terms of coverage, evenness of coverage, AT/GC dropout, etc.? Also, are you using human samples or another organism? Finally, are you working only on primary contigs, or do you have chrM and other decoy/unplaced contigs in your analysis?
Let us know about these items and we will be able to help you better.
Regards.
-
Gökalp Çelik Thank you for your reply.
1. I followed the pipeline and performed QC using GATK's CollectReadCounts tool to assess coverage across my samples, and AnnotateIntervals for explicit GC correction. Could you recommend the best way to check for these specific issues using GATK or other tools?
2. I am working with human samples.
3. The ploidy_priors.tsv file I am using for step 3 includes only chr1 through chr22, as well as chrX and chrY. I came across a post suggesting that any other contigs will be skipped by the tool. Could you confirm whether this is the expected behavior, or whether additional steps are required to handle other contigs?
Any additional guidance or recommendations you can provide would be greatly appreciated.
Shai
-
Hi again.
Depending on the type of study (whole-genome or hybrid-selection/exome), we recommend using the CollectWgsMetrics or CollectHsMetrics tool, respectively, to check mean and median depth as well as evenness of coverage, so that you can eliminate outliers from your analysis.
Also, can you share your ploidy priors file with us?
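As a rough illustration of the outlier screening suggested above, here is a minimal Python sketch that flags samples whose mean depth is far from the cohort median. It assumes you have already collected one mean-depth value per sample from the metrics tools; the sample names, depth values, function name, and the 3.5 cutoff are all hypothetical choices for illustration, not GATK recommendations.

```python
from statistics import median


def flag_coverage_outliers(mean_coverage, threshold=3.5):
    """Return sample names whose mean coverage is a robust outlier.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. The 3.5 cutoff is a common convention and an
    assumption here, not something taken from GATK.
    """
    values = list(mean_coverage.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # At least half of the samples sit exactly on the median, so the
        # score is undefined; report nothing rather than divide by zero.
        return []
    return sorted(
        name
        for name, value in mean_coverage.items()
        if abs(0.6745 * (value - center) / mad) > threshold
    )


# Hypothetical per-sample mean depths, for illustration only.
example = {
    "sample_A": 30.0,
    "sample_B": 31.0,
    "sample_C": 29.0,
    "sample_D": 32.0,
    "sample_E": 5.0,
}
print(flag_coverage_outliers(example))
```

A median-based score is used instead of mean and standard deviation because a single very low-coverage sample would otherwise inflate the spread and hide itself.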
-
Hello Gökalp Çelik and thanks again for the response.
This is my ploidy priors file:
CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
chr1 0 0.01 0.98 0.01
chr2 0 0.01 0.98 0.01
chr3 0 0.01 0.98 0.01
chr4 0 0.01 0.98 0.01
chr5 0 0.01 0.98 0.01
chr6 0 0.01 0.98 0.01
chr7 0 0.01 0.98 0.01
chr8 0 0.01 0.98 0.01
chr9 0 0.01 0.98 0.01
chr10 0 0.01 0.98 0.01
chr11 0 0.01 0.98 0.01
chr12 0 0.01 0.98 0.01
chr13 0 0.01 0.98 0.01
chr14 0 0.01 0.98 0.01
chr15 0 0.01 0.98 0.01
chr16 0 0.01 0.98 0.01
chr17 0 0.01 0.98 0.01
chr18 0 0.01 0.98 0.01
chr19 0 0.01 0.98 0.01
chr20 0 0.01 0.98 0.01
chr21 0 0.01 0.98 0.01
chr22 0 0.01 0.98 0.01
chrX 0.01 0.49 0.49 0.01
chrY 0.495 0.495 0.01 0
-
Hi Shai Casif
Your ploidy priors file seems fine. Depending on how your samples are represented for the baseline these values can be adjusted.
Regarding the other part of my question: have you performed additional QC to see whether any of your samples are outliers compared to the rest?
Regards.
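For readers who want to sanity-check a priors table like the one posted in this thread, the following Python sketch verifies that each contig's prior probabilities sum to 1. It is only an illustration: the function name is hypothetical, the parser splits on any whitespace (the original file is presumably tab-separated), and the thread does not say whether GATK itself enforces this property.

```python
import math


def rows_not_summing_to_one(table_text, tol=1e-6):
    """Return contig names whose prior probabilities do not sum to 1."""
    lines = [ln for ln in table_text.strip().splitlines() if ln.strip()]
    bad = []
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        contig = fields[0]
        priors = [float(x) for x in fields[1:]]
        if not math.isclose(sum(priors), 1.0, abs_tol=tol):
            bad.append(contig)
    return bad


# Three representative rows copied from the table in this thread.
PRIORS = """CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3
chr1 0 0.01 0.98 0.01
chrX 0.01 0.49 0.49 0.01
chrY 0.495 0.495 0.01 0
"""

print(rows_not_summing_to_one(PRIORS))
```

An empty result means every row passed the check; any contig names returned point to rows worth re-examining.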