Error in GATK-SV joint-calling Terra pipeline, 07-FilterBatchSites step
REQUIRED for all errors and issues:
a) GATK version used: GATK-SV v1.0
b) Exact command used: Terra workspace 07-FilterBatchSites
c) Entire program log:
Hello, I am currently trying to do a pilot run using the GATK-SV pipeline in cohort mode on Terra. I posted this issue on the Terra support forum and am cross-posting it here. I'm advancing through the pipeline using the pre-configured settings and inputs, but I'm encountering an error at step 07-FilterBatchSites.
The error message is:
Adjudicating BAF (1)...
Traceback (most recent call last):
  File "/opt/conda/envs/gatk-sv/bin/svtk", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/opt/svtk/scripts/svtk", line 65, in <module>
    main()
  File "/opt/svtk/scripts/svtk", line 62, in main
    getattr(cli, command)(sys.argv[2:])
  File "/opt/svtk/svtk/cli/adjudicate.py", line 33, in main
    scores, cutoffs = adjudicate_SV(metrics)
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 342, in adjudicate_SV
    cutoffs[0] = adjudicate_BAF1(metrics)
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 67, in adjudicate_BAF1
    cutoffs = adjudicate_BAF(
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 34, in adjudicate_BAF
    del_cutoffs = rf_classify(metrics, trainable, testable, features,
  File "/opt/svtk/svtk/adjudicate/random_forest.py", line 19, in rf_classify
    rf = RandomForest(trainable, testable, features, cutoffs, labeler, name,
  File "/opt/svtk/svtk/adjudicate/random_forest.py", line 44, in __init__
    raise Exception('No clean variants found')
Exception: No clean variants found
Digging a little deeper, this is caused by the batch metrics file generated in the previous steps missing these two columns: BAF_snp_ratio and BAF_del_loglik. I think other columns are missing as well, but these two directly caused this error. I'm not sure if this is a bug or if I'm inputting something wrong. I'm still getting used to Terra and understanding the pipeline, so I appreciate any help. Thanks!
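For anyone hitting the same error, a quick way to confirm the diagnosis above is to inspect the metrics file header before rerunning. The sketch below is only an illustration: it assumes the batch metrics file is tab-delimited with a single header row, and the function names (missing_columns, count_usable_rows) and the toy table are hypothetical, not part of svtk.

```python
import csv
import io

# Columns named in the post; the real metrics file may need others too.
REQUIRED_COLUMNS = ("BAF_snp_ratio", "BAF_del_loglik")


def missing_columns(metrics_text, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the header row."""
    reader = csv.reader(io.StringIO(metrics_text), delimiter="\t")
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


def count_usable_rows(metrics_text, required=REQUIRED_COLUMNS):
    """Count rows where every required column exists and is not empty/NA."""
    reader = csv.DictReader(io.StringIO(metrics_text), delimiter="\t")
    usable = 0
    for row in reader:
        values = [row.get(name) for name in required]
        if all(v not in (None, "", "NA", "nan") for v in values):
            usable += 1
    return usable


# Toy metrics table (made-up values) with one of the two columns missing.
example = "name\tsvtype\tBAF_snp_ratio\nvar1\tDEL\t0.42\n"
print(missing_columns(example))    # ['BAF_del_loglik']
print(count_usable_rows(example))  # 0
```

To use it on a real file, read the file contents into a string first (decompressing it if needed) and pass that string to both functions.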
-
Hi Dong Wang
I believe this question has already been answered in the GitHub issues. This step requires more samples to produce a result.
Regards.
-
Yes, thanks for following up! I am currently re-running the pipeline with more samples. If anything arises, would the Terra support forum be a more efficient place to troubleshoot than here?
-
Hi again.
You can definitely post here as well. GATK-SV team members will gladly take your questions.
Regards.
-
Hello, I'm running into a new error and cross-posting here again. Running batches of 100 samples fixed my previous issue.
Now at step 18-SVConcordance, I'm running into this error:
htsjdk.tribble.TribbleException$InternalCodecException: The allele with index 2 is not defined in the REF/ALT columns in the record
The VCF causing this is the output from step 16-RefineComplexVariants, named cpx_refined.vcf.gz. What does this error indicate? Thanks for your help!
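Read literally, the message says that a genotype refers to allele index 2 while the record's REF/ALT columns define fewer than three alleles. A minimal sketch for locating such records is shown below. It parses plain tab-separated VCF data lines without htsjdk, the function name undefined_allele_indices is hypothetical, and the example record is made up for illustration.

```python
def undefined_allele_indices(vcf_line):
    """Return sorted allele indices used in GT fields but not defined by REF/ALT."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return []  # no FORMAT/sample columns to check
    alt = fields[4]
    # Index 0 is REF; each comma-separated ALT entry adds one more index.
    n_alleles = 1 if alt in (".", "") else 1 + len(alt.split(","))
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return []
    gt_pos = format_keys.index("GT")
    bad = set()
    for sample in fields[9:]:
        parts = sample.split(":")
        if gt_pos >= len(parts):
            continue
        # Treat phased and unphased separators alike; skip missing calls (".").
        for allele in parts[gt_pos].replace("|", "/").split("/"):
            if allele.isdigit() and int(allele) >= n_alleles:
                bad.add(int(allele))
    return sorted(bad)


# Made-up record: one ALT allele (indices 0 and 1), but a sample uses index 2.
record = "chr1\t1000\tcpx_1\tN\t<CPX>\t.\tPASS\tSVTYPE=CPX\tGT\t0/1\t1/2"
print(undefined_allele_indices(record))  # [2]
```

Running this over the non-header lines of the decompressed VCF would list the records that trigger the exception, which can help when reporting the problem.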
-
Hi Dong Wang
This could be a bug in one of the tools that generate the target VCF. I will relay this to the GATK-SV team so they can check it out.
-
Hi Gökalp Çelik, sounds good, please let me know who/where I should follow up with!
-
Hi Dong Wang
Here is the response from the GATK-SV team. This is a known issue, and it is fixed in a later version of the Docker image. They recommend updating the sv_pipeline_docker variable to us.gcr.io/broad-dsde-methods/gatk-sv/sv-pipeline:2025-02-10-v1.0.2-72c15c6b and rerunning from CleanVcf onward.
The developers also mentioned that the live workspace has already been updated, so you may need to check your workspace configuration against the working configuration of the live GATK-SV workspace.
Regards.
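One way to do the configuration check suggested above is to compare the two sets of workspace variables key by key. The sketch below is only an illustration: it assumes the variables have already been copied by hand into two Python dictionaries, the function name diff_config is hypothetical, and the value shown for the user's workspace is a placeholder rather than a real image tag.

```python
def diff_config(mine, reference):
    """Return {key: (my_value, reference_value)} for every key whose values differ."""
    keys = sorted(set(mine) | set(reference))
    return {
        key: (mine.get(key), reference.get(key))
        for key in keys
        if mine.get(key) != reference.get(key)
    }


# Placeholder for whatever image the cloned workspace currently points at.
mine = {"sv_pipeline_docker": "<image currently set in my workspace>"}
# Image recommended by the GATK-SV team in this thread.
reference = {
    "sv_pipeline_docker": "us.gcr.io/broad-dsde-methods/gatk-sv/sv-pipeline:2025-02-10-v1.0.2-72c15c6b"
}

for key, (old, new) in diff_config(mine, reference).items():
    print(f"{key}: {old} -> {new}")
```

Keys present in only one of the two dictionaries are reported with None on the other side, so variables added or removed in the live workspace show up as well.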
-
Hi Gökalp Çelik, I appreciate your quick response! I will retry the steps with the updated Docker image. It seems like the pipeline is being actively updated to address breaking bugs like this. In the future, should I run the pipeline with the default v1.0 code or use whichever version is most up to date (currently v1.0.2 for most workflows)? Thanks for your help.
-
Hi again.
Using the most up-to-date code would be best.
Regards.