GATK-SV GenotypeBatch: svtk count error
Version Used
- Latest master branch as of [Feb 22 2022]
- https://github.com/broadinstitute/gatk-sv/
I am running the GATK-SV pipeline in single-sample mode (GATKSVPipelineSingleSample.wdl) with the provided test input file (test_GATKSVPipelineSingleSample.ref_panel_1kg.na12878.no_melt.json). I am running it through Cromwell with an HPC backend that I have configured to work with an LSF HPC system (LSF_singu_4.config). The only sample is the NA12878 test sample specified in the GATKSVPipelineSingleSample.ref_panel_1kg.na12878.no_melt.json file on which I based my input file.
Inside the execution script for the call-CountSRBca module (generated by the CountSR task in TasksGenotypeBatch.wdl), my job runs until it reaches the line with the svtk command:
svtk count-sr -s ~{write_lines(samples)} --medianfile ~{medianfile} ~{vcf} local.SR.txt.gz ~{prefix}.sr_counts.txt
The same command with my HPC paths:
svtk count-sr -s /cromwell-executions/GATKSVPipelineSingleSample/1681c241-c1f7-478d-99b1-5b061475c6fb/call-GenotypeBatch/GenotypeBatch/3f0f008b-7cc7-419e-b955-ea7b7590758d/call-GenotypePESRPart2/GenotypePESRPart2/792ae6e5-6583-40ce-b662-3d9f11ad8b00/call-CountSRBca/shard-0/attempt-2/execution/write_lines_316220c6f11b746ff9b6cdff48e70640.tmp --medianfile /cromwell-executions/GATKSVPipelineSingleSample/1681c241-c1f7-478d-99b1-5b061475c6fb/call-GenotypeBatch/GenotypeBatch/3f0f008b-7cc7-419e-b955-ea7b7590758d/call-GenotypePESRPart2/GenotypePESRPart2/792ae6e5-6583-40ce-b662-3d9f11ad8b00/call-CountSRBca/shard-0/attempt-2/inputs/1308181802/test_NA12878_medianCov.transposed.bed /cromwell-executions/GATKSVPipelineSingleSample/1681c241-c1f7-478d-99b1-5b061475c6fb/call-GenotypeBatch/GenotypeBatch/3f0f008b-7cc7-419e-b955-ea7b7590758d/call-GenotypePESRPart2/GenotypePESRPart2/792ae6e5-6583-40ce-b662-3d9f11ad8b00/call-CountSRBca/shard-0/attempt-2/inputs/-1356191980/bca.aaaaaa.vcf.gz local.SR.txt.gz bca.aaaaaa.vcf.gz.sr_counts.txt
The command starts, then exits with a stack trace indicating that a non-finite value (NA or inf) was encountered while converting the counts to integers:
Traceback (most recent call last):
  File "/opt/conda/bin/svtk", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/opt/svtk/scripts/svtk", line 68, in <module>
    main()
  File "/opt/svtk/scripts/svtk", line 65, in main
    getattr(cli, command)(sys.argv[2:])
  File "/opt/svtk/svtk/cli/pesr_test.py", line 326, in count_sr
    counts = counts.reindex(whitelist).fillna(0).astype(int)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 5882, in astype
    dtype=dtype, copy=copy, errors=errors, **kwargs
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 581, in astype
    return self.apply("astype", dtype=dtype, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 438, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 559, in astype
    return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 643, in _astype
    values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 700, in astype_nansafe
    "Cannot convert non-finite values (NA or inf) to " "integer"
ValueError: Cannot convert non-finite values (NA or inf) to integer
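For reference, the failing expression can be exercised outside the pipeline on toy data. The sketch below is only an illustration under assumptions: the helper name `to_int_counts`, the variant IDs, and the counts are all made up, and it does not show where a non-finite value would come from inside svtk. What it does suggest is that, because `fillna(0)` runs before `astype(int)`, a plain missing entry should not trigger this error, whereas an infinite value would.

```python
import pandas as pd


def to_int_counts(counts, whitelist):
    """Apply the same chain as the line in the traceback to a toy Series."""
    return counts.reindex(whitelist).fillna(0).astype(int)


# Hypothetical variant IDs, used only for illustration.
whitelist = ["var1", "var2", "var3"]

# "var3" is absent, so reindex() introduces NaN; fillna(0) replaces it
# and the integer cast succeeds.
ok = to_int_counts(pd.Series({"var1": 4.0, "var2": 7.0}), whitelist)

# An infinite count is left untouched by fillna(0), so astype(int) raises.
try:
    to_int_counts(pd.Series({"var1": 4.0, "var2": float("inf")}), whitelist)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ok.tolist())
print(error_message)
```

If this reading is right, the values to look for would be infinities (for example, the result of a division by zero) rather than literal NA strings, although that is an inference from the traceback and not something confirmed against the svtk source.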
I have searched these input files for NA and inf values, but have not found anything that would violate the datatype requirement. Even if some entries in the VCF file did violate it, I would naively expect the program to discard those invalid VCF entries and continue with the valid ones.
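A search of this kind could be scripted roughly as follows. This is a minimal sketch, not part of svtk or GATK-SV: the function name `find_non_finite`, the list of NA spellings, and the demo file are assumptions, and it only inspects tab-delimited text as written on disk, so it cannot catch a non-finite value that is computed later (for instance by dividing by a zero).

```python
import gzip
import math
import os
import tempfile

# Assumed spellings of "missing"; extend as needed for a given file format.
NA_TOKENS = {"na", "n/a", "nan", "null"}


def find_non_finite(path, delimiter="\t"):
    """Return (line_number, column_number, token) for NA-like or NaN/inf fields."""
    opener = gzip.open if path.endswith(".gz") else open
    hits = []
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue  # skip header and comment lines
            fields = line.rstrip("\n").split(delimiter)
            for col_no, token in enumerate(fields, start=1):
                text = token.strip()
                if text.lower() in NA_TOKENS:
                    hits.append((line_no, col_no, token))
                    continue
                try:
                    value = float(text)
                except ValueError:
                    continue  # not numeric, e.g. a chromosome name
                if not math.isfinite(value):
                    hits.append((line_no, col_no, token))
    return hits


# Small self-contained demo on a made-up gzipped table.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("#chrom\tpos\tcount\n")
        out.write("chr1\t100\t5\n")
        out.write("chr1\t200\tinf\n")
        out.write("chr2\t300\tNA\n")
    hits = find_non_finite(demo_path)

print(hits)
```

The same function could be pointed at the sample list, the median coverage file, and the SR counts file; an empty result would indicate only that no such tokens appear literally in those files.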
Steps to reproduce
If you want to reproduce my error exactly, a Docker or Singularity environment should be set up running the us.gcr.io/broad-dsde-methods/eph/sv-pipeline@sha256:b01b9531d5ba68896581ad5dfa68beb5b0cce2c22291cc9e81eea6047db9cee3 Docker image, and the following command should be run with the input files I've provided:
svtk count-sr -s write_lines_316220c6f11b746ff9b6cdff48e70640.tmp --medianfile test_NA12878_medianCov.transposed.bed bca.aaaaaa.vcf.gz local.SR.txt.gz bca.aaaaaa.vcf.gz.sr_counts.txt
Expected behavior
The command should run without error and populate the output file, bca.aaaaaa.vcf.gz.sr_counts.txt, with data.
Actual behavior
The command exits with an error, and the output file bca.aaaaaa.vcf.gz.sr_counts.txt is written with only the header. The GATK-SV pipeline cannot run the next steps without successful completion of this module.
I've also created an issue on the project's GitHub page. I understand if the GATK team cannot support this request, as GATK-SV is a separate tool that is still in development. Please note that I was unable to find a way to upload files to this request, but the files mentioned in my post are available for download through the GitHub issue.
See forum topic details at forum guidelines page: https://gatk.broadinstitute.org/hc/en-us/articles/360053845952-Forum-Guidelines
-
Hi Arosato,
Our GATK Support team on this forum supports the GATK-SV single sample featured workspace on Terra. We are not able to support individual cromwell instances for our WDLs. You can see our support policy here.
I think the best step to determine if the issue you are seeing is with a bug from GATKSVPipelineSingleSample.wdl is to get support in the issue request ticket you already created. Someone from the gatk-sv team will be able to get back to you there.
Here are some other links for Cromwell support as well:
- Bioinformatics Stack Exchange
- Cromwell slack organization: cromwellhq.slack.com
- Cromwell Documentation
I'm sorry we are not able to provide more support for this issue. I hope you are able to get the WDL working soon.
Best,
Genevieve
-
Hi Genevieve,
I figured as much. Perhaps I can convince my group to investigate running the GATK-SV single-sample Terra workflow.
Thanks for getting back to me!
Best,
Andrew
-
No problem, sorry I can't be of more help!