VQSR on dog: error in GATK WDL pipeline
I am running joint-discovery-gatk4.13.wdl on the Broad Slurm cluster (using Singularity) with dog samples. The Docker image is broadinstitute/gatk:4.1.2.0.
Error:
htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: /seq/vgb/jphekman/cloud/docker/cromwell-executions/JointGenotyping/942d5d50-99a4-4b74-bf0a-7cebfd4c76e7/call-HardFilterAndMakeSitesOnlyVcf/shard-0/execution/TBTF.0.sites_only.variant_filtered.vcf.gz, for input source: file:///seq/vgb/jphekman/cloud/docker/cromwell-executions/JointGenotyping/942d5d50-99a4-4b74-bf0a-7cebfd4c76e7/call-HardFilterAndMakeSitesOnlyVcf/shard-0/execution/TBTF.0.sites_only.variant_filtered.vcf.gz
Caused by: java.nio.file.NoSuchFileException: /seq/vgb/jphekman/cloud/docker/cromwell-executions/JointGenotyping/942d5d50-99a4-4b74-bf0a-7cebfd4c76e7/call-HardFilterAndMakeSitesOnlyVcf/shard-0/execution/TBTF.0.sites_only.variant_filtered.vcf.gz
However, the …variant_filtered.vcf.gz file exists on local disk (as does its .tbi). I strongly suspect that the WDL is, for whatever reason, looking for this file under its local (host) filename instead of looking on its virtual disk (the giveaway: /seq/vgb doesn't exist in the virtual machine!).
This isn't a problem across the whole pipeline: other files have been read successfully from the virtual disk earlier in this same run.
I recognize that my workflow is a bit weird (Singularity? Really, Jessica?), but I will also note that someone else seems to have had this same issue with an earlier version of the Docker image, AND they solved it by running locally without Docker, which supports my theory that the problem is specific to running this WDL inside a container:
I really hope you folks can help or at least point me where to go next as I have been beating my head against a wall on this one for a while!
Best,
Jessica
-
We talked about this offline, but I think it's worth documenting here for posterity (especially since the old forum with the hit you found is disappearing). Our joint calling pipeline only runs in the cloud these days. You'll note that the tool invoked in the GatherVcfs task is GatherVcfsCloud. The docs claim that this tool should work locally, but two things lead me to believe something is wonky with the way it's trying to read off of your Slurm setup: the paths in the error aren't right, and the NoSuchFileException is coming from the NIO library, which we use to stream data out of cloud buckets.
We don't seem to have a classic GatherVcfs anymore, but you could try removing the block in the task that has "localization_optional: true". That will force Cromwell to copy your files to the VM (which it might not be doing now), and then hopefully the rest will just work from there.
If editing the WDL turns out not to be easy peasy, you can try to confirm my hypothesis by checking the Cromwell task log to see whether it actually localized your files and, if it did, whether the paths look right. (In the cloud this is a *.log file that has things in it like "Pulling docker image" and "Running docker", but I'm not sure if you have one when running locally.)
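If there is no task log to read, the localization check can also be done by looking at the call directory itself. Here is a minimal Python sketch of that idea. It assumes a Cromwell shared-filesystem layout in which each call directory holds an `inputs/` folder next to `execution/`; that layout and the helper name `find_localized` are assumptions for illustration, not something confirmed in this thread.

```python
from pathlib import Path


def find_localized(call_dir, filename):
    """Report copies or links of `filename` under <call_dir>/inputs.

    Returns a list of (path, usable) pairs. `usable` is False for a
    dangling symlink, i.e. a link whose target cannot be resolved from
    wherever this script is run.
    """
    inputs_dir = Path(call_dir) / "inputs"  # assumed Cromwell layout
    if not inputs_dir.is_dir():
        return []  # nothing was localized for this call
    hits = []
    for path in sorted(inputs_dir.rglob("*")):
        if path.name != filename:
            continue
        if path.is_symlink() or path.is_file():
            # Path.exists() follows symlinks, so it is False for a dangling link.
            hits.append((path, path.exists()))
    return hits
```

Called with the directory of the failing gather call and "TBTF.0.sites_only.variant_filtered.vcf.gz", an empty list would suggest the file was never localized, and a pair with `usable` set to False would suggest a link pointing somewhere unreachable. Since the host and the container can see different paths, it is probably worth running this from inside the container as well.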