fix(worker): persist workspace path after creation#306
Conversation
Adds debug logging around SetWorkspacePath to diagnose why managed workspaces end up with path='' in the database after creation via UI.
✅ Deploy Preview for nebi-docs canceled.
|
… UpdateWorkspaceSize from overwriting it UpdateWorkspaceSize calls db.Save(ws) with the full workspace struct. Since ws is loaded at job start with path="", the Save overwrites whatever SetWorkspacePath just wrote to the DB. Keeping ws.Path in sync after a successful SetWorkspacePath prevents the clobber. Closes #294
|
Tested on the hetzner cluster. After restarting the JLab pod with Earlier workspaces ( |
Drives executeJob end-to-end with a stale workspace (Path="") via a minimal fake executor and asserts the path column is persisted after the create flow. Pins the in-memory ws.Path sync so a future revert of the SetWorkspacePath/UpdateWorkspaceSize ordering would be caught. Refs #294.
|
@aktech the regression test is in and CI is green. Mind taking another look? |
|
I started reviewing this but wanted to reproduce the error and found a new issue that prevented me from doing so. The linux nebi-desktop app crashes on startup. #350 |
|
Hey @Adam-D-Lewis, thanks for having a look, and good catch. I only ran this against NIC, so I wasn't aware that we needed to test this through the desktop version too. I wonder though, if testing on cloud would be enough as validation for this PR specifically, since the desktop probably would need to wait for upstream contribution or workaround to take effect first. |
|
@viniciusdc Testing on cloud should be good enough to prove it works. I was trying to reproduce locally to be sure I understand the issue. With a recent fix, I was able to reproduce the issue today. I plan on reviewing this tomorrow. Thanks for your patience! |
|
One concern on the error-handling branch: when Suggest returning early (or transitioning to if err := w.svc.SetWorkspacePath(ws.ID, resolvedPath); err != nil {
w.logger.Error("failed to persist workspace path", "workspace_id", ws.ID, "resolved_path", resolvedPath, "error", err)
w.svc.SetWorkspaceStatus(ws.ID, models.WsStatusFailed)
return err
}
ws.Path = resolvedPathIt's a low-likelihood path in practice (the only |
|
On the regression test — nice that it drives PackageManager: "test-pm-not-registered",The test depends on Two safer options:
At minimum, a |
|
I also have a related follow-up PR if you have time to review @viniciusdc. #353 |
If SetWorkspacePath returned an error during create, the worker logged it and fell through to SetWorkspaceStatus(ready), reproducing the exact "ready in UI, empty path in DB" state this fix exists to prevent. Match the CreateWorkspace error branch above: set status to failed and return.
…actors The regression test previously relied on SyncPackagesFromWorkspace and CreateVersionSnapshot failing with an unknown-manager error and on the worker's log-and-continue behavior to still reach db.Save(ws). If either were refactored to fail-fast, the test would pass without exercising the regression site. Register a no-op test package manager and write empty pixi.toml/pixi.lock files in the fake executor, so the create flow stays on its happy path and the regression site is reached unconditionally.
Adam-D-Lewis
left a comment
There was a problem hiding this comment.
Both review points addressed and verified locally (vet clean, regression test passes):
-
Error-handling branch (
dc082123d) —SetWorkspacePathfailure now setsWsStatusFailedand returns instead of falling through toready, making the "ready with empty path" state unreachable. -
Regression test robustness (
a4808dffb) — registers a no-op package manager and writes emptypixi.toml/pixi.lockso the create flow stays on its happy path and reachesdb.Save(ws)unconditionally, independent of future error-handling refactors.
Thanks for the thorough fix and for verifying on the cluster. LGTM 🚀
Summary
Closes #294
SetWorkspacePathwas correctly writing the path to the DB, butUpdateWorkspaceSizeruns immediately after and callsdb.Save(ws)with the stale in-memorywsstruct (loaded at job start withpath=""). That full-row save was overwriting the path back to empty every time.SQL evidence from the cluster:
Fix
After
SetWorkspacePathsucceeds, setws.Path = resolvedPathin memory so the subsequentdb.Save(ws)inUpdateWorkspaceSizepreserves the path instead of overwriting it.How to test
Update
nebi.image.taginclusters/hetzner-nebari/apps/nebi-pack.yamlto the sha built from this branch, create a workspace via the UI, then verify:The
pathcolumn should be non-empty for newly created workspaces.Keeping draft until verified on the cluster.