Skip to content

fix(worker): persist workspace path after creation#306

Merged
viniciusdc merged 12 commits into
mainfrom
debug/workspace-path-294
May 29, 2026
Merged

fix(worker): persist workspace path after creation#306
viniciusdc merged 12 commits into
mainfrom
debug/workspace-path-294

Conversation

@viniciusdc

@viniciusdc viniciusdc commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes #294

SetWorkspacePath was correctly writing the path to the DB, but UpdateWorkspaceSize runs immediately after and calls db.Save(ws) with the stale in-memory ws struct (loaded at job start with path=""). That full-row save was overwriting the path back to empty every time.

SQL evidence from the cluster:

-- SetWorkspacePath works:
UPDATE workspaces SET path="/home/jovyan/.local/share/nebi/workspaces/test8-54c8da39-..." WHERE id = "54c8da39-..."

-- UpdateWorkspaceSize immediately clobbers it:
UPDATE workspaces SET name="test8", path="", size_bytes=294733936, ... WHERE id = "54c8da39-..."

Fix

After SetWorkspacePath succeeds, set ws.Path = resolvedPath in memory so the subsequent db.Save(ws) in UpdateWorkspaceSize preserves the path instead of overwriting it.

How to test

Update nebi.image.tag in clusters/hetzner-nebari/apps/nebi-pack.yaml to the sha built from this branch, create a workspace via the UI, then verify:

export KUBECONFIG=~/Library/Caches/nic/hetzner-k3s/nic-nebari/kubeconfig
kubectl exec -n nebari-system <jlab-pod> -- sqlite3 /home/jovyan/.local/share/nebi/nebi.db \
  "SELECT name, path FROM workspaces ORDER BY created_at DESC LIMIT 5;"

The path column should be non-empty for newly created workspaces.

Keeping draft until verified on the cluster.

Adds debug logging around SetWorkspacePath to diagnose why managed
workspaces end up with path='' in the database after creation via UI.
@netlify

netlify Bot commented Apr 24, 2026

Copy link
Copy Markdown

Deploy Preview for nebi-docs canceled.

Name Link
🔨 Latest commit 305d69a
🔍 Latest deploy log https://app.netlify.com/projects/nebi-docs/deploys/6a19c575e39419000809c137

… UpdateWorkspaceSize from overwriting it

UpdateWorkspaceSize calls db.Save(ws) with the full workspace struct. Since ws
is loaded at job start with path="", the Save overwrites whatever SetWorkspacePath
just wrote to the DB. Keeping ws.Path in sync after a successful SetWorkspacePath
prevents the clobber.

Closes #294
@viniciusdc viniciusdc changed the title debug(worker): log workspace path resolution for issue #294 fix(worker): persist workspace path after creation Apr 24, 2026
@viniciusdc viniciusdc marked this pull request as ready for review April 24, 2026 20:19
@viniciusdc

Copy link
Copy Markdown
Contributor Author

Tested on the hetzner cluster. After restarting the JLab pod with sha-e81f609, created test11 and the path is correctly persisted:

test11 | /home/jovyan/.local/share/nebi/workspaces/test11-1a0a2608-70e7-4f20-88ab-e71772d658ba | ready

Earlier workspaces (test8, test10) still show empty path since they were created before the fix — expected.

@viniciusdc viniciusdc requested a review from aktech April 25, 2026 15:28
Comment thread internal/worker/worker.go
aktech and others added 2 commits May 7, 2026 14:45
Drives executeJob end-to-end with a stale workspace (Path="") via a
minimal fake executor and asserts the path column is persisted after
the create flow. Pins the in-memory ws.Path sync so a future revert
of the SetWorkspacePath/UpdateWorkspaceSize ordering would be caught.

Refs #294.
@viniciusdc

Copy link
Copy Markdown
Contributor Author

@aktech the regression test is in and CI is green. Mind taking another look?

@Adam-D-Lewis

Adam-D-Lewis commented May 22, 2026

Copy link
Copy Markdown
Member

I started reviewing this but wanted to reproduce the error and found a new issue that prevented me from doing so. The linux nebi-desktop app crashes on startup. #350

@viniciusdc

Copy link
Copy Markdown
Contributor Author

Hey @Adam-D-Lewis, thanks for having a look, and good catch. I only ran this against NIC, so I wasn't aware that we needed to test this through the desktop version too. I wonder though, if testing on cloud would be enough as validation for this PR specifically, since the desktop probably would need to wait for upstream contribution or workaround to take effect first.

@Adam-D-Lewis

Copy link
Copy Markdown
Member

@viniciusdc Testing on cloud should be good enough to prove it works. I was trying to reproduce locally to be sure I understand the issue. With a recent fix, I was able to reproduce the issue today. I plan on reviewing this tomorrow. Thanks for your patience!

@Adam-D-Lewis

Copy link
Copy Markdown
Member

One concern on the error-handling branch: when SetWorkspacePath returns an error, the new code logs it but falls through — the create flow continues and SetWorkspaceStatus(ws.ID, models.WsStatusReady) still runs a few lines later. The user-observable result is identical to the bug this PR fixes: workspace shows ready in the UI but path is empty in the DB and the CLI reports "missing": true.

Suggest returning early (or transitioning to WsStatusFailed) after logging, matching the CreateWorkspace error handling four lines above:

if err := w.svc.SetWorkspacePath(ws.ID, resolvedPath); err != nil {
    w.logger.Error("failed to persist workspace path", "workspace_id", ws.ID, "resolved_path", resolvedPath, "error", err)
    w.svc.SetWorkspaceStatus(ws.ID, models.WsStatusFailed)
    return err
}
ws.Path = resolvedPath

It's a low-likelihood path in practice (the only SetWorkspacePath failure mode is the DB write itself), but the whole point of the fix is to make the "ready with empty path" state unreachable — leaving the silent-continue branch keeps the original symptom possible.

@Adam-D-Lewis

Adam-D-Lewis commented May 27, 2026

Copy link
Copy Markdown
Member

On the regression test — nice that it drives executeJob end-to-end and actually reaches db.Save(ws) via the real UpdateWorkspaceSize path. One thing worth tightening:

PackageManager: "test-pm-not-registered",

The test depends on SyncPackagesFromWorkspace and CreateVersionSnapshot failing with an unknown-manager error, and on the worker's current behavior of logging and continuing on those failures (w.logger.Error("Failed to sync packages", ...)). If either of those is later refactored to return early, fail-fast, or stop calling UpdateWorkspaceSize when the manager is invalid, this test stops covering the regression — and it does so silently, since it would still pass (the bug requires db.Save(ws) to actually run).

Two safer options:

  1. Extend fakeExecutor (or add a sibling fake) to implement enough of the package-manager surface that SyncPackagesFromWorkspace/CreateVersionSnapshot succeed as no-ops on a registered manager name.
  2. Use a registered manager ("pixi" or whatever the lightest one is) and inject test doubles for the two methods.

At minimum, a // NOTE: comment on the PackageManager line spelling out that the value is deliberately invalid and why the test still reaches the regression site would save the next person a debugging session.

@Adam-D-Lewis Adam-D-Lewis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

@Adam-D-Lewis

Copy link
Copy Markdown
Member

I also have a related follow-up PR if you have time to review @viniciusdc. #353

viniciusdc and others added 4 commits May 29, 2026 11:30
If SetWorkspacePath returned an error during create, the worker logged
it and fell through to SetWorkspaceStatus(ready), reproducing the exact
"ready in UI, empty path in DB" state this fix exists to prevent. Match
the CreateWorkspace error branch above: set status to failed and return.
…actors

The regression test previously relied on SyncPackagesFromWorkspace and
CreateVersionSnapshot failing with an unknown-manager error and on the
worker's log-and-continue behavior to still reach db.Save(ws). If either
were refactored to fail-fast, the test would pass without exercising the
regression site.

Register a no-op test package manager and write empty pixi.toml/pixi.lock
files in the fake executor, so the create flow stays on its happy path
and the regression site is reached unconditionally.

@Adam-D-Lewis Adam-D-Lewis left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both review points addressed and verified locally (vet clean, regression test passes):

  1. Error-handling branch (dc082123d) — SetWorkspacePath failure now sets WsStatusFailed and returns instead of falling through to ready, making the "ready with empty path" state unreachable.

  2. Regression test robustness (a4808dffb) — registers a no-op package manager and writes empty pixi.toml/pixi.lock so the create flow stays on its happy path and reaches db.Save(ws) unconditionally, independent of future error-handling refactors.

Thanks for the thorough fix and for verifying on the cluster. LGTM 🚀

@viniciusdc viniciusdc merged commit 11d6218 into main May 29, 2026
14 checks passed
@viniciusdc viniciusdc deleted the debug/workspace-path-294 branch May 29, 2026 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Workspace path not stored in database, kernels not discoverable

3 participants