fix(runtime): BoxSockets — bind box sockets via short /tmp symlinks (sun_path)#742
Conversation
…sun_path) A deep BOXLITE_HOME pushes boxes/<id>/sockets/net.sock-krun.sock past the 104-byte sun_path limit. libkrun's bind fails silently: no vCPU, empty console, and a 30s guest-ready timeout with zero console bytes. The existing SocketShortener was dead code, and its threshold was sized against ready.sock instead of krun's derived net.sock-krun.sock — a 9-char dead zone where our sockets bind but krun's silently fails. Replace it with BoxSockets, the single authority for box socket paths: - Unconditional short binding: every socket binds/dials through /tmp/bl-<uid>/<box_id> -> <box_home>/sockets (OVS symlink indirection; containerd-style always-short addresses). One code path, no length-conditional behavior, no dead zones. - tmux-style 0700 owner-verified per-uid parent dir — predictable names can't be squatted by other local users (loud refusal, never silent). - BoxConfig stops persisting transport/ready_socket_path; both derive from (box_home, id) at point of use, so stored paths can never go stale. Legacy DB rows with the old fields still deserialize (tested). - Every box-removal path unlinks the binding symlink; runtime startup sweeps dangling ones (crash/temp-home leftovers); guest-connect re-ensures the symlink, self-healing the macOS /tmp reaper case. - Seatbelt profile generation keeps symlink leaves literal when canonicalizing — full resolution emitted rules for the target and the sandbox denied the bind at the symlink path. Both sandbox behaviors carry regression tests. - Vendor-pin test fails if vendored libkrun stops deriving -krun.sock, so the sanity-check constant cannot drift silently. Verified: 721 boxlite lib tests; long-home boot probe (the original repro); reaper-simulation reattach+exec; 11-service infra-local integration round-trip.
health_check_becomes_unhealthy_when_shim_killed leaves a live shim at PerTestBoxHome drop — pre-existing on main (reproduces identically with src/boxlite flipped to parent commit 577f4a5): the pid the test kills (BoxInfo.pid) is evidently not the process that wrote shim.pid, so a live shim survives even though the box reports Stopped. This fails the harness leak check and blocks every rust push via the pre-push test:changed hook. Tear down through the production remove path per the leak check's own contract (same class of fix as #615 applied to the sibling healthy-box test). FIXME(health) left for the owners to reconcile BoxInfo.pid with shim.pid and identify the surviving process.
📝 WalkthroughWalkthroughThis PR refactors socket path management in Boxlite by replacing the ChangesSocket Path Refactoring
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/boxlite/src/jailer/mod.rs`:
- Around line 207-210: The policy generation incorrectly relies on on-disk state
for the writable binding directory: call sockets.ensure() (or otherwise ensure
layout.sockets_dir() binding symlink exists) immediately before building the
sandbox context so the subsequent call to layout.sockets().policy_paths() yields
directory-aware entries; alternatively, modify PathAccess creation in that code
path to carry explicit directory intent (e.g., mark the binding_dir entry as a
directory/writable) so seatbelt::build_dynamic_* receives a directory grant
regardless of /tmp state. Ensure the change targets the code that iterates
layout.sockets().policy_paths() and produces PathAccess so descendant socket
grants for net.sock/ready.sock are emitted as subpath-capable directory rules.
In `@src/boxlite/src/net/socket_path.rs`:
- Around line 155-177: Replace string equality checks against self.real_dir()
with an inode/dev comparison: when you see
std::fs::read_link(&symlink_path).ok().as_deref() == Some(self.real_dir()) (in
the initial Ok(meta) branch) and the racing_winner_ok check that reads the link,
instead call std::fs::read_link(&symlink_path) to get the target PathBuf and
then std::fs::metadata(&target).ok() and std::fs::metadata(self.real_dir()).ok()
and compare their device and inode via std::os::unix::fs::MetadataExt::dev() and
::ino(); treat them as equal if both metadata calls succeed and dev/ino match.
Update both the "already correct" early return and the AlreadyExists race check
to use this dev/ino comparison and fall back to the original string check only
if metadata info is unavailable.
In `@src/boxlite/tests/health_check.rs`:
- Around line 153-162: The teardown using t.runtime.remove(box_id.as_str(),
true) can mask the PID-mismatch bug because remove only kills when
state.status.is_active() or state.pid.is_some(), but the health-check path
clears state.pid and sets BoxStatus::Stopped; update the test to assert the shim
process is actually dead before calling remove (or split into a separate
regression): capture the shim pid (from shim.pid or BoxInfo.pid) after starting
the box, verify the process identity is not running (e.g., check /proc/<pid> or
attempt a signal) and assert that it is dead when info.status ==
BoxStatus::Stopped, and only then perform t.runtime.remove(...); alternatively
create a dedicated test that reproduces PID mismatch and asserts the surviving
shim is killed by remove when appropriate, referencing remove, BoxInfo,
BoxStatus::Stopped, state.pid, and kill_process to locate the relevant code
paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e1773de5-7dcc-458a-8f01-8686e56eb24f
📒 Files selected for processing (14)
src/boxlite/src/db/boxes.rssrc/boxlite/src/jailer/mod.rssrc/boxlite/src/jailer/sandbox/seatbelt.rssrc/boxlite/src/litebox/box_impl.rssrc/boxlite/src/litebox/config.rssrc/boxlite/src/litebox/init/tasks/guest_connect.rssrc/boxlite/src/litebox/init/types.rssrc/boxlite/src/litebox/manager.rssrc/boxlite/src/net/socket_path.rssrc/boxlite/src/runtime/constants.rssrc/boxlite/src/runtime/layout.rssrc/boxlite/src/runtime/rt_impl.rssrc/boxlite/src/runtime/types.rssrc/boxlite/tests/health_check.rs
💤 Files with no reviewable changes (5)
- src/boxlite/src/litebox/manager.rs
- src/boxlite/src/litebox/box_impl.rs
- src/boxlite/src/runtime/constants.rs
- src/boxlite/src/litebox/init/types.rs
- src/boxlite/src/runtime/types.rs
| if layout.sockets_dir().exists() { | ||
| for (path, writable) in layout.sockets().policy_paths() { | ||
| paths.push(PathAccess { path, writable }); | ||
| } |
There was a problem hiding this comment.
Re-ensure the binding symlink before emitting its PathAccess.
Line 207 only gates on layout.sockets_dir().exists(), but the writable binding_dir entry from Line 208 is still interpreted later from live filesystem state. If /tmp/bl-{uid}/{box_id} has been cleaned up when the Seatbelt policy is built, seatbelt::build_dynamic_* will treat that path as non-directory and emit only a literal rule, so descendant binds/connects under .../net.sock / .../ready.sock lose the required subpath grant. The new test here already has to call sockets.ensure() to make the binding path behave like a directory. Please either ensure() immediately before building the sandbox context or carry explicit directory intent in PathAccess so the policy no longer depends on /tmp state at generation time.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/boxlite/src/jailer/mod.rs` around lines 207 - 210, The policy generation
incorrectly relies on on-disk state for the writable binding directory: call
sockets.ensure() (or otherwise ensure layout.sockets_dir() binding symlink
exists) immediately before building the sandbox context so the subsequent call
to layout.sockets().policy_paths() yields directory-aware entries;
alternatively, modify PathAccess creation in that code path to carry explicit
directory intent (e.g., mark the binding_dir entry as a directory/writable) so
seatbelt::build_dynamic_* receives a directory grant regardless of /tmp state.
Ensure the change targets the code that iterates layout.sockets().policy_paths()
and produces PathAccess so descendant socket grants for net.sock/ready.sock are
emitted as subpath-capable directory rules.
| match std::fs::symlink_metadata(&symlink_path) { | ||
| Ok(meta) if meta.file_type().is_symlink() => { | ||
| // Stale symlink — safe to replace | ||
| if std::fs::read_link(&symlink_path).ok().as_deref() == Some(self.real_dir()) { | ||
| return Ok(()); // already correct | ||
| } | ||
| // Points elsewhere (stale from a previous box) — replace. | ||
| let _ = std::fs::remove_file(&symlink_path); | ||
| } | ||
| Ok(_) => { | ||
| // Regular file or directory — refuse to overwrite | ||
| // Regular file or directory — refuse to overwrite. | ||
| return Err(BoxliteError::Internal(format!( | ||
| "{} exists but is not a symlink — refusing to overwrite", | ||
| symlink_path.display(), | ||
| ))); | ||
| } | ||
| Err(_) => { | ||
| // Doesn't exist — good | ||
| } | ||
| Err(_) => {} // doesn't exist — good | ||
| } | ||
|
|
||
| std::os::unix::fs::symlink(sockets_dir, &symlink_path).map_err(|e| { | ||
| BoxliteError::Storage(format!( | ||
| "Failed to create socket symlink {} → {}: {}", | ||
| symlink_path.display(), | ||
| sockets_dir.display(), | ||
| e, | ||
| )) | ||
| })?; | ||
| if let Err(e) = std::os::unix::fs::symlink(self.real_dir(), &symlink_path) { | ||
| // Tolerate a concurrent ensure() for the same box as long as the | ||
| // winner points at our target. | ||
| let racing_winner_ok = e.kind() == std::io::ErrorKind::AlreadyExists | ||
| && std::fs::read_link(&symlink_path).ok().as_deref() == Some(self.real_dir()); |
There was a problem hiding this comment.
Compare the existing binding target by inode, not by read_link() string.
Lines 157-177 still treat the binding as "correct" only when the symlink target string exactly matches self.real_dir(). That breaks the upgrade/self-heal invariant described in this refactor: two callers can legitimately describe the same sockets directory with different path strings (BOXLITE_HOME through a symlink, .. segments, canonicalized vs original). In that case ensure() will either replace a valid binding or fail the concurrent-winner path with AlreadyExists, even though both paths resolve to the same directory. Please compare metadata() dev/inode instead.
Suggested fix
impl BoxSockets {
+ fn binding_matches_real_dir(&self, path: &Path) -> bool {
+ let Ok(bound_meta) = std::fs::metadata(path) else {
+ return false;
+ };
+ let Ok(real_meta) = std::fs::metadata(self.real_dir()) else {
+ return false;
+ };
+ bound_meta.dev() == real_meta.dev() && bound_meta.ino() == real_meta.ino()
+ }
+
/// Ensure the binding symlink exists and is correct. Idempotent;
/// tolerates concurrent callers for the same box.
pub fn ensure(&self) -> BoxliteResult<()> {
@@
match std::fs::symlink_metadata(&symlink_path) {
Ok(meta) if meta.file_type().is_symlink() => {
- if std::fs::read_link(&symlink_path).ok().as_deref() == Some(self.real_dir()) {
+ if self.binding_matches_real_dir(&symlink_path) {
return Ok(()); // already correct
}
// Points elsewhere (stale from a previous box) — replace.
let _ = std::fs::remove_file(&symlink_path);
}
@@
if let Err(e) = std::os::unix::fs::symlink(self.real_dir(), &symlink_path) {
// Tolerate a concurrent ensure() for the same box as long as the
// winner points at our target.
let racing_winner_ok = e.kind() == std::io::ErrorKind::AlreadyExists
- && std::fs::read_link(&symlink_path).ok().as_deref() == Some(self.real_dir());
+ && self.binding_matches_real_dir(&symlink_path);
if !racing_winner_ok {
return Err(BoxliteError::Storage(format!(
"Failed to create socket symlink {} → {}: {}",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| match std::fs::symlink_metadata(&symlink_path) { | |
| Ok(meta) if meta.file_type().is_symlink() => { | |
| // Stale symlink — safe to replace | |
| if std::fs::read_link(&symlink_path).ok().as_deref() == Some(self.real_dir()) { | |
| return Ok(()); // already correct | |
| } | |
| // Points elsewhere (stale from a previous box) — replace. | |
| let _ = std::fs::remove_file(&symlink_path); | |
| } | |
| Ok(_) => { | |
| // Regular file or directory — refuse to overwrite | |
| // Regular file or directory — refuse to overwrite. | |
| return Err(BoxliteError::Internal(format!( | |
| "{} exists but is not a symlink — refusing to overwrite", | |
| symlink_path.display(), | |
| ))); | |
| } | |
| Err(_) => { | |
| // Doesn't exist — good | |
| } | |
| Err(_) => {} // doesn't exist — good | |
| } | |
| std::os::unix::fs::symlink(sockets_dir, &symlink_path).map_err(|e| { | |
| BoxliteError::Storage(format!( | |
| "Failed to create socket symlink {} → {}: {}", | |
| symlink_path.display(), | |
| sockets_dir.display(), | |
| e, | |
| )) | |
| })?; | |
| if let Err(e) = std::os::unix::fs::symlink(self.real_dir(), &symlink_path) { | |
| // Tolerate a concurrent ensure() for the same box as long as the | |
| // winner points at our target. | |
| let racing_winner_ok = e.kind() == std::io::ErrorKind::AlreadyExists | |
| && std::fs::read_link(&symlink_path).ok().as_deref() == Some(self.real_dir()); | |
| fn binding_matches_real_dir(&self, path: &Path) -> bool { | |
| let Ok(bound_meta) = std::fs::metadata(path) else { | |
| return false; | |
| }; | |
| let Ok(real_meta) = std::fs::metadata(self.real_dir()) else { | |
| return false; | |
| }; | |
| bound_meta.dev() == real_meta.dev() && bound_meta.ino() == real_meta.ino() | |
| } | |
| /// Ensure the binding symlink exists and is correct. Idempotent; | |
| /// tolerates concurrent callers for the same box. | |
| pub fn ensure(&self) -> BoxliteResult<()> { | |
| match std::fs::symlink_metadata(&symlink_path) { | |
| Ok(meta) if meta.file_type().is_symlink() => { | |
| if self.binding_matches_real_dir(&symlink_path) { | |
| return Ok(()); // already correct | |
| } | |
| // Points elsewhere (stale from a previous box) — replace. | |
| let _ = std::fs::remove_file(&symlink_path); | |
| } | |
| Ok(_) => { | |
| // Regular file or directory — refuse to overwrite. | |
| return Err(BoxliteError::Internal(format!( | |
| "{} exists but is not a symlink — refusing to overwrite", | |
| symlink_path.display(), | |
| ))); | |
| } | |
| Err(_) => {} // doesn't exist — good | |
| } | |
| if let Err(e) = std::os::unix::fs::symlink(self.real_dir(), &symlink_path) { | |
| // Tolerate a concurrent ensure() for the same box as long as the | |
| // winner points at our target. | |
| let racing_winner_ok = e.kind() == std::io::ErrorKind::AlreadyExists | |
| && self.binding_matches_real_dir(&symlink_path); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/boxlite/src/net/socket_path.rs` around lines 155 - 177, Replace string
equality checks against self.real_dir() with an inode/dev comparison: when you
see std::fs::read_link(&symlink_path).ok().as_deref() == Some(self.real_dir())
(in the initial Ok(meta) branch) and the racing_winner_ok check that reads the
link, instead call std::fs::read_link(&symlink_path) to get the target PathBuf
and then std::fs::metadata(&target).ok() and
std::fs::metadata(self.real_dir()).ok() and compare their device and inode via
std::os::unix::fs::MetadataExt::dev() and ::ino(); treat them as equal if both
metadata calls succeed and dev/ino match. Update both the "already correct"
early return and the AlreadyExists race check to use this dev/ino comparison and
fall back to the original string check only if metadata info is unavailable.
| // Teardown through the production path. Without this, PerTestBoxHome's | ||
| // leak check fails: a live shim remains even though status is Stopped — | ||
| // the pid killed above (BoxInfo.pid) is evidently not the process that | ||
| // wrote shim.pid. Pre-existing on main (reproduces at 577f4a57). | ||
| // FIXME(health): reconcile BoxInfo.pid with shim.pid and decide what | ||
| // the surviving process is. | ||
| t.runtime | ||
| .remove(box_id.as_str(), true) | ||
| .await | ||
| .expect("force remove at teardown"); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect whether the harness has an independent leak check and whether
# remove(force=true) can bypass it without killing the real shim.
fd -i 'health_check.rs|box_test.rs|box_test_base.rs|per_test_box_home.rs|rt_impl.rs|box_impl.rs|types.rs' src tests . 2>/dev/null | while read -r f; do
printf '\n===== %s =====\n' "$f"
rg -n -C3 'PerTestBoxHome|leak|shim\.pid|BoxInfo|pid|remove\(|force_status|set_pid|kill_process' "$f" || true
doneRepository: boxlite-ai/boxlite
Length of output: 50374
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Locate and print the relevant portion of remove_box in rt_impl.rs
RT_IMPL="src/boxlite/src/runtime/rt_impl.rs"
echo "===== remove_box in $RT_IMPL ====="
rg -n "fn remove_box\\(" "$RT_IMPL"
line=$(rg -n "fn remove_box\\(" "$RT_IMPL" | head -n1 | cut -d: -f1)
start=$((line-40))
end=$((line+180))
sed -n "${start},${end}p" "$RT_IMPL" | nl -ba | sed -n '1,260p'
# 2) Inspect the health-check failure branch around where it sets pid/status
HC="src/boxlite/tests/health_check.rs"
echo
echo "===== health_check.rs (context around teardown/setup) ====="
rg -n "health failed|Unhealthy|set_pid\\(None\\)|force_status\\(|remove\\(" "$HC"
sed -n '70,200p' "$HC" | nl -ba
# 3) Locate PerTestBoxHome leak check implementation
PE="src"
echo
echo "===== PerTestBoxHome implementation (leak check) ====="
home_file=$(fd -i "per_test_box_home.rs" "$PE" | head -n1 || true)
echo "Found: $home_file"
if [ -n "${home_file:-}" ]; then
rg -n "leak|Drop|assert|shim\\.pid|pid_file|is_process_alive" "$home_file" || true
# Print Drop impl vicinity if present
drop_line=$(rg -n "impl Drop for PerTestBoxHome|impl\\s+Drop\\s+for\\s+PerTestBoxHome" "$home_file" | head -n1 | cut -d: -f1 || true)
if [ -n "${drop_line:-}" ]; then
start=$((drop_line-60)); end=$((drop_line+140))
sed -n "${start},${end}p" "$home_file" | nl -ba | sed -n '1,260p'
else
# fallback: print around leak assertions
leak_line=$(rg -n "leak|shim\\.pid|is_process_alive" "$home_file" | head -n1 | cut -d: -f1 || true)
if [ -n "${leak_line:-}" ]; then
start=$((leak_line-60)); end=$((leak_line+140))
sed -n "${start},${end}p" "$home_file" | nl -ba | sed -n '1,260p'
fi
fi
fi
# 4) Also inspect any test helpers for teardown/remove semantics (optional)
BTB="src/test-utils/src/box_test.rs"
echo
echo "===== box_test.rs context around t.runtime.remove(force=true) ====="
rg -n "remove\\(.*true\\)" "$BTB" || trueRepository: boxlite-ai/boxlite
Length of output: 273
🏁 Script executed:
#!/bin/bash
set -euo pipefail
RT_IMPL="src/boxlite/src/runtime/rt_impl.rs"
HC="src/boxlite/tests/health_check.rs"
echo "===== remove_box control flow (rt_impl.rs, around 828) ====="
# show enough to capture the force/kill decision
sed -n '800,910p' "$RT_IMPL" | cat -n
echo
echo "===== health_check.rs around teardown ====="
sed -n '130,190p' "$HC" | cat -n
echo
echo "===== PerTestBoxHome location ====="
HOME_FILE="$(fd -i 'per_test_box_home\.rs' src tests . | head -n 1 || true)"
echo "Found: ${HOME_FILE:-<none>}"
if [ -n "${HOME_FILE:-}" ]; then
echo
echo "===== PerTestBoxHome leak check / Drop (context) ====="
# print the Drop impl plus nearby leak assertions
DROP_LINE="$(rg -n 'impl\s+Drop\s+for\s+PerTestBoxHome|impl\s+Drop\s+for\s+.*PerTestBoxHome' "$HOME_FILE" | head -n1 | cut -d: -f1 || true)"
if [ -n "${DROP_LINE:-}" ]; then
start=$((DROP_LINE-80)); end=$((DROP_LINE+220))
sed -n "${start},${end}p" "$HOME_FILE" | cat -n
else
# fallback: print first leak assertion block
rg -n 'leak|shim\.pid|pid_file|is_process_alive|assert!|Drop' "$HOME_FILE" | head -n 120
fi
fi
echo
echo "===== box_test helper calls (t.runtime.remove(force=true)) ====="
sed -n '1,260p' src/test-utils/src/box_test.rs | cat -nRepository: boxlite-ai/boxlite
Length of output: 17230
Teardown can mask the PID-mismatch bug—remove(force=true) won’t kill when health-check has cleared state.pid
remove_box(force=true) only calls kill_process when state.status.is_active() or state.pid.is_some(), and it kills only if state.pid is Some(pid). The health-check failure path sets BoxStatus::Stopped and state.pid = None, and the test teardown runs after asserting info.status == Stopped, so this cleanup can remove DB/files without killing the surviving shim process—weakening the FIXME’s regression signal. Add an explicit assertion that the shim PID (or pidfile process identity) is dead before/around teardown, or split this into a dedicated regression that checks PID mismatch behavior without relying on remove(force=true).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/boxlite/tests/health_check.rs` around lines 153 - 162, The teardown using
t.runtime.remove(box_id.as_str(), true) can mask the PID-mismatch bug because
remove only kills when state.status.is_active() or state.pid.is_some(), but the
health-check path clears state.pid and sets BoxStatus::Stopped; update the test
to assert the shim process is actually dead before calling remove (or split into
a separate regression): capture the shim pid (from shim.pid or BoxInfo.pid)
after starting the box, verify the process identity is not running (e.g., check
/proc/<pid> or attempt a signal) and assert that it is dead when info.status ==
BoxStatus::Stopped, and only then perform t.runtime.remove(...); alternatively
create a dedicated test that reproduces PID mismatch and asserts the surviving
shim is killed by remove when appropriate, referencing remove, BoxInfo,
BoxStatus::Stopped, state.pid, and kill_process to locate the relevant code
paths.
Source: Coding guidelines
Problem
A deep
BOXLITE_HOME(e.g. a repo-scoped<repo>/.apps-local/boxlite) pushesboxes/<id>/sockets/net.sock-krun.sockpast the 104-bytesun_pathlimit on macOS. libkrun's bind fails silently: gvproxy comes up, devices configure, but no vCPU is ever created — empty console, 30s guest-ready timeout, zero diagnostics. (net.sock-krun.sockis derived by vendored libkrun by appending-krun.sockto the gvproxy socket path —vendor/libkrun/src/devices/src/virtio/net/unixgram.rs:79.)The repo already contained an OVS-style
SocketShortenerfor exactly this — but it was dead code (zero production callers), and its threshold was sized againstready.sockinstead of the true longest name, leaving a 9-char dead zone where our sockets bind but krun's silently fails. Two further construction sites (filenames::unix_socket_path, a rawjoin("ready.sock")) bypassed the layout getters entirely and persisted the resulting long paths into the DB viaBoxConfig.Design
BoxSockets(net/socket_path.rs) becomes the single authority for box socket paths:/tmp/bl-<uid>/<box_id>→<box_home>/sockets/. One code path for every box — no length-conditional behavior, no dead zones, and thesun_pathbudget constant is demoted to a sanity check. Precedent: OVSshorten_name_via_symlink()(lib/socket-util-unix.c), containerd's always-shortSocketAddress()(runtime/v2/shim/util_unix.go)./tmp/bl-<uid>is created 0700 and owner-verified (permissions repaired, foreign ownership refused loudly) — predictable symlink names can't be squatted by other local users.BoxConfig.transport/ready_socket_pathfields are deleted;transport()/ready_transport()derive from(box_home, id)at point of use. Legacy DB rows with the old fields still deserialize (unknown-field tolerance, tested).prepare(); every removal path unlinks it; runtime startup sweeps dangling leftovers (crash/temp-home runs); guest-connect re-ensures — self-healing against the macOS periodic /tmp cleaner. Upgrade invariant: host and shim may know a socket under different strings; correctness is resolve-equality (same inode), never string equality.canonicalize_or_originalkeeps symlink leaves literal (parent-canonicalized) — full resolution emitted policy rules for the target and the sandbox denied the bind at the symlink path. The socket policy entries (real dir RW, binding symlink RW, parent RO) are emitted deterministically, gated only on the real dir existing.Second commit: pre-existing test leak unblocked
health_check_becomes_unhealthy_when_shim_killedfails the harness leak check on main too (reproduced at the merge-base 577f4a5 with this branch's src flipped back): a live shim survives the test's kill even though the box reports Stopped —BoxInfo.pidis evidently not the process that wroteshim.pid. Since the pre-push hook runs this suite for any rust change, the test now tears down through the production remove path (same class of fix as #615 applied to the sibling test), with aFIXME(health)for the owners on the pid question.Tests
socket_pathunit tests incl. the dead-zone regression (real dir whereready.sockfits butnet.sock-krun.sockdoesn't) and a vendor-pin test that fails if vendored libkrun stops deriving-krun.sock.test_canonicalize_keeps_symlink_leaf_literal,test_build_path_access_socket_policy_entries.BoxConfiglegacy-row deserialization test;remove_boxunlinks-symlink assertion; layout-level long-home test.Verification
make lint:rustclean; pre-push VM integration suite 229/229./tmp/bl-<uid>/<id>present, parent 0700; gone after remove./tmp/bl-<uid>tree under a running box → fresh process reattach + exec succeeds (self-heal).Notes for reviewers
.apps-localworkspace consolidation that surfaced this bug is a separate changeset/PR.Summary by CodeRabbit
New Features
Bug Fixes
Improvements