Skip to content

Inject run config store into WorkloadFromContainerInfo#4342

Merged
JAORMX merged 5 commits intomainfrom
fix-flaky-workload-from-container-info
Mar 25, 2026
Merged

Inject run config store into WorkloadFromContainerInfo#4342
JAORMX merged 5 commits intomainfrom
fix-flaky-workload-from-container-info

Conversation

@JAORMX
Copy link
Copy Markdown
Collaborator

@JAORMX JAORMX commented Mar 24, 2026

Summary

  • WorkloadFromContainerInfo called loadRunConfigFields which internally created a real state.LocalStore hitting the XDG filesystem on every call. In CI, parallel tests could create or truncate runconfig files, causing intermittent EOF errors from json.Decode — making TestRuntimeStatusManager_GetWorkload flaky.
  • Add a state.Store parameter to WorkloadFromContainerInfo and loadRunConfigFields so callers inject the store, matching the existing DI pattern in fileStatusManager. Add a runConfigStore field to runtimeStatusManager for parity.
  • Fix file_status_test.go mock readers to use DoAndReturn for fresh readers on each call, preventing EOF when the store is read more than once per test.

Fixes #4341

Type of change

  • Bug fix

Test plan

  • Unit tests (task test)

Changes

File Change
pkg/workloads/types/types.go Add state.Store param to WorkloadFromContainerInfo and loadRunConfigFields; rewrite loadRunConfigFields to use injected store directly
pkg/workloads/types/workload_test.go Pass existing store to updated WorkloadFromContainerInfo
pkg/workloads/statuses/status.go Add runConfigStore field to runtimeStatusManager; update NewStatusManagerFromRuntime to return error; pass store at 2 call sites
pkg/workloads/statuses/file_status.go Pass f.runConfigStore at 6 call sites; name receiver on mergeHealthyWorkloadData
pkg/workloads/statuses/status_test.go Inject mock store into runtimeStatusManager; set up Exists expectations
pkg/workloads/statuses/file_status_test.go Change Return(mockReader, nil).AnyTimes() to DoAndReturn(...) for fresh readers

Does this introduce a user-facing change?

No

Special notes for reviewers

The loadRunConfigFields rewrite switches from state.LoadRunConfig (which creates a new store internally) to direct store.Exists + store.GetReader calls. The semantics are identical: returns empty config when not found, propagates errors otherwise — but no longer creates a store on every invocation.

Generated with Claude Code

@github-actions github-actions bot added the size/S Small PR: 100-299 lines changed label Mar 24, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 48.57143% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.42%. Comparing base (074326e) to head (4a578a3).
⚠️ Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
pkg/workloads/types/types.go 33.33% 5 Missing and 5 partials ⚠️
pkg/workloads/statuses/status.go 54.54% 3 Missing and 2 partials ⚠️
pkg/workloads/statuses/file_status.go 66.66% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4342      +/-   ##
==========================================
- Coverage   68.45%   68.42%   -0.04%     
==========================================
  Files         479      479              
  Lines       48642    48652      +10     
==========================================
- Hits        33300    33288      -12     
- Misses      12373    12385      +12     
- Partials     2969     2979      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 24, 2026
Copy link
Copy Markdown
Contributor

@yrobla yrobla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DI approach is architecturally correct and cleanly fixes the flaky test root cause. The DoAndReturn changes are exactly right — sharing a single pre-consumed reader across .AnyTimes() calls was the actual bug.

Two things need addressing before merge (marked blocker); two more are nits that can be tackled in a follow-up.


Blockers

  1. loadRunConfigFields: reader.Close() error is silently discarded — diverges from established project pattern
  2. loadRunConfigFields: TOCTOU between store.Exists and store.GetReader — a concurrent deletion between the two calls returns an error that was previously handled gracefully as "not found"

Nits (follow-up PR OK)
3. store.Exists and store.GetReader errors are returned without workload name context, making debugging harder
4. TestNewStatusManagerFromRuntime hits the real XDG filesystem via NewRunConfigStore, inconsistent with the mock-based pattern used everywhere else in the suite

@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 24, 2026
JAORMX and others added 4 commits March 25, 2026 05:34
WorkloadFromContainerInfo called loadRunConfigFields which internally
created a real state.LocalStore hitting the XDG filesystem on every
call. This caused flaky tests when parallel tests created or truncated
runconfig files, leading to intermittent EOF errors from json.Decode.

Add a state.Store parameter to WorkloadFromContainerInfo and
loadRunConfigFields so callers inject the store, matching the existing
dependency injection pattern in fileStatusManager. Add a runConfigStore
field to runtimeStatusManager for parity.

Fix file_status_test.go mock readers to use DoAndReturn for fresh
readers on each call, preventing EOF when the store is read more than
once per test.

Fixes #4341

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Check reader.Close error return and break long function signature line
to satisfy errcheck and lll linters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…elds

The Exists+GetReader sequence had a race where a concurrent workload
deletion between the two calls would return an error instead of the
intended empty config. Remove the Exists call entirely and handle
not-found from GetReader directly via httperr status code.

Also log reader.Close() errors with slog.Warn to match the established
project pattern, and wrap errors with workload name for debuggability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX force-pushed the fix-flaky-workload-from-container-info branch from ff81f7d to f4fc388 Compare March 25, 2026 05:39
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 25, 2026
Move store creation out of NewStatusManagerFromRuntime into
NewStatusManagerWithEnv so the constructor is pure DI. Tests
inject a mock store instead of hitting the real XDG filesystem.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX force-pushed the fix-flaky-workload-from-container-info branch from e7ec9e1 to 4a578a3 Compare March 25, 2026 05:45
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 25, 2026
@JAORMX JAORMX merged commit d208cfc into main Mar 25, 2026
69 of 71 checks passed
@JAORMX JAORMX deleted the fix-flaky-workload-from-container-info branch March 25, 2026 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small PR: 100-299 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: WorkloadFromContainerInfo hits real XDG filesystem

2 participants