
Fix flaky api-workloads E2E test timeout #4106

Merged
JAORMX merged 1 commit into main from worktree-agent-a692c1e3 on Mar 11, 2026



@JAORMX
Collaborator

@JAORMX JAORMX commented Mar 11, 2026

Summary

  • The "E2E Tests Core (api-workloads)" CI job fails intermittently because Chi's global middleware.Timeout(60s) cancels the request context before container image pulls can complete. When the image is already cached, the pull finishes in under 60s and the test passes; when it is not cached, the pull takes longer than 60s and the request fails with HTTP 500.
  • Replace the global timeout with per-route timeouts: workload create and edit routes get an 11-minute timeout (slightly longer than the 10-minute imageRetrievalTimeout), while all other routes keep the standard 60-second timeout.

Type of change

  • Bug fix

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

Changes

  • pkg/api/server.go: Remove the global middleware.Timeout from the r.Use() block; apply the standard timeout per mount in setupDefaultRoutes for all non-workload routers.
  • pkg/api/v1/workloads.go: Add per-route timeouts via r.With(): 11 minutes for create/edit (image pulls), 60s for all other routes.
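The routing change can be sketched the same way. This is again stdlib-only and hypothetical: the real change uses Chi's `r.With()` in pkg/api/v1/workloads.go, whereas here an `http.ServeMux` with made-up `/api/v1/workloads/create`, `/api/v1/workloads/edit`, and `/health` paths stands in for the router, and `withTimeout` is a stand-in for Chi's middleware. The handler echoes how long its request context has left, rounded to the minute.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Timeout values from the PR description.
const (
	standardTimeout         = 60 * time.Second
	longRunningRouteTimeout = 11 * time.Minute
)

// withTimeout attaches a deadline to the request context (hypothetical
// stand-in for Chi's middleware.Timeout).
func withTimeout(d time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), d)
		defer cancel()
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// deadlineHandler writes the time left on the request context, rounded to
// the minute, so each route's effective timeout is visible.
func deadlineHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		dl, ok := r.Context().Deadline()
		if !ok {
			fmt.Fprint(w, "none")
			return
		}
		fmt.Fprint(w, time.Until(dl).Round(time.Minute))
	})
}

// newRouter wraps each route individually, with no outer timeout.
// The paths are hypothetical placeholders, not the project's real routes.
func newRouter(h http.Handler) http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/api/v1/workloads/create", withTimeout(longRunningRouteTimeout, h))
	mux.Handle("/api/v1/workloads/edit", withTimeout(longRunningRouteTimeout, h))
	mux.Handle("/", withTimeout(standardTimeout, h)) // every other route
	return mux
}

// call sends a POST to path through h and returns the response body.
func call(h http.Handler, path string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, path, nil))
	return rec.Body.String()
}

func main() {
	r := newRouter(deadlineHandler())
	for _, p := range []string{"/api/v1/workloads/create", "/api/v1/workloads/edit", "/health"} {
		fmt.Println(p, "->", call(r, p))
	}
	// Keeping a global 60s timeout around the router would cap every route:
	// a child context cannot outlive its parent's deadline.
	global := withTimeout(standardTimeout, newRouter(deadlineHandler()))
	fmt.Println("global + create ->", call(global, "/api/v1/workloads/create"))
}
```

The last line illustrates why the global timeout had to be removed rather than supplemented: `context.WithTimeout` keeps the parent's deadline when it is earlier, so a longer per-route timeout nested inside a 60s global one still expires after 60s.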

Special notes for reviewers

  • longRunningRouteTimeout is derived from imageRetrievalTimeout + 1*time.Minute (both in the same package), so the route timeout stays one minute longer than the image-pull timeout even if the latter changes.
  • Custom routes added via WithRoute are not currently used anywhere; they no longer inherit a global timeout, so a comment now notes that callers are responsible for their own timeout management.
  • The two pre-existing flaky test failures in pkg/secrets and pkg/transport/proxy/transparent are unrelated to this change.

Generated with Claude Code

The global Chi middleware.Timeout(60s) was canceling workload create
requests before the image pull could complete, causing intermittent
HTTP 500 errors in CI when images weren't cached.

Replace the global timeout with per-route timeouts: workload create
and edit routes get an 11-minute timeout (slightly longer than the
10-minute imageRetrievalTimeout), while all other routes keep the
standard 60-second timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JAORMX JAORMX requested a review from amirejaz as a code owner March 11, 2026 15:39
@github-actions github-actions bot added the size/XS Extra small PR: < 100 lines changed label Mar 11, 2026
@codecov

codecov bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 0% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.69%. Comparing base (144a4da) to head (964381e).
⚠️ Report is 3 commits behind head on main.

Files with missing lines:
  • pkg/api/v1/workloads.go: 0.00% patch coverage, 16 lines missing
  • pkg/api/server.go: 0.00% patch coverage, 13 lines missing
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4106   +/-   ##
=======================================
  Coverage   68.68%   68.69%           
=======================================
  Files         453      453           
  Lines       46011    46013    +2     
=======================================
+ Hits        31604    31608    +4     
+ Misses      11967    11965    -2     
  Partials     2440     2440           


@JAORMX JAORMX merged commit 0b80686 into main Mar 11, 2026
44 of 45 checks passed
@JAORMX JAORMX deleted the worktree-agent-a692c1e3 branch March 11, 2026 17:15

2 participants