feat: increase resolver worker queue length from 100 to 1000#2085
Conversation
Signed-off-by: Frederic Wilhelm <frederic.wilhelm@sap.com>
📝 WalkthroughWalkthroughThis PR increases the default worker queue length for component version resolution from 100 to 1000 across Helm chart configuration files, CLI defaults, and the underlying workerpool implementation, with a minor documentation clarification to CRD schema. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
I mean sure, but wouldn't it make more sense in the future to have two options?
where we allocate This has nothing to do with this PR, just an idea for future fine-tuning |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
kubernetes/controller/internal/resolution/workerpool/workerpool.go (1)
128-130: Fallback default alignment looks good; add saturation alerting if not already present.The fallback to
1000matches CLI/Helm defaults and avoids mismatched behavior paths. Non-blocking suggestion: consider alert thresholds on queue depth/backlog duration now that buffering is deeper.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kubernetes/controller/internal/resolution/workerpool/workerpool.go` around lines 128 - 130, The current fallback sets opts.QueueSize to 1000 but lacks saturation/backlog alerts; add configurable alerting that monitors queue depth and backlog duration using the same QueueSize value: introduce threshold constants or config (e.g., QueueAlertThresholdPct) and emit metrics/log warnings when current queue length (from the workerpool's work queue) exceeds threshold or when items wait longer than a configurable maxBacklogDuration; implement this in the workerpool run/dispatcher loop (where the queue is enqueued/dequeued) and expose Prometheus/Golang metrics (gauge for queue_depth, counter/histogram for backlog_time) and warning logs so operators get alerted before saturation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@kubernetes/controller/internal/resolution/workerpool/workerpool.go`:
- Around line 128-130: The current fallback sets opts.QueueSize to 1000 but
lacks saturation/backlog alerts; add configurable alerting that monitors queue
depth and backlog duration using the same QueueSize value: introduce threshold
constants or config (e.g., QueueAlertThresholdPct) and emit metrics/log warnings
when current queue length (from the workerpool's work queue) exceeds threshold
or when items wait longer than a configurable maxBacklogDuration; implement this
in the workerpool run/dispatcher loop (where the queue is enqueued/dequeued) and
expose Prometheus/Golang metrics (gauge for queue_depth, counter/histogram for
backlog_time) and warning logs so operators get alerted before saturation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 99066bae-9bbd-4671-9bc7-eb2b8b82390f
📒 Files selected for processing (6)
kubernetes/controller/chart/README.mdkubernetes/controller/chart/templates/crd/resources.delivery.ocm.software.yamlkubernetes/controller/chart/test-values.yamlkubernetes/controller/chart/values.yamlkubernetes/controller/cmd/main.gokubernetes/controller/internal/resolution/workerpool/workerpool.go
Summary
Motivation
With large OCM installations managing many component versions, the resolver
worker pool queue can fill up at 100 items, causing reconciliation failures
with
work queue is full; cannot resolve requests for <component>. Increasingthe buffer to 1000 allows the system to absorb larger bursts of resolution
requests without dropping them, while keeping the same number of worker
goroutines (10).
Changes
kubernetes/controller/cmd/main.go— Update--resolver-worker-queue-lengthflag default from 100 → 1000kubernetes/controller/internal/resolution/workerpool/workerpool.go— UpdateNewWorkerPoolfallback default from 100 → 1000kubernetes/controller/chart/values.yaml— Update Helm default from 100 → 1000kubernetes/controller/chart/test-values.yaml— Update test values from 100 → 1000kubernetes/controller/chart/README.md— Regenerated to reflect new default