fix(tools): throttle ripgrep CPU usage with thread limits and concurrency control #2009
…ency control
- Add --threads=4 flag to all rg invocations (grep and glob)
- Add global semaphore limiting concurrent rg processes to 2
- Reduce grep timeout from 300s to 60s (matches tool description)
- Reduce max output from 10MB to 256KB (prevents excessive memory usage)
- Add output_mode parameter (content/files_with_matches/count)
- Add head_limit parameter for incremental result fetching
Closes code-yeongyu#2008
Ref: code-yeongyu#674, code-yeongyu#1722
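The flag changes listed in this commit message can be sketched as a small argument builder. This is a minimal illustration, not the PR's actual code: `RgSearchOptions`, `buildRgArgs`, and `applyHeadLimit` are hypothetical names, and applying `head_limit` by slicing already-parsed results is an assumption about how the limit is enforced. The `rg` flags themselves (`--threads`, `--files-with-matches`, `--count`, `--line-number`, `--with-filename`, `--no-heading`, `--color`) are standard ripgrep options.

```typescript
// Hypothetical option shape; field names follow the PR description, not the real source.
type OutputMode = "content" | "files_with_matches" | "count";

interface RgSearchOptions {
  pattern: string;
  paths?: string[];
  threads?: number;
  outputMode?: OutputMode;
}

// Value taken from the PR description.
const DEFAULT_RG_THREADS = 4;

function buildRgArgs(options: RgSearchOptions): string[] {
  const threads = options.threads ?? DEFAULT_RG_THREADS;
  // Cap the worker threads of this single rg process.
  const args: string[] = [`--threads=${threads}`, "--color=never"];

  switch (options.outputMode ?? "files_with_matches") {
    case "files_with_matches":
      args.push("--files-with-matches"); // one file path per line
      break;
    case "count":
      args.push("--count"); // file:count per line
      break;
    case "content":
      args.push("--line-number", "--with-filename", "--no-heading"); // file:line:text
      break;
  }

  // "--" ends flag parsing so a pattern starting with "-" is not read as an option.
  args.push("--", options.pattern, ...(options.paths ?? ["."]));
  return args;
}

// Assumption: head_limit is applied to parsed results rather than passed to rg.
function applyHeadLimit<T>(results: T[], headLimit?: number): T[] {
  return headLimit !== undefined && headLimit > 0 ? results.slice(0, headLimit) : results;
}
```

Keeping the thread cap in the builder means every invocation gets it by default, and a caller has to opt out explicitly by passing `threads`.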
All contributors have signed the CLA. Thank you! ✅
I have read the CLA Document and I hereby sign the CLA
3 issues found across 8 files
Confidence score: 3/5
- `src/tools/grep/cli.ts` has a concrete behavior bug: `outputMode: "files_with_matches"` yields empty results because `parseOutput` expects line-numbered output, which can directly affect users relying on that mode.
- Two compatibility issues in `src/tools/grep/tools.ts`, around schema enum usage and an unsafe type assertion, could cause agent/tool schema mismatches at runtime.
- The score reflects a real user-facing regression risk in the CLI path plus schema compatibility concerns, though the fixes are localized.
- Pay close attention to `src/tools/grep/cli.ts` and `src/tools/grep/tools.ts`: output parsing for files-only mode and enum schema/type safety.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/tools/grep/tools.ts">
<violation number="1" location="src/tools/grep/tools.ts:24">
P1: Custom agent: **Opencode Compatibility**
Use `tool.schema.enum` instead of a generic `string()` for `output_mode` to ensure the agent receives the exact allowed values in its JSON schema and to enforce strict runtime validation.</violation>
<violation number="2" location="src/tools/grep/tools.ts:40">
P1: Custom agent: **Opencode Compatibility**
Remove the unsafe type assertion, as the type should be automatically inferred from a correctly defined `tool.schema.enum`.</violation>
</file>
<file name="src/tools/grep/cli.ts">
<violation number="1" location="src/tools/grep/cli.ts:57">
P1: `outputMode: "files_with_matches"` produces empty results due to incompatible regex parsing in `parseOutput`. When using `--files-with-matches`, ripgrep outputs only file paths (e.g., "src/file.ts") without line numbers or content, but `parseOutput` uses regex `/^(.+?):(\\d+):(.*)$/` expecting `file:line:content` format. This causes all lines to fail matching, returning an empty `matches` array.</violation>
</file>
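The parsing issue described in the last violation can be illustrated with a short sketch. It assumes a hypothetical `GrepMatch` shape and a simplified `parseOutput`; the real function in `src/tools/grep/cli.ts` may differ. Without a files-only branch, a bare path such as `src/file.ts` never matches the `file:line:content` pattern and is silently dropped.

```typescript
// Hypothetical match shape for illustration only.
interface GrepMatch {
  file: string;
  line: number;
  text: string;
}

// Matches rg's default "file:line:content" output.
const MATCH_LINE = /^(.+?):(\d+):(.*)$/;

function parseOutput(output: string, filesOnly = false): GrepMatch[] {
  const matches: GrepMatch[] = [];
  for (const rawLine of output.split("\n")) {
    const line = rawLine.replace(/\r$/, ""); // tolerate CRLF output
    if (line.trim() === "") continue;

    if (filesOnly) {
      // --files-with-matches prints one path per line, with no line number or content.
      matches.push({ file: line, line: 0, text: "" });
      continue;
    }

    const m = MATCH_LINE.exec(line);
    if (m) {
      matches.push({ file: m[1], line: Number.parseInt(m[2], 10), text: m[3] });
    }
  }
  return matches;
}
```

Calling `parseOutput("src/file.ts\n")` without the flag returns an empty array, which reproduces the reported symptom; passing `true` as the second argument returns one entry per path.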
- Use tool.schema.enum() for output_mode instead of generic string()
- Remove unsafe type assertion for output_mode
- Fix files_with_matches mode returning empty results by adding filesOnly flag to parseOutput for --files-with-matches rg output
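To show why an enum schema makes the type assertion unnecessary, here is a dependency-free sketch of the same idea. It does not use the real `tool.schema.enum()` API; `OUTPUT_MODES`, `isOutputMode`, and `resolveOutputMode` are hypothetical names. The point is that once the allowed values are declared in one place, both the runtime check and the static type derive from that declaration.

```typescript
// Single source of truth for the allowed values (taken from the PR description).
const OUTPUT_MODES = ["content", "files_with_matches", "count"] as const;

// The union type is derived from the list, so the two cannot drift apart.
type OutputMode = (typeof OUTPUT_MODES)[number];

// Type guard: narrows an unknown value after a real runtime check.
function isOutputMode(value: unknown): value is OutputMode {
  // Widening the tuple to readonly string[] is safe; it only relaxes the element type.
  return typeof value === "string" && (OUTPUT_MODES as readonly string[]).includes(value);
}

function resolveOutputMode(value: unknown): OutputMode {
  // Default per the PR description.
  if (value === undefined) return "files_with_matches";
  if (!isOutputMode(value)) {
    throw new Error(
      `invalid output_mode: ${String(value)}; expected one of ${OUTPUT_MODES.join(", ")}`,
    );
  }
  return value; // already narrowed to OutputMode, no "as" needed
}
```

A generic string schema would accept any value and force a cast such as `value as OutputMode` downstream, which is the unsafe assertion the review flagged.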
Thanks for the review, @cubic-dev-ai! All 3 issues have been addressed in 02017a1:
All existing tests pass (24/24).
Summary
Fixes #2008: ripgrep (`rg`) causes sustained CPU spikes (700%+) on macOS during agent workflows, making the system unresponsive for extended periods (1h+).
Related: #674 (comment-checker CPU spikes), #1722 (rg CPU spikes on Windows, closed as "working as designed")
Changes
- Add `--threads=4` to all `rg` invocations in both the `grep` and `glob` tools, preventing a single process from saturating all CPU cores
- Add a global `Semaphore(2)` that limits concurrent `rg` processes to 2, queuing additional requests
- Reduce `DEFAULT_TIMEOUT_MS` from 300s to 60s in `grep/constants.ts`; the tool description already claims "60s timeout" but the actual value was 5 minutes
- Reduce `DEFAULT_MAX_OUTPUT_BYTES` from 10MB to 256KB; 10MB is excessive for LLM context consumption
- `output_mode` parameter: supports `content`, `files_with_matches` (default), and `count` modes, inspired by Claude Code's approach of fetching file lists first, then reading specific files
- `head_limit` parameter: limits result count for incremental fetching instead of retrieving everything at once
Files Changed (8)
- `src/tools/shared/semaphore.ts`
- `src/tools/grep/constants.ts`: `DEFAULT_RG_THREADS=4`, timeout 300s→60s, output 10MB→256KB
- `src/tools/grep/types.ts`: `threads`, `outputMode`, `headLimit` added to `GrepOptions`
- `src/tools/grep/cli.ts`: `--threads` flag, semaphore wrapping, headLimit/outputMode support
- `src/tools/grep/tools.ts`: `output_mode`, `head_limit` params, count mode support
- `src/tools/glob/types.ts`: `threads` added to `GlobOptions`
- `src/tools/glob/constants.ts`: `DEFAULT_RG_THREADS`
- `src/tools/glob/cli.ts`: `--threads` flag, semaphore wrapping
Impact
Before: Two `rg` processes × 12 threads each = 24 threads at 100% = system freeze
After: Two `rg` processes (max) × 4 threads each = 8 threads at 100% = smooth operation
Non-breaking
`output_mode` defaults to `files_with_matches`, which is more efficient than the previous behavior of always returning full content
Test plan
- `bun test src/tools/grep src/tools/glob`: 24 pass, 0 fail
- `bun run typecheck`: no new errors (3 pre-existing errors in unrelated files)
Summary by cubic
Throttle ripgrep to prevent CPU spikes by capping to 4 threads per process and allowing at most 2 concurrent runs. Tightens timeouts/output and adds lighter output modes to make searches faster and safer.
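The "at most 2 concurrent runs" behavior can be sketched as a small promise-based counting semaphore. This is an assumption about the general shape of such a helper, not the contents of `src/tools/shared/semaphore.ts`; the `run` method and the `rgSemaphore` name are hypothetical.

```typescript
// Minimal counting semaphore: at most `permits` tasks run at once, the rest queue in FIFO order.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits < 1) {
      throw new RangeError("permits must be a positive integer");
    }
    this.available = permits;
  }

  private acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    // No permit free: park the caller until release() wakes it.
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit directly to the oldest waiter
    } else {
      this.available += 1;
    }
  }

  // Runs the task under a permit and always releases it, even if the task throws.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// One shared instance so grep and glob draw from the same pool of 2 permits.
const rgSemaphore = new Semaphore(2);
```

Each search would then be wrapped as `rgSemaphore.run(() => runSearch(args))`, where `runSearch` stands in for whatever function spawns `rg`. Releasing in `finally` matters here: a timed-out or failed search must not leak its permit, or later searches would queue forever.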
Bug Fixes
New Features
Written for commit 02017a1. Summary will update on new commits.