feat(record): add live recording command for API capture #300

Merged
jackwener merged 2 commits into jackwener:main from yee94:feat/record-command
Mar 23, 2026

Conversation

@yee94 (Contributor) commented Mar 23, 2026

Summary

  • Add opencli record <url> command that injects fetch/XHR interceptors into all tabs in the automation window, polls captured requests every 2s, and auto-generates YAML candidate adapters
  • Support multi-tab recording: new tabs discovered during polling are automatically injected (handles pages that open new tabs mid-session)
  • Add --timeout (default 60s) for agent-friendly non-blocking operation; stops on Enter, timeout, or SIGINT — whichever comes first
  • Fix idempotent re-injection: restores original fetch/XHR before re-patching so the guard flag no longer blocks subsequent record runs on the same tab
  • Expand SKILL.md with a full Record Workflow section

New options

opencli record <url>
  --site <name>     Site name (inferred from URL if omitted)
  --timeout <ms>    Auto-stop after N ms (default: 60000)
  --poll <ms>       Poll interval (default: 2000)
  --out <dir>       Output directory for candidates

Output

.opencli/record/<site>/captured.json        # raw captured requests
.opencli/record/<site>/candidates/*.yaml    # high-confidence YAML candidates (score >= 8)

- Add `opencli record <url>` command that injects fetch/XHR interceptors
  into all tabs in the automation window, polls captured requests, and
  auto-generates YAML candidate adapters
- Support multi-tab recording: new tabs discovered during polling are
  automatically injected
- Add --timeout (default 60s) for agent-friendly non-blocking operation;
  stops on Enter, timeout, or SIGINT — whichever comes first
- Fix idempotent re-injection: restores original fetch/XHR before
  re-patching so guard flag no longer blocks subsequent record runs
- Add --poll interval option (default 2000ms)
- Expand SKILL.md with full Record Workflow section: interceptor
  internals, page-type capture expectations, YAML→TS conversion guide,
  and troubleshooting table

@Astro-Han (Contributor)

Really interesting concept: recording live API traffic to auto-generate adapters is a powerful complement to the existing explore → synthesize → generate workflow. I have several significant concerns for core code of this scope:

Security

Captured data written to disk without redaction

The interceptor captures all fetch/XHR responses and writes them verbatim to captured.json — including access_token, CSRF tokens, session IDs, and PII. At minimum, sensitive values should be redacted before writing to disk.
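
One way to approach the redaction suggested above is a recursive pass over each captured body before it is serialized. This is a minimal sketch, not code from the PR: the `redact` helper, the `SENSITIVE_KEY` pattern, and the `[REDACTED]` placeholder are all assumptions made for illustration.

```typescript
// Hypothetical helper: none of these names come from the PR.
// Keys whose names suggest credentials or session state.
const SENSITIVE_KEY = /token|secret|password|cookie|session|csrf|authorization/i;

// Returns a deep copy of `value` with the values of sensitive-looking keys masked.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Key-name matching only catches values stored under recognizable keys. PII embedded in free-text fields would need a separate, value-based check.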

Interceptor Issues

fetch patch chain accumulation — If a page's fetch was already patched by a third-party library (Sentry, DataDog), the restore step saves the third-party's patch as "original". Multiple injection cycles build a call chain, each adding clone+json overhead.
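
A possible way to avoid the chain is to remember the pre-recorder fetch exactly once and always wrap that saved reference. The sketch below is illustrative only: `PatchHost`, `installRecorder`, and the `__rec_base_fetch` slot are hypothetical names, and `FetchLike` is a simplified stand-in for the real fetch signature.

```typescript
// Simplified stand-in for the real fetch signature (assumption).
type FetchLike = (url: string) => Promise<string>;

interface PatchHost {
  fetch: FetchLike;
  __rec_base_fetch?: FetchLike; // hypothetical slot, not a name from the PR
}

function installRecorder(host: PatchHost, seen: string[]): void {
  // Save whatever fetch was in place before the first injection, and only then.
  // Later injections reuse it, so recorder wrappers never stack on each other.
  if (host.__rec_base_fetch === undefined) host.__rec_base_fetch = host.fetch;
  const base = host.__rec_base_fetch;
  host.fetch = (url) => {
    seen.push(url); // record the request
    return base(url); // delegate to the single saved base
  };
}
```

If a third-party library had already patched fetch, its wrapper stays in place as the base. Repeated injections replace the recorder layer rather than adding another one.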

XHR send leaks event listeners — Each send() adds addEventListener('load', ...) without removing previous ones. If an XHR is reused (abort → open → send), the same response gets pushed into the buffer multiple times.
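
The duplicate-listener problem can be illustrated with a per-object flag that makes attachment idempotent. The flag name `__rec_listener_added` is the one mentioned in the follow-up fix commit. Everything else here (`XhrLike`, `attachLoadOnce`, `FakeXhr`) is a hypothetical stand-in so the sketch runs outside a browser.

```typescript
interface XhrLike {
  addEventListener(type: "load", cb: () => void): void;
  __rec_listener_added?: boolean; // flag name taken from the fix commit
}

// Attach the capture listener at most once per XHR object,
// even if send() is called again after abort() and open().
function attachLoadOnce(xhr: XhrLike, onLoad: () => void): void {
  if (xhr.__rec_listener_added) return;
  xhr.__rec_listener_added = true;
  xhr.addEventListener("load", onLoad);
}

// Tiny stand-in for XMLHttpRequest (assumption, for demonstration only).
class FakeXhr implements XhrLike {
  __rec_listener_added?: boolean;
  private listeners: Array<() => void> = [];
  addEventListener(_type: "load", cb: () => void): void {
    this.listeners.push(cb);
  }
  fireLoad(): void {
    for (const cb of this.listeners) cb();
  }
}
```

With the guard in place, calling `attachLoadOnce` twice on the same object and then firing `load` runs the capture callback once.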

Existing interceptor.ts not reused — The project already has a mature interceptor in src/interceptor.ts (with patchGuard, error logging). record.ts reimplements a separate one with different global variable names and no error recording. Consider extending the existing one.

CDP Mode Incompatibility

recordSession() bypasses IPage and calls the daemon's tabs/exec directly. This works for Extension mode but breaks under OPENCLI_CDP_ENDPOINT (CDP uses WebSocket, not the daemon).

Generated YAML Issues

  • Args declared but never interpolated: keyword/page args are declared, but the fetch URL is hardcoded to the recorded URL, so the args have no effect
  • pathChain empty string → syntax error: when findArrayPath returns "" (root-level array), pathChain becomes ?., generating data?., which is invalid JS
  • Dedup uses URL pattern only: the existing explore.ts deduplicates by method:pattern; here only the URL pattern is used
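
The pathChain case above comes down to special-casing the empty path. A minimal sketch, assuming the path is a dot-separated string such as `result.items` (the real format returned by findArrayPath is not shown in this thread, and `buildAccessor` is a hypothetical name):

```typescript
// Build an optional-chaining accessor expression as source text.
// An empty path means the response root is itself the array.
function buildAccessor(root: string, path: string): string {
  if (path === "") return root; // avoid emitting the invalid "data?."
  return root + path.split(".").map((seg) => `?.${seg}`).join("");
}

console.log(buildAccessor("data", ""));             // data
console.log(buildAccessor("data", "result.items")); // data?.result?.items
```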

Cleanup & Robustness

  • readline.Interface not closed on --timeout → process hangs waiting for stdin
  • Page patches (fetch/XHR overrides) never restored on exit
  • parseInt(opts.poll) with non-numeric input → NaN → setInterval(fn, 0) → floods the daemon
  • allRequests array has no size cap — long sessions can exhaust memory
  • Injection errors silently swallowed — daemon disconnect appears as "no data captured"
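
Two of the items above (the NaN poll interval and the unbounded buffer) could be handled with small pure helpers along these lines. This is a sketch under assumptions: `parseIntervalMs`, `pushCapped`, and the 100 ms floor are illustrative choices, not values from the PR.

```typescript
// Parse a millisecond option, falling back when the input is not a number
// and clamping to a floor so the timer can never spin at 0 ms.
function parseIntervalMs(raw: string | undefined, fallback: number, min = 100): number {
  const n = Number.parseInt(raw ?? "", 10);
  if (Number.isNaN(n)) return fallback;
  return Math.max(n, min);
}

// Append to a buffer while keeping at most `max` of the newest entries.
function pushCapped<T>(buf: T[], item: T, max: number): void {
  buf.push(item);
  if (buf.length > max) buf.splice(0, buf.length - max);
}
```

Dropping the oldest entries is one policy among several. Stopping the session or flushing to disk when the cap is reached would also bound memory.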

Architecture

record.ts re-implements URL pattern matching, auth detection, scoring, and YAML generation from explore.ts/synthesize.ts, but drops auth header analysis and store-action classification. Consider whether record could feed captured requests into the existing synthesize → generate pipeline rather than building a parallel generator.

Tests

573 lines of core infrastructure with zero test coverage. At minimum, the pure functions (urlToPattern, findArrayPath, scoreRequest, buildRecordedYaml) are easily unit-testable.

…hang & args interpolation

- XHR send(): add __rec_listener_added guard to prevent duplicate event
  listeners when XHR is reused (abort → open → send)
- pathChain: when findArrayPath returns '' (root-level array), data access
  is just 'data' not 'data?.' which was invalid JS syntax
- waitForEnter(): return cleanup fn so timeout path can close readline.Interface
  preventing the process from hanging on stdin after auto-timeout
- buildRecordedYaml: replace search/page query param values with template
  vars ({{args.keyword}}, {{args.page}}) so generated YAML actually uses
  the declared args instead of hardcoding the recorded URL
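
The args-interpolation fix described above can be pictured as a rewrite of selected query parameter values into template variables. The following is a rough sketch, not the PR's buildRecordedYaml: `templatizeQuery` and the parameter-to-arg mapping are hypothetical, and URL fragments are not handled.

```typescript
// Replace the values of chosen query parameters with {{args.<name>}} templates.
// `argByParam` maps a query parameter name to the declared arg it should use.
function templatizeQuery(url: string, argByParam: Record<string, string>): string {
  const q = url.indexOf("?");
  if (q === -1) return url; // no query string, nothing to templatize
  const base = url.slice(0, q);
  const pairs = url.slice(q + 1).split("&").map((pair) => {
    const eq = pair.indexOf("=");
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const arg = argByParam[name];
    return arg === undefined ? pair : `${name}={{args.${arg}}}`;
  });
  return `${base}?${pairs.join("&")}`;
}

console.log(
  templatizeQuery("https://example.com/api/search?q=cats&page=2&lang=en", {
    q: "keyword",
    page: "page",
  }),
);
// https://example.com/api/search?q={{args.keyword}}&page={{args.page}}&lang=en
```

String splitting is used instead of URLSearchParams because the latter would percent-encode the braces in the template variables.
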
@jackwener jackwener merged commit dff0fe5 into jackwener:main Mar 23, 2026
@jackwener (Owner)

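Thanks @yee94 for contributing the record command! This is a great feature that makes the API discovery workflow much smoother 🎉

Before merging, I made a few bug fixes directly on your branch:

  • XHR listener leak: every send() call added a new addEventListener, so a reused XHR (abort→open→send) captured the same response more than once. Fixed by adding a __rec_listener_added guard.
  • pathChain empty-string syntax error: when findArrayPath returned path: '' (root-level array), the original code generated data?., which is invalid JS. It now accesses data directly.
  • Unclosed readline causing the process to hang: readline.Interface was not closed when the timeout fired, so the process kept waiting on stdin. It now returns a cleanup function that is called explicitly on timeout.
  • Args declared but not interpolated: the generated YAML declared keyword/page args, but the URL in the evaluate script was hardcoded. The search/page parameter values are now replaced with the {{args.keyword}} / {{args.page}} template variables.

Thanks again for your contribution!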
