Skip to content

Bound external agent session detection work#26291

Merged
stefanstokic-oai merged 2 commits into
mainfrom
codex/optimize-agent-import-detection
Jun 4, 2026
Merged

Bound external agent session detection work#26291
stefanstokic-oai merged 2 commits into
mainfrom
codex/optimize-agent-import-detection

Conversation

@stefanstokic-oai

@stefanstokic-oai stefanstokic-oai commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Why

External agent migration detection parsed and hashed every JSONL session file. For users with many large conversations, launching migration could consume substantial CPU and disk resources.

Detection only needs the most recent sessions for the migration UI, so full-content work should be bounded.

What

  • Use file modification metadata to select the 50 most recent eligible sessions before parsing JSONL content.
  • Skip unchanged imported sessions using metadata stored in the import ledger.
  • Preserve content hashing when metadata indicates a session may have changed.
  • Stream SHA-256 calculation through a 64 KiB buffer instead of loading an entire session into memory.
  • Continue detecting older sessions in subsequent batches after newer sessions are imported.

Validation

  • RUST_MIN_STACK=8388608 cargo nextest run --no-fail-fast -p codex-external-agent-sessions
    • 20 tests passed.
  • Benchmarked release builds against 250 valid JSONL sessions totaling 501 MiB:
    • Median detection time decreased from 1,138.8 ms to 47.0 ms.
    • CPU instructions decreased by 95.8%.
    • Both versions returned the expected 50 sessions.

The benchmark used warm filesystem caches and measured the reduction in parsing, hashing, and CPU work.

@stefanstokic-oai stefanstokic-oai changed the title Optimize external agent session detection Bound external agent session detection work Jun 4, 2026
@stefanstokic-oai stefanstokic-oai marked this pull request as ready for review June 4, 2026 15:55

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: defa48a5e1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +75 to +78
file_candidates.push((Reverse(modified_at_nanos), path));
if file_candidates.len() > SESSION_IMPORT_MAX_COUNT {
file_candidates.pop();
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Backfill candidates after filtering unimportable sessions

When a project has more than 50 recent .jsonl files but the newest files are not importable (for example their cwd has been deleted or summarize_session returns None), this caps the heap before those files are parsed and validated. The later filters then drop them without considering older valid sessions, so detection can return an empty or partial list even though importable recent sessions exist; the previous implementation truncated only after validation.

Useful? React with 👍 / 👎.

Comment on lines +104 to +105
if ledger_changed {
save_import_ledger(codex_home, &ledger)?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not fail detection on opportunistic ledger writes

If codex_home or the ledger file is not writable (for example read-only permissions or a full disk), this opportunistic refresh turns an otherwise successful detection into an Err after migrations has already been built. The write only caches source mtimes for future skip decisions, so a save failure should be ignored or logged; otherwise users can lose the migration prompt even though the session files were readable and importable.

Useful? React with 👍 / 👎.

let now = now_unix_seconds();
let ledger = load_import_ledger(codex_home)?;
let mut candidates = Vec::new();
let mut ledger = load_import_ledger(codex_home)?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid overwriting concurrent ledger imports

When a detection request loads this snapshot and later refreshes an already-imported session, any import that completes in the meantime writes a newer ledger record, but the subsequent save_import_ledger from detection writes this stale snapshot back out and drops that import record. In the app server this can happen because session imports run in a spawned background task while new detect requests are still accepted, so the imported session can be offered/imported again.

Useful? React with 👍 / 👎.

@stefanstokic-oai stefanstokic-oai merged commit cbf62f6 into main Jun 4, 2026
31 checks passed
@stefanstokic-oai stefanstokic-oai deleted the codex/optimize-agent-import-detection branch June 4, 2026 17:13
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants