Skip to content

[codex] Speed up external agent session imports#26637

Merged
stefanstokic-oai merged 6 commits into
mainfrom
codex/external-agent-import-perf
Jun 8, 2026
Merged

[codex] Speed up external agent session imports#26637
stefanstokic-oai merged 6 commits into
mainfrom
codex/external-agent-import-perf

Conversation

@stefanstokic-oai

Copy link
Copy Markdown
Contributor

Why

Importing large external-agent session histories currently starts a full live Codex thread for every imported session. This initializes unrelated runtime systems and repeats expensive transcript, metadata, hashing, and ledger work.

On a 50-session, 238 MiB fixture, the existing path took roughly 70 seconds to complete the import and 77 seconds end to end.

What changed

  • Persist imported sessions directly through ThreadStore instead of starting full live threads.
  • Process imports through a bounded five-session pipeline.
  • Parse, extract, and hash each source file in one pass.
  • Move blocking source preparation onto the blocking thread pool.
  • Reuse prepared content hashes and update the import ledger once per batch.
  • Avoid metadata readback for newly written rollouts.
  • Preserve imported conversation history and visible thread metadata.
  • Keep the implementation out of codex-core and avoid changes to the public ThreadStore trait.

Performance

For the same 50-session, 238 MiB fixture:

Path Import completion End to end
Existing import 69.61s 76.62s
This change 5.95s 6.58s

All 50 sessions imported successfully with no warnings or contention signals.

Validation

  • just test -p codex-external-agent-sessions
  • just test -p codex-app-server external_agent_config_import
  • Verified imports do not initialize unrelated required MCP servers.
  • Verified previously imported source versions are skipped and changed sources can be imported again.
  • Verified imported rollouts remain readable through thread listing and history APIs.

@stefanstokic-oai stefanstokic-oai marked this pull request as ready for review June 8, 2026 17:12

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 583174aca7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +202 to +217
let metadata = ThreadMetadataPatch {
title,
preview: first_user_message.clone(),
model_provider: Some(model_provider),
created_at: Some(now),
updated_at: Some(now),
source: Some(source.clone()),
thread_source: Some(None),
agent_nickname: Some(source.get_nickname()),
agent_role: Some(source.get_agent_role()),
agent_path: Some(source.get_agent_path().map(Into::into)),
cwd: Some(cwd),
cli_version: Some(env!("CARGO_PKG_VERSION").to_string()),
first_user_message,
memory_mode: Some(memory_mode),
..Default::default()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve git metadata for imported sessions

When the imported session's cwd is inside a Git repository, the old start_thread_with_options path went through ThreadMetadataSync::for_create, which collects and persists the repo Git info. This direct ThreadStore path builds the metadata patch manually but never sets git_info, so newly imported threads lose their gitInfo in thread/list/thread/read even though normal imported rollouts used to expose it.

Useful? React with 👍 / 👎.

Comment on lines +202 to +204
let metadata = ThreadMetadataPatch {
title,
preview: first_user_message.clone(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep imported titles in rollout-backed reads

When importing a session with a custom/AI title on installations where the SQLite state DB is unavailable or later rebuilt from rollouts, setting only the observed title updates SQLite but skips the rollout name index. The old path used the explicit name update, which wrote the legacy name, so these imported threads lose their title in rollout-backed reads/lists.

Useful? React with 👍 / 👎.

}
}
}
if let Err(err) = record_completed_session_imports(&self.codex_home, completed_imports) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Record completed imports before finishing the batch

For large imports, completed sessions stay only in memory until the whole stream finishes; if the app exits or this background task is cancelled after some persist_session calls succeed but before this ledger write, those source versions are not marked imported. The next detection/import will treat them as new and create duplicate imported threads, whereas the previous path recorded each completed session immediately.

Useful? React with 👍 / 👎.


use crate::config_manager::ConfigManager;

const SESSION_IMPORT_CONCURRENCY: usize = 5;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Split this oversized change

AGENTS.md Change size guidance caps non-mechanical changes at 800 changed lines; this commit changes about 1,171 lines across import, ledger, parsing, and app-server behavior. Please split a coherent stage, such as parser/ledger batching separately from direct ThreadStore import.

Useful? React with 👍 / 👎.

.expect("record completed imports");

let ledger = super::load_import_ledger(&codex_home).expect("ledger");
assert_eq!(ledger.records.len(), 1);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Use pretty assertions in the new test module

AGENTS.md Test assertions asks Rust tests to import and use pretty_assertions::assert_eq for clearer diffs. This new test module uses the standard macro throughout, so failures on ledger records will be harder to read than the repo convention expects.

Useful? React with 👍 / 👎.

@stefanstokic-oai stefanstokic-oai merged commit 6d8e12a into main Jun 8, 2026
31 checks passed
@stefanstokic-oai stefanstokic-oai deleted the codex/external-agent-import-perf branch June 8, 2026 18:16
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 8, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants