Compress cold local thread-store rollouts by jif-oai · Pull Request #24941 · openai/codex

jif-oai · 2026-05-28T17:24:43Z

Why

Local thread-store rollouts are append-only JSONL files under `sessions/` and `archived_sessions/`. Long-running local stores can accumulate a large amount of cold history, yet the store must still resume, list, and search threads and update their metadata without requiring every old rollout to stay uncompressed on disk.

This PR adds an opt-in, under-development `local_thread_store_compression` feature that reclaims disk space from cold rollouts while keeping the existing local thread-store APIs compatible with both `.jsonl` and `.jsonl.zst` files.
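The compatibility requirement can be illustrated with a small lookup rule. This is a minimal sketch, not the PR's code: `compressed_sibling`, `resolve_rollout`, and `dedup_rollouts` are hypothetical helpers. The only facts taken from the PR are the `.jsonl` / `.jsonl.zst` naming and the rule that the plain file wins when both siblings exist.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Hypothetical helper: `<name>.jsonl` -> `<name>.jsonl.zst`.
fn compressed_sibling(plain: &Path) -> PathBuf {
    let mut s = plain.as_os_str().to_os_string();
    s.push(".zst");
    PathBuf::from(s)
}

/// Hypothetical helper: pick the file a reader should open. The plain
/// `.jsonl` wins when it exists; otherwise fall back to `.jsonl.zst`.
fn resolve_rollout(plain: &Path) -> Option<PathBuf> {
    if plain.is_file() {
        return Some(plain.to_path_buf());
    }
    let compressed = compressed_sibling(plain);
    compressed.is_file().then_some(compressed)
}

/// Hypothetical helper: collapse a directory listing so each rollout is
/// reported once, keeping the plain name when both forms are present.
fn dedup_rollouts(names: &[&str]) -> Vec<String> {
    // stem -> whether a plain `.jsonl` was seen for it
    let mut by_stem: BTreeMap<&str, bool> = BTreeMap::new();
    for &name in names {
        if let Some(stem) = name.strip_suffix(".jsonl.zst") {
            by_stem.entry(stem).or_insert(false);
        } else if let Some(stem) = name.strip_suffix(".jsonl") {
            by_stem.insert(stem, true);
        }
        // Anything else (temp files, unrelated files) is ignored.
    }
    by_stem
        .into_iter()
        .map(|(stem, has_plain)| {
            if has_plain {
                format!("{stem}.jsonl")
            } else {
                format!("{stem}.jsonl.zst")
            }
        })
        .collect()
}
```

Deduplicating by stem is one simple way to avoid listing a rollout twice while a plain file and its compressed sibling briefly coexist.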

What changed

- Adds a best-effort rollout compression worker that compresses `.jsonl` rollouts older than seven days in `sessions/` and `archived_sessions/`, uses a per-`codex_home` lock, verifies zstd output, preserves mtimes, and cleans stale temp files.
- Wires that worker behind the `local_thread_store_compression` feature when constructing the local thread store.
- Teaches rollout listing, metadata backfill, search, state-db filtering, and local thread-store path resolution to treat compressed rollouts as valid rollout files and to avoid duplicate plain/compressed siblings.
- Materializes a compressed rollout back to plain JSONL before append/resume writes, so existing write paths keep appending to a normal rollout file.
- Adds the zstd dependency for rollouts and regenerates the config schema for the new feature key.
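A sketch of how the worker might select cold candidates. Assumptions not stated in the PR: the scan is non-recursive over one directory (the real `sessions/` layout may be nested), the file mtime is the age signal, and a plain file that already has a `.jsonl.zst` sibling is skipped. `COLD_AFTER`, `is_cold_candidate`, and `cold_candidates` are hypothetical names; only the seven-day threshold comes from the description.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// Seven days, per the PR description.
const COLD_AFTER: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Hypothetical rule: a plain `.jsonl` rollout older than the threshold,
/// with no compressed sibling already present (assumption).
fn is_cold_candidate(name: &str, modified: SystemTime, now: SystemTime, has_zst: bool) -> bool {
    if !name.ends_with(".jsonl") || has_zst {
        return false;
    }
    match now.duration_since(modified) {
        Ok(age) => age > COLD_AFTER,
        // An mtime in the future (clock skew) is treated as fresh.
        Err(_) => false,
    }
}

/// Hypothetical scan of one directory, returning sorted candidates.
fn cold_candidates(dir: &Path, now: SystemTime) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if !meta.is_file() {
            continue;
        }
        let path = entry.path();
        let Some(name) = path.file_name().and_then(|n| n.to_str()) else {
            continue; // skip non-UTF-8 names in this sketch
        };
        let has_zst = dir.join(format!("{name}.zst")).exists();
        if is_cold_candidate(name, meta.modified()?, now, has_zst) {
            out.push(path.clone());
        }
    }
    out.sort();
    Ok(out)
}
```

Treating a future mtime as fresh errs on the side of leaving a file alone, which matches the best-effort nature of the worker.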

Testing

Added focused coverage in `codex-rs/rollout/src/compression_tests.rs` for:

- loading history from a compressed rollout;
- appending to a compressed rollout by materializing it back to JSONL;
- compressing old active and archived rollouts while leaving fresh rollouts untouched and cleaning stale temps.
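The stale-temp cleanup exercised by the last case could look roughly like this. Assumptions, none taken from the PR: temp files end in `.tmp`, and any temp older than a caller-chosen `max_age` is treated as abandoned. `remove_stale_temps` is a hypothetical name, and this sketch does not take the per-`codex_home` lock that a real worker would hold while cleaning.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Hypothetical cleanup: delete `*.tmp` files in `dir` whose mtime is
/// older than `max_age`. Returns how many files were removed.
fn remove_stale_temps(dir: &Path, now: SystemTime, max_age: Duration) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // Assumed temp naming: only files with a `.tmp` extension qualify.
        let is_temp = path.extension().is_some_and(|ext| ext == "tmp");
        if !is_temp {
            continue;
        }
        let modified = entry.metadata()?.modified()?;
        // Future mtimes (clock skew) are treated as fresh and kept.
        let stale = now
            .duration_since(modified)
            .map(|age| age > max_age)
            .unwrap_or(false);
        if stale {
            fs::remove_file(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}
```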

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d97837a9a0


## Rollout compression stack

This stack splits #24941 into reviewable steps for local rollout compression. The design is intentionally staged:

1. Teach readers, listing, search, and lookup to understand compressed rollouts.
2. Make append and resume paths materialize compressed rollouts back to plain JSONL before writing.
3. Add a disabled-by-default worker that can compress cold archived rollouts behind `local_thread_store_compression`.

The key invariant is that writers append only to plain `.jsonl`. A `.jsonl.zst` file is a cold, read-side representation; if a write is needed, the compressed file is first materialized back to plain JSONL. Readers prefer plain `.jsonl` when both forms exist and can fall back to the compressed sibling during transitions.

The worker deliberately lands in the last PR and remains behind an under-development feature flag. It currently scans only `archived_sessions`, not active `sessions`, because active sessions carry the highest resume/append race risk. As a result, this stack does not yet compress most unarchived local history.

## Known race / follow-up

The remaining unresolved design question is writer/compressor coordination. Even for archived rollouts, a resume or metadata update can append while the worker is replacing the plain file with `.jsonl.zst`. The current double-stat checks narrow, but do not eliminate, the window in which a writer has opened the plain file before it is unlinked. Do not treat the worker PR as production-ready until we either:

- prevent append/resume paths from racing archived compression, or
- introduce a shared representation/append lock or equivalent coordination.

The first two PRs are useful independently: they make compressed rollouts readable and let append paths safely recover back to plain JSONL. The third PR isolates the worker behavior so that the coordination issue can be reviewed separately.
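The materialize-before-append invariant can be sketched as follows, assuming the zstd decoder is injected as a closure so the example stays dependency-free. `materialize_for_append` and the `.materialize.tmp` temp name are hypothetical. The ordering is the point: the plain file is published with a rename before the `.jsonl.zst` sibling is removed, so a reader that prefers plain `.jsonl` never sees the rollout disappear. The sketch takes no lock, so it does not address the writer/compressor race described in the stack notes.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical: ensure a plain `<name>.jsonl` exists before appending.
/// `decompress` stands in for a real zstd decoder (assumption).
fn materialize_for_append(
    plain: &Path,
    decompress: impl Fn(&[u8]) -> io::Result<Vec<u8>>,
) -> io::Result<PathBuf> {
    // Writers only ever append to the plain file; if it exists we are done.
    if plain.exists() {
        return Ok(plain.to_path_buf());
    }
    let mut zst = plain.as_os_str().to_os_string();
    zst.push(".zst");
    let zst = PathBuf::from(zst);
    let mut tmp = plain.as_os_str().to_os_string();
    tmp.push(".materialize.tmp"); // hypothetical temp name
    let tmp = PathBuf::from(tmp);

    // Decode into a temp file and flush it before publishing.
    let bytes = decompress(&fs::read(&zst)?)?;
    {
        let mut f = File::create(&tmp)?;
        f.write_all(&bytes)?;
        f.sync_all()?;
    }
    // Publish the plain file first; while both forms exist, readers
    // prefer the plain one, so the rollout is always visible.
    fs::rename(&tmp, plain)?;
    fs::remove_file(&zst)?;
    Ok(plain.to_path_buf())
}
```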
## Validation

Focused local validation for the stack includes:

- `just test -p codex-rollout`
- `just test -p codex-thread-store` where thread-store paths were touched
- `just test -p codex-features` for the feature-flag slice
- `just bazel-lock-check` after dependency-graph changes
- scoped `just fix -p ...` passes for changed crates

CI is still the source of truth for the full platform matrix.

## This PR in the stack

This is PR 3/3, based on #25088. It adds the under-development feature flag and starts the best-effort background worker when the flag is enabled. The worker currently compresses only cold archived rollouts, skips active sessions, verifies compressed output, preserves mtime and permissions, keeps a store-level lock heartbeat, and cleans stale temp files.

Stack order:

1. #25087: read compressed local rollouts.
2. #25088: materialize compressed rollouts before append.
3. This PR: add the disabled local compression worker.
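The per-file worker steps listed above (verify the compressed output, preserve mtime and permissions) together with the double-stat check from the stack notes can be sketched for a single file. This is an illustration under stated assumptions, not the PR's implementation: `compress_rollout` is hypothetical, the codec is injected as a pair of closures in place of real zstd calls, and the `.zst.tmp` temp name is invented. As the stack notes say, the second stat only narrows the race: on Unix, a writer that opened the plain file before `remove_file` can still append to the unlinked file.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical: replace `<name>.jsonl` with `<name>.jsonl.zst`.
/// `compress`/`decompress` stand in for a real zstd codec (assumption).
/// Returns Ok(false) when the plain file changed during the pass.
fn compress_rollout(
    plain: &Path,
    compress: impl Fn(&[u8]) -> io::Result<Vec<u8>>,
    decompress: impl Fn(&[u8]) -> io::Result<Vec<u8>>,
) -> io::Result<bool> {
    let with_suffix = |suffix: &str| {
        let mut s = plain.as_os_str().to_os_string();
        s.push(suffix);
        PathBuf::from(s)
    };
    let (zst, tmp) = (with_suffix(".zst"), with_suffix(".zst.tmp"));

    // First stat: remember size and mtime before reading.
    let before = fs::metadata(plain)?;
    let original = fs::read(plain)?;
    let packed = compress(&original)?;

    // Verify: the compressed bytes must round-trip to the original.
    if decompress(&packed)? != original {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "round-trip mismatch"));
    }

    {
        let mut f = File::create(&tmp)?;
        f.write_all(&packed)?;
        f.sync_all()?;
        // Preserve the original mtime so age-based selection keeps working.
        f.set_modified(before.modified()?)?;
    }
    fs::set_permissions(&tmp, before.permissions())?;

    // Second stat: if a writer touched the file, back off for now.
    let after = fs::metadata(plain)?;
    if after.len() != before.len() || after.modified()? != before.modified()? {
        fs::remove_file(&tmp)?;
        return Ok(false);
    }

    // Publish the compressed form, then drop the plain file.
    fs::rename(&tmp, &zst)?;
    fs::remove_file(plain)?;
    Ok(true)
}
```

Returning `Ok(false)` instead of an error lets a best-effort worker simply revisit the file on a later pass.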


feat: compressor (d97837a)

jif-oai requested a review from a team as a code owner May 28, 2026 17:24

jif-oai changed the title from "feat: compressor" to "Compress cold local thread-store rollouts" May 28, 2026

chatgpt-codex-connector Bot reviewed May 28, 2026


Four comment threads on `codex-rs/rollout/src/compression.rs` (one marked Outdated).

jif-oai added 5 commits May 28, 2026 18:45

- fixes (6d523bb)
- box fix (988badb)
- again (7b74a61)
- make it stronger (8ad185c)
- Merge branch 'main' into jif/compressor (8c640ae)

This was referenced May 29, 2026

- Read compressed rollouts and materialize before append #25087 (Merged)
- Materialize compressed rollouts before append #25088 (Merged)
- Compress cold local rollouts #25089 (Merged)

jif-oai closed this Jun 1, 2026

aibrahim-oai mentioned this pull request Jun 1, 2026

[codex] Carry source thread ID in forked history #25662 (Closed)

jif-oai wants to merge 6 commits into main from jif/compressor.
