[pull] main from openai:main by pull[bot] · Pull Request #12 · kontext-dev/codex

pull · 2026-02-26T18:25:31Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Summary

- Skip `history_metadata` scanning when spawning subagents to avoid expensive per-spawn history scans.
- Keeps behavior unchanged for normal sessions.

Testing

- `cd codex-rs && cargo test -p codex-core`
- Failing in this environment (pre-existing and I don't think something I did?):
  - `suite::cli_stream::responses_mode_stream_cli` (SIGKILL + OTEL export error to http://localhost:14318/v1/logs)
  - `suite::grep_files::grep_files_tool_collects_matches` (unsupported call: grep_files)
  - `suite::grep_files::grep_files_tool_reports_empty_results` (unsupported call: grep_files)

Co-authored-by: Codex <noreply@openai.com>

Attempt to reduce disk usage in macOS CI.

> off - This is the default for platforms with ELF binaries and windows-gnu (not Windows MSVC and not macOS). This typically means that DWARF debug information can be found in the final artifact in sections of the executable. This option is not supported on Windows MSVC. On macOS this option prevents the final execution of dsymutil to generate debuginfo.

## Summary

- make `Config.model_reasoning_summary` optional so unset means use model default
- resolve the optional config value to a concrete summary when building `TurnContext`
- add protocol support for `default_reasoning_summary` in model metadata

## Validation

- `cargo test -p codex-core --lib client::tests -- --nocapture`

---------

Co-authored-by: Codex <noreply@openai.com>
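The resolution step described above can be sketched as follows. This is a minimal illustration, not the actual `codex-core` code: the `ReasoningSummary` variants, the `ModelInfo` struct, and the `resolve_reasoning_summary` helper are hypothetical stand-ins; only the field names `model_reasoning_summary` and `default_reasoning_summary` come from the description.

```rust
/// Hypothetical stand-in for the summary setting; the variants are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReasoningSummary {
    Auto,
    Concise,
    Detailed,
}

/// Hypothetical model metadata carrying the per-model default.
pub struct ModelInfo {
    pub default_reasoning_summary: ReasoningSummary,
}

/// Hypothetical config: `None` means "unset, use the model default".
pub struct Config {
    pub model_reasoning_summary: Option<ReasoningSummary>,
}

/// Resolve the optional config value to a concrete summary,
/// as one might do when building a turn context.
pub fn resolve_reasoning_summary(config: &Config, model: &ModelInfo) -> ReasoningSummary {
    config
        .model_reasoning_summary
        .unwrap_or(model.default_reasoning_summary)
}
```

Keeping the field as an `Option` lets "unset" stay distinguishable from an explicit choice until the model is known.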

## Summary

- add tracing-based diagnostics for nested `codex.tool(...)` calls made from `js_repl`
- emit a bounded, sanitized summary at `info!`
- emit the exact raw serialized response object or error string seen by JavaScript at `trace!`
- document how to enable these logs and where to find them, especially for `codex app-server`

## Why

Nested `codex.tool(...)` calls inside `js_repl` are a debugging boundary: JavaScript sees the tool result, but that result is otherwise hard to inspect from outside the kernel.

This change adds explicit tracing for that path using the repo’s normal observability pattern:

- `info` for compact summaries
- `trace` for exact raw payloads when deep debugging is needed

## What changed

- `js_repl` now summarizes nested tool-call results across the response shapes it can receive:
  - message content
  - function-call outputs
  - custom tool outputs
  - MCP tool results and MCP error results
  - direct error strings
- each nested `codex.tool(...)` completion logs:
  - `exec_id`
  - `tool_call_id`
  - `tool_name`
  - `ok`
  - a bounded summary struct describing the payload shape
- at `trace`, the same path also logs the exact serialized response object or error string that JavaScript received
- docs now include concrete logging examples for `codex app-server`
- unit coverage was added for multimodal function output summaries and error summaries

## How to use it

### Summary-only logging

Set:

```sh
RUST_LOG=codex_core::tools::js_repl=info
```

For `codex app-server`, tracing output is written to the server process `stderr`.

Example:

```sh
RUST_LOG=codex_core::tools::js_repl=info \
LOG_FORMAT=json \
codex app-server \
2> /tmp/codex-app-server.log
```

This emits bounded summary lines for nested `codex.tool(...)` calls.

### Full raw debugging

Set:

```sh
RUST_LOG=codex_core::tools::js_repl=trace
```

Example:

```sh
RUST_LOG=codex_core::tools::js_repl=trace \
LOG_FORMAT=json \
codex app-server \
2> /tmp/codex-app-server.log
```

At `trace`, you get:

- the same `info` summary line
- a `trace` line with the exact serialized response object seen by JavaScript
- or the exact error string if the nested tool call failed

### Where the logs go

For `codex app-server`, these logs go to process `stderr`, so redirect or capture `stderr` to inspect them.

Example:

```sh
RUST_LOG=codex_core::tools::js_repl=trace \
LOG_FORMAT=json \
/Users/fjord/code/codex/codex-rs/target/debug/codex app-server \
2> /tmp/codex-app-server.log
```

Then inspect:

```sh
rg "js_repl nested tool call" /tmp/codex-app-server.log
```

Without an explicit `RUST_LOG` override, these `js_repl` nested tool-call logs are typically not visible.

## Why

Before this change, an escalation approval could say that a command should be rerun, but it could not carry the sandbox configuration that should still apply when the escalated command is actually spawned.

That left an unsafe gap in the `zsh-fork` skill path: skill scripts under `scripts/` that did not declare permissions could be escalated without a sandbox, and scripts that did declare permissions could lose their bounded sandbox on rerun or cached session approval.

This PR extends the escalation protocol so approvals can optionally carry sandbox configuration all the way through execution. That lets the shell runtime preserve the intended sandbox instead of silently widening access.

We likely want a single permissions type for this codepath eventually, probably centered on `Permissions`. For now, the protocol needs to represent both the existing `PermissionProfile` form and the fuller `Permissions` form, so this introduces a temporary disjoint union, `EscalationPermissions`, to carry either one.

Further, this means that today, a skill either:

- does not declare any permissions, in which case it is run using the default sandbox for the turn
- specifies permissions, in which case the skill is run using that exact sandbox, which might be more restrictive than the default sandbox for the turn

We will likely change the skill's permissions to be additive to the existing permissions for the turn.

## What Changed

- Added `EscalationPermissions` to `codex-protocol` so escalation requests can carry either a `PermissionProfile` or a full `Permissions` payload.
- Added an explicit `EscalationExecution` mode to the shell escalation protocol so reruns distinguish between `Unsandboxed`, `TurnDefault`, and `Permissions(...)` instead of overloading `None`.
- Updated `zsh-fork` shell reruns to resolve `TurnDefault` at execution time, which keeps ordinary `UseDefault` commands on the turn sandbox and preserves turn-level macOS seatbelt profile extensions.
- Updated the `zsh-fork` skill path so a skill with no declared permissions inherits the conversation's effective sandbox instead of escalating unsandboxed.
- Updated the `zsh-fork` skill path so a skill with declared permissions reruns with exactly those permissions, including when a cached session approval is reused.

## Testing

- Added unit coverage in `core/src/tools/runtimes/shell/unix_escalation.rs` for the explicit `UseDefault` / `RequireEscalated` / `WithAdditionalPermissions` execution mapping.
- Added unit coverage in `core/src/tools/runtimes/shell/unix_escalation.rs` for macOS seatbelt extension preservation in both the `TurnDefault` and explicit-permissions rerun paths.
- Added integration coverage in `core/tests/suite/skill_approval.rs` for permissionless skills inheriting the turn sandbox and explicit skill permissions remaining bounded across cached approval reuse.
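To make the three-way execution mode concrete, here is a rough sketch of how the types named above might fit together. The type names `EscalationPermissions`, `EscalationExecution`, `PermissionProfile`, and `Permissions` come from the description; everything else is an assumption made for illustration, including the struct fields, the `ResolvedSandbox` type, and the `execution_for_skill` and `resolve_sandbox` helpers. It should not be read as the real `codex-protocol` definitions.

```rust
/// Hypothetical, simplified payloads; the real types carry more detail.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PermissionProfile {
    pub writable_roots: Vec<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Permissions {
    pub writable_roots: Vec<String>,
    pub network_access: bool,
}

/// Temporary disjoint union carrying either permissions form.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EscalationPermissions {
    PermissionProfile(PermissionProfile),
    Permissions(Permissions),
}

/// Explicit rerun mode, instead of overloading `None`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EscalationExecution {
    Unsandboxed,
    TurnDefault,
    Permissions(EscalationPermissions),
}

/// Hypothetical result of resolving a mode at execution time.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResolvedSandbox {
    Unsandboxed,
    Sandboxed(EscalationPermissions),
}

/// A skill with no declared permissions inherits the turn sandbox;
/// a skill with declared permissions reruns with exactly those.
pub fn execution_for_skill(declared: Option<EscalationPermissions>) -> EscalationExecution {
    match declared {
        None => EscalationExecution::TurnDefault,
        Some(permissions) => EscalationExecution::Permissions(permissions),
    }
}

/// `TurnDefault` is resolved only when the command is actually spawned.
pub fn resolve_sandbox(execution: EscalationExecution, turn_default: &Permissions) -> ResolvedSandbox {
    match execution {
        EscalationExecution::Unsandboxed => ResolvedSandbox::Unsandboxed,
        EscalationExecution::TurnDefault => {
            ResolvedSandbox::Sandboxed(EscalationPermissions::Permissions(turn_default.clone()))
        }
        EscalationExecution::Permissions(permissions) => ResolvedSandbox::Sandboxed(permissions),
    }
}
```

Resolving `TurnDefault` late, rather than copying the turn sandbox into the approval, is what lets a cached approval pick up the sandbox that is in effect when the rerun happens.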

## Summary

- make resume/fork targets explicit and typed as `SessionTarget { path, thread_id }` (non-optional `thread_id`)
- resolve `thread_id` centrally via `resolve_session_thread_id(...)`:
  - use CLI input directly when it is a UUID (`--resume <uuid>` / `--fork <uuid>`)
  - otherwise read `thread_id` from rollout `SessionMeta` for path-based selections (picker, `--resume-last`, name-based resume/fork)
- use `thread_id` to read cwd from SQLite first during resume/fork cwd resolution
- keep rollout fallback for cwd resolution when SQLite is unavailable or does not return thread metadata (`TurnContext` tail, then `SessionMeta`)
- keep the resume picker open when a selected row has unreadable session metadata, and show an inline recoverable error instead of aborting the TUI

## Why

This removes ad-hoc rollout filename parsing and makes resume/fork target identity explicit.

The resume/fork cwd check can use indexed SQLite lookup by `thread_id` in the common path, while preserving rollout-based fallback behavior. It also keeps malformed legacy rows recoverable in the picker instead of letting a selection failure unwind the app.

## Notes

- minimal TUI-only change; no schema/protocol changes
- includes TUI test coverage for SQLite cwd precedence when `thread_id` is available
- includes TUI regression coverage for picker inline error rendering / non-fatal unreadable session rows

## Codex author

`codex resume 019c9205-7f8b-7173-a2a2-f082d4df3de3`
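The central resolution rule (use a UUID from the CLI directly, otherwise fall back to session metadata) might look roughly like the sketch below. Only `SessionTarget { path, thread_id }` comes from the description; the `String` thread ID, the hand-rolled `looks_like_uuid` check, the `resolve_session_target` name, and the closure standing in for a `SessionMeta` reader are all assumptions made to keep the example dependency-free.

```rust
use std::path::{Path, PathBuf};

/// Resume/fork target with a non-optional thread ID.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionTarget {
    pub path: PathBuf,
    pub thread_id: String,
}

/// Hypothetical check for the canonical 8-4-4-4-12 hexadecimal UUID layout.
pub fn looks_like_uuid(input: &str) -> bool {
    let expected = [8usize, 4, 4, 4, 12];
    let groups: Vec<&str> = input.split('-').collect();
    groups.len() == expected.len()
        && groups
            .iter()
            .zip(expected.iter())
            .all(|(group, len)| group.len() == *len && group.chars().all(|c| c.is_ascii_hexdigit()))
}

/// Use CLI input directly when it is a UUID; otherwise read the thread ID
/// from session metadata through `read_meta_thread_id` (a stand-in reader).
/// An `Err` is meant to be shown inline rather than aborting the caller.
pub fn resolve_session_target(
    cli_input: Option<&str>,
    path: &Path,
    read_meta_thread_id: impl Fn(&Path) -> Option<String>,
) -> Result<SessionTarget, String> {
    let thread_id = match cli_input {
        Some(input) if looks_like_uuid(input) => input.to_string(),
        _ => read_meta_thread_id(path)
            .ok_or_else(|| format!("unreadable session metadata: {}", path.display()))?,
    };
    Ok(SessionTarget { path: path.to_path_buf(), thread_id })
}
```

Returning a `Result` instead of panicking mirrors the picker behavior described above, where an unreadable row stays recoverable.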

## Why

`codex features list` currently prints features in declaration order from `codex_core::features::FEATURES`. That makes the output harder to scan when looking for a specific flag, and the order can change for reasons unrelated to the CLI.

## What changed

- Sort the `codex features list` rows by feature key before printing them in `codex-rs/cli/src/main.rs`.
- Add an integration test in `codex-rs/cli/tests/features.rs` that runs `codex features list` and asserts the feature-name column is alphabetized.

## Verification

- Added `features_list_is_sorted_alphabetically_by_feature_name`.
- Ran `cargo test -p codex-cli`.
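As an illustration of the change, sorting rows by key before printing could be done along these lines. The `FeatureRow` struct, its columns, and the `sorted_feature_rows` helper are hypothetical; the real row type in `codex-rs/cli/src/main.rs` may differ.

```rust
/// Hypothetical row for the feature listing; the columns are an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FeatureRow {
    pub key: &'static str,
    pub stage: &'static str,
    pub enabled: bool,
}

/// Sort rows by feature key so the output no longer depends on declaration order.
pub fn sorted_feature_rows(mut rows: Vec<FeatureRow>) -> Vec<FeatureRow> {
    rows.sort_by_key(|row| row.key);
    rows
}
```

Sorting at print time keeps the declaration order of the underlying feature table free to change without affecting CLI output.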

## Summary

- add top-level realtime audio config for microphone and speaker selection
- apply configured devices when starting realtime capture and playback
- keep missing-device behavior on the system default fallback path

## Validation

- just write-config-schema
- cargo test -p codex-core realtime_audio
- cargo test -p codex-tui
- just fix -p codex-core
- just fix -p codex-tui
- just fmt

---------

Co-authored-by: Codex <noreply@openai.com>

## Why

`unix_escalation.rs` had a large inline `mod tests` block that made the implementation harder to scan. This change moves those tests into a sibling file while keeping them as a child module, so they can still exercise private items without widening visibility.

## What Changed

- replaced the inline `#[cfg(test)] mod tests` block in `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` with a path-based test module declaration
- moved the existing unit tests into `codex-rs/core/src/tools/runtimes/shell/unix_escalation_tests.rs`
- kept the extracted tests using `super::...` imports so they continue to access private helpers and types from `unix_escalation.rs`

## Testing

- `cargo test -p codex-core unix_escalation::tests`

## Summary

This PR includes the session's local date and timezone in the model-visible environment context and persists that data in `TurnContextItem`.

## What changed

- captures the current local date and IANA timezone when building a turn context, with a UTC fallback if the timezone lookup fails
- includes `current_date` and `timezone` in the serialized `<environment_context>` payload
- stores those fields on `TurnContextItem` so they survive rollout/history handling, subagent review threads, and resume flows
- treats date/timezone changes as environment updates, so prompt caching and context refresh logic do not silently reuse stale time context
- updates tests to validate the new environment fields without depending on a single hardcoded environment-context string

## test

built a local build and saw it in the rollout file:

```
{"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n <shell>zsh</shell>\n <current_date>2026-02-26</current_date>\n <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
```
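A minimal sketch of the idea, assuming a simplified context with only the three fields visible in the rollout sample above. The `EnvironmentContext` struct, the `to_xml`, `needs_environment_update`, and `timezone_or_utc` helpers, and the two-space indentation are assumptions for illustration, not the actual serialization code.

```rust
/// Hypothetical, reduced environment context (the real one has more fields).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EnvironmentContext {
    pub shell: String,
    pub current_date: String,
    pub timezone: String,
}

impl EnvironmentContext {
    /// Serialize to the `<environment_context>` payload shape shown in the rollout sample.
    pub fn to_xml(&self) -> String {
        format!(
            "<environment_context>\n  <shell>{}</shell>\n  <current_date>{}</current_date>\n  <timezone>{}</timezone>\n</environment_context>",
            self.shell, self.current_date, self.timezone
        )
    }
}

/// Fall back to UTC when the timezone lookup fails.
pub fn timezone_or_utc(lookup: Option<String>) -> String {
    lookup.unwrap_or_else(|| "UTC".to_string())
}

/// A date or timezone change counts as an environment update,
/// so a stale context is not silently reused.
pub fn needs_environment_update(previous: &EnvironmentContext, current: &EnvironmentContext) -> bool {
    previous != current
}
```

Comparing whole contexts means any future field added to the struct participates in the staleness check automatically.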

## Summary

- add a websocket test-server request waiter so tests can synchronize on recorded client messages
- use that waiter in the realtime delegation test instead of a fixed audio timeout
- add temporary timing logs in the test and websocket mock to inspect where the flake stalls

## Summary

- add a dedicated /audio picker for realtime microphone and speaker selection
- persist realtime audio choices and prompt to restart only local audio when voice is live
- add snapshot coverage for the new picker surfaces

## Validation

- cargo test -p codex-tui
- cargo insta accept
- just fix -p codex-tui
- just fmt

## Summary

This changes `custom_tool_call_output` to use the same output payload shape as `function_call_output`, so freeform tools can return either plain text or structured content items.

The main goal is to let `js_repl` return image content from nested `view_image` calls in its own `custom_tool_call_output`, instead of relying on a separate injected message.

## What changed

- Changed `custom_tool_call_output.output` from `string` to `FunctionCallOutputPayload`
- Updated freeform tool plumbing to preserve structured output bodies
- Updated `js_repl` to aggregate nested tool content items and attach them to the outer `js_repl` result
- Removed the old `js_repl` special case that injected `view_image` results as a separate pending user image message
- Updated normalization/history/truncation paths to handle multimodal `custom_tool_call_output`
- Regenerated app-server protocol schema artifacts

## Behavior

Direct `view_image` calls still return a `function_call_output` with image content.

When `view_image` is called inside `js_repl`, the outer `js_repl` `custom_tool_call_output` now carries:

- an `input_text` item if the JS produced text output
- one or more `input_image` items from nested tool results

So the nested image result now stays inside the `js_repl` tool output instead of being injected as a separate message.

## Compatibility

This is intended to be backward-compatible for resumed conversations.

Older histories that stored `custom_tool_call_output.output` as a plain string still deserialize correctly, and older histories that used the previous injected-image-message flow also continue to resume.

Added regression coverage for resuming a pre-change rollout containing:

- string-valued `custom_tool_call_output`
- legacy injected image message history

#### [git stack](https://github.com/magus/git-stack-cli)

- 👉 `1` #12948

Addresses bug #12589.

Builds on community PR #12763.

This adds `oauth_resource` support for MCP `streamable_http` servers and wires it through the relevant config and login paths. It fixes the bug where the configured OAuth resource was not reliably included in the authorization request, causing MCP login to omit the expected `resource` parameter.

## Why

The `notify` hook payload did not identify which Codex client started the turn. That meant downstream notification hooks could not distinguish between completions coming from the TUI and completions coming from app-server clients such as VS Code or Xcode. Now that the Codex App provides its own desktop notifications, it would be nice to be able to filter those out.

This change adds that context without changing the existing payload shape for callers that do not know the client name, and keeps the new end-to-end test cross-platform.

## What changed

- added an optional top-level `client` field to the legacy `notify` JSON payload
- threaded that value through `core` and `hooks`; the internal session and turn state now carries it as `app_server_client_name`
- set the field to `codex-tui` for TUI turns
- captured `initialize.clientInfo.name` in the app server and applied it to subsequent turns before dispatching hooks
- replaced the notify integration test hook with a `python3` script so the test does not rely on Unix shell permissions or `bash`
- documented the new field in `docs/config.md`

## Testing

- `cargo test -p codex-hooks`
- `cargo test -p codex-tui`
- `cargo test -p codex-app-server suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name -- --exact --nocapture`
- `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs` still has unrelated existing failures in this environment)

## Docs

The public config reference on `developers.openai.com/codex` should mention that the legacy `notify` payload may include a top-level `client` field. The TUI reports `codex-tui`, and the app server reports `initialize.clientInfo.name` when it is available.

jif-oai and others added 10 commits February 26, 2026 14:11

chore: clean DB runtime (#12905)

79d6f80

feat: use memory usage for selection (#12909)

c528f32

fix: do not apply turn cwd to metadata (#12887)

739d4b5

Details here: https://openai.slack.com/archives/C09NZ54M4KY/p1772056758227339

fix: ctrl c sub agent (#12911)

f0a85de

chore: calm down awaiter (#12925)

c53c08f

feat: fork thread multi agent (#12499)

d3603ae

pull bot locked and limited conversation to collaborators Feb 26, 2026

pull bot added the `upstream-sync` label (Automated upstream sync PR) Feb 26, 2026

jif-oai and others added 18 commits February 26, 2026 18:55

feat: add post-compaction sub-agent infos (#12774)

3404ecf

Co-authored-by: Codex <noreply@openai.com>

Remove noisy log (#12929)

717cbe3

This log message floods logs on Windows.

don't grant sandbox read access to ~/.ssh and a few other dirs. (#12835)

6b879fe

OpenSSH complains if any other users have read access to SSH keys; see #12226.

feat: add git info to memories (#12940)

a6065d3

Allow clients not to send summary as an option (#12950)

951a389

Summary is a required parameter on UserTurn. Ideally we'd like the core to decide the appropriate summary level. Make the summary optional and don't send it when not needed.

Feat: cxa-1833 update model/list (#12958)

8715a6e

### Summary Update `model/list` in app server to include more upgrade information.

[apps] Improve app/list with force_fetch=true (#12745)

6fe3dc2

- [x] Improve app/list with force_fetch=true: the cached snapshot is now kept until both installed apps and directory apps load.

Add a background job to refresh the requirements local cache (#12936)

f53612d

- Update the cloud requirements cache TTL to 30 minutes. - Add a background job to refresh the cache every 5 minutes. - Ensure there is only one refresh job per process.
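The three points above (a 30-minute TTL, a 5-minute refresh, and one job per process) can be sketched with the standard library alone. This is an assumption-laden illustration: the constant names, the `is_fresh` and `spawn_refresh_job` helpers, and the use of an `AtomicBool` guard with a plain thread are all hypothetical and are not taken from the actual commit.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Cache entries are considered fresh for 30 minutes.
pub const CACHE_TTL: Duration = Duration::from_secs(30 * 60);
/// The background job refreshes the cache every 5 minutes.
pub const REFRESH_INTERVAL: Duration = Duration::from_secs(5 * 60);

/// Process-wide guard so that at most one refresh job is ever started.
static REFRESH_JOB_STARTED: AtomicBool = AtomicBool::new(false);

/// Returns true while a cache entry fetched at `fetched_at` is within its TTL.
pub fn is_fresh(fetched_at: Instant, now: Instant) -> bool {
    now.saturating_duration_since(fetched_at) < CACHE_TTL
}

/// Start the refresh loop once per process; later calls return `None`.
/// `refresh` is a hypothetical callback that re-fetches and stores the cache.
pub fn spawn_refresh_job(refresh: impl Fn() + Send + 'static) -> Option<JoinHandle<()>> {
    let claimed = REFRESH_JOB_STARTED
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_ok();
    if !claimed {
        return None;
    }
    Some(thread::spawn(move || loop {
        thread::sleep(REFRESH_INTERVAL);
        refresh();
    }))
}
```

Because the refresh interval is much shorter than the TTL, a healthy job should keep entries from expiring, and the TTL only matters when refreshes fail for a while.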

aibrahim-oai and others added 3 commits February 26, 2026 22:02

Add model availability NUX metadata (#12972)

4d180ae

- replace show_nux with structured availability_nux model metadata - expose availability NUX data through the app-server model API - update shared fixtures and tests for the new field

Add realtime websocket tracing (#12981)

53e28f1

- add transport and conversation logs around connect, close, and parse flow - log realtime transport failures as errors for easier debugging

michiosw merged commit 48b6342 into kontext-dev:main Feb 27, 2026
13 of 36 checks passed

michiosw merged 31 commits into kontext-dev:main from openai:main

14 participants
