
compact_remote.rs filter violates Responses item-pairing invariant for store=true requests #20774

@amabile4

What version of Codex CLI is running?

codex-cli 0.128.0

What subscription do you have?

ChatGPT Plus

Which model were you using?

gpt-5.5 (seed response resolved to gpt-5.5-2026-04-23)

What platform is your computer?

Microsoft Windows NT 10.0.26200.0 x64

What terminal emulator and version are you using (if applicable)?

PowerShell 7.6.1 on native Windows, not WSL.

What issue are you seeing?

Codex remote compaction can create an invalid Responses API history for
store=true requests: it drops a reasoning item but can retain the dependent
server-assigned assistant message.

This is a provider-independent invariant violation: under store=true, the
Responses API itself rejects orphaned assistant messages, and Codex's remote
compaction filter can construct payloads that trigger this rejection.

The resulting invalid replay shape is:

assistant message with a server-assigned id, but without its required preceding reasoning item

For stored Responses requests, a server-assigned assistant message can depend on
the preceding server-assigned reasoning item. If Codex asks for
reasoning.encrypted_content, that reasoning item is the replayable form needed
to keep the stored conversation consistent.
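The pairing rule described above can be sketched as a small client-side check. This is a minimal illustration under stated assumptions, not Codex code: `Item` is a hypothetical stand-in for the real `ResponseItem`, and "required preceding reasoning item" is approximated as "the immediately preceding item is a reasoning item". The server knows the exact `msg_`/`rs_` link; a client can only infer it from adjacency, so this check may flag messages the server would accept.

```rust
/// Hypothetical, simplified stand-in for a Responses input item.
/// This is not Codex's `ResponseItem`.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Reasoning { id: String },
    Message { id: Option<String>, role: String },
}

/// Returns the ids of server-assigned assistant messages that are not
/// immediately preceded by a reasoning item.
///
/// Assumption: adjacency approximates the server-side dependency.
fn orphaned_assistant_ids(items: &[Item]) -> Vec<String> {
    let mut orphans = Vec::new();
    for (idx, item) in items.iter().enumerate() {
        // Only assistant messages that carry a server-assigned id are checked.
        if let Item::Message { id: Some(id), role } = item {
            if role != "assistant" {
                continue;
            }
            let preceded_by_reasoning =
                idx > 0 && matches!(items[idx - 1], Item::Reasoning { .. });
            if !preceded_by_reasoning {
                orphans.push(id.clone());
            }
        }
    }
    orphans
}
```

Applied to the two replay shapes in this report, the paired input `[reasoning, assistant, user]` yields no orphans, while `[assistant, user]` yields the assistant message id.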

I verified the same invalid replay shape against OpenAI-hosted /v1/responses
with store=true:

OpenAI-hosted Responses API
request model id: gpt-5.5
seed response model: gpt-5.5-2026-04-23

seed response: 200
  output contained server-assigned reasoning item
  output contained server-assigned assistant message

paired replay, input=[reasoning, assistant, user]: 200

orphan replay, input=[assistant, user]: 400
  message: Item 'msg_[redacted]' of type 'message' was provided without its required 'reasoning' item: 'rs_[redacted]'.

The same orphan replay succeeds with store=false, so the problem is hidden in
normal OpenAI-hosted Codex usage today: Codex only enables store=true on the
Azure path.

I am hitting this in practice on Azure-hosted Responses, where Codex sends
store=true, and I currently run a local fork to work around it. The API-level
reproduction above confirms that the same orphan replay also fails against
api.openai.com when store=true is used.

This report is based on two observations: the store=true Responses replay
invariant is independently reproducible with /v1/responses, and Codex's
remote compaction filter is asymmetric. I do not know whether the current
compact endpoint naturally emits this shape in common cases. In my short probe
on 2026-05-02, /responses/compact returned a user message plus a compaction
item, not an assistant message. The bug is that Codex explicitly allows
assistant messages from remote compact output while dropping reasoning items, so
any compact output that contains reasoning + assistant can become an invalid
store=true replay.

What steps can reproduce the bug?

1. API-level store=true invariant check

This does not require Azure.

Run this with OPENAI_API_KEY set:

import json
import os
import re
import urllib.error
import urllib.request

MODEL = os.environ.get("OPENAI_MODEL", "gpt-5.5")
BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
API_KEY = os.environ["OPENAI_API_KEY"]


def post(path, payload):
    req = urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as res:
            return res.status, json.loads(res.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())


def user(text):
    return {"role": "user", "content": [{"type": "input_text", "text": text}]}


seed_status, seed = post("responses", {
    "model": MODEL,
    "input": "Answer with exactly: store-true-seed",
    "store": True,
    "reasoning": {"effort": "medium"},
    "include": ["reasoning.encrypted_content"],
})
assert seed_status == 200, seed

reasoning = next((i for i in seed["output"] if i.get("type") == "reasoning" and i.get("id")), None)
assistant = next((i for i in seed["output"] if i.get("type") == "message" and i.get("role") == "assistant" and i.get("id")), None)
assert reasoning is not None, f"no server-assigned reasoning item in seed output: {seed['output']}"
assert assistant is not None, f"no server-assigned assistant message in seed output: {seed['output']}"

paired_status, paired = post("responses", {
    "model": MODEL,
    "store": True,
    "input": [reasoning, assistant, user("Continue after paired replay.")],
})
assert paired_status == 200, paired

orphan_status, orphan = post("responses", {
    "model": MODEL,
    "store": True,
    "input": [assistant, user("Continue after orphan replay.")],
})

# A successful response carries "error": null, so fall back to an empty dict.
message = (orphan.get("error") or {}).get("message", "")
message = re.sub(r"(msg|rs)_[A-Za-z0-9]+", r"\1_[redacted]", message)

print("request model id:", MODEL)
print("seed response model:", seed.get("model"))
print("seed response:", seed_status)
print("paired replay:", paired_status)
print("orphan replay:", orphan_status)
print("orphan error:", message)

Observed:

request model id: gpt-5.5
seed response model: gpt-5.5-2026-04-23
seed response: 200
paired replay: 200
orphan replay: 400
orphan error: Item 'msg_[redacted]' of type 'message' was provided without its required 'reasoning' item: 'rs_[redacted]'.

The error message above is verbatim from the API response except for redacted
server item ids.

2. Codex client-side transformation

This is the Codex-side invariant break and does not require Azure either.

In codex-rs/core/src/compact_remote.rs, process_compacted_history() filters
remote compact output:

compacted_history.retain(should_keep_compacted_history_item);

The filter keeps assistant messages:

ResponseItem::Message { role, .. } if role == "assistant" => true,

but drops reasoning items:

ResponseItem::Reasoning { .. } => false,

Permalink against upstream/main at 35aaa5d9fc, fetched 2026-05-02:

    compacted_history.retain(should_keep_compacted_history_item);
    insert_initial_context_before_last_real_user_or_summary(compacted_history, initial_context)
}

/// Returns whether an item from remote compaction output should be preserved.
///
/// Called while processing the model-provided compacted transcript, before we
/// append fresh canonical context from the current session.
///
/// We drop:
/// - `developer` messages because remote output can include stale/duplicated
///   instruction content.
/// - non-user-content `user` messages (session prefix/instruction wrappers),
///   while preserving real user messages and persisted hook prompts.
///
/// This intentionally keeps:
/// - `assistant` messages (future remote compaction models may emit them)
/// - `user`-role warnings and compaction-generated summary messages because
///   they parse as `TurnItem::UserMessage`.
fn should_keep_compacted_history_item(item: &ResponseItem) -> bool {
    match item {
        ResponseItem::Message { role, .. } if role == "developer" => false,
        ResponseItem::Message { role, .. } if role == "user" => {
            matches!(
                crate::event_mapping::parse_turn_item(item),
                Some(TurnItem::UserMessage(_) | TurnItem::HookPrompt(_))
            )
        }
        ResponseItem::Message { role, .. } if role == "assistant" => true,
        ResponseItem::Message { .. } => false,
        ResponseItem::Compaction { .. } => true,
        ResponseItem::Reasoning { .. }
        | ResponseItem::LocalShellCall { .. }
        | ResponseItem::FunctionCall { .. }
        | ResponseItem::ToolSearchCall { .. }
        | ResponseItem::FunctionCallOutput { .. }
        | ResponseItem::ToolSearchOutput { .. }
        | ResponseItem::CustomToolCall { .. }
        | ResponseItem::CustomToolCallOutput { .. }
        | ResponseItem::WebSearchCall { .. }
        | ResponseItem::ImageGenerationCall { .. }
        | ResponseItem::Other => false,

So a compact output like:

Reasoning(rs_...)
Message(role="assistant", id=msg_...)

can become:

Message(role="assistant", id=msg_...)

A minimal unit test in compact_remote.rs can make the current invariant break
visible:

#[test]
fn remote_compaction_filter_does_not_orphan_assistant_message() {
    use codex_protocol::models::{ContentItem, ResponseItem};

    let assistant = ResponseItem::Message {
        id: Some("msg_test".to_string()),
        role: "assistant".to_string(),
        content: vec![ContentItem::OutputText {
            text: "summary".to_string(),
        }],
        phase: None,
    };
    let mut compacted_history = vec![
        ResponseItem::Reasoning {
            id: "rs_test".to_string(),
            summary: Vec::new(),
            content: None,
            encrypted_content: Some("encrypted".to_string()),
        },
        assistant,
    ];

    compacted_history.retain(should_keep_compacted_history_item);

    assert!(
        !matches!(
            compacted_history.as_slice(),
            [ResponseItem::Message {
                id: Some(_),
                role,
                ..
            }] if role == "assistant"
        ),
        "assistant message survived without its reasoning predecessor: {compacted_history:?}"
    );
}

I confirmed that this test fails on the current filter at 35aaa5d9fc. After
applying either fix listed under "What is the expected behavior?", it should
pass, which makes it a suitable regression check. The failure message is:

assistant message survived without its reasoning predecessor: [Message { id: Some("msg_test"), role: "assistant", content: [OutputText { text: "summary" }], phase: None }]

What is the expected behavior?

Codex should preserve the Responses item-pairing invariant when shaping remote
compaction output.

Either of these would avoid the invalid store=true replay:

  1. Keep ResponseItem::Reasoning when retaining the dependent assistant item.
  2. Drop server-assigned assistant messages whose required reasoning predecessor
    was removed.

Additional information

I also checked current upstream/main as of 2026-05-02.

Current Codex sets the Responses request store value like this (permalink
against upstream/main at 35aaa5d9fc, fetched 2026-05-02):

store: provider.is_azure_responses_endpoint(),

Impact:

  • The root cause is a client-side invariant violation in
    process_compacted_history(). OpenAI-hosted Responses API rejects the same
    orphaned assistant replay when store=true is used, so this is not an
    Azure-only validation rule.
  • The failure surfaces today for Azure-hosted Responses endpoints because Codex
    sets store=true only when provider.is_azure_responses_endpoint() is true.
  • Any future Codex path or provider that uses store=true can hit the same
    invariant violation unless Codex preserves the reasoning/assistant pair.

Suggested fix:

After filtering remote compact output, enforce that retained server-assigned
assistant messages still have their required preceding reasoning item.

For example, I would add a small helper like:

compacted_history.retain(should_keep_compacted_history_item);
remove_orphaned_assistant_messages(&mut compacted_history); // new helper
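A minimal sketch of what such a helper could look like, written against a simplified stand-in type rather than Codex's real `ResponseItem`. The `Item` enum is hypothetical, and the rule "the immediately preceding retained item must be a reasoning item" is an assumption about how the dependency could be approximated client-side.

```rust
/// Hypothetical, simplified stand-in for Codex's `ResponseItem`.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Reasoning { id: String },
    Message { id: Option<String>, role: String },
    Compaction,
}

/// Drops assistant messages with a server-assigned id unless the item
/// retained immediately before them is a reasoning item.
///
/// Assumption: adjacency approximates the server-side pairing.
fn remove_orphaned_assistant_messages(history: &mut Vec<Item>) {
    let mut prev_was_reasoning = false;
    // `Vec::retain` visits each element exactly once, in order, so the
    // closure can track the previously retained item.
    history.retain(|item| {
        let keep = match item {
            Item::Message { id: Some(_), role } if role == "assistant" => prev_was_reasoning,
            _ => true,
        };
        prev_was_reasoning = keep && matches!(item, Item::Reasoning { .. });
        keep
    });
}
```

Because the current filter drops every reasoning item first, running a helper like this afterwards would remove every assistant message that carries a server-assigned id. That matches the second option under "What is the expected behavior?".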

Alternatively, retain Reasoning items from remote compact output when retaining
dependent assistant messages.
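The alternative could be sketched as a pairing-aware filter that keeps a reasoning item only when the next item is an assistant message that will itself be retained. As before, `Item` is a hypothetical simplification of `ResponseItem`, the `user` branch omits the extra inspection the real filter performs, and adjacency is assumed to stand in for the server-side dependency.

```rust
/// Hypothetical, simplified stand-in for Codex's `ResponseItem`.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Reasoning { id: String },
    Message { id: Option<String>, role: String },
    Compaction,
}

/// Filters compacted history while keeping reasoning items that directly
/// precede a server-assigned assistant message.
fn filter_preserving_pairs(history: Vec<Item>) -> Vec<Item> {
    let mut out = Vec::with_capacity(history.len());
    let mut iter = history.into_iter().peekable();
    while let Some(item) = iter.next() {
        let keep = match &item {
            // Keep reasoning only if the next item is a retained assistant message.
            Item::Reasoning { .. } => matches!(
                iter.peek(),
                Some(Item::Message { id: Some(_), role }) if role == "assistant"
            ),
            // Simplification: the real filter inspects `user` messages further.
            Item::Message { role, .. } => role == "assistant" || role == "user",
            Item::Compaction => true,
        };
        if keep {
            out.push(item);
        }
    }
    out
}
```

With this shape, a reasoning item that is not followed by an assistant message is still dropped, so the compacted history does not grow beyond what the pairing requires.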

I would be happy to open a PR with either approach if maintainers have a
preference.

Labels: azure (issues related to the Azure-hosted OpenAI models), bug (something isn't working), context (issues related to context management, including compaction)
