codex cli startup hangs 20-30s on Linux systems where bwrap bind-mounting / is slow (NFS, autofs) #19828

@MrPrezident

What version of Codex CLI is running?

0.119.0 and later (bug introduced in commit 806e5f7 / PR #15893, Apr 6 2026)

What subscription do you have?

Enterprise

Which model were you using?

gpt-5.4

What platform is your computer?

Linux 5.14.21-150400.24.184-default x86_64 x86_64

What terminal emulator and version are you using (if applicable)?

Mate Terminal

What issue are you seeing?

Since commit 806e5f7 / PR #15893, codex exec takes ~30 seconds to start on Linux systems with NFS-mounted or autofs-managed filesystems. No error messages are printed; the process hangs silently before printing the session header.

  $ time codex exec 'hi'
  # ... 30 second silence ...
  OpenAI Codex v0.1.15 (research preview)
  --------
  workdir: /home/user/project
  model: ...
  --------
  user
  hi
  codex
  Hey! How can I help you today?

  real    0m31s
  user    0m27s
  sys     0m2s

The root cause is system_bwrap_has_user_namespace_access() in codex-rs/sandboxing/src/bwrap.rs, added in PR #15893. It probes system bwrap on every startup by running:

bwrap --unshare-user --unshare-net --ro-bind / / /bin/true

via Command::output() with no timeout. On systems with NFS or autofs mounts, binding / causes bwrap to traverse thousands of automount points, taking 20-30 seconds — even though it ultimately succeeds and the sandbox works fine. This blocks the entire startup path synchronously.

What steps can reproduce the bug?

  1. Use a Linux system where NFS or autofs mounts are present under / (common in enterprise/HPC environments with network home directories or project mounts)
  2. Confirm the hang in isolation. If this command takes >1s, you will hit the bug:

       time bwrap --unshare-user --unshare-net --ro-bind / / /bin/true
       # real  0m26s

  3. Run time codex exec 'hi'
  4. Observe ~30s wall-clock delay before the session header prints

The delay scales with how long the bwrap command takes on your system. On a standard desktop Linux with no NFS mounts, bwrap completes in <50ms and the bug is not visible.

What is the expected behavior?

codex exec 'hi' should complete in ~3 seconds, as it did prior to commit 806e5f7 / PR #15893. The bwrap probe's only purpose is to detect broken user-namespace configurations (e.g. bwrap failing with "No permissions to create a new namespace"). If bwrap is present and ultimately succeeds, the probe should not add perceptible latency to startup.

A 500ms timeout on the probe is sufficient: on a healthy system it completes in <50ms; on a slow-bwrap system it times out, kills the child, and conservatively returns true (assume bwrap works) — which is the correct behavior since bwrap does work, it's just slow.
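The decision rule described above can be written out as a small truth table. This is only an illustrative sketch: `ProbeOutcome` and `treat_bwrap_as_usable` are hypothetical names that do not appear in the Codex source, and the mapping simply restates the behavior proposed in this issue (only a detected user-namespace failure should disable system bwrap).

```rust
/// Hypothetical summary of how the probe ended (not a type from the Codex source).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProbeOutcome {
    /// bwrap exited with status 0 before the deadline.
    Succeeded,
    /// bwrap exited non-zero and stderr indicates a user-namespace problem.
    UserNamespaceFailure,
    /// bwrap exited non-zero for some unrelated reason.
    OtherFailure,
    /// bwrap was still running at the deadline and was killed.
    TimedOut,
    /// bwrap could not be spawned or polled at all.
    SpawnError,
}

/// Returns true when system bwrap should be treated as usable.
/// Every outcome except a detected user-namespace failure is
/// conservatively treated as "bwrap works".
fn treat_bwrap_as_usable(outcome: ProbeOutcome) -> bool {
    !matches!(outcome, ProbeOutcome::UserNamespaceFailure)
}

fn main() {
    let outcomes = [
        ProbeOutcome::Succeeded,
        ProbeOutcome::UserNamespaceFailure,
        ProbeOutcome::OtherFailure,
        ProbeOutcome::TimedOut,
        ProbeOutcome::SpawnError,
    ];
    for outcome in outcomes {
        println!("{outcome:?} -> usable: {}", treat_bwrap_as_usable(outcome));
    }
}
```

The point of the table is that a timeout lands in the same bucket as success: slowness alone says nothing about whether user namespaces are broken.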

Additional information

Suggested fix in codex-rs/sandboxing/src/bwrap.rs — replace the blocking Command::output() call with a spawn + poll loop that kills the child and returns true after 500ms:

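  // Imports this snippet relies on (some may already be present in bwrap.rs).
  use std::path::Path;
  use std::process::{Command, Output};
  use std::time::Duration;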

  const BWRAP_CHECK_TIMEOUT: Duration = Duration::from_millis(500);

  fn system_bwrap_has_user_namespace_access(system_bwrap_path: &Path) -> bool {
      let mut child = match Command::new(system_bwrap_path)
          .args(["--unshare-user", "--unshare-net", "--ro-bind", "/", "/", "/bin/true"])
          .stdout(std::process::Stdio::null())
          .stderr(std::process::Stdio::piped())
          .spawn()
      {
          Ok(child) => child,
          Err(_) => return true,
      };

      let deadline = std::time::Instant::now() + BWRAP_CHECK_TIMEOUT;
      loop {
          match child.try_wait() {
              Ok(Some(status)) => {
                  let stderr = child.stderr.take().map_or_else(Vec::new, |mut r| {
                      use std::io::Read;
                      let mut buf = Vec::new();
                      let _ = r.read_to_end(&mut buf);
                      buf
                  });
                  let output = Output { status, stdout: Vec::new(), stderr };
                  return output.status.success() || !is_user_namespace_failure(&output);
              }
              Ok(None) => {
                  if std::time::Instant::now() >= deadline {
                      let _ = child.kill();
                      // Reap the killed child so it does not linger as a zombie.
                      let _ = child.wait();
                      return true;
                  }
                  std::thread::sleep(Duration::from_millis(50));
              }
              Err(_) => return true,
          }
      }
  }

With this fix applied locally, startup on affected systems drops from ~30s to ~3s.
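For reference, the same spawn + poll pattern could be factored into a standalone helper. The sketch below is a minimal, self-contained version under stated assumptions: `wait_with_timeout` is a hypothetical name (not a function in the Codex source), and the demo uses the `sleep` command as a stand-in for a slow bwrap probe, so it assumes a Unix-like system with `sleep` on PATH.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: waits up to `timeout` for `child` to exit,
/// polling every `poll`.
/// Returns Ok(Some(status)) if the child exited in time, or Ok(None)
/// if the deadline passed and the child was killed and reaped.
fn wait_with_timeout(
    child: &mut Child,
    timeout: Duration,
    poll: Duration,
) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // kill() may fail if the child exited in the meantime; ignore
            // that and reap the child either way so no zombie is left.
            let _ = child.kill();
            let _ = child.wait();
            return Ok(None);
        }
        thread::sleep(poll);
    }
}

fn main() -> io::Result<()> {
    // `sleep 5` stands in for a slow probe; assumes `sleep` is on PATH.
    let mut slow = Command::new("sleep")
        .arg("5")
        .stdout(Stdio::null())
        .spawn()?;
    let started = Instant::now();
    let result = wait_with_timeout(
        &mut slow,
        Duration::from_millis(200),
        Duration::from_millis(20),
    )?;
    println!(
        "timed out: {}, waited {:?}",
        result.is_none(),
        started.elapsed()
    );
    Ok(())
}
```

A caller in the probe would map `Ok(None)` (timeout) and `Err(_)` to `true`, and inspect stderr only in the `Ok(Some(status))` case.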

Labels: bug, performance, sandbox