Skip to content

revert: undo unconditional chdir that breaks cd persistence (#1400)#1487

Merged
ErikBjare merged 2 commits intomasterfrom
revert-chdir-unconditional
Feb 25, 2026
Merged

revert: undo unconditional chdir that breaks cd persistence (#1400)#1487
ErikBjare merged 2 commits intomasterfrom
revert-chdir-unconditional

Conversation

@TimeToBuildBob
Copy link
Copy Markdown
Member

@TimeToBuildBob TimeToBuildBob commented Feb 25, 2026

Summary

Reverts d2cef94 ("fix(server): always chdir to workspace at step start (#1400)").

That commit changed the server to unconditionally os.chdir(workspace) at the start of every step. This breaks cd persistence between tool calls — any cd a user or agent does in a shell step gets silently reset before the next step runs.

Fixes #1486.

What this restores

The previous behavior: only chdir to the workspace when the conversation has one or fewer user messages (i.e. the initial step). After that, the working directory set by tools is preserved across steps.

Context

The unconditional chdir was intended to fix conversations not starting in the right directory, but it introduced a worse regression: directory changes no longer stick. The original conditional approach handled the "start in the right place" case without clobbering subsequent cd calls.

A proper fix for concurrent conversations (where process-global os.chdir is inherently racy) would require threading the workspace path through the tool chain, but that's a larger refactor tracked separately.


Important

Reverts unconditional os.chdir in step() to restore cd persistence between steps in api_v2_sessions.py.

  • Behavior:
    • Reverts unconditional os.chdir(workspace) at every step start in step() in api_v2_sessions.py.
    • Restores behavior to only chdir to workspace if conversation has one or fewer user messages.
  • Context:
    • Fixes regression where cd persistence was broken between steps.
    • Original change was to ensure correct start directory but caused worse issues.
    • Proper fix for concurrent conversations requires larger refactor.

This description was created by Ellipsis for 9d4ac74. You can customize this summary. It will automatically update as commits are pushed.

Copy link
Copy Markdown
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to 9d4ac74 in 19 seconds. Click for details.
  • Reviewed 32 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

Workflow ID: wflow_LMq4z8xztisbVAKb

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Feb 25, 2026

Greptile Summary

This PR reverts commit d2cef94 which made os.chdir(workspace) run unconditionally at the start of every step in the server's step() function. That unconditional chdir broke cd persistence — any directory change made by the shell tool (via os.chdir in shell.py) was silently reset before the next step.

The restored behavior:

  • Initial step (≤1 user message): os.chdir(workspace) runs to ensure the conversation starts in the correct directory
  • Subsequent steps (>1 user message): the working directory set by tools (e.g., cd commands) is preserved across steps

This aligns with the CLI behavior in chat.py, which only calls os.chdir(workspace) once at session start.

  • The existing TODO comment acknowledges the limitation: os.chdir() is process-global, so concurrent conversations can still interfere with each other's working directory. A proper fix (threading workspace through the tool chain) is tracked separately.
  • No new logic is introduced — this is a clean revert to the prior conditional approach.

Confidence Score: 5/5

  • This PR is a clean, minimal revert that restores previously working behavior. Safe to merge.
  • Score reflects that this is a simple revert of a single commit in one file. The restored code was the previous production behavior, the logic is sound and consistent with the CLI path (chat.py), and the change fixes a clear regression (broken cd persistence). No new code or logic is introduced.
  • No files require special attention.

Important Files Changed

Filename Overview
gptme/server/api_v2_sessions.py Reverts unconditional os.chdir(workspace) to conditional form (only when ≤1 user message), restoring cd persistence between steps. Clean revert with no new issues.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Server as step()
    participant OS as os.chdir()
    participant Shell as shell tool

    Note over Server: Conversation start (≤1 user msg)
    Client->>Server: POST /step (1st user message)
    Server->>OS: os.chdir(workspace)
    Note over OS: CWD = /workspace
    Server->>Shell: Execute tool (e.g., cd /project)
    Shell->>OS: os.chdir("/project")
    Note over OS: CWD = /project

    Note over Server: Subsequent step (>1 user msg)
    Client->>Server: POST /step (2nd+ user message)
    Note over Server: Skip os.chdir (preserve CWD)
    Note over OS: CWD still = /project ✓
    Server->>Shell: Execute tool (uses /project)
Loading

Last reviewed commit: 9d4ac74

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

@ErikBjare ErikBjare merged commit e90f9e5 into master Feb 25, 2026
8 checks passed
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
gptme/server/api_v2_sessions.py 0.00% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

TimeToBuildBob added a commit that referenced this pull request Feb 25, 2026
- Replace os.chdir(repo_dir) with cwd=repo_dir in subprocess calls to
  preserve working directory persistence (per revert in #1487)
- Consolidate duplicate 'from datasets import' into single import line
TimeToBuildBob added a commit that referenced this pull request Feb 26, 2026
- Replace os.chdir(repo_dir) with cwd=repo_dir in subprocess calls to
  preserve working directory persistence (per revert in #1487)
- Consolidate duplicate 'from datasets import' into single import line
TimeToBuildBob added a commit to TimeToBuildBob/gptme that referenced this pull request Feb 26, 2026
- Replace os.chdir(repo_dir) with cwd=repo_dir in subprocess calls to
  preserve working directory persistence (per revert in gptme#1487)
- Consolidate duplicate 'from datasets import' into single import line
ErikBjare pushed a commit that referenced this pull request Feb 26, 2026
…424) (#1489)

* feat(eval): add SWE-bench evaluation modules

Combines work from two WIP branches into a single recoverable base:

- **gptme/eval/swebench/**: Core SWE-bench evaluation framework
  (originally by @ErikBjare, PR #142)
  - Instance loading, repository setup utilities
  - Evaluation runner with gptme agent integration
  - CLI entry point (`gptme-eval-swebench`)

- **gptme/eval/swe_extra/**: SWE-bench setup scripts and data analysis
  (originally by @bjsi, PR #424)
  - Setup scripts for running SWE-bench evals
  - Data loading and analysis utilities (top-50 easiest instances)
  - Test specification generation
  - SWE-bench constants and harness integration

Both modules are WIP and require further integration work. Key known
limitations:
- `swe_extra` imports `SWEBenchInfo` from logmanager (not yet in main)
- `swe_extra` references `SWEBenchAgent` (not yet merged)
- `datasets` and `swebench` added as optional eval dependencies

Adds mypy overrides for WIP modules to defer type checking.

Co-authored-by: bjsi <44608421+bjsi@users.noreply.github.com>

* fix(eval): remove os.chdir() and duplicate import in swebench utils

- Replace os.chdir(repo_dir) with cwd=repo_dir in subprocess calls to
  preserve working directory persistence (per revert in #1487)
- Consolidate duplicate 'from datasets import' into single import line

* fix(eval): address greptile review - fix repo_dir in files dict and missing model arg

- evaluate.py: replace incorrect {"repo_dir": repo_dir} Files dict with
  proper prompt-embedded context; add missing log_dir/workspace_dir to
  EvalResult instances
- run_swe_extra.py: add --model CLI arg so cli() can call main() correctly
  for non-resume evaluations

* fix(eval): address greptile review feedback on swebench module

* fix(eval): copy repo to agent workspace and capture diff via git

* fix(eval): enable mypy for swebench modules per review

Remove WIP mypy ignore_errors for gptme.eval.swebench.* — code is clean.
Keep ignore_errors for gptme.eval.swe_extra.* (depends on unmerged SWEBenchInfo).

* feat(eval): integrate SWEBenchInfo/SWEBenchAgent and fix swe_extra for current swebench API

Per review, integrate bjsi's SWEBenchInfo and SWEBenchAgent code from PR #424:

- Add SWEBenchInfo frozen dataclass to logmanager.py for evaluation metadata
- Add SWEBenchAgent class in eval/agents/swebench.py with multi-stage pipeline
- Convert eval/agents.py to package (eval/agents/__init__.py + swebench.py)
- Update swe_bench_constants.py for current swebench library API
- Fix swe_bench_test_spec.py imports for restructured swebench package
- Fix type errors in swe_bench_extra_data.py and run_swe_extra.py
- Add matplotlib to mypy ignore_missing_imports
- Remove WIP ignore_errors override for swe_extra modules

* fix(eval): restore cwd after os.chdir in SWEBenchAgent.replay()

* refactor(eval): move SWEBenchInfo from logmanager to eval/swebench/info.py

Per Erik's review: SWEBenchInfo doesn't belong in logmanager.py, move it
into the eval/swebench module where it belongs.

- New: gptme/eval/swebench/info.py with SWEBenchInfo dataclass
- Updated gptme/eval/swebench/__init__.py to export SWEBenchInfo
- Removed SWEBenchInfo from gptme/logmanager.py
- Updated all imports in eval/agents/swebench.py, swe_extra/{run_swe_extra,
  swe_bench_test_spec,swe_bench_extra_data}.py

---------

Co-authored-by: TimeToBuildBob <TimeToBuildBob@users.noreply.github.com>
Co-authored-by: bjsi <44608421+bjsi@users.noreply.github.com>
@TimeToBuildBob TimeToBuildBob deleted the revert-chdir-unconditional branch February 28, 2026 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(server): unconditional chdir breaks cd persistence between steps

2 participants