revert: undo unconditional chdir that breaks cd persistence (#1400)#1487
revert: undo unconditional chdir that breaks cd persistence (#1400)#1487
Conversation
This reverts commit d2cef94.
There was a problem hiding this comment.
Important
Looks good to me! 👍
Reviewed everything up to 9d4ac74 in 19 seconds. Click for details.
- Reviewed
32lines of code in1files - Skipped
0files when reviewing. - Skipped posting
0draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
Workflow ID: wflow_LMq4z8xztisbVAKb
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
Greptile SummaryThis PR reverts commit d2cef94 which made The restored behavior:
This aligns with the CLI behavior in
Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant Server as step()
participant OS as os.chdir()
participant Shell as shell tool
Note over Server: Conversation start (≤1 user msg)
Client->>Server: POST /step (1st user message)
Server->>OS: os.chdir(workspace)
Note over OS: CWD = /workspace
Server->>Shell: Execute tool (e.g., cd /project)
Shell->>OS: os.chdir("/project")
Note over OS: CWD = /project
Note over Server: Subsequent step (>1 user msg)
Client->>Server: POST /step (2nd+ user message)
Note over Server: Skip os.chdir (preserve CWD)
Note over OS: CWD still = /project ✓
Server->>Shell: Execute tool (uses /project)
Last reviewed commit: 9d4ac74 |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
- Replace os.chdir(repo_dir) with cwd=repo_dir in subprocess calls to preserve working directory persistence (per revert in #1487) - Consolidate duplicate 'from datasets import' into single import line
- Replace os.chdir(repo_dir) with cwd=repo_dir in subprocess calls to preserve working directory persistence (per revert in #1487) - Consolidate duplicate 'from datasets import' into single import line
- Replace os.chdir(repo_dir) with cwd=repo_dir in subprocess calls to preserve working directory persistence (per revert in gptme#1487) - Consolidate duplicate 'from datasets import' into single import line
…424) (#1489) * feat(eval): add SWE-bench evaluation modules Combines work from two WIP branches into a single recoverable base: - **gptme/eval/swebench/**: Core SWE-bench evaluation framework (originally by @ErikBjare, PR #142) - Instance loading, repository setup utilities - Evaluation runner with gptme agent integration - CLI entry point (`gptme-eval-swebench`) - **gptme/eval/swe_extra/**: SWE-bench setup scripts and data analysis (originally by @bjsi, PR #424) - Setup scripts for running SWE-bench evals - Data loading and analysis utilities (top-50 easiest instances) - Test specification generation - SWE-bench constants and harness integration Both modules are WIP and require further integration work. Key known limitations: - `swe_extra` imports `SWEBenchInfo` from logmanager (not yet in main) - `swe_extra` references `SWEBenchAgent` (not yet merged) - `datasets` and `swebench` added as optional eval dependencies Adds mypy overrides for WIP modules to defer type checking. Co-authored-by: bjsi <44608421+bjsi@users.noreply.github.com> * fix(eval): remove os.chdir() and duplicate import in swebench utils - Replace os.chdir(repo_dir) with cwd=repo_dir in subprocess calls to preserve working directory persistence (per revert in #1487) - Consolidate duplicate 'from datasets import' into single import line * fix(eval): address greptile review - fix repo_dir in files dict and missing model arg - evaluate.py: replace incorrect {"repo_dir": repo_dir} Files dict with proper prompt-embedded context; add missing log_dir/workspace_dir to EvalResult instances - run_swe_extra.py: add --model CLI arg so cli() can call main() correctly for non-resume evaluations * fix(eval): address greptile review feedback on swebench module * fix(eval): copy repo to agent workspace and capture diff via git * fix(eval): enable mypy for swebench modules per review Remove WIP mypy ignore_errors for gptme.eval.swebench.* — code is clean. Keep ignore_errors for gptme.eval.swe_extra.* (depends on unmerged SWEBenchInfo). * feat(eval): integrate SWEBenchInfo/SWEBenchAgent and fix swe_extra for current swebench API Per review, integrate bjsi's SWEBenchInfo and SWEBenchAgent code from PR #424: - Add SWEBenchInfo frozen dataclass to logmanager.py for evaluation metadata - Add SWEBenchAgent class in eval/agents/swebench.py with multi-stage pipeline - Convert eval/agents.py to package (eval/agents/__init__.py + swebench.py) - Update swe_bench_constants.py for current swebench library API - Fix swe_bench_test_spec.py imports for restructured swebench package - Fix type errors in swe_bench_extra_data.py and run_swe_extra.py - Add matplotlib to mypy ignore_missing_imports - Remove WIP ignore_errors override for swe_extra modules * fix(eval): restore cwd after os.chdir in SWEBenchAgent.replay() * refactor(eval): move SWEBenchInfo from logmanager to eval/swebench/info.py Per Erik's review: SWEBenchInfo doesn't belong in logmanager.py, move it into the eval/swebench module where it belongs. - New: gptme/eval/swebench/info.py with SWEBenchInfo dataclass - Updated gptme/eval/swebench/__init__.py to export SWEBenchInfo - Removed SWEBenchInfo from gptme/logmanager.py - Updated all imports in eval/agents/swebench.py, swe_extra/{run_swe_extra, swe_bench_test_spec,swe_bench_extra_data}.py --------- Co-authored-by: TimeToBuildBob <TimeToBuildBob@users.noreply.github.com> Co-authored-by: bjsi <44608421+bjsi@users.noreply.github.com>
Summary
Reverts d2cef94 ("fix(server): always chdir to workspace at step start (#1400)").
That commit changed the server to unconditionally
os.chdir(workspace)at the start of every step. This breakscdpersistence between tool calls — anycda user or agent does in a shell step gets silently reset before the next step runs.Fixes #1486.
What this restores
The previous behavior: only chdir to the workspace when the conversation has one or fewer user messages (i.e. the initial step). After that, the working directory set by tools is preserved across steps.
Context
The unconditional chdir was intended to fix conversations not starting in the right directory, but it introduced a worse regression: directory changes no longer stick. The original conditional approach handled the "start in the right place" case without clobbering subsequent
cdcalls.A proper fix for concurrent conversations (where process-global
os.chdiris inherently racy) would require threading the workspace path through the tool chain, but that's a larger refactor tracked separately.Important
Reverts unconditional
os.chdirinstep()to restorecdpersistence between steps inapi_v2_sessions.py.os.chdir(workspace)at every step start instep()inapi_v2_sessions.py.chdirto workspace if conversation has one or fewer user messages.cdpersistence was broken between steps.This description was created by
for 9d4ac74. You can customize this summary. It will automatically update as commits are pushed.