Skip to content

refactor: refactor culture#512

Merged
DoganK01 merged 3 commits intomasterfrom
culture-refactor
Jan 29, 2026
Merged

refactor: refactor culture#512
DoganK01 merged 3 commits intomasterfrom
culture-refactor

Conversation

@DoganK01
Copy link
Copy Markdown
Member

  • Refactor culture managing so that it only follow provided instructions

@claude
Copy link
Copy Markdown

claude bot commented Jan 29, 2026

PR Review: Culture Refactor (#512)

I've reviewed the changes in this PR. Here's my comprehensive feedback.

Summary

This PR refactors the culture management system to enforce stricter adherence to cultural guidelines. The changes move culture injection to the end of the system prompt and add more emphatic language.

Positive Aspects

  1. Strategic placement of culture: Moving culture injection to the end of the system prompt is a smart architectural decision - it gives culture guidelines final authority over earlier instructions.

  2. Clear intent: The PR description accurately reflects the code changes.

  3. Conditional default prompt: The logic to use a different default prompt when culture is present (lines 264-268 in system_prompt_manager.py) is well-thought-out.

Critical Issues

1. Tool Call Limit Change (CRITICAL)

File: src/upsonic/agent/agent.py:142
This is a 20x increase (5 to 100) in the default tool call limit with no explanation.

Concerns:

  • This seems unrelated to culture refactoring
  • Could significantly impact costs (20x more API calls possible)
  • May hide infinite loops or poorly designed agent behaviors
  • Breaking change for users relying on the 5-call limit as a safety mechanism

Recommendation: Either revert this change or move it to a separate PR with proper justification.

2. Aggressive Language May Be Counterproductive

File: src/upsonic/culture/manager.py:294-333

The extremely emphatic language (MUST, CRITICAL, NON-NEGOTIABLE) might:

  • Make the system MORE vulnerable to prompt injection attacks
  • Backfire with modern LLMs per research on prompt psychology
  • Add 40-50% more tokens without clear evidence it improves compliance
  • Leak into the agent's meta-awareness

The original test at test_culture_manager.py:255 will now FAIL because the header changed.

3. Breaking Test

The test at line 255 of test_culture_manager.py expects 'Agent Culture Guidelines' but the new output uses 'MANDATORY AGENT CULTURE GUIDELINES - STRICT COMPLIANCE REQUIRED'.

Recommendation: Update the test to match the new output OR reconsider the header change.

4. Redundant Default Prompt Change

File: src/upsonic/context/default_prompt.py:9

The addition 'IF PROVIDED' at the end is redundant and awkward. The conditional logic in system_prompt_manager.py already handles this case.

Security Considerations

  1. Prompt Injection Vulnerability: The emphatic language might make the system more vulnerable. An attacker could craft prompts using similar MUST/CRITICAL language to override instructions.

  2. No Input Sanitization: The culture description is inserted directly into prompts without sanitization.

Performance Considerations

  1. Token Count: The new format adds approximately 200-250 additional tokens to every system prompt that uses culture.

  2. Tool Call Limit: The 20x increase (5 to 100) could lead to significantly longer execution times and higher API costs.

Recommendations

Must Fix:

  1. Revert or separate the tool_call_limit change
  2. Update or fix the breaking test

Should Fix:
3. Reduce aggressive language - use clearer, more direct phrasing
4. Remove redundant change to default_prompt.py
5. Consider prompt injection risks

Questions

  1. What evidence/testing led to the decision that more emphatic language improves compliance?
  2. Why was the tool_call_limit increased from 5 to 100? Is this intentional or accidental?
  3. Have you tested this format against prompt injection attacks?

Final Verdict

The core architectural change (moving culture to end of prompt) is sound, but the implementation has several issues that should be addressed before merging.

@claude
Copy link
Copy Markdown

claude bot commented Jan 29, 2026

Pull Request Review: Culture Refactor

Overview

This PR introduces significant new functionality with the RalphLoop autonomous AI development system while refactoring the CultureManager. The changes add approximately 7,400 lines of code with comprehensive test coverage.

Strengths

1. Excellent Test Coverage

  • 1,862 lines of comprehensive unit tests in test_ralph_comprehensive.py
  • 1,132 lines of integration tests in test_ralph_real_world.py
  • Tests cover all major components: Config, State, Phases, Tools, Backpressure
  • Good use of fixtures and mocking for isolated testing

2. Well-Structured Architecture

  • Clear separation of concerns across modules (config, state, phases, tools, backpressure)
  • Proper use of type hints throughout
  • Good docstring coverage with examples
  • Lazy loading pattern consistent with project standards

3. Culture Manager Improvements

  • Removed storage operations for cleaner separation of concerns
  • Extraction logic is now more focused and explicit
  • Better error handling with fallback defaults

4. Security Considerations

  • Path validation in RalphFilesystemToolKit._validate_path() prevents directory traversal
  • Workspace isolation enforced
  • Timeouts on subprocess calls (default 300s)

CRITICAL Issues

1. Command Injection Vulnerability (BLOCKER)

Locations:

  • src/upsonic/ralph/tools/filesystem.py:431-438
  • src/upsonic/ralph/backpressure/gate.py:266-273

Both files use subprocess.run() with shell=True, which is a SECURITY RISK for command injection attacks. Since this is an autonomous agent framework that executes commands, malicious prompts could potentially execute arbitrary shell commands.

Recommendation: Use shell=False with shlex.split() for command parsing, or sanitize inputs with allowlists.

Medium Priority Issues

2. CultureManager Synchronous Wrapper

Location: src/upsonic/culture/manager.py:146-155

The prepare() method has a complex synchronous wrapper that could fail when there is already an event loop running. Consider using asyncio.run_coroutine_threadsafe() or documenting that async usage is preferred.

3. Culture Prompt Injection Risk

Location: src/upsonic/culture/manager.py:294-336

The culture guidelines use very strong enforcement language (MANDATORY, MUST, CRITICAL) which could make the agent overly rigid and be vulnerable to sophisticated prompt injection. Consider toning down the language and documenting that culture is advisory.

Low Priority Issues

4. Version Bump

Location: pyproject.toml:3

Version bumped from 0.71.0 to 0.71.1 (patch), but this PR adds major new features. Consider bumping to 0.72.0 (minor) per semantic versioning.

5. Parameter Naming

Locations: src/upsonic/direct.py:50, src/upsonic/agent/agent.py:123

Parameter named print could conflict with Python built-in. Recommend renaming to enable_print, verbose, or show_output.

6. Test Command Modification

Location: src/upsonic/ralph/backpressure/gate.py:121-137

Automatically modifies pytest commands by injecting --rootdir which could surprise users. Document this behavior clearly and consider making it configurable.

Final Verdict

Approve with required changes

This PR adds valuable functionality with good architecture and test coverage. However, the command injection vulnerability is a blocker that must be fixed before merging.

Required Before Merge:

  1. FIX: Command injection vulnerability (use shell=False)

Strongly Recommended:

  1. Fix culture manager sync wrapper
  2. Document culture prompt injection limitations
  3. Bump version to 0.72.0 (minor)

Nice to Have:

  1. Improve error logging with tracebacks
  2. Rename print parameter
  3. Make pytest rootdir modification configurable

Great work! The RalphLoop architecture is well-designed. Once the security issue is addressed, this will be a solid addition to Upsonic.

@DoganK01 DoganK01 merged commit 2e34637 into master Jan 29, 2026
5 checks passed
@DoganK01 DoganK01 deleted the culture-refactor branch January 29, 2026 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant