[Bug]: Fix `max_tokens` parameter for newer OpenAI models (gpt-5.x, o1, o3)

### Do you need to file an issue?

- [x] I have searched the existing issues and this bug is not already filed.
- [x] I believe this is a legitimate bug, not just a question or feature request.

### Describe the bug

The agent base classes pass the `max_tokens` parameter, which newer OpenAI models (gpt-5.x, o1, o3, gpt-4o) do not support. These models require `max_completion_tokens` instead.

### Steps to reproduce

1. Set `LLM_BINDING=openai`
2. Set `LLM_MODEL` to a gpt-5.x model (e.g., `gpt-5.2`)
3. Upload a knowledge base
4. Run the solver

### Expected Behavior

# Fix `max_tokens` parameter for newer OpenAI models (gpt-5.x, o1, o3, gpt-4o)

## 🐛 Bug Description

Agent base classes use the `max_tokens` parameter, which is **not supported** by newer OpenAI models (gpt-5.x, o1, o3, gpt-4o). These models require `max_completion_tokens` instead, so API calls fail with a 400 error.

## 🔴 Error Message

```
Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'unsupported_parameter'}}
```

**Relevant code locations:**
- `src/agents/solve/base_agent.py` line 188, where the parameter is set: `kwargs["max_tokens"] = max_tokens`
- `src/agents/solve/main_solver.py` line 264, the entry point in the traceback: `result = await self._run_dual_loop_pipeline(question, output_dir)`

## 📋 Affected Models

The following OpenAI models require `max_completion_tokens` instead of `max_tokens`:
- `gpt-5.x` and later (e.g., `gpt-5.2`)
- `gpt-4o` series
- `o1`, `o3` series (reasoning models)

## 🔍 Root Cause

A utility function `_get_token_limit_kwargs()` already exists in `src/api/routers/system.py` (lines 46-59) that correctly handles both cases, but it's:
1. **Private** (prefixed with `_`) and scoped to `system.py` only
2. **Only used** in the `/test/llm` endpoint
3. **Not applied** to any agent base classes

## 📁 Affected Files

All agent base classes hardcode `max_tokens` without checking the model type:

1. **`src/agents/solve/base_agent.py`** (line 188)
   ```python
   if max_tokens:
       kwargs["max_tokens"] = max_tokens  # ❌ Fails for gpt-5.x
   ```

2. **`src/agents/research/agents/base_agent.py`** (line 134)
   ```python
   if max_tokens:
       kwargs["max_tokens"] = max_tokens  # ❌ Fails for gpt-5.x
   ```

3. **`src/agents/guide/agents/base_guide_agent.py`** (line 131)
   ```python
   "max_tokens": max_tokens,  # ❌ Fails for gpt-5.x
   ```

4. **`src/agents/ideagen/base_idea_agent.py`** (line 120)
   ```python
   "max_tokens": max_tokens,  # ❌ Fails for gpt-5.x
   ```
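For items 3 and 4, where the parameter sits inside a dict literal rather than being assigned to `kwargs`, the same helper can be unpacked into the literal. The following is a minimal sketch, not the project's code: `build_request_params` is a hypothetical name, and the `get_token_limit_kwargs` shown here is a simplified stand-in (a plain prefix check) for the shared helper proposed in this issue.

```python
from typing import Optional


def get_token_limit_kwargs(model: str, max_tokens: int) -> dict:
    """Stand-in for the shared helper; simplified prefix check for illustration."""
    newer = model.startswith(("o1", "o3", "gpt-4o", "gpt-5"))
    key = "max_completion_tokens" if newer else "max_tokens"
    return {key: max_tokens}


def build_request_params(model: str, temperature: float,
                         max_tokens: Optional[int] = None) -> dict:
    """Hypothetical builder mirroring the dict-literal style of items 3 and 4."""
    # Only add a token limit when one was requested.
    token_kwargs = get_token_limit_kwargs(model, max_tokens) if max_tokens else {}
    return {
        "model": model,
        "temperature": temperature,
        **token_kwargs,  # replaces the hardcoded "max_tokens": max_tokens entry
    }
```

Unpacking with `**` keeps the literal in place, so the surrounding code in those two files would need no other restructuring.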

## ✅ Existing Solution (Not Used)

The correct implementation already exists but is unused:

**`src/api/routers/system.py`** (lines 19-59):
```python
def _uses_max_completion_tokens(model: str) -> bool:
    """Check if model requires max_completion_tokens"""
    patterns = [
        r"^o[13]",      # o1, o3 models
        r"^gpt-4o",     # gpt-4o models
        r"^gpt-[5-9]",  # gpt-5.x and later
        r"^gpt-\d{2,}", # gpt-10+ (future proofing)
    ]
    # ... checks model name against patterns

def _get_token_limit_kwargs(model: str, max_tokens: int) -> dict:
    """Get appropriate token limit parameter"""
    if _uses_max_completion_tokens(model):
        return {"max_completion_tokens": max_tokens}
    return {"max_tokens": max_tokens}
```
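The excerpt above elides the matching step. A runnable sketch of the public versions might look like the following; the patterns are copied from the excerpt, but the `re.match` loop is an assumption about how the elided check works, not a quote from `system.py`.

```python
import re

# Patterns copied from the excerpt above.
_MAX_COMPLETION_TOKENS_PATTERNS = [
    r"^o[13]",       # o1, o3 models
    r"^gpt-4o",      # gpt-4o models
    r"^gpt-[5-9]",   # gpt-5.x and later
    r"^gpt-\d{2,}",  # gpt-10+ (future proofing)
]


def uses_max_completion_tokens(model: str) -> bool:
    """Return True if the model name matches any pattern (assumed logic)."""
    return any(re.match(p, model) for p in _MAX_COMPLETION_TOKENS_PATTERNS)


def get_token_limit_kwargs(model: str, max_tokens: int) -> dict:
    """Return the token limit under the parameter name the model accepts."""
    if uses_max_completion_tokens(model):
        return {"max_completion_tokens": max_tokens}
    return {"max_tokens": max_tokens}
```

With this sketch, `get_token_limit_kwargs("gpt-5.2", 8192)` yields `{"max_completion_tokens": 8192}`, while `get_token_limit_kwargs("gpt-4", 8192)` yields `{"max_tokens": 8192}`.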

Used correctly in `system.py` line 154:
```python
token_kwargs = _get_token_limit_kwargs(model, max_tokens=20)
response = await openai_complete_if_cache(..., **token_kwargs)
```

## 🛠️ Proposed Solution

1. **Move utility functions to shared location:**
   - Move `_uses_max_completion_tokens()` → `uses_max_completion_tokens()` (make public)
   - Move `_get_token_limit_kwargs()` → `get_token_limit_kwargs()` (make public)
   - Location: `src/core/core.py` (alongside other LLM configuration utilities)

2. **Update all agent base classes:**
   - Import `get_token_limit_kwargs` from `src.core.core`
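   - Replace the direct `kwargs["max_tokens"] = max_tokens` assignment (and the hardcoded `"max_tokens": max_tokens` dict entries in the Guide and IdeaGen base classes) with: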
     ```python
     if max_tokens:
         kwargs.update(get_token_limit_kwargs(model, max_tokens))
     ```

3. **Update `system.py`:**
   - Import from `src.core.core` instead of local definition
   - Remove duplicate code

## 📝 Expected Behavior

After fix:
- Older models (gpt-3.5, gpt-4) continue using `max_tokens` ✅
- Newer models (gpt-5.x, o1, o3, gpt-4o) automatically use `max_completion_tokens` ✅
- No breaking changes for existing functionality ✅
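
The expectations above could be pinned down with a small regression check. This is a sketch under assumptions: `find_mismatches` and `EXPECTED_PARAM` are hypothetical names, and the helper is passed in as an argument so the check does not depend on where `get_token_limit_kwargs` ends up living.

```python
from typing import Callable

# (model name, expected parameter name), taken from the bullets above.
EXPECTED_PARAM = [
    ("gpt-3.5-turbo", "max_tokens"),
    ("gpt-4", "max_tokens"),
    ("gpt-5.2", "max_completion_tokens"),
    ("o1", "max_completion_tokens"),
    ("o3", "max_completion_tokens"),
    ("gpt-4o", "max_completion_tokens"),
]


def find_mismatches(helper: Callable[[str, int], dict], limit: int = 8192) -> list:
    """Return (model, expected, actual_keys) for every case the helper gets wrong."""
    mismatches = []
    for model, expected in EXPECTED_PARAM:
        actual = helper(model, limit)
        if actual != {expected: limit}:
            mismatches.append((model, expected, sorted(actual)))
    return mismatches
```

Once the helper is public, a test could assert that `find_mismatches(get_token_limit_kwargs)` returns an empty list. A helper that always returns `max_tokens` would be reported for every newer model in the table.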

## 🔄 Steps to Reproduce

1. Set `LLM_MODEL=gpt-5.2` in `.env`
2. Attempt to use Solve module (or any agent module)
3. Observe 400 error with message about `max_tokens` being unsupported

## 💡 Impact

- **Severity:** High - Blocks usage of newer OpenAI models
- **Affected modules:** Solve, Research, Guide, IdeaGen (and potentially others)
- **User impact:** Cannot use gpt-5.x, o1, o3, or gpt-4o models in production

## 🔗 Related

The same problem applies to:
- Any code path that calls LLM APIs through the agent base classes
- Future models (gpt-6.x, etc.), which will hit the same error wherever `max_tokens` is hardcoded

## 📌 Additional Notes

- The fix is straightforward and low-risk (utility function already exists and works)
- No breaking changes expected
- Backward compatible with existing models


### Related Module

Smart Solver

### Configuration Used

_No response_

### Logs and screenshots

[Backend]  [Solver]       ○ ============================================================
[Backend]  [Solver]       ○ Dual-Loop Solver Initializing
[Backend]  [Solver]       ○ ============================================================
[Backend]  [Solver]       ○ Knowledge Base: RAG101
[Backend]  [Solver]       → Initializing agents...
[Backend]  [Solver]       ○   InvestigateAgent initialized
[Backend]  [Solver]       ○   NoteAgent initialized
[Backend]  [Solver]       ○   Solve Loop agents (lazy init)
[Backend]  [Solver]       ✓ Solver ready
[Backend]  [SolveAPI]     ○ [solve_20260106_170244_9c630be7] Solving: Calculate the linear convolution of x=[1,2,3] and ...
[Backend]  [SolveAPI]     → [solve_20260106_170244_9c630be7] Solving started
[Backend]  [Solver]       ○ ============================================================
[Backend]  [Solver]       ○ Problem Solving Started
[Backend]  [Solver]       ○ ============================================================
[Backend]  [Solver]       ○ Question: Calculate the linear convolution of x=[1,2,3] and h=[4,5]
[Backend]  [Solver]       ○ Output: C:\Users\kimseon\Documents\GitHub\DeepTutor\data\user\solve\solve_20260106_170244
[Backend]  [Solver]       ○ Pipeline: Analysis Loop → Solve Loop
[Backend]  [Solver]       ▶ Analysis Loop started | Understanding the question
[Backend]  [Solver]       ▶ AnalysisLoop started | max_iterations=3
[Backend]  [Solver]       ○ AnalysisLoop running | round=1
[Backend]  ERROR: OpenAI API Call Failed,
[Backend]  Model: gpt-5.2,
[Backend]  Params: {'temperature': 0.3, 'response_format': {'type': 'json_object'}, 'max_tokens': 8192}, Got: Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'unsupported_parameter'}}
[Backend]  [Solver]       ✗ Solving failed: Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'unsupported_parameter'}}
[Backend]  [Solver]       ✗ Traceback (most recent call last):
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\src\agents\solve\main_solver.py", line 264, in solve
[Backend]      result = await self._run_dual_loop_pipeline(question, output_dir)
[Backend]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\src\agents\solve\main_solver.py", line 342, in _run_dual_loop_pipeline
[Backend]      investigate_result = await self.investigate_agent.process(
[Backend]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\src\agents\solve\analysis_loop\investigate_agent.py", line 85, in process
[Backend]      response = await self.call_llm(
[Backend]                 ^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\src\agents\solve\base_agent.py", line 225, in call_llm
[Backend]      response = await openai_complete_if_cache(**kwargs)
[Backend]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\tenacity\asyncio\__init__.py", line 189, in async_wrapped
[Backend]      return await copy(fn, *args, **kwargs)
[Backend]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\tenacity\asyncio\__init__.py", line 111, in __call__
[Backend]      do = await self.iter(retry_state=retry_state)
[Backend]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\tenacity\asyncio\__init__.py", line 153, in iter
[Backend]      result = await action(retry_state)
[Backend]               ^^^^^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\tenacity\_utils.py", line 99, in inner
[Backend]      return call(*args, **kwargs)
[Backend]             ^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\tenacity\__init__.py", line 400, in <lambda>
[Backend]      self._add_action_func(lambda rs: rs.outcome.result())
[Backend]                                       ^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\AppData\Roaming\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\concurrent\futures\_base.py", line 449, in result      
[Backend]      return self.__get_result()
[Backend]             ^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\AppData\Roaming\uv\python\cpython-3.12.11-windows-x86_64-none\Lib\concurrent\futures\_base.py", line 401, in __get_result
[Backend]      raise self._exception
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\tenacity\asyncio\__init__.py", line 114, in __call__
[Backend]      result = await fn(*args, **kwargs)
[Backend]               ^^^^^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\lightrag\llm\openai.py", line 230, in openai_complete_if_cache        
[Backend]      response = await openai_async_client.beta.chat.completions.parse(
[Backend]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\openai\resources\chat\completions\completions.py", line 1670, in parse
[Backend]      return await self._post(
[Backend]             ^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\openai\_base_client.py", line 1797, in post
[Backend]      return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
[Backend]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Backend]    File "C:\Users\kimseon\Documents\GitHub\DeepTutor\.venv\Lib\site-packages\openai\_base_client.py", line 1597, in request
[Backend]      raise self._make_status_error_from_response(err.response) from None
[Backend]  openai.BadRequestError: Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'unsupported_parameter'}}
[Backend]
[Backend]  [SolveAPI]     ✗ [solve_20260106_170244_9c630be7] Solving failed: Error code: 400 - {'error': {'message': "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': 'unsupported_parameter'}}
[Backend]  INFO:     connection closed

### Additional Information

- AI-Tutor Version: opentutor-web@0.2.0 dev
- Operating System: Windows NT 10.0; Win64; x64
- Python Version: 3.12.11
- Node.js Version: node.js v22.17.0
- Browser (if applicable):
- Related Issues:

