Alembic migration to make session_name required on messages by Rajat-Ahuja1997 · Pull Request #166 · plastic-labs/honcho

Rajat-Ahuja1997 · 2025-07-21T20:12:36Z

Resolves https://linear.app/plastic-labs/issue/DEV-994

Tested locally:

INFO  [alembic.runtime.migration] Running upgrade 917195d9b5e9 -> 05486ce795d5, make session_name required on messages
Found 11 orphaned messages without session_name
Found 4 workspace-peer combinations with orphaned messages
Creating default session 'default_session_GraceMoore996' for workspace 'LoadTest_tzu299tn' and peer 'GraceMoore996'
Created session peer association for peer 'GraceMoore996' in default session 'default_session_GraceMoore996'
Creating default session 'default_session_peer23' for workspace 'LoadTest_yexpjkps' and peer 'peer23'
Created session peer association for peer 'peer23' in default session 'default_session_peer23'
Creating default session 'default_session_alice' for workspace 'test' and peer 'alice'
Created session peer association for peer 'alice' in default session 'default_session_alice'
Creating default session 'default_session_AliceTaylor268' for workspace 'test_app' and peer 'AliceTaylor268'
Created session peer association for peer 'AliceTaylor268' in default session 'default_session_AliceTaylor268'
Making session_name NOT NULL

Summary by CodeRabbit

New Features
- All messages and message embeddings must now be associated with a session, improving data consistency.
- Increased the maximum allowed length for collection names from 512 to 1025 characters.
Bug Fixes
- Messages and embeddings without a session are automatically assigned to default sessions to maintain integrity.
Documentation
- Updated comments to reflect the mandatory session association for all messages and embeddings.

coderabbitai · 2025-07-21T20:12:42Z

Walkthrough

The changes enforce that every message must be associated with a session. This is achieved by making the session_name field required in the database schema, the SQLAlchemy models, and the Pydantic schema. Orphaned messages are migrated to default sessions during the database migration. Additionally, the maximum allowed length for collection names is increased from 512 to 1025 characters.

Changes

| File(s) | Change Summary |
| --- | --- |
| migrations/versions/05486ce795d5_make_session_name_required_on_messages.py | Adds Alembic migration: migrates orphaned messages to default sessions, updates session-peers, and makes `session_name` non-nullable on messages and embeddings; increases collection name length constraint. |
| src/models.py | Changes `session_name` in `Message` and `MessageEmbedding` models to non-nullable; updates collection name length constraint from 512 to 1025 characters. |
| src/schemas.py | Changes `Message.session_name` field from optional to required in Pydantic schema. |

Sequence Diagram(s)

sequenceDiagram
    participant MigrationScript
    participant DB
    participant SessionsTable
    participant SessionPeersTable

    MigrationScript->>DB: Find messages with null session_name
    loop For each workspace-peer with orphaned messages
        MigrationScript->>SessionsTable: Insert default session (handle conflicts)
        MigrationScript->>SessionPeersTable: Associate peer with new session (handle conflicts)
        MigrationScript->>DB: Update orphaned messages and embeddings to new session
    end
    MigrationScript->>DB: Ensure no orphaned messages remain
    MigrationScript->>DB: Alter messages and embeddings tables to make session_name non-nullable
    MigrationScript->>DB: Update collection name length constraint to 1025
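The backfill loop in the diagram can be sketched roughly as follows. This is not the PR's migration code: it uses the standard-library `sqlite3` module as a stand-in for Postgres and Alembic, and the table layouts, column names, and the `backfill_orphans` and `demo` helpers are simplified assumptions made for illustration. Only the `default_session_<peer>` naming comes from the log output in the PR description. The embeddings update and the final `ALTER ... NOT NULL` step are left out.

```python
import sqlite3

# Hypothetical, simplified tables; the real Honcho schema has more columns.
SCHEMA = """
CREATE TABLE sessions (
    name TEXT NOT NULL,
    workspace_name TEXT NOT NULL,
    PRIMARY KEY (name, workspace_name)
);
CREATE TABLE session_peers (
    session_name TEXT NOT NULL,
    peer_name TEXT NOT NULL,
    workspace_name TEXT NOT NULL,
    PRIMARY KEY (session_name, peer_name, workspace_name)
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    workspace_name TEXT NOT NULL,
    peer_name TEXT NOT NULL,
    session_name TEXT
);
"""


def backfill_orphans(conn: sqlite3.Connection) -> int:
    """Assign every message without a session to a per-peer default session.

    Returns the number of workspace-peer combinations that were processed.
    """
    pairs = conn.execute(
        "SELECT DISTINCT workspace_name, peer_name FROM messages "
        "WHERE session_name IS NULL"
    ).fetchall()
    for workspace, peer in pairs:
        session = f"default_session_{peer}"
        # INSERT OR IGNORE stands in for the conflict handling in the diagram.
        conn.execute(
            "INSERT OR IGNORE INTO sessions (name, workspace_name) VALUES (?, ?)",
            (session, workspace),
        )
        conn.execute(
            "INSERT OR IGNORE INTO session_peers "
            "(session_name, peer_name, workspace_name) VALUES (?, ?, ?)",
            (session, peer, workspace),
        )
        conn.execute(
            "UPDATE messages SET session_name = ? "
            "WHERE session_name IS NULL AND workspace_name = ? AND peer_name = ?",
            (session, workspace, peer),
        )
    # Mirror the "ensure no orphaned messages remain" check before any
    # NOT NULL constraint would be applied.
    remaining = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE session_name IS NULL"
    ).fetchone()[0]
    if remaining:
        raise RuntimeError(f"{remaining} orphaned messages remain")
    return len(pairs)


def demo() -> sqlite3.Connection:
    """Build an in-memory database with a few orphans and backfill it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO messages (workspace_name, peer_name, session_name) "
        "VALUES (?, ?, ?)",
        [
            ("test", "alice", None),
            ("test", "alice", None),
            ("test", "alice", "existing"),
            ("test_app", "AliceTaylor268", None),
        ],
    )
    backfill_orphans(conn)
    return conn
```

Because the inserts ignore conflicts and the update only touches rows that are still NULL, running the function a second time is a no-op, which is the property the "handle conflicts" notes in the diagram appear to aim for.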

Estimated code review effort

3 (~40 minutes)

Suggested reviewers

dr-frmr

Poem

In the warren of code, a session for all,
No message left lonely, no value too small.
Orphans are gathered, assigned with great care,
Now every chat knows its session is there.
🐇 With columns aligned and schemas anew,
This migration is done—hip-hop, hooray for you!

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6517684 and e1b1733.

📒 Files selected for processing (2)

migrations/versions/05486ce795d5_make_session_name_required_on_messages.py (1 hunks)
src/models.py (3 hunks)

📓 Path-based instructions (3)

src/**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/**/*.py: Follow isort conventions with absolute imports preferred
snake_case for variables/functions; PascalCase for classes
Line length: 88 chars (Black compatible)
Explicit error handling with appropriate exception types
Docstrings: Use Google style docstrings
Use environment variables via python-dotenv (.env)
Use specific exception types (ResourceNotFoundException, ValidationException, etc.)
Proper logging with context instead of print statements

Files:

src/models.py

src/models.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/models.py: Use explicit type hints with SQLAlchemy mapped_column annotations
All tables use text IDs (nanoid format) as primary keys
Composite foreign keys for multi-tenant relationships
Feature flags on workspace, peer, and session levels
JSONB metadata fields for extensibility
HNSW indexes for vector similarity search

Files:

src/models.py

src/{models,crud}.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

Token counting on messages for usage tracking

Files:

src/models.py

🧠 Learnings (3)

📓 Common learnings

Learnt from: Rajat-Ahuja1997
PR: plastic-labs/honcho#131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:438-441
Timestamp: 2025-06-18T16:44:57.733Z
Learning: In the Honcho database schema, messages.app_id references workspaces.public_id (string nanoid), not workspaces.id (integer primary key). This means that during migrations when workspaces.public_id is renamed to workspaces.id, the foreign key relationship with messages.app_id remains valid.

Learnt from: Rajat-Ahuja1997
PR: plastic-labs/honcho#131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:918-942
Timestamp: 2025-06-19T14:32:02.934Z
Learning: The queue and active_queue_sessions tables in the migration script `migrations/versions/d429de0e5338_adopt_peer_paradigm.py` are expected to remain small, so per-row updates are acceptable and performance optimizations like set-based updates are not necessary.

Learnt from: Rajat-Ahuja1997
PR: plastic-labs/honcho#131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:188-194
Timestamp: 2025-06-18T15:58:51.202Z
Learning: In the migration file `migrations/versions/d429de0e5338_adopt_peer_paradigm.py`, the team has explicitly decided to accept SQL injection risks from f-string interpolation of schema names, despite the security concerns around DDL injection via the DATABASE_SCHEMA environment variable.

Learnt from: Rajat-Ahuja1997
PR: plastic-labs/honcho#131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:36-77
Timestamp: 2025-06-19T14:07:32.309Z
Learning: The migration script `migrations/versions/d429de0e5338_adopt_peer_paradigm.py` properly handles foreign key constraints during table updates by explicitly dropping foreign key constraints before making schema changes, preventing constraint violations during the migration process.

Learnt from: dr-frmr
PR: plastic-labs/honcho#106
File: migrations/versions/a1b2c3d4e5f6_initial_schema.py:172-179
Timestamp: 2025-05-13T15:48:42.783Z
Learning: The `queue` and `active_queue_sessions` tables in the database schema reference `sessions.id` (numeric) rather than `sessions.public_id` (text) because they are internal-only tables not exposed through the API, where performance considerations outweigh consistent identifier usage.

Learnt from: Rajat-Ahuja1997
PR: plastic-labs/honcho#144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.

src/models.py (8)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/models.py : All tables use text IDs (nanoid format) as primary keys

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/models.py : Use explicit type hints with SQLAlchemy mapped_column annotations

Learnt from: Rajat-Ahuja1997
PR: #166
File: migrations/versions/05486ce795d5_make_session_name_required_on_messages.py:60-114
Timestamp: 2025-07-21T20:57:42.135Z
Learning: Peer names in the Honcho codebase are restricted to the pattern r"^[a-zA-Z0-9_-]+$" (same as session names), which prevents SQL injection and JSON syntax errors when peer names are interpolated into SQL queries and JSON strings during migrations.

Learnt from: dr-frmr
PR: #131
File: tests/test_schema_validations.py:76-80
Timestamp: 2025-06-17T21:40:26.817Z
Learning: In the MessageCreate schema, peer_name is the actual field name with "peer_id" as an alias, so when constructing MessageCreate(peer_id="value"), it sets the peer_name attribute to "value".

Learnt from: Rajat-Ahuja1997
PR: #131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:438-441
Timestamp: 2025-06-18T16:44:57.733Z
Learning: In the Honcho database schema, messages.app_id references workspaces.public_id (string nanoid), not workspaces.id (integer primary key). This means that during migrations when workspaces.public_id is renamed to workspaces.id, the foreign key relationship with messages.app_id remains valid.

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/models.py : Feature flags on workspace, peer, and session levels

Learnt from: Rajat-Ahuja1997
PR: #131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:36-77
Timestamp: 2025-06-19T14:07:32.309Z
Learning: The migration script migrations/versions/d429de0e5338_adopt_peer_paradigm.py properly handles foreign key constraints during table updates by explicitly dropping foreign key constraints before making schema changes, preventing constraint violations during the migration process.

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/models.py : Composite foreign keys for multi-tenant relationships

migrations/versions/05486ce795d5_make_session_name_required_on_messages.py (10)

Learnt from: Rajat-Ahuja1997
PR: #131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:36-77
Timestamp: 2025-06-19T14:07:32.309Z
Learning: The migration script migrations/versions/d429de0e5338_adopt_peer_paradigm.py properly handles foreign key constraints during table updates by explicitly dropping foreign key constraints before making schema changes, preventing constraint violations during the migration process.

Learnt from: Rajat-Ahuja1997
PR: #131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:918-942
Timestamp: 2025-06-19T14:32:02.934Z
Learning: The queue and active_queue_sessions tables in the migration script migrations/versions/d429de0e5338_adopt_peer_paradigm.py are expected to remain small, so per-row updates are acceptable and performance optimizations like set-based updates are not necessary.

Learnt from: Rajat-Ahuja1997
PR: #131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:188-194
Timestamp: 2025-06-18T15:58:51.202Z
Learning: In the migration file migrations/versions/d429de0e5338_adopt_peer_paradigm.py, the team has explicitly decided to accept SQL injection risks from f-string interpolation of schema names, despite the security concerns around DDL injection via the DATABASE_SCHEMA environment variable.

Learnt from: Rajat-Ahuja1997
PR: #131
File: migrations/versions/d429de0e5338_adopt_peer_paradigm.py:438-441
Timestamp: 2025-06-18T16:44:57.733Z
Learning: In the Honcho database schema, messages.app_id references workspaces.public_id (string nanoid), not workspaces.id (integer primary key). This means that during migrations when workspaces.public_id is renamed to workspaces.id, the foreign key relationship with messages.app_id remains valid.

Learnt from: Rajat-Ahuja1997
PR: #166
File: migrations/versions/05486ce795d5_make_session_name_required_on_messages.py:60-114
Timestamp: 2025-07-21T20:57:42.135Z
Learning: Peer names in the Honcho codebase are restricted to the pattern r"^[a-zA-Z0-9_-]+$" (same as session names), which prevents SQL injection and JSON syntax errors when peer names are interpolated into SQL queries and JSON strings during migrations.

Learnt from: dr-frmr
PR: #131
File: tests/test_schema_validations.py:76-80
Timestamp: 2025-06-17T21:40:26.817Z
Learning: In the MessageCreate schema, peer_name is the actual field name with "peer_id" as an alias, so when constructing MessageCreate(peer_id="value"), it sets the peer_name attribute to "value".

Learnt from: dr-frmr
PR: #131
File: src/routers/sessions.py:206-213
Timestamp: 2025-06-18T20:42:06.458Z
Learning: The get_or_create_session function in this codebase is designed to handle both session creation and adding peers to existing sessions. When called with peers, it will add those peers to an existing session rather than creating a duplicate session.

Learnt from: Rajat-Ahuja1997
PR: #131
File: src/crud.py:503-505
Timestamp: 2025-06-18T14:50:59.967Z
Learning: The Honcho project prefers upsert behavior for update operations across all resources (sessions, peers, workspaces). Update operations should create the resource if it doesn't exist rather than failing fast. This is an explicit design decision that differs from typical REST semantics but provides a more forgiving API experience.

Learnt from: Rajat-Ahuja1997
PR: #144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.

Learnt from: VVoruganti
PR: #144
File: src/crud.py:118-125
Timestamp: 2025-06-25T17:34:47.676Z
Learning: In embedding generation contexts (like message creation), prefer fail-fast error handling over graceful recovery to ensure data integrity. If embeddings cannot be generated when required, the operation should fail rather than continue with incomplete data.
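The peer-name learning recorded above (names restricted to r"^[a-zA-Z0-9_-]+$") can be illustrated with a small check. This is a sketch only: the pattern is the one quoted in the learning, while `is_valid_name` and `default_session_name` are hypothetical helpers, not functions from the Honcho codebase.

```python
import re

# Pattern quoted in the learning above; the helper names are hypothetical.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")


def is_valid_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, '_' and '-'."""
    # fullmatch is used because "$" alone also matches before a trailing
    # newline, so match() would accept "alice\n".
    return NAME_PATTERN.fullmatch(name) is not None


def default_session_name(peer_name: str) -> str:
    """Build a default session name, rejecting peer names outside the pattern."""
    if not is_valid_name(peer_name):
        raise ValueError(f"invalid peer name: {peer_name!r}")
    return f"default_session_{peer_name}"
```

A name that passes this check contains no quotes, semicolons, or whitespace, which is why interpolating it into SQL or JSON text during the migration is considered safe in the learning.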

🪛 Ruff (0.12.2)

migrations/versions/05486ce795d5_make_session_name_required_on_messages.py

42-42: Possible SQL injection vector through string-based query construction

(S608)

42-42: Trailing comma missing

Add trailing comma

(COM812)

50-54: Possible SQL injection vector through string-based query construction

(S608)

54-54: Trailing comma missing

Add trailing comma

(COM812)

60-60: Trailing comma missing

Add trailing comma

(COM812)

68-68: Trailing comma missing

Add trailing comma

(COM812)

73-86: Possible SQL injection vector through string-based query construction

(S608)

86-86: Trailing comma missing

Add trailing comma

(COM812)

91-103: Possible SQL injection vector through string-based query construction

(S608)

103-103: Trailing comma missing

Add trailing comma

(COM812)

106-106: Trailing comma missing

Add trailing comma

(COM812)

111-117: Possible SQL injection vector through string-based query construction

(S608)

117-117: Trailing comma missing

Add trailing comma

(COM812)

122-128: Possible SQL injection vector through string-based query construction

(S608)

128-128: Trailing comma missing

Add trailing comma

(COM812)

133-133: Possible SQL injection vector through string-based query construction

(S608)

133-133: Trailing comma missing

Add trailing comma

(COM812)

137-139: Create your own exception

(TRY002)

137-139: Avoid specifying long messages outside the exception class

(TRY003)

138-138: Trailing comma missing

Add trailing comma

(COM812)

148-148: Possible SQL injection vector through string-based query construction

(S608)

148-148: Trailing comma missing

Add trailing comma

(COM812)

149-149: Trailing comma missing

Add trailing comma

(COM812)

153-155: Create your own exception

(TRY002)

153-155: Avoid specifying long messages outside the exception class

(TRY003)

154-154: Trailing comma missing

Add trailing comma

(COM812)

167-167: Trailing comma missing

Add trailing comma

(COM812)

179-179: Trailing comma missing

Add trailing comma

(COM812)

185-185: Trailing comma missing

Add trailing comma

(COM812)
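For context on the repeated S608 findings: the rule flags SQL assembled through string formatting. The sketch below shows one common way to narrow that risk, binding values as parameters and validating any identifier that has to be interpolated. It is not the PR's code, and the learnings above note that the team has accepted this risk for schema names in migrations. It uses the standard-library `sqlite3` module as a stand-in for the Postgres connection, and `count_orphans`, `demo`, and the table layout are assumptions made for illustration.

```python
import re
import sqlite3

# Conservative identifier pattern (an assumption, not a Honcho rule).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def count_orphans(conn: sqlite3.Connection, table: str, workspace: str) -> int:
    """Count rows with a NULL session_name for one workspace.

    The table name is an identifier and cannot be a bound parameter, so it is
    validated before interpolation; the workspace value is bound with "?".
    """
    if IDENTIFIER.fullmatch(table) is None:
        raise ValueError(f"unsafe identifier: {table!r}")
    query = (
        f"SELECT COUNT(*) FROM {table} "  # identifier: validated above
        "WHERE session_name IS NULL AND workspace_name = ?"  # value: bound
    )
    return conn.execute(query, (workspace,)).fetchone()[0]


def demo() -> sqlite3.Connection:
    """Create a tiny in-memory table to exercise count_orphans."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (workspace_name TEXT, session_name TEXT)")
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?)",
        [("test", None), ("test", "s1"), ("other", None)],
    )
    return conn
```

A linter may still report S608 on the f-string here, since the check is based on how the query text is built; the point of the sketch is only that user-controlled values never reach the SQL text.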
