Bug
Self-hosted Honcho v3.0.6 with DERIVER_PROVIDER=groq (and DIALECTIC_LEVELS__*__PROVIDER=groq, SUMMARY_PROVIDER=groq, DREAM_PROVIDER=groq) crashes on every structured-output LLM call with:
WARNING - Error on attempt 1/3 with groq/llama-3.3-70b-versatile: Object of type ModelMetaclass is not JSON serializable
INFO - Will retry with attempt 2/3
... (3 retries, all fail) ...
ERROR - Error processing representation batch for work unit representation:claude_code:anan-thinkpad-server:claude: RetryError[<Future at 0x... state=finished raised TypeError>]
Result: queue items pile up unprocessed, peer card / observations never get extracted, /v3/.../peers/{name}/card stays empty.
Root cause
In src/utils/clients.py the Groq branch passes the Pydantic class verbatim to response_format:
- Line 2301 (non-streaming):
    if response_model:
        groq_params["response_format"] = response_model
- Line 2559 (streaming): same pattern.
The Groq Python SDK then tries to JSON-serialize that class to send over the wire, which fails because ModelMetaclass is not JSON-serializable. Compare to the OpenAI branch in the same file, which uses client.chat.completions.parse(response_format=response_model, ...) — the OpenAI SDK accepts the class directly, but the Groq SDK does not.
Groq's API expects either:
    {"type": "json_object"}, or
    {"type": "json_schema", "json_schema": {"name": "...", "schema": <dict>, "strict": true}}
Right now neither is sent — the raw Pydantic class is.
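The failure mode can be shown without Honcho, Groq, or Pydantic installed. The sketch below is an illustration only: `ModelMetaclass` and `PeerCard` are hypothetical stand-ins that mimic a Pydantic model class, and `try_serialize` is a helper made up for this example. It assumes the request body ends up going through the standard `json` encoder, which is what the error message in the logs suggests.

```python
import json


class ModelMetaclass(type):
    """Stand-in for Pydantic's metaclass (hypothetical, for illustration only)."""


class PeerCard(metaclass=ModelMetaclass):
    """Stand-in for a response_model class; the class object itself is what gets passed."""


def try_serialize(params: dict) -> str:
    """Serialize request params the way an HTTP client would, returning the error text on failure."""
    try:
        return json.dumps(params)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Passing the class itself reproduces the message seen in the deriver logs.
broken = try_serialize({"response_format": PeerCard})

# Passing a plain dict serializes without error.
working = try_serialize({"response_format": {"type": "json_object"}})

print(broken)
print(working)
```

The encoder reports the type of the object it cannot handle. Here that object is a class, so its type is the metaclass, which is why the log names `ModelMetaclass` rather than the model.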
Repro
.env (relevant slice):
LLM_GROQ_API_KEY=gsk_...
DERIVER_PROVIDER=groq
DERIVER_MODEL=llama-3.3-70b-versatile
DERIVER_FLUSH_ENABLED=true # makes it fire on every message; bug also occurs without this, just slower to surface
DIALECTIC_LEVELS__minimal__PROVIDER=groq
DIALECTIC_LEVELS__minimal__MODEL=llama-3.3-70b-versatile
DIALECTIC_LEVELS__minimal__THINKING_BUDGET_TOKENS=0
DIALECTIC_LEVELS__minimal__MAX_TOOL_ITERATIONS=1
# (and same for low/medium/high/max, SUMMARY_*, DREAM_*)
- Bring up Honcho v3.0.6 via the example compose (api+deriver from source).
- POST a few messages to a session.
- docker compose logs deriver -f shows the error above on every attempt.
- Queue rows accumulate with processed=false:
SELECT count(*) FILTER (WHERE processed) AS done,
count(*) FILTER (WHERE NOT processed) AS pending
FROM queue WHERE task_type = 'representation';
Expected
Honcho should send a Groq-compatible response_format payload, parse the returned JSON, and validate it against response_model (the same way it already does in the post-response branch at lines ~2316–2334).
Suggested fix
Two safe options at lines 2301 and 2559:
Option A — use Groq json_schema strict mode (preferred — Groq supports it for several models including llama-3.3-70b-versatile):
    if response_model:
        groq_params["response_format"] = {
            "type": "json_schema",
            "json_schema": {
                "name": response_model.__name__,
                "schema": response_model.model_json_schema(),
                "strict": True,
            },
        }
    elif json_mode:
        groq_params["response_format"] = {"type": "json_object"}
Option B — fall through to plain json_object mode and rely on the existing post-call model_validate (less strict, broader model compatibility):
    if response_model or json_mode:
        groq_params["response_format"] = {"type": "json_object"}
The existing parsing block at 2316–2334 already does response_model.model_validate(json.loads(content)), so Option B works without any other change. Option A gives the model a schema hint and tighter outputs.
The same change is needed at the streaming site (line 2559).
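Since both call sites need the same logic, one way to avoid drift is a small shared helper. This is a rough sketch, not the project's code: `build_groq_response_format`, its `use_json_schema` flag, and the `FakeCard` stand-in are all hypothetical names introduced here. The helper only assumes that `response_model` exposes `__name__` and the Pydantic v2 classmethod `model_json_schema()`.

```python
import json
from typing import Any, Optional


def build_groq_response_format(
    response_model: Optional[type] = None,
    json_mode: bool = False,
    use_json_schema: bool = True,
) -> Optional[dict[str, Any]]:
    """Return a JSON-serializable response_format dict, or None if no JSON output is requested."""
    if response_model is not None and use_json_schema:
        # Option A: send the schema as a plain dict, never the class itself.
        return {
            "type": "json_schema",
            "json_schema": {
                "name": response_model.__name__,
                "schema": response_model.model_json_schema(),
                "strict": True,
            },
        }
    if response_model is not None or json_mode:
        # Option B: plain JSON mode; validation happens after the call.
        return {"type": "json_object"}
    return None


class FakeCard:
    """Stand-in exposing the one classmethod the helper needs (hypothetical)."""

    @classmethod
    def model_json_schema(cls) -> dict[str, Any]:
        return {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        }


# Both variants produce payloads that survive JSON serialization.
option_a = build_groq_response_format(FakeCard)
option_b = build_groq_response_format(FakeCard, use_json_schema=False)
serialized = json.dumps(option_a)
```

At each call site the result would be assigned to groq_params["response_format"] only when it is not None. Keeping `use_json_schema` as a flag leaves room to fall back to Option B for models that reject the json_schema type.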
Environment
- Honcho: v3.0.6 (commit 317b4a6)
- Deployment: self-hosted via included docker-compose.yml.example (built from source)
- Python: 3.13 (from upstream Dockerfile)
- groq package version: whatever ships with uv sync in v3.0.6
- Embeddings work fine (using LLM_EMBEDDING_PROVIDER=openrouter, no Groq involvement)
- AUTH_USE_AUTH=true, JWT-based auth working
- All non-Groq paths (peer create, message ingest, queue enqueue, reconciler sync_vectors, cleanup_queue) work normally
- Workaround: switch *_PROVIDER from groq to another provider (openrouter via LLM_OPENAI_COMPATIBLE_*, anthropic, openai, gemini) and the deriver processes successfully
Impact
Anyone who picks Groq as a provider for a chat feature in self-hosted v3.0.6 will hit this on the very first message. The free Groq tier is the obvious choice for low-volume self-hosters, so this likely blocks a non-trivial slice of new self-hosted deployments.