Synbad is a tool for detecting bugs in LLM inference providers, especially open-source ones. Synbad is maintained by Synthetic, as part of our efforts to keep our inference quality as high as possible.
If you find bugs in Synthetic's model hosting, please contribute the bugs here! We will fix them.
Synbad is distributed through npm. Install it with:

```shell
npm install -g @syntheticlab/synbad
```

We keep a running tally of provider+model results for tool calling and reasoning parsing for GLM-4.7, Kimi K2 Thinking, and MiniMax M2. Feel free to add more provider results!
| Provider | Model | Success Rate |
|---|---|---|
| Synthetic.new | GLM-4.7 | ✅ 100% |
| Synthetic.new | Kimi K2 Thinking | ✅ 100% |
| Synthetic.new | MiniMax M2 | ✅ 100% |
| Provider | Model | Success Rate |
|---|---|---|
| Fireworks | GLM-4.7 | ❌ 83% |
| Fireworks | Kimi K2 Thinking | ❌ 92% |
| Fireworks | MiniMax M2 | ✅ 100% |
| Provider | Model | Success Rate |
|---|---|---|
| Together | Kimi K2 Thinking | ❌ 66% |
| Provider | Model | Success Rate |
|---|---|---|
| Parasail | GLM-4.7 | ❌ 83% |
| Parasail | Kimi K2 Thinking | ❌ 75% |
Note for attempting reproductions: generally, all tests are reproducible with `--count 1` and `--count 1 --stream`. The exception is the response-in-reasoning eval, which usually needs a high count to reproduce the bug: `--count 40` and `--count 40 --stream` are typically sufficient.

To be considered a pass, an eval must pass both with and without Synbad's `--stream` parameter (which tests streaming APIs).
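For example, a full reproduction attempt might look like the following (a sketch: it assumes the installed `synbad eval` subcommand accepts the same `--count` and `--stream` flags shown in the `synbad.sh eval` example later in this README):

```shell
# Run the evals 40 times against the non-streaming API...
synbad eval --env-var SYNTHETIC_API_KEY \
  --base-url "https://api.synthetic.new/openai/v1" \
  --model "hf:zai-org/GLM-4.6" \
  --count 40

# ...and again with --stream to exercise the streaming API.
synbad eval --env-var SYNTHETIC_API_KEY \
  --base-url "https://api.synthetic.new/openai/v1" \
  --model "hf:zai-org/GLM-4.6" \
  --count 40 --stream
```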
If you already have some problematic JSON, head over to the Contributing section. If you don't, don't worry! Synbad makes it easy to capture the problematic JSON you're encountering.
First, run the Synbad Proxy, specifying the local port you want to use and the
inference host you want to target. For example, to forward requests from
localhost:3000 to Synthetic's API, you'd do:
```shell
synbad proxy -p 3000 -t https://api.synthetic.new/openai/v1
```

Then, configure your coding agent (or whichever local tool you're using) to point to `http://localhost:3000` (or whichever port you selected). The Synbad Proxy will log all request bodies to stdout, so all you need to do is reproduce the bug using your tool or coding agent, then copy the JSON it printed to stdout.
Now you have reproducible JSON to file a bug via Synbad!
First, clone this repo from GitHub. Then `cd` into it and run:

```shell
npm install
```

All inference evals are stored in the `evals/` directory. They're written in TypeScript. An eval must export two things:

- The JSON that reproduces the problem, as the const `json`. It doesn't have to reproduce the bug 100% of the time; if the bug appears even 5% of the time, that's fine.
- A `test` function that runs some asserts on the returned assistant message to detect the error.
For example, we can test parallel tool call support very simply (as we do in the `evals/tools/parallel-tool.ts` file):
```typescript
import * as assert from "../../source/asserts.ts";
import { ChatMessage } from "../../source/chat-completion.ts";

export function test({ tool_calls }: ChatMessage) {
  assert.isNotNullish(tool_calls);
  assert.isNotEmptyArray(tool_calls);
  assert.strictEqual(tool_calls.length, 2);
}

export const json = {
  "messages": [
    { "role": "user", "content": "What's the weather in Paris and London?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name"
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "parallel_tool_calls": true,
  "tool_choice": "auto",
};
```

The `asserts.ts` file re-exports all of the built-in Node.js assertion functions and adds a few extra ones, e.g. `isNotNullish`, which asserts that a value is neither null nor undefined.
To run your new eval, use the `synbad.sh` script in this repo, which auto-recompiles everything (including your new test!) before running the evals. Assuming you're testing the `evals/reasoning/reasoning-parsing` eval for GLM-4.6 on Synthetic, and you want to run it 5 times since it isn't consistently failing:

```shell
./synbad.sh eval --env-var SYNTHETIC_API_KEY \
  --base-url "https://api.synthetic.new/openai/v1" \
  --only evals/reasoning/reasoning-parsing \
  --model "hf:zai-org/GLM-4.6" \
  --count 5
```

The OpenAI spec didn't originally include reasoning content parsing, since the original OpenAI models didn't reason. The open-source community added support for reasoning later, but there are two competing specs:
- Storing the reasoning content in `message.reasoning_content`, or
- Storing the reasoning content in `message.reasoning`.
To make sure your evals work with a wider range of inference providers, use the `getReasoning` function when testing reasoning parsing, like so:

```typescript
import { getReasoning } from "../../source/chat-completion.ts";

// In your test:
const reasoning = getReasoning(message);
```

This ensures your test will use the correct reasoning content regardless of which spec the underlying inference provider is using.
First, install it:

```shell
npm install -g @syntheticlab/synbad
```

Then run:
```shell
synbad eval --env-var SYNTHETIC_API_KEY \
  --base-url "https://api.synthetic.new/openai/v1" \
  --model "hf:zai-org/GLM-4.6"
```