# Introduction
Source: https://docs.zeroeval.com/autotune/introduction
Run evaluations on models and prompts to find the best variants for your agents
Prompt optimization takes a different approach from traditional evals. Instead of requiring you to set up complex eval pipelines, ZeroEval ingests your production traces and lets you optimize your prompts based on the feedback you provide.
## How it works
1. Replace hardcoded prompts with `ze.prompt()` calls in Python or `ze.prompt({...})` in TypeScript
2. Each time you modify your prompt content, a new version is automatically created and tracked
3. ZeroEval automatically tracks all LLM interactions and their outcomes
4. Use the UI to run experiments, vote on outputs, and identify the best prompt/model combinations
5. Winning configurations are automatically deployed to your application without code changes
# Models
Source: https://docs.zeroeval.com/autotune/prompts/models
Evaluate your agent's performance across multiple models
ZeroEval lets you evaluate real production traces of specific agent tasks across different models, then ranks the models over time. This helps you pick the best model for each part of your agent.
# Prompts
Source: https://docs.zeroeval.com/autotune/prompts/prompts
Use feedback on production traces to generate and validate better prompts
ZeroEval derives prompt optimization suggestions directly from feedback on your production traces. By capturing preferences and correctness signals, we provide concrete prompt edits you can test and use for your agents.
## Submitting Feedback
Feedback is the foundation of prompt optimization. You can submit feedback for completions through the ZeroEval dashboard, the Python SDK, or the public API. Feedback helps ZeroEval understand what good and bad outputs look like for your specific use case.
### Feedback through the dashboard
The easiest way to provide feedback is through the ZeroEval dashboard. Navigate to your task's "Suggestions" tab, review incoming completions, and provide thumbs up/down feedback with optional reasons and expected outputs.
### Feedback through the SDK
For programmatic feedback submission, use the Python or TypeScript SDK. This is useful when you have automated evaluation systems or want to collect feedback from your application in production.
```python Python theme={null}
import zeroeval as ze
ze.init()
# Send feedback for a specific completion
ze.send_feedback(
prompt_slug="support-bot",
completion_id="550e8400-e29b-41d4-a716-446655440000",
thumbs_up=False,
reason="Response was too verbose",
expected_output="A concise 2-3 sentence response"
)
```
```typescript TypeScript theme={null}
import * as ze from 'zeroeval';
ze.init();
// Send feedback for a specific completion
await ze.sendFeedback({
promptSlug: "support-bot",
completionId: "550e8400-e29b-41d4-a716-446655440000",
thumbsUp: false,
reason: "Response was too verbose",
expectedOutput: "A concise 2-3 sentence response"
});
```
#### Parameters
| Python | TypeScript | Type | Required | Description |
| ----------------- | ---------------- | ---------------- | -------- | ------------------------------------------------------------ |
| `prompt_slug` | `promptSlug` | `str`/`string` | Yes | The slug/name of your prompt (same as used in `ze.prompt()`) |
| `completion_id` | `completionId` | `str`/`string` | Yes | The UUID of the completion to provide feedback on |
| `thumbs_up` | `thumbsUp` | `bool`/`boolean` | Yes | `True`/`true` for positive, `False`/`false` for negative |
| `reason` | `reason` | `str`/`string` | No | Optional explanation of why you gave this feedback |
| `expected_output` | `expectedOutput` | `str`/`string` | No | Optional description of what the expected output should be |
| `metadata` | `metadata` | `dict`/`object` | No | Optional additional metadata to attach to the feedback |
The `completion_id` is automatically tracked when you use `ze.prompt()` with automatic tracing enabled. You can access it from the OpenAI response object's `id` field, or retrieve it from your traces in the dashboard.
#### Complete example with feedback
```python Python theme={null}
import zeroeval as ze
from openai import OpenAI
ze.init()
client = OpenAI()
# Define your prompt - ZeroEval will automatically use the latest optimized
# version from your dashboard if one exists, falling back to this content
system_prompt = ze.prompt(
name="support-bot",
content="You are a helpful customer support agent."
)
# Make a completion
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": "How do I reset my password?"}
]
)
# Get the completion ID and text
completion_id = response.id
completion_text = response.choices[0].message.content
# Evaluate the response (manually or automatically)
is_good_response = evaluate_response(completion_text)  # your own evaluation logic
# Send feedback based on evaluation
ze.send_feedback(
prompt_slug="support-bot",
completion_id=completion_id,
thumbs_up=is_good_response,
reason="Clear step-by-step instructions" if is_good_response else "Missing link to reset page",
expected_output=None if is_good_response else "Should include direct link: https://app.example.com/reset"
)
```
```typescript TypeScript theme={null}
import * as ze from 'zeroeval';
import { OpenAI } from 'openai';
ze.init();
const client = ze.wrap(new OpenAI());
// Define your prompt - ZeroEval will automatically use the latest optimized
// version from your dashboard if one exists, falling back to this content
const systemPrompt = await ze.prompt({
name: "support-bot",
content: "You are a helpful customer support agent."
});
// Make a completion
const response = await client.chat.completions.create({
model: "gpt-4",
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: "How do I reset my password?" }
]
});
// Get the completion ID and text
const completionId = response.id;
const completionText = response.choices[0].message.content;
// Evaluate the response (manually or automatically)
const isGoodResponse = evaluateResponse(completionText);  // your own evaluation logic
// Send feedback based on evaluation
await ze.sendFeedback({
promptSlug: "support-bot",
completionId: completionId,
thumbsUp: isGoodResponse,
reason: isGoodResponse ? "Clear step-by-step instructions" : "Missing link to reset page",
expectedOutput: isGoodResponse ? undefined : "Should include direct link: https://app.example.com/reset"
});
```
**Auto-optimization**: When you use `ze.prompt()` with `content`, ZeroEval automatically fetches the latest optimized version from your dashboard if one exists. Your `content` serves as a fallback for initial setup. This means your prompts improve automatically as you tune them, without any code changes.
If you need to test the hardcoded content specifically (e.g., for debugging or A/B testing), use `from_="explicit"` (Python) or `from: "explicit"` (TypeScript):
```python Python theme={null}
# Bypass auto-optimization and always use this exact content
prompt = ze.prompt(
name="support-bot",
from_="explicit",
content="You are a helpful customer support agent."
)
```
```typescript TypeScript theme={null}
// Bypass auto-optimization and always use this exact content
const prompt = await ze.prompt({
name: "support-bot",
from: "explicit",
content: "You are a helpful customer support agent."
});
```
### Feedback through the API
For integration from any language or for direct API access, you can submit feedback using the public HTTP API.
#### Endpoint
```
POST /v1/prompts/{prompt_slug}/completions/{completion_id}/feedback
```
#### Authentication
Requires API key authentication via the `Authorization` header:
```
Authorization: Bearer YOUR_API_KEY
```
#### Request body
```json theme={null}
{
"thumbs_up": false,
"reason": "Response was inaccurate",
"expected_output": "The correct answer should mention X, Y, and Z",
"metadata": {
"evaluated_by": "automated_system",
"evaluation_score": 0.45
}
}
```
#### Response
```json theme={null}
{
"id": "fb123e45-67f8-90ab-cdef-1234567890ab",
"completion_id": "550e8400-e29b-41d4-a716-446655440000",
"prompt_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"prompt_version_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
"project_id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
"thumbs_up": false,
"reason": "Response was inaccurate",
"expected_output": "The correct answer should mention X, Y, and Z",
"metadata": {
"evaluated_by": "automated_system",
"evaluation_score": 0.45
},
"created_by": "user_id",
"created_at": "2025-11-22T10:30:00Z",
"updated_at": "2025-11-22T10:30:00Z"
}
```
#### Example with cURL
```bash theme={null}
curl -X POST https://api.zeroeval.com/v1/prompts/support-bot/completions/550e8400-e29b-41d4-a716-446655440000/feedback \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"thumbs_up": false,
"reason": "Response was too vague",
"expected_output": "Should provide specific steps",
"metadata": {
"user_satisfaction": "low"
}
}'
```
If feedback already exists for the same completion from the same user, it will be updated with the new values. This allows you to correct or refine feedback as needed.
## Prompt optimizations from feedback
Once you've collected enough feedback on the incoming traffic for a task, you can generate prompt optimizations from that feedback by clicking the "Optimize Prompt" button in the task's "Suggestions" tab.
Once you've generated a new prompt, you can test it with various models and see how it performs against the feedback you've already given.
# Reference
Source: https://docs.zeroeval.com/autotune/reference
Parameters and configuration for ze.prompt
`ze.prompt` creates or fetches versioned prompts from the Prompt Library and returns decorated content for downstream LLM calls.
**TypeScript differences**: In TypeScript, `ze.prompt()` is an async function that returns `Promise<string>`. Parameters use camelCase and are passed as an options object: `ze.prompt({ name: "...", content: "..." })`.
## Parameters
| Python | TypeScript | Type | Required | Default | Description |
| ----------- | ----------- | ----------- | -------- | ------------------ | ---------------------------------------------------------- |
| `name` | `name` | string | yes | — | Task name associated with the prompt in the library |
| `content` | `content` | string | no | `None`/`undefined` | Raw prompt content to ensure/create a version by content |
| `from_` | `from` | string | no | `None`/`undefined` | Either `"latest"`, `"explicit"`, or a 64‑char SHA‑256 hash |
| `variables` | `variables` | dict/object | no | `None`/`undefined` | Template variables to render `{{variable}}` tokens |
Notes:
* In Python, use `from_` (with underscore) as `from` is a reserved keyword. TypeScript uses `from` directly.
* Exactly one of `content` or `from` must be provided (except when using `from: "explicit"` with `content`).
* `from="latest"` fetches the latest version bound to the task; otherwise `from` must be a 64‑char hex SHA‑256 hash.
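To pin a version by hash, you need the 64-char hex SHA-256 digest of its content. The sketch below shows one way to compute it with the standard library; note that the SDK hashes *normalized* content, and the exact normalization is an assumption here (leading/trailing whitespace stripped), so verify the digest against the version's hash shown in your dashboard:

```python
import hashlib

def content_hash(content: str) -> str:
    """Compute a 64-char hex SHA-256 digest of prompt content.

    NOTE: the SDK normalizes content before hashing; stripping
    surrounding whitespace is an assumption, not the documented
    normalization.
    """
    normalized = content.strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

digest = content_hash("You are a helpful assistant for {{product}}.")
print(len(digest))  # 64
```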
## Behavior
* **content provided**: Computes a normalized SHA‑256 hash, ensures a prompt version exists for `name`, and returns decorated content.
* **from="latest"**: Fetches the latest version for `name` and returns decorated content.
* **from=`<sha-256 hash>`**: Fetches by content hash for `name` and returns decorated content.
Decoration adds a compact metadata header used by integrations:
* `task`, `prompt_slug`, `prompt_version`, `prompt_version_id`, `variables`, and (when created by content) `content_hash`.
OpenAI integration: when `prompt_version_id` is present, the SDK will automatically patch the `model` parameter to the model bound to that prompt version.
## Return Value
* **Python**: `str` - Decorated prompt content ready to pass into LLM clients.
* **TypeScript**: `Promise<string>` - Async function resolving to decorated prompt content.
## Errors
| Python | TypeScript | When |
| --------------------- | --------------------- | -------------------------------------------------------------------------------------- |
| `ValueError` | `Error` | Both `content` and `from` provided (except explicit), or neither; invalid `from` value |
| `PromptRequestError` | `PromptRequestError` | `from="latest"` but no versions exist for `name` |
| `PromptNotFoundError` | `PromptNotFoundError` | `from` is a hash that does not exist for `name` |
## Examples
```python Python theme={null}
import zeroeval as ze
# Create/ensure a version by content
system = ze.prompt(
name="support-triage",
content="You are a helpful assistant for {{product}}.",
variables={"product": "Acme"},
)
# Fetch the latest version for this task
system = ze.prompt(name="support-triage", from_="latest")
# Fetch a specific version by content hash
system = ze.prompt(name="support-triage", from_="c6a7...deadbeef...0123")
```
```typescript TypeScript theme={null}
import * as ze from 'zeroeval';
// Create/ensure a version by content
const system = await ze.prompt({
name: "support-triage",
content: "You are a helpful assistant for {{product}}.",
variables: { product: "Acme" },
});
// Fetch the latest version for this task
const system = await ze.prompt({ name: "support-triage", from: "latest" });
// Fetch a specific version by content hash
const system = await ze.prompt({ name: "support-triage", from: "c6a7...deadbeef...0123" });
```
# Setup
Source: https://docs.zeroeval.com/autotune/setup
Getting started with autotune
ZeroEval's autotune feature allows you to continuously improve your prompts and automatically deploy the best-performing models. The setup is simple and powerful.
## Getting started (\<5 mins)
Replace hardcoded prompts with `ze.prompt()` and include the name of the specific part of your agent that you want to tune.
```python Python theme={null}
# Before
prompt = "You are a helpful assistant"
# After - with autotune
prompt = ze.prompt(
name="assistant",
content="You are a helpful assistant"
)
```
```typescript TypeScript theme={null}
// Before
const prompt = "You are a helpful assistant";
// After - with autotune
const prompt = await ze.prompt({
name: "assistant",
content: "You are a helpful assistant"
});
```
That's it! You'll start seeing production traces in your dashboard for this specific task at [`ZeroEval › Prompts › [task_name]`](https://app.zeroeval.com).
**Auto-tune behavior:** When you provide `content`, ZeroEval automatically uses the latest optimized version from your dashboard if one exists. The `content` parameter serves as a fallback for when no optimized versions are available yet. This means you can hardcode a default prompt in your code, but ZeroEval will seamlessly swap in tuned versions without any code changes.
To explicitly use the hardcoded content and bypass auto-optimization, use `from_="explicit"` (Python) or `from: "explicit"` (TypeScript):
```python Python theme={null}
prompt = ze.prompt(
name="assistant",
from_="explicit",
content="You are a helpful assistant"
)
```
```typescript TypeScript theme={null}
const prompt = await ze.prompt({
name: "assistant",
from: "explicit",
content: "You are a helpful assistant"
});
```
## Pushing models to production
Once you see a model that performs well in the dashboard, you can send it to production with a single click. From then on, the model you specify in code is replaced automatically whenever you use the prompt from `ze.prompt()`, as seen below.
```python Python theme={null}
# You write this
response = client.chat.completions.create(
model="gpt-4", # ← Gets replaced!
messages=[{"role": "system", "content": prompt}]
)
```
```typescript TypeScript theme={null}
// You write this
const response = await openai.chat.completions.create({
model: "gpt-4", // ← Gets replaced!
messages: [{ role: "system", content: prompt }]
});
```
## Example
Here's autotune in action for a simple customer support bot:
```python Python theme={null}
import zeroeval as ze
from openai import OpenAI
ze.init()
client = OpenAI()
# Define your prompt with version tracking
system_prompt = ze.prompt(
name="support-bot",
content="""You are a customer support agent for {{company}}.
Be helpful, concise, and professional.""",
variables={"company": "TechCorp"}
)
# Use it normally - model gets patched automatically
response = client.chat.completions.create(
model="gpt-4", # This might run claude-3-sonnet in production!
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": "I need help with my order"}
]
)
```
```typescript TypeScript theme={null}
import * as ze from 'zeroeval';
import { OpenAI } from 'openai';
ze.init();
const client = ze.wrap(new OpenAI());
// Define your prompt with version tracking
const systemPrompt = await ze.prompt({
name: "support-bot",
content: `You are a customer support agent for {{company}}.
Be helpful, concise, and professional.`,
variables: { company: "TechCorp" }
});
// Use it normally - model gets patched automatically
const response = await client.chat.completions.create({
model: "gpt-4", // This might run claude-3-sonnet in production!
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: "I need help with my order" }
]
});
```
## Understanding Prompt Versions
ZeroEval automatically manages prompt versions for you. When you use `ze.prompt()` with `content`, the SDK will:
1. **Check for optimized versions**: First, it tries to fetch the latest optimized version from your dashboard
2. **Fall back to your content**: If no optimized versions exist yet, it uses the `content` you provided
3. **Create a version**: Your provided content is stored as the initial version for this task
This means you get the best of both worlds: hardcoded fallback prompts in your code, with automatic optimization in production.
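The resolution order above can be sketched as follows. This is a hypothetical illustration of the behavior, not the SDK's actual implementation; `fetch_latest` stands in for the dashboard lookup:

```python
# Hypothetical sketch of the version-resolution order described above.
def resolve_prompt(name: str, content, fetch_latest) -> str:
    """Return the latest optimized version if one exists; otherwise
    fall back to the hardcoded content (stored as the initial version)."""
    latest = fetch_latest(name)  # stand-in: returns None if no versions exist
    if latest is not None:
        return latest            # 1. optimized version wins
    if content is None:
        raise ValueError("no versions exist and no fallback content given")
    return content               # 2-3. fallback content, stored as v1

# With no optimized version yet, the hardcoded content is used:
assert resolve_prompt("assistant", "You are helpful.", lambda n: None) == "You are helpful."
```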
```python Python theme={null}
# This will use the latest optimized version if one exists in your dashboard
# Otherwise, it uses the content you provide here
prompt = ze.prompt(
name="customer-support",
content="You are a helpful assistant."
)
```
```typescript TypeScript theme={null}
// This will use the latest optimized version if one exists in your dashboard
// Otherwise, it uses the content you provide here
const prompt = await ze.prompt({
name: "customer-support",
content: "You are a helpful assistant."
});
```
### Explicit version control
If you need more control over which version to use:
```python Python theme={null}
# Always use the latest optimized version (fails if none exists)
prompt = ze.prompt(
name="customer-support",
from_="latest"
)
# Always use the hardcoded content (bypass auto-optimization)
prompt = ze.prompt(
name="customer-support",
from_="explicit",
content="You are a helpful assistant."
)
# Use a specific version by its content hash
prompt = ze.prompt(
name="customer-support",
from_="a1b2c3d4..." # 64-character SHA-256 hash
)
```
```typescript TypeScript theme={null}
// Always use the latest optimized version (fails if none exists)
const prompt = await ze.prompt({
name: "customer-support",
from: "latest"
});
// Always use the hardcoded content (bypass auto-optimization)
const prompt = await ze.prompt({
name: "customer-support",
from: "explicit",
content: "You are a helpful assistant."
});
// Use a specific version by its content hash
const prompt = await ze.prompt({
name: "customer-support",
from: "a1b2c3d4..." // 64-character SHA-256 hash
});
```
### When to use each mode
| Mode | Use Case | Behavior |
| ----------------------------------------------------- | --------------------------------------------------- | ----------------------------------- |
| `content` only | **Recommended for most cases** | Auto-optimization with fallback |
| `from_="explicit"` (Python) / `from: "explicit"` (TS) | Testing, debugging, or A/B testing specific prompts | Always use hardcoded content |
| `from_="latest"` (Python) / `from: "latest"` (TS) | Production where optimization is required | Fail if no optimized version exists |
| `from_="<hash>"` (Python) / `from: "<hash>"` (TS) | Pinning to specific tested versions | Use exact version by 64-char SHA-256 hash |
**Best practice**: Use `content` parameter alone for local development and production. ZeroEval will automatically use optimized versions when available. Only use `from_="explicit"` (Python) or `from: "explicit"` (TypeScript) when you specifically need to test or debug the hardcoded content.
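One way to apply this best practice is to select the mode per environment. The helper below is a hypothetical pattern, not part of the SDK; the `APP_ENV` variable is an assumption you would adapt to your own configuration:

```python
import os

# Hypothetical helper: choose ze.prompt() arguments per environment.
# APP_ENV is an assumed variable; adapt to your own config.
def prompt_kwargs(name: str, fallback: str) -> dict:
    if os.getenv("APP_ENV", "production") == "debug":
        # Debugging: always test the hardcoded content
        return {"name": name, "from_": "explicit", "content": fallback}
    # Recommended default: auto-optimization with hardcoded fallback
    return {"name": name, "content": fallback}

kwargs = prompt_kwargs("customer-support", "You are a helpful assistant.")
# prompt = ze.prompt(**kwargs)
```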
# Introduction
Source: https://docs.zeroeval.com/judges/introduction
Continuously evaluate your production traffic with judges that learn over time
Calibrated LLM judges are AI evaluators that watch your traces, sessions, or spans and score outputs according to criteria you define. They get better over time the more you refine and correct their evaluations.
## When to use
Use a judge when you want consistent, scalable evaluation of:
* Hallucinations, safety/policy violations
* Response quality (helpfulness, tone, structure)
* Latency, cost, and error patterns tied to specific criteria
# Multimodal Evaluation
Source: https://docs.zeroeval.com/judges/multimodal-evaluation
Evaluate screenshots and images with LLM judges
LLM judges can evaluate spans that contain images alongside text. This is useful for browser agents, UI testing, visual QA, and any workflow where you need to assess visual output.
## How it works
1. **Attach images to spans** using SDK methods or structured output data
2. **Images are uploaded** during span ingestion (base64 data is stripped from the span)
3. **Judges fetch images** when evaluating the span and send them to a vision-capable LLM
4. **Evaluation results** appear in the dashboard like any other judge evaluation
The LLM sees both the span's text data (input/output) and any attached images, giving it full context for evaluation.
## Attaching images to spans
There are two ways to attach images to spans, depending on your workflow.
### Option 1: SDK helper methods
The SDK provides `add_screenshot()` and `add_image()` methods for attaching images with metadata.
**Screenshots with viewport context**
For browser agents or responsive testing, use `add_screenshot()` to capture different viewports:
```python theme={null}
import zeroeval as ze
with ze.span(name="homepage_test", tags={"has_screenshots": "true"}) as span:
# Desktop viewport
span.add_screenshot(
base64_data=desktop_base64,
viewport="desktop",
width=1920,
height=1080,
label="Homepage - Desktop"
)
# Mobile viewport
span.add_screenshot(
base64_data=mobile_base64,
viewport="mobile",
width=375,
height=812,
label="Homepage - Mobile"
)
span.set_io(
input_data="Load homepage and capture screenshots",
output_data="Captured 2 viewport screenshots"
)
```
**Generic images**
For charts, diagrams, or UI component states, use `add_image()`:
```python theme={null}
with ze.span(name="button_hover_test") as span:
span.add_image(
base64_data=before_hover_base64,
label="Button - Default State"
)
span.add_image(
base64_data=after_hover_base64,
label="Button - Hover State"
)
span.set_io(
input_data="Test button hover interaction",
output_data="Button changes color on hover"
)
```
### Option 2: Structured output\_data
If your workflow already produces screenshot data as structured output (common with browser automation agents), you can include images directly in the span's `output_data`. ZeroEval automatically detects and extracts images from JSON arrays containing `base64` fields.
```python theme={null}
import zeroeval as ze
with ze.span(
name="screenshot_capture",
kind="llm",
tags={"has_screenshots": "true", "screenshot_count": "2"}
) as span:
# Set input as conversation messages
span.input_data = [
{
"role": "system",
"content": "You are a screenshot capture service."
},
{
"role": "user",
"content": "Navigate to the homepage and capture screenshots"
}
]
# Set output as array of screenshot objects with base64 data
span.output_data = [
{
"viewport": "mobile",
"width": 768,
"height": 1024,
"base64": mobile_screenshot_base64
},
{
"viewport": "desktop",
"width": 1920,
"height": 1080,
"base64": desktop_screenshot_base64
}
]
```
When ZeroEval ingests this span, it:
1. Extracts each object with a `base64` field as an attachment
2. Uploads the images to storage
3. Strips the base64 data from `output_data` to keep the database lean
4. Preserves the metadata (viewport, width, height) for display
This approach works well when your browser agent or automation tool already produces structured screenshot output.
Both methods produce the same result: images stored and available for multimodal judge evaluation. Choose whichever fits your workflow better.
## Creating a multimodal judge
Multimodal judges work like regular judges, but with criteria that reference attached images. The judge prompt should describe what to look for in the visual content.
### Example: UI consistency judge
```
Evaluate whether the UI renders correctly across viewports.
Check for:
- Layout breaks or overlapping elements
- Text that's too small to read on mobile
- Missing or broken images
- Inconsistent spacing between viewports
Score 1 if all viewports render correctly, 0 if there are visual issues.
```
### Example: Brand compliance judge
```
Check if the page follows brand guidelines.
Look for:
- Correct logo placement and sizing
- Brand colors used consistently
- Proper typography hierarchy
- Appropriate whitespace
Score 1 for full compliance, 0 for violations.
```
### Example: Accessibility judge
```
Evaluate visual accessibility of the interface.
Check:
- Sufficient color contrast
- Text size readability
- Clear visual hierarchy
- Button/link affordances
Score 1 if accessible, 0 if there are issues. Include specific problems in the reasoning.
```
## Filtering spans for multimodal evaluation
Use tags to identify which spans should be evaluated by your multimodal judge:
```python theme={null}
# Tag spans that have screenshots
with ze.span(name="browser_test", tags={"has_screenshots": "true"}) as span:
span.add_screenshot(...)
```
Then configure your judge to only evaluate spans matching that tag. This prevents the judge from running on text-only spans where multimodal evaluation doesn't apply.
## Supported image formats
* JPEG
* PNG
* WebP
* GIF
Images are validated during ingestion. The maximum size is 10MB per image, with up to 5 images per span.
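If you want to fail fast before ingestion, you can mirror these limits client-side. The helper below is an illustrative sketch, not part of the SDK:

```python
import base64

MAX_BYTES = 10 * 1024 * 1024  # 10MB per image
MAX_IMAGES = 5                # images per span

def validate_attachments(base64_images: list) -> None:
    """Client-side pre-flight check mirroring the stated ingestion
    limits. Illustrative helper, not part of the SDK."""
    if len(base64_images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images per span")
    for i, data in enumerate(base64_images):
        size = len(base64.b64decode(data))
        if size > MAX_BYTES:
            raise ValueError(f"image {i} is {size} bytes, over the 10MB limit")
```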
## Viewing images in the dashboard
Screenshots appear in two places:
1. **Span details view** - Images show in the Data tab with viewport labels and dimensions
2. **Judge evaluation modal** - When reviewing an evaluation, you'll see the images the judge analyzed
Images display with their labels, viewport type (for screenshots), and dimensions when available.
## Model support
Multimodal evaluation currently uses Gemini models, which support image inputs. When you create a judge, ZeroEval automatically handles the image formatting for the model.
Multimodal evaluation works best with specific, measurable criteria. Vague prompts like "does this look good?" will produce inconsistent results. Be explicit about what visual properties to check.
# Pulling Evaluations
Source: https://docs.zeroeval.com/judges/pull-evaluations
Retrieve judge evaluations via SDK or REST API
Retrieve judge evaluations programmatically for reporting, analysis, or integration into your own workflows.
## Finding your IDs
Before making API calls, you'll need these identifiers:
| ID | Where to find it |
| -------------- | --------------------------------------------------------------------------- |
| **Project ID** | Settings → Project, or in any URL after `/projects/` |
| **Judge ID** | Click a judge in the dashboard; the ID is in the URL (`/judges/{judge_id}`) |
| **Span ID** | In trace details, or returned by your instrumentation code |
## Python SDK
### Get available criteria for a judge
Use this before submitting criterion-level feedback to discover valid criterion keys.
```python theme={null}
import zeroeval as ze
ze.init(api_key="YOUR_API_KEY")
criteria = ze.get_judge_criteria(
project_id="your-project-id",
judge_id="your-judge-id",
)
print(criteria["evaluation_type"])
for criterion in criteria["criteria"]:
print(criterion["key"], criterion.get("description"))
```
### Get evaluations by judge
Fetch all evaluations for a specific judge with pagination and optional filters.
```python theme={null}
import zeroeval as ze
ze.init(api_key="YOUR_API_KEY")
response = ze.get_judge_evaluations(
project_id="your-project-id",
judge_id="your-judge-id",
limit=100,
offset=0,
)
print(f"Total: {response['total']}")
for evaluation in response["evaluations"]:
    print(f"Span: {evaluation['span_id']}")
    print(f"Result: {'PASS' if evaluation['evaluation_result'] else 'FAIL'}")
    print(f"Score: {evaluation.get('score')}")  # For scored judges
    print(f"Reason: {evaluation['evaluation_reason']}")
```
**Optional filters:**
```python theme={null}
response = ze.get_judge_evaluations(
project_id="your-project-id",
judge_id="your-judge-id",
limit=100,
offset=0,
start_date="2025-01-01T00:00:00Z",
end_date="2025-01-31T23:59:59Z",
evaluation_result=True, # Only passing evaluations
feedback_state="with_user_feedback", # Only calibrated items
)
```
### Get evaluations by span
Fetch all judge evaluations for a specific span (useful when a span has been evaluated by multiple judges).
```python theme={null}
response = ze.get_span_evaluations(
project_id="your-project-id",
span_id="your-span-id",
)
for evaluation in response["evaluations"]:
    print(f"Judge: {evaluation['judge_name']}")
    print(f"Result: {'PASS' if evaluation['evaluation_result'] else 'FAIL'}")
    if evaluation.get('evaluation_type') == 'scored':
        print(f"Score: {evaluation['score']} / {evaluation['score_max']}")
```
## REST API
Use these endpoints directly with your API key in the `Authorization` header.
### Get available criteria for a judge
```bash theme={null}
curl -X GET "https://api.zeroeval.com/projects/{project_id}/judges/{judge_id}/criteria" \
-H "Authorization: Bearer $ZEROEVAL_API_KEY"
```
### Get evaluations by judge
```bash theme={null}
curl -X GET "https://api.zeroeval.com/projects/{project_id}/judges/{judge_id}/evaluations?limit=100&offset=0" \
-H "Authorization: Bearer $ZEROEVAL_API_KEY"
```
**Query parameters:**
| Parameter | Type | Description |
| ------------------- | ------ | ----------------------------------------------- |
| `limit` | int | Results per page (1-500, default 100) |
| `offset` | int | Pagination offset (default 0) |
| `start_date` | string | Filter by date (ISO 8601) |
| `end_date` | string | Filter by date (ISO 8601) |
| `evaluation_result` | bool | `true` for passing, `false` for failing |
| `feedback_state` | string | `with_user_feedback` or `without_user_feedback` |
### Get evaluations by span
```bash theme={null}
curl -X GET "https://api.zeroeval.com/projects/{project_id}/spans/{span_id}/evaluations" \
-H "Authorization: Bearer $ZEROEVAL_API_KEY"
```
## Response format
### Judge evaluations response
```json theme={null}
{
"evaluations": [...],
"total": 142,
"limit": 100,
"offset": 0
}
```
### Judge criteria response
```json theme={null}
{
"judge_id": "judge-uuid",
"evaluation_type": "scored",
"score_min": 0,
"score_max": 5,
"pass_threshold": 3.5,
"criteria": [
{
"key": "CTA_text",
"label": "CTA_text",
"description": "CTA clarity and visibility"
}
]
}
```
### Span evaluations response
```json theme={null}
{
"span_id": "abc-123",
"evaluations": [...]
}
```
### Evaluation object
| Field | Type | Description |
| ------------------- | ------------- | ---------------------------------- |
| `id` | string | Unique evaluation ID |
| `span_id` | string | The evaluated span |
| `evaluation_result` | bool | Pass (`true`) or fail (`false`) |
| `evaluation_reason` | string | Judge's reasoning |
| `confidence_score` | float | Model confidence (0-1) |
| `score` | float \| null | Numeric score (scored judges only) |
| `score_min` | float \| null | Minimum possible score |
| `score_max` | float \| null | Maximum possible score |
| `pass_threshold` | float \| null | Score required to pass |
| `model_used` | string | LLM model that ran the evaluation |
| `created_at` | string | ISO 8601 timestamp |
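With these fields, you can aggregate fetched evaluations into simple report metrics. The helper below is an illustrative sketch (not part of the SDK) operating on evaluation objects shaped as above:

```python
def summarize(evaluations: list) -> dict:
    """Aggregate evaluation objects into a pass rate and average score.
    Illustrative helper, not part of the SDK."""
    if not evaluations:
        return {"pass_rate": None, "avg_score": None}
    passed = sum(1 for e in evaluations if e["evaluation_result"])
    scores = [e["score"] for e in evaluations if e.get("score") is not None]
    return {
        "pass_rate": passed / len(evaluations),
        "avg_score": sum(scores) / len(scores) if scores else None,
    }

sample = [
    {"evaluation_result": True, "score": 4.0},
    {"evaluation_result": False, "score": 2.0},
]
print(summarize(sample))  # {'pass_rate': 0.5, 'avg_score': 3.0}
```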
## Pagination example
For large result sets, paginate through all evaluations:
```python theme={null}
all_evaluations = []
offset = 0
limit = 100
while True:
response = ze.get_judge_evaluations(
project_id="your-project-id",
judge_id="your-judge-id",
limit=limit,
offset=offset,
)
all_evaluations.extend(response["evaluations"])
if len(response["evaluations"]) < limit:
break
offset += limit
print(f"Fetched {len(all_evaluations)} total evaluations")
```
## Related
* [Submitting Feedback](/judges/submit-feedback) - Programmatically submit feedback for judge evaluations
# Setup
Source: https://docs.zeroeval.com/judges/setup
Create and calibrate an AI judge in minutes
## Creating a judge (\<5 mins)
1. Go to [Monitoring → Judges → New Judge](https://app.zeroeval.com/monitoring/judges).
2. Specify the criteria that you want to evaluate from your production traffic.
3. Tweak the prompt of the judge until it matches what you are looking for!
That's it! Historical and future traces will be scored automatically and shown in the dashboard.
## Calibrating your judge
For each evaluated item, you can mark the judge's verdict as correct or incorrect. This feedback is stored automatically and used to improve the judge over time.
# Submitting Feedback
Source: https://docs.zeroeval.com/judges/submit-feedback
Programmatically submit feedback for judge evaluations via SDK
## Overview
When calibrating judges, you can submit feedback programmatically using the SDK.
This is useful for:
* Bulk feedback submission from automated pipelines
* Integration with custom review workflows
* Syncing feedback from external labeling tools
Your existing `send_feedback` integrations remain valid. Criterion-level feedback is an optional extension for scored judges.
## Important: Using the Correct IDs
Judge evaluations involve two related spans:
| ID | Description |
| ---------------------- | -------------------------------------------------- |
| **Source Span ID** | The original LLM call that was evaluated |
| **Judge Call Span ID** | The span created when the judge ran its evaluation |
When submitting feedback, always include the `judge_id` parameter to ensure
feedback is correctly associated with the judge evaluation.
## Python SDK
### From the UI (Recommended)
The easiest way to get the correct IDs is from the Judge Evaluation modal:
1. Open a judge evaluation in the dashboard
2. Expand the "SDK Integration" section
3. Click "Copy" to copy the pre-filled Python code
4. Paste and customize the generated code
### Manual Submission
```python theme={null}
from zeroeval import ZeroEval
client = ZeroEval()
# Submit feedback for a judge evaluation
client.send_feedback(
prompt_slug="your-judge-task-slug", # The task/prompt associated with the judge
completion_id="span-id-here", # The span ID from the evaluation
thumbs_up=True, # True = correct, False = incorrect
reason="Optional explanation",
judge_id="automation-id-here", # Required for judge feedback
)
```
### Parameters
| Parameter | Type | Required | Description |
| ------------------- | ----- | -------- | ---------------------------------------------------------- |
| `prompt_slug` | str | Yes | The task slug associated with the judge |
| `completion_id` | str | Yes | The span ID being evaluated |
| `thumbs_up` | bool | Yes | `True` if judge was correct, `False` if wrong |
| `reason` | str | No | Explanation of the feedback |
| `judge_id` | str | Yes\* | The judge automation ID (\*required for judge feedback) |
| `expected_score` | float | No | For scored judges: the expected score value |
| `score_direction` | str | No | For scored judges: `"too_high"` or `"too_low"` |
| `criteria_feedback` | dict | No | For scored judges: per-criterion expected score/reason map |
`expected_score` and `score_direction` are only valid for scored judges
(judges with `evaluation_type: "scored"`). The API will return a 400 error
if these fields are provided for binary judges.
### Step 1: Discover Available Criteria (Scored Judges)
Before sending `criteria_feedback`, fetch valid criterion keys for the judge.
```python theme={null}
from zeroeval import ZeroEval
client = ZeroEval()
criteria = client.get_judge_criteria(
project_id="your-project-id",
judge_id="automation-id-here",
)
print(criteria["evaluation_type"]) # "scored" or "binary"
print(criteria["criteria"]) # [{"key": "...", "label": "...", "description": "..."}]
```
```bash theme={null}
curl -X GET "https://api.zeroeval.com/projects/{project_id}/judges/{judge_id}/criteria" \
-H "Authorization: Bearer $ZEROEVAL_API_KEY"
```
### Step 2: Score-Based Feedback (General Score)
For judges using scored rubrics (not binary pass/fail), you can provide additional
feedback about the overall expected score:
```python theme={null}
from zeroeval import ZeroEval
client = ZeroEval()
# Submit feedback for a scored judge evaluation
client.send_feedback(
prompt_slug="quality-scorer",
completion_id="span-id-here",
thumbs_up=False, # The judge was incorrect
judge_id="automation-id-here",
expected_score=3.5, # What the score should have been
score_direction="too_high", # The judge scored too high
reason="Score should have been lower due to grammar issues",
)
```
### Step 3: Score-Based Feedback (Per-Criterion)
For scored judges, you can send corrections for specific criteria:
```python theme={null}
from zeroeval import ZeroEval
client = ZeroEval()
client.send_feedback(
prompt_slug="quality-scorer",
completion_id="span-id-here",
thumbs_up=False,
judge_id="automation-id-here",
reason="Criterion-level score adjustments",
criteria_feedback={
"CTA_text": {
"expected_score": 4.0,
"reason": "CTA is clear and prominent"
},
"CX-004": {
"expected_score": 1.0,
"reason": "Required phone number is missing"
}
}
)
```
## REST API
### Binary Judge Feedback
```bash theme={null}
curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
-H "Authorization: Bearer $ZEROEVAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"thumbs_up": true,
"reason": "Judge correctly identified the issue",
"judge_id": "automation-uuid-here"
}'
```
### Scored Judge Feedback
For scored judges, include `expected_score` and `score_direction`:
```bash theme={null}
curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
-H "Authorization: Bearer $ZEROEVAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"thumbs_up": false,
"reason": "Score should have been lower",
"judge_id": "automation-uuid-here",
"expected_score": 3.5,
"score_direction": "too_high"
}'
```
### Scored Judge Feedback (Criterion-Level)
```bash theme={null}
curl -X POST "https://api.zeroeval.com/v1/prompts/{task_slug}/completions/{span_id}/feedback" \
-H "Authorization: Bearer $ZEROEVAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"thumbs_up": false,
"judge_id": "automation-uuid-here",
"reason": "Criterion-level corrections",
"criteria_feedback": {
"CTA_text": {
"expected_score": 4.0,
"reason": "CTA is clear and visible"
},
"CX-004": {
"expected_score": 1.0,
"reason": "Phone number is missing"
}
}
}'
```
## Criteria Payload Shape
`criteria_feedback` uses this shape:
```json theme={null}
{
"criteria_feedback": {
"criterion_key": {
"expected_score": 4.0,
"reason": "Optional explanation"
}
}
}
```
Validation rules:
* `judge_id` is required when sending `criteria_feedback`
* `criteria_feedback` is allowed only for scored judges (`evaluation_type: "scored"`)
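These rules, together with the restriction on `expected_score` and `score_direction` noted earlier, can be checked client-side before calling the API. `preflight_feedback` below is a hypothetical helper, not part of the SDK:

```python theme={null}
def preflight_feedback(payload, evaluation_type):
    """Raise locally for payloads the API would reject with a 400."""
    if "criteria_feedback" in payload and not payload.get("judge_id"):
        raise ValueError("judge_id is required when sending criteria_feedback")
    if evaluation_type != "scored":
        for field in ("criteria_feedback", "expected_score", "score_direction"):
            if field in payload:
                raise ValueError(f"{field} is only valid for scored judges")

# Valid: scored judge, judge_id present
preflight_feedback(
    {
        "thumbs_up": False,
        "judge_id": "automation-uuid",
        "criteria_feedback": {"CTA_text": {"expected_score": 4.0}},
    },
    evaluation_type="scored",
)
```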
## Finding Your IDs
| ID | Where to Find It |
| ------------- | ------------------------------------------------------------------ |
| **Task Slug** | In the judge settings, or the URL when editing the judge's prompt |
| **Span ID** | In the evaluation modal, or via `get_judge_evaluations()` response |
| **Judge ID** | In the URL when viewing a judge (`/judges/{judge_id}`) |
## Bulk Feedback Submission
To submit feedback on multiple evaluations, iterate over the results of `get_judge_evaluations()`:
```python theme={null}
from zeroeval import ZeroEval
client = ZeroEval()
# Get evaluations to review
evaluations = client.get_judge_evaluations(
project_id="your-project-id",
judge_id="your-judge-id",
limit=100,
)
# Submit feedback for each
for evaluation in evaluations["evaluations"]:
    # Your logic to determine whether the judge's verdict was correct
    is_correct = your_review_logic(evaluation)
    client.send_feedback(
        prompt_slug="your-judge-task-slug",
        completion_id=evaluation["span_id"],
        thumbs_up=is_correct,
        reason="Automated review",
        judge_id="your-judge-id",
    )
```
## Related
* [Pulling Evaluations](/judges/pull-evaluations) - Retrieve judge evaluations programmatically
* [Python SDK Reference](/tracing/sdks/python/reference) - Full SDK API reference
* [Judge Setup](/judges/setup) - Configure and deploy judges
# Manual Instrumentation
Source: https://docs.zeroeval.com/tracing/manual-instrumentation
Create spans manually for LLM calls and custom operations
This guide covers how to manually instrument your code to create spans, particularly for LLM operations. You'll learn how to use both the SDK and direct API calls to send trace data to ZeroEval.
## SDK Manual Instrumentation
### Basic LLM Span with SDK
The simplest way to create an LLM span is using the SDK's span decorator or context manager:
```python Python (Decorator) theme={null}
import zeroeval as ze
import openai
client = openai.OpenAI()
@ze.span(name="chat_completion", kind="llm")
def generate_response(messages: list) -> str:
"""Create an LLM span with automatic input/output capture"""
response = client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.7
)
# The SDK automatically captures function arguments as input
# and return values as output
return response.choices[0].message.content
```
```python Python (Context Manager) theme={null}
import zeroeval as ze
import openai
client = openai.OpenAI()
def generate_response(messages: list) -> str:
"""Create an LLM span with manual control"""
with ze.span(name="chat_completion", kind="llm") as span:
# Set input data
span.set_io(input_data=str(messages))
# Make the API call
response = client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.7
)
# Set output data
span.set_io(output_data=response.choices[0].message.content)
# Add LLM-specific attributes
span.set_attributes({
"llm.model": "gpt-4",
"llm.provider": "openai",
"llm.input_tokens": response.usage.prompt_tokens,
"llm.output_tokens": response.usage.completion_tokens,
"llm.total_tokens": response.usage.total_tokens,
"llm.temperature": 0.7
})
return response.choices[0].message.content
```
### Advanced LLM Span with Metrics
For production use, capture comprehensive metrics for better observability:
```python theme={null}
import zeroeval as ze
import openai
import time
import json
@ze.span(name="chat_completion_advanced", kind="llm")
def generate_with_metrics(messages: list, **kwargs):
"""Create a comprehensive LLM span with all metrics"""
# Get the current span to add attributes
span = ze.get_current_span()
# Track timing
start_time = time.time()
first_token_time = None
# Prepare the request
model = kwargs.get("model", "gpt-4")
temperature = kwargs.get("temperature", 0.7)
max_tokens = kwargs.get("max_tokens", None)
# Set pre-request attributes
span.set_attributes({
"llm.model": model,
"llm.provider": "openai",
"llm.temperature": temperature,
"llm.max_tokens": max_tokens,
"llm.streaming": kwargs.get("stream", False)
})
# Store input messages in the expected format
span.set_io(input_data=json.dumps([
{"role": msg["role"], "content": msg["content"]}
for msg in messages
]))
try:
client = openai.OpenAI()
# Handle streaming responses
if kwargs.get("stream", False):
stream = client.chat.completions.create(
model=model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
stream=True
)
full_response = ""
tokens = 0
for chunk in stream:
if chunk.choices[0].delta.content:
if first_token_time is None:
first_token_time = time.time()
ttft_ms = (first_token_time - start_time) * 1000
span.set_attributes({"llm.ttft_ms": ttft_ms})
full_response += chunk.choices[0].delta.content
tokens += 1
# Calculate throughput
total_time = time.time() - start_time
span.set_attributes({
"llm.output_tokens": tokens,
"llm.throughput_tokens_per_sec": tokens / total_time if total_time > 0 else 0,
"llm.duration_ms": total_time * 1000
})
span.set_io(output_data=full_response)
return full_response
else:
# Non-streaming response
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens
)
# Capture all response metadata
span.set_attributes({
"llm.input_tokens": response.usage.prompt_tokens,
"llm.output_tokens": response.usage.completion_tokens,
"llm.total_tokens": response.usage.total_tokens,
"llm.finish_reason": response.choices[0].finish_reason,
"llm.system_fingerprint": response.system_fingerprint,
"llm.response_id": response.id,
"llm.duration_ms": (time.time() - start_time) * 1000
})
content = response.choices[0].message.content
span.set_io(output_data=content)
return content
except Exception as e:
# Capture error details
span.set_status("error")
span.set_attributes({
"error.type": type(e).__name__,
"error.message": str(e)
})
raise
```
## Provider-Specific Manual Instrumentation
If you call the OpenAI or Gemini APIs directly, without the SDK's automatic instrumentation, the guides below show how to instrument those calls yourself, including cost calculation and conversation formatting.
### OpenAI API Manual Instrumentation
When calling the OpenAI API directly (using `requests`, `httpx`, or similar), you'll want to capture all the metrics that the automatic integration would provide:
```python Python (OpenAI Direct API) theme={null}
import requests
import json
import time
import uuid
from datetime import datetime, timezone
class OpenAITracer:
    def __init__(self, api_key: str, zeroeval_api_key: str):
self.openai_api_key = api_key
self.zeroeval_api_key = zeroeval_api_key
self.zeroeval_url = "https://api.zeroeval.com/api/v1/spans"
def chat_completion_with_tracing(self, messages: list, model: str = "gpt-4o", **kwargs):
"""Make OpenAI API call with full ZeroEval instrumentation"""
# Generate span identifiers
trace_id = str(uuid.uuid4())
span_id = str(uuid.uuid4())
# Track timing
start_time = time.time()
# Prepare OpenAI request
openai_payload = {
"model": model,
"messages": messages,
**kwargs # temperature, max_tokens, etc.
}
# Add stream_options for token usage in streaming calls
is_streaming = kwargs.get("stream", False)
if is_streaming and "stream_options" not in kwargs:
openai_payload["stream_options"] = {"include_usage": True}
try:
# Make the OpenAI API call
response = requests.post(
"https://api.openai.com/v1/chat/completions",
headers={
"Authorization": f"Bearer {self.openai_api_key}",
"Content-Type": "application/json"
},
json=openai_payload,
stream=is_streaming
)
response.raise_for_status()
end_time = time.time()
duration_ms = (end_time - start_time) * 1000
if is_streaming:
# Handle streaming response
full_response = ""
input_tokens = 0
output_tokens = 0
finish_reason = None
response_id = None
system_fingerprint = None
first_token_time = None
for line in response.iter_lines():
if line:
line = line.decode('utf-8')
if line.startswith('data: '):
data_str = line[6:]
if data_str == '[DONE]':
break
try:
data = json.loads(data_str)
# Capture first token timing
if data.get('choices') and data['choices'][0].get('delta', {}).get('content'):
if first_token_time is None:
first_token_time = time.time()
full_response += data['choices'][0]['delta']['content']
# Capture final metadata
if 'usage' in data:
input_tokens = data['usage']['prompt_tokens']
output_tokens = data['usage']['completion_tokens']
if data.get('choices') and data['choices'][0].get('finish_reason'):
finish_reason = data['choices'][0]['finish_reason']
if 'id' in data:
response_id = data['id']
if 'system_fingerprint' in data:
system_fingerprint = data['system_fingerprint']
except json.JSONDecodeError:
continue
# Send ZeroEval span for streaming
self._send_span(
span_id=span_id,
trace_id=trace_id,
model=model,
messages=messages,
response_text=full_response,
input_tokens=input_tokens,
output_tokens=output_tokens,
duration_ms=duration_ms,
start_time=start_time,
finish_reason=finish_reason,
response_id=response_id,
system_fingerprint=system_fingerprint,
streaming=True,
first_token_time=first_token_time,
**kwargs
)
return full_response
else:
# Handle non-streaming response
response_data = response.json()
# Extract response details
content = response_data['choices'][0]['message']['content']
usage = response_data.get('usage', {})
# Send ZeroEval span
self._send_span(
span_id=span_id,
trace_id=trace_id,
model=model,
messages=messages,
response_text=content,
input_tokens=usage.get('prompt_tokens', 0),
output_tokens=usage.get('completion_tokens', 0),
duration_ms=duration_ms,
start_time=start_time,
finish_reason=response_data['choices'][0].get('finish_reason'),
response_id=response_data.get('id'),
system_fingerprint=response_data.get('system_fingerprint'),
streaming=False,
**kwargs
)
return content
except Exception as e:
# Send error span
end_time = time.time()
duration_ms = (end_time - start_time) * 1000
self._send_error_span(
span_id=span_id,
trace_id=trace_id,
model=model,
messages=messages,
duration_ms=duration_ms,
start_time=start_time,
error=e,
**kwargs
)
raise
def _send_span(self, span_id: str, trace_id: str, model: str, messages: list,
response_text: str, input_tokens: int, output_tokens: int,
duration_ms: float, start_time: float, finish_reason: str = None,
response_id: str = None, system_fingerprint: str = None,
streaming: bool = False, first_token_time: float = None, **kwargs):
"""Send successful span to ZeroEval"""
# Calculate throughput metrics
throughput = output_tokens / (duration_ms / 1000) if duration_ms > 0 else 0
ttft_ms = None
if streaming and first_token_time:
ttft_ms = (first_token_time - start_time) * 1000
# Prepare span attributes following ZeroEval's expected format
attributes = {
# Core LLM attributes (these are used for cost calculation)
"provider": "openai", # Key for cost calculation
"model": model, # Key for cost calculation
"inputTokens": input_tokens, # Key for cost calculation
"outputTokens": output_tokens, # Key for cost calculation
# OpenAI-specific attributes
"temperature": kwargs.get("temperature"),
"max_tokens": kwargs.get("max_tokens"),
"top_p": kwargs.get("top_p"),
"frequency_penalty": kwargs.get("frequency_penalty"),
"presence_penalty": kwargs.get("presence_penalty"),
"streaming": streaming,
"finish_reason": finish_reason,
"response_id": response_id,
"system_fingerprint": system_fingerprint,
# Performance metrics
"throughput": throughput,
"duration_ms": duration_ms,
}
if ttft_ms:
attributes["ttft_ms"] = ttft_ms
# Clean up None values
attributes = {k: v for k, v in attributes.items() if v is not None}
# Format messages for good conversation display
formatted_messages = self._format_messages_for_display(messages)
span_data = {
"id": span_id,
"trace_id": trace_id,
"name": f"{model}_completion",
"kind": "llm", # Critical: must be "llm" for cost calculation
"started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
"ended_at": datetime.fromtimestamp(start_time + duration_ms/1000, timezone.utc).isoformat(),
"status": "ok",
"attributes": attributes,
"input_data": json.dumps(formatted_messages),
"output_data": response_text,
"tags": {
"provider": "openai",
"model": model,
"streaming": str(streaming).lower()
}
}
# Send to ZeroEval
response = requests.post(
self.zeroeval_url,
headers={
"Authorization": f"Bearer {self.zeroeval_api_key}",
"Content-Type": "application/json"
},
json=[span_data]
)
if response.status_code != 200:
print(f"Warning: Failed to send span to ZeroEval: {response.text}")
def _send_error_span(self, span_id: str, trace_id: str, model: str,
messages: list, duration_ms: float, start_time: float,
error: Exception, **kwargs):
"""Send error span to ZeroEval"""
attributes = {
"provider": "openai",
"model": model,
"temperature": kwargs.get("temperature"),
"max_tokens": kwargs.get("max_tokens"),
"streaming": kwargs.get("stream", False),
"error_type": type(error).__name__,
"error_message": str(error),
"duration_ms": duration_ms,
}
# Clean up None values
attributes = {k: v for k, v in attributes.items() if v is not None}
formatted_messages = self._format_messages_for_display(messages)
span_data = {
"id": span_id,
"trace_id": trace_id,
"name": f"{model}_completion",
"kind": "llm",
"started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
"ended_at": datetime.fromtimestamp(start_time + duration_ms/1000, timezone.utc).isoformat(),
"status": "error",
"attributes": attributes,
"input_data": json.dumps(formatted_messages),
"output_data": "",
"error_message": str(error),
"tags": {
"provider": "openai",
"model": model,
"error": "true"
}
}
requests.post(
self.zeroeval_url,
headers={
"Authorization": f"Bearer {self.zeroeval_api_key}",
"Content-Type": "application/json"
},
json=[span_data]
)
def _format_messages_for_display(self, messages: list) -> list:
"""Format messages for optimal display in ZeroEval UI"""
formatted = []
for msg in messages:
# Handle both dict and object formats
if hasattr(msg, 'role'):
role = msg.role
content = msg.content
else:
role = msg.get('role', 'user')
content = msg.get('content', '')
# Handle multimodal content
if isinstance(content, list):
# Extract text parts for display
text_parts = []
for part in content:
if isinstance(part, dict) and part.get('type') == 'text':
text_parts.append(part['text'])
elif isinstance(part, str):
text_parts.append(part)
content = '\n'.join(text_parts) if text_parts else '[Multimodal content]'
formatted.append({
"role": role,
"content": content
})
return formatted
# Usage example
tracer = OpenAITracer(
api_key="your-openai-api-key",
zeroeval_api_key="your-zeroeval-api-key"
)
# Non-streaming call
response = tracer.chat_completion_with_tracing([
{"role": "user", "content": "What is the capital of France?"}
], model="gpt-4o", temperature=0.7)
# Streaming call
response = tracer.chat_completion_with_tracing([
{"role": "user", "content": "Write a short story"}
], model="gpt-4o", stream=True, temperature=0.9)
```
### Gemini API Manual Instrumentation
Gemini's API structure differs from OpenAI's: requests use `contents` instead of `messages`, and several parameters have different names (for example, `maxOutputTokens` instead of `max_tokens`). Here's how to instrument Gemini API calls:
```python Python (Gemini Direct API) theme={null}
import requests
import json
import time
import uuid
from datetime import datetime, timezone
class GeminiTracer:
    def __init__(self, api_key: str, zeroeval_api_key: str):
self.gemini_api_key = api_key
self.zeroeval_api_key = zeroeval_api_key
self.zeroeval_url = "https://api.zeroeval.com/api/v1/spans"
def generate_content_with_tracing(self, messages: list, model: str = "gemini-1.5-flash", **kwargs):
"""Make Gemini API call with full ZeroEval instrumentation"""
trace_id = str(uuid.uuid4())
span_id = str(uuid.uuid4())
start_time = time.time()
# Convert OpenAI-style messages to Gemini contents format
contents, system_instruction = self._convert_messages_to_contents(messages)
# Prepare Gemini request payload
gemini_payload = {
"contents": contents
}
# Add generation config
generation_config = {}
if kwargs.get("temperature") is not None:
generation_config["temperature"] = kwargs["temperature"]
if kwargs.get("max_tokens"):
generation_config["maxOutputTokens"] = kwargs["max_tokens"]
if kwargs.get("top_p") is not None:
generation_config["topP"] = kwargs["top_p"]
if kwargs.get("top_k") is not None:
generation_config["topK"] = kwargs["top_k"]
if kwargs.get("stop"):
stop = kwargs["stop"]
generation_config["stopSequences"] = stop if isinstance(stop, list) else [stop]
if generation_config:
gemini_payload["generationConfig"] = generation_config
# Add system instruction if present
if system_instruction:
gemini_payload["systemInstruction"] = {"parts": [{"text": system_instruction}]}
# Add tools if provided
if kwargs.get("tools"):
gemini_payload["tools"] = kwargs["tools"]
if kwargs.get("tool_choice"):
gemini_payload["toolConfig"] = {
"functionCallingConfig": {"mode": kwargs["tool_choice"]}
}
# Choose endpoint based on streaming
is_streaming = kwargs.get("stream", False)
endpoint = "streamGenerateContent" if is_streaming else "generateContent"
url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:{endpoint}"
try:
response = requests.post(
url,
headers={
"x-goog-api-key": self.gemini_api_key,
"Content-Type": "application/json"
},
json=gemini_payload,
stream=is_streaming
)
response.raise_for_status()
end_time = time.time()
duration_ms = (end_time - start_time) * 1000
if is_streaming:
# Handle streaming response
full_response = ""
input_tokens = 0
output_tokens = 0
finish_reason = None
model_version = None
first_token_time = None
for line in response.iter_lines():
if line:
try:
# Gemini streaming sends JSON objects separated by newlines
data = json.loads(line.decode('utf-8'))
if 'candidates' in data and data['candidates']:
candidate = data['candidates'][0]
# Extract content
if 'content' in candidate and 'parts' in candidate['content']:
for part in candidate['content']['parts']:
if 'text' in part:
if first_token_time is None:
first_token_time = time.time()
full_response += part['text']
# Extract finish reason
if 'finishReason' in candidate:
finish_reason = candidate['finishReason']
# Extract usage metadata (usually in final chunk)
if 'usageMetadata' in data:
usage = data['usageMetadata']
input_tokens = usage.get('promptTokenCount', 0)
output_tokens = usage.get('candidatesTokenCount', 0)
# Extract model version
if 'modelVersion' in data:
model_version = data['modelVersion']
except json.JSONDecodeError:
continue
self._send_span(
span_id=span_id, trace_id=trace_id, model=model,
original_messages=messages, response_text=full_response,
input_tokens=input_tokens, output_tokens=output_tokens,
duration_ms=duration_ms, start_time=start_time,
finish_reason=finish_reason, model_version=model_version,
streaming=True, first_token_time=first_token_time,
**kwargs
)
return full_response
else:
# Handle non-streaming response
response_data = response.json()
# Extract response content
content = ""
if 'candidates' in response_data and response_data['candidates']:
candidate = response_data['candidates'][0]
if 'content' in candidate and 'parts' in candidate['content']:
content_parts = []
for part in candidate['content']['parts']:
if 'text' in part:
content_parts.append(part['text'])
content = ''.join(content_parts)
# Extract usage
usage = response_data.get('usageMetadata', {})
input_tokens = usage.get('promptTokenCount', 0)
output_tokens = usage.get('candidatesTokenCount', 0)
# Extract other metadata
finish_reason = None
if 'candidates' in response_data and response_data['candidates']:
finish_reason = response_data['candidates'][0].get('finishReason')
model_version = response_data.get('modelVersion')
self._send_span(
span_id=span_id, trace_id=trace_id, model=model,
original_messages=messages, response_text=content,
input_tokens=input_tokens, output_tokens=output_tokens,
duration_ms=duration_ms, start_time=start_time,
finish_reason=finish_reason, model_version=model_version,
streaming=False, **kwargs
)
return content
except Exception as e:
end_time = time.time()
duration_ms = (end_time - start_time) * 1000
self._send_error_span(
span_id=span_id, trace_id=trace_id, model=model,
original_messages=messages, duration_ms=duration_ms,
start_time=start_time, error=e, **kwargs
)
raise
def _convert_messages_to_contents(self, messages: list) -> tuple:
"""Convert OpenAI-style messages to Gemini contents format"""
contents = []
system_instruction = None
for msg in messages:
role = msg.get('role', 'user') if isinstance(msg, dict) else msg.role
content = msg.get('content', '') if isinstance(msg, dict) else msg.content
if role == 'system':
# Collect system instructions
if system_instruction:
system_instruction += f"\n{content}"
else:
system_instruction = content
continue
# Convert content to parts
if isinstance(content, list):
# Handle multimodal content
parts = []
for item in content:
if isinstance(item, dict) and item.get('type') == 'text':
parts.append({"text": item['text']})
# Add support for images, etc. if needed
else:
parts = [{"text": str(content)}]
# Convert role
gemini_role = "user" if role == "user" else "model"
contents.append({"role": gemini_role, "parts": parts})
return contents, system_instruction
def _send_span(self, span_id: str, trace_id: str, model: str,
original_messages: list, response_text: str,
input_tokens: int, output_tokens: int, duration_ms: float,
start_time: float, finish_reason: str = None,
model_version: str = None, streaming: bool = False,
first_token_time: float = None, **kwargs):
"""Send successful span to ZeroEval"""
# Calculate performance metrics
throughput = output_tokens / (duration_ms / 1000) if duration_ms > 0 else 0
ttft_ms = None
if streaming and first_token_time:
ttft_ms = (first_token_time - start_time) * 1000
# Prepare attributes following ZeroEval's expected format
attributes = {
# Core attributes for cost calculation (use provider naming)
"provider": "gemini", # Key for cost calculation
"model": model, # Key for cost calculation
"inputTokens": input_tokens, # Key for cost calculation
"outputTokens": output_tokens, # Key for cost calculation
# Gemini-specific attributes
"temperature": kwargs.get("temperature"),
"max_tokens": kwargs.get("max_tokens"), # maxOutputTokens
"top_p": kwargs.get("top_p"),
"top_k": kwargs.get("top_k"),
"stop_sequences": kwargs.get("stop"),
"streaming": streaming,
"finish_reason": finish_reason,
"model_version": model_version,
# Performance metrics
"throughput": throughput,
"duration_ms": duration_ms,
}
if ttft_ms:
attributes["ttft_ms"] = ttft_ms
# Include tool information if present
if kwargs.get("tools"):
attributes["tools_count"] = len(kwargs["tools"])
attributes["tool_choice"] = kwargs.get("tool_choice")
# Clean up None values
attributes = {k: v for k, v in attributes.items() if v is not None}
# Format original messages for display (convert back to OpenAI format for consistency)
formatted_messages = self._format_messages_for_display(original_messages)
span_data = {
"id": span_id,
"trace_id": trace_id,
"name": f"{model}_completion",
"kind": "llm", # Critical: must be "llm" for cost calculation
"started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
"ended_at": datetime.fromtimestamp(start_time + duration_ms/1000, timezone.utc).isoformat(),
"status": "ok",
"attributes": attributes,
"input_data": json.dumps(formatted_messages),
"output_data": response_text,
"tags": {
"provider": "gemini",
"model": model,
"streaming": str(streaming).lower()
}
}
# Send to ZeroEval
response = requests.post(
self.zeroeval_url,
headers={
"Authorization": f"Bearer {self.zeroeval_api_key}",
"Content-Type": "application/json"
},
json=[span_data]
)
if response.status_code != 200:
print(f"Warning: Failed to send span to ZeroEval: {response.text}")
def _send_error_span(self, span_id: str, trace_id: str, model: str,
original_messages: list, duration_ms: float,
start_time: float, error: Exception, **kwargs):
"""Send error span to ZeroEval"""
attributes = {
"provider": "gemini",
"model": model,
"temperature": kwargs.get("temperature"),
"max_tokens": kwargs.get("max_tokens"),
"streaming": kwargs.get("stream", False),
"error_type": type(error).__name__,
"error_message": str(error),
"duration_ms": duration_ms,
}
# Clean up None values
attributes = {k: v for k, v in attributes.items() if v is not None}
formatted_messages = self._format_messages_for_display(original_messages)
span_data = {
"id": span_id,
"trace_id": trace_id,
"name": f"{model}_completion",
"kind": "llm",
"started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
"ended_at": datetime.fromtimestamp(start_time + duration_ms/1000, timezone.utc).isoformat(),
"status": "error",
"attributes": attributes,
"input_data": json.dumps(formatted_messages),
"output_data": "",
"error_message": str(error),
"tags": {
"provider": "gemini",
"model": model,
"error": "true"
}
}
requests.post(
self.zeroeval_url,
headers={
"Authorization": f"Bearer {self.zeroeval_api_key}",
"Content-Type": "application/json"
},
json=[span_data]
)
def _format_messages_for_display(self, messages: list) -> list:
"""Format messages for optimal display in ZeroEval UI"""
formatted = []
for msg in messages:
if hasattr(msg, 'role'):
role = msg.role
content = msg.content
else:
role = msg.get('role', 'user')
content = msg.get('content', '')
# Handle multimodal content
if isinstance(content, list):
text_parts = []
for part in content:
if isinstance(part, dict) and part.get('type') == 'text':
text_parts.append(part['text'])
elif isinstance(part, str):
text_parts.append(part)
content = '\n'.join(text_parts) if text_parts else '[Multimodal content]'
formatted.append({
"role": role,
"content": content
})
return formatted
# Usage example
tracer = GeminiTracer(
api_key="your-gemini-api-key",
zeroeval_api_key="your-zeroeval-api-key"
)
# Non-streaming call
response = tracer.generate_content_with_tracing([
{"role": "user", "content": "What is the capital of France?"}
], model="gemini-1.5-flash", temperature=0.7)
# Streaming call
response = tracer.generate_content_with_tracing([
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write a short story"}
], model="gemini-1.5-flash", stream=True, temperature=0.9)
```
### Key Attributes for Cost Calculation
For accurate cost calculation, ZeroEval requires these specific attributes in your span:
| Attribute | Required | Description | Example Values |
| -------------- | -------- | -------------------------------------- | ------------------------------------- |
| `provider` | ✅ | Provider identifier for pricing lookup | `"openai"`, `"gemini"`, `"anthropic"` |
| `model` | ✅ | Model identifier for pricing lookup | `"gpt-4o"`, `"gemini-1.5-flash"` |
| `inputTokens` | ✅ | Number of input tokens consumed | `150` |
| `outputTokens` | ✅ | Number of output tokens generated | `75` |
| `kind` | ✅ | Must be set to `"llm"` | `"llm"` |
**Cost Calculation Process:**
1. ZeroEval looks up pricing in the `provider_models` table using `provider` and `model`
2. Calculates: `(inputTokens × inputPrice + outputTokens × outputPrice) / 1,000,000`
3. Stores the result in the span's `cost` field
4. The cost is stored in cents and automatically converted to dollars for display in the UI
**Current Supported Models for Cost Calculation:**
* **OpenAI**: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo`
* **Gemini**: `gemini-1.5-flash`, `gemini-1.5-pro`, `gemini-1.0-pro`
* **Anthropic**: `claude-3-5-sonnet`, `claude-3-haiku`, `claude-3-opus`
If your model isn't listed, the cost will be `0` and you'll see a warning in the logs. Contact support to add pricing for new models.
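As a sanity check, the calculation above can be sketched client-side. Note that the prices below are hypothetical placeholders, not ZeroEval's actual `provider_models` entries:

```python theme={null}
# Sketch of the documented cost formula. Prices are hypothetical values
# expressed per one million tokens, NOT ZeroEval's real pricing table.
PRICING = {
    ("openai", "gpt-4o"): {"input": 2.50, "output": 10.00},  # placeholder prices
}

def estimate_cost(provider: str, model: str, input_tokens: int, output_tokens: int) -> float:
    """(inputTokens * inputPrice + outputTokens * outputPrice) / 1,000,000."""
    prices = PRICING.get((provider, model))
    if prices is None:
        return 0.0  # unknown model: cost falls back to 0, as noted above
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000
```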
### Conversation Formatting Best Practices
To ensure your conversations display properly in the ZeroEval UI, follow these formatting guidelines:
```python Python Message Formatting theme={null}
def format_messages_for_zeroeval(messages: list) -> list:
"""Format messages for optimal display in ZeroEval UI"""
formatted = []
for msg in messages:
# Handle both dict and object formats
if hasattr(msg, 'role'):
role = msg.role
content = msg.content
else:
role = msg.get('role', 'user')
content = msg.get('content', '')
# Standardize role names
if role in ['assistant', 'bot', 'ai']:
role = 'assistant'
elif role in ['human', 'user']:
role = 'user'
elif role == 'system':
role = 'system'
# Handle multimodal content - extract text for display
if isinstance(content, list):
text_parts = []
for part in content:
if isinstance(part, dict):
if part.get('type') == 'text':
text_parts.append(part['text'])
elif part.get('type') == 'image_url':
text_parts.append(f"[Image: {part.get('image_url', {}).get('url', 'Unknown')}]")
elif isinstance(part, str):
text_parts.append(part)
# Join text parts with newlines for readability
content = '\n'.join(text_parts) if text_parts else '[Multimodal content]'
# Ensure content is a string
if not isinstance(content, str):
content = str(content)
# Trim excessive whitespace but preserve meaningful formatting
content = content.strip()
formatted.append({
"role": role,
"content": content
})
return formatted
# Usage in span creation
span_data = {
"input_data": json.dumps(format_messages_for_zeroeval(original_messages)),
"output_data": response_text.strip(), # Clean response text too
# ... other fields
}
```
**Key Formatting Rules:**
1. **Standardize Role Names**: Use `"user"`, `"assistant"`, and `"system"` consistently
2. **Handle Multimodal Content**: Extract text content and add descriptive placeholders for non-text elements
3. **Clean Whitespace**: Trim excessive whitespace while preserving intentional formatting
4. **Ensure String Types**: Convert all content to strings to avoid serialization issues
5. **Preserve Conversation Flow**: Maintain the original message order and context
**UI Display Features:**
* **Message Bubbles**: Conversations appear as chat bubbles with clear role distinction
* **Token Counts**: Hover over messages to see token usage breakdown
* **Copy Functionality**: Users can copy individual messages or entire conversations
* **Search**: Well-formatted messages are easily searchable within traces
* **Export**: Clean formatting ensures readable exports to various formats
**Common Formatting Issues to Avoid:**
* ❌ Mixed role naming (`bot` vs `assistant`)
* ❌ Nested objects in content fields
* ❌ Excessive line breaks or whitespace
* ❌ Empty or null content fields
* ❌ Non-string data types in content
**Pro Tips:**
* Keep system messages concise but informative
* Use consistent formatting across your application
* Include relevant context in message content for better debugging
* Consider truncating very long messages (>10k characters) with an ellipsis
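For the last tip, a minimal truncation helper might look like this (the 10k-character threshold is the suggestion above, not an SDK limit):

```python theme={null}
MAX_CHARS = 10_000  # threshold suggested above; tune for your own traces

def truncate_content(content: str, max_chars: int = MAX_CHARS) -> str:
    """Truncate overly long message content, appending an ellipsis marker."""
    if len(content) <= max_chars:
        return content
    return content[:max_chars].rstrip() + "…"
```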
### Creating Child Spans
Create nested spans to track sub-operations within an LLM call:
```python theme={null}
import zeroeval as ze
@ze.span(name="rag_pipeline", kind="generic")
def answer_with_context(question: str) -> str:
# Retrieval step
with ze.span(name="retrieve_context", kind="vector_store") as retrieval_span:
context = vector_db.search(question, k=5)
retrieval_span.set_attributes({
"vector_store.query": question,
"vector_store.k": 5,
"vector_store.results": len(context)
})
# LLM generation step
with ze.span(name="generate_answer", kind="llm") as llm_span:
messages = [
{"role": "system", "content": f"Context: {context}"},
{"role": "user", "content": question}
]
response = generate_response(messages)
llm_span.set_attributes({
"llm.model": "gpt-4",
"llm.context_length": len(str(context))
})
return response
```
## Direct API Instrumentation
If you prefer to send spans directly to the API without using an SDK, here's how to do it:
### API Authentication
First, obtain an API key from your [Settings → API Keys](https://app.zeroeval.com/settings?section=api-keys) page.
Include the API key in your request headers:
```bash theme={null}
Authorization: Bearer YOUR_API_KEY
```
### Basic Span Creation
Send a POST request to `/api/v1/spans` with your span data:
```bash cURL theme={null}
curl -X POST https://api.zeroeval.com/api/v1/spans \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '[{
"id": "550e8400-e29b-41d4-a716-446655440000",
"trace_id": "550e8400-e29b-41d4-a716-446655440001",
"name": "chat_completion",
"kind": "llm",
"started_at": "2024-01-15T10:30:00Z",
"ended_at": "2024-01-15T10:30:02Z",
"status": "ok",
"attributes": {
"llm.model": "gpt-4",
"llm.provider": "openai",
"llm.temperature": 0.7,
"llm.input_tokens": 150,
"llm.output_tokens": 230,
"llm.total_tokens": 380
},
"input_data": "[{\"role\": \"user\", \"content\": \"What is the capital of France?\"}]",
"output_data": "The capital of France is Paris."
}]'
```
```python Python (Requests) theme={null}
import requests
import json
from datetime import datetime, timezone
import uuid
def send_llm_span(messages, response_text, model="gpt-4", tokens=None):
"""Send an LLM span directly to the ZeroEval API"""
# Generate IDs
span_id = str(uuid.uuid4())
trace_id = str(uuid.uuid4())
# Prepare the span data
span_data = {
"id": span_id,
"trace_id": trace_id,
"name": "chat_completion",
"kind": "llm",
"started_at": datetime.now(timezone.utc).isoformat(),
"ended_at": datetime.now(timezone.utc).isoformat(),
"status": "ok",
"attributes": {
"llm.model": model,
"llm.provider": "openai",
"llm.temperature": 0.7
},
"input_data": json.dumps(messages),
"output_data": response_text
}
# Add token counts if provided
if tokens:
span_data["attributes"].update({
"llm.input_tokens": tokens.get("prompt_tokens"),
"llm.output_tokens": tokens.get("completion_tokens"),
"llm.total_tokens": tokens.get("total_tokens")
})
# Send to API
response = requests.post(
"https://api.zeroeval.com/api/v1/spans",
headers={
"Authorization": f"Bearer {YOUR_API_KEY}",
"Content-Type": "application/json"
},
json=[span_data] # Note: API expects an array
)
if response.status_code == 200:
return response.json()
else:
raise Exception(f"Failed to send span: {response.text}")
```
### Complete LLM Span with Session
Create a full trace with session context:
```python theme={null}
import requests
import json
from datetime import datetime, timezone
import uuid
import time
class ZeroEvalClient:
def __init__(self, api_key: str):
self.api_key = api_key
self.base_url = "https://api.zeroeval.com/api/v1"
self.session_id = str(uuid.uuid4())
def create_llm_span(
self,
messages: list,
response: dict,
model: str = "gpt-4",
trace_id: str = None,
parent_span_id: str = None,
start_time: float = None,
end_time: float = None
):
"""Create a comprehensive LLM span with all metadata"""
if not trace_id:
trace_id = str(uuid.uuid4())
if not start_time:
start_time = time.time()
if not end_time:
end_time = time.time()
span_id = str(uuid.uuid4())
# Calculate duration
duration_ms = (end_time - start_time) * 1000
# Prepare comprehensive span data
span_data = {
"id": span_id,
"trace_id": trace_id,
"parent_span_id": parent_span_id,
"name": f"{model}_completion",
"kind": "llm",
"started_at": datetime.fromtimestamp(start_time, timezone.utc).isoformat(),
"ended_at": datetime.fromtimestamp(end_time, timezone.utc).isoformat(),
"duration_ms": duration_ms,
"status": "ok",
# Session context
"session": {
"id": self.session_id,
"name": "API Client Session"
},
# Core attributes
"attributes": {
"llm.model": model,
"llm.provider": "openai",
"llm.temperature": 0.7,
"llm.max_tokens": 1000,
"llm.streaming": False,
# Token metrics
"llm.input_tokens": response.get("usage", {}).get("prompt_tokens"),
"llm.output_tokens": response.get("usage", {}).get("completion_tokens"),
"llm.total_tokens": response.get("usage", {}).get("total_tokens"),
# Performance metrics
"llm.duration_ms": duration_ms,
"llm.throughput_tokens_per_sec": (
response.get("usage", {}).get("completion_tokens", 0) /
(duration_ms / 1000) if duration_ms > 0 else 0
),
# Response metadata
"llm.finish_reason": response.get("choices", [{}])[0].get("finish_reason"),
"llm.response_id": response.get("id"),
"llm.system_fingerprint": response.get("system_fingerprint")
},
# Tags for filtering
"tags": {
"environment": "production",
"version": "1.0.0",
"user_id": "user_123"
},
# Input/Output
"input_data": json.dumps(messages),
"output_data": response.get("choices", [{}])[0].get("message", {}).get("content", ""),
# Cost calculation (optional - will be calculated server-side if not provided)
"cost": self.calculate_cost(
model,
response.get("usage", {}).get("prompt_tokens", 0),
response.get("usage", {}).get("completion_tokens", 0)
)
}
# Send the span
response = requests.post(
f"{self.base_url}/spans",
headers={
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json"
},
json=[span_data]
)
if response.status_code != 200:
raise Exception(f"Failed to send span: {response.text}")
return span_id
def calculate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
"""Calculate cost based on model and token usage"""
# Example pricing (adjust based on actual pricing)
pricing = {
"gpt-4": {"input": 0.03 / 1000, "output": 0.06 / 1000},
"gpt-3.5-turbo": {"input": 0.001 / 1000, "output": 0.002 / 1000}
}
if model in pricing:
input_cost = input_tokens * pricing[model]["input"]
output_cost = output_tokens * pricing[model]["output"]
return input_cost + output_cost
return 0.0
```
## Span Schema Reference
### Required Fields
| Field | Type | Description |
| ------------ | ----------------- | ------------------------------- |
| `trace_id` | string (UUID) | Unique identifier for the trace |
| `name` | string | Descriptive name for the span |
| `started_at` | ISO 8601 datetime | When the span started |
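Put differently, the smallest payload described by the table contains just these three fields. A sketch:

```python theme={null}
import uuid
from datetime import datetime, timezone

# Minimal span payload: only the three required fields from the table above.
minimal_span = {
    "trace_id": str(uuid.uuid4()),
    "name": "my_operation",
    "started_at": datetime.now(timezone.utc).isoformat(),
}

# POST it as a one-element array, e.g.:
# requests.post("https://api.zeroeval.com/api/v1/spans", headers=headers, json=[minimal_span])
```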
### Recommended Fields for LLM Spans
| Field | Type | Description |
| ------------- | ----------------- | ------------------------------------------------------- |
| `id` | string (UUID) | Unique span identifier (auto-generated if not provided) |
| `kind` | string | Set to `"llm"` for LLM spans |
| `ended_at` | ISO 8601 datetime | When the span completed |
| `status` | string | `"ok"`, `"error"`, or `"unset"` |
| `input_data` | string | JSON string of input messages |
| `output_data` | string | Generated text response |
| `duration_ms` | number | Total duration in milliseconds |
| `cost` | number | Calculated cost (auto-calculated if not provided) |
### LLM-Specific Attributes
Store these in the `attributes` field:
| Attribute | Type | Description |
| ------------------------------- | ------- | -------------------------------------------- |
| `llm.model` | string | Model identifier (e.g., "gpt-4", "claude-3") |
| `llm.provider` | string | Provider name (e.g., "openai", "anthropic") |
| `llm.temperature` | number | Temperature parameter |
| `llm.max_tokens` | number | Maximum tokens limit |
| `llm.input_tokens` | number | Number of input tokens |
| `llm.output_tokens` | number | Number of output tokens |
| `llm.total_tokens` | number | Total tokens used |
| `llm.streaming` | boolean | Whether response was streamed |
| `llm.ttft_ms` | number | Time to first token (streaming only) |
| `llm.throughput_tokens_per_sec` | number | Token generation rate |
| `llm.finish_reason` | string | Why generation stopped |
| `llm.response_id` | string | Provider's response ID |
| `llm.system_fingerprint` | string | Model version identifier |
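For the streaming-specific attributes (`llm.ttft_ms`, `llm.throughput_tokens_per_sec`), one approach is to time the stream as you consume it. A sketch, where `chunks` stands in for any provider's streaming iterator:

```python theme={null}
import time

def consume_stream_with_metrics(chunks):
    """Time a token stream, recording TTFT and throughput as span attributes."""
    start = time.monotonic()
    ttft_ms = None
    tokens = 0
    for _chunk in chunks:
        if ttft_ms is None:
            ttft_ms = (time.monotonic() - start) * 1000  # llm.ttft_ms
        tokens += 1
    duration_s = time.monotonic() - start
    return {
        "llm.streaming": True,
        "llm.ttft_ms": ttft_ms,
        "llm.output_tokens": tokens,
        "llm.throughput_tokens_per_sec": tokens / duration_s if duration_s > 0 else 0.0,
    }
```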
### Optional Context Fields
| Field | Type | Description |
| ---------------- | ------------- | --------------------------------------------- |
| `parent_span_id` | string (UUID) | Parent span for nested operations |
| `session` | object | Session context with `id` and optional `name` |
| `tags` | object | Key-value pairs for filtering |
| `signals` | object | Custom signals for alerting |
| `error_message` | string | Error description if status is "error" |
| `error_stack` | string | Stack trace for debugging |
## Best Practices
1. **Always set the `kind` field**: Use `"llm"` for LLM spans to enable specialized features like embeddings and cost tracking.
2. **Include token counts**: These are essential for cost calculation and performance monitoring.
3. **Capture timing metrics**: For streaming responses, track TTFT (time to first token) and throughput.
4. **Use consistent naming**: Follow a pattern like `{model}_completion` or `{provider}_{operation}`.
5. **Add context with tags**: Use tags for environment, version, user ID, etc., to enable powerful filtering.
6. **Handle errors gracefully**: Set status to "error" and include error details in attributes.
7. **Link related spans**: Use `parent_span_id` to create hierarchical traces for complex workflows.
8. **Batch span submissions**: When sending multiple spans, include them in a single API call as an array.
## Examples
### Multi-Step LLM Pipeline
Here's a complete example of tracking a RAG (Retrieval-Augmented Generation) pipeline:
```python theme={null}
import zeroeval as ze
import time
import json
@ze.span(name="rag_query", kind="generic")
def rag_pipeline(user_query: str) -> dict:
trace_id = ze.get_current_trace()
# Step 1: Query embedding
with ze.span(name="embed_query", kind="llm") as embed_span:
start = time.time()
embedding = create_embedding(user_query)
embed_span.set_attributes({
"llm.model": "text-embedding-3-small",
"llm.provider": "openai",
"llm.input_tokens": len(user_query.split()),
"llm.duration_ms": (time.time() - start) * 1000
})
# Step 2: Vector search
with ze.span(name="vector_search", kind="vector_store") as search_span:
results = vector_db.similarity_search(embedding, k=5)
search_span.set_attributes({
"vector_store.index": "knowledge_base",
"vector_store.k": 5,
"vector_store.results_count": len(results)
})
# Step 3: Rerank results
with ze.span(name="rerank_results", kind="llm") as rerank_span:
reranked = rerank_documents(user_query, results)
rerank_span.set_attributes({
"llm.model": "rerank-english-v2.0",
"llm.provider": "cohere",
"rerank.input_documents": len(results),
"rerank.output_documents": len(reranked)
})
# Step 4: Generate response
with ze.span(name="generate_response", kind="llm") as gen_span:
context = "\n".join([doc.content for doc in reranked[:3]])
messages = [
{"role": "system", "content": f"Use this context to answer: {context}"},
{"role": "user", "content": user_query}
]
response = generate_with_metrics(messages, model="gpt-4")
gen_span.set_attributes({
"llm.context_documents": 3,
"llm.context_length": len(context)
})
return {
"answer": response,
"sources": [doc.metadata for doc in reranked[:3]],
"trace_id": trace_id
}
```
This comprehensive instrumentation provides full visibility into your LLM operations, enabling you to monitor performance, track costs, and debug issues effectively.
## Next Steps
Complete guide to environment variables, initialization parameters, and
runtime configuration options.
For automatic instrumentation of popular LLM libraries, check out our [SDK
integrations](/tracing/sdks/python/integrations) which handle all of this
automatically.
# OpenTelemetry
Source: https://docs.zeroeval.com/tracing/opentelemetry
Send traces to ZeroEval using the OpenTelemetry collector
ZeroEval provides native support for the OpenTelemetry Protocol (OTLP), allowing you to send traces from any OpenTelemetry-instrumented application directly to ZeroEval's API. This guide shows you how to configure the OpenTelemetry collector to export traces to ZeroEval.
## Prerequisites
* A ZeroEval API key (get one from your [workspace settings](https://app.zeroeval.com/settings/api-keys))
* OpenTelemetry collector installed ([installation guide](https://opentelemetry.io/docs/collector/getting-started/))
## Configuration
Create a collector configuration file (`otel-collector-config.yaml`):
```yaml theme={null}
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 1s
send_batch_size: 1024
# ZeroEval-specific attributes
attributes:
actions:
- key: deployment.environment
value: "production" # or staging, development, etc.
action: upsert
exporters:
otlphttp:
endpoint: https://api.zeroeval.com
headers:
Authorization: "Bearer YOUR_ZEROEVAL_API_KEY"
traces_endpoint: https://api.zeroeval.com/v1/traces
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch, attributes]
exporters: [otlphttp]
```
## Docker Deployment
For containerized deployments, use this Docker Compose configuration:
```yaml theme={null}
version: '3.8'
services:
otel-collector:
image: otel/opentelemetry-collector-contrib:latest
container_name: otel-collector
command: ["--config=/etc/otel-collector-config.yaml"]
volumes:
- ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
ports:
- "4317:4317" # OTLP gRPC receiver
- "4318:4318" # OTLP HTTP receiver
- "8888:8888" # Prometheus metrics
environment:
- ZEROEVAL_API_KEY=${ZEROEVAL_API_KEY}
restart: unless-stopped
```
## Environment-based Configuration
To avoid hardcoding sensitive information, use environment variables:
```yaml theme={null}
exporters:
otlphttp:
endpoint: https://api.zeroeval.com
headers:
Authorization: "Bearer ${env:ZEROEVAL_API_KEY}"
traces_endpoint: https://api.zeroeval.com/v1/traces
```
Then set the environment variable:
```bash theme={null}
export ZEROEVAL_API_KEY="your-api-key-here"
```
## Kubernetes Deployment
For Kubernetes environments, use this ConfigMap and Deployment:
```yaml theme={null}
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-collector-config
data:
otel-collector-config.yaml: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 1s
k8sattributes:
extract:
metadata:
- k8s.namespace.name
- k8s.deployment.name
- k8s.pod.name
exporters:
otlphttp:
endpoint: https://api.zeroeval.com
headers:
Authorization: "Bearer ${env:ZEROEVAL_API_KEY}"
traces_endpoint: https://api.zeroeval.com/v1/traces
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch, k8sattributes]
exporters: [otlphttp]
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: otel-collector
spec:
replicas: 2
selector:
matchLabels:
app: otel-collector
template:
metadata:
labels:
app: otel-collector
spec:
containers:
- name: otel-collector
image: otel/opentelemetry-collector-contrib:latest
args: ["--config=/etc/otel-collector-config.yaml"]
env:
- name: ZEROEVAL_API_KEY
valueFrom:
secretKeyRef:
name: zeroeval-secret
key: api-key
ports:
- containerPort: 4317
name: otlp-grpc
- containerPort: 4318
name: otlp-http
volumeMounts:
- name: config
mountPath: /etc/otel-collector-config.yaml
subPath: otel-collector-config.yaml
volumes:
- name: config
configMap:
name: otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
name: otel-collector
spec:
selector:
app: otel-collector
ports:
- name: otlp-grpc
port: 4317
targetPort: 4317
- name: otlp-http
port: 4318
targetPort: 4318
```
# Quickstart
Source: https://docs.zeroeval.com/tracing/quickstart
Get started with tracing and observability in ZeroEval
### Get your API key
Create an API key from your [Settings → API Keys](https://app.zeroeval.com/settings?section=api-keys) page.
### Install the SDK
Get started with one of our SDKs:
For Python applications using frameworks like FastAPI, Django, or Flask
For TypeScript and JavaScript applications using Node.js or Bun
# Reference
Source: https://docs.zeroeval.com/tracing/reference
Environment variables and configuration parameters for the ZeroEval tracer
Configure the ZeroEval tracer through environment variables, initialization parameters, or runtime methods.
## Environment Variables
Set before importing ZeroEval to configure default behavior.
| Variable | Type | Default | Description |
| -------------------------------- | ------- | ---------------------------- | --------------------------------------- |
| `ZEROEVAL_API_KEY` | string | `""` | API key for authentication |
| `ZEROEVAL_API_URL` | string | `"https://api.zeroeval.com"` | API endpoint URL |
| `ZEROEVAL_WORKSPACE_NAME` | string | `"Personal Workspace"` | Workspace name |
| `ZEROEVAL_SESSION_ID` | string | auto-generated | Session ID for grouping traces |
| `ZEROEVAL_SESSION_NAME` | string | `""` | Human-readable session name |
| `ZEROEVAL_SAMPLING_RATE` | float | `"1.0"` | Sampling rate (0.0-1.0) |
| `ZEROEVAL_DISABLED_INTEGRATIONS` | string | `""` | Comma-separated integrations to disable |
| `ZEROEVAL_DEBUG` | boolean | `"false"` | Enable debug logging |
**Activation:** Set environment variables before importing the SDK.
```bash theme={null}
export ZEROEVAL_API_KEY="ze_1234567890abcdef"
export ZEROEVAL_SAMPLING_RATE="0.1"
export ZEROEVAL_DEBUG="true"
```
## Initialization Parameters
Configure via `ze.init()`; these parameters override environment variables.
| Parameter | Type | Default | Description |
| ----------------------- | --------------- | ---------------------------- | -------------------------------- |
| `api_key` | string | `None` | API key for authentication |
| `workspace_name` | string | `"Personal Workspace"` | Workspace name |
| `debug` | boolean | `False` | Enable debug logging with colors |
| `api_url` | string | `"https://api.zeroeval.com"` | API endpoint URL |
| `disabled_integrations` | list\[str] | `None` | Integrations to disable |
| `enabled_integrations` | list\[str] | `None` | Only enable these integrations |
| `setup_otlp` | boolean | `True` | Setup OpenTelemetry OTLP export |
| `service_name` | string | `"zeroeval-app"` | OTLP service name |
| `tags` | dict\[str, str] | `None` | Global tags for all spans |
| `sampling_rate` | float | `None` | Sampling rate (0.0-1.0) |
**Activation:** Pass parameters to `ze.init()`.
```python theme={null}
ze.init(
api_key="ze_1234567890abcdef",
sampling_rate=0.1,
disabled_integrations=["langchain"],
debug=True
)
```
## Runtime Configuration
Configure after initialization via `ze.tracer.configure()`.
| Parameter | Type | Default | Description |
| ---------------------- | ---------------- | ------- | ------------------------------------ |
| `flush_interval` | float | `1.0` | Flush frequency in seconds |
| `max_spans` | int | `20` | Buffer size before forced flush |
| `collect_code_details` | boolean | `True` | Capture code details in spans |
| `integrations` | dict\[str, bool] | `{}` | Enable/disable specific integrations |
| `sampling_rate` | float | `None` | Sampling rate (0.0-1.0) |
**Activation:** Call `ze.tracer.configure()` anytime after initialization.
```python theme={null}
ze.tracer.configure(
flush_interval=0.5,
max_spans=100,
sampling_rate=0.05,
integrations={"openai": True, "langchain": False}
)
```
## Available Integrations
| Integration | User-Friendly Name | Auto-Instruments |
| ---------------------- | ------------------ | -------------------- |
| `OpenAIIntegration` | `"openai"` | OpenAI client calls |
| `GeminiIntegration` | `"gemini"` | Google Gemini calls |
| `LangChainIntegration` | `"langchain"` | LangChain components |
| `LangGraphIntegration` | `"langgraph"` | LangGraph workflows |
| `HttpxIntegration` | `"httpx"` | HTTPX requests |
| `VocodeIntegration` | `"vocode"` | Vocode voice SDK |
**Control via:**
* Environment: `ZEROEVAL_DISABLED_INTEGRATIONS="langchain,langgraph"`
* Init: `disabled_integrations=["langchain"]` or `enabled_integrations=["openai"]`
* Runtime: `ze.tracer.configure(integrations={"langchain": False})`
## Configuration Examples
### Production Setup
```python theme={null}
# High-volume production with sampling
ze.init(
api_key="your_key",
sampling_rate=0.05, # 5% sampling
debug=False,
disabled_integrations=["langchain"]
)
ze.tracer.configure(
flush_interval=0.5, # Faster flushes
max_spans=100 # Larger buffer
)
```
### Development Setup
```python theme={null}
# Full tracing with debug info
ze.init(
api_key="your_key",
debug=True, # Colored logs
sampling_rate=1.0 # Capture everything
)
```
### Memory-Optimized Setup
```python theme={null}
# Minimize memory usage
ze.tracer.configure(
max_spans=5, # Small buffer
collect_code_details=False, # No code capture
flush_interval=2.0 # Less frequent flushes
)
```
# Integrations
Source: https://docs.zeroeval.com/tracing/sdks/python/integrations
Automatic instrumentation for popular AI/ML frameworks
The [ZeroEval Python SDK](https://pypi.org/project/zeroeval/) automatically instruments the supported integrations; all you need to do is initialize the SDK before importing the frameworks you want to trace.
## OpenAI
```python theme={null}
import zeroeval as ze
ze.init()
import openai
client = openai.OpenAI()
# This call is automatically traced
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Hello!"}]
)
# Streaming is also automatically traced
stream = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
```
## LangChain
```python theme={null}
import zeroeval as ze
ze.init()
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
# All components are automatically traced
model = ChatOpenAI()
prompt = ChatPromptTemplate.from_template("Tell me about {topic}")
chain = prompt | model
response = chain.invoke({"topic": "AI"})
```
## LangGraph
```python theme={null}
import zeroeval as ze
ze.init()
from langgraph.graph import StateGraph, START, END
from langchain_core.messages import HumanMessage
# Define a multi-node graph
workflow = StateGraph(AgentState)
workflow.add_node("reasoning", reasoning_node)
workflow.add_node("agent", agent_node)
workflow.add_node("tools", tool_node)
workflow.add_conditional_edges(
"agent",
should_continue,
{"tools": "tools", "end": END}
)
app = workflow.compile()
# Full graph execution is automatically traced
result = app.invoke({"messages": [HumanMessage(content="Help me plan a trip")]})
# Streaming is also supported
for chunk in app.stream({"messages": [HumanMessage(content="Hello")]}):
print(chunk)
```
## PydanticAI
PydanticAI agents are automatically traced, including multi-turn conversations. The SDK ensures that all LLM calls within an agent execution share the same trace, and consecutive conversation turns share the same trace ID when using shared message history.
```python theme={null}
import zeroeval as ze
ze.init()
from pydantic_ai import Agent
from pydantic import BaseModel
class Response(BaseModel):
message: str
sentiment: str
# Create an agent with structured output
agent = Agent(
model="openai:gpt-4o-mini",
output_type=Response,
system_prompt="You are a helpful assistant."
)
# Single execution - automatically traced
result = await agent.run("Hello!")
# Multi-turn conversation - all turns share the same trace
message_history = []
async with agent.iter("First message", message_history=message_history) as run:
async for node in run:
pass
message_history = run.result.all_messages()
# Second turn reuses the same trace_id
async with agent.iter("Follow-up message", message_history=message_history) as run:
async for node in run:
pass
message_history = run.result.all_messages()
```
When you pass the same `message_history` list across multiple agent runs, ZeroEval automatically groups all runs under a single trace. This provides a unified view of the entire conversation.
## LiveKit
The SDK automatically creates traces for LiveKit agents, including events from the following plugins:
* Cartesia (TTS)
* Deepgram (STT)
* OpenAI (LLM)
```python theme={null}
import zeroeval as ze
ze.init()
from livekit import agents
from livekit.agents import AgentSession, Agent
from livekit.plugins import openai
async def entrypoint(ctx: agents.JobContext):
await ctx.connect()
# All agent sessions are automatically traced
session = AgentSession(
llm=openai.realtime.RealtimeModel(voice="coral")
)
await session.start(
room=ctx.room,
agent=Agent(instructions="You are a helpful voice AI assistant.")
)
# Agent interactions are automatically captured
await session.generate_reply(
instructions="Greet the user and offer your assistance."
)
if __name__ == "__main__":
agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```
Need help? Contact us at [founders@zeroeval.com](mailto:founders@zeroeval.com) or join our [Discord](https://discord.gg/MuExkGMNVz).
# Reference
Source: https://docs.zeroeval.com/tracing/sdks/python/reference
Complete API reference for the Python SDK
## Installation
```bash theme={null}
pip install zeroeval
```
## Core Functions
### `init()`
Initializes the ZeroEval SDK. Must be called before using any other SDK features.
```python theme={null}
def init(
api_key: str = None,
workspace_name: str = "Personal Workspace",
debug: bool = False,
api_url: str = "https://api.zeroeval.com"
) -> None
```
**Parameters:**
* `api_key` (str, optional): Your ZeroEval API key. If not provided, uses `ZEROEVAL_API_KEY` environment variable
* `workspace_name` (str, optional): The name of your workspace. Defaults to `"Personal Workspace"`
* `debug` (bool, optional): If True, enables detailed logging for debugging. Can also be enabled by setting `ZEROEVAL_DEBUG=true` environment variable
* `api_url` (str, optional): The URL of the ZeroEval API. Defaults to `"https://api.zeroeval.com"`
**Example:**
```python theme={null}
import zeroeval as ze
ze.init(
    api_key="your-api-key",
    workspace_name="My Workspace",
    debug=True
)
```
## Decorators
### `@span`
Decorator and context manager for creating spans around code blocks.
```python theme={null}
@span(
    name: str,
    session_id: Optional[str] = None,
    session: Optional[Union[str, dict[str, str]]] = None,
    attributes: Optional[dict[str, Any]] = None,
    input_data: Optional[str] = None,
    output_data: Optional[str] = None,
    tags: Optional[dict[str, str]] = None
)
```
**Parameters:**
* `name` (str): Name of the span
* `session_id` (str, optional): **Deprecated** - Use `session` parameter instead
* `session` (Union\[str, dict], optional): Session information. Can be:
* A string containing the session ID
* A dict with `{"id": "...", "name": "..."}`
* `attributes` (dict, optional): Additional attributes to attach to the span
* `input_data` (str, optional): Manual input data override
* `output_data` (str, optional): Manual output data override
* `tags` (dict, optional): Tags to attach to the span
**Usage as Decorator:**
```python theme={null}
import zeroeval as ze
@ze.span(name="calculate_sum")
def add_numbers(a: int, b: int) -> int:
    return a + b  # Parameters and return value automatically captured

# With manual I/O
@ze.span(name="process_data", input_data="manual input", output_data="manual output")
def process():
    # Process logic here
    pass

# With session
@ze.span(name="user_action", session={"id": "123", "name": "John's Session"})
def user_action():
    pass
```
**Usage as Context Manager:**
```python theme={null}
import zeroeval as ze
with ze.span(name="data_processing") as current_span:
    result = process_data()
    current_span.set_io(input_data="input", output_data=str(result))
```
### `@experiment`
Decorator that attaches dataset and model information to a function.
```python theme={null}
@experiment(
    dataset: Optional[Dataset] = None,
    model: Optional[str] = None
)
```
**Parameters:**
* `dataset` (Dataset, optional): Dataset to use for the experiment
* `model` (str, optional): Model identifier
**Example:**
```python theme={null}
import zeroeval as ze
dataset = ze.Dataset.pull("my-dataset")

@ze.experiment(dataset=dataset, model="gpt-4")
def my_experiment():
    # Experiment logic
    pass
```
## Classes
### `Dataset`
A class to represent a named collection of dictionary records.
#### Constructor
```python theme={null}
Dataset(
    name: str,
    data: list[dict[str, Any]],
    description: Optional[str] = None
)
```
**Parameters:**
* `name` (str): The name of the dataset
* `data` (list\[dict]): A list of dictionaries containing the data
* `description` (str, optional): A description of the dataset
**Example:**
```python theme={null}
dataset = Dataset(
    name="Capitals",
    description="Country to capital mapping",
    data=[
        {"input": "France", "output": "Paris"},
        {"input": "Germany", "output": "Berlin"}
    ]
)
```
#### Methods
##### `push()`
Push the dataset to the backend, creating a new version if it already exists.
```python theme={null}
def push(self, create_new_version: bool = False) -> Dataset
```
**Parameters:**
* `self`: The Dataset instance
* `create_new_version` (bool, optional): Deprecated and kept for backward compatibility; new versions are created automatically when a dataset with the same name already exists. Defaults to False
**Returns:** Returns self for method chaining
##### `pull()`
Static method to pull a dataset from the backend.
```python theme={null}
@classmethod
def pull(
    cls,
    dataset_name: str,
    version_number: Optional[int] = None
) -> Dataset
```
**Parameters:**
* `cls`: The Dataset class itself (automatically provided when using `@classmethod`)
* `dataset_name` (str): The name of the dataset to pull from the backend
* `version_number` (int, optional): Specific version number to pull. If not provided, pulls the latest version
**Returns:** A new Dataset instance populated with data from the backend
##### `add_rows()`
Add new rows to the dataset.
```python theme={null}
def add_rows(self, new_rows: list[dict[str, Any]]) -> None
```
**Parameters:**
* `self`: The Dataset instance
* `new_rows` (list\[dict]): A list of dictionaries representing the rows to add
##### `add_image()`
Add an image to a specific row.
```python theme={null}
def add_image(
    self,
    row_index: int,
    column_name: str,
    image_path: str
) -> None
```
**Parameters:**
* `self`: The Dataset instance
* `row_index` (int): Index of the row to update (0-based)
* `column_name` (str): Name of the column to add the image to
* `image_path` (str): Path to the image file to add
##### `add_audio()`
Add audio to a specific row.
```python theme={null}
def add_audio(
    self,
    row_index: int,
    column_name: str,
    audio_path: str
) -> None
```
**Parameters:**
* `self`: The Dataset instance
* `row_index` (int): Index of the row to update (0-based)
* `column_name` (str): Name of the column to add the audio to
* `audio_path` (str): Path to the audio file to add
##### `add_media_url()`
Add a media URL to a specific row.
```python theme={null}
def add_media_url(
    self,
    row_index: int,
    column_name: str,
    media_url: str,
    media_type: str = "image"
) -> None
```
**Parameters:**
* `self`: The Dataset instance
* `row_index` (int): Index of the row to update (0-based)
* `column_name` (str): Name of the column to add the media URL to
* `media_url` (str): URL pointing to the media file
* `media_type` (str, optional): Type of media - "image", "audio", or "video". Defaults to "image"
#### Properties
* `name` (str): The name of the dataset
* `description` (str): The description of the dataset
* `columns` (list\[str]): List of all unique column names
* `data` (list\[dict]): List of the data portion for each row
* `backend_id` (str): The ID in the backend (after pushing)
* `version_id` (str): The version ID in the backend
* `version_number` (int): The version number in the backend
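When rows have differing keys, `columns` contains the union of all key names. The sketch below illustrates that behavior; it is not the SDK's implementation, and the first-seen ordering is an assumption — the docs only guarantee that all unique names are included:

```python
from typing import Any

def unique_columns(data: list[dict[str, Any]]) -> list[str]:
    # Union of every row's keys. First-seen ordering is an assumption
    # here; the SDK only guarantees that all unique names appear.
    seen: dict[str, None] = {}
    for row in data:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)

cols = unique_columns([
    {"input": "France", "output": "Paris"},
    {"input": "Germany", "output": "Berlin", "note": "added later"},
])
```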
#### Example
```python theme={null}
import zeroeval as ze
# Create a dataset
dataset = ze.Dataset(
    name="Capitals",
    description="Country to capital mapping",
    data=[
        {"input": "France", "output": "Paris"},
        {"input": "Germany", "output": "Berlin"}
    ]
)

# Push to backend
dataset.push()

# Pull from backend
dataset = ze.Dataset.pull("Capitals", version_number=1)

# Add rows
dataset.add_rows([{"input": "Italy", "output": "Rome"}])

# Add multimodal data
dataset.add_image(0, "flag", "flags/france.png")
dataset.add_audio(0, "anthem", "anthems/france.mp3")
dataset.add_media_url(0, "video_url", "https://example.com/video.mp4", "video")
```
### `Experiment`
Represents an experiment that runs a task on a dataset with optional evaluators.
#### Constructor
```python theme={null}
Experiment(
    dataset: Dataset,
    task: Callable[[Any], Any],
    evaluators: Optional[list[Callable[[Any, Any], Any]]] = None,
    name: Optional[str] = None,
    description: Optional[str] = None
)
```
**Parameters:**
* `dataset` (Dataset): The dataset to run the experiment on
* `task` (Callable): Function that processes each row and returns output
* `evaluators` (list\[Callable], optional): List of evaluator functions that take (row, output) and return evaluation result
* `name` (str, optional): Name of the experiment. Defaults to task function name
* `description` (str, optional): Description of the experiment. Defaults to task function's docstring
**Example:**
```python theme={null}
import zeroeval as ze
ze.init()

# Pull dataset
dataset = ze.Dataset.pull("Capitals")

# Define task
def capitalize_task(row):
    return row["input"].upper()

# Define evaluator
def exact_match(row, output):
    return row["output"].upper() == output

# Create and run experiment
exp = ze.Experiment(
    dataset=dataset,
    task=capitalize_task,
    evaluators=[exact_match],
    name="Capital Uppercase Test"
)
results = exp.run()

# Or run task and evaluators separately
results = exp.run_task()
exp.run_evaluators([exact_match], results)
```
#### Methods
##### `run()`
Run the complete experiment (task + evaluators).
```python theme={null}
def run(
    self,
    subset: Optional[list[dict]] = None
) -> list[ExperimentResult]
```
**Parameters:**
* `self`: The Experiment instance
* `subset` (list\[dict], optional): Subset of dataset rows to run the experiment on. If None, runs on entire dataset
**Returns:** List of experiment results for each row
##### `run_task()`
Run only the task without evaluators.
```python theme={null}
def run_task(
    self,
    subset: Optional[list[dict]] = None,
    raise_on_error: bool = False
) -> list[ExperimentResult]
```
**Parameters:**
* `self`: The Experiment instance
* `subset` (list\[dict], optional): Subset of dataset rows to run the task on. If None, runs on entire dataset
* `raise_on_error` (bool, optional): If True, raises exceptions encountered during task execution. If False, captures errors. Defaults to False
**Returns:** List of experiment results for each row
##### `run_evaluators()`
Run evaluators on existing results.
```python theme={null}
def run_evaluators(
    self,
    evaluators: Optional[list[Callable[[Any, Any], Any]]] = None,
    results: Optional[list[ExperimentResult]] = None
) -> list[ExperimentResult]
```
**Parameters:**
* `self`: The Experiment instance
* `evaluators` (list\[Callable], optional): List of evaluator functions to run. If None, uses evaluators from the Experiment instance
* `results` (list\[ExperimentResult], optional): List of results to evaluate. If None, uses results from the Experiment instance
**Returns:** The evaluated results
### `Span`
Represents a span in the tracing system. Usually created via the `@span` decorator.
#### Methods
##### `set_io()`
Set input and output data for the span.
```python theme={null}
def set_io(
    self,
    input_data: Optional[str] = None,
    output_data: Optional[str] = None
) -> None
```
**Parameters:**
* `self`: The Span instance
* `input_data` (str, optional): Input data to attach to the span. Will be converted to string if not already
* `output_data` (str, optional): Output data to attach to the span. Will be converted to string if not already
##### `set_tags()`
Set tags on the span.
```python theme={null}
def set_tags(self, tags: dict[str, str]) -> None
```
**Parameters:**
* `self`: The Span instance
* `tags` (dict\[str, str]): Dictionary of tags to set on the span
##### `set_attributes()`
Set attributes on the span.
```python theme={null}
def set_attributes(self, attributes: dict[str, Any]) -> None
```
**Parameters:**
* `self`: The Span instance
* `attributes` (dict\[str, Any]): Dictionary of attributes to set on the span
##### `set_error()`
Set error information for the span.
```python theme={null}
def set_error(
    self,
    code: str,
    message: str,
    stack: Optional[str] = None
) -> None
```
**Parameters:**
* `self`: The Span instance
* `code` (str): Error code or exception class name
* `message` (str): Error message
* `stack` (str, optional): Stack trace information
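These fields map naturally onto a caught exception. The helper below is an illustrative sketch (not part of the SDK) showing one way to derive `set_error()` arguments from an exception:

```python
import traceback


def exception_to_error_fields(exc: Exception) -> dict:
    """Derive set_error() arguments from a caught exception (illustrative)."""
    return {
        "code": type(exc).__name__,
        "message": str(exc),
        "stack": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


try:
    raise ValueError("row 3 is missing the 'output' column")
except ValueError as exc:
    fields = exception_to_error_fields(exc)
    # Inside a `with ze.span(...) as span:` block you would then call:
    # span.set_error(**fields)
```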
##### `add_screenshot()`
Attach a screenshot to the span for visual evaluation by LLM judges. Screenshots are uploaded during ingestion and can be evaluated alongside text data.
```python theme={null}
def add_screenshot(
    self,
    base64_data: str,
    viewport: str = "desktop",
    width: Optional[int] = None,
    height: Optional[int] = None,
    label: Optional[str] = None
) -> None
```
**Parameters:**
* `self`: The Span instance
* `base64_data` (str): Base64 encoded image data. Accepts raw base64 or data URL format (`data:image/png;base64,...`)
* `viewport` (str, optional): Viewport type - `"desktop"`, `"mobile"`, or `"tablet"`. Defaults to `"desktop"`
* `width` (int, optional): Image width in pixels
* `height` (int, optional): Image height in pixels
* `label` (str, optional): Human-readable description of the screenshot
**Example:**
```python theme={null}
import zeroeval as ze
with ze.span(name="browser_test", tags={"test": "visual"}) as span:
    # Capture and attach a desktop screenshot
    span.add_screenshot(
        base64_data=desktop_screenshot_base64,
        viewport="desktop",
        width=1920,
        height=1080,
        label="Homepage - Desktop"
    )

    # Also capture mobile view
    span.add_screenshot(
        base64_data=mobile_screenshot_base64,
        viewport="mobile",
        width=375,
        height=812,
        label="Homepage - iPhone"
    )

    span.set_io(
        input_data="Navigate to homepage",
        output_data="Captured viewport screenshots"
    )
```
##### `add_image()`
Attach a generic image to the span for visual evaluation. Use this for non-screenshot images like charts, diagrams, or UI component states.
```python theme={null}
def add_image(
    self,
    base64_data: str,
    label: Optional[str] = None,
    metadata: Optional[dict[str, Any]] = None
) -> None
```
**Parameters:**
* `self`: The Span instance
* `base64_data` (str): Base64 encoded image data. Accepts raw base64 or data URL format
* `label` (str, optional): Human-readable description of the image
* `metadata` (dict, optional): Additional metadata to store with the image
**Example:**
```python theme={null}
import zeroeval as ze
with ze.span(name="chart_generation") as span:
    # Generate a chart and attach it
    chart_base64 = generate_chart(data)
    span.add_image(
        base64_data=chart_base64,
        label="Monthly Revenue Chart",
        metadata={"chart_type": "bar", "data_points": 12}
    )

    span.set_io(
        input_data="Generate revenue chart for Q4",
        output_data="Chart generated with 12 data points"
    )
```
Images attached to spans can be evaluated by LLM judges configured for multimodal evaluation. See the [Multimodal Evaluation](/judges/multimodal-evaluation) guide for setup instructions.
## Context Functions
### `get_current_span()`
Returns the currently active span, if any.
```python theme={null}
def get_current_span() -> Optional[Span]
```
**Returns:** The currently active Span instance, or None if no span is active
### `get_current_trace()`
Returns the current trace ID.
```python theme={null}
def get_current_trace() -> Optional[str]
```
**Returns:** The current trace ID, or None if no trace is active
### `get_current_session()`
Returns the current session ID.
```python theme={null}
def get_current_session() -> Optional[str]
```
**Returns:** The current session ID, or None if no session is active
### `set_tag()`
Sets tags on a span, trace, or session.
```python theme={null}
def set_tag(
    target: Union[Span, str],
    tags: dict[str, str]
) -> None
```
**Parameters:**
* `target`: The target to set tags on
* `Span`: Sets tags on the specific span
* `str`: Sets tags on the trace (if valid trace ID) or session (if valid session ID)
* `tags` (dict\[str, str]): Dictionary of tags to set
**Example:**
```python theme={null}
import zeroeval as ze
# Set tags on current span
current_span = ze.get_current_span()
if current_span:
    ze.set_tag(current_span, {"user_id": "12345", "environment": "production"})

# Set tags on trace
trace_id = ze.get_current_trace()
if trace_id:
    ze.set_tag(trace_id, {"version": "1.5"})
```
### `set_signal()`
Send a signal to a span, trace, or session.
```python theme={null}
def set_signal(
    target: Union[Span, str],
    signals: dict[str, Union[str, bool, int, float]]
) -> bool
```
**Parameters:**
* `target`: The entity to attach signals to
* `Span`: Sends signals to the specific span
* `str`: Sends signals to the trace (if a valid trace ID) or session (if a valid session ID)
* `signals` (dict): Dictionary of signal names to values
**Returns:** True if signals were sent successfully, False otherwise
**Example:**
```python theme={null}
import zeroeval as ze
# Send signals to current span
current_span = ze.get_current_span()
if current_span:
    ze.set_signal(current_span, {
        "accuracy": 0.95,
        "is_successful": True,
        "error_count": 0
    })

# Send signals to trace
trace_id = ze.get_current_trace()
if trace_id:
    ze.set_signal(trace_id, {"model_score": 0.85})
```
## Judge Feedback APIs
### `send_feedback()`
Programmatically submit user feedback for a completion or judge evaluation.
```python theme={null}
def send_feedback(
    *,
    prompt_slug: str,
    completion_id: str,
    thumbs_up: bool,
    reason: Optional[str] = None,
    expected_output: Optional[str] = None,
    metadata: Optional[dict] = None,
    judge_id: Optional[str] = None,
    expected_score: Optional[float] = None,
    score_direction: Optional[str] = None,
    criteria_feedback: Optional[dict] = None
) -> dict
```
**Notes:**
* Existing usage without `criteria_feedback` is unchanged.
* `criteria_feedback` is optional and supported for scored judges.
* `judge_id` is required when sending `expected_score`, `score_direction`, or `criteria_feedback`.
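The `judge_id` rule above can be checked client-side before calling `send_feedback()`. The validator below is an illustrative sketch of that documented constraint, not part of the SDK:

```python
def validate_feedback_kwargs(**kwargs) -> None:
    """Raise if judge-level fields are sent without a judge_id.

    Mirrors the documented rule: judge_id is required when sending
    expected_score, score_direction, or criteria_feedback.
    """
    judge_fields = ("expected_score", "score_direction", "criteria_feedback")
    provided = [f for f in judge_fields if kwargs.get(f) is not None]
    if provided and not kwargs.get("judge_id"):
        raise ValueError(
            f"judge_id is required when sending: {', '.join(provided)}"
        )

# Plain thumbs feedback needs no judge_id
validate_feedback_kwargs(prompt_slug="support-bot", thumbs_up=True)
```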
### `get_judge_criteria()`
Fetch normalized criteria metadata for a judge (useful before criterion-level feedback).
```python theme={null}
def get_judge_criteria(
    project_id: str,
    judge_id: str
) -> dict
```
**Returns:**
* `judge_id`
* `evaluation_type`
* `score_min`, `score_max`, `pass_threshold`
* `criteria` (list of `{key, label, description}`)
## CLI Commands
The ZeroEval SDK includes a CLI tool for running experiments and setup.
### `zeroeval run`
Run a Python script containing ZeroEval experiments.
```bash theme={null}
zeroeval run script.py
```
### `zeroeval setup`
Interactive setup to configure API credentials.
```bash theme={null}
zeroeval setup
```
## Environment Variables
The SDK uses the following environment variables:
* `ZEROEVAL_API_KEY`: Your ZeroEval API key
* `ZEROEVAL_API_URL`: API endpoint URL (defaults to `https://api.zeroeval.com`)
* `ZEROEVAL_DEBUG`: Set to `true` to enable debug logging
* `ZEROEVAL_DISABLED_INTEGRATIONS`: Comma-separated list of integrations to disable
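For example, `ZEROEVAL_DISABLED_INTEGRATIONS=openai,langchain` would disable those two integrations. The sketch below shows one way such a value can be parsed; the SDK's exact normalization rules are an assumption here:

```python
from typing import Optional

def parse_disabled_integrations(value: Optional[str]) -> set[str]:
    # Split a comma-separated list, dropping blanks and normalizing case.
    # Illustrative only; the SDK's exact parsing may differ.
    if not value:
        return set()
    return {name.strip().lower() for name in value.split(",") if name.strip()}

disabled = parse_disabled_integrations("openai, LangChain,")
```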
# Setup
Source: https://docs.zeroeval.com/tracing/sdks/python/setup
Get started with ZeroEval tracing in Python applications
The [ZeroEval Python SDK](https://pypi.org/project/zeroeval/) provides seamless integration with your Python applications through automatic instrumentation and a simple decorator-based API.
## Installation
```bash pip theme={null}
pip install zeroeval
```
```bash poetry theme={null}
poetry add zeroeval
```
## Basic Setup
```python theme={null}
import zeroeval as ze
# Option 1: read ZEROEVAL_API_KEY from your environment
ze.init()

# Option 2: pass your API key directly
# (create one at https://app.zeroeval.com/settings?tab=api-keys)
ze.init(api_key="YOUR_API_KEY")
```
Run `zeroeval setup` once to save your API key securely to
`~/.config/zeroeval/config.json`
## Patterns
### Decorators
The `@span` decorator is the easiest way to add tracing:
```python theme={null}
import zeroeval as ze
@ze.span(name="fetch_data")
def fetch_data(user_id: str):
    # Function arguments are automatically captured as inputs
    # Return values are automatically captured as outputs
    return {"user_id": user_id, "name": "John Doe"}

@ze.span(name="process_data", attributes={"version": "1.0"})
def process_data(data: dict):
    # Add custom attributes for better filtering
    return f"Welcome, {data['name']}!"
```
### Context Manager
For more control over span lifecycles:
```python theme={null}
import zeroeval as ze
def complex_workflow():
    with ze.span(name="data_pipeline") as pipeline_span:
        # Fetch stage
        with ze.span(name="fetch_stage") as fetch_span:
            data = fetch_external_data()
            fetch_span.set_io(output_data=str(data))

        # Process stage
        with ze.span(name="process_stage") as process_span:
            processed = transform_data(data)
            process_span.set_io(
                input_data=str(data),
                output_data=str(processed)
            )

        # Save stage
        with ze.span(name="save_stage") as save_span:
            result = save_to_database(processed)
            save_span.set_io(output_data=f"Saved {result} records")
```
## Advanced Configuration
Fine-tune the tracer behavior:
```python theme={null}
from zeroeval.observability.tracer import tracer
# Configure tracer settings
tracer.configure(
    flush_interval=5.0,        # Flush every 5 seconds
    max_spans=200,             # Buffer up to 200 spans
    collect_code_details=True  # Capture source code context
)
```
## Context
Access current context information:
```python theme={null}
# Get the current span
current_span = ze.get_current_span()
# Get the current trace ID
trace_id = ze.get_current_trace()
# Get the current session ID
session_id = ze.get_current_session()
```
## CLI Tooling
The Python SDK includes helpful CLI commands:
```bash theme={null}
# Save your API key securely
zeroeval setup
# Run scripts with automatic tracing
zeroeval run my_script.py
```
# Integrations
Source: https://docs.zeroeval.com/tracing/sdks/typescript/integrations
Automatic tracing for popular AI/ML libraries
The ZeroEval TypeScript SDK provides automatic tracing for popular AI libraries through the `wrap()` function.
## OpenAI
Wrap your OpenAI client to automatically trace all API calls:
```typescript theme={null}
import { OpenAI } from 'openai';
import * as ze from 'zeroeval';
ze.init();
const openai = ze.wrap(new OpenAI());
// Chat completions are automatically traced
const completion = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }]
});

// Streaming is also automatically traced
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
### Supported Methods
The OpenAI integration automatically traces:
* `chat.completions.create()` (streaming and non-streaming)
* `embeddings.create()`
* `images.generate()`, `images.edit()`, `images.createVariation()`
* `audio.transcriptions.create()`, `audio.translations.create()`
## Vercel AI SDK
Wrap the Vercel AI SDK module to trace all AI operations:
```typescript theme={null}
import * as ai from 'ai';
import { openai } from '@ai-sdk/openai';
import * as ze from 'zeroeval';
ze.init();
const wrappedAI = ze.wrap(ai);
// Text generation
const { text } = await wrappedAI.generateText({
  model: openai('gpt-4'),
  prompt: 'Write a haiku about coding'
});

// Streaming
const { textStream } = await wrappedAI.streamText({
  model: openai('gpt-4'),
  messages: [{ role: 'user', content: 'Hello!' }]
});

for await (const delta of textStream) {
  process.stdout.write(delta);
}

// Structured output
import { z } from 'zod';

const { object } = await wrappedAI.generateObject({
  model: openai('gpt-4'),
  schema: z.object({
    name: z.string(),
    age: z.number()
  }),
  prompt: 'Generate a random person'
});
```
### Supported Methods
The Vercel AI SDK integration automatically traces:
* `generateText()`, `streamText()`
* `generateObject()`, `streamObject()`
* `embed()`, `embedMany()`
* `generateImage()`
* `transcribe()`
* `generateSpeech()`
## LangChain / LangGraph
Use the callback handler for LangChain and LangGraph applications:
```typescript theme={null}
import {
ZeroEvalCallbackHandler,
setGlobalCallbackHandler
} from 'zeroeval/langchain';
// Option 1: Set globally (recommended)
setGlobalCallbackHandler(new ZeroEvalCallbackHandler());
// All chain invocations are now automatically traced
const result = await chain.invoke({ topic: 'AI' });
```
```typescript theme={null}
import { ZeroEvalCallbackHandler } from 'zeroeval/langchain';
// Option 2: Per-invocation
const handler = new ZeroEvalCallbackHandler();
const result = await chain.invoke(
  { topic: 'AI' },
  { callbacks: [handler] }
);
```
## Auto-Detection
The `wrap()` function automatically detects which client you're wrapping:
```typescript theme={null}
import { OpenAI } from 'openai';
import * as ai from 'ai';
import * as ze from 'zeroeval';
ze.init();
// Automatically detected as OpenAI client
const openai = ze.wrap(new OpenAI());
// Automatically detected as Vercel AI SDK
const wrappedAI = ze.wrap(ai);
```
If `ze.init()` hasn't been called and `ZEROEVAL_API_KEY` is set in your environment, the SDK will automatically initialize when you first use `wrap()`.
## Using with Prompts
The integrations automatically extract ZeroEval metadata from prompts created with `ze.prompt()`:
```typescript theme={null}
import { OpenAI } from 'openai';
import * as ze from 'zeroeval';
ze.init();
const openai = ze.wrap(new OpenAI());
// Create a version-tracked prompt
const systemPrompt = await ze.prompt({
  name: 'customer-support',
  content: 'You are a helpful customer support agent for {{company}}.',
  variables: { company: 'TechCorp' }
});

// The integration automatically:
// 1. Extracts the prompt metadata
// 2. Links the completion to the prompt version
// 3. Patches the model if one is bound to the prompt version
const response = await openai.chat.completions.create({
  model: 'gpt-4', // May be replaced by bound model
  messages: [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: 'I need help with my order' }
  ]
});
```
Need help? Check out our [GitHub examples](https://github.com/zeroeval/zeroeval-ts-sdk/tree/main/examples) or reach out on [Discord](https://discord.gg/MuExkGMNVz).
# Reference
Source: https://docs.zeroeval.com/tracing/sdks/typescript/reference
Complete API reference for the TypeScript SDK
## Installation
```bash theme={null}
npm install zeroeval
```
## Core Functions
### `init()`
Initializes the ZeroEval SDK. Must be called before using any other SDK features.
```typescript theme={null}
function init(opts?: InitOptions): void
```
#### Parameters
| Option | Type | Default | Description |
| -------------------- | ------------------------- | -------------------------- | --------------------------------------- |
| `apiKey` | `string` | `ZEROEVAL_API_KEY` env | Your ZeroEval API key |
| `apiUrl` | `string` | `https://api.zeroeval.com` | Custom API URL |
| `flushInterval` | `number` | `10` | Interval in seconds to flush spans |
| `maxSpans` | `number` | `100` | Maximum spans to buffer before flushing |
| `collectCodeDetails` | `boolean` | `true` | Capture source code context |
| `integrations`       | `Record<string, boolean>` | —                          | Enable/disable specific integrations    |
| `debug` | `boolean` | `false` | Enable debug logging |
#### Example
```typescript theme={null}
import * as ze from 'zeroeval';
ze.init({
  apiKey: 'your-api-key',
  debug: true
});
```
***
## Wrapper Functions
### `wrap()`
Wraps a supported AI client to automatically trace all API calls.
```typescript theme={null}
function wrap<T>(client: T): WrappedClient<T>
```
#### Supported Clients
* OpenAI SDK (`openai` package)
* Vercel AI SDK (`ai` package)
#### Examples
```typescript theme={null}
// OpenAI
import { OpenAI } from 'openai';
import * as ze from 'zeroeval';
const openai = ze.wrap(new OpenAI());
// Vercel AI SDK
import * as ai from 'ai';
import * as ze from 'zeroeval';
const wrappedAI = ze.wrap(ai);
```
***
## Spans API
### `withSpan()`
Wraps a function execution in a span, automatically capturing timing and errors.
```typescript theme={null}
function withSpan<T>(
  opts: SpanOptions,
  fn: () => Promise<T> | T
): Promise<T> | T
```
#### SpanOptions
| Option | Type | Required | Description |
| ------------- | ------------------------- | -------- | ------------------------------------- |
| `name` | `string` | Yes | Name of the span |
| `sessionId` | `string` | No | Session ID to associate with the span |
| `sessionName` | `string` | No | Human-readable session name |
| `tags`        | `Record<string, string>`  | No       | Tags to attach to the span            |
| `attributes`  | `Record<string, unknown>` | No       | Additional attributes                 |
| `inputData` | `any` | No | Manual input data override |
| `outputData` | `any` | No | Manual output data override |
#### Example
```typescript theme={null}
import * as ze from 'zeroeval';
const result = await ze.withSpan(
  { name: 'fetch-user-data' },
  async () => {
    const user = await fetchUser(userId);
    return user;
  }
);
```
### `@span` Decorator
Decorator for class methods to automatically create spans.
```typescript theme={null}
span(opts: SpanOptions): MethodDecorator
```
#### Example
```typescript theme={null}
import * as ze from 'zeroeval';
class UserService {
  @ze.span({ name: 'get-user' })
  async getUser(id: string): Promise<User> {
    return await db.users.findById(id);
  }
}
```
Requires `experimentalDecorators: true` in your `tsconfig.json`.
***
## Context Functions
### `getCurrentSpan()`
Returns the currently active span, if any.
```typescript theme={null}
function getCurrentSpan(): Span | undefined
```
### `getCurrentTrace()`
Returns the current trace ID.
```typescript theme={null}
function getCurrentTrace(): string | undefined
```
### `getCurrentSession()`
Returns the current session ID.
```typescript theme={null}
function getCurrentSession(): string | undefined
```
### `setTag()`
Sets tags on a span, trace, or session.
```typescript theme={null}
function setTag(
  target: Span | string | undefined,
  tags: Record<string, string>
): void
```
#### Parameters
| Parameter | Description |
| ----------- | --------------------------------------- |
| `Span` | Sets tags on the specific span |
| `string` | Sets tags on the trace or session by ID |
| `undefined` | Sets tags on the current span |
***
## Prompts API
### `prompt()`
Creates or fetches versioned prompts from the Prompt Library. Returns decorated content for downstream LLM calls.
```typescript theme={null}
async function prompt(options: PromptOptions): Promise<string>
```
#### PromptOptions
| Option | Type | Required | Description |
| ----------- | ------------------------ | -------- | -------------------------------------------------------------------- |
| `name` | `string` | Yes | Task name associated with the prompt |
| `content` | `string` | No | Raw prompt content (used as fallback or for explicit mode) |
| `variables` | `Record<string, string>` | No       | Template variables to interpolate `{{variable}}` tokens              |
| `from` | `string` | No | Version control: `"latest"`, `"explicit"`, or a 64-char SHA-256 hash |
#### Behavior
* **Auto-optimization (default)**: If `content` is provided without `from`, tries to fetch the latest optimized version first, falls back to provided content
* **Explicit mode** (`from: "explicit"`): Always uses provided `content`, bypasses auto-optimization
* **Latest mode** (`from: "latest"`): Requires an optimized version to exist, fails if none found
* **Hash mode** (`from: "<hash>"`): Fetches a specific version by its 64-character SHA-256 content hash
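A version hash is a 64-character hex digest. The snippet below shows what such a digest looks like; whether the platform hashes exactly the raw content string is an assumption here — in practice, copy version hashes from the ZeroEval dashboard:

```typescript
import { createHash } from 'node:crypto';

// Illustrative: derive a 64-char SHA-256 hex digest from prompt content.
function contentHash(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

const hash = contentHash('You are a helpful assistant.');
```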
#### Examples
```typescript theme={null}
import * as ze from 'zeroeval';
// Auto-optimization mode (recommended)
const prompt = await ze.prompt({
  name: 'customer-support',
  content: 'You are a helpful {{role}} assistant.',
  variables: { role: 'customer service' }
});

// Explicit mode - bypass auto-optimization
const prompt = await ze.prompt({
  name: 'customer-support',
  content: 'You are a helpful assistant.',
  from: 'explicit'
});

// Latest mode - require optimized version
const prompt = await ze.prompt({
  name: 'customer-support',
  from: 'latest'
});

// Hash mode - specific version
const prompt = await ze.prompt({
  name: 'customer-support',
  from: 'a1b2c3d4e5f6...' // 64-char SHA-256 hash
});
```
#### Return Value
Returns a decorated prompt string with metadata header used by integrations:
```
{"task":"...", "prompt_version": 1, ...}Your prompt content here
```
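The wrapped integrations strip this header before the content reaches the model. If you ever need to separate the two yourself, a brace-matching split like the following works for the format shown above; it is an illustrative helper, not an SDK export:

```typescript
// Split a decorated prompt into its leading JSON metadata object and the
// content that follows it. Illustrative only; integrations do this internally.
function splitDecoratedPrompt(
  decorated: string
): { meta: Record<string, unknown>; content: string } {
  if (!decorated.startsWith('{')) return { meta: {}, content: decorated };
  let depth = 0;
  let inString = false;
  for (let i = 0; i < decorated.length; i++) {
    const ch = decorated[i];
    if (inString) {
      if (ch === '\\') i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) {
      return {
        meta: JSON.parse(decorated.slice(0, i + 1)),
        content: decorated.slice(i + 1),
      };
    }
  }
  return { meta: {}, content: decorated };
}

const { meta, content } = splitDecoratedPrompt(
  '{"task":"customer-support","prompt_version":1}You are a helpful assistant.'
);
```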
#### Errors
| Error | When |
| --------------------- | -------------------------------------------------------------------------- |
| `Error` | Both `content` and `from` provided (except `from: "explicit"`), or neither |
| `PromptRequestError` | `from: "latest"` but no versions exist |
| `PromptNotFoundError` | `from` is a hash that does not exist |
***
### `sendFeedback()`
Sends feedback for a completion to enable prompt optimization.
```typescript theme={null}
async function sendFeedback(options: SendFeedbackOptions): Promise<void>
```
#### SendFeedbackOptions
| Option | Type | Required | Description |
| ---------------- | ------------------------- | -------- | ----------------------------------------- |
| `promptSlug` | `string` | Yes | The slug of the prompt (task name) |
| `completionId` | `string` | Yes | UUID of the span/completion |
| `thumbsUp` | `boolean` | Yes | `true` for positive, `false` for negative |
| `reason` | `string` | No | Explanation of the feedback |
| `expectedOutput` | `string` | No | What the expected output should be |
| `metadata`       | `Record<string, unknown>` | No       | Additional metadata                       |
| `judgeId` | `string` | No | Judge automation ID for judge feedback |
| `expectedScore` | `number` | No | Expected score for scored judges |
| `scoreDirection` | `'too_high' \| 'too_low'` | No | Score direction for scored judges |
#### Example
```typescript theme={null}
import * as ze from 'zeroeval';
await ze.sendFeedback({
promptSlug: 'support-bot',
completionId: '550e8400-e29b-41d4-a716-446655440000',
thumbsUp: false,
reason: 'Response was too verbose',
expectedOutput: 'A concise 2-3 sentence response'
});
```
***
## Signals API
### `sendSignal()`
Send a signal to a specific entity.
```typescript theme={null}
async function sendSignal(
entityType: 'session' | 'trace' | 'span' | 'completion',
entityId: string,
name: string,
value: string | boolean | number,
signalType?: 'boolean' | 'numerical'
): Promise<void>
```
### `sendTraceSignal()`
Send a signal to the current trace.
```typescript theme={null}
function sendTraceSignal(
name: string,
value: string | boolean | number,
signalType?: 'boolean' | 'numerical'
): void
```
### `sendSessionSignal()`
Send a signal to the current session.
```typescript theme={null}
function sendSessionSignal(
name: string,
value: string | boolean | number,
signalType?: 'boolean' | 'numerical'
): void
```
### `sendSpanSignal()`
Send a signal to the current span.
```typescript theme={null}
function sendSpanSignal(
name: string,
value: string | boolean | number,
signalType?: 'boolean' | 'numerical'
): void
```
### `getEntitySignals()`
Retrieve signals for a specific entity.
```typescript theme={null}
async function getEntitySignals(
entityType: 'session' | 'trace' | 'span' | 'completion',
entityId: string
): Promise<Record<string, Signal>>
```
#### Example
```typescript theme={null}
import * as ze from 'zeroeval';
await ze.withSpan({ name: 'process-request' }, async () => {
// Process something...
// Send signals
ze.sendSpanSignal('success', true);
ze.sendSpanSignal('latency_ms', 150);
ze.sendTraceSignal('user_satisfied', true);
});
```
***
## Utility Functions
### `renderTemplate()`
Render a template string with variable substitution.
```typescript theme={null}
function renderTemplate(
template: string,
variables: Record<string, string>,
options?: { ignoreMissing?: boolean }
): string
```
### `extractVariables()`
Extract variable names from a template string.
```typescript theme={null}
function extractVariables(template: string): Set<string>
```
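As an illustration of how these two utilities relate, the sketches below implement the same behavior, assuming a `{{variable}}` placeholder syntax (the exact template grammar is defined by the SDK, so treat these as approximations, not the real implementations):

```typescript
// Illustrative sketches only -- the real implementations live in the SDK.
// Assumes {{variable}} placeholder syntax.
function renderTemplateSketch(
  template: string,
  variables: Record<string, string>,
  options: { ignoreMissing?: boolean } = {}
): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) => {
    if (name in variables) return variables[name];
    if (options.ignoreMissing) return match; // leave the placeholder untouched
    throw new Error(`Missing template variable: ${name}`);
  });
}

function extractVariablesSketch(template: string): Set<string> {
  const names = new Set<string>();
  for (const match of template.matchAll(/\{\{\s*(\w+)\s*\}\}/g)) {
    names.add(match[1]);
  }
  return names;
}
```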
### `sha256Hex()`
Compute SHA-256 hash of text.
```typescript theme={null}
async function sha256Hex(text: string): Promise<string>
```
### `normalizePromptText()`
Normalize prompt text for consistent hashing.
```typescript theme={null}
function normalizePromptText(text: string): string
```
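The digest itself is standard SHA-256, so in Node you can reproduce it synchronously with `node:crypto` when you need to compare hashes outside the SDK. This is an independent sketch, not the SDK's `sha256Hex()` (which is async), and it assumes the input has already been normalized:

```typescript
import { createHash } from 'node:crypto';

// Standard SHA-256 hex digest -- should match what sha256Hex() returns
// for the same (already-normalized) input text.
function sha256HexSync(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}
```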
***
## Error Classes
### `PromptNotFoundError`
Thrown when a specific prompt version (by hash) is not found.
```typescript theme={null}
class PromptNotFoundError extends Error {
constructor(message: string)
}
```
### `PromptRequestError`
Thrown when a prompt request fails (e.g., no versions exist for `from: "latest"`).
```typescript theme={null}
class PromptRequestError extends Error {
constructor(message: string, statusCode?: number)
}
```
***
## Types
### `Prompt`
```typescript theme={null}
interface Prompt {
id: string;
prompt_id: string;
content: string;
content_hash: string;
version: number;
model_id?: string;
}
```
### `PromptMetadata`
```typescript theme={null}
interface PromptMetadata {
task: string;
prompt_slug?: string;
prompt_version?: number;
prompt_version_id?: string;
content_hash?: string;
variables?: Record<string, string>;
}
```
### `Signal`
```typescript theme={null}
interface Signal {
value: string | boolean | number;
type: 'boolean' | 'numerical';
}
```
Need help? Check out our [GitHub examples](https://github.com/zeroeval/zeroeval-ts-sdk/tree/main/examples) or reach out on [Discord](https://discord.gg/MuExkGMNVz).
# Setup
Source: https://docs.zeroeval.com/tracing/sdks/typescript/setup
Get started with ZeroEval tracing in TypeScript and JavaScript applications
The [ZeroEval TypeScript SDK](https://www.npmjs.com/package/zeroeval) provides tracing for Node.js applications through wrapper functions and integration callbacks.
## Installation
```bash npm theme={null}
npm install zeroeval
```
```bash yarn theme={null}
yarn add zeroeval
```
```bash pnpm theme={null}
pnpm add zeroeval
```
## Basic Setup
```typescript theme={null}
import * as ze from 'zeroeval';
// Option 1: read ZEROEVAL_API_KEY from an environment variable
ze.init();
// Option 2: Provide API key directly
ze.init({ apiKey: 'YOUR_API_KEY' });
// Option 3: With additional configuration
ze.init({
apiKey: 'YOUR_API_KEY',
apiUrl: 'https://api.zeroeval.com', // optional
flushInterval: 10, // seconds
maxSpans: 100,
});
```
## Patterns
The SDK offers two ways to add tracing to your TypeScript/JavaScript code:
### Function Wrapping
Use `withSpan()` to wrap function executions:
```typescript theme={null}
import * as ze from 'zeroeval';
// Wrap synchronous functions
const fetchData = (userId: string) =>
ze.withSpan({ name: 'fetch_data' }, () => ({
userId,
name: 'John Doe'
}));
// Wrap async functions
const processData = async (data: { name: string }) =>
ze.withSpan(
{
name: 'process_data',
attributes: { version: '1.0' }
},
async () => {
const result = await transform(data);
return `Welcome, ${result.name}!`;
}
);
// Complex workflows with nested spans
async function complexWorkflow() {
return ze.withSpan({ name: 'data_pipeline' }, async () => {
const data = await ze.withSpan(
{ name: 'fetch_stage' },
fetchExternalData
);
const processed = await ze.withSpan(
{ name: 'process_stage' },
() => transformData(data)
);
const result = await ze.withSpan(
{ name: 'save_stage' },
() => saveToDatabase(processed)
);
return result;
});
}
```
### Decorators
Use the `@span` decorator for class methods:
```typescript theme={null}
import { span } from 'zeroeval';
class DataService {
@span({
name: 'fetch_user_data',
tags: { service: 'user_api' }
})
async fetchUser(userId: string) {
const response = await fetch(`/api/users/${userId}`);
return response.json();
}
@span({
name: 'process_order',
attributes: { version: '2.0' }
})
processOrder(orderId: string, items: string[]) {
return { orderId, processed: true };
}
}
```
**Decorators require TypeScript configuration**: Enable `experimentalDecorators` in your `tsconfig.json`:
```json theme={null}
{
"compilerOptions": {
"experimentalDecorators": true
}
}
```
When using runtime tools like `tsx` or `ts-node`, pass the `--experimental-decorators` flag.
## Sessions
Group related spans into sessions:
```typescript theme={null}
import { v4 as uuidv4 } from 'uuid';
import * as ze from 'zeroeval';
const sessionId = uuidv4();
async function userJourney(userId: string) {
return ze.withSpan(
{
name: 'user_journey',
sessionId: sessionId,
sessionName: 'User Onboarding'
},
async () => {
// All nested spans inherit the session
await ze.withSpan({ name: 'step_1' }, () => welcome(userId));
await ze.withSpan({ name: 'step_2' }, () => setupProfile(userId));
await ze.withSpan({ name: 'step_3' }, () => sendConfirmation(userId));
}
);
}
```
## Context
Access current context information:
```typescript theme={null}
import * as ze from 'zeroeval';
// Get the current span
const currentSpan = ze.getCurrentSpan();
// Get the current trace ID
const traceId = ze.getCurrentTrace();
// Get the current session ID
const sessionId = ze.getCurrentSession();
```
## Tagging
Attach tags for filtering and organization:
```typescript theme={null}
import * as ze from 'zeroeval';
// Set tags on the current span
ze.setTag(undefined, { user_id: '12345', environment: 'production' });
// Set tags on a specific trace
const traceId = ze.getCurrentTrace();
if (traceId) {
ze.setTag(traceId, { feature: 'checkout' });
}
// Set tags on a span object
const span = ze.getCurrentSpan();
if (span) {
ze.setTag(span, { action: 'process_payment' });
}
```
## Advanced Configuration
Fine-tune the SDK behavior:
```typescript theme={null}
import * as ze from 'zeroeval';
ze.init({
apiKey: 'YOUR_API_KEY',
apiUrl: 'https://api.zeroeval.com',
flushInterval: 5, // Flush every 5 seconds
maxSpans: 200, // Buffer up to 200 spans
collectCodeDetails: true, // Capture source code context
debug: false, // Enable debug logging
integrations: {
openai: true, // Enable OpenAI integration
vercelAI: true, // Enable Vercel AI SDK integration
}
});
```
Need help? Check out our [GitHub examples](https://github.com/zeroeval/zeroeval-ts-sdk/tree/main/examples) or reach out on [Discord](https://discord.gg/MuExkGMNVz).
# Sessions
Source: https://docs.zeroeval.com/tracing/sessions
Group related spans into sessions for better organization and analysis
Sessions provide a powerful way to group related spans together, making it easier to track and analyze complex workflows, user interactions, or multi-step processes. This guide covers everything you need to know about working with sessions.
For complete API documentation, see the [Python SDK Reference](/tracing/sdks/python/reference).
## Creating Sessions
### Basic Session with ID
The simplest way to create a session is by providing a session ID:
```python theme={null}
import uuid
import zeroeval as ze
# Generate a unique session ID
session_id = str(uuid.uuid4())
@ze.span(name="process_request", session=session_id)
def process_request(data):
# This span belongs to the session
return transform_data(data)
```
### Named Sessions
For better organization in the ZeroEval dashboard, you can provide both an ID and a descriptive name:
```python theme={null}
@ze.span(
name="user_interaction",
session={
"id": session_id,
"name": "Customer Support Chat - User #12345"
}
)
def handle_support_chat(user_id, message):
# Process the support request
return generate_response(message)
```
## Session Inheritance
Child spans automatically inherit the session from their parent span:
```python theme={null}
session_info = {
"id": str(uuid.uuid4()),
"name": "Order Processing Pipeline"
}
@ze.span(name="process_order", session=session_info)
def process_order(order_id):
# These nested calls automatically belong to the same session
validate_order(order_id)
charge_payment(order_id)
fulfill_order(order_id)
@ze.span(name="validate_order")
def validate_order(order_id):
# Automatically part of the parent's session
return check_inventory(order_id)
@ze.span(name="charge_payment")
def charge_payment(order_id):
# Also inherits the session
return process_payment(order_id)
```
## Advanced Session Patterns
### Multi-Agent RAG System
Track complex retrieval-augmented generation workflows with multiple specialized agents:
```python theme={null}
session = {
"id": str(uuid.uuid4()),
"name": "Multi-Agent RAG Pipeline"
}
@ze.span(name="rag_coordinator", session=session)
async def process_query(query):
# Retrieval
docs = await retrieval_agent(query)
# Reranking
ranked = await reranking_agent(query, docs)
# Generation
response = await generation_agent(query, ranked)
return response
@ze.span(name="retrieval_agent")
async def retrieval_agent(query):
# Inherits session from parent
embeddings = await embed(query)
return await vector_search(embeddings)
@ze.span(name="generation_agent")
async def generation_agent(query, context):
return await llm.generate(query, context)
```
### Conversational AI Session
Track a complete conversation with an AI assistant:
```python theme={null}
class ChatSession:
def __init__(self, user_id):
self.session = {
"id": f"chat-{user_id}-{uuid.uuid4()}",
"name": f"AI Chat - User {user_id}"
}
self.history = []
@ze.span(name="process_message", session=lambda self: self.session)
async def process_message(self, message):
# Add to history
self.history.append({"role": "user", "content": message})
# Generate response
response = await self.generate_response()
self.history.append({"role": "assistant", "content": response})
return response
@ze.span(name="generate_response", session=lambda self: self.session)
async def generate_response(self):
return await llm.chat(self.history)
```
### Batch LLM Processing
Process multiple documents with LLMs in a single session:
```python theme={null}
async def batch_summarize(documents):
session = {
"id": f"batch-{uuid.uuid4()}",
"name": f"Batch Summarization - {len(documents)} docs"
}
@ze.span(name="batch_processor", session=session)
async def process():
summaries = []
for i, doc in enumerate(documents):
with ze.span(name=f"summarize_doc_{i}", session=session) as span:
try:
summary = await llm.summarize(doc)
span.set_io(
input_data=f"Doc: {doc['title']}",
output_data=summary[:100]
)
summaries.append(summary)
except Exception as e:
span.set_error(
code=type(e).__name__,
message=str(e)
)
return summaries
return await process()
```
## Context Manager Sessions
You can also use sessions with the context manager pattern:
```python theme={null}
session_info = {
"id": str(uuid.uuid4()),
"name": "Data Pipeline Run"
}
with ze.span(name="etl_pipeline", session=session_info) as pipeline_span:
# Extract phase
with ze.span(name="extract_data") as extract_span:
raw_data = fetch_from_source()
extract_span.set_io(output_data=f"Extracted {len(raw_data)} records")
# Transform phase
with ze.span(name="transform_data") as transform_span:
clean_data = transform_records(raw_data)
transform_span.set_io(
input_data=f"{len(raw_data)} raw records",
output_data=f"{len(clean_data)} clean records"
)
# Load phase
with ze.span(name="load_data") as load_span:
result = save_to_destination(clean_data)
load_span.set_io(output_data=f"Loaded to {result['location']}")
```
# Signals
Source: https://docs.zeroeval.com/tracing/signals
Capture real-world feedback and metrics to enrich your traces, spans, and sessions.
Signals are any piece of user feedback, behavior, or metric you care about – thumbs-up, a 5-star rating, dwell time, task completion, error rates … you name it. Signals help you understand how your AI system performs in the real world by connecting user outcomes to your traces.
You can attach signals to:
* **Completions** (LLM responses)
* **Spans** (individual operations)
* **Sessions** (user interactions)
* **Traces** (entire request flows)
For complete signals API documentation, see the [Python SDK Reference](/tracing/sdks/python/reference#signals).
## Using signals in code
### With the Python SDK
```python theme={null}
import zeroeval as ze
# Initialize the tracer
ze.init(api_key="your-api-key")
# Start a span and add a signal
with ze.trace("user_query") as span:
# Your AI logic here
response = process_user_query(query)
# Add a signal to the current span
ze.set_signal("user_satisfaction", True)
ze.set_signal("response_quality", 4.5)
ze.set_signal("task_completed", "success")
```
### Setting signals on different targets
```python theme={null}
# On the current span
ze.set_signal("helpful", True)
# On a specific span
span = ze.current_span()
ze.set_signal(span, {"rating": 5, "category": "excellent"})
# On the current trace
ze.set_trace_signal("conversion", True)
# On the current session
ze.set_session_signal("user_engaged", True)
```
## API endpoint
For direct API calls, send signals to:
```
POST https://api.zeroeval.com/workspaces/{workspace_id}/signals
```
Auth is the same bearer API key you use for tracing.
### Payload schema
| field | type | required | notes |
| -------------- | ------------------------------ | -------- | ---------------------------------------------- |
| completion\_id | string | ❌ | **OpenAI completion ID** (for LLM completions) |
| span\_id | string | ❌ | **Span ID** (for specific spans) |
| trace\_id | string | ❌ | **Trace ID** (for entire traces) |
| session\_id | string | ❌ | **Session ID** (for user sessions) |
| name | string | ✅ | e.g. `user_satisfaction` |
| value | string \| bool \| int \| float | ✅ | your data – see examples below |
You must provide at least one of: `completion_id`, `span_id`, `trace_id`, or
`session_id`.
## Common signal patterns
Below are some quick copy-paste snippets for the most common cases.
### 1. Binary feedback (👍 / 👎)
```python Python SDK theme={null}
import zeroeval as ze
# On current span
ze.set_signal("thumbs_up", True)
# On specific span
ze.set_signal(span, {"helpful": False})
```
```python API theme={null}
import requests
payload = {
"span_id": span.id,
"name": "thumbs_up",
    "value": True  # or False
}
requests.post(
f"https://api.zeroeval.com/workspaces/{WORKSPACE_ID}/signals",
json=payload,
headers={"Authorization": f"Bearer {ZE_API_KEY}"}
)
```
### 2. Star rating (1–5)
```python theme={null}
ze.set_signal("star_rating", 4)
```
### 3. Continuous metrics
```python theme={null}
# Response time
ze.set_signal("response_time_ms", 1250.5)
# Task completion time
ze.set_signal("time_on_task_sec", 12.85)
# Accuracy score
ze.set_signal("accuracy", 0.94)
```
### 4. Categorical outcomes
```python theme={null}
ze.set_signal("task_status", "success")
ze.set_signal("error_type", "timeout")
ze.set_signal("user_intent", "purchase")
```
### 5. Session-level signals
```python theme={null}
# Track user engagement across an entire session
ze.set_session_signal("pages_visited", 5)
ze.set_session_signal("converted", True)
ze.set_session_signal("user_tier", "premium")
```
### 6. Trace-level signals
```python theme={null}
# Track outcomes for an entire request flow
ze.set_trace_signal("request_successful", True)
ze.set_trace_signal("total_cost", 0.045)
ze.set_trace_signal("model_used", "gpt-4o")
```
## Signal types
Signals are automatically categorized based on their values:
* **Boolean**: `true`/`false` values → useful for success/failure, yes/no feedback
* **Numerical**: integers and floats → useful for ratings, scores, durations, costs
* **Categorical**: strings → useful for status, categories, error types
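The mapping is driven entirely by the value you pass. As a rough sketch (illustrative only, not the SDK's actual code), the rule looks like this:

```python
def categorize_signal(value):
    """Illustrative sketch of how signal values map to signal types."""
    # bool must be checked before int: in Python, bool is a subclass of int,
    # so isinstance(True, int) is also True.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numerical"
    return "categorical"
```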
## Putting it all together
```python theme={null}
import zeroeval as ze
# Initialize tracing
ze.init(api_key="your-api-key")
# Start a session for user interaction
with ze.trace("user_chat_session", session_name="Customer Support") as session:
# Process user query
with ze.trace("process_query") as span:
response = llm_client.chat.completions.create(...)
# Signal on the LLM completion
ze.set_signal("response_generated", True)
ze.set_signal("response_length", len(response.choices[0].message.content))
# Capture user feedback
user_rating = get_user_feedback() # Your feedback collection logic
# Signal on the session
ze.set_session_signal("user_rating", user_rating)
ze.set_session_signal("issue_resolved", user_rating >= 4)
# Signal on the entire trace
ze.set_trace_signal("interaction_complete", True)
```
That's it! Your signals will appear in the ZeroEval dashboard, helping you understand how your AI system performs in real-world scenarios.
# Tags
Source: https://docs.zeroeval.com/tracing/tagging
Simple ways to attach rich, query-able tags to your traces.
Tags are key–value pairs that can be attached to any **span**, **trace**, or **session**. They power the facet filters in the console so you can slice-and-dice your telemetry by *user*, *plan*, *model*, *tenant*, or anything else that matters to your business.
For complete tagging API documentation, see the [Python SDK Reference](/tracing/sdks/python/reference#tags).
## 1. Tag once, inherit everywhere
When you add a `tags` dictionary to the **first** span you create, every child span automatically gets the same tags. That means you set them once and they flow down the entire call-stack.
```python theme={null}
import zeroeval as ze
@ze.span(
name="handle_request",
tags={
"user_id": "42", # who triggered the request
"tenant": "acme-corp", # multi-tenant identifier
"plan": "enterprise" # commercial plan
}
)
def handle_request():
authenticate()
fetch_data()
process()
# Two nested child spans – they automatically inherit *all* the tags
with ze.span(name="fetch_data"):
...
with ze.span(name="process", tags={"stage": "post"}):
...
```
## 2. Tag a single span
If you want to tag only a **single** span (or override a tag inherited from a parent) simply provide the `tags` argument on that specific decorator or context manager.
```python theme={null}
import zeroeval as ze
@ze.span(name="top_level")
def top_level():
# Child span with its own tags – *not* inherited by siblings
with ze.span(name="db_call", tags={"table": "customers", "operation": "SELECT"}):
query_database()
# Another child span without tags – it has no knowledge of the db_call tags
with ze.span(name="render"):
render_template()
```
Under the hood these tags live only on that single span, they are **not** copied to siblings or parents.
## 3. Granular tagging (session, trace, or span)
You can add granular tags at the session, trace, or span level after they've been created:
```python theme={null}
import uuid
from langchain_core.messages import HumanMessage
import zeroeval as ze
DEMO_TAGS = {"example": "langgraph_tags_demo", "project": "zeroeval"}
SESSION_ID = str(uuid.uuid4())
SESSION_INFO = {"id": SESSION_ID, "name": "Tags Demo Session"}
with ze.span(
name="demo.root_invoke",
session=SESSION_INFO,
tags={**DEMO_TAGS, "run": "invoke"},
):
# 1️⃣ Tag the *current* span only
current_span = ze.get_current_span()
ze.set_tag(current_span, {"phase": "pre-run"})
# 2️⃣ Tag the whole trace – root + all children (past *and* future)
current_trace = ze.get_current_trace()
ze.set_tag(current_trace, {"run_mode": "invoke"})
# 3️⃣ Tag the entire session
current_session = ze.get_current_session()
ze.set_tag(current_session, {"env": "local"})
result = app.invoke({"messages": [HumanMessage(content="hello")], "count": 0})
```