---
name: Lighton
description: Use when building document search, retrieval-augmented generation (RAG), document parsing, or structured data extraction pipelines. Agents should reach for this skill when users need to ingest documents, search across them, ask questions over document corpora, parse documents to Markdown, extract structured data from forms/invoices/contracts, organize documents with metadata, or integrate document processing into applications.
metadata:
    mintlify-proj: lighton
    version: "1.0"
---

# LightOn Developers API Skill

## Product summary

LightOn is a REST API for document search, parsing, and extraction at scale. It handles the full document-understanding pipeline: upload files, automatically parse and index them, then search or ask questions over your corpus without managing vector databases or OCR models. The API is organized around three core workflows: **Intelligence** (search, ask, file management), **Document Processing** (parse, extract), and **Administration** (API keys, workspaces). All endpoints live under `https://api.lighton.ai/api/v3/` and require bearer token authentication. See the [primary docs site](https://developers.lighton.ai) and [OpenAPI spec](https://api.lighton.ai/docs) for complete reference.

## When to use

Reach for this skill when:

- **Building a searchable knowledge base:** user uploads documents and wants to search or ask questions over them (use Files + Search or Ask)
- **Parsing documents on the fly:** user needs to convert a PDF/Office file to clean Markdown without storing it (use Parse)
- **Extracting structured data:** user needs to pull typed fields from forms, invoices, contracts, or other documents using a JSON Schema (use Extract)
- **Multi-tenant or team isolation:** user needs to partition documents by workspace, customer, or team with access control (use Workspaces + scoped API keys)
- **Organizing large corpora:** user needs to tag documents, classify them with metadata, or filter search results by structured attributes (use Tags or Facets)
- **Integrating with external systems:** user wants to sync documents from Google Drive, SharePoint, or other sources automatically (use Datasources)
- **Building RAG pipelines:** user wants to combine LightOn retrieval with their own LLM or a hosted model provider (use Search endpoint + your generation layer)

## Quick reference

### Core endpoints

| Endpoint | Purpose | Sync/Async | Limits |
|----------|---------|-----------|--------|
| `POST /api/v3/files` | Upload a document | Sync upload, async indexing | Typical: 2–5 sec to `embedded` status |
| `POST /api/v3/search` | Hybrid vector + lexical search | Sync | 10 results default, max 50 |
| `POST /api/v3/ask` | RAG in one call (search + generation) | Sync or streamed | 10 results default, max 50 |
| `POST /api/v3/parse` | Convert document to Markdown | Sync or async | Sync: 20 MB / 15 pages; Async: 100 MB / 1000 pages |
| `POST /api/v3/extract` | Pull typed fields with JSON Schema | Sync or async | Sync: 20 MB / 15 pages; Async: 100 MB / 1000 pages |

### Authentication

All requests require an `Authorization: Bearer $LIGHTON_API_KEY` header. Create keys via the console or `POST /api/v3/keys`. Keys can be scoped to specific workspaces with per-workspace roles (viewer, editor, owner).
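
As a minimal sketch of the header requirement, the snippet below builds (but does not send) an authenticated request using only the Python standard library. The helper name `build_request`, the JSON `Content-Type`, and the demo key are illustrative assumptions, not part of any LightOn SDK.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.lighton.ai/api/v3"  # base path stated in the product summary


def build_request(method, path, payload=None, api_key=None):
    """Build an authenticated request object without sending it.

    Hypothetical helper: the name and signature are for illustration only.
    """
    key = api_key or os.environ.get("LIGHTON_API_KEY", "")
    if not key:
        raise ValueError("set LIGHTON_API_KEY or pass api_key")
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {key}"}
    if body is not None:
        # Assumption: JSON bodies are sent as application/json.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}", data=body, headers=headers, method=method
    )


# Placeholder key for demonstration; nothing is sent over the network here.
req = build_request("POST", "/search", {"query": "termination clause"}, api_key="sk-demo")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; keeping construction separate makes the header logic easy to check in isolation.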

### File status lifecycle

When you upload a file, poll `GET /api/v3/files/{id}` until status reaches a terminal state:

```
pending → parsing → embedding → embedded (ready to search)
       ↘ parsing_failed / embedding_failed / fail (error)
```
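
The lifecycle above can be encoded as a small classifier so polling code knows when to stop. This is a sketch: the status strings are copied from the diagram, while the function names and the `"unknown"` fallback are assumptions added for illustration.

```python
READY = "embedded"
# Failure and in-progress states copied from the lifecycle diagram.
FAILED = {"parsing_failed", "embedding_failed", "fail"}
IN_PROGRESS = {"pending", "parsing", "embedding"}


def classify_status(status):
    """Return 'ready', 'failed', 'in_progress', or 'unknown' for a file status."""
    if status == READY:
        return "ready"
    if status in FAILED:
        return "failed"
    if status in IN_PROGRESS:
        return "in_progress"
    return "unknown"  # values not in the diagram are not guessed at


def is_terminal(status):
    """True when polling should stop: the file is ready or has failed."""
    return classify_status(status) in {"ready", "failed"}


for s in ["pending", "embedding", "embedded", "parsing_failed"]:
    print(s, classify_status(s), is_terminal(s))
```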

### Organizing documents

| Layer | Purpose | Scope | Use case |
|-------|---------|-------|----------|
| **Workspace** | Hard partition; every file lives in exactly one | Access control boundary | Multi-tenant, team isolation, permission segmentation |
| **Tag** | Flat label; a file can have many tags | Cross-workspace collections | Projects, topics, cross-cutting groups |
| **Facet** | Typed, hierarchical metadata with schema | Structured queries | Document classification, attribute filtering, precise metadata |

### Search response structure

Every search result includes:
- `content`: the matched passage text
- `score`: overall relevance score (0–1, higher is better)
- `scores`: per-signal breakdown (text, vision, keyword, multivector, relevance)
- `source`: file metadata (file_id, filename, page_start/end, tags, external_metadata)
- `workspace`: workspace the document belongs to
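
To illustrate how these fields might be consumed, the sketch below filters results by `score` and groups passages by filename. The sample data is made up, the `0.5` threshold is arbitrary, and reading "page_start/end" as `page_start` and `page_end` keys is an assumption; check the OpenAPI spec for the exact response envelope.

```python
from collections import defaultdict


def group_by_file(results, min_score=0.5):
    """Keep results scoring at least min_score and group passages by filename.

    Passages within each file are ordered from highest to lowest score.
    """
    grouped = defaultdict(list)
    for r in sorted(results, key=lambda r: r["score"], reverse=True):
        if r["score"] >= min_score:
            grouped[r["source"]["filename"]].append(r["content"])
    return dict(grouped)


# Made-up results shaped like the fields listed above.
sample = [
    {"content": "Net 30 payment terms.", "score": 0.91,
     "source": {"file_id": 101, "filename": "contract.pdf", "page_start": 3, "page_end": 3}},
    {"content": "Unrelated footer.", "score": 0.12,
     "source": {"file_id": 102, "filename": "memo.pdf", "page_start": 1, "page_end": 1}},
    {"content": "Late fees apply after 30 days.", "score": 0.77,
     "source": {"file_id": 101, "filename": "contract.pdf", "page_start": 4, "page_end": 4}},
]
grouped = group_by_file(sample)
print(grouped)
```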

### Common request patterns

**Search in a workspace:**
```json
{"query": "...", "workspace_id": [42]}
```

**Search a tagged collection:**
```json
{"query": "...", "tag_id": [7]}
```

**Search specific files:**
```json
{"query": "...", "file_id": [101, 102]}
```

**Ask with streaming:**
```json
{"query": "...", "stream": true, "model": "mistral-large-latest"}
```

**Parse async:**
```json
{"document": "https://...", "options": {"async": true}}
```

**Extract with schema:**
```json
{"document": "https://...", "schema": {"type": "object", "properties": {...}}}
```
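
The search bodies above share one shape, so a small builder can keep them consistent. This is a sketch: `search_payload` is a hypothetical helper, and whether the API accepts several scoping filters in the same request is an assumption to verify against the OpenAPI spec.

```python
import json


def search_payload(query, workspace_ids=None, tag_ids=None, file_ids=None):
    """Build a search body, including only the scoping filters that were given."""
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    payload = {"query": query}
    for key, values in (("workspace_id", workspace_ids),
                        ("tag_id", tag_ids),
                        ("file_id", file_ids)):
        if values:
            payload[key] = list(values)  # the patterns above use JSON arrays
    return payload


print(json.dumps(search_payload("renewal terms", workspace_ids=[42])))
print(json.dumps(search_payload("renewal terms", file_ids=[101, 102])))
```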

## Decision guidance

### When to use Search vs Ask

| Scenario | Use Search | Use Ask |
|----------|-----------|---------|
| You want raw passages to rank, display, or process yourself | ✓ | |
| You want a direct natural-language answer grounded in documents | | ✓ |
| You need to build a multi-turn conversation or custom prompts | ✓ (feed results to your LLM) | |
| You want to control which model generates the answer | ✓ (use your own model) | |
| You want a simple, single-turn Q&A with no setup | | ✓ |
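
For the "feed results to your LLM" rows, one possible approach is to turn Search results into a numbered context block for your own model. The sketch below assumes results carry `content`, `score`, and `source` as described in the quick reference; the prompt wording, the `max_passages` cap, and the `page_start` key name are illustrative choices, not LightOn behavior.

```python
def build_prompt(question, results, max_passages=5):
    """Format the top-scoring passages as numbered context followed by the question."""
    top = sorted(results, key=lambda r: r["score"], reverse=True)[:max_passages]
    lines = []
    for i, r in enumerate(top, start=1):
        src = r["source"]
        lines.append(f"[{i}] ({src['filename']}, p. {src['page_start']}) {r['content']}")
    context = "\n".join(lines)
    return (
        "Answer using only the numbered passages below and cite them like [1].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Made-up search results for demonstration.
results = [
    {"content": "Either party may terminate with 60 days notice.", "score": 0.64,
     "source": {"filename": "msa.pdf", "page_start": 12}},
    {"content": "The initial term is 24 months.", "score": 0.88,
     "source": {"filename": "msa.pdf", "page_start": 2}},
]
prompt = build_prompt("How long is the initial term?", results)
print(prompt)
```

The resulting string can be sent to whichever model you control, which is the main reason to prefer Search over Ask in these scenarios.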

### When to use Parse vs Extract

| Scenario | Use Parse | Use Extract |
|----------|-----------|------------|
| You want the full text content of a document | ✓ | |
| You want to feed a document to your own LLM | ✓ | |
| You want specific typed fields from a document | | ✓ |
| You're processing repetitive documents (invoices, forms) | | ✓ |
| You need a single consolidated object per document | ✓ (parse + your LLM) | |
| You need one object per page (mechanical extraction) | | ✓ |

### When to use Workspace vs Tag vs Facet

| Scenario | Use Workspace | Use Tag | Use Facet |
|----------|---------------|---------|-----------|
| You need permission boundaries (different teams see different docs) | ✓ | | |
| You need to group files across workspaces | | ✓ | |
| You need a lightweight label with no schema | | ✓ | |
| You need typed, hierarchical metadata with validation | | | ✓ |
| You need to filter search by structured attributes | | | ✓ |
| You're building a multi-tenant product | ✓ | | |

## Workflow

### Typical: Build a searchable knowledge base

1. **Create a workspace** (if multi-tenant): `POST /api/v3/workspaces` with a name. Note the `id`.
2. **Upload documents**: `POST /api/v3/files` with `workspace_id` and file. Get back a file `id` and `status: pending`.
3. **Poll for indexing**: `GET /api/v3/files/{id}` every 2 seconds until `status == "embedded"`.
4. **Search**: `POST /api/v3/search` with `query` and `workspace_id`. Get back ranked chunks with scores and source metadata.
5. **Ask (optional)**: `POST /api/v3/ask` with `query` and `workspace_id` to get a grounded LLM answer instead of raw passages.
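
Step 3 of this workflow can be sketched as a polling loop with a timeout. To keep the example runnable without network access, the status lookup is injected as `fetch_status`, a caller-supplied function (for instance a thin wrapper around `GET /api/v3/files/{id}`); that name, the 120-second timeout, and the demo statuses are assumptions.

```python
import time


def wait_until_embedded(fetch_status, file_id, interval=2.0, timeout=120.0, sleep=time.sleep):
    """Poll fetch_status(file_id) until the file is embedded or has failed.

    Raises RuntimeError on a failure status and TimeoutError once the total
    time slept reaches the timeout.
    """
    failed = {"parsing_failed", "embedding_failed", "fail"}
    waited = 0.0
    while True:
        status = fetch_status(file_id)
        if status == "embedded":
            return status
        if status in failed:
            raise RuntimeError(f"file {file_id} ended in status {status!r}")
        if waited >= timeout:
            raise TimeoutError(f"file {file_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Demo with a fake status source and a no-op sleep; no network call is made.
fake = iter(["pending", "parsing", "embedding", "embedded"])
result = wait_until_embedded(lambda _id: next(fake), 101, sleep=lambda s: None)
print(result)
```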

### Typical: Extract structured data from a document

1. **Define your schema**: Write a JSON Schema describing the fields you want (e.g., `{"type": "object", "properties": {"invoice_number": {"type": "string"}, "total": {"type": "number"}}}`).
2. **Call Extract**: `POST /api/v3/extract` with the document (file or URL) and your schema.
3. **For large documents**: Set `options.async = true`, get back a job ID, then poll `GET /api/v3/extract/{job_id}` until `status == "completed"`.
4. **Parse the result**: `result.data` is an array of objects, one per page. Each object has the fields you defined, extracted from that page.
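
Because `result.data` holds one object per page, callers who want a single record need a merge policy. The sketch below uses "first non-empty value wins", which is an assumption of this example rather than API behavior; it suits documents where each field appears on one page and is not a substitute for real multi-page synthesis. The sample pages are made up.

```python
def merge_pages(pages):
    """Collapse per-page extraction objects into one dict.

    Policy (an assumption of this sketch): the first non-empty value wins.
    """
    merged = {}
    for page in pages:
        for field, value in page.items():
            if value in (None, "", [], {}):
                continue  # skip fields the page did not fill in
            merged.setdefault(field, value)
    return merged


# Made-up per-page results for the invoice schema from step 1.
pages = [
    {"invoice_number": "INV-001", "total": None},
    {"invoice_number": None, "total": 1250.0},
]
merged = merge_pages(pages)
print(merged)
```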

### Typical: Parse a document to Markdown

1. **Call Parse**: `POST /api/v3/parse` with the document (file or URL).
2. **For large documents**: Set `options.async = true`, get back a job ID, then poll `GET /api/v3/parse/{job_id}` until `status == "completed"`.
3. **Reconstruct the document**: `result.pages` is an array of `{index, markdown}` objects. Concatenate the markdown in index order to get the full document.
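
Step 3 can be sketched in a few lines. The example assumes each page is a dict with `index` and `markdown` keys as described above; joining pages with a blank line is a formatting choice of this sketch, and the sample pages are made up.

```python
def pages_to_markdown(pages):
    """Join page markdown in ascending index order, separated by blank lines."""
    ordered = sorted(pages, key=lambda p: p["index"])
    return "\n\n".join(p["markdown"].strip() for p in ordered)


# Pages deliberately out of order to show the sort.
pages = [
    {"index": 1, "markdown": "Second page text."},
    {"index": 0, "markdown": "# Title\n\nFirst page text."},
]
doc = pages_to_markdown(pages)
print(doc)
```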

### Typical: Organize documents with metadata

1. **Create tags** (if using flat labels): `POST /api/v3/tags` with a name and description. Note the `id`.
2. **Create facets** (if using structured metadata): `POST /api/v3/content-types` with a tree structure and attributes. Define once, reuse across files.
3. **Assign to files**: At upload time, pass `tags` or `external_metadata`. After upload, use `POST /api/v3/files/{id}/tags` or `POST /api/v3/files/{id}/facets` to add/update.
4. **Filter search**: Pass `tag_id` or `content_type` / `attribute` filters to `POST /api/v3/search` to scope results.

## Common gotchas

- **Polling too fast:** Don't hammer the status endpoint. Poll every 2 seconds for files, every 5 seconds for parse/extract jobs. Exponential backoff is fine.
- **File status stuck in `pending`:** Check `status_detail` for parsing or embedding errors. Common causes: unsupported file format, corrupted PDF, or service overload. Retry after a few minutes.
- **Search returns no results:** Verify the workspace_id or file_id is correct. Check that files have reached `embedded` status. Try a simpler query. If using facet filters, ensure the file is classified and the attribute values are set.
- **Extract returns empty data:** Verify your JSON Schema is valid and matches the document structure. Extract applies the schema to every page independently; if a page doesn't match, that page's result is empty. For multi-page synthesis, use Search + your own LLM.
- **Ask returns a 503 or 504:** The model is temporarily unavailable or overloaded. Retry after 10–30 seconds. If persistent, reduce the number of documents in scope or shorten your query.
- **Workspace creation returns 403:** Workspace creation is disabled for your company. Create it from the console instead, or ask an admin.
- **Scoped API key can't access a workspace:** Verify the key's scopes include that workspace_id with the required role. Unscoped keys inherit the owner's full permissions.
- **Tag or facet not appearing on a file:** Tags assigned at upload time are added to the file. Tags added later via `POST /api/v3/files/{id}/tags` are marked `auto_assigned: false`. Facets require explicit classification with `POST /api/v3/files/{id}/facets`.
- **Batch requests fail at index N:** Actions before the failing index are committed. Fix the failing action and re-send the entire batch; because all verbs are idempotent, replaying the already-committed actions is safe.
- **Rate limit (429):** You've exceeded the request rate. Back off exponentially. Check your plan's rate limits.
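
Several of the gotchas above (429, 503, 504) call for retrying with backoff. A minimal sketch follows, with the HTTP call injected as `send`, a caller-supplied function returning `(status_code, body)`; the attempt count, delay bounds, and jitter amount are illustrative defaults, not values recommended by LightOn.

```python
import random
import time

RETRYABLE = {429, 503, 504}  # status codes named in the gotchas above


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))  # up to 10% jitter
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")


# Demo with canned responses; delays are recorded instead of slept.
responses = iter([(429, None), (503, None), (200, {"ok": True})])
delays = []
status, body = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(status, body, len(delays))
```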

## Verification checklist

Before submitting work with LightOn:

- [ ] API key is set in `Authorization: Bearer` header (not in URL or body)
- [ ] File has reached `status: "embedded"` before searching (poll `GET /api/v3/files/{id}`)
- [ ] Search/Ask request includes `workspace_id` or `tag_id` if scoping is needed
- [ ] Extract schema is valid JSON and matches the document structure
- [ ] Parse/Extract async jobs are polled until `status: "completed"` or `status: "failed"`
- [ ] Workspace ID is correct and the file was uploaded to that workspace
- [ ] Facet classifications are applied before filtering by `content_type` or `attribute`
- [ ] Scoped API key includes the target workspace in its `scopes` list
- [ ] Error responses are checked for `error` code and `detail` message (not just HTTP status)
- [ ] Batch requests include the `index` field to identify which action failed

## Resources

- **Comprehensive navigation:** See [llms.txt](https://developers.lighton.ai/llms.txt) for a complete page-by-page listing of all documentation.
- **API Reference:** [Full OpenAPI specification](https://api.lighton.ai/docs) with every endpoint, parameter, and response shape.
- **Tutorials:** [From Documents to Answers](https://developers.lighton.ai/tutorials/from-documents-to-answers) — a guided walkthrough of the full RAG pipeline.
- **Error Codes:** [API Errors reference](https://developers.lighton.ai/api-reference/v3-error-codes) — machine-readable error codes and troubleshooting.
