DataLabel
Lightweight collaborative labeling framework — supports IAA metrics including Cohen's/Fleiss' Kappa and Krippendorff's Alpha for quantifying labeling consistency. Built-in conflict detection and multi-strategy fusion, with zero-deployment labeling via HTML annotation interface.
IAA Quantification · Conflict Fusion · Zero-Deployment Labeling

Quick Start

Install
pip install knowlyr-datalabel
Usage
# CLI: generate annotation interface from schema
knowlyr-datalabel create schema.json tasks.json -o annotator.html
MCP Tools

  • generate_annotator -- generate an HTML annotation interface from DataRecipe analysis results
  • create_annotator -- create an HTML annotation interface from a schema and task data
  • merge_annotations -- merge annotation results from multiple annotators
  • calculate_iaa -- calculate inter-annotator agreement (IAA)
  • validate_schema -- validate DataLabel schema and task data format correctness
  • export_results -- export annotation results as JSON/JSONL/CSV
  • import_tasks -- import task data from JSON/JSONL/CSV and convert it to DataLabel format
  • generate_dashboard -- generate an HTML annotation-progress dashboard from annotation result files
  • llm_prelabel -- use an LLM to auto-prelabel task data
  • llm_quality_analysis -- use an LLM to analyze annotation quality and detect suspicious annotations and disagreements
  • llm_gen_guidelines -- use an LLM to auto-generate annotation guidelines from a schema and examples
  • adjudicate -- adjudicate annotation conflicts: arbitrate disagreements and output final labels
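To give a feel for what calculate_iaa reports: Cohen's kappa corrects raw agreement for chance. A minimal plain-Python sketch of the metric for two annotators (illustrative only, not DataLabel's actual implementation):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:  # degenerate case: both annotators used a single label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(["pos", "neg", "pos"], ["pos", "neg", "pos"]))  # 1.0
```

A kappa of 1.0 means perfect agreement; 0.0 means agreement no better than chance, which is why kappa is more informative than raw percent agreement on skewed label distributions.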

Documentation


DataLabel

Serverless Human-in-the-Loop Annotation Framework
with LLM Pre-Labeling and Inter-Annotator Agreement

Generate self-contained HTML files for offline annotation. No server, no network, no deployment.

GitHub · PyPI · knowlyr.com

The Problem

Annotation tools today force a painful choice: heavyweight platforms (Label Studio, Prodigy) that require servers and databases, or throwaway scripts with zero quality guarantees. Neither provides statistical agreement metrics or LLM-assisted acceleration out of the box.

DataLabel takes a different approach: generate a single HTML file, send it to annotators, get results back. No server. No Docker. No network required.

What You Get

  • Serverless HTML Annotation -- self-contained files with all styles, logic, and data baked in. Works offline, supports dark mode and keyboard shortcuts
  • LLM Pre-Labeling -- Kimi / OpenAI / Anthropic generate initial labels so annotators start from calibration, not from scratch
  • Inter-Annotator Agreement -- Cohen's kappa, Fleiss' kappa, Krippendorff's alpha with pairwise agreement matrices and disagreement reports
  • Multi-Strategy Merging -- majority vote, average, or strict consensus with automatic conflict flagging
  • 5 Annotation Types -- scoring, single choice, multi choice, free text, and ranking (with Borda count merging)
  • Visual Dashboard -- standalone HTML report with progress tracking, distribution charts, and agreement heatmaps
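For ranking tasks, Borda count merging assigns points by position and sums across annotators. A sketch of the idea (the function name here is hypothetical, not DataLabel's API):

```python
def borda_merge(rankings):
    """Merge several annotators' rankings of the same items via Borda count.

    Each ranking lists items best-first; the item in position i of an
    n-item ranking earns n - 1 - i points.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for i, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - 1 - i)
    # Highest total wins; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))

merged = borda_merge([
    ["A", "B", "C"],  # annotator 1
    ["A", "C", "B"],  # annotator 2
    ["B", "A", "C"],  # annotator 3
])
print(merged)  # ['A', 'B', 'C']
```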

Quick Start

pip install knowlyr-datalabel

# Create annotation interface
knowlyr-datalabel create schema.json tasks.json -o annotator.html

# Optional: LLM pre-labeling
knowlyr-datalabel prelabel schema.json tasks.json -o pre.json -p moonshot

# Merge results + compute agreement
knowlyr-datalabel merge ann1.json ann2.json ann3.json -o merged.json

# Generate analytics dashboard
knowlyr-datalabel dashboard ann1.json ann2.json -o dashboard.html
Python API:

from datalabel import AnnotatorGenerator, ResultMerger

gen = AnnotatorGenerator()
gen.generate(schema=schema, tasks=tasks, output_path="annotator.html")

merger = ResultMerger()
result = merger.merge(["ann1.json", "ann2.json"], strategy="majority")
print(f"Agreement: {result.agreement_rate:.1%}")
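Conceptually, strategy="majority" boils down to a per-item majority vote with conflict flagging. A minimal sketch of that idea (plain Python, not the library's actual code; the threshold parameter is an assumption for illustration):

```python
from collections import Counter

def majority_vote(labels, threshold=0.5):
    """Return (winning_label, is_conflict) for one item's labels.

    Flags a conflict when the top label does not exceed the given
    fraction of the votes (e.g. a tie between two annotators).
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    is_conflict = votes / len(labels) <= threshold
    return label, is_conflict

print(majority_vote(["pos", "pos", "neg"]))  # ('pos', False)
print(majority_vote(["pos", "neg"]))         # tie -> flagged as conflict
```

Items flagged this way are what the adjudicate step later arbitrates into final labels.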

Annotation Pipeline

graph LR
    S["Schema"] --> P["LLM Pre-Label"]
    P --> G["HTML Generator"]
    G --> B["Browser Annotation"]
    B --> R["Results"]
    R --> M["Merge + IAA"]
    M --> D["Dashboard"]

    style G fill:#0969da,color:#fff,stroke:#0969da
    style M fill:#8b5cf6,color:#fff,stroke:#8b5cf6
    style D fill:#2da44e,color:#fff,stroke:#2da44e
    style S fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style P fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style B fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style R fill:#1a1a2e,color:#e0e0e0,stroke:#444

MCP Integration

12 MCP tools, 6 resources, and 3 prompt templates for seamless AI IDE integration -- create annotations, merge results, compute IAA, and generate dashboards directly from your editor.

{
  "mcpServers": {
    "knowlyr-datalabel": {
      "command": "uv",
      "args": ["--directory", "/path/to/data-label", "run", "python", "-m", "datalabel.mcp_server"]
    }
  }
}

Ecosystem

DataLabel is part of the knowlyr data infrastructure:

| Layer      | Project               | Role                                                  |
|------------|-----------------------|-------------------------------------------------------|
| Discovery  | AI Dataset Radar      | Dataset intelligence and trend analysis               |
| Analysis   | DataRecipe            | Reverse analysis, schema extraction, cost estimation  |
| Production | DataSynth / DataLabel | LLM batch synthesis / serverless annotation           |
| Quality    | DataCheck             | Rule validation, anomaly detection, auto-fix          |
| Audit      | ModelAudit            | Distillation detection, model fingerprinting          |


knowlyr -- serverless annotation with LLM pre-labeling and inter-annotator agreement

Want to discuss this project? Reach out to:

  • Kai -- Founder & CEO
  • 程薇 -- AI Test Engineer