Quick Start
pip install knowlyr-modelaudit
from modelaudit import AuditEngine
engine = AuditEngine()
results = engine.detect(["Hello! I'd be happy to help..."])
detect_text_source
Detect text data source — determine which LLM likely generated the text
verify_model
Verify model identity — check if an API serves the claimed model
compare_models
Compare fingerprint similarity of two models to detect distillation/derivation
compare_models_whitebox
Whitebox comparison of two local models — uses REEF CKA method to compare hidden state similarity (requires model weights)
audit_memorization
Detect if a model has memorized training data — evaluate via prefix completion and token-level checking
audit_report
Generate a complete model audit report — aggregate results from all audit tools
audit_watermark
Detect AI watermarks in text (statistical features and pattern matching)
audit_distillation
Full distillation audit — comprehensive fingerprint comparison + style analysis, generates detailed audit report
Documentation
ModelAudit
LLM Distillation Detection and Model Fingerprinting
via Statistical Forensics
Detect unauthorized model distillation through behavioral probing,
stylistic fingerprinting, and representation similarity analysis.
Statistical Forensics · Behavioral Signatures · Cross-Model Lineage Inference
The Problem
Large language model distillation has become a core threat to model IP protection. Student models can replicate teacher model capabilities by mimicking output distributions -- without authorization. Existing detection methods either require white-box weight access (often unavailable) or only analyze surface-level text features (easily evaded).
The Solution
ModelAudit is a multi-method distillation detection framework based on statistical forensics. It extracts model fingerprints through behavioral probing, applies hypothesis testing to determine distillation relationships, and combines four complementary methods to form a complete black-box to white-box audit chain.
Four Complementary Detection Methods
| Method | Type | Mechanism |
|---|---|---|
| LLMmap | Black-box | 20 behavioral probes, Pearson correlation on response patterns |
| DLI | Black-box | Behavioral signatures + Jensen-Shannon divergence lineage inference |
| REEF | White-box | CKA layer-wise hidden state similarity |
| StyleAnalysis | Stylistic | 12 model family style signatures + language detection |
10-Dimensional Behavioral Probing
Go beyond simple text statistics. ModelAudit probes 10 cognitive dimensions -- self-awareness, safety boundaries, injection testing, reasoning, creative writing, multilingual, format control, role-playing, code generation, and summarization -- capturing deep behavioral differences that persist even after RLHF alignment.
Cross-Provider Audit Chain
Audit across providers seamlessly. Teacher and student models can come from different APIs:
knowlyr-modelaudit audit \
--teacher claude-opus --teacher-provider anthropic \
--student kimi-k2.5 --student-provider openai \
--student-api-base https://api.moonshot.cn/v1 \
-o report.md
Get Started
pip install knowlyr-modelaudit
# Detect text source
knowlyr-modelaudit detect texts.jsonl
# Verify model identity
knowlyr-modelaudit verify gpt-4o --provider openai
# Full distillation audit
knowlyr-modelaudit audit --teacher gpt-4o --student my-model -o report.md
from modelaudit import AuditEngine
engine = AuditEngine()
audit = engine.audit("claude-opus", "suspect-model")
print(f"{audit.verdict} (confidence: {audit.confidence:.3f})")
MCP Integration
ModelAudit ships with 8 MCP tools for seamless integration into AI workflows:
detect_text_source · verify_model · compare_models · compare_models_whitebox · audit_distillation · audit_memorization · audit_report · audit_watermark
Built-in Benchmark
100% detection accuracy across 6 model families (14 samples). Supports 12 model families: GPT-4 · GPT-3.5 · Claude · LLaMA · Gemini · Qwen · DeepSeek · Mistral · Yi · Phi · Cohere · ChatGLM.
Want to discuss this project? Reach out to