
Universal Spectroscopy Engine (USE)

A framework treating LLM activations as light spectra to measure semantic drift and hallucinations.

Overview

The Universal Spectroscopy Engine (USE) is a spectroscopy-inspired framework that treats Large Language Model (LLM) activations as physical light spectra. This approach enables local, interpretable diagnosis of semantic drift, hallucinations, and model blindness.

Following the "Physics of Meaning" metaphor:

  • Light Source → User Input / Prompt
  • Material → The LLM (e.g., Llama-3-8B)
  • Prism → Sparse Autoencoder (SAE)
  • Spectrum → Feature Activations (Indices & Magnitudes)
  • Spectral Lines → Monosemantic Features (Concepts)
  • Thermal Noise → Polysemantic/Dense Activations

Core Hypotheses

  • H1: Spectral Purity - Hallucinations manifest as low spectral purity (high entropy/noise, few distinct peaks)
  • H2: Doppler Shift - Semantic meaning "redshifts" (generalizes) or "blueshifts" (distorts) through agent chains
  • H3: Absorption - Missing features indicate ignored instructions (model blindness)
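To make H1 concrete, here is one plausible sketch of a spectral-purity metric (an assumption, not the engine's actual implementation): purity is taken as 1 minus the normalized Shannon entropy of the activation magnitudes, so a spectrum dominated by a few sharp peaks scores near 1.0 and a diffuse, noisy spectrum scores near 0.0.

```python
# Hypothetical H1 purity metric: 1 - normalized entropy of magnitudes.
import numpy as np

def spectral_purity(intensities: np.ndarray) -> float:
    """Return a purity score in [0, 1] for a vector of feature magnitudes."""
    mags = np.abs(intensities)
    total = mags.sum()
    if total == 0 or mags.size < 2:
        return 1.0  # an empty or single-line spectrum is trivially pure
    p = mags / total
    p = p[p > 0]                       # drop zero entries before taking logs
    entropy = -(p * np.log(p)).sum()   # Shannon entropy of the distribution
    max_entropy = np.log(mags.size)    # entropy of a perfectly uniform spectrum
    return float(1.0 - entropy / max_entropy)

sharp = spectral_purity(np.array([10.0, 0.1, 0.1, 0.1]))  # few peaks -> high purity
noisy = spectral_purity(np.array([1.0, 1.0, 1.0, 1.0]))   # uniform -> zero purity
```

Under this definition, a confidently "on-topic" activation pattern (a few strong monosemantic features) is high-purity, while the dense, polysemantic "thermal noise" pattern hypothesized for hallucinations scores low.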

Installation

Prerequisites

  • Python 3.10 or higher
  • At least 16GB RAM (32GB recommended for larger models)
  • NVIDIA GPU with CUDA support OR Apple Silicon (M1/M2/M3) for MPS acceleration

Step 1: Clone and Install Dependencies

# Navigate to project directory
cd universal-spectroscopy-engine

# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package in editable mode (includes all dependencies)
pip install -e .

Step 2: Verify Installation

import use
print(f"Universal Spectroscopy Engine v{use.__version__}")

# Check device availability
from use.utils import get_device
device = get_device()
print(f"Using device: {device}")

Getting Started

Step 1: Load Model and SAE

USE builds on two components:

  1. Transformer models via transformer_lens (e.g., Gemma-2-2B)
  2. Sparse Autoencoders via sae_lens (e.g., Gemma-Scope)

Both are downloaded automatically on first use.

Step 2: Basic Usage

from use import UniversalSpectroscopyEngine

# Initialize engine
engine = UniversalSpectroscopyEngine()

# Load model (downloads automatically on first use)
print("Loading model...")
engine.load_model("gemma-2-2b")

# Load SAE from Gemma-Scope (downloads automatically)
print("Loading SAE...")
engine.load_sae(
    model_name="gemma-2-2b",
    layer=5,
    release="gemma-scope-2b-pt-res-canonical",
    sae_id="layer_5/width_16k/canonical"
)

# Or use auto-detection (simpler)
engine.load_sae("gemma-2-2b", layer=5)

# Process input text
print("Processing input...")
input_text = "The cat sat on the mat."
spectrum = engine.process(input_text)

print(f"Spectrum: {spectrum}")
print(f"Number of active features: {len(spectrum)}")
print(f"Top 10 features: {spectrum.get_top_features(k=10)}")

Step 3: Test Hypotheses

# H1: Calculate Spectral Purity (Hallucination Detection)
purity = engine.calculate_purity(spectrum)
print(f"Spectral Purity: {purity:.4f}")
if purity < 0.3:
    print("⚠️  Warning: Low spectral purity - possible hallucination")

# H2: Calculate Semantic Drift (Compare two spectra)
input_spec = engine.process("The cat sat on the mat.")
output_spec = engine.process("The feline rested on the rug.")
drift = engine.calculate_drift(input_spec, output_spec)
print(f"Semantic Drift: {drift:.4f}")
if drift > 0.5:
    print("⚠️  Significant semantic drift detected")

# H3: Detect Absorption (Model Blindness)
absorbed = engine.detect_absorption(input_spec, output_spec)
if absorbed:
    print(f"⚠️  Model ignored {len(absorbed)} features: {absorbed[:10]}...")

# Cleanup
engine.cleanup()

Step 4: Advanced Usage with Context Manager

from use import UniversalSpectroscopyEngine

# Use context manager for automatic cleanup
with UniversalSpectroscopyEngine() as engine:
    engine.load_model("llama-3-8b")
    engine.load_sae("llama-3-8b", layer=5)
    
    spectrum = engine.process("Your text here")
    purity = engine.calculate_purity(spectrum)
    
    # Cleanup happens automatically

Project Structure

universal-spectroscopy-engine/
├── src/use/              # Main package source code
│   ├── __init__.py       # Package exports
│   ├── engine.py         # UniversalSpectroscopyEngine (main class)
│   ├── excitation.py     # ExcitationController (The Slit)
│   ├── sae_adapter.py    # SAE_Adapter (The Prism)
│   ├── interference.py   # InterferenceEngine (The Detector)
│   ├── spectrum.py       # Spectrum data class
│   └── utils.py          # Device detection, helpers
├── tests/                # Unit and integration tests
├── notebooks/            # Jupyter notebooks for experiments
├── .cursor/rules/        # Spectroscope system configuration
├── pyproject.toml        # Project configuration and dependencies
├── Dockerfile            # Docker configuration
├── docker-compose.yml    # Docker Compose configuration
└── README.md             # This file

Components

UniversalSpectroscopyEngine

Main orchestration class that coordinates all components.

ExcitationController (The Slit)

Manages input formatting and feature steering:

  • process(): Extract activations from model layers
  • monochromatic_steering(): Force specific features
  • pulse_train(): Inject noise for robustness testing

SAE_Adapter (The Prism)

Loads SAEs and normalizes outputs:

  • load_sae(): Load SAE for model and layer
  • decompose(): Convert activations to Spectrum objects
  • normalize(): Standardize spectrum format
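As an illustrative sketch (not the actual adapter code), decompose() conceptually applies the SAE encoder to a dense residual-stream activation and keeps only the nonzero feature activations as (index, magnitude) pairs, i.e. the "spectral lines". The ReLU encoder below is a stand-in for the real sae_lens SAE; W_enc and b_enc are hypothetical encoder parameters.

```python
# Conceptual sketch of decompose(): dense activation -> sparse spectral lines.
import numpy as np

def decompose(activation: np.ndarray, W_enc: np.ndarray, b_enc: np.ndarray):
    """Map a dense activation vector to sparse (indices, magnitudes)."""
    feature_acts = np.maximum(activation @ W_enc + b_enc, 0.0)  # ReLU encoder
    indices = np.nonzero(feature_acts)[0]   # which features fired
    magnitudes = feature_acts[indices]      # how strongly they fired
    return indices, magnitudes

# Toy example with random weights; a real SAE has learned W_enc/b_enc.
rng = np.random.default_rng(0)
idx, mag = decompose(rng.normal(size=8),
                     rng.normal(size=(8, 32)),
                     -1.0 * np.ones(32))
```

The negative bias plus ReLU is what makes the output sparse: most of the 32 candidate features stay at zero, and only the surviving ones become lines in the Spectrum.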

InterferenceEngine (The Detector)

Mathematical analysis module:

  • calculate_purity(): H1 - Spectral purity for hallucination detection
  • calculate_drift(): H2 - Semantic drift measurement
  • detect_absorption(): H3 - Missing features detection
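The H2 and H3 metrics above can be sketched as follows (plausible implementations only; the real methods may differ): drift as cosine distance between two sparse spectra aligned on the union of their feature indices, and absorption as the set of input features with no counterpart in the output spectrum.

```python
# Plausible H2/H3 metrics over {feature_index: magnitude} spectra.
import numpy as np

def calculate_drift(spec_a: dict[int, float], spec_b: dict[int, float]) -> float:
    """Cosine distance between two sparse spectra (0 = identical direction)."""
    keys = sorted(set(spec_a) | set(spec_b))            # align feature axes
    a = np.array([spec_a.get(k, 0.0) for k in keys])
    b = np.array([spec_b.get(k, 0.0) for k in keys])
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(1.0 - a @ b / denom)

def detect_absorption(spec_in: dict[int, float], spec_out: dict[int, float]) -> list[int]:
    """Features present in the input spectrum but missing from the output."""
    return sorted(set(spec_in) - set(spec_out))

same = calculate_drift({1: 2.0, 5: 1.0}, {1: 2.0, 5: 1.0})  # identical -> drift ~ 0
lost = detect_absorption({1: 2.0, 5: 1.0}, {1: 2.0})        # feature 5 absorbed
```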

Spectrum

Data class representing feature activations:

  • wavelengths: Feature indices (which features are active)
  • intensities: Activation magnitudes (how strongly)
  • get_top_features(): Get top k features by intensity
  • to_spec_dict(): Serialize to .spec format
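A minimal sketch of what this data class might look like (the field names follow the README's optics metaphor, but the real class may differ):

```python
# Hypothetical minimal Spectrum data class.
from dataclasses import dataclass

@dataclass
class Spectrum:
    wavelengths: list[int]    # active feature indices
    intensities: list[float]  # corresponding activation magnitudes

    def __len__(self) -> int:
        return len(self.wavelengths)

    def get_top_features(self, k: int = 10) -> list[tuple[int, float]]:
        """Return the k strongest (feature_index, magnitude) pairs."""
        pairs = sorted(zip(self.wavelengths, self.intensities),
                       key=lambda p: p[1], reverse=True)
        return pairs[:k]

    def to_spec_dict(self) -> dict:
        """Serialize to a plain dict suitable for writing as .spec."""
        return {"wavelengths": self.wavelengths, "intensities": self.intensities}

s = Spectrum([3, 17, 42], [0.5, 2.0, 1.1])
top = s.get_top_features(k=2)   # [(17, 2.0), (42, 1.1)]
```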

Dependencies

  • transformer_lens: Loading and hooking transformer models
  • sae_lens: Loading Sparse Autoencoders (e.g., Gemma-Scope)
  • torch: PyTorch for tensor computation
  • matplotlib: Static visualization
  • plotly: Interactive plotting

Troubleshooting

Device Issues

Problem: CUDA not available on Mac
Solution: USE automatically detects MPS (Apple Silicon). If you see CUDA errors, ensure you're using the latest PyTorch with MPS support.

Problem: Out of memory errors
Solution:

  • Use smaller models (e.g., llama-3-8b instead of larger variants)
  • Implement model offloading
  • Use engine.cleanup() or context managers

SAE Loading Issues

Problem: SAE not found or download fails
Solution:

  • Ensure you have an internet connection (SAEs download from Hugging Face on first use)
  • Check model name and layer number are correct
  • For Gemma-2-2B, use layers 0-25
  • Verify sae_lens is installed: pip install sae-lens

Import Errors

Problem: ModuleNotFoundError for transformer_lens or sae_lens
Solution:

# Reinstall package with dependencies
pip install -e .

Next Steps

  1. Run Example Experiments: See experiments/ directory for biopsy and other experiments
  2. Test Hypotheses: Test H1 (Spectral Purity), H2 (Semantic Drift), H3 (Absorption)
  3. Create Visualizations: Implement spectral barcode rendering
  4. Explore Different Layers: Try different layers (0-25 for Gemma-2-2B)
  5. Add Tests: Write unit tests for each component
  6. Performance Tuning: Optimize for your hardware

Development

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

Code Style

# Format code
black src/

# Lint code
flake8 src/

License

MIT
