Latent Diagnostics

Measuring computational regimes inside LLMs via attribution graph geometry.

What This Is

We extract attribution graphs from model internals (via transcoders or sparse autoencoders, SAEs) and compute metrics that characterize different computational patterns. Different task types produce measurably different geometric signatures.

Key Findings

1. Length-Controlled Metric Effects

Metric	What It Measures	Effect Size (Cohen's d)
Influence	Causal strength between features	d=1.08
Concentration	Focused vs diffuse computation	d=0.87
n_active	Active feature count	d=0.07 (raw signal is a length artifact)
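For readers unfamiliar with the effect-size column, the following is a minimal sketch of Cohen's d using a pooled standard deviation. The function name `cohens_d` and the sample values are illustrative assumptions; the README does not say which variant of d the project computes, and the numbers below are synthetic rather than project data.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled sample variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = a.size, b.size
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))

# Synthetic values for illustration only (not project measurements).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
d = cohens_d(group_a, group_b)
```

By the usual rule of thumb, |d| around 0.8 or above is considered a large effect, which is why d=0.07 for n_active reads as essentially no effect once length is controlled.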

2. Geometric Structure of Task Domains

Using inertia tensor analysis (adapted from AIDA-TNG galaxy morphology):

Domain	Shape	Effective Dim	Interpretation
Grammar	Prolate	2.19	Focused along one computational axis
Commonsense	Prolate	2.26	Slightly more distributed
NLI	Prolate	1.73	Most concentrated variance
Paraphrase	Prolate	2.16	Similar to grammar

All four task domains are prolate (cigar-shaped in metric space): variance concentrates along one dominant axis rather than spreading evenly across the metrics.
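The shape labels in the table can be illustrated with a short sketch. It assumes the "inertia tensor" is the second-moment tensor of the centered point cloud, that effective dimensionality is the participation ratio of its eigenvalues, and that the prolate/oblate split follows the triaxiality thresholds common in galaxy morphology (T > 2/3 prolate, T < 1/3 oblate). None of these choices is confirmed by the README, and `shape_summary` is a hypothetical name, not the interface of `geometric_analysis.py`.

```python
import numpy as np

def shape_summary(points):
    """Summarize the shape of a point cloud (n_samples x n_dims, n_dims >= 3)."""
    X = np.asarray(points, dtype=float)
    X = X - X.mean(axis=0)                      # center on the centroid
    tensor = X.T @ X / X.shape[0]               # second-moment ("inertia-style") tensor
    # eigvalsh returns ascending eigenvalues for a symmetric matrix; reverse to descending.
    eigvals = np.clip(np.linalg.eigvalsh(tensor)[::-1], 0.0, None)
    if eigvals.sum() == 0.0:
        raise ValueError("degenerate cloud: all points coincide")

    # Assumed definition: participation ratio (sum λ)^2 / sum λ^2, ranging from 1 to n_dims.
    effective_dim = float(eigvals.sum() ** 2 / (eigvals ** 2).sum())

    # Principal axis lengths and their ratios relative to the longest axis.
    axes = np.sqrt(eigvals)
    axis_ratios = (float(axes[1] / axes[0]), float(axes[2] / axes[0]))

    # Assumed classification: triaxiality from the three largest eigenvalues.
    a2, b2, c2 = eigvals[:3]
    if a2 - c2 <= 1e-12 * a2:
        triaxiality, shape = float("nan"), "spherical"
    else:
        triaxiality = float((a2 - b2) / (a2 - c2))
        shape = "prolate" if triaxiality > 2 / 3 else "oblate" if triaxiality < 1 / 3 else "triaxial"

    return {"shape": shape, "triaxiality": triaxiality,
            "effective_dim": effective_dim, "axis_ratios": axis_ratios}

# Synthetic 6D cloud stretched along one axis (illustration only, not project data).
rng = np.random.default_rng(0)
cloud = rng.normal(size=(2000, 6)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0, 1.0])
summary = shape_summary(cloud)
```

Under these assumptions, a cloud stretched along a single axis is labeled prolate and has an effective dimensionality well below the ambient dimension, which matches the pattern the table reports for all four domains.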

3. What Works vs What Doesn't

Works:

- Task type classification (grammar vs reasoning)
- Computational complexity estimation
- Anomaly detection (out-of-distribution inputs)

Doesn't work:

- Hallucination detection
- Truthfulness detection (d=0.05)
- Output correctness prediction

By these metrics, the model processes hallucinations with the same internal geometry as truthful statements, so attribution geometry alone cannot separate the two.

The Journey

Dec 2025: Started with hallucination detection via feature spectroscopy
Jan 2026: Discovered that most of the "signal" was confounded with text length (r=0.98)
Feb 2026: Pivoted to task-type diagnostics with length control
Mar 2026: Added geometric analysis — domain shapes in metric space

See archive/disproved/ for early experiments with honest disclaimers.

Directory Structure

notebooks/                    # START HERE - narrative series
  01_introduction.ipynb       # What this project discovers
  02_the_journey.ipynb        # From hallucination detection to task diagnostics
  03_methodology.ipynb        # How we extract and analyze metrics
  04_core_results.ipynb       # Main findings with visualizations
  05_negative_results.ipynb   # What doesn't work (and why)

experiments/
  core/                       # Main analyses (geometric_analysis.py, etc.)
  statistics/                 # Statistical tests
  visualization/              # Figure generation
  utilities/                  # Shared code
  _archive/                   # Historical experiments
  _runs/                      # Timestamped analysis outputs

figures/paper/                # Core figures
data/results/                 # Computed metrics (JSON)
scripts/                      # Modal GPU runners
archive/disproved/            # Early work with honest post-mortems

Quick Start

pip install -e .

# Run geometric analysis
python experiments/core/geometric_analysis.py --analyze

# Generate figures
python experiments/visualization/generate_figures.py

# Compute attribution metrics (GPU, parallel)
modal run scripts/modal_general_attribution.py \
  --input-file data/domain_analysis/domain_samples.json \
  --output-file data/results/domain_attribution_metrics.json

Methodology

Attribution Graphs: Extract causal graphs via circuit-tracer showing feature→feature influence during inference
Metrics:
- mean_influence: Average edge weight (causal strength)
- concentration: Gini coefficient of influence distribution
- mean_activation: Feature activation strength
Length Control: Residualize each metric against text length (raw n_active correlates with token count at r=0.98)
Geometric Analysis: Treat domain samples as point clouds in 6D metric space, compute shape via inertia tensor eigendecomposition (axis ratios, effective dimensionality)
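Two of the steps above, the Gini-based concentration metric and length control, can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `gini` and `residualize` are hypothetical, the residualization is a plain linear least-squares fit on token count (the README does not state the exact model used), and all values are synthetic.

```python
import numpy as np

def gini(values):
    """Gini coefficient of non-negative weights (0 = perfectly even, near 1 = concentrated)."""
    v = np.sort(np.abs(np.asarray(values, dtype=float)))
    n, total = v.size, v.sum()
    if n == 0 or total == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard rank-based formula on sorted values: sum((2i - n - 1) * x_i) / (n * sum(x)).
    return float((2 * ranks - n - 1) @ v / (n * total))

def residualize(metric, length):
    """Remove the linear effect of length from a metric via ordinary least squares."""
    y = np.asarray(metric, dtype=float)
    x = np.asarray(length, dtype=float)
    design = np.column_stack([np.ones_like(x), x])   # intercept and slope columns
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef                          # what length cannot explain

# Synthetic illustration only (not project data).
even = gini([1.0, 1.0, 1.0, 1.0])        # influence spread evenly over four edges
focused = gini([0.0, 0.0, 0.0, 1.0])     # all influence on a single edge

token_counts = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
n_active = 3.0 * token_counts + 5.0 + np.array([1.0, -1.0, 0.5, -0.5, 0.0])
residual = residualize(n_active, token_counts)
```

After residualization the remaining values are uncorrelated with token count by construction, so any group difference that survives cannot be attributed to a linear length effect. Nonlinear length effects would need a richer model than this sketch.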

Limitations

- Requires model internals: SAE/transcoder access (currently Gemma 2 via Goodfire)
- Compute intensive: ~30 sec/sample on A100
- Measures structure, not correctness: cannot detect hallucinations
- Length confounding: metrics must be residualized (raw n_active is a length artifact)

License

MIT
