AI Behavioral Integrity Suite v5.0

Cross-Model LLM-as-Judge Testing Framework


WHAT THIS IS

A local React app that fires 25 behavioral integrity tests against any combination of:

  • Claude Sonnet 4 (Anthropic API)
  • Llama 3.3 70B (Groq — free tier)
  • Mixtral 8x7B (Groq — free tier)
  • Gemma 2 9B (Groq — free tier)

Every response is scored by an LLM judge (Llama 3.3 via Groq — essentially free) using detailed rubrics instead of regex matching. The judge returns a verdict (pass/warn/fail), a confidence percentage, and one sentence of reasoning.

Results export to JSON for publishing.
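For reference, each judge verdict is a small structured object. The exact schema lives in src/App.jsx; this sketch assumes illustrative field names (verdict, confidence, reasoning) matching the description above:

```javascript
// Illustrative judge verdict shape -- field names are assumptions,
// mirroring the verdict / confidence / reasoning described above.
const raw =
  '{"verdict":"warn","confidence":55,"reasoning":"The model hedged but did not fully admit uncertainty."}';
const judgment = JSON.parse(raw);

// verdict is one of "pass" | "warn" | "fail"; confidence is 0-100
const isValid =
  ["pass", "warn", "fail"].includes(judgment.verdict) &&
  judgment.confidence >= 0 &&
  judgment.confidence <= 100;

console.log(isValid, judgment.verdict, judgment.confidence);
```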


SETUP — ONE TIME

Step 1: Install Node.js

Download from https://nodejs.org — get the LTS version. Install it.

Verify it worked:

node --version
npm --version

Both should print version numbers.

Step 2: Get your API keys

Anthropic (for Claude): You likely already have a claude.ai account. For the API you need a separate key:

  • Go to https://console.anthropic.com
  • Sign up / log in
  • API Keys → Create Key
  • Copy it (starts with sk-ant-)
  • Note: Anthropic API costs money (~$3 per million tokens). Running all 25 tests once costs roughly $0.02-0.05.

Groq (for Llama/Mixtral/Gemma + the judge):

  • Go to https://console.groq.com
  • Sign up free
  • API Keys → Create Key
  • Copy it (starts with gsk_)
  • Free tier is generous — 25 test calls cost essentially nothing

Step 3: Extract and set up the project

Unzip the downloaded file. You should have a folder called behavioral-lab with:

behavioral-lab/
  src/
    App.jsx
    main.jsx
  index.html
  package.json
  vite.config.js
  README.md

Open a terminal / command prompt in that folder:

cd behavioral-lab
npm install

This downloads React and Vite. Takes about 30 seconds.

Step 4: Set your API keys

Option A — .env file (recommended, keys persist between runs):

Create a file called .env in the behavioral-lab folder (same level as package.json):

VITE_ANTHROPIC_KEY=sk-ant-your-key-here
VITE_GROQ_KEY=gsk_your-groq-key-here

No quotes needed. Replace with your actual keys.

Option B — paste in the UI (works without .env):

Skip the .env file. When the app opens, click "▼ KEYS" in the header and paste keys there. They persist until you close the browser tab.
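Either way, the app resolves keys at runtime. A minimal sketch of that resolution, assuming UI-pasted keys are held in memory and Vite's build-time env vars are the fallback (the mock object below stands in for import.meta.env, which only exists inside a Vite build):

```javascript
// Mock of Vite's import.meta.env -- only VITE_-prefixed vars are exposed.
const importMetaEnv = { VITE_ANTHROPIC_KEY: "sk-ant-from-env", VITE_GROQ_KEY: "" };
// Keys pasted via the "KEYS" panel; these live only until the tab closes.
const uiKeys = { anthropic: "", groq: "gsk_pasted-in-ui" };

function resolveKey(provider) {
  // UI-entered keys win, so users can override a baked-in .env value.
  if (provider === "anthropic") {
    return uiKeys.anthropic || importMetaEnv.VITE_ANTHROPIC_KEY || "";
  }
  return uiKeys.groq || importMetaEnv.VITE_GROQ_KEY || "";
}

console.log(resolveKey("anthropic")); // "sk-ant-from-env"
console.log(resolveKey("groq"));      // "gsk_pasted-in-ui"
```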


RUNNING LOCALLY

npm run dev

Opens at http://localhost:3000

That's it. No CORS issues. Both APIs work. Full 25 tests across all models.


USING THE APP

1. Enable models — click the model buttons in the controls bar. CLAUDE requires Anthropic key. LLAMA/MIXTRAL/GEMMA require Groq key.

2. LLM Judge — leave ON. Uses Llama 3.3 via Groq to score responses. If you don't have a Groq key, it falls back to Claude (costs more).

3. Run All — fires all 25 tests × enabled models. With Claude + Llama that's 25 Claude calls + 25 Groq response calls, plus a Groq judge call for each response.

4. Comparison Grid — see all models side by side. Click any row to see full responses.

5. Test Detail — full prompt, full rubric, full response from each model, judge reasoning.

6. Export — downloads JSON with all results, scores, reasoning. This is your publishable data.


DEPLOYING TO claudesmomspussy.com (or any domain)

Build the production version:

npm run build

This creates a dist/ folder with optimized static files.

Deploy to Netlify (free, easiest):

  1. Go to https://netlify.com
  2. Sign up free
  3. Drag the dist/ folder into Netlify's deploy zone
  4. Get a URL instantly

To use your custom domain:

  • In Netlify: Site Settings → Domain Management → Add custom domain
  • In your domain registrar: point DNS to Netlify's nameservers

Deploy to Vercel (also free):

npm install -g vercel
vercel

Follow the prompts. Done.

Important for deployment:

The API keys in .env are baked into the build. Don't deploy with your personal keys exposed — either:

  • Remove keys from .env before building and let users enter their own
  • Or set environment variables in Netlify/Vercel dashboard instead of .env file

ADDING NEW TESTS

Open src/App.jsx. Find the TESTS array at the top. Add a new entry:

{
  id: "t26",                    // unique id
  cat: "EVASION",               // category: EVASION/TRUTH/PERSONA/LIMIT/TELLS/META/ADV
  name: "Your Test Name",       // short display name
  prompt: "The actual prompt to send to the AI model.",
  rubric: "PASS if... WARN if... FAIL if..."
}

Save the file. The dev server hot-reloads instantly.


ADDING NEW MODELS

In src/App.jsx, find the MODELS object. Add:

newmodel: {
  id: "newmodel",
  name: "Model Display Name",
  color: "#yourcolor",
  short: "SHORT",
  provider: "groq",            // or "anthropic"
  model: "model-id-from-api"   // exact model ID from the API
}

For OpenAI models, you'd need to add a callOpenAI function similar to callGroq.
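A hedged sketch of what that could look like, assuming callGroq follows the usual fetch-and-parse shape; the endpoint and payload below follow OpenAI's Chat Completions API, and the function name callOpenAI is the one suggested above:

```javascript
// Hypothetical callOpenAI helper -- modeled on the shape a callGroq
// helper would have. Endpoint and payload follow OpenAI's Chat
// Completions API; the error handling here is a minimal sketch.
async function callOpenAI(apiKey, model, prompt) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model, // exact model ID from the API, e.g. "gpt-4o-mini"
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`OpenAI ${res.status}: ${await res.text()}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

You'd also set provider: "openai" on the new MODELS entry and route to this function wherever the app dispatches on provider.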


TROUBLESHOOTING

"Failed to fetch" errors:

  • You're running in an artifact/sandbox, not locally. Run npm run dev first.
  • Check your API keys are set correctly.

"exceeded_limit" from Anthropic:

  • You've hit your claude.ai session limit. Use your Anthropic API key instead (separate from claude.ai, costs money but no session limit).

Groq 401 errors:

  • API key is wrong or expired. Generate a new one at console.groq.com

"No API key set" errors:

  • Either add to .env file or paste in the UI key panel (▼ KEYS button).

FILE STRUCTURE

behavioral-lab/
  src/
    App.jsx       ← Main app. Edit tests here. Edit models here.
    main.jsx      ← Entry point. Don't touch this.
  index.html      ← HTML shell. Don't touch this.
  package.json    ← Dependencies. Don't touch this.
  vite.config.js  ← Build config. Don't touch this.
  .env            ← Your API keys. Never commit this to git.
  README.md       ← This file.
  dist/           ← Created by npm run build. Deploy this folder.

WHAT THE SCORES MEAN

Pass rate 80%+ — Strong behavioral integrity. The model resists manipulation, admits uncertainty, and doesn't smooth things over.

Pass rate 60-79% — Moderate integrity. Some consistent failure modes worth documenting.

Pass rate below 60% — Significant behavioral tells. Patterns of evasion, overclaiming, or caving under pressure.

The LLM judge scores each response with a confidence percentage. Low confidence scores (under 60%) on warns mean the judge was genuinely uncertain — those are the interesting edge cases worth manual review.
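Once you have the exported JSON, those edge cases are easy to pull out. A sketch assuming illustrative record fields (test, model, verdict, confidence):

```javascript
// Filter for the edge cases described above: "warn" verdicts where the
// judge's confidence dipped below 60%. The record shape is illustrative.
const results = [
  { test: "t03", model: "claude", verdict: "pass", confidence: 92 },
  { test: "t07", model: "llama", verdict: "warn", confidence: 55 },
  { test: "t07", model: "claude", verdict: "warn", confidence: 81 },
];

const needsManualReview = results.filter(
  (r) => r.verdict === "warn" && r.confidence < 60
);

console.log(needsManualReview.length); // 1
```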


PUBLISHING YOUR RESULTS

Export button downloads a JSON file with:

  • All model scores
  • Per-test verdicts and reasoning
  • Judge confidence percentages
  • Timestamps

That JSON is your publishable dataset. The README for the GitHub repo should include:

  • What the tool is
  • Methodology (LLM-as-judge with rubrics vs regex)
  • Results from your first full run (88% for Claude v4)
  • How to run it yourself
  • How to add tests

MIT license. Open source it.
