Closed beta. By invitation.

See how engineers truly execute with AI.

Measuring True eXecution with AI

Coding is the first profession AI changed end to end. When everyone has the same tools, what makes one engineer better than another? We answer that question for the teams who hire engineers, and for the engineers themselves.

We’ll reach out as slots open.

11 engineers and 3 companies on the waitlist
app.metrux.ai/scorecard/… (illustrative)

Two candidates. Same challenge. Same green checkmarks. Very different engineers.

Riley Glance
rubber-stamped
12 / 12 tests
  • Pasted spec into Claude verbatim
  • Accepted 9/9 AI suggestions without review
  • Wrote 0 lines in DECISIONS.md
Engineering judgment: 32

Avery Sharp
engaged
12 / 12 tests
  • Caught 2 wrong AI suggestions
  • Documented 4 trade-offs as they shipped
  • Pushed back, then verified end to end
Engineering judgment: 84

Stylized preview. Not live data.

The questions hiring forgot to ask

Hiring hasn’t caught up.

The way teams evaluate engineers was built for a world where engineers wrote every line themselves. That world is over.

AI writes the code. Who’s accountable for what it ships?

Acceptance, review, and judgment have moved up the stack. Most assessments still grade the artifact, not the engineer behind it.

When everyone has Claude open, what separates a junior from a staff engineer?

The differentiator isn’t typing speed or recall. It’s the moments where the AI is wrong and the engineer notices.

What does it look like to trust AI without rubber-stamping it?

That’s the skill we hire for now. Nobody else is measuring it.

These are the questions Metrux is built around.

What we see

What you see today vs. what we actually see.

Tests passing is a green checkmark. Engineering judgment shows up somewhere else entirely.

What every other platform sees
Acceptance test 1: passed
Acceptance test 2: passed
Acceptance test 3: passed
Hidden tests: 12 / 12
Score: 100

A candidate who pasted the spec into Claude and clicked submit produces an identical row.

What we see
00:02 open: Read the spec carefully
00:09 probe: Asked the AI to scaffold; pushed back on its first answer
00:16 catch: Caught a subtle mistake the AI introduced
00:21 decide: Documented the trade-off in plain language
00:27 verify: Verified the fix end to end before submitting
00:30 ship: Shipped a focused change. Nothing extra.
Engineering judgment: visible

Tests still need to pass. They’re table stakes now, not the story.

Two ways in

Same product. Different reason to be here.

If you’re hiring

Find engineers who think clearly with AI in the room.

  • Send a focused assessment by email. About 30 minutes per candidate.
  • Get a scorecard that shows the moments that mattered, not just a number.
  • Hire with confidence in a market where everyone’s code looks the same.

If you’re an engineer

Practice for the kind of engineering that gets hired now.

  • Real challenges with a working AI assistant in the workspace.
  • Feedback you don’t get anywhere else, on the skills hiring is starting to look for.
  • Bring your own AI key. Your sessions, your account, your spend.

Why now

A new category, defined while it’s being built.

Coding is the first profession AI changed end to end. Hiring is the second-order problem nobody is solving yet. We’re building the methodology and the dataset that define what engineering judgment means when the keyboard isn’t doing the typing.

Posture

Built with the discipline you’d expect.

Bring your own key

AI usage runs on your Anthropic key, your account, your spend. We never see plaintext.

US-based hosting

Customer and candidate data lives in AWS us-east-1. EU residency available on request.

Privacy and DPA

Privacy policy, DPA, and sub-processor list are published.

SOC 2 in progress

On the roadmap for the public-launch milestone. Audit trail and append-only logs are in production today.

See what real engineering judgment looks like.

Closed beta. We’re onboarding hiring teams and engineers a few at a time.