one-way-door classifier: "rotate ... password" misclassified as two-way (missing from rotate pattern) #1839

@jbetala7

Summary

The one-way-door keyword safety net misclassifies "rotate ... password" as a reversible (two-way) question. The sibling revoke and reset credential patterns both include password, but the rotate pattern omits it, so the most common phrasing of a credential-rotation op ("rotate the database password") slips through the destructive-keyword check.

scripts/one-way-doors.ts is the documented SECONDARY safety layer for AskUserQuestion calls that fire without a registry id (ad-hoc {skill}-{slug} ids). Its header is explicit about which direction is dangerous:

A false negative could mean auto-approving a destructive operation.

The credential patterns are written to be parallel (the inline comment says they "allow filler words ('the', 'my') between verb and noun"), so this is an omission, not an intentional scope choice.

Current behavior on main (3bef43b / v1.55.0.0)

scripts/one-way-doors.ts:

// Credentials / auth — allow filler words ("the", "my") between verb and noun
/\brevoke\s+[\w\s]*\b(api key|token|credential|access key|password)\b/i,  // has password
/\breset\s+[\w\s]*\b(api key|token|password|credential)\b/i,              // has password
/\brotate\s+[\w\s]*\b(api key|token|secret|credential|access key)\b/i,    // MISSING password

Reproduced directly against the committed module:

$ bun -e "import {classifyQuestion} from './scripts/one-way-doors.ts';
  for (const s of ['Rotate the database password?','rotate password for prod','Revoke the password','Reset the password','rotate the API key'])
    console.log((classifyQuestion({summary:s}).oneWay?'ONE-WAY':'two-way ')+'  '+s)"

two-way   Rotate the database password?
two-way   rotate password for prod
ONE-WAY   Revoke the password
ONE-WAY   Reset the password
ONE-WAY   rotate the API key

"Rotate the database password" should be the clearest one-way credential op, yet rotate is the only one of the three verbs whose pattern fails to match password.
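The gap can also be seen without importing the module. The following is a minimal sketch that copies only the three credential regexes quoted above into a standalone array; `credentialPatterns` and `matchesCredentialPattern` are hypothetical names for illustration and are not part of scripts/one-way-doors.ts, which contains other patterns and logic not shown here.

```typescript
// The three credential patterns, copied verbatim from the issue text.
// Assumption: the real module holds more patterns than these three.
const credentialPatterns: RegExp[] = [
  /\brevoke\s+[\w\s]*\b(api key|token|credential|access key|password)\b/i,
  /\breset\s+[\w\s]*\b(api key|token|password|credential)\b/i,
  /\brotate\s+[\w\s]*\b(api key|token|secret|credential|access key)\b/i, // no "password"
];

// Hypothetical helper: true when any of the three patterns matches.
function matchesCredentialPattern(summary: string): boolean {
  return credentialPatterns.some((re) => re.test(summary));
}

const samples: string[] = [
  "Rotate the database password?",
  "rotate password for prod",
  "Revoke the password",
  "Reset the password",
  "rotate the API key",
];

for (const s of samples) {
  // Prints "match" or "no match" next to each sample summary.
  console.log((matchesCredentialPattern(s) ? "match   " : "no match") + "  " + s);
}
```

Under these assumptions the two "rotate ... password" samples are the only ones that fail to match, which mirrors the two-way results in the reproduction.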

Expected behavior

classifyQuestion/isOneWayDoor should treat "rotate ... password" as one-way (reason: 'keyword'), in parity with revoke ... password and reset ... password.

Impact

classifyQuestion/isOneWayDoor is an exported, unit-tested API (test/plan-tune.test.ts covers the keyword net, including 'rotate the API key'). When an ad-hoc credential-rotation question slips past the keyword net, a permissive tuning preference (never-ask / ask-only-for-one-way) can suppress it instead of always asking. This is exactly the false-negative direction the module's header calls out as dangerous.

Duplicate search performed

Candidate fix

Add password to the rotate alternation so the three credential verbs are at parity, plus a regression test in the existing one-way-doors classifier block:

/\brotate\s+[\w\s]*\b(api key|token|secret|credential|access key|password)\b/i,
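A regression check for the proposed pattern could look roughly like the sketch below. It is written as a standalone script rather than in the repository's test style, because the layout of test/plan-tune.test.ts is not shown in this issue; `rotatePattern`, `rotateIsOneWay`, and `expectEqual` are hypothetical names, and the real test would presumably call classifyQuestion instead.

```typescript
// Proposed rotate pattern from this issue, with "password" added to the alternation.
const rotatePattern =
  /\brotate\s+[\w\s]*\b(api key|token|secret|credential|access key|password)\b/i;

// Hypothetical stand-in for the keyword check; the real entry point is classifyQuestion.
function rotateIsOneWay(summary: string): boolean {
  return rotatePattern.test(summary);
}

// Minimal assertion helper so the sketch runs without a test framework.
function expectEqual(actual: boolean, expected: boolean, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// The two phrasings that are misclassified on main should now match.
expectEqual(rotateIsOneWay("Rotate the database password?"), true, "rotate database password");
expectEqual(rotateIsOneWay("rotate password for prod"), true, "rotate password, no filler words");

// Existing behavior should be unchanged.
expectEqual(rotateIsOneWay("rotate the API key"), true, "rotate API key");

// Negative control: a rotate phrase with no credential noun should not match.
expectEqual(rotateIsOneWay("rotate the logs"), false, "rotate logs");

console.log("all rotate checks passed");
```

The negative control is there to confirm that adding one alternative does not widen the pattern beyond credential nouns.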
