
Introduce kosli evaluate#671

Merged
tooky merged 63 commits into main from introduce-kosli-evaluate
Mar 4, 2026

Conversation

@tooky

@tooky tooky commented Feb 28, 2026

Why: Customers are duplicating attestation types to work around the lack of evaluation logic — creating separate types per environment just to encode different pass/fail rules. Three independent customers have hit this, and two more are building toward it. This is the most consistent product gap in our pipeline.

Objective: Add kosli evaluate — a CLI command that applies a Rego policy to trail data and returns a structured pass/fail decision. This separates what you collect (attestation type) from how you judge it (evaluation), and is the first step toward controls as a first-class product concept.

How this was built

This PR is also a demonstration of elephant carpaccio + TDD with Claude Code. The entire feature was built as a conversation — 48 commits, each one a single red-green-refactor step. Slices were kept thin enough to review independently, and the commit list reads bottom-to-top as a narrative of how the feature grew. Later commits came from reviewing the branch against Beck's Rules of Simple Design. The branch is intentionally unrebased so you can follow the progression.

Capabilities

  • kosli evaluate trail <name> — single trail against a Rego policy
  • kosli evaluate trails <name>... — multiple trails in one policy call
  • --output json|table — structured audit output or human-readable (default)
  • --show-input — include policy input in JSON output for debugging
  • --attestations — filter which attestations reach the policy (plain name for trail-level, artifact.name for artifact-level)
  • Exit code 0/1 reflects the policy decision — designed for CI/CD gates
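To make the --attestations naming rule concrete, here is a minimal Go sketch of how one filter entry could be split into a trail-level or artifact-level reference. It is an illustration only, not the code in this PR: the attestationFilter type, the parseFilter function, and the example names "backend" and "snyk" are hypothetical, and splitting on the first dot is an assumption about how artifact.name is interpreted.

```go
package main

import (
	"fmt"
	"strings"
)

// attestationFilter is a hypothetical parsed form of one --attestations entry.
type attestationFilter struct {
	Artifact string // empty for trail-level attestations
	Name     string // attestation name
}

// parseFilter treats a plain name as trail-level and "artifact.name" as
// artifact-level. Splitting on the first dot is an assumption.
func parseFilter(entry string) attestationFilter {
	artifact, name, found := strings.Cut(entry, ".")
	if !found {
		return attestationFilter{Name: entry}
	}
	return attestationFilter{Artifact: artifact, Name: name}
}

func main() {
	for _, entry := range []string{"pull-request", "backend.snyk"} {
		f := parseFilter(entry)
		if f.Artifact == "" {
			fmt.Printf("%s: trail-level attestation %q\n", entry, f.Name)
			continue
		}
		fmt.Printf("%s: attestation %q on artifact %q\n", entry, f.Name, f.Artifact)
	}
}
```

One consequence of this scheme is that an attestation name containing a dot would be ambiguous, so a real implementation would need a rule for that case.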

Example

Validate all PRs are approved:

package policy

import rego.v1

default allow = false

violations contains msg if {
    some trail in input.trails
    some pr in trail.compliance_status.attestations_statuses["pull-request"].pull_requests
    count(pr.approvers) == 0
    msg := sprintf("trail '%v': pull-request %v has no approvers", [trail.name, pr.url])
}

allow if {
    count(violations) == 0
}

Kosli Server:

$ kosli evaluate trails \
  --policy pr-approved.rego \
  --org kosli \
  --flow server \
  --attestations pull-request \
  c643b06bf2efaa8f35d4da54c9e34a34a28bd251 \
  bd8254c58d20826df7248772cedf523f715516b6 \
  012cb304aab50bc4a3cc96fba7840ff29ea4d19e \
  a49a603c04b73c58d18909aace2f13e98892089f \
  9373bda52a51550b8ecb2236ed94cb88aa6e3a98
RESULT:  ALLOWED

CyberDojo Dashboard:

$ kosli evaluate trails \
  --policy tmp/pr-approved.rego \
  --org cyber-dojo \
  --flow dashboard-ci \
  9978a1ca82c273a68afaa85fc37dd60d1e394f84 \
  b334d371eb85c9a5c811776de1b65fb80b52d952 \
  5abd63aa1d64af7be5b5900af974dc73ae425bd6 \
  cb3ec71f5ce1103779009abaf4e8f8a3ed97d813
RESULT:      DENIED
VIOLATIONS:  trail '5abd63aa1d64af7be5b5900af974dc73ae425bd6': pull-request https://github.com/cyber-dojo/dashboard/pull/342 has no approvers
             trail '9978a1ca82c273a68afaa85fc37dd60d1e394f84': pull-request https://github.com/cyber-dojo/dashboard/pull/344 has no approvers
             trail 'b334d371eb85c9a5c811776de1b65fb80b52d952': pull-request https://github.com/cyber-dojo/dashboard/pull/343 has no approvers
             trail 'cb3ec71f5ce1103779009abaf4e8f8a3ed97d813': pull-request https://github.com/cyber-dojo/dashboard/pull/341 has no approvers
Error: policy denied: [trail '5abd63aa1d64af7be5b5900af974dc73ae425bd6': pull-request https://github.com/cyber-dojo/dashboard/pull/342 has no approvers trail '9978a1ca82c273a68afaa85fc37dd60d1e394f84': pull-request https://github.com/cyber-dojo/dashboard/pull/344 has no approvers trail 'b334d371eb85c9a5c811776de1b65fb80b52d952': pull-request https://github.com/cyber-dojo/dashboard/pull/343 has no approvers trail 'cb3ec71f5ce1103779009abaf4e8f8a3ed97d813': pull-request https://github.com/cyber-dojo/dashboard/pull/341 has no approvers]

Architecture

  • internal/evaluate/ — OPA Rego engine + trail data transforms (array-to-map, rehydration from detail API, filtering)
  • cmd/kosli/evaluateHelpers.go — shared options, flag registration, fetch+enrich pipeline, output dispatch
  • cmd/kosli/evaluateTrail.go / evaluateTrails.go — thin command wrappers

Also included

@tooky

tooky commented Mar 2, 2026

I should make it clear that this is a draft specifically because it includes changes to CLAUDE.md and claude settings.


Copilot AI left a comment


Pull request overview

Adds a new kosli evaluate CLI command family that evaluates trail data locally against embedded OPA/Rego policies, including a client-side enrichment pipeline to make trail/attestation data policy-friendly.

Changes:

  • Introduce kosli evaluate trail and kosli evaluate trails commands with --policy, --output json|table, --show-input, and --attestations.
  • Add internal/evaluate package for Rego evaluation plus transform/filter/rehydrate trail-data enrichment.
  • Upgrade and add Go module dependencies (notably embedding github.com/open-policy-agent/opa and bumping several libs, including golang.org/x/net).

Reviewed changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
| --- | --- |
| internal/evaluate/transform.go | Implements trail enrichment (array→map transform, filtering, ID collection, rehydration). |
| internal/evaluate/transform_test.go | Unit tests for enrichment pipeline stages and edge cases. |
| internal/evaluate/rego.go | OPA/Rego evaluation wrapper (Evaluate, policy validation, violations collection). |
| internal/evaluate/rego_test.go | Unit tests for basic allow/deny and policy validation errors. |
| cmd/kosli/evaluate.go | Adds evaluate parent command and wires subcommands. |
| cmd/kosli/evaluateTrail.go | Implements kosli evaluate trail command wrapper and input shape (input.trail). |
| cmd/kosli/evaluateTrails.go | Implements kosli evaluate trails command wrapper and input shape (input.trails). |
| cmd/kosli/evaluateHelpers.go | Shared flag registration, fetch+enrich pipeline, and output dispatch for evaluate commands. |
| cmd/kosli/evaluateTrail_test.go | Integration-style command tests for evaluate trail behavior, output, enrichment, filtering, and rehydration errors. |
| cmd/kosli/evaluateTrails_test.go | Integration-style command tests for evaluate trails behavior, output, and enrichment/rehydration/filtering. |
| cmd/kosli/root.go | Registers evaluate on the root command. |
| cmd/kosli/testHelpers.go | Adds goldenJsonContains support for asserting a JSON path value is non-nil. |
| cmd/kosli/testdata/policies/allow-all.rego | Test policy fixture: always allow. |
| cmd/kosli/testdata/policies/deny-all.rego | Test policy fixture: always deny with a violation. |
| cmd/kosli/testdata/policies/invalid.rego | Test policy fixture: invalid Rego syntax. |
| cmd/kosli/testdata/policies/check-attestation-name.rego | Fixture asserting transformed trail-level attestation map access for single-trail mode. |
| cmd/kosli/testdata/policies/check-filtered-attestation.rego | Fixture asserting --attestations filtering for single-trail mode. |
| cmd/kosli/testdata/policies/check-rehydrated-field.rego | Fixture asserting rehydrated field presence for single-trail mode. |
| cmd/kosli/testdata/policies/check-trails-attestation-name.rego | Fixture asserting transformed trail-level attestation map access for multi-trail mode. |
| cmd/kosli/testdata/policies/check-trails-filtered-attestation.rego | Fixture asserting --attestations filtering for multi-trail mode. |
| cmd/kosli/testdata/policies/check-trails-rehydrated-field.rego | Fixture asserting rehydrated field presence for multi-trail mode. |
| docs/adr/20260302-client-side-policy-evaluation.md | ADR documenting rationale for client-side embedded OPA/Rego evaluation. |
| docs/adr/20260302-client-side-enrichment-pipeline.md | ADR documenting transform/filter/rehydrate enrichment pipeline and trade-offs. |
| docs.kosli.com/content/tutorials/evaluate_trails_with_opa.md | Draft tutorial for using kosli evaluate with example Rego policies. |
| go.mod | Adds OPA dependency and updates multiple module versions. |
| go.sum | Updates dependency checksums to match go.mod changes. |
| TODO.md | Adds a slice/TDD checklist for the feature (currently committed into repo). |
| CLAUDE.md | Updates Claude workflow guidance (now allows git checkout/add/commit; still forbids push). |
| .claude/settings.json | Updates allowed/denied bash command patterns for Claude tooling (allows checkout/add/commit; still denies push). |


tooky and others added 25 commits March 2, 2026 22:59
Slice 1 of kosli evaluate trail - adds the evaluate parent command
and evaluate trail subcommand that fetches a trail from the API and
wraps the response in {"trail": ...} JSON output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds internal/evaluate with Evaluate() function that validates and
evaluates Rego policies. Validates package name is 'policy', requires
an 'allow' rule, collects violations on deny.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds --policy flag to evaluate trail command. Reads a .rego file,
evaluates it against the trail input using OPA, exits 0 on allow
and 1 on deny. Uses Rego v1 syntax. Policy must use package policy
and declare an allow rule.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e and violations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…wed text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…violations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests: non-map passthrough, no compliance_status passthrough,
empty array to empty map, single attestation keyed by name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests: multiple trail attestations, artifact-level, both levels,
multiple artifacts, entries without attestation_name skipped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…maps

Wire TransformTrail into evaluateTrail after JSON parse. Tests verify
trail-level and artifact-level maps, plus Rego policy access by name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dangrondahl

dangrondahl commented Mar 3, 2026

Great work @tooky
While still in draft, I miss the CLI docs as the first reference, with examples etc. because it is the interface as I see it. :)

Looking at evaluate.go, evaluateTrail.go, and evaluateTrails.go — each command has only a one-liner Short/Long and no Example field. Compare with e.g. attestGeneric.go which has a proper long description and several concrete shell examples covering different use cases. That's what gets rendered into the CLI reference docs via make cli-docs, and it's also what users see when they run --help.

Shouldn't this be tracked as a slice in TODO.md? Something like:

  • Slice 19: Add Long descriptions and Example blocks to evaluate, evaluate trail, and evaluate trails

@dangrondahl

dangrondahl commented Mar 3, 2026

--output json|table — structured audit output or human-readable (default)

Why is "human readable" the default?

@dangrondahl

There's also a design decision that the rego policies are "static" meaning that I cannot override defaults, like number of reviewers in a PR rego policy or number of critical vulnerabilities in a Snyk scan for different purposes but with the same underlying evaluation.
Will a later introduction be breaking if that needs to be supported, given the current implementation?

Claude response:

One more design observation worth capturing before this merges.

Policies are currently fully static. Thresholds like "zero approvers" or "no critical vulnerabilities" are baked into the .rego file. If you want the same policy logic applied with different tolerances — say, 1 approver required in staging, 2 in production, or a stricter CVE threshold for a regulated flow — you'd need separate policy files, duplicating the logic. That's the same kind of duplication kosli evaluate is meant to eliminate.

Is adding parameter support later a breaking change?

Not with the current implementation. OPA has a first-class data document that is distinct from input and is designed exactly for configuration/parameters. Right now internal/evaluate/rego.go passes nothing to data — only input (the trail). Adding a --params flag that loads a JSON file into data.params would be purely additive: existing policies that don't reference data.params continue to work unchanged. The Evaluate function signature doesn't need to change either — a params map can be passed alongside input, or accepted as a rego.Store.

What it could look like:

Policy:

package policy

import rego.v1

default min_approvers := 1

min_approvers := data.params.min_approvers if {
    data.params.min_approvers
}

violations contains msg if {
    some trail in input.trails
    some pr in trail.compliance_status.attestations_statuses["pull-request"].pull_requests
    count(pr.approvers) < min_approvers
    msg := sprintf("trail '%v': PR %v has only %d approver(s), need %d", [trail.name, pr.url, count(pr.approvers), min_approvers])
}

allow if { count(violations) == 0 }

CLI:

$ kosli evaluate trails \
  --policy pr-approved.rego \
  --params '{"min_approvers": 2}' \
  --flow server \
  ...

As a next increment: A --params flag (inline JSON or @file.json) added to commonEvaluateOptions and threaded into evaluate.Evaluate. Small slice, non-breaking, closes the static-policy gap.

Worth tracking in TODO.md?
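For illustration, a minimal Go sketch of how a value for the proposed --params flag could be read, accepting either inline JSON or an @file.json reference. Neither the flag nor the loadParams helper exists in this PR; both are hypothetical, and the "@" prefix convention is taken from the suggestion above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// loadParams is a hypothetical helper: it parses a params value that is
// either inline JSON or "@path" pointing at a JSON file. An empty value
// yields an empty map, so policies that ignore params are unaffected.
func loadParams(value string) (map[string]any, error) {
	if value == "" {
		return map[string]any{}, nil
	}
	data := []byte(value)
	if strings.HasPrefix(value, "@") {
		fileData, err := os.ReadFile(strings.TrimPrefix(value, "@"))
		if err != nil {
			return nil, fmt.Errorf("reading params file: %w", err)
		}
		data = fileData
	}
	var params map[string]any
	if err := json.Unmarshal(data, &params); err != nil {
		return nil, fmt.Errorf("parsing params as a JSON object: %w", err)
	}
	return params, nil
}

func main() {
	params, err := loadParams(`{"min_approvers": 2}`)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// JSON numbers decode to float64 when the target is map[string]any.
	fmt.Println("min_approvers:", params["min_approvers"])
}
```

How the resulting map would reach the policy (for example as a data document) is a separate design choice and is not shown here.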

@dangrondahl

Wouldn't it be better to just have one command — kosli evaluate trails — where the list could just be of 1, instead of two commands?

Claude response:

The two-command design creates a policy portability problem that's worth collapsing before this merges.

evaluate trail passes {"trail": {...}} as input. evaluate trails passes {"trails": [...]}. A policy written for one doesn't work with the other — input.trail and input.trails are structurally incompatible, so you'd need two versions of every policy depending on which command you used.

The consequences:

  • You need two versions of every policy. One that references input.trail.attestations_statuses[...], one that iterates some trail in input.trails. Same logic, different shape.
  • Policies aren't reusable as usage grows. You start with one trail, write a policy, it works. Later you want to gate a deployment on five trails passing — you can't reuse the policy, you have to rewrite it for input.trails.
  • The PR's own example illustrates this. The Rego snippet in the PR description iterates some trail in input.trails. That policy doesn't work with evaluate trail at all.

The divergence is in two lines of code — evaluateTrail.go:46 sets "trail": trailData and evaluateTrails.go:50 sets "trails": trails. Same policy file, different input shape, silent mismatch.

Collapsing to just kosli evaluate trails with cobra.MinimumNArgs(1) fixes all of this:

  • Input is always {"trails": [...]} — one trail is just a list of one
  • Policies are portable: a policy that checks one trail works for many with no changes
  • Less API surface, less code, less to document, less to explain

The only thing lost is the grammatical nicety of evaluate trail my-trail vs evaluate trails my-trail. Small cost.

@dangrondahl

Do we need to be explicit about the rego entry point? e.g. allow should always be an entrypoint — or is that a well-known de facto best practice?

Claude response:

allow is a very common OPA convention (it's in most tutorials and examples) but it is not a built-in concept — OPA doesn't treat it specially at all.

OPA does have an entrypoint annotation:

# METADATA
# entrypoint: true
allow if { ... }

But that annotation is primarily for opa build --optimize (partial evaluation/bundle compilation). For the current use case — direct rego.New() with an explicit query — it has no effect. OPA evaluates whatever string you pass to rego.Query(), annotation or not.

The current code enforces allow as a contract through Kosli's own validation (validatePolicy in rego.go) and queries it explicitly with rego.Query("data.policy.allow"). So allow is a Kosli convention enforced by Kosli — not an OPA standard and not something a user coming from OPA experience would necessarily expect. They might arrive with deny, pass, a structured result object, or something else entirely.

That makes it worth being explicit about in the CLI docs and examples, rather than assuming users will know.

@dangrondahl

Same goes for violations. It's queried explicitly with rego.Query("data.policy.violations") — also a Kosli convention, not an OPA concept. A policy author could name it deny, reasons, findings and OPA wouldn't care.

One difference worth noting: validatePolicy checks for the presence of allow but not violations. So a policy with no violations rule silently returns an empty list on denial — valid, but potentially confusing to a user wondering why they got no explanation.

@tooky

tooky commented Mar 3, 2026

@dangrondahl yes more CLI docs would be good!

Shouldn't this be tracked as a slice in TODO.md? Something like:

  • Slice 19: Add Long descriptions and Example blocks to evaluate, evaluate trail, and evaluate trails

I would be super interested to see how Claude would work for others doing something like this with the CLAUDE.md here

@tooky

tooky commented Mar 3, 2026

--output json|table — structured audit output or human-readable (default)

Why is "human readable" the default?

That's the convention for the CLI

@dangrondahl

dangrondahl commented Mar 3, 2026

@dangrondahl yes more CLI docs would be good!

Shouldn't this be tracked as a slice in TODO.md? Something like:

  • Slice 19: Add Long descriptions and Example blocks to evaluate, evaluate trail, and evaluate trails

I would be super interested to see how Claude would work for others doing something like this with the CLAUDE.md here

I only see:

### Example: Adding a new field to an existing command

- **Slice 1**: Add the new flag to the command, pass it through to the API request; add a `cmdTestCase` that exercises it.
- **Slice 2**: Add validation for the new flag; add error-case tests.
- **Slice 3**: Update docs/help text.

in CLAUDE.md
Shouldn't it also be there for new commands?

Also, I assumed that it checked its surroundings a bit more and looked at the docs of "neighbouring" commands when creating a new one. That has been my experience in e.g. the Terraform provider.
It's interesting whether this was intentionally left out or whether it is a side effect of working in small TDD steps. Not saying I'm against it, but it might be limiting the agent's ability to understand broader context and "think for itself". 🤔

@tooky

tooky commented Mar 3, 2026

@dangrondahl

Wouldn't it be better to just have one command — kosli evaluate trails — where the list could just be of 1, instead of two commands?

Good challenge. Interesting question.

I started here as a deliberate design decision - I was thinking of 2 types of use case.

  1. To evaluate a set of facts within a single trail. E.g. are all of the attestations on this build trail from the same commit
  2. To evaluate a set of facts from multiple trails. E.g. I want all the pull-requests from the commits between now and the last release and I want to evaluate them against our standards

I wasn't imagining that policies written for a single trail would necessarily be portable to evaluating a number of trails. Maybe that's not the case.

The other thing in my head is that if I know I am evaluating a single trail then I don't want a list: I'm specifically dealing with 1, and 0 is an error.

If I expect a list, then empty list is valid.

@tooky

tooky commented Mar 3, 2026

@dangrondahl

On the allow and violations contract: I am wondering whether that is an explicit contract we should have. It definitely needs documentation.

E.g.

  • violations is the contract by which you can report how the policy has failed back to users.
  • allow is the contract by which we communicate with Kosli whether the policy has been complied with

It means we can provide some of the boilerplate.

I think we might want escape hatches though.

@tooky

tooky commented Mar 3, 2026

@dangrondahl

There's also a design decision that the rego policies are "static" meaning that I cannot override defaults

Yes, this is an extension that we should consider. I want to be mindful of where this lives though.

Eventually I see evaluation and policy being server side in Kosli, and we might not want parameterisation to be sent in from the client - it might need to be more controlled.

I would favour holding off on this for now.

@tooky tooky marked this pull request as ready for review March 4, 2026 08:19
@ToreMerkely ToreMerkely requested a review from Copilot March 4, 2026 13:23

Copilot AI left a comment


Pull request overview

Copilot reviewed 28 out of 29 changed files in this pull request and generated 3 comments.



@dangrondahl dangrondahl self-requested a review March 4, 2026 13:36
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@dangrondahl dangrondahl self-requested a review March 4, 2026 14:14
@tooky tooky merged commit dadda0d into main Mar 4, 2026
10 checks passed
@tooky tooky deleted the introduce-kosli-evaluate branch March 4, 2026 14:50
tooky added a commit that referenced this pull request Mar 6, 2026
…ual-command example with snyk trail example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tooky tooky mentioned this pull request Mar 6, 2026
tooky added a commit that referenced this pull request Mar 9, 2026
…ual-command example with snyk trail example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tooky added a commit that referenced this pull request Mar 9, 2026
* add Long descriptions and Example blocks to evaluate commands
* address PR #671 docs feedback:
  * expand policy contract hint
  * replace dual-command example with snyk trail example
* publish evaluate trails tutorial (remove draft status)

5 participants