
fix(evals): copy docker-agent binary + entrypoint in custom-base-image template#3029

Merged
hamza-jeddad merged 1 commit into
main from
796-eval-custom-base-image-evals-fail-with-unknown-command-configsagentyaml-dockerfilecustomtemplate-missing-docker-agent-binary-entrypoint
Jun 9, 2026


@hamza-jeddad

Contributor

Summary

Fixes #796.

docker-agent eval runs each eval case in a freshly built container. pkg/evaluation/build.go picks one of two embedded templates:

  • Dockerfile.template (default — used when the eval has no image:)
  • Dockerfile.custom.template (used when the eval sets evals.image:)
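The selection rule above can be sketched in a few lines of Go. This is an illustration only: `templateFor` and the two constants are hypothetical names, and the real logic in pkg/evaluation/build.go may be structured differently.

```go
package main

import "fmt"

// Hypothetical constants: the PR only states that build.go picks one of two
// embedded templates depending on whether the eval sets evals.image.
const (
	defaultTemplateName = "Dockerfile.template"
	customTemplateName  = "Dockerfile.custom.template"
)

// templateFor returns the template name for an eval's base image setting.
// An empty baseImage stands for an eval that has no image: key.
func templateFor(baseImage string) string {
	if baseImage == "" {
		return defaultTemplateName
	}
	return customTemplateName
}

func main() {
	fmt.Println(templateFor(""))              // Dockerfile.template
	fmt.Println(templateFor("alpine:latest")) // Dockerfile.custom.template
}
```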

The custom template was missing the two things the default template provides: it never copied the docker-agent binary into the image and never set the /run.sh … docker-agent run … entrypoint wrapper. As a result the eval container inherited the base image's ENTRYPOINT ["/docker-agent"], and eval.go appended the agent YAML path as CMD, producing:

running docker agent in container: container failed: exit status 1
(stderr: Error: unknown command "/configs/<agent>.yaml" for "docker-agent")

This broke every custom-base-image eval (e.g. task-style evals that set a base image), while plain evals (no image:) kept working. Note that PR #2779 only fixed the /run.sh printf generation in the default template, so it did not address this.
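To make the failure mode concrete: Docker appends the CMD elements to an exec-form ENTRYPOINT to build the container's argument vector. The sketch below mimics that concatenation; `containerArgv` is a hypothetical helper written for this illustration and is not part of the eval harness.

```go
package main

import (
	"fmt"
	"strings"
)

// containerArgv mimics how Docker combines an exec-form ENTRYPOINT with CMD:
// the CMD elements are appended after the ENTRYPOINT elements.
func containerArgv(entrypoint, cmd []string) []string {
	argv := make([]string, 0, len(entrypoint)+len(cmd))
	argv = append(argv, entrypoint...)
	return append(argv, cmd...)
}

func main() {
	// The agent YAML path that eval.go appends as CMD (placeholder from the error message).
	cmd := []string{"/configs/<agent>.yaml"}

	// Before the fix: the inherited entrypoint makes the YAML path look like a subcommand.
	broken := containerArgv([]string{"/docker-agent"}, cmd)
	// After the fix: the YAML path becomes an argument to "docker-agent run".
	fixed := containerArgv([]string{"/run.sh", "/docker-agent", "run", "--exec", "--yolo", "--json"}, cmd)

	fmt.Println(strings.Join(broken, " "))
	fmt.Println(strings.Join(fixed, " "))
}
```

With the inherited entrypoint, the first argument after the binary is the YAML path, which matches the "unknown command" error quoted above.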

Fix

Bring Dockerfile.custom.template to parity with the default template:

  • COPY --from=docker/docker-agent:edge /docker-agent /
  • create the /run.sh exec wrapper
  • ENTRYPOINT ["/run.sh", "/docker-agent", "run", "--exec", "--yolo", "--json"]
  • add the same telemetry-suppression env vars and a custom image label

FROM {{.BaseImage}} and the CopyWorkingDir conditional are preserved, so the change is fully data-compatible with build.go (no Go changes needed).

Tests

Added pkg/evaluation/dockerfile_template_test.go:

  • TestDockerfileCustomTemplateParity — asserts the custom template copies the binary and sets the /run.sh entrypoint (guards against this exact regression).
  • TestDockerfileTemplatesRender — renders both templates across the CopyWorkingDir matrix.

go test ./pkg/evaluation/ passes. The full go test ./... suite passes except one pre-existing, host-specific pkg/sandbox test failure unrelated to this change.

Scope

Eval harness only. The published docker/docker-agent image, docker-agent run/serve, and the TUI are unaffected — Dockerfile.custom.template is referenced solely by pkg/evaluation/build.go.

…e template

The custom-base-image eval template (Dockerfile.custom.template) never
copied the docker-agent binary nor set the /run.sh entrypoint wrapper, so
eval containers inherited the base image's ENTRYPOINT ["/docker-agent"].
The agent YAML path was then passed as a bare subcommand, failing every
custom-image eval with:

    Error: unknown command "/configs/<agent>.yaml" for "docker-agent"

Bring the custom template to parity with the default template by copying
the binary from docker/docker-agent:edge and wrapping it with the
/run.sh "docker-agent run --exec --yolo --json" entrypoint. Add a
regression test asserting parity between the two templates.

Fixes #796
@hamza-jeddad hamza-jeddad requested a review from a team as a code owner June 9, 2026 09:32
@aheritier aheritier added area/testing Test infrastructure, CI/CD, test runners, evaluation kind/fix PR fixes a bug (maps to fix: commit prefix) labels Jun 9, 2026

@docker-agent docker-agent left a comment


Assessment: 🟢 APPROVE

The fix correctly brings Dockerfile.custom.template to parity with the default template by adding the missing COPY --from=docker/docker-agent:edge, RUN printf wrapper, ENTRYPOINT, and telemetry-suppression env vars. The regression guard test (TestDockerfileCustomTemplateParity) and render test (TestDockerfileTemplatesRender) are a solid addition. Two minor test-quality nits below.

"custom template must copy the docker-agent binary into the eval image")
assert.Contains(t, out, `ENTRYPOINT ["/run.sh", "/docker-agent", "run", "--exec", "--yolo", "--json"]`,
"custom template must set the /run.sh docker-agent run entrypoint")
assert.Contains(t, out, "/run.sh",


[LOW] Redundant /run.sh assertion is subsumed by the ENTRYPOINT assertion

The assertion on line 51 (assert.Contains(t, out, "/run.sh", ...)) can never fail independently — it will always pass if the ENTRYPOINT assertion on line 49 passes, since the ENTRYPOINT string itself contains /run.sh. The comment says this guards that "the /run.sh entrypoint wrapper" is created, but it only checks that the string /run.sh appears anywhere in the output.

To actually guard against a regression where the RUN printf ... > /run.sh creation command is missing (but the ENTRYPOINT line still references it), consider asserting on the RUN printf line directly:

assert.Contains(t, out, "RUN printf", "custom template must contain the RUN printf command that creates /run.sh")
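A small standalone sketch of the point, using plain `strings.Contains` instead of testify. The rendered output here is a made-up two-line Dockerfile that sets the entrypoint but never creates /run.sh, and `checks` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// entrypointLine is the exact string the existing ENTRYPOINT assertion looks for.
const entrypointLine = `ENTRYPOINT ["/run.sh", "/docker-agent", "run", "--exec", "--yolo", "--json"]`

// checks reports the result of each substring check against a rendered Dockerfile.
func checks(out string) (hasEntrypoint, hasRunSh, hasPrintf bool) {
	return strings.Contains(out, entrypointLine),
		strings.Contains(out, "/run.sh"),
		strings.Contains(out, "RUN printf")
}

func main() {
	// Made-up output: the entrypoint references /run.sh, but nothing creates it.
	out := "FROM alpine:latest\n" + entrypointLine + "\n"

	hasEntrypoint, hasRunSh, hasPrintf := checks(out)
	fmt.Println(hasEntrypoint, hasRunSh, hasPrintf) // true true false
}
```

The bare "/run.sh" check passes even though the wrapper is never created; only the "RUN printf" check catches the gap.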

func TestDockerfileTemplatesRender(t *testing.T) {
t.Parallel()

for _, custom := range []bool{false, true} {


[LOW] Loop in TestDockerfileTemplatesRender lacks t.Run sub-tests

The nested for loops iterate over 4 combinations (custom × copyWorkingDir), but no t.Run(...) is used to give each combination a name. When an assertion fails, the test output will only say TestDockerfileTemplatesRender failed — you won't know which combination (custom=true, copyWorkingDir=false, etc.) triggered the failure.

Consider wrapping the loop body:

for _, custom := range []bool{false, true} {
    for _, copyWorkingDir := range []bool{false, true} {
        name := fmt.Sprintf("custom=%v/copyWorkingDir=%v", custom, copyWorkingDir)
        t.Run(name, func(t *testing.T) {
            out := renderTemplate(t, custom, copyWorkingDir, "alpine:latest")
            // assertions...
        })
    }
}

@hamza-jeddad hamza-jeddad merged commit 1583709 into main Jun 9, 2026
14 checks passed
@hamza-jeddad hamza-jeddad deleted the 796-eval-custom-base-image-evals-fail-with-unknown-command-configsagentyaml-dockerfilecustomtemplate-missing-docker-agent-binary-entrypoint branch June 9, 2026 09:42
aheritier added a commit that referenced this pull request Jun 10, 2026
The eval harness copies the docker-agent binary from docker/docker-agent:edge
into custom base images at build time and overrides their entrypoint with its
own /run.sh wrapper. Users need to know their base image's entrypoint will be
replaced and that the base image should only provide the runtime environment.

Ref: #3029

Labels

area/testing Test infrastructure, CI/CD, test runners, evaluation kind/fix PR fixes a bug (maps to fix: commit prefix)


4 participants