fix(evals): copy docker-agent binary + entrypoint in custom-base-image template by hamza-jeddad · Pull Request #3029 · docker/docker-agent

hamza-jeddad · 2026-06-09T09:32:29Z

Summary

Fixes #796.

docker-agent eval runs each eval case in a freshly built container. pkg/evaluation/build.go picks one of two embedded templates:

Dockerfile.template (default — used when the eval has no image:)
Dockerfile.custom.template (used when the eval sets evals.image:)

The custom template was missing the two things the default template provides: it never copied the docker-agent binary into the image and never set the /run.sh … docker-agent run … entrypoint wrapper. As a result the eval container inherited the base image's ENTRYPOINT ["/docker-agent"], and eval.go appended the agent YAML path as CMD, producing:

running docker agent in container: container failed: exit status 1
(stderr: Error: unknown command "/configs/<agent>.yaml" for "docker-agent")

This broke every custom-base-image eval (e.g. task-style evals that set a base image), while plain evals (no image:) kept working. Note that PR #2779 only fixed the /run.sh printf generation in the default template, so it did not address this.

Fix

Bring Dockerfile.custom.template to parity with the default template:

COPY --from=docker/docker-agent:edge /docker-agent /
create the /run.sh exec wrapper
ENTRYPOINT ["/run.sh", "/docker-agent", "run", "--exec", "--yolo", "--json"]
add the same telemetry-suppression env vars and a custom image label

FROM {{.BaseImage}} and the CopyWorkingDir conditional are preserved, so the change is fully data-compatible with build.go (no Go changes needed).

Tests

Added pkg/evaluation/dockerfile_template_test.go:

TestDockerfileCustomTemplateParity — asserts the custom template copies the binary and sets the /run.sh entrypoint (guards against this exact regression).
TestDockerfileTemplatesRender — renders both templates across the CopyWorkingDir matrix.

go test ./pkg/evaluation/ passes. The full go test ./... suite passes except one pre-existing, host-specific pkg/sandbox test failure unrelated to this change.

Scope

Eval harness only. The published docker/docker-agent image, docker-agent run/serve, and the TUI are unaffected — Dockerfile.custom.template is referenced solely by pkg/evaluation/build.go.

…e template

The custom-base-image eval template (Dockerfile.custom.template) never copied the docker-agent binary nor set the /run.sh entrypoint wrapper, so eval containers inherited the base image's ENTRYPOINT ["/docker-agent"]. The agent YAML path was then passed as a bare subcommand, failing every custom-image eval with:

Error: unknown command "/configs/<agent>.yaml" for "docker-agent"

Bring the custom template to parity with the default template by copying the binary from docker/docker-agent:edge and wrapping it with the /run.sh "docker-agent run --exec --yolo --json" entrypoint. Add a regression test asserting parity between the two templates.

Fixes #796

docker-agent

Assessment: 🟢 APPROVE

The fix correctly brings Dockerfile.custom.template to parity with the default template by adding the missing COPY --from=docker/docker-agent:edge, RUN printf wrapper, ENTRYPOINT, and telemetry-suppression env vars. The regression guard test (TestDockerfileCustomTemplateParity) and render test (TestDockerfileTemplatesRender) are a solid addition. Two minor test-quality nits below.

docker-agent · 2026-06-09T09:35:06Z

+		"custom template must copy the docker-agent binary into the eval image")
+	assert.Contains(t, out, `ENTRYPOINT ["/run.sh", "/docker-agent", "run", "--exec", "--yolo", "--json"]`,
+		"custom template must set the /run.sh docker-agent run entrypoint")
+	assert.Contains(t, out, "/run.sh",


[LOW] Redundant /run.sh assertion is subsumed by the ENTRYPOINT assertion

The assertion on line 51 (assert.Contains(t, out, "/run.sh", ...)) can never fail independently — it will always pass if the ENTRYPOINT assertion on line 49 passes, since the ENTRYPOINT string itself contains /run.sh. The comment says this guards that "the /run.sh entrypoint wrapper" is created, but it only checks that the string /run.sh appears anywhere in the output.

To actually guard against a regression where the RUN printf ... > /run.sh creation command is missing (but the ENTRYPOINT line still references it), consider asserting on the RUN printf line directly:

assert.Contains(t, out, "RUN printf", "custom template must contain the RUN printf command that creates /run.sh")

docker-agent · 2026-06-09T09:35:06Z

+func TestDockerfileTemplatesRender(t *testing.T) {
+	t.Parallel()
+
+	for _, custom := range []bool{false, true} {


[LOW] Loop in TestDockerfileTemplatesRender lacks t.Run sub-tests

The nested for loops iterate over 4 combinations (custom × copyWorkingDir), but no t.Run(...) is used to give each combination a name. When an assertion fails, the test output will only say TestDockerfileTemplatesRender failed — you won't know which combination (custom=true, copyWorkingDir=false, etc.) triggered the failure.

Consider wrapping the loop body:

for _, custom := range []bool{false, true} {
	for _, copyWorkingDir := range []bool{false, true} {
		name := fmt.Sprintf("custom=%v/copyWorkingDir=%v", custom, copyWorkingDir)
		t.Run(name, func(t *testing.T) {
			out := renderTemplate(t, custom, copyWorkingDir, "alpine:latest")
			// assertions...
		})
	}
}

The eval harness copies the docker-agent binary from docker/docker-agent:edge into custom base images at build time and overrides their entrypoint with its own /run.sh wrapper. Users need to know their base image's entrypoint will be replaced and that the base image should only provide the runtime environment. Ref: #3029

hamza-jeddad requested a review from a team as a code owner June 9, 2026 09:32

aheritier added the area/testing (Test infrastructure, CI/CD, test runners, evaluation) and kind/fix (PR fixes a bug; maps to the fix: commit prefix) labels Jun 9, 2026

docker-agent reviewed Jun 9, 2026


melmennaoui approved these changes Jun 9, 2026


hamza-jeddad merged commit 1583709 into main Jun 9, 2026
14 checks passed

hamza-jeddad deleted the 796-eval-custom-base-image-evals-fail-with-unknown-command-configsagentyaml-dockerfilecustomtemplate-missing-docker-agent-binary-entrypoint branch June 9, 2026 09:42

BrewTestBot mentioned this pull request Jun 9, 2026

docker-agent 1.74.0 Homebrew/homebrew-core#287104

Merged

aheritier mentioned this pull request Jun 10, 2026

docs: update evaluation and compaction documentation #3044

Merged
