fix(evals): copy docker-agent binary + entrypoint in custom-base-image template #3029
The custom-base-image eval template (Dockerfile.custom.template) never
copied the docker-agent binary or set the /run.sh entrypoint wrapper, so
eval containers inherited the base image's ENTRYPOINT ["/docker-agent"].
The agent YAML path was then passed as a bare subcommand, failing every
custom-image eval with:
Error: unknown command "/configs/<agent>.yaml" for "docker-agent"
Bring the custom template to parity with the default template by copying
the binary from docker/docker-agent:edge and wrapping it with the
/run.sh "docker-agent run --exec --yolo --json" entrypoint. Add a
regression test asserting parity between the two templates.
Fixes #796
docker-agent left a comment
Assessment: 🟢 APPROVE
The fix correctly brings Dockerfile.custom.template to parity with the default template by adding the missing COPY --from=docker/docker-agent:edge, RUN printf wrapper, ENTRYPOINT, and telemetry-suppression env vars. The regression guard test (TestDockerfileCustomTemplateParity) and render test (TestDockerfileTemplatesRender) are a solid addition. Two minor test-quality nits below.
    "custom template must copy the docker-agent binary into the eval image")
assert.Contains(t, out, `ENTRYPOINT ["/run.sh", "/docker-agent", "run", "--exec", "--yolo", "--json"]`,
    "custom template must set the /run.sh docker-agent run entrypoint")
assert.Contains(t, out, "/run.sh",
[LOW] Redundant /run.sh assertion is subsumed by the ENTRYPOINT assertion
The assertion on line 51 (assert.Contains(t, out, "/run.sh", ...)) can never fail independently — it will always pass if the ENTRYPOINT assertion on line 49 passes, since the ENTRYPOINT string itself contains /run.sh. The comment says this guards that "the /run.sh entrypoint wrapper" is created, but it only checks that the string /run.sh appears anywhere in the output.
To actually guard against a regression where the RUN printf ... > /run.sh creation command is missing (but the ENTRYPOINT line still references it), consider asserting on the RUN printf line directly:
assert.Contains(t, out, "RUN printf", "custom template must contain the RUN printf command that creates /run.sh")

func TestDockerfileTemplatesRender(t *testing.T) {
    t.Parallel()

    for _, custom := range []bool{false, true} {
[LOW] Loop in TestDockerfileTemplatesRender lacks t.Run sub-tests
The nested for loops iterate over 4 combinations (custom × copyWorkingDir), but no t.Run(...) is used to give each combination a name. When an assertion fails, the test output will only say TestDockerfileTemplatesRender failed — you won't know which combination (custom=true, copyWorkingDir=false, etc.) triggered the failure.
Consider wrapping the loop body:
for _, custom := range []bool{false, true} {
    for _, copyWorkingDir := range []bool{false, true} {
        name := fmt.Sprintf("custom=%v/copyWorkingDir=%v", custom, copyWorkingDir)
        t.Run(name, func(t *testing.T) {
            out := renderTemplate(t, custom, copyWorkingDir, "alpine:latest")
            // assertions...
        })
    }
}

The eval harness copies the docker-agent binary from docker/docker-agent:edge into custom base images at build time and overrides their entrypoint with its own /run.sh wrapper. Users need to know their base image's entrypoint will be replaced and that the base image should only provide the runtime environment. Ref: #3029
Summary
Fixes #796.
docker-agent eval runs each eval case in a freshly built container. pkg/evaluation/build.go picks one of two embedded templates:
- Dockerfile.template (default, used when the eval has no image:)
- Dockerfile.custom.template (used when the eval sets evals.image:)

The custom template was missing the two things the default template provides: it never copied the docker-agent binary into the image and never set the /run.sh … docker-agent run … entrypoint wrapper. As a result the eval container inherited the base image's ENTRYPOINT ["/docker-agent"], and eval.go appended the agent YAML path as CMD, producing:

Error: unknown command "/configs/<agent>.yaml" for "docker-agent"

This broke every custom-base-image eval (e.g. task-style evals that set a base image), while plain evals (no image:) kept working. Note that PR #2779 only fixed the /run.sh printf generation in the default template, so it did not address this.
Fix
Bring Dockerfile.custom.template to parity with the default template:
- COPY --from=docker/docker-agent:edge /docker-agent /
- /run.sh exec wrapper
- ENTRYPOINT ["/run.sh", "/docker-agent", "run", "--exec", "--yolo", "--json"]
- custom image label

FROM {{.BaseImage}} and the CopyWorkingDir conditional are preserved, so the change is fully data-compatible with build.go (no Go changes needed).
Tests
Added pkg/evaluation/dockerfile_template_test.go:
- TestDockerfileCustomTemplateParity: asserts the custom template copies the binary and sets the /run.sh entrypoint (guards against this exact regression).
- TestDockerfileTemplatesRender: renders both templates across the CopyWorkingDir matrix.

go test ./pkg/evaluation/ passes. The full go test ./... suite passes except one pre-existing, host-specific pkg/sandbox test failure unrelated to this change.
Scope
Eval harness only. The published docker/docker-agent image, docker-agent run/serve, and the TUI are unaffected: Dockerfile.custom.template is referenced solely by pkg/evaluation/build.go.