
fix: prevent model re-download of cached models after helm upgrade #203

Merged
Defilan merged 4 commits into main from fix/model-cache-redownload-193
Mar 4, 2026


Defilan commented on Mar 4, 2026

Summary

Fixes #193: after a helm upgrade, the model controller re-downloads cached models even though they already exist in the PVC.

  • Early return for Ready models: If status is already Ready and the file exists on disk, return immediately without updating status, preventing reconcile-loop storms after restarts
  • Atomic writes: downloadModel() and copyLocalModel() now write to a temp file then os.Rename to the final path, so a crash mid-download never leaves a partial file that os.Stat would later find and mark Ready
  • Content-Length validation: HTTP downloads verify received bytes match the server's Content-Length, catching truncated downloads

Test plan

  • make test — 132/132 specs pass (4 new)
  • make vet && make fmt — clean
  • E2E: new test restarts the controller after model reaches Ready, verifies no re-download via log assertions (make test-e2e)

Defilan added 3 commits March 3, 2026 19:00

Add early return when model status is already Ready and the cached file
exists on disk, avoiding unnecessary status updates that trigger
re-reconcile loops. Rewrite downloadModel and copyLocalModel to use
atomic temp-file-then-rename writes so a crash mid-download never leaves
a partial file at the final path. Validate HTTP Content-Length against
received bytes to detect truncated downloads.

Signed-off-by: Christopher Maher <chris@defilan.com>
Signed-off-by: Christopher Maher <chris@mahercode.io>
Add an e2e test that restarts the controller after a model reaches
Ready, then verifies no re-download occurs by checking controller
logs for the early-return cache-hit path.

Signed-off-by: Christopher Maher <chris@mahercode.io>
Defilan force-pushed the fix/model-cache-redownload-193 branch from 3aafc85 to 353855c on March 4, 2026 03:34
The Kind cluster uses emptyDir for model cache, so the file is lost on
pod restart. Update the test to verify the controller detects the missing
file and re-downloads cleanly, returning the model to Ready.

Signed-off-by: Christopher Maher <chris@mahercode.io>
Defilan merged commit a8f9a88 into main on Mar 4, 2026
15 checks passed
Defilan deleted the fix/model-cache-redownload-193 branch on March 4, 2026 04:50
This was referenced Mar 4, 2026

Development

Successfully merging this pull request may close these issues.

Model controller re-downloads cached models after controller upgrade
