Skip to content

Fix caching for services during smoke tests#262

Merged
m-goggins merged 54 commits into
mainfrom
fix-caching-for-services-during-smoke-tests
Mar 28, 2025
Merged

Fix caching for services during smoke tests#262
m-goggins merged 54 commits into
mainfrom
fix-caching-for-services-during-smoke-tests

Conversation

@m-goggins

@m-goggins m-goggins commented Mar 21, 2025

Copy link
Copy Markdown
Collaborator

Description

This is part 2 of ticket #224, which ensures that we are caching the RL image and the database images during the smoke tests.

Related Issues

Fixes #224

Additional Notes

@codecov

codecov Bot commented Mar 21, 2025

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.85%. Comparing base (0971c41) to head (898ac23).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #262      +/-   ##
==========================================
+ Coverage   97.82%   97.85%   +0.02%     
==========================================
  Files          33       33              
  Lines        1748     1770      +22     
==========================================
+ Hits         1710     1732      +22     
  Misses         38       38              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@m-goggins m-goggins marked this pull request as ready for review March 24, 2025 17:24
@m-goggins m-goggins requested a review from ericbuckley as a code owner March 24, 2025 17:24
@m-goggins

Copy link
Copy Markdown
Collaborator Author

@ericbuckley After trying out the build-push-action in the smoke tests for the RL image (here), I'm still seeing it pull the python slim image every time. Perhaps I've implemented the build-push-action incorrectly, but from my understanding it doesn't cache the final image (only the build layers), which is potentially why it's pulling python slim every time?

In this iteration, I re-worked the RL to push to GHCR as rl-service:pr-$PR_NUMBER so that we don't run into conflicts with rl-service:latest when two builds happen simultaneously as you pointed out on the other PR. Thoughts?

I'm also happy to pair on this if you have time and some additional thoughts on the build-push-action tact.

@ericbuckley

Copy link
Copy Markdown
Collaborator

@ericbuckley After trying out the build-push-action in the smoke tests for the RL image (here), I'm still seeing it pull the python slim image every time. Perhaps I've implemented the build-push-action incorrectly, but from my understanding it doesn't cache the final image (only the build layers), which is potentially why it's pulling python slim every time?

In this iteration, I re-worked the RL to push to GHCR as rl-service:pr-$PR_NUMBER so that we don't run into conflicts with rl-service:latest when two builds happen simultaneously as you pointed out on the other PR. Thoughts?

I'm also happy to pair on this if you have time and some additional thoughts on the build-push-action tact.

In the commit you shared it looked like the caching was happening in one job (build-rl-with-docker) and the execution in another (smoke_tests)? Do you know if gha caching is persisted across jobs and/or did you try it all as one job?

The rl-service:pr-$PR_NUMBER works as a solution as well, I think with that path we should think through a cleanup path though, as it will get quite large over time. That can be done in a second GH issue though. And I'm also around to pair this afternoon if you want.

@m-goggins

Copy link
Copy Markdown
Collaborator Author

gha caching does not persist across jobs. I forgot to mention that I tried both (caching and building in one job and separately) but neither seemed to do the trick.

I'll proceed with the rl-service:pr-$PR_NUMBER solution and write a follow up ticket to handle the clean up.

@m-goggins m-goggins requested a review from ericbuckley March 26, 2025 19:35
@ericbuckley

Copy link
Copy Markdown
Collaborator

gha caching does not persist across jobs. I forgot to mention that I tried both (caching and building in one job and separately) but neither seemed to do the trick.

I'll proceed with the rl-service:pr-$PR_NUMBER solution and write a follow up ticket to handle the clean up.

@m-goggins thanks for all your experimentation here, testing is particularly tricky as we haven't run into a dockerhub authentication issue yet! Another option to consider, the work you did on the database services gets us more than halfway there (according to my count, we use 6 service images in every commit check and 4 rl-service images). I'm wondering, if maybe we just roll out the database service caching work you did and call it at that. Wait to see if anything breaks post Apr 1st and go from there.

@m-goggins m-goggins requested a review from bamader March 26, 2025 20:25
ericbuckley
ericbuckley previously approved these changes Mar 26, 2025

@ericbuckley ericbuckley left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @m-goggins. One last question on the schedule job, did you check to see that it runs as expected? I think as is, it will only run on April 1st. Wondering if we want to change up the "on" schedule in a commit, just to verify that it works, then flip it back.

@m-goggins

Copy link
Copy Markdown
Collaborator Author

I tried testing on this branch, but it looks like scheduled jobs only run on the default branch of a repo so I can't test until it's merged into main.

I set it to run at 12:01am on the 27th so assuming you're okay with merging, I can check to see if it worked tomorrow and re-adjust to the 1st if everything goes smoothly.

@ericbuckley

Copy link
Copy Markdown
Collaborator

I tried testing on this branch, but it looks like scheduled jobs only run on the default branch of a repo so I can't test until it's merged into main.

I set it to run at 12:01am on the 27th so assuming you're okay with merging, I can check to see if it worked tomorrow and re-adjust to the 1st if everything goes smoothly.

Oh, I meant instead of changing the schedule, change the trigger. The below would have the workflow run on a PR that is trying to merge into main.

on:
  push:
    branches: [main]

@m-goggins

Copy link
Copy Markdown
Collaborator Author

Oh, yep! In that case, it has run successfully, here: https://github.com/CDCgov/RecordLinker/actions/runs/14065686730

@m-goggins m-goggins mentioned this pull request Mar 28, 2025
2 tasks
@m-goggins m-goggins merged commit e179e23 into main Mar 28, 2025
@m-goggins m-goggins deleted the fix-caching-for-services-during-smoke-tests branch March 28, 2025 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

increase docker caching in workflows

3 participants