Skip to content

Conversation

@jmarrec
Copy link
Contributor

@jmarrec jmarrec commented Sep 4, 2025

Pull request overview

This pull requests enables caching both CCache and regression results from develop.

What does this mean in practice?

A PR workflow with a minimal change (touching a few lines in Boilers.cc) will run under 6 minutes versus about 52 minutes

Description of the purpose of this PR

60+ commits (probably close to a 100 in reality since I squashed and force pushed a bunch) were eventually squashed into one. Some details can be found on my fork at https://github.com/jmarrec/EnergyPlus/compare/develop_cache_details...jmarrec:EnergyPlus:cache_regression_details?expand=1

This PR does several things, some of which can be turned off:

  • It uses the Ninja generator on all platforms to maximize the speed
  • It installs and configures CCache + it CACHES on Github (actions/cache) to speed up incremental builds
  • It creates two in-repo composite actions
    • setup-runner: takes care of installing the system dependencies such as ninja and ccache, and configuring ccache, (and figuring out nproc)
    • configure-and-build: takes care of calling CMake Configure with the appropriate options regardless of whether we use windows or Unix, then buildings with ninja (with minimal targets for integration only or the full thing, based on a boolean)
  • It Creates a workflow that runs on develop that will pick up an existing ccache, build E+, then run the tests. The integration tests results are cached
  • it heavily modifies the test_pull_request.yml workflow so that:
    • It picks up the cached develop regression tests based on the develop SHA (and platform + compiler ID I'm computing) at the time of the workflow run
    • ONLY In the event of a race condition (develop workflow didn't have time to finish before the pr workflow runs) will it build the baseline first and then run the integration tests (the alternative would be to just fail the workflow, and we reboot it later)
    • Otherwise, if the cached regression results from develop are found, develop branch is not even built
    • The PR checkout is built (reusing a CCache cache from the branch if any, otherwise it falls back to the latest develop CCache) and integration tests will run.
    • Rest is business as usual: analyze the regressions and produce a summary etc
  • I currently enabled probably too many platforms for the develop workflow
    • I added an arm64 ubuntu 24.04, which happens to fail on 1ZoneUncontrolledUTF8 btw
    • I added a windows-11-arm (WoA) runner, made a couple of tweaks for minizip build options so it'd work
    • (The idea was to ensure that the shared composite actions were doing the RIGHT thing in all cases, future proofing)
  • I added a workflow that will cleanup any branch specific cache after the pull_request in question is merged (or the branch is deleted)

What testing has been conducted?

Well, I've done a LOT of runs on my fork.

After squashing all these commits, here is a run after I did gh cache delete --all so there are zero ccaches to be found:

Some caveat on precompiled headers:

  • I disabled PCH because I couldn't really make it work properly with ccache on mac arm64. mac, well, clang to be precise, is notoriously harder to configure ccache with PCH enabled. But mac arm64 was the first platform I tested it on, since it's what's the current test_pull_request workflow uses. I've briefly tried re-enabling it on ubuntu arm64, but I already spent too much time working on this

Future investigations into re-enabling PCH should start with this

Pull Request Author

  • Title of PR should be user-synopsis style (clearly understandable in a standalone changelog context)
  • Label the PR with at least one of: Defect, Refactoring, NewFeature, Performance, and/or DoNoPublish
  • Pull requests that impact EnergyPlus code must also include unit tests to cover enhancement or defect repair
  • Author should provide a "walkthrough" of relevant code changes using a GitHub code review comment process
  • If any diffs are expected, author must demonstrate they are justified using plots and descriptions
  • If changes fix a defect, the fix should be demonstrated in plots and descriptions
  • If any defect files are updated to a more recent version, upload new versions here or on DevSupport
  • If IDD requires transition, transition source, rules, ExpandObjects, and IDFs must be updated, and add IDDChange label
  • If structural output changes, add to output rules file and add OutputChange label
  • If adding/removing any LaTeX docs or figures, update that document's CMakeLists file dependencies

Reviewer

  • Perform a Code Review on GitHub
  • If branch is behind develop, merge develop and build locally to check for side effects of the merge
  • If defect, verify by running develop branch and reproducing defect, then running PR and reproducing fix
  • If feature, test running new feature, try creative ways to break it
  • CI status: all green or justified
  • Check that performance is not impacted (CI Linux results include performance check)
  • Run Unit Test(s) locally
  • Check any new function arguments for performance impacts
  • Verify IDF naming conventions and styles, memos and notes and defaults
  • If new idf included, locally check the err file and other outputs

62 commits were squashed

Try caching and co

copy paste issue

Gotta checkout baseline so I find the composite action

path issue

Missing shell/uses

Wrong path

Botched the cache key

Ccache: cache key with COMPILER version (future proofing GHA updating the compilers)

setup-msbuild does not expose cl.exe and co, so do it manually

Wrong target

Wrong order for GIT SHA used in cache key

Use runner.os not matrix.pretty in cache key

Dumb change to see caching in action

Show ccache -p. And try to see if that speeds up the build (no code change).

It took ALSO 8 min to build E+ with a dumb simple change even though it restored ccache
is this because of PCH?

Try to disable PCH to see if ccache works without

Dumb change to see if ccache works without PCH now...

bothced N => NPROC replacement

I'm getting a weird error, force git sha to 10 characters

Put back problem matchers

Try to avoid a checkout of baseline if not needed, build ENTIRE E+ on develop (caching), configure CCACHE_BASEDIR. See notes

* Disabled PCH (and deleted the actions' cache): build took 10 min, not 8min.
* Made another change to Boilers.cc and rebuilt: this time the build took ONE (1) minute.

* Made a PR:
    * The baseline regression tests are picked up correctly, and baseline isn't rebuilt. That's good.
    * The CCache thing isn't working. And because of NO PCH the build takes 20 minutes instead of 14

Notes:
* On develop, I suppose you still want to build the entire project, not just energyplus + some fortran stuff, BECAUSE you're doing that in test_pull_requests, and you want a primed CCache folder when you do
* CCache will not hit a cache if you build at another path, because CMake puts a bunch of absolute paths. So there's two ways to fix that:
    * Build at the SAME path.
    * Try to configure CCache to ignore that, via base_dir(CCACHE_BASEDIR)

TEMPORARY( revert!) just run a couple of integration tests

Try to use action directly from github

Ok whatever, just be too damn clever and move on

Is a composite action a good idea? Really?

COmposite action doesn't access secrets, just use https

Bash typo

Dunno where I messed up STORING the sha

Start another version that uses composite steps differently, maybe it's more readable

Botched paths to actions

Gotta map the value of the composite action outputs explicitly!

typo

Wrong arg

typo

Typo in bash

Tweaks

PR: should disable PCH too.

ON PR, cache ccache only by branch name I guess. Consider just not caching perhaps.

adjust the unpacked PR paths

When a PR is built without the reg tests found from develop (race condition or caching issue), perhaps adding PCH is better

Ubuntu; try with PCH to see if that works. Set slopiness to pch_defines,time_macros

https://ccache.dev/manual/latest.html#_precompiled_headers

Would a subsequent develop build work with PCH?

default ccache base dir to the workspace

Use Pwsh on Windows, use my "unpacked version" (I find it clearer), and enable all platforms. Turning off PCH for now

Windows quote issue

Try arm-64 versions for Ubuntu 24.04 and Windows 11

Gotta change the cache key so that Linux arm64 doesn't pick up cache from the x64 one

Tweak: use a listed Python arm64 for windows + a quote issue

pwsh doesn't respect exit codes apparently

actions/runner-images#6668

Allow building zlib on windows arm64

make it -VV to see why 1ZoneUncontrolledUTF8 is failing on ubuntu arm

pwsh quote issue

Forgot to call target_architecture

Disable the UTF8 test that fails on ubuntu arm

Aparently I need tzdata explicitly on windows

Add a workflow to clean caches from merged branches

cf https://docs.github.com/en/actions/how-tos/manage-workflow-runs/manage-caches#force-deleting-cache-entries

show ccache config after building for debug

Missing an s! so condition was evaluating to "null" and the baseline wasn't checked out

Avoid wipping the build/ and .ccache dirs

Of course the ReadVarsESO target is called ReadVars on windows. OF COURSE.

fixup ninja command

Clean up and move back the workflows

Put back the problem matchers
@jmarrec jmarrec requested a review from Myoldmopar September 4, 2025 20:06
@jmarrec jmarrec self-assigned this Sep 4, 2025
@jmarrec jmarrec added NewFeature Includes code to add a new feature to EnergyPlus Developer Issue Related to cmake, packaging, installers, or developer tooling (CI, etc) labels Sep 4, 2025
@jmarrec
Copy link
Contributor Author

jmarrec commented Sep 4, 2025

The Build and Test thing isn't working here because when develop is checked out, the shared action isn't found, since it's not on develop yet.

Ignore that

Copy link
Member

@Myoldmopar Myoldmopar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarrec this could go down near the top of monumental changes to the E+ build process. I'm incredibly excited about this, and would love to see how we could build on this to improve the regression outputs for incredible reviews. Also I'd like to see if we can now add coverage results to GHA so that we can immediately sunset Decent CI.

I think the best thing to do here is to get this merged right in, and if there are any hiccups, we'll just deal with them after.

arch: x86_64
python-arch: x64
run_regressions: true
pretty: "Windows x64"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really exciting to see the platforms added back in now that CI will be speedy.

@Myoldmopar Myoldmopar merged commit 729c147 into develop Sep 5, 2025
10 of 13 checks passed
@Myoldmopar Myoldmopar deleted the cache_regressions branch September 5, 2025 13:29
@jmarrec jmarrec mentioned this pull request Sep 5, 2025
20 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Developer Issue Related to cmake, packaging, installers, or developer tooling (CI, etc) NewFeature Includes code to add a new feature to EnergyPlus

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants