Skip to content

Optimize test build times#1230

Merged
PhilippGackstatter merged 13 commits intonextfrom
pgackst-optimize-build-times
Mar 20, 2025
Merged

Optimize test build times#1230
PhilippGackstatter merged 13 commits intonextfrom
pgackst-optimize-build-times

Conversation

@PhilippGackstatter
Copy link
Copy Markdown
Contributor

@PhilippGackstatter PhilippGackstatter commented Mar 19, 2025

Optimizes the way we build and run tests.

Profiles

I'm evaluating all of these using the build and test execution times from make test-prove (with a cargo clean in between) on my computer.

Current next

[profile.test-release]
inherits = "release"
debug = true
debug-assertions = true
overflow-checks = true

Build time: 304s
Test execution time: 71s

Dev profile

[profile.test-release]
inherits = "dev"
codegen-units = 16

Build time: 33s
Test execution time: 338s

Dev + Opt-Level 1

[profile.test-release]
inherits = "dev"
opt-level = 1
codegen-units = 16

Build time: 67s
Test execution time: 74s

Dev + Opt-Level 1 + Thin LTO

[profile.test-release]
inherits = "dev"
opt-level = 1
lto = "thin"
codegen-units = 16

Build time: 94s
Test execution time: 72s

Conclusion

I think the clear winner is Dev + Opt-Level 1 as it has the lowest build time with a practically equivalent test execution time to the other options (except Dev). For that reason I renamed test-release to test-dev.

Given that it is also not necessary to specifically compile winterfell with a higher opt-level because it doesn't bring any additional benefits.

Interestingly on my machine the build is 4.5x faster, while in CI it is sadly only a ~2x improvement (from 12m 02s to 6m 08s).
For the execution time, the tests are ~5% slower in CI (4m 24s previously and 4m 42s now) which seems acceptable given the build time improvement.

CI Reuse

What's also interesting is that in this run from another recent PR https://github.com/0xPolygonMiden/miden-base/actions/runs/13933162712/job/38995000040#step:7:1 the make test-default seems to have to completely rebuild what make test-build just finished doing, judging by the time.

In this PR, the change from test-dev alone causes the "second build" to only take another minute (see first CI run from this PR). I removed the $(BUILD_GENERATED_FILES_IN_SRC) from the make test-build command which doesn't seem necessary and that eliminates the "second build" entirely (see second CI run from this PR).

The overall CI improvement for the tests from this PR is from 29m 38s to 12m 12s (2.4x).

Let me know if I should try anything else.

closes #1222

@PhilippGackstatter PhilippGackstatter added the no changelog This PR does not require an entry in the `CHANGELOG.md` file label Mar 19, 2025
@PhilippGackstatter PhilippGackstatter marked this pull request as ready for review March 19, 2025 13:33
@PhilippGackstatter PhilippGackstatter changed the title Optimize build times Optimize test build times Mar 19, 2025
@Mirko-von-Leipzig
Copy link
Copy Markdown
Contributor

We really need to stop with these out-of-tree proto files :/. They're constantly a cause of CI and build times issues because we have to "force" rebuild them sometimes and its non-trivial to monitor because they're not co-located with the build.rs that's (ab)using them.

Copy link
Copy Markdown
Contributor

@Mirko-von-Leipzig Mirko-von-Leipzig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other investigation options:

  • debug symbols though then errors are likely worse
  • mold as a linker if link time is a lot

@Mirko-von-Leipzig
Copy link
Copy Markdown
Contributor

Looking at the CI build/tests it also looks like the rust-cache isn't working.

Though that might be because it includes the profile in the cache key, and that was changed.

You might want to re-run the actions to see if that's the case. Although I think I set the cache to only save on push into next to minimize the cache size.

@bobbinth
Copy link
Copy Markdown
Contributor

Given that it is also not necessary to specifically compile winterfell with a higher opt-level because it doesn't bring any additional benefits.

Is debug_assertions enabled at this level? The reason I'm asking is that winterfell runs a pretty expensive check (i.e., here and here) when debug_assertions are enabled.

@PhilippGackstatter
Copy link
Copy Markdown
Contributor Author

PhilippGackstatter commented Mar 20, 2025

winterfell runs a pretty expensive check [...] when debug_assertions are enabled.

Ah, thank you. We were enabling debug-assertions even before under the test-release profile because we explicitly enabled debug-assertions in the profile and also because we passed RUSTFLAGS="-C debug-assertions" on the command line.
Disabling the debug assertions for winter-prover specifically gives a massive boost in test execution time. On my machine make test-prove now compiles in 63s and runs in 11s.

The build time in CI is now similar to before with 6m 16s but test execution time is now 7m 58s, so a nice improvement over the original ~30 min.

I removed all DEBUG_ASSERTIONS from the Makefile because they are either implicitly enabled through the use of the standard cargo dev profile or through the cargo test-dev profile. I think it's clearer that way too. Previously I missed that we were overwriting this option on the cli. Now everything is specified in Cargo.toml.

I also combined make test-default and make test-prove because the prove tests are not significantly slower than the other ones, so the distinction no longer seems necessary. Let me know if we should still keep that distinction. (We still have make test-dev to run a subset of the most basic tests which I use locally as a sanity check.)

I tried using the mold linker in CI but there does not seem to be a significant difference, so I reverted that change to keep the setup as simple as it is.

By far the slowest test is now create_accounts_with_non_zero_anchor_block which doesn't do any proving. It is slow because it creates $2^{16}$ blocks with the MockChain to test an account ID creation with something other than the genesis block as the anchor. If we could speed up the creation of fake blocks in the mock chain that would help here. I guess the slow part comes from all the hashing in the MMR operations, not sure if we can cheat here somehow.

Copy link
Copy Markdown
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you!

On my machine make test-prove now compiles in 63s and runs in 11s.

So, on your machine, the combined build + test time went from ~375 seconds to ~75 second, right? Or about 5x improvement?

The build time in CI is now similar to before with 6m 16s but test execution time is now 7m 58s, so a nice improvement over the original ~30 min.

And on CI we went from ~30 mins to ~14 mins, right? So, about 2x improvement?

By far the slowest test is now create_accounts_with_non_zero_anchor_block which doesn't do any proving. It is slow because it creates $2^16$ blocks with the MockChain to test an account ID creation with something other than the genesis block as the anchor. If we could speed up the creation of fake blocks in the mock chain that would help here. I guess the slow part comes from all the hashing in the MMR operations, not sure if we can cheat here somehow.

Let's create a separate issue for this.

@PhilippGackstatter
Copy link
Copy Markdown
Contributor Author

So, on your machine, the combined build + test time went from ~375 seconds to ~75 second, right? Or about 5x improvement?

Exactly, yes.

And on CI we went from ~30 mins to ~14 mins, right? So, about 2x improvement?

Ah no, when I said "test execution time is now 7m 58s" I should have said the "entire test time", not just the execution time. So it went from ~30mins to ~8mins, so 3.75x improvement.

@Mirko-von-Leipzig
Copy link
Copy Markdown
Contributor

So, on your machine, the combined build + test time went from ~375 seconds to ~75 second, right? Or about 5x improvement?

Exactly, yes.

And on CI we went from ~30 mins to ~14 mins, right? So, about 2x improvement?

Ah no, when I said "test execution time is now 7m 58s" I should have said the "entire test time", not just the execution time. So it went from ~30mins to ~8mins, so 3.75x improvement.

This should improve further once you land this on next due to caching.

@PhilippGackstatter PhilippGackstatter merged commit 6bf1ee5 into next Mar 20, 2025
11 checks passed
@PhilippGackstatter PhilippGackstatter deleted the pgackst-optimize-build-times branch March 20, 2025 15:54
@PhilippGackstatter
Copy link
Copy Markdown
Contributor Author

This should improve further once you land this on next due to caching.

The tests on next now run in 3 minutes, compared to the previous 30 mins, which is pretty cool. And that is still slowed down by just one test (#1234).

The VM uses the approach we had previously, so maybe worth it wo change it there as well, cc @plafer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no changelog This PR does not require an entry in the `CHANGELOG.md` file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants