Skip to content

CI build, doc-build: Run containers explicitly, separate jobs for pyright, build, modularized tests, long tests#36498

Merged
vbraun merged 171 commits intosagemath:developfrom
mkoeppe:ci_build_explicit_docker_exec
May 2, 2024
Merged

CI build, doc-build: Run containers explicitly, separate jobs for pyright, build, modularized tests, long tests#36498
vbraun merged 171 commits intosagemath:developfrom
mkoeppe:ci_build_explicit_docker_exec

Conversation

@mkoeppe
Copy link
Copy Markdown
Contributor

@mkoeppe mkoeppe commented Oct 21, 2023

Running the containers explicitly, instead of using the declarative container: feature of GH Actions gives us more control:

We split out static type checking with pyright into its separate workflow:

  • pyright.yml: As static checking does not need a build of the Sage library, for PRs that do not make any changes to packages, there's nothing to build, and the workflow gives a fast turnaround just after 10 minutes. For applying the CI fixes from blocker tickets, this workflow uses the technique favored in ci: improve things around auto-apply of fixes #36349.

The workflow build.yml first launches a job:

After this is completed, more jobs are launched:

The workflows doc-build.yml and doc-build-pdf.yml again build incrementally and then build the documentation. The diffing code for the HTML documentation now runs in the host, not the container, which simplifies things. (To show that diffing still works, we include a small change to the Sage library.)

Splitting the workflow jobs implements @kwankyu's suggestion from:

The steps "again builds incrementally" actually use the GH Actions cache, https://docs.docker.com/build/ci/github-actions/cache/#cache-backend-api. When nothing is cached and the 3 workflows are launched in parallel, they may each build the same thing. But when there's congestion that leads to the workflows to be run serially, the 2nd and 3rd workflow will be able to use the cache from the 1st workflow. This elasticity may be helpful in reducing congestion.

There is a rather small per-project limit of 10 GB for this cache, so we'll have to see how effectively caching works in practice. See https://github.com/sagemath/sage/actions/caches

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

@kwankyu
Copy link
Copy Markdown
Collaborator

kwankyu commented Oct 22, 2023

src/sage/schemes/toric/variety.py sneaked in?

@mkoeppe
Copy link
Copy Markdown
Contributor Author

mkoeppe commented Oct 22, 2023

via #36443

@tobiasdiez
Copy link
Copy Markdown
Contributor

I don't get the point of this PR. It seems to mostly complicate the setup and essentially reimplements the container feature of github. For the out-of-space issue it would be nice to first know what's the route cause to be able to evaluate if steps to fixing it go into the correct direction. For potential speedups by caching the docker image, I'm not sure if this is really needed. Yes, from a conceptual viewpoint you only want to build sagelib once, and then reuse it - but well, it mostly takes 2-10 mins of incremental compilation anyway. Personally, I think we have more pressing issues with the ci setup and better not introduce more moving parts (atm at least).

@mkoeppe
Copy link
Copy Markdown
Contributor Author

mkoeppe commented Oct 22, 2023

For the out-of-space issue it would be nice to first know what's the root cause to be able to evaluate if steps to fixing it go into the correct direction.

It's really not very complicated. GitHub runs the container from its very full runner image. We know that it does not have enough space for our large containers.

@mkoeppe
Copy link
Copy Markdown
Contributor Author

mkoeppe commented Oct 22, 2023

It seems to [...] essentially reimplements the container feature of github.

That's right. The declarative container feature is convenient, but we have run into the two restrictions described in the PR description. So we just use docker directly; problem solved.

@mkoeppe
Copy link
Copy Markdown
Contributor Author

mkoeppe commented Oct 22, 2023

Yes, from a conceptual viewpoint you only want to build sagelib once, and then reuse it - but well, it mostly takes 2-10 mins of incremental compilation anyway.

Actually, the compilation can take much longer when low level parts of sagelib are touched, and when certain packages are updated.

@tobiasdiez
Copy link
Copy Markdown
Contributor

For the out-of-space issue it would be nice to first know what's the root cause to be able to evaluate if steps to fixing it go into the correct direction.

It's really not very complicated. GitHub runs the container from its very full runner image. We know that it does not have enough space for our large containers.

Github provides 14gb of space (after its runner image), that should be sufficient (and was sufficient) to run a 7gb image in it and install sage. Also our image got bigger by a few houndred mbs.

@mkoeppe
Copy link
Copy Markdown
Contributor Author

mkoeppe commented Oct 22, 2023

If you want to understand more details, I'd suggest that you experiment with the space parameters in https://github.com/sagemath/sage/blob/develop/.github/workflows/docker.yml#L148

@tobiasdiez
Copy link
Copy Markdown
Contributor

It's already quite easy to run commands outside the container. Here is an example for exactly your first use case: https://github.com/osquery/osquery/blob/a916e80a895f8abb9b749161d774d528d0466b31/.github/workflows/hosted_runners.yml#L257-L263 I don't think we need more flexibility.

@mkoeppe
Copy link
Copy Markdown
Contributor Author

mkoeppe commented Oct 23, 2023

It's already quite easy to run commands outside the container. Here is an example for exactly your first use case: https://github.com/osquery/osquery/blob/a916e80a895f8abb9b749161d774d528d0466b31/.github/workflows/hosted_runners.yml#L257-L263

Thanks for finding this.

I don't think we need more flexibility.

Well, I already know that we do need it, namely for sharing the build, which I'll do in a follow up.

@vbraun
Copy link
Copy Markdown
Member

vbraun commented Apr 16, 2024

merge conflict

vbraun pushed a commit to vbraun/sage that referenced this pull request Apr 18, 2024
sagemathgh-36498: CI build, doc-build: Run containers explicitly, separate jobs for pyright, build, modularized tests, long tests
    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
Running the containers explicitly, instead of using the declarative
`container:` feature of GH Actions gives us more control:
- we can create more space on the host if necessary; we just scraped by
an out of space condition in sagemath#36473 /
sagemath#36469 (comment)
- we can run some operations outside of the container but in the same
job; this will make the separate "Get CI fixes" jobs unnecessary,
addressing the cosmetic concerns from
sagemath#36338 (comment),
sagemath#36349
- it enables caching between the various workflows (as first discussed
in sagemath#36446).

We split out static type checking with pyright into its separate
workflow:
- **pyright.yml**: As static checking does not need  a build of the Sage
library, for PRs that do not make any changes to packages, there's
nothing to build, and the workflow gives a fast turnaround just after 10
minutes. For applying the CI fixes from blocker tickets, this workflow
uses the technique favored in sagemath#36349.

The workflow **build.yml** first launches a job:
- **test-new:** It builds incrementally (using a tox-generated
`Dockerfile` and https://github.com/marketplace/actions/build-and-push-
docker-images) and does the quick incremental test. This completes
within 10 to 20 minutes when there's no change.

After this is completed, more jobs are launched:
- **test-mod:** It again builds incrementally and tests a modularized
distribution. Later (with more from sagemath#35095), more jobs will be added to
this matrix job for other distributions.
- **test-long:** It again builds incrementally and runs the long test.

The workflows **doc-build.yml** and **doc-build-pdf.yml** again build
incrementally and then build the documentation. The diffing code for the
HTML documentation now runs in the host, not the container, which
simplifies things. (To show that diffing still works, we include a small
change to the Sage library.)

Splitting the workflow jobs implements @kwankyu's  suggestion from:
- sagemath#35652 (comment)
(Fixes sagemath#35814)

The steps "again builds incrementally" actually use the GH Actions
cache, https://docs.docker.com/build/ci/github-actions/cache/#cache-
backend-api. When nothing is cached and the 3 workflows are launched in
parallel, they may each build the same thing. But when there's
congestion that leads to the workflows to be run serially, the 2nd and
3rd workflow will be able to use the cache from the 1st workflow. This
elasticity may be helpful in reducing congestion.

There is a rather small per-project limit of 10 GB for this cache, so
we'll have to see how effectively caching works in practice. See
https://github.com/sagemath/sage/actions/caches


<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36938 (merged here)
- Depends on sagemath#36951 (merged here)
- Depends on sagemath#37351 (merged here)
- Depends on sagemath#37750 (merged here)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36498
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe
vbraun pushed a commit to vbraun/sage that referenced this pull request Apr 20, 2024
sagemathgh-36498: CI build, doc-build: Run containers explicitly, separate jobs for pyright, build, modularized tests, long tests
    
<!-- ^^^^^
Please provide a concise, informative and self-explanatory title.
Don't put issue numbers in there, do this in the PR body below.
For example, instead of "Fixes sagemath#1234" use "Introduce new method to
calculate 1+1"
-->
<!-- Describe your changes here in detail -->

<!-- Why is this change required? What problem does it solve? -->
Running the containers explicitly, instead of using the declarative
`container:` feature of GH Actions gives us more control:
- we can create more space on the host if necessary; we just scraped by
an out of space condition in sagemath#36473 /
sagemath#36469 (comment)
- we can run some operations outside of the container but in the same
job; this will make the separate "Get CI fixes" jobs unnecessary,
addressing the cosmetic concerns from
sagemath#36338 (comment),
sagemath#36349
- it enables caching between the various workflows (as first discussed
in sagemath#36446).

We split out static type checking with pyright into its separate
workflow:
- **pyright.yml**: As static checking does not need  a build of the Sage
library, for PRs that do not make any changes to packages, there's
nothing to build, and the workflow gives a fast turnaround just after 10
minutes. For applying the CI fixes from blocker tickets, this workflow
uses the technique favored in sagemath#36349.

The workflow **build.yml** first launches a job:
- **test-new:** It builds incrementally (using a tox-generated
`Dockerfile` and https://github.com/marketplace/actions/build-and-push-
docker-images) and does the quick incremental test. This completes
within 10 to 20 minutes when there's no change.

After this is completed, more jobs are launched:
- **test-mod:** It again builds incrementally and tests a modularized
distribution. Later (with more from sagemath#35095), more jobs will be added to
this matrix job for other distributions.
- **test-long:** It again builds incrementally and runs the long test.

The workflows **doc-build.yml** and **doc-build-pdf.yml** again build
incrementally and then build the documentation. The diffing code for the
HTML documentation now runs in the host, not the container, which
simplifies things. (To show that diffing still works, we include a small
change to the Sage library.)

Splitting the workflow jobs implements @kwankyu's  suggestion from:
- sagemath#35652 (comment)
(Fixes sagemath#35814)

The steps "again builds incrementally" actually use the GH Actions
cache, https://docs.docker.com/build/ci/github-actions/cache/#cache-
backend-api. When nothing is cached and the 3 workflows are launched in
parallel, they may each build the same thing. But when there's
congestion that leads to the workflows to be run serially, the 2nd and
3rd workflow will be able to use the cache from the 1st workflow. This
elasticity may be helpful in reducing congestion.

There is a rather small per-project limit of 10 GB for this cache, so
we'll have to see how effectively caching works in practice. See
https://github.com/sagemath/sage/actions/caches


<!-- If this PR resolves an open issue, please link to it here. For
example "Fixes sagemath#12345". -->
<!-- If your change requires a documentation PR, please link it
appropriately. -->

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on
- sagemath#12345: short description why this is a dependency
- sagemath#34567: ...
-->
- Depends on sagemath#36938 (merged here)
- Depends on sagemath#36951 (merged here)
- Depends on sagemath#37351 (merged here)
- Depends on sagemath#37750 (merged here)

<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
    
URL: sagemath#36498
Reported by: Matthias Köppe
Reviewer(s): Kwankyu Lee, Matthias Köppe
@vbraun
Copy link
Copy Markdown
Member

vbraun commented Apr 24, 2024

merge conflict

Matthias Koeppe added 19 commits April 24, 2024 22:13
…Replace use of {envpython} by just python3"

This reverts commit cca2b8d.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

build.yml: Report failures (e.g. from incremental test) earlier

4 participants