e2e tests: use /var/tmp, not $TMPDIR, as workdirs#22207
e2e tests: use /var/tmp, not $TMPDIR, as workdirs#22207openshift-merge-bot[bot] merged 1 commit intocontainers:mainfrom
Conversation
626030f to
1e30d68
Compare
|
Well, it's working. I'm leaving this as draft because it's a pretty big change, but I consider this ready for review. |
|
hmm, how important is this? Is this a requirement for the tmpfs change? I think using a tmpfs makes sense locally, well at least on systems with enough memory. I think it should be faster and not cause so much wear on the ssd/hdd. Although I haven't yet tried benchmarking both options to see how much of a difference it really makes. |
How important: I can't really answer that. I would like for And yes, this PR is a hard requirement for the above automation_images PR. e2e tests consume huge amounts of disk space. They won't even come close to running in tmpfs on a CI VM. Timing: great question and one that is troubling me. This seems to be slower, but I haven't figured out why. Timing results from this PR:
Timing results from #22229, the most recently merged PR on main:
I'm not seeing the same slowdown on #17831 (which I'm using for extra testing). Now that I've posted (and preserved) timing results, I'm going to just re-push this and see how it goes. Yesterday's CI run had a lot of quay flakes that required reruns. |
|
CI time is definitely a big factor but given we didn't use tmpfs before I do not think this changes anything for CI in particular other than some paths. I am more worried about local runs, it has been a while but the last time I worked on the major e2e changes I was able to run the full suite in under 5 mins on my laptop (12 threads). Given the much faster CPU compared to CI I think IO could become a big bottleneck locally and given that I have 32 gigs of RAM I like to use it. |
test/e2e/common_test.go
Outdated
There was a problem hiding this comment.
Fair point re: local runs. My initial design had an envariable check here, PODMAN_E2E_TMPDIR, but it felt too clunky. I could instead do:
use $TMPDIR
if $CI envariable is defined
then use /var/tmp
Would that help?
There was a problem hiding this comment.
It would help me but I am not sure this condition actually depends on CI or not. It depends on whenever a system has enough available RAM to run the suite fully in RAM. If I would like to run this on my small PI it most likely will not work. So maybe we should measure how much RAM we actually need and decide base don this.
Also for CI it might be worth to investigate to fully run it from memory given we want to make CI faster.
|
Better results this morning:
|
b70abba to
4eabb75
Compare
cddfb9e to
f4bc63e
Compare
|
I'm calling this ready, but would like a lot of eyeballs on it please. |
|
Ephemeral COPR build failed. @containers/packit-build please check. |
|
@Luap99 I think this is in your bucket now. Very few people run the full test suite locally, and everyone suffers waiting for the test suite to complete, as well as cost and saving the planet, so I think we should move forward on this. |
4fb8065 to
d58c503
Compare
cevich
left a comment
There was a problem hiding this comment.
FWIW, LGTM on the CI side of things, I double-checked hack/get_ci_vm.sh and contrib/cirrus/lib.sh and don't see any reason there would be problems.
Luap99
left a comment
There was a problem hiding this comment.
some comments, I will also run this locally to check if there are any new failures.
test/e2e/common_test.go
Outdated
There was a problem hiding this comment.
I really dislike this CI magic, different behaviours between local and CI are a nightmare to debug.
There was a problem hiding this comment.
CI VMs don't have enough RAM to run these tests in tmpfs. If we choose to experiment with super-duper RAM VMs, that's groovy, and it can easily be layered on top of this PR. In fact it can't even be started without this PR, because this PR is what makes /tmp be tmpfs.
All other feedback addressed, thank you.
There was a problem hiding this comment.
Drive-by suggestion: I don't think we need the entire /var/tmp on a ramdisk. Might be worth while using some diskspace tracking tool to see what the /var/tmp/subtest-* consumes over time. Maybe we don't require as big super-duper RAM VMs as we think? Also, improvements to test-cleanup code could help out a lot here.
There was a problem hiding this comment.
CI VMs don't have enough RAM to run these tests in tmpfs. If we choose to experiment with super-duper RAM VMs, that's groovy, and it can easily be layered on top of this PR. In fact it can't even be started without this PR, because this PR is what makes /tmp be tmpfs.
Which is why I said I am fine with this for now.
I don't suggest making /var/tmp tmpfs, we simple need to switch back to /tmp as root. This PR has the nessesary ground work to easily switch the basedir so we can experiment with it when it is merged
|
I did some local runs, I can run the rootless e2e tests in 4m when based on tmpfs (the used test images were cached on real disk). By using /var/tmp (my loclal SSD) it took over 6m20s so over 50% slower. So there is big potential in time savings. This PR doesn't make it worse so I think I am fine going ahead with it once you address my comments until I have more time tweaking some options. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: edsantiago, Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
|
Needs a rebase |
TMPDIR is typically /tmp which is typically(*) a tmpfs.
This PR ignores $TMPDIR when $CI is defined, forcing all
e2e tests to set up one central working directory in /var/tmp
instead.
Also, lots of cleanup.
(*) For many years, up to and still including the time of
this PR, /tmp on Fedora CI VMs is actually NOT tmpfs,
it is just / (root). This is nonstandard and undesirable.
Efforts are underway to remove this special case.
Signed-off-by: Ed Santiago <santiago@redhat.com>
|
/lgtm |
TMPDIR is typically /tmp which is typically(*) a tmpfs.
This PR ignores $TMPDIR when $CI is defined, forcing all
e2e tests to set up one central working directory in /var/tmp
instead.
Also, lots of cleanup.
(*) For many years, up to and still including the time of
this PR, /tmp on Fedora CI VMs is actually NOT tmpfs,
it is just / (root). This is nonstandard and undesirable.
VMs: containers/automation_images#340
Signed-off-by: Ed Santiago santiago@redhat.com