
systests: cp: add wait_for_ready #20912

Merged
openshift-merge-bot[bot] merged 1 commit into containers:main from
edsantiago:fix_some_cp_flakes
Dec 6, 2023

Conversation

@edsantiago (Member)

Some of the tests were doing "podman run -d" without wait_for_ready.
This may be the cause of some of the CI flakes. Maybe even all?
It's not clear why the tests have been working reliably for years
under overlay, and only started failing under vfs, but shrug.

Thanks to Chris for making that astute observation.

Fixes: #20282 (I hope)

Signed-off-by: Ed Santiago santiago@redhat.com


@openshift-ci openshift-ci bot added release-note-none approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 5, 2023
@Luap99 (Member) left a comment


Sounds like a logical explanation to me, but I think you have overdone it a bit.

@edsantiago (Member, Author)

but I think you have overdone it a bit.

I half-agree. My first pass was addressing only the touch/mkdir containers. After some testing, and some thinking about it, I decided I never want to look at this flake again. I then applied wait_for_ready to every run -d. Is that harmful?

@Luap99 (Member)

Luap99 commented Dec 5, 2023

but I think you have overdone it a bit.

I half-agree. My first pass was addressing only the touch/mkdir containers. After some testing, and some thinking about it, I decided I never want to look at this flake again. I then applied wait_for_ready to every run -d. Is that harmful?

Harmful no, but it makes the diff here bigger than it needs to be and makes the tests slower as they now always call podman logs even when it is not needed.

@edsantiago (Member, Author)

OK. I'll repush once CI finishes.

@edsantiago (Member, Author)

Done. Now wait_for_ready is added only to those containers that touch, echo, or mkdir.

@Luap99 (Member) left a comment


LGTM

@openshift-ci (Contributor)

openshift-ci bot commented Dec 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan (Member)

rhatdan commented Dec 6, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 6, 2023
@openshift-merge-bot openshift-merge-bot bot merged commit a64cc98 into containers:main Dec 6, 2023
@cevich (Member)

cevich commented Dec 6, 2023

Thanks for fixing this Ed, hopefully it was the cause.

It's not clear why the tests have been working reliably for years
under overlay, and only started failing under vfs, but shrug

If it helps (and this is a total guess): my feeling is that the failure unpredictability is coming from the storage subsystem in the cloud context. All the CI VMs are running with (presumably multi-path) fiber-channel/network-based storage. That in and of itself adds a HUGE amount of complexity within the kernel and in hardware. Worse, both bandwidth and IOPS are "provisioned" (i.e. limited) based on what you pay for. Either or both of those aspects could easily result in randomly appearing "hiccups" in user space. In other words, we should expect both the cloud "throttling" reads and/or writes, and occasional (transparent) hiccups within the hardware or the network "fabric" itself.

@edsantiago edsantiago deleted the fix_some_cp_flakes branch December 6, 2023 16:24
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Mar 6, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 6, 2024

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
release-note-none


Development

Successfully merging this pull request may close these issues.

podman cp under vfs: ENOENT

4 participants