Skip to content

unit: add jobs that should be dispatched later back to run_queue#21524

Closed
msekletar wants to merge 1 commit intosystemd:mainfrom
msekletar:issue-21458-empty-run-queue
Closed

unit: add jobs that should be dispatched later back to run_queue#21524
msekletar wants to merge 1 commit intosystemd:mainfrom
msekletar:issue-21458-empty-run-queue

Conversation

@msekletar
Copy link
Copy Markdown
Contributor

Assumption in edc027b was that job we first skipped because of active
ratelimit is still in run_queue. Hence we trigger the queue and dispatch
it in the next iteration. Actually we remove jobs from run_queue in
job_run_and_invalidate() before we call unit_start(). Hence if we want
to attempt to run the job again in the future we need to add it back
to run_queue.

Fixes #21458

@poettering poettering added pid1 reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Nov 25, 2021
@msekletar msekletar force-pushed the issue-21458-empty-run-queue branch from 14df643 to bd6eb7b Compare November 26, 2021 10:04
@msekletar
Copy link
Copy Markdown
Contributor Author

Silly mistake, I should have checked once more before pushing. Sorry about that.

Anyway, should be fixed now. @poettering PTAL.

@msekletar msekletar removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Nov 26, 2021
@poettering
Copy link
Copy Markdown
Member

lgtm, just one nit

@msekletar msekletar force-pushed the issue-21458-empty-run-queue branch from bd6eb7b to 64ec2c3 Compare November 26, 2021 10:52
@poettering poettering added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed good-to-merge/with-minor-suggestions labels Nov 26, 2021
@bluca bluca added this to the v250 milestone Nov 26, 2021
Copy link
Copy Markdown
Member

@bluca bluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

@bluca
Copy link
Copy Markdown
Member

bluca commented Nov 26, 2021

Bad news, this doesn't fix the issue :-(

focal-s390x:

TEST-37-RUNTIMEDIRECTORYPRESERVE:   FAIL     (1201 s)

@bluca bluca added ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR and removed good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed labels Nov 26, 2021
Assumption in edc027b was that job we first skipped because of active
ratelimit is still in run_queue. Hence we trigger the queue and dispatch
it in the next iteration. Actually we remove jobs from run_queue in
job_run_and_invalidate() before we call unit_start(). Hence if we want
to attempt to run the job again in the future we need to add it back
to run_queue.

Fixes systemd#21458
@msekletar msekletar force-pushed the issue-21458-empty-run-queue branch from 64ec2c3 to 5ca1ebb Compare November 26, 2021 16:56
@bluca bluca removed the ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR label Nov 26, 2021
@bluca
Copy link
Copy Markdown
Member

bluca commented Nov 26, 2021

looks like the ubuntu CI infrastructure is having issues

@mrc0mmand
Copy link
Copy Markdown
Member

I'll give it a try locally, for now, just to see if it indeed helps.

@mrc0mmand
Copy link
Copy Markdown
Member

It hasn't failed after 200+ iterations (whereas before it failed in <10 iterations), so it looks like the patch indeed works. But let's wait for the Ubuntu CIs to come back to life for extra measure.

continue;

job_add_to_run_queue(j);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can safely dumb this down: simply enqueue any job for a mount unit again, regardless what it is. Adding something to the queue can basically be done on a hunch, there's no need to guarantee that anything really changed. Hence, I'd really enqueue any job for a mount unit without trying to be too smart here. It's not that this is going to be a million things, it's just a small number, and it makes things really robust.

@poettering
Copy link
Copy Markdown
Member

I posted a dumbed down version of this PR: #21543

@mrc0mmand
Copy link
Copy Markdown
Member

Superseded by #21543.

@mrc0mmand mrc0mmand closed this Nov 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

TEST-37-RUNTIMEDIRECTORYPRESERVE gets stuck on focal-s390x, focal-ppc64el

4 participants