Skip to content

core: uv_backend_timeout should return 0 when has pending watchers#3234

Closed
XeCycle wants to merge 1 commit into
libuv:v1.xfrom
XeCycle:watcher-timeout
Closed

core: uv_backend_timeout should return 0 when has pending watchers#3234
XeCycle wants to merge 1 commit into
libuv:v1.xfrom
XeCycle:watcher-timeout

Conversation

@XeCycle

@XeCycle XeCycle commented Jul 7, 2021

Copy link
Copy Markdown
Contributor

Looks like when a new FD is going to be EPOLL_CTL_ADDed, it is first pushed to watcher_queue; a later call to uv__io_poll would actually register them to kernel. But with UV_RUN_NOWAIT the iteration stops at an unfortunate point where we may have "runnable task" before blocking, and uv_backend_timeout failed to report this kind of work.

Some context: I'm embedding libuv into python's asyncio loop, to integrate existing codebase into python. (Not really performance-sensitive, so not using https://github.com/MagicStack/uvloop/; this project is not really ready for such kind of loop sharing yet) My approach is to write a custom Selector (implements python selectors.BaseSelector) that simply calls uv_run(UV_RUN_NOWAIT) right before calling calling epoll_wait(epfd_of_python_asyncio), using combined timeout from libuv and python. Before this patch uv_backend_timeout may return -1 even when there are work in watcher_queue, and the relevant FDs are not registered to kernel, causing an infinite fruitless wait.

Draft because I'm not sure whether this is the best approach; with this patch downstream caller would just go back and call another uv_run(UV_RUN_NOWAIT), but this may incur an unnecessary call to epoll_wait. Maybe we could modify the execution order and split up uv__io_poll so that we check for stop right before calling epoll_wait, but that can be a massive change.

Dear maintainers if you decide this approach is good enough I can take a little time to add tests for it.

@XeCycle

XeCycle commented Jul 7, 2021

Copy link
Copy Markdown
Contributor Author

Oh, the same uv_backend_timeout is use by uv_run so this will cause uv__io_poll to be called with timeout=0, will definitely cause a performance regression. Probably need to split either uv_backend_timeout or uv__io_poll.

@vtjnash

vtjnash commented Jul 7, 2021

Copy link
Copy Markdown
Member

The problem description makes sense to me, although I have not thought about it closely enough to suggest a solution. It sounds like you are on the right track though. Perhaps uv_backend_timeout should take responsibility for processing the watcher_queue, so that external callers can be sure the uv_backend_fd is up-to-date after checking that function (without needing a second full run through uv_run to do that)?

@XeCycle XeCycle force-pushed the watcher-timeout branch from 17e3b73 to e2065e4 Compare July 8, 2021 01:46
@XeCycle

XeCycle commented Jul 8, 2021

Copy link
Copy Markdown
Contributor Author

Updated to split uv_backend_timeout.

@XeCycle

XeCycle commented Jul 8, 2021

Copy link
Copy Markdown
Contributor Author

Perhaps uv_backend_timeout should take responsibility for processing the watcher_queue

Sounds interesting, basically that means to split some work from uv__io_poll into uv_backend_timeout. Of course this changes iteration order somehow when using UV_RUN_NOWAIT; I can construct a use case that's able to observe this difference, but the case shows more problems. Let's suppose

  • user has a prepare handle that registers a new FD (e.g. by uv_poll_start)
  • and later unregisters it in a check handle

When using UV_RUN_DEFAULT everything works, the FD is always polled by kernel when blocking; but with either UV_RUN_ONCE or UV_RUN_NOWAIT, that FD would have been unregistered after uv_backend_timeout returns, but prepare handles hadn't been executed, no FD given to kernel. If we return 0 for non-empty watcher_queue then this case becomes a busy polling loop, neither satisfactory.

Arguably this case looks niche, but seems the only way to fix the whole embedding problem is to alter UV_RUN_NOWAIT to return from uv_run right before the final call to epoll_wait; that's the only point where everyone expects to block. I'm unsure how "breaking" this is; maybe push it to v2? Or we can add another uv_run_mode...

EDIT, not the only way, we can also change uv__io_start and uv__io_stop to do (un)registration immediately, instead of queueing.

@XeCycle XeCycle force-pushed the watcher-timeout branch from e2065e4 to f5e31ad Compare July 8, 2021 06:49
@vtjnash

vtjnash commented Jul 26, 2021

Copy link
Copy Markdown
Member

IIUC, this will fix #3101

@XeCycle

XeCycle commented Jul 27, 2021

Copy link
Copy Markdown
Contributor Author

@vtjnash this is only a partial fix. As described above, the "start poll in prepare --- stop in check" pattern will become a busy loop with this change; maybe we can call it "working", but barely useful. Still better than nothing at all, though.

I suspect that pattern may be rare, because it incurs unnecessary syscalls at every iteration, so performance sensitive users probably have avoided it; but I have no statistics to support my guess.

Summing up options discussed so far:

  • just this patch: fixes only part of the problem, but simple; edge case degrades to busy wait
  • update fd in uv_backend_timeout: fixes only part of the problem, but same edge case will ignore events; also bulkier
  • change behavior of UV_RUN_NOWAIT: super bulky change, fixes everything, but... breaking change?
  • a new run mode: do we want to introduce a new mode just to tell users the old mode doesn't work?

Maybe we can go with 1 for now and postpone 3 for v2. I'm ready to get back and add tests for this if you are happy with this change; afraid I don't have enough time to carry out option 3 however.

@stale

stale Bot commented Aug 18, 2021

Copy link
Copy Markdown

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale Bot added the stale label Aug 18, 2021
vtjnash added a commit to vtjnash/libuv that referenced this pull request Feb 9, 2022
Pending work may be either (on any platform) pending_queue callbacks or
(on unix) watcher_queue handles to add to the io poll object.
Previously, we might have gotten somewhat stuck if the user caused an
event to be added to one of these in the idle or prepare callbacks, or
was embedding libuv.

Refs: libuv#3234
Refs: libuv#3101
Refs: libuv#3308
vtjnash added a commit to vtjnash/libuv that referenced this pull request Feb 10, 2022
Pending work may be either (on any platform) pending_queue callbacks or
(on unix) watcher_queue handles to add to the io poll object.
Previously, we might have gotten somewhat stuck if the user caused an
event to be added to one of these in the idle or prepare callbacks, or
was embedding libuv.

Refs: libuv#3234
Refs: libuv#3101
Refs: libuv#3308
vtjnash added a commit that referenced this pull request Feb 13, 2022
Pending work may be either (on any platform) pending_queue callbacks or
(on unix) watcher_queue handles to add to the io poll object.
Previously, we might have gotten somewhat stuck if the user caused an
event to be added to one of these in the idle or prepare callbacks, or
was embedding libuv.

Refs: #3234
Refs: #3101
Refs: #3308
@vtjnash vtjnash closed this Feb 14, 2022
@vtjnash

vtjnash commented Feb 14, 2022

Copy link
Copy Markdown
Member

Fixed by #3466

@XeCycle XeCycle deleted the watcher-timeout branch February 28, 2022 09:21
JeffroMF pushed a commit to JeffroMF/libuv that referenced this pull request May 16, 2022
Pending work may be either (on any platform) pending_queue callbacks or
(on unix) watcher_queue handles to add to the io poll object.
Previously, we might have gotten somewhat stuck if the user caused an
event to be added to one of these in the idle or prepare callbacks, or
was embedding libuv.

Refs: libuv#3234
Refs: libuv#3101
Refs: libuv#3308
liujinye-sys pushed a commit to open-vela/apps_system_libuv that referenced this pull request Jul 23, 2025
Pending work may be either (on any platform) pending_queue callbacks or
(on unix) watcher_queue handles to add to the io poll object.
Previously, we might have gotten somewhat stuck if the user caused an
event to be added to one of these in the idle or prepare callbacks, or
was embedding libuv.

Refs: libuv/libuv#3234
Refs: libuv/libuv#3101
Refs: libuv/libuv#3308
liujinye-sys pushed a commit to open-vela/apps_system_libuv that referenced this pull request Dec 16, 2025
Pending work may be either (on any platform) pending_queue callbacks or
(on unix) watcher_queue handles to add to the io poll object.
Previously, we might have gotten somewhat stuck if the user caused an
event to be added to one of these in the idle or prepare callbacks, or
was embedding libuv.

Refs: libuv/libuv#3234
Refs: libuv/libuv#3101
Refs: libuv/libuv#3308
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants