
gh-115103: Implement delayed free mechanism for free-threaded builds #115367

Open · wants to merge 2 commits into main

Conversation

@colesbury (Contributor) commented Feb 12, 2024

This adds `_PyMem_FreeDelayed()` and supporting functions. The
`_PyMem_FreeDelayed()` function frees memory with the same allocator as
`PyMem_Free()`, but after some delay to ensure that concurrent lock-free
readers have finished.

@DinoV (Contributor) left a comment:

Looks pretty solid; let me know what you think of the comments.

```c
process_queue(struct llist_node *head, struct _qsbr_thread_state *qsbr)
{
    while (!llist_empty(head)) {
        struct _mem_work_chunk *buf = work_queue_first(head);
```
@DinoV (Contributor) commented:

Is there a reason we're not just doing something like:

```c
for (Py_ssize_t i = buf->rd_idx; i < buf->wr_idx; i++) {
    ...
}
```

Or a while loop like `_PyMem_FiniDelayed()` with the polling from here added in, versus going back through the head of the list each time?

@colesbury (Contributor, Author) replied:

I changed the loop to avoid grabbing the head every iteration.

```c
struct _mem_work_chunk *buf = work_queue_first(head);

if (buf->rd_idx == buf->wr_idx) {
    llist_remove(&buf->node);
```
@DinoV (Contributor) commented:

It seems like we should wait for `buf->rd_idx == WORK_ITEMS_PER_CHUNK` before freeing and removing?

Otherwise it seems like we could end up in a situation where we do some delayed frees, process them all, and then quickly need to do more delayed frees, re-allocating this 4k buffer repeatedly.

And if this is any buffer other than the last, then `buf->wr_idx == WORK_ITEMS_PER_CHUNK`, so we'd always free all but one.

@colesbury (Contributor, Author) replied:

Yeah, that's a good idea. We can do that for the thread queues but not the interpreter queue.

@corona10 (Member) left a comment:

lgtm!
(I've taken a look at the QSBR mechanism while working on list allocation to make lists thread-safe.)
