Skip to content

Background Job Manager (BJM) - replacement for BIO#12029

Closed
JimB123 wants to merge 3 commits into
redis:unstablefrom
JimB123:bio-to-bjm
Closed

Background Job Manager (BJM) - replacement for BIO#12029
JimB123 wants to merge 3 commits into
redis:unstablefrom
JimB123:bio-to-bjm

Conversation

@JimB123

@JimB123 JimB123 commented Apr 12, 2023

Copy link
Copy Markdown
Contributor

The BIO module is a source of minor annoyance. Background Job Manager (BJM) is designed as a modular replacement for BIO. Assuming there is positive response for this PR, the PR will be extended to convert existing BIO usage to BJM and remove BIO.

Motivations:

  • BIO (Background I/O) is a misnomer. I'd guess that this was originally a mechanism to perform an async flush on the AOF file, but it has since morphed into a mechanism for performing other kinds of background activities - like lazy flush - which have nothing to do with I/O
  • BIO isn't well layered. BIO should be a low-level utility for handling async background tasks. However the design of this utility requires that the "low-level" code have specific knowledge of the "high-level" application.
  • BIO isn't very modular. For each new type of background task, the BIO code must be altered. A new thread is created for each type of task. New definitions and logic are added for each new task.
  • Improve performance. BIO creates multiple threads, but doesn't use them effectively. There is 1 unique thread for each type of task. During a burst of lazy evictions, only the (single) eviction thread will be active while other BIO threads sit idle. During a lazy flush operation, 2 dictionaries (main and expire) are processed sequentially rather than in parallel.
  • Reduced memory usage. BIO uses a list to maintain the queue of jobs. The new Fifo uses much less memory (and is more performant) if there's a large burst of jobs.

PR Contents:

  • Fifo - presents a simple FIFO queue, it is over 50% more space efficient and over 50% faster than using Redis list. It's delivered as an independent, modular, & reusable utility data structure.
  • BJM provides a simple fire-and-forget interface to have something done by a background thread. It maintains a fixed/configurable quantity of threads (rather than one per "job type"). This prevents having too many active background threads and also allows faster processing for a large number of jobs of the same job type.

@JimB123 JimB123 requested a review from madolson April 12, 2023 17:34
@JimB123 JimB123 marked this pull request as draft April 12, 2023 17:34
@JimB123 JimB123 marked this pull request as ready for review April 12, 2023 18:13

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review. This looks like two good ideas!

If BJM replaces BIO, I'd prefer that BIO is deleted at the same time so we don't have several ways of doing the same thing.

Regarding the included fifo datastructure, can it replace adlist in more places, perhaps all occurrences throughout the repo? If yes, we can delete adlist too. If not, perhaps it can be beneficial to replace a few usages. AFACT, this fifo data structure can easily be made to support the operations of a double-ended queue, i.e. push and pop in both ends.

Comment thread src/fifo.h
Comment on lines +15 to +24
// Push an item onto the end of the queue.
void fifoPush(Fifo *q, void *ptr);

// Look at the first item in the queue (without removing it).
// NOTE: asserts if the queue is empty.
void *fifoPeek(Fifo *q);

// Return and remove the first item from the queue.
// NOTE: asserts if the queue is empty.
void *fifoPop(Fifo *q);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The words push and pop are more often used for a stack (AKA LIFO) datatype.

In a FIFO queue (or just queue) the operations are more often called enqueue and dequeue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very partial to the Rust VecDeque push_front and pop_back, so I don't mind the naming of push and pop for a queue.

Comment thread src/fifo.c
Comment on lines +26 to +32
* For a SINGLE BLOCK containing (for example) 4 items, the layout looks like this:
* +--------+--------+--------+--------+--------+--------+--------+--------+
* SINGLE BLOCK: | slot 0 | slot 1 | slot 2 | slot 3 | slot 4 | slot 5 | slot 6 | next/ |
* | item | item | item | item | - | - | - | lastIdx|
* +--------+--------+--------+--------+--------+--------+--------+--------+
* ^
* lastIdx (3)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful documentation!

@madolson

Copy link
Copy Markdown
Contributor

I think it's a good quality of life improvement to Redis.

@oranagra Pinging you since I would like your buy in before Jim does any more work. The current PR just includes the high level APIs, but the intention is to use this functionality to replace the bio.c functionality. I couldn't immediately think of other places the FIFO might be used, but I still think it's useful enough to include. The goal would be to include the refactoring after we stabilize 7.2 and target it for Redis 8. Note this will eventually be a major decision since we will want to allow configuring a custom number of threads.

@JimB123

JimB123 commented Apr 12, 2023

Copy link
Copy Markdown
Contributor Author

Not a full review. This looks like two good ideas!

Thanks Viktor @zuiderkwast

If BJM replaces BIO, I'd prefer that BIO is deleted at the same time so we don't have several ways of doing the same thing.

Absolutely. I just want a go/no-go decision before I put in the work to remove BIO. But that's definitely my intent.

Regarding the included fifo datastructure, can it replace adlist in more places, perhaps all occurrences throughout the repo? If yes, we can delete adlist too. If not, perhaps it can be beneficial to replace a few usages. AFACT, this fifo data structure can easily be made to support the operations of a double-ended queue, i.e. push and pop in both ends.

I have some FIFO use cases in local code, but I'm not sure that this is applicable in too many places for OSS Redis. This is a very efficient FIFO queue, but it has limitations for other uses. Namely:

  • It's singly-linked. This saves space. Adding a reverse pointer would increase memory overhead by ~134%. Since it's singly-linked, pops off the rear aren't possible.
  • There's no mechanism to remove items from the middle of the list other than an O(n) search. There's no way to address an item other than it's value. The physical location of an item can change.

Comment thread src/fifo.c


// Look at the first item in the queue (without removing it).
// NOTE: asserts if the queue is empty.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully buy the utility of asserting on the queue being non-empty, most implementations would return NULL instead. You implement this paradigm by getting the length first, and then deciding to call peek.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the value is user-defined, 0/NULL might be a valid value. You could change the API to look something like:

bool fifoPeek(Fifo *q, void **returned_value);

This definition is uglier, and the caller still has to check the bool (which is not much different than checking the length). [And if we still can't use bool, it's even uglier with an int!]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just assuming you could require end users to wrap objects that might be NULL or return a value that embeds the pointer, but I like the proposed API too. Having an API that "might" sometimes assert is a very difficult API to understand and might introduce non-obvious bugs. Forcing a user to also check for the length puts a lot of complexity onto the user to remember.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still disagree with this choice, seems more like a ticking time bomb than anything else.

Comment thread src/fifo.h Outdated
Comment thread src/fifo.h
Comment on lines +15 to +24
// Push an item onto the end of the queue.
void fifoPush(Fifo *q, void *ptr);

// Look at the first item in the queue (without removing it).
// NOTE: asserts if the queue is empty.
void *fifoPeek(Fifo *q);

// Return and remove the first item from the queue.
// NOTE: asserts if the queue is empty.
void *fifoPop(Fifo *q);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very partial to the Rust VecDeque push_front and pop_back, so I don't mind the naming of push and pop for a queue.

@zuiderkwast

Copy link
Copy Markdown
Contributor

@JimB123

This is a very efficient FIFO queue, but it has limitations for other uses. Namely:

  • It's singly-linked. This saves space. Adding a reverse pointer would increase memory overhead by ~134%. Since it's singly-linked, pops off the rear aren't possible.

Right, I didn't think about that. We can't have pop_back but it'd be possible to add a push_front in O(1) if that's useful in any case.

  • There's no mechanism to remove items from the middle of the list other than an O(n) search.

True, but many of the adlist operations are O(n) too so it doesn't mean it's a regression. But it's cleaner to refuse to add O(n) operations altogether. :)

@JimB123

JimB123 commented Apr 13, 2023

Copy link
Copy Markdown
Contributor Author

@zuiderkwast

... but it'd be possible to add a push_front in O(1) if that's useful in any case.

I just added a fifoPushFront in my local branch. Will include with final delivery.

@oranagra

Copy link
Copy Markdown
Member

sorry for joining late, too busy elsewhere.
before diving into the code, which i assume is very neat, i'd like to challenge the reasoning that was mentioned (other than a cleanup of ugly code).

first, doing more background jobs in parallel, isn't necessarily desired, examples:

  1. freeing expires and db dict in parallel means more contention on the allocator arena locks.
  2. for WAITAOF, we recently added some ordering requirements, two jobs of different types designated to the same thread so that we are sure they're not executed out of order.

secondly, the description says it's more memory efficient, but do we really care? AFAIK we don't usually have a long list of tasks in that queue. maybe more important is the insertion and deletion overhead (allocation / deallocation), and the locking (maybe a lockless queue can be beneficial)

@JimB123 JimB123 force-pushed the bio-to-bjm branch 2 times, most recently from 3341606 to 96c6369 Compare April 18, 2023 21:33
@JimB123

JimB123 commented Apr 18, 2023

Copy link
Copy Markdown
Contributor Author

@oranagra You are correct that the primary motivation is code cleanup. BJM provides a reusable interface - as opposed to BIO which has a hard-coded interface change for every type of background operation. BJM corrects the layering issues and introduces modularity of design.

Side-node: as evidence of the layering/modularity improvement, 10 functions have been removed from server.h, either becoming local (static) functions or being eliminated completely.

  1. freeing expires and db dict in parallel means more contention on the allocator arena locks.

Yes, that's possible/probable. However I did some testing and do see an overall improvement in time. Note that the expires dictionary is actually pretty trivial and doesn't contribute a lot to the operation.

  1. for WAITAOF, we recently added some ordering requirements, two jobs of different types designated to the same thread so that we are sure they're not executed out of order.

I'd suggest that this isn't the best model. Enforcing an ordering on fire-and-forget background tasks shouldn't be necessary. For the case mentioned, with 2 tasks, this is easily accomplished with a single callback which performs the events serially. Alternatively, the initial task could initiate (queue) the 2nd task before completion.

secondly, the description says it's more memory efficient, but do we really care?

Again, you are correct. The background queue usually has a very limited list of items (typically 1 or 0). Unless something weird is happening, memory should not be a concern.

I mentioned performance and memory as side-benefits, mostly as an indication that this change shouldn't degrade performance or memory usage.

For a queuing use case, the Fifo queue is much more memory efficient and faster than using list. While it won't make much of a difference here, I suspect there are additional use cases where this data structure will be handy. (I have some local use cases.)

@JimB123

JimB123 commented Apr 18, 2023

Copy link
Copy Markdown
Contributor Author

Integration complete. Changes visible now.

  • BJM replaces BIO providing better modularity and reusability
  • LazyFree has been refactored for better modularity and reusability
  • All code referencing BIO or LazyFree has been updated

@oranagra

Copy link
Copy Markdown
Member

I'd suggest that this isn't the best model. Enforcing an ordering on fire-and-forget background tasks shouldn't be necessary. For the case mentioned, with 2 tasks, this is easily accomplished with a single callback which performs the events serially. Alternatively, the initial task could initiate (queue) the 2nd task before completion.

IIRC it's not two tasks that can be replaced with a single callback that performs both since the tasks are not started together.
IIRC it's an fsync and close tasks, each also updating the global synced replication offset, and if they're executed out of order then the one scheduled earlier can override the fsynced replication offset already updated (setting it to a lower value).

@slavak correct me if i'm wrong, and if you have time, take a look at the changes in this PR.

@uvletter

Copy link
Copy Markdown
Contributor

I like the idea, I proposed a refactor in #11713 (comment) before, while the current way is too coupled. In my imagination something like thread pool(exactly the BJM here) substitutes the BIO, somehow instead of the fire-and-forget interface, I assume a future based implementation, which can impose the order between jobs, or wait for a specific job.

future f = submit(task); // run task asynchronously
future f2 = submit_then(f, task2); // run task2 after task is done
f2.get(); // wait for the finish of task2

For the case mentioned, with 2 tasks, this is easily accomplished with a single callback which performs the events serially. Alternatively, the initial task could initiate (queue) the 2nd task before completion.

It requires task1 knows about task2, or callback knows task1 and task2 both. However the task is dynamic so many cases may occurs:

  1. task1 is running, and task2 is issued
  2. task1 is done, and then task2 is issued
  3. task1, task2, task3... is issued

Comment thread src/fifo.c
* ^
* lastIdx (2)
*/
typedef struct FifoBlock {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about 32-bit system, I think we should align FifoBlock to 8 byte explicitly to guarantee 3 unused bits in next

Suggested change
typedef struct FifoBlock {
typedef struct __attribute__((aligned(8))) FifoBlock {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aligned attribute is applicable to compiler configured memory. Specifically globals and stack. It can also affect the padding within a structure. But it doesn't affect the alignment for a block of heap memory (returned by malloc).

malloc, by definition, is required to return blocks which are aligned for the largest type that can fit in the block. malloc doesn't know what you will be storing in the block. So, a malloc(5) is guaranteed to be (at least) 4-byte aligned (since is might hold an integer at offset 0). Any block 8-bytes or larger is guaranteed to be (at least) 8-byte aligned. On architectures where there are datatypes requiring 16 byte alignment, any malloc operation for 16 or more bytes would have to be 16 byte aligned.

Even on a 32-bit system, there are types requiring 8 byte alignment. We can be assured that malloc will always return a block that's (at least) 8 byte aligned.

@zuiderkwast zuiderkwast Apr 19, 2023

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth mentioning: For allocators which lack a malloc_size(), zmalloc didn't always respect the alignment requirement, since it used to add a prefix of sizeof(size_t) bytes before the returned address (zmalloc() returns ptr+PREFIX_SIZE) which on 32bit platforms i 4. This is now changed so PREFIX_SIZE is at least 8, which guarantees the returned pointer is 8 bytes aligned.

This was not always the case, but now it is, to allow a similar pointer bit trick in dict.c.

Comment thread src/bjm.c

static int functionsCount;
static int functionsCapacity;
static bjmJobFunc *functions; // An array of function pointers. Index matches job lists.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not directly register function to jobsByFunc, while Joblist already has the member func

Comment thread src/bjm.c
} Joblist;

// This arrays hold a Joblist* for each known callback function.
static Joblist **jobsByFunc; // Array indexed by index in functions[]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use array of Joblist can reduce a pointer redirect
To have a better readibility we could put jobsByFunc, functionsCount and functionsCapacity togather, e.g. in a struct

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of related, if we want to make the code more module, it might help to avoid these local statics and have a single object that is initialized. That way we could at least support unit testing more easily in the future since we would be able to allocated dedicated objects. Not a requirement, just a thought.

Comment thread src/bjm.h
* This co-locates a static variable at the point of job submission, and avoids repeated
* registration calls.
*/
bjmJobFuncHandle bjmRegisterJobFunc(bjmJobFunc func);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the register-and-submit pattern everywhere seems a little annoying, comparing with it I still prefer the previous way of static pre-defined enum, or we can just submit the func and hash it to specific slot, without a pre-register.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the register-and-submit pattern everywhere seems a little annoying

@uvletter I agree with you. I actually coded it as a hash on my first draft. I changed to the registration mechanism in order to avoid the need for a hash lookup on every submission. The function registration is simple for units containing an init function. Without an init function, we can either create one, or do the static registration trick. A hash lookup isn't much, but when you add it up for every lazy eviction (and other uses), it seems pretty wasteful since it can be easily avoided with a registration.

Note that I don't like the static enum - at least not as it's used today. This is one of the reasons that I'm proposing the refactor. We shouldn't have to change BIO in order to have a new job to submit.

@uvletter

Copy link
Copy Markdown
Contributor

IIRC it's not two tasks that can be replaced with a single callback that performs both since the tasks are not started together.
IIRC it's an fsync and close tasks, each also updating the global synced replication offset, and if they're executed out of order then the one scheduled earlier can override the fsynced replication offset already updated (setting it to a lower value)

Reviewed the WAITAOF code, I also suppose it's the race condition that slavak wanna forbid. I find another way to solve it without making the task in order. When updating the fsynced replication offset in bio, we can compare the to-be-fsynced offset and currently fsynced offset, skip the set if the set would rollback the offset, otherwise CAS the fsynced offset to the new one, so simply fire-and-forget model works well.

@slavak

slavak commented Apr 23, 2023

Copy link
Copy Markdown
Contributor

@oranagra is correct about the WAITAOF ordering requirements. In fact, the requirement goes both ways:

Suppose we have task f which fsyncs and updates the fsynced offset to a, and task g which fsyncs and updates the offset to b, where a < b.

  1. If g gets executed before f, we don't want f to overwrite the offset value b with the smaller a.
  2. In specific cases (IIRC- during AOF rewrite), executing g may not guarantee all data up to that point has been actually fsynced until f is also executed, so setting the offset to b may actually provide invalid durability guarantees.

(1) can be resolved with a CAS that takes the maximum of the two values. (2) was the main reason we chose to enforce ordering on the tasks. I think eliminating the requirement for task ordering will be quite difficult.

@JimB123

JimB123 commented Apr 24, 2023

Copy link
Copy Markdown
Contributor Author

I think eliminating the requirement for task ordering will be quite difficult.

So, maybe the best option is to keep BIO for it's original purpose (I/O) - leave it for the AOF single-queue case, but strip out other use cases?

Alternatively, BJM could add an additional thread which is dedicated to sequential jobs. We could have something like submitSequential which ensures FIFO ordering to a single thread.

@madolson madolson left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As sort of generally discussed, I have new interest in trying to come up with a better background job structure. I think this sets up some good primitives, mostly minor stylistic and organization comments, but a couple of serious ones.

Comment thread src/fifo.c


// Look at the first item in the queue (without removing it).
// NOTE: asserts if the queue is empty.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still disagree with this choice, seems more like a ticking time bomb than anything else.

Comment thread src/fifo.h
Comment on lines +19 to +20
// (Supports usage as a stack.)
void fifoPushFront(Fifo *q, void *ptr);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this isn't a fifo it can also be lifo, so I don't think the naming makes sense much at all.

Comment thread src/db.c


static bjmJobFuncHandle freeModuleObjectId;
static void freeModuleObjectFunc(void *privdata) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we lazily setting this up?

Comment thread src/db.c
Comment on lines +236 to +240
// Module values are not fundamental types. We can't pass these
// directly to lazyfree. We need to call the module interface to
// evaluate effort. Furthermore, modules require DBID/KEY which
// implies a main dictionary entry - this is not the case for
// fundamental types.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always use /* */ for comments.

Comment thread src/redisassert.h
#define __REDIS_ASSERT_H__

#include "config.h"
#include <stdlib.h>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Comment thread src/config.c
return 1;
}

void drainBackgroundAofFsyncs(); // in aof.c

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why extern instead of placing it in the header?

Comment thread src/bjm.c
}


// API

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this convention, it's not adding that much and it's confusing outside of amazon contexts.

Comment thread src/bjm.c

// API
void bjmKillThreads(void) {
for (int i = 0; i < threadCount; i++) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (int i = 0; i < threadCount; i++) {
for (int i = 0; i < threadCount; i++) {

Comment thread src/bjm.c

// API
void bjmInit(int numThreads) {
if (threadCount == numThreads) return; // Silently skip to support testing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good thing you are doing so much testing :). If we want to better enable testing, we should start allocating the BJM system as its own struct so we can initialize multiple of them (Mentioned this in another comment).

Comment thread src/config.c
/* Integer configs */
createIntConfig("databases", NULL, IMMUTABLE_CONFIG, 1, INT_MAX, server.dbnum, 16, INTEGER_CONFIG, NULL, NULL),
createIntConfig("port", NULL, MODIFIABLE_CONFIG, 0, 65535, server.port, 6379, INTEGER_CONFIG, NULL, updatePort), /* TCP port. */
createIntConfig("bjm-threads", NULL, IMMUTABLE_CONFIG, 1, 128, server.bjm_threads_num, 2, INTEGER_CONFIG, NULL, NULL),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this practically needs to be 1 as a default, otherwise users might see significantly more usage than before.

Comment thread src/lazyfree.h
* May short-circuit BJM for small items, if so, metrics will be untouched. */
void lazyfreeDict(dict *d);

/* LazyFree a list, invoking the list's free function (if any) for each item. */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor inconsistency in the API, this one can also short circuit.

@uvletter

Copy link
Copy Markdown
Contributor

I think eliminating the requirement for task ordering will be quite difficult.

So, maybe the best option is to keep BIO for it's original purpose (I/O) - leave it for the AOF single-queue case, but strip out other use cases?

Alternatively, BJM could add an additional thread which is dedicated to sequential jobs. We could have something like submitSequential which ensures FIFO ordering to a single thread.

How about my future based proposal? In contrast to fire-and-forget, every submission will return a stub, so we can query the task completion by the stub, or submit new task in respect of dependencies. For example

future f1 = submit(task1)
future f2 = submit_with_dependency(task2, f1)
// if we wanna know if task2 completes
status = query_task(f2)

which makes task1 and task2 in sync. I have a rough draft of it(though not implement it yet), if you have interest I'd like implement it.

@madolson

Copy link
Copy Markdown
Contributor

How about my future based proposal? In contrast to fire-and-forget, every submission will return a stub, so we can query the task completion by the stub, or submit new task in respect of dependencies. For example

I like the futures idea. It could be called on the event loop, such that when the background operation is done it will invoke a callback from the eventloop.

@JimB123

JimB123 commented Jul 11, 2023

Copy link
Copy Markdown
Contributor Author

How about my future based proposal? In contrast to fire-and-forget, every submission will return a stub, so we can query the task completion by the stub, or submit new task in respect of dependencies. For example

I like the futures idea. It could be called on the event loop, such that when the background operation is done it will invoke a callback from the eventloop.

Certainly possible. But greatly increases the overhead. This would require:

  • Returning data from the background thread to foreground thread via mutex queue (or similar)
  • Invoking callbacks via timer(?) task on the event loop
  • Additional memory to pass callback info to the background thread - and then return it to foreground thread

In most cases, the future capability would not be needed. We could make the "future" part optional, only incurring the overhead when something more than fire-and-forget is needed. However, if it's going to be optional, maybe we want to implement without the future capability and then later implementing that as an extension.

@madolson

Copy link
Copy Markdown
Contributor

In most cases, the future capability would not be needed. We could make the "future" part optional, only incurring the overhead when something more than fire-and-forget is needed. However, if it's going to be optional, maybe we want to implement without the future capability and then later implementing that as an extension.

I agree with this.

@oranagra

Copy link
Copy Markdown
Member

However, if it's going to be optional, maybe we want to implement without the future capability and then later implementing that as an extension

doesn't that mean merge a PR that introduces a bug into redis?
i.e. causes WAITAOF to be unsafe?

@madolson

Copy link
Copy Markdown
Contributor

doesn't that mean merge a PR that introduces a bug into redis?

I don't know if this was specifically targeting to be the solution to WAITAOF. We definitely need to resolve the WAITAOF issue before merging.

@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@madolson

Copy link
Copy Markdown
Contributor

@JimB123 When you get a chance, move this PR to https://github.com/placeholderkv/placeholderkv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants