Background Job Manager (BJM) - replacement for BIO#12029
Conversation
zuiderkwast
left a comment
There was a problem hiding this comment.
Not a full review. This looks like two good ideas!
If BJM replaces BIO, I'd prefer that BIO is deleted at the same time so we don't have several ways of doing the same thing.
Regarding the included fifo datastructure, can it replace adlist in more places, perhaps all occurrences throughout the repo? If yes, we can delete adlist too. If not, perhaps it can be beneficial to replace a few usages. AFACT, this fifo data structure can easily be made to support the operations of a double-ended queue, i.e. push and pop in both ends.
| // Push an item onto the end of the queue. | ||
| void fifoPush(Fifo *q, void *ptr); | ||
|
|
||
| // Look at the first item in the queue (without removing it). | ||
| // NOTE: asserts if the queue is empty. | ||
| void *fifoPeek(Fifo *q); | ||
|
|
||
| // Return and remove the first item from the queue. | ||
| // NOTE: asserts if the queue is empty. | ||
| void *fifoPop(Fifo *q); |
There was a problem hiding this comment.
The words push and pop are more often used for a stack (AKA LIFO) datatype.
In a FIFO queue (or just queue) the operations are more often called enqueue and dequeue.
There was a problem hiding this comment.
I'm very partial to the Rust VecDeque push_front and pop_back, so I don't mind the naming of push and pop for a queue.
| * For a SINGLE BLOCK containing (for example) 4 items, the layout looks like this: | ||
| * +--------+--------+--------+--------+--------+--------+--------+--------+ | ||
| * SINGLE BLOCK: | slot 0 | slot 1 | slot 2 | slot 3 | slot 4 | slot 5 | slot 6 | next/ | | ||
| * | item | item | item | item | - | - | - | lastIdx| | ||
| * +--------+--------+--------+--------+--------+--------+--------+--------+ | ||
| * ^ | ||
| * lastIdx (3) |
There was a problem hiding this comment.
Beautiful documentation!
|
I think it's a good quality of life improvement to Redis. @oranagra Pinging you since I would like your buy in before Jim does any more work. The current PR just includes the high level APIs, but the intention is to use this functionality to replace the bio.c functionality. I couldn't immediately think of other places the FIFO might be used, but I still think it's useful enough to include. The goal would be to include the refactoring after we stabilize 7.2 and target it for Redis 8. Note this will eventually be a major decision since we will want to allow configuring a custom number of threads. |
Thanks Viktor @zuiderkwast
Absolutely. I just want a go/no-go decision before I put in the work to remove BIO. But that's definitely my intent.
I have some FIFO use cases in local code, but I'm not sure that this is applicable in too many places for OSS Redis. This is a very efficient FIFO queue, but it has limitations for other uses. Namely:
|
|
|
||
|
|
||
| // Look at the first item in the queue (without removing it). | ||
| // NOTE: asserts if the queue is empty. |
There was a problem hiding this comment.
I'm not sure I fully buy the utility of asserting on the queue being non-empty, most implementations would return NULL instead. You implement this paradigm by getting the length first, and then deciding to call peek.
There was a problem hiding this comment.
Since the value is user-defined, 0/NULL might be a valid value. You could change the API to look something like:
bool fifoPeek(Fifo *q, void **returned_value);
This definition is uglier, and the caller still has to check the bool (which is not much different than checking the length). [And if we still can't use bool, it's even uglier with an int!]
There was a problem hiding this comment.
I was just assuming you could require end users to wrap objects that might be NULL or return a value that embeds the pointer, but I like the proposed API too. Having an API that "might" sometimes assert is a very difficult API to understand and might introduce non-obvious bugs. Forcing a user to also check for the length puts a lot of complexity onto the user to remember.
There was a problem hiding this comment.
I still disagree with this choice, seems more like a ticking time bomb than anything else.
| // Push an item onto the end of the queue. | ||
| void fifoPush(Fifo *q, void *ptr); | ||
|
|
||
| // Look at the first item in the queue (without removing it). | ||
| // NOTE: asserts if the queue is empty. | ||
| void *fifoPeek(Fifo *q); | ||
|
|
||
| // Return and remove the first item from the queue. | ||
| // NOTE: asserts if the queue is empty. | ||
| void *fifoPop(Fifo *q); |
There was a problem hiding this comment.
I'm very partial to the Rust VecDeque push_front and pop_back, so I don't mind the naming of push and pop for a queue.
Right, I didn't think about that. We can't have
True, but many of the adlist operations are O(n) too so it doesn't mean it's a regression. But it's cleaner to refuse to add O(n) operations altogether. :) |
I just added a |
|
sorry for joining late, too busy elsewhere. first, doing more background jobs in parallel, isn't necessarily desired, examples:
secondly, the description says it's more memory efficient, but do we really care? AFAIK we don't usually have a long list of tasks in that queue. maybe more important is the insertion and deletion overhead (allocation / deallocation), and the locking (maybe a lockless queue can be beneficial) |
3341606 to
96c6369
Compare
|
@oranagra You are correct that the primary motivation is code cleanup. BJM provides a reusable interface - as opposed to BIO which has a hard-coded interface change for every type of background operation. BJM corrects the layering issues and introduces modularity of design. Side-node: as evidence of the layering/modularity improvement, 10 functions have been removed from
Yes, that's possible/probable. However I did some testing and do see an overall improvement in time. Note that the expires dictionary is actually pretty trivial and doesn't contribute a lot to the operation.
I'd suggest that this isn't the best model. Enforcing an ordering on fire-and-forget background tasks shouldn't be necessary. For the case mentioned, with 2 tasks, this is easily accomplished with a single callback which performs the events serially. Alternatively, the initial task could initiate (queue) the 2nd task before completion.
Again, you are correct. The background queue usually has a very limited list of items (typically 1 or 0). Unless something weird is happening, memory should not be a concern. I mentioned performance and memory as side-benefits, mostly as an indication that this change shouldn't degrade performance or memory usage. For a queuing use case, the |
|
Integration complete. Changes visible now.
|
IIRC it's not two tasks that can be replaced with a single callback that performs both since the tasks are not started together. @slavak correct me if i'm wrong, and if you have time, take a look at the changes in this PR. |
|
I like the idea, I proposed a refactor in #11713 (comment) before, while the current way is too coupled. In my imagination something like thread pool(exactly the BJM here) substitutes the BIO, somehow instead of the fire-and-forget interface, I assume a future based implementation, which can impose the order between jobs, or wait for a specific job.
It requires task1 knows about task2, or callback knows task1 and task2 both. However the task is dynamic so many cases may occurs:
|
| * ^ | ||
| * lastIdx (2) | ||
| */ | ||
| typedef struct FifoBlock { |
There was a problem hiding this comment.
how about 32-bit system, I think we should align FifoBlock to 8 byte explicitly to guarantee 3 unused bits in next
| typedef struct FifoBlock { | |
| typedef struct __attribute__((aligned(8))) FifoBlock { |
There was a problem hiding this comment.
The aligned attribute is applicable to compiler configured memory. Specifically globals and stack. It can also affect the padding within a structure. But it doesn't affect the alignment for a block of heap memory (returned by malloc).
malloc, by definition, is required to return blocks which are aligned for the largest type that can fit in the block. malloc doesn't know what you will be storing in the block. So, a malloc(5) is guaranteed to be (at least) 4-byte aligned (since is might hold an integer at offset 0). Any block 8-bytes or larger is guaranteed to be (at least) 8-byte aligned. On architectures where there are datatypes requiring 16 byte alignment, any malloc operation for 16 or more bytes would have to be 16 byte aligned.
Even on a 32-bit system, there are types requiring 8 byte alignment. We can be assured that malloc will always return a block that's (at least) 8 byte aligned.
There was a problem hiding this comment.
Worth mentioning: For allocators which lack a malloc_size(), zmalloc didn't always respect the alignment requirement, since it used to add a prefix of sizeof(size_t) bytes before the returned address (zmalloc() returns ptr+PREFIX_SIZE) which on 32bit platforms i 4. This is now changed so PREFIX_SIZE is at least 8, which guarantees the returned pointer is 8 bytes aligned.
This was not always the case, but now it is, to allow a similar pointer bit trick in dict.c.
|
|
||
| static int functionsCount; | ||
| static int functionsCapacity; | ||
| static bjmJobFunc *functions; // An array of function pointers. Index matches job lists. |
There was a problem hiding this comment.
why not directly register function to jobsByFunc, while Joblist already has the member func
| } Joblist; | ||
|
|
||
| // This arrays hold a Joblist* for each known callback function. | ||
| static Joblist **jobsByFunc; // Array indexed by index in functions[] |
There was a problem hiding this comment.
Use array of Joblist can reduce a pointer redirect
To have a better readibility we could put jobsByFunc, functionsCount and functionsCapacity togather, e.g. in a struct
There was a problem hiding this comment.
Sort of related, if we want to make the code more module, it might help to avoid these local statics and have a single object that is initialized. That way we could at least support unit testing more easily in the future since we would be able to allocated dedicated objects. Not a requirement, just a thought.
| * This co-locates a static variable at the point of job submission, and avoids repeated | ||
| * registration calls. | ||
| */ | ||
| bjmJobFuncHandle bjmRegisterJobFunc(bjmJobFunc func); |
There was a problem hiding this comment.
the register-and-submit pattern everywhere seems a little annoying, comparing with it I still prefer the previous way of static pre-defined enum, or we can just submit the func and hash it to specific slot, without a pre-register.
There was a problem hiding this comment.
the register-and-submit pattern everywhere seems a little annoying
@uvletter I agree with you. I actually coded it as a hash on my first draft. I changed to the registration mechanism in order to avoid the need for a hash lookup on every submission. The function registration is simple for units containing an init function. Without an init function, we can either create one, or do the static registration trick. A hash lookup isn't much, but when you add it up for every lazy eviction (and other uses), it seems pretty wasteful since it can be easily avoided with a registration.
Note that I don't like the static enum - at least not as it's used today. This is one of the reasons that I'm proposing the refactor. We shouldn't have to change BIO in order to have a new job to submit.
Reviewed the WAITAOF code, I also suppose it's the race condition that slavak wanna forbid. I find another way to solve it without making the task in order. When updating the fsynced replication offset in bio, we can compare the to-be-fsynced offset and currently fsynced offset, skip the set if the set would rollback the offset, otherwise CAS the fsynced offset to the new one, so simply fire-and-forget model works well. |
|
@oranagra is correct about the Suppose we have task
(1) can be resolved with a CAS that takes the maximum of the two values. (2) was the main reason we chose to enforce ordering on the tasks. I think eliminating the requirement for task ordering will be quite difficult. |
So, maybe the best option is to keep BIO for it's original purpose (I/O) - leave it for the AOF single-queue case, but strip out other use cases? Alternatively, BJM could add an additional thread which is dedicated to sequential jobs. We could have something like |
madolson
left a comment
There was a problem hiding this comment.
As sort of generally discussed, I have new interest in trying to come up with a better background job structure. I think this sets up some good primitives, mostly minor stylistic and organization comments, but a couple of serious ones.
|
|
||
|
|
||
| // Look at the first item in the queue (without removing it). | ||
| // NOTE: asserts if the queue is empty. |
There was a problem hiding this comment.
I still disagree with this choice, seems more like a ticking time bomb than anything else.
| // (Supports usage as a stack.) | ||
| void fifoPushFront(Fifo *q, void *ptr); |
There was a problem hiding this comment.
So, this isn't a fifo it can also be lifo, so I don't think the naming makes sense much at all.
|
|
||
|
|
||
| static bjmJobFuncHandle freeModuleObjectId; | ||
| static void freeModuleObjectFunc(void *privdata) { |
There was a problem hiding this comment.
Why are we lazily setting this up?
| // Module values are not fundamental types. We can't pass these | ||
| // directly to lazyfree. We need to call the module interface to | ||
| // evaluate effort. Furthermore, modules require DBID/KEY which | ||
| // implies a main dictionary entry - this is not the case for | ||
| // fundamental types. |
There was a problem hiding this comment.
Always use /* */ for comments.
| #define __REDIS_ASSERT_H__ | ||
|
|
||
| #include "config.h" | ||
| #include <stdlib.h> |
| return 1; | ||
| } | ||
|
|
||
| void drainBackgroundAofFsyncs(); // in aof.c |
There was a problem hiding this comment.
Why extern instead of placing it in the header?
| } | ||
|
|
||
|
|
||
| // API |
There was a problem hiding this comment.
I'm not a fan of this convention, it's not adding that much and it's confusing outside of amazon contexts.
|
|
||
| // API | ||
| void bjmKillThreads(void) { | ||
| for (int i = 0; i < threadCount; i++) { |
There was a problem hiding this comment.
| for (int i = 0; i < threadCount; i++) { | |
| for (int i = 0; i < threadCount; i++) { |
|
|
||
| // API | ||
| void bjmInit(int numThreads) { | ||
| if (threadCount == numThreads) return; // Silently skip to support testing |
There was a problem hiding this comment.
Good thing you are doing so much testing :). If we want to better enable testing, we should start allocating the BJM system as its own struct so we can initialize multiple of them (Mentioned this in another comment).
| /* Integer configs */ | ||
| createIntConfig("databases", NULL, IMMUTABLE_CONFIG, 1, INT_MAX, server.dbnum, 16, INTEGER_CONFIG, NULL, NULL), | ||
| createIntConfig("port", NULL, MODIFIABLE_CONFIG, 0, 65535, server.port, 6379, INTEGER_CONFIG, NULL, updatePort), /* TCP port. */ | ||
| createIntConfig("bjm-threads", NULL, IMMUTABLE_CONFIG, 1, 128, server.bjm_threads_num, 2, INTEGER_CONFIG, NULL, NULL), |
There was a problem hiding this comment.
I think this practically needs to be 1 as a default, otherwise users might see significantly more usage than before.
| * May short-circuit BJM for small items, if so, metrics will be untouched. */ | ||
| void lazyfreeDict(dict *d); | ||
|
|
||
| /* LazyFree a list, invoking the list's free function (if any) for each item. */ |
There was a problem hiding this comment.
Minor inconsistency in the API, this one can also short circuit.
How about my future based proposal? In contrast to fire-and-forget, every submission will return a stub, so we can query the task completion by the stub, or submit new task in respect of dependencies. For example which makes task1 and task2 in sync. I have a rough draft of it(though not implement it yet), if you have interest I'd like implement it. |
I like the futures idea. It could be called on the event loop, such that when the background operation is done it will invoke a callback from the eventloop. |
Certainly possible. But greatly increases the overhead. This would require:
In most cases, the future capability would not be needed. We could make the "future" part optional, only incurring the overhead when something more than fire-and-forget is needed. However, if it's going to be optional, maybe we want to implement without the future capability and then later implementing that as an extension. |
I agree with this. |
doesn't that mean merge a PR that introduces a bug into redis? |
I don't know if this was specifically targeting to be the solution to |
|
|
|
@JimB123 When you get a chance, move this PR to https://github.com/placeholderkv/placeholderkv |
The BIO module is a source of minor annoyance. Background Job Manager (BJM) is designed as a modular replacement for BIO. Assuming there is positive response for this PR, the PR will be extended to convert existing BIO usage to BJM and remove BIO.
Motivations:
listto maintain the queue of jobs. The new Fifo uses much less memory (and is more performant) if there's a large burst of jobs.PR Contents:
list. It's delivered as an independent, modular, & reusable utility data structure.