Skip to content

Conversation

@perryitay
Copy link
Contributor

@perryitay perryitay commented Dec 19, 2021

issue : #9655

Some modules might perform a long-running logic in different stages of Redis lifetime, for example:

  • command execution
  • RDB loading
  • thread safe context

During this long-running logic Redis is not responsive.

This PR offers

  1. An API to process events while a busy command is running (RM_Yield)
  2. A new flag (ALLOW_BUSY) to mark the commands that should be handled during busy
    jobs which can also be used by modules (allow-busy)
  3. In slow commands and thread safe contexts, this flag will start rejecting commands with -BUSY only
    after busy-reply-threshold
  4. During loading (rdb_load callback), it'll process events right away (not wait for busy-reply-threshold),
    but either way, the processing is throttled to the server hz rate.
  5. Allow modules to Yield to redis background tasks, but not to client commands
  • rename script-time-limit to busy-reply-threshold (an alias to the pre-7.0 lua-time-limit)

@perryitay perryitay changed the title Busy commands module Adding module api for processing commands during busy jobs and allow flagging the commands that should be handled at this status Dec 19, 2021
@oranagra oranagra linked an issue Dec 22, 2021 that may be closed by this pull request
@MeirShpilraien
Copy link

MeirShpilraien commented Dec 22, 2021

Suggestion, why not using the script timeout mechanism here:

int scriptInterrupt(scriptRunCtx *run_ctx) {
?
We can create scriptRunCtx and set it on RedisModuleCtx, then we will be able to call this API and share the code, WDYT?

In addition, I believe we should consider whether or not this can be called from a thread safe context (I believe that we should support it, many of our modules needs to call if from a thread). And we should probably add tests that uses it from a thread safe context if we decide that its supported.

@oranagra
Copy link
Member

@MeirShpilraien if feels to me like trying to tie two things that are not really related, and i suspect it'll cause complications in the future.
maybe i'm missing something, but if it's just about replicating 10 lines of code that don't contain any very sensitive code, we better go with the cloning approach.

@guybe7
Copy link
Collaborator

guybe7 commented Dec 22, 2021

maybe i'm missing something but if we have a module command that does something heavy, shouldn't it just block the client and do the heavy lifting in a thread?
or maybe we're talking about a case where you deliberately want to prevent the server from executing most of the commands while your module command is doing something? is there a concrete example for such a case?

@MeirShpilraien
Copy link

@MeirShpilraien if feels to me like trying to tie two things that are not really related, and i suspect it'll cause complications in the future.
maybe i'm missing something, but if it's just about replicating 10 lines of code that don't contain any very sensitive code, we better go with the cloning approach.

@oranagra I can see how it can be useful on more places, for example RM_Call and scriptCall has a huge code duplication of basically invoking commands on Redis.

I agree though that it will probably require more effort to match this code to what we need so we can leave it for the future I guess.

maybe i'm missing something but if we have a module command that does something heavy, shouldn't it just block the client and do the heavy lifting in a thread?

@guybe7 the use-case is that when a module need to do a long operation which need to be atomic and then it needs to lock the GIL for a long time.

@perryitay perryitay changed the title Adding module api for processing commands during busy jobs and allow flagging the commands that should be handled at this status DRAFT - (some changes are required - please don't reveiew)Adding module api for processing commands during busy jobs and allow flagging the commands that should be handled at this status Dec 23, 2021
@perryitay perryitay force-pushed the busy-commands-module branch from 6b4a379 to 318aa31 Compare December 23, 2021 08:43
@perryitay perryitay force-pushed the busy-commands-module branch 5 times, most recently from a566a73 to c2082ac Compare December 23, 2021 10:10
@perryitay perryitay force-pushed the busy-commands-module branch from c2082ac to f3673ae Compare December 23, 2021 10:12
@perryitay perryitay changed the title DRAFT - (some changes are required - please don't reveiew)Adding module api for processing commands during busy jobs and allow flagging the commands that should be handled at this status Adding module api for processing commands during busy jobs and allow flagging the commands that should be handled at this status Dec 23, 2021
Copy link
Member

@itamarhaber itamarhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!CR

@oranagra
Copy link
Member

@madolson i'd appreciate a review of my last commit.
maybe we should rename BLOCKED_PAUSE to BLOCKED_POSTPONED since i think it just means that processCommand tried to process it and decided to postpone it.
or, if you find bugs in my code, maybe i need to create a dedicated BLOCKED_YIELD flag.

Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a new BLOCKED_YIELD, this should work as expected.

oranagra and others added 2 commits January 19, 2022 09:05
Co-authored-by: sundb <sundbcn@gmail.com>
@sundb
Copy link
Collaborator

sundb commented Jan 20, 2022

It seems that replconfCommand and quitCommand missing CMD_ALLOW_BUSY.

    if (scriptIsTimedout() &&
          c->cmd->proc != authCommand &&
          c->cmd->proc != helloCommand &&
          c->cmd->proc != replconfCommand &&        <- missing
          c->cmd->proc != multiCommand &&
          c->cmd->proc != discardCommand &&
          c->cmd->proc != watchCommand &&
          c->cmd->proc != unwatchCommand &&
          c->cmd->proc != quitCommand &&            <- missing
          c->cmd->proc != resetCommand &&
          c->cmd->proc != shutdownCommand && /* more checks in shutdownCommand */
        !(c->cmd->proc == scriptCommand &&

@oranagra
Copy link
Member

It seems that replconfCommand and quitCommand missing CMD_ALLOW_BUSY.

good catch.. i guess it was a bad merge conflict resolution.

@oranagra oranagra merged commit c4b7882 into redis:unstable Jan 20, 2022
@sundb sundb mentioned this pull request Jan 3, 2024
oranagra added a commit that referenced this pull request Jan 7, 2024
## Issues and solutions from #12817
1. Touch ProcessingEventsWhileBlocked and calling moduleCount() without
GIL in afterSleep()
    - Introduced: 
       Version: 7.0.0
       PR: #9963

   - Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep(),
and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main
thread to not wait for GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.

   - Initial / Abandoned Solution:
      * Added `__thread` specifier for ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and module threads should be completely independent
and unaffected, so it is safer
         to use TLS.
* Adding a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.
    
    - Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
  Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):
    #0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124)
    #1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8)

  Previous read of size 4 at 0x0001045990c0 by main thread:
    #0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98)
    #1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c)
    #3 main server.c:7220 (redis-server:arm64+0x10003f38c)
```

2. aeApiPoll() is not thread-safe
When using RM_Yield to handle events in a module thread, if the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.

    - Introduced: 
       Version: 7.0.0
       PR: #9963

   - Old / Abandoned Solution:
Adding a new mutex to protect timing between after beforeSleep() and
before afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until there is any event again, and the module
thread will
always hang until the main thread leaves the event loop.

    - Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
  Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4)
    #3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c)

  Previous write of size 4 at 0x000100b54000 by main thread:
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010da8)
    #3 main server.c:7238 (redis-server:arm64+0x10003f51c)
```

## The final fix as the comments:
#12817 (comment)
Optimized solution based on the above comment:

First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring GIL state.

When the module thread starts to yield, there are two possibilities(we
assume the caller keeps the GIL):
1. The main thread is in the mid of beforeSleep() and afterSleep(), that
is, `module_gil_acquring` is not 1 now.
At this point, the module thread will wake up the main thread through
the pipe and leave the yield,
waiting for the next yield when the main thread may already in the
acquiring GIL state.
    
2. The main thread is in the acquiring GIL state.
The module thread release the GIL, yielding CPU to give the main thread
an opportunity to start
event processing, and then acquire the GIL again until the main thread
releases it.
This is what
#12817 (comment)
mentioned direction.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
roggervalf pushed a commit to roggervalf/redis that referenced this pull request Feb 11, 2024
## Issues and solutions from redis#12817
1. Touch ProcessingEventsWhileBlocked and calling moduleCount() without
GIL in afterSleep()
    - Introduced: 
       Version: 7.0.0
       PR: redis#9963

   - Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep(),
and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main
thread to not wait for GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.

   - Initial / Abandoned Solution:
      * Added `__thread` specifier for ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and module threads should be completely independent
and unaffected, so it is safer
         to use TLS.
* Adding a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.
    
    - Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
  Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):
    #0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124)
    redis#1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    redis#2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8)

  Previous read of size 4 at 0x0001045990c0 by main thread:
    #0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98)
    redis#1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64)
    redis#2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c)
    redis#3 main server.c:7220 (redis-server:arm64+0x10003f38c)
```

2. aeApiPoll() is not thread-safe
When using RM_Yield to handle events in a module thread, if the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.

    - Introduced: 
       Version: 7.0.0
       PR: redis#9963

   - Old / Abandoned Solution:
Adding a new mutex to protect timing between after beforeSleep() and
before afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until there is any event again, and the module
thread will
always hang until the main thread leaves the event loop.

    - Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
  Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    redis#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    redis#2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4)
    redis#3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    redis#4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c)

  Previous write of size 4 at 0x000100b54000 by main thread:
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    redis#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    redis#2 aeMain ae.c:496 (redis-server:arm64+0x100010da8)
    redis#3 main server.c:7238 (redis-server:arm64+0x10003f51c)
```

## The final fix as the comments:
redis#12817 (comment)
Optimized solution based on the above comment:

First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring GIL state.

When the module thread starts to yield, there are two possibilities(we
assume the caller keeps the GIL):
1. The main thread is in the mid of beforeSleep() and afterSleep(), that
is, `module_gil_acquring` is not 1 now.
At this point, the module thread will wake up the main thread through
the pipe and leave the yield,
waiting for the next yield when the main thread may already in the
acquiring GIL state.
    
2. The main thread is in the acquiring GIL state.
The module thread release the GIL, yielding CPU to give the main thread
an opportunity to start
event processing, and then acquire the GIL again until the main thread
releases it.
This is what
redis#12817 (comment)
mentioned direction.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
sundb added a commit to sundb/redis that referenced this pull request May 13, 2024
1. Touch ProcessingEventsWhileBlocked and calling moduleCount() without
GIL in afterSleep()
    - Introduced:
       Version: 7.0.0
       PR: redis#9963

   - Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep(),
and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main
thread to not wait for GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.

   - Initial / Abandoned Solution:
      * Added `__thread` specifier for ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and module threads should be completely independent
and unaffected, so it is safer
         to use TLS.
* Adding a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.

    - Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
  Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):

  Previous read of size 4 at 0x0001045990c0 by main thread:

```

2. aeApiPoll() is not thread-safe
When using RM_Yield to handle events in a module thread, if the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.

    - Introduced:
       Version: 7.0.0
       PR: redis#9963

   - Old / Abandoned Solution:
Adding a new mutex to protect timing between after beforeSleep() and
before afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until there is any event again, and the module
thread will
always hang until the main thread leaves the event loop.

    - Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
  Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):

  Previous write of size 4 at 0x000100b54000 by main thread:

```

redis#12817 (comment)
Optimized solution based on the above comment:

First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring GIL state.

When the module thread starts to yield, there are two possibilities(we
assume the caller keeps the GIL):
1. The main thread is in the mid of beforeSleep() and afterSleep(), that
is, `module_gil_acquring` is not 1 now.
At this point, the module thread will wake up the main thread through
the pipe and leave the yield,
waiting for the next yield when the main thread may already in the
acquiring GIL state.

2. The main thread is in the acquiring GIL state.
The module thread release the GIL, yielding CPU to give the main thread
an opportunity to start
event processing, and then acquire the GIL again until the main thread
releases it.
This is what
redis#12817 (comment)
mentioned direction.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Apr 22, 2025
## Issues and solutions from redis#12817
1. Touch ProcessingEventsWhileBlocked and calling moduleCount() without
GIL in afterSleep()
    - Introduced: 
       Version: 7.0.0
       PR: redis#9963

   - Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep(),
and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main
thread to not wait for GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.

   - Initial / Abandoned Solution:
      * Added `__thread` specifier for ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and module threads should be completely independent
and unaffected, so it is safer
         to use TLS.
* Adding a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.
    
    - Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
  Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):
    #0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124)
    #1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8)

  Previous read of size 4 at 0x0001045990c0 by main thread:
    #0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98)
    #1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c)
    #3 main server.c:7220 (redis-server:arm64+0x10003f38c)
```

2. aeApiPoll() is not thread-safe
When using RM_Yield to handle events in a module thread, if the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.

    - Introduced: 
       Version: 7.0.0
       PR: redis#9963

   - Old / Abandoned Solution:
Adding a new mutex to protect timing between after beforeSleep() and
before afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until there is any event again, and the module
thread will
always hang until the main thread leaves the event loop.

    - Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
  Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4)
    #3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    #4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c)

  Previous write of size 4 at 0x000100b54000 by main thread:
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    #1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    #2 aeMain ae.c:496 (redis-server:arm64+0x100010da8)
    #3 main server.c:7238 (redis-server:arm64+0x10003f51c)
```

## The final fix as the comments:
redis#12817 (comment)
Optimized solution based on the above comment:

First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring GIL state.

When the module thread starts to yield, there are two possibilities(we
assume the caller keeps the GIL):
1. The main thread is in the mid of beforeSleep() and afterSleep(), that
is, `module_gil_acquring` is not 1 now.
At this point, the module thread will wake up the main thread through
the pipe and leave the yield,
waiting for the next yield when the main thread may already in the
acquiring GIL state.
    
2. The main thread is in the acquiring GIL state.
The module thread release the GIL, yielding CPU to give the main thread
an opportunity to start
event processing, and then acquire the GIL again until the main thread
releases it.
This is what
redis#12817 (comment)
mentioned direction.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
funny-dog pushed a commit to funny-dog/redis that referenced this pull request Sep 17, 2025
## Issues and solutions from redis#12817
1. Touch ProcessingEventsWhileBlocked and calling moduleCount() without
GIL in afterSleep()
    - Introduced: 
       Version: 7.0.0
       PR: redis#9963

   - Harm Level: Very High
If the module thread calls `RM_Yield()` before the main thread enters
afterSleep(),
and modifies `ProcessingEventsWhileBlocked`(+1), it will cause the main
thread to not wait for GIL,
which can lead to all kinds of unforeseen problems, including memory
data corruption.

   - Initial / Abandoned Solution:
      * Added `__thread` specifier for ProcessingEventsWhileBlocked.
`ProcessingEventsWhileBlocked` is used to protect against nested event
processing, but event processing
in the main thread and module threads should be completely independent
and unaffected, so it is safer
         to use TLS.
* Adding a cached module count to keep track of the current number of
modules, to avoid having to use `dictSize()`.
    
    - Related Warnings:
```
WARNING: ThreadSanitizer: data race (pid=1136)
  Write of size 4 at 0x0001045990c0 by thread T4 (mutexes: write M0):
    #0 processEventsWhileBlocked networking.c:4135 (redis-server:arm64+0x10006d124)
    redis#1 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    redis#2 bg_call_worker <null>:83232836 (blockedclient.so:arm64+0x16a8)

  Previous read of size 4 at 0x0001045990c0 by main thread:
    #0 afterSleep server.c:1861 (redis-server:arm64+0x100024f98)
    redis#1 aeProcessEvents ae.c:408 (redis-server:arm64+0x10000fd64)
    redis#2 aeMain ae.c:496 (redis-server:arm64+0x100010f0c)
    redis#3 main server.c:7220 (redis-server:arm64+0x10003f38c)
```

2. aeApiPoll() is not thread-safe
When using RM_Yield to handle events in a module thread, if the main
thread has not yet
entered `afterSleep()`, both the module thread and the main thread may
touch `server.el` at the same time.

    - Introduced: 
       Version: 7.0.0
       PR: redis#9963

   - Old / Abandoned Solution:
Adding a new mutex to protect timing between after beforeSleep() and
before afterSleep().
Defect: If the main thread enters the ae loop without any IO events, it
will wait until
the next timeout or until there is any event again, and the module
thread will
always hang until the main thread leaves the event loop.

    - Related Warnings:
```
SUMMARY: ThreadSanitizer: data race ae_kqueue.c:55 in addEventMask
==================
==================
WARNING: ThreadSanitizer: data race (pid=14682)
  Write of size 4 at 0x000100b54000 by thread T9 (mutexes: write M0):
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    redis#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    redis#2 processEventsWhileBlocked networking.c:4138 (redis-server:arm64+0x10006d3c4)
    redis#3 RM_Yield module.c:2410 (redis-server:arm64+0x10018b66c)
    redis#4 bg_call_worker <null>:16042052 (blockedclient.so:arm64+0x169c)

  Previous write of size 4 at 0x000100b54000 by main thread:
    #0 aeApiPoll ae_kqueue.c:175 (redis-server:arm64+0x100010588)
    redis#1 aeProcessEvents ae.c:399 (redis-server:arm64+0x10000fb84)
    redis#2 aeMain ae.c:496 (redis-server:arm64+0x100010da8)
    redis#3 main server.c:7238 (redis-server:arm64+0x10003f51c)
```

## The final fix as the comments:
redis#12817 (comment)
Optimized solution based on the above comment:

First, we add `module_gil_acquring` to indicate whether the main thread
is currently in the acquiring GIL state.

When the module thread starts to yield, there are two possibilities(we
assume the caller keeps the GIL):
1. The main thread is in the mid of beforeSleep() and afterSleep(), that
is, `module_gil_acquring` is not 1 now.
At this point, the module thread will wake up the main thread through
the pipe and leave the yield,
waiting for the next yield when the main thread may already in the
acquiring GIL state.
    
2. The main thread is in the acquiring GIL state.
The module thread release the GIL, yielding CPU to give the main thread
an opportunity to start
event processing, and then acquire the GIL again until the main thread
releases it.
This is what
redis#12817 (comment)
mentioned direction.

---------

Co-authored-by: Oran Agra <oran@redislabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Module API to process commands while blocked.

9 participants