Runtime timeouts #1610
swilly22
left a comment
Let's redesign the timeout/cron interface
src/timeout.h
Outdated
typedef struct {
    ExecutionPlan *plan;     // the query's ExecutionPlan
    bool query_completed;    // whether the query has finished successfully
    bool changes_committed;  // whether the graph has been locked for commits
} TimeoutCtx;
I like the idea of a timeout source file. I think the API of this header should be:
- Timeout_SetTimeOut
- Timeout_ClearTimeOut
Timeout_ClearTimeOut might accept as input a CronTaskID generated by Cron_AddTask; this ID can be saved at the QueryCtx level.
Timeout_QueryCompleted and Timeout_ChangesCommitted reveal too much internal information; please remove them.
Also, let's try to find a more suitable location for this file.
I believe we can remove both query_completed and changes_committed.
When a write query tries to acquire the write lock for the first time, it should call ClearTimeout before acquiring the lock. If it succeeds in clearing the timeout (given that a timeout is present), the query can proceed. Otherwise the timeout has already been triggered, and the query shouldn't acquire the lock and should return an exception.
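The clear-before-lock handshake described above could be sketched roughly as follows. This is only an illustration under assumptions: QueryTimeout, QueryTimeout_Clear, QueryTimeout_Fire and Query_MayCommit are hypothetical names, not part of this PR, and a C11 atomic compare-and-swap stands in for whatever synchronization the cron implementation actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical per-query timeout state (illustrative, not RedisGraph's API).
typedef enum { TIMEOUT_PENDING, TIMEOUT_CLEARED, TIMEOUT_FIRED } TimeoutState;

typedef struct {
	_Atomic int state;  // holds a TimeoutState
} QueryTimeout;

static void QueryTimeout_Init(QueryTimeout *t) {
	assert(t);
	atomic_init(&t->state, TIMEOUT_PENDING);
}

// Called by the query thread.
// Returns true only if the timeout was still pending and is now cleared.
static bool QueryTimeout_Clear(QueryTimeout *t) {
	int expected = TIMEOUT_PENDING;
	return atomic_compare_exchange_strong(&t->state, &expected, TIMEOUT_CLEARED);
}

// Called by the cron thread when the deadline passes.
// Returns true only if the query had not cleared the timeout first.
static bool QueryTimeout_Fire(QueryTimeout *t) {
	int expected = TIMEOUT_PENDING;
	return atomic_compare_exchange_strong(&t->state, &expected, TIMEOUT_FIRED);
}

// Decide whether a write query may go on to acquire the write lock.
static bool Query_MayCommit(QueryTimeout *t) {
	if(t == NULL) return true;     // no timeout was set for this query
	return QueryTimeout_Clear(t);  // false: timeout already fired, abort
}
```

Because both sides attempt the same transition out of TIMEOUT_PENDING, exactly one of them wins, so a query can never both commit and be reported as timed out.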
src/util/cron.c
Outdated
static void CRON_FreeTask(CRON_TASK *t) {
    ASSERT(t);
    rm_free(t->pdata);
Cron_AddTask should document that it takes ownership of pdata.
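One way to state that ownership contract is in the doc comment of the function that accepts the pointer. The sketch below is an assumption-laden illustration: Task, Task_New and Task_Free are hypothetical stand-ins for the cron types in this PR, and plain malloc/free replaces rm_malloc/rm_free.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*TaskCB)(void *pdata);

typedef struct {
	TaskCB cb;    // callback to run when the task is due
	void *pdata;  // private data, owned by the task once it is created
} Task;

// Create a task.
// Takes ownership of `pdata`: it must be heap-allocated (or NULL), and the
// caller must not free it after this call. It is released by Task_Free.
static Task *Task_New(TaskCB cb, void *pdata) {
	Task *t = malloc(sizeof(*t));
	assert(t);
	t->cb = cb;
	t->pdata = pdata;
	return t;
}

// Free a task together with the private data it owns.
static void Task_Free(Task *t) {
	assert(t);
	free(t->pdata);  // free(NULL) is a no-op
	free(t);
}
```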
swilly22
left a comment
Feel free to call if you have any questions.
If we'd rather not perform timeouts on WRITE queries, we can scrap that and apply timeouts only to READ queries.
// execute due tasks
CRON_TASK *task = NULL;
while((task = CRON_Peek()) && CRON_TaskDue(task)) {
    task = CRON_RemoveTask();
Between while((task = CRON_Peek()) && CRON_TaskDue(task)) { and task = CRON_RemoveTask();
the task might be removed by a worker thread. Moreover, between CRON_Peek() and CRON_RemoveTask() a new task might be inserted. This entire process (checking whether there is a due task and removing it from the heap) needs to be atomic.
Changed to atomic. I additionally changed the while to an if so that we don't block entering queries while waiting for a timeout to expire.
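For illustration, the combined check-and-remove could look something like the sketch below, where a single mutex covers both the "is the earliest task due?" test and its removal. The names (TaskQueue, TaskQueue_PopIfDue) are hypothetical, and a sorted linked list is used in place of the heap to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct Task {
	long due_ms;        // absolute time at which the task becomes due
	struct Task *next;  // next task, in ascending due-time order
} Task;

typedef struct {
	pthread_mutex_t lock;  // guards head and every task link
	Task *head;            // earliest task, or NULL when empty
} TaskQueue;

static void TaskQueue_Init(TaskQueue *q) {
	assert(q);
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
}

// Insert a task, keeping the list sorted by due time.
static void TaskQueue_Add(TaskQueue *q, Task *t) {
	pthread_mutex_lock(&q->lock);
	Task **cur = &q->head;
	while(*cur && (*cur)->due_ms <= t->due_ms) cur = &(*cur)->next;
	t->next = *cur;
	*cur = t;
	pthread_mutex_unlock(&q->lock);
}

// Atomically check whether the earliest task is due and, if so, remove it.
// Returns the removed task, or NULL if nothing is due yet.
// Uses `if` rather than `while`, so at most one task is taken per call.
static Task *TaskQueue_PopIfDue(TaskQueue *q, long now_ms) {
	Task *t = NULL;
	pthread_mutex_lock(&q->lock);
	if(q->head && q->head->due_ms <= now_ms) {
		t = q->head;
		q->head = t->next;
		t->next = NULL;
	}
	pthread_mutex_unlock(&q->lock);
	return t;
}
```

Since the peek and the removal happen under the same lock, no other thread can remove the task or insert an earlier one in between.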
src/util/heap.c
Outdated
/* ensure heap property */
__pushup(h, idx);
if (idx < h->count)
This check can be performed a bit earlier. There's no need to perform:
h->array[idx] = h->array[h->count - 1];
h->array[h->count - 1] = NULL;
if this is the last item in the heap.
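A rough sketch of the suggested ordering, on a small fixed-capacity min-heap of ints rather than the pointer heap in src/util/heap.c. IntHeap and its functions are hypothetical names used only for this example; the point is the early return when the removed index is the last slot.

```c
#include <assert.h>
#include <stdbool.h>

#define HEAP_CAP 16

typedef struct {
	int array[HEAP_CAP];
	int count;
} IntHeap;

static void heap_swap(IntHeap *h, int i, int j) {
	int tmp = h->array[i];
	h->array[i] = h->array[j];
	h->array[j] = tmp;
}

// Move the item at idx up until its parent is not larger.
static void push_up(IntHeap *h, int idx) {
	while(idx > 0) {
		int parent = (idx - 1) / 2;
		if(h->array[parent] <= h->array[idx]) break;
		heap_swap(h, parent, idx);
		idx = parent;
	}
}

// Move the item at idx down until both children are not smaller.
static void push_down(IntHeap *h, int idx) {
	for(;;) {
		int l = 2 * idx + 1, r = l + 1, smallest = idx;
		if(l < h->count && h->array[l] < h->array[smallest]) smallest = l;
		if(r < h->count && h->array[r] < h->array[smallest]) smallest = r;
		if(smallest == idx) break;
		heap_swap(h, idx, smallest);
		idx = smallest;
	}
}

static bool IntHeap_Offer(IntHeap *h, int v) {
	if(h->count == HEAP_CAP) return false;
	int idx = h->count++;
	h->array[idx] = v;
	push_up(h, idx);
	return true;
}

// Remove and return the item at idx.
static int IntHeap_RemoveAt(IntHeap *h, int idx) {
	assert(idx >= 0 && idx < h->count);
	int removed = h->array[idx];
	h->count--;
	// Early exit: the removed item was the last one,
	// so there is nothing to move and nothing to re-heapify.
	if(idx == h->count) return removed;
	h->array[idx] = h->array[h->count];
	push_up(h, idx);
	push_down(h, idx);
	return removed;
}
```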
        self.env.assertContains("Query timed out", str(error))

    def test04_timeout_during_commit_stage(self):
        query = "CREATE (a:M) WITH a UNWIND range(1,10000) AS ctr SET a.v = ctr"
Nice
src/timeout.c
Outdated
void Timeout_ClearTimeout() {
    CronTask task = QueryCtx_GetTimeoutJob();
    if(task == NULL) return;
    Cron_RemoveTask(task);
Cron_RemoveTask should indicate whether it managed to remove the task. This indication should be returned by Timeout_ClearTimeout.
src/query_ctx.h
Outdated
/* Set the last writer which needs to commit */
void QueryCtx_SetLastWriter(OpBase *op);
/* Set the query's associated timeout job. */
void QueryCtx_SetTimeoutJob(CronTask task);
See if it makes sense to change this to: QueryCtx_SetTimeout(ms,... )
I don't think that would work easily - the QueryCtx tracks whether the query has an associated timeout, it does not instantiate it.
src/query_ctx.c
Outdated
// Changes are being committed, clear the timeout job.
Timeout_ClearTimeout();
This should be the first thing we try to do, right before acquiring the GIL (line 149).
If we fail to clear the timeout (the timeout has been triggered), we should emit an exception.
* Add run-time configuration for default query timeouts
* Timeout for write queries that haven't committed changes
* define TIMEOUT_NO_TIMEOUT
* Refactor timeout logic
* Address PR comments
* Do not use timeouts for write queries
Co-authored-by: swilly22 <roi@redislabs.com>
Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>
(cherry picked from commit 964b268)
* RedisGraph benchmark automation (#1557) * Added local and remote benchmark definition and automation * [fix] Fixes per PR review. Added option to specify benchmark via BENCHMARH=<benchmark name>. Updated benchmark template Co-authored-by: filipecosta90 <filipecosta.90@gmail.com> (cherry picked from commit f6f1ab2) * Updated benchmark UPDATE-BASELINE to be less restrictive in the latency KPI (#1577) Given we're still experimenting with the benchmarks CI KPI validation, this PR increases the `OverallClientLatencies.Total.q50` to be lower than 2.0 ( before was 1.5 ) so that we can collect further data and adjust afterwards... (cherry picked from commit 611a0f0) * * log redisgraph version (#1567) When pulling container image tagged as `latest` or `edge` I sometimes don't know which version I'm running, and it would be much faster to find out if the information was displayed at startup. This patch logs this information. Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> (cherry picked from commit fe2e7ce) * [add] Triggering nightly CI benchmarks; Early return on CI benchmarks for forked PRs (#1579) (cherry picked from commit a529c1e) * use PRIu64 to format uint64_t (#1581) (cherry picked from commit c0e00d5) * [fix] Fixed missing github_actor on ci nightly benchmark automation (#1583) (cherry picked from commit 8abad84) * Fix idx assertion (#1580) * Fix flawed assertion in index deletion logic * Reduce KPI for updates_baseline benchmark * Address PR comments * Address PR comments Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> (cherry picked from commit 6bad20a) * Always report run-time errors as the sole reply (#1590) * Always report run-time errors as the sole reply * Update test_timeout.py Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> (cherry picked from commit c9ba776) * remove wrong assertion (#1591) (cherry picked from commit 12ef8ac) * Report 0 indices created on duplicate index creation (#1592) (cherry picked from 
commit e00f2c8) * Multi-platform build (#1587) Multi-platform build (cherry picked from commit 26ace7a) * Multi-platform build, take 2 (#1598) (cherry picked from commit acde693) * Moved common benchmark automation code to redisbench-admin package. Improved benchmark specification file (#1597) (cherry picked from commit ebea927) * Added readies submodule (#1600) * Added readies submodule * fixes 1 (cherry picked from commit efbfeaf) * Dockerfle: fixed artifacts copy (#1601) (cherry picked from commit f722f2d) * CircleCI: fixed version release (#1602) (cherry picked from commit 9f218d6) * CircleCI: release-related fix (#1604) (cherry picked from commit 15cf291) * remove redundent include (#1606) (cherry picked from commit 7ea1c43) * Threaded bulk insert (#1596) * Update the bulk updater to execute on a thread * Bulk loader endpoint locks for minimal time * TODOs * Use a separate thread pool for bulk operations * Update test_thread_pools.cpp * refactor bulk-insert * Fix PR problems * count number of pings during bulk-insert, only create graph context on BEGIN token Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> (cherry picked from commit 2d43f9d) * Use system gcc in Ubuntu 16 (#1615) (cherry picked from commit 0c10130) * wrongly assumed add op had only 2 operands (#1618) (cherry picked from commit 6b06095) * Updated benchmark requirements version (#1616) * Updated benchmark requirements version * Update requirements.txt (cherry picked from commit db080d4) * Runtime timeouts (#1610) * Add run-time configuration for default query timeouts * Timeout for write queries that haven't committed changes * define TIMEOUT_NO_TIMEOUT * Refactor timeout logic * Address PR comments * Do not use timeouts for write queries Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> (cherry picked from commit 964b268) * Fix typo in assertion * bump version to 2.2.16 
Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> Co-authored-by: filipe oliveira <filipecosta.90@gmail.com> Co-authored-by: bc² <odanoburu@users.noreply.github.com> Co-authored-by: Rafi Einstein <raffapen@outlook.com>
* Threaded bulk insert (#1596) * Update the bulk updater to execute on a thread * Bulk loader endpoint locks for minimal time * TODOs * Use a separate thread pool for bulk operations * Update test_thread_pools.cpp * refactor bulk-insert * Fix PR problems * count number of pings during bulk-insert, only create graph context on BEGIN token Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> (cherry picked from commit 2d43f9d) * set score to 1 for each document (#1607) * set score to 1 for each document * test fulltext search scoring * Update proc_fulltext_query.c * Add documentation Co-authored-by: Jeffrey Lovitz <jeffrey.lovitz@gmail.com> (cherry picked from commit bd1fdca) * wrongly assumed add op had only 2 operands (#1618) (cherry picked from commit 6b06095) * Updated benchmark requirements version (#1616) * Updated benchmark requirements version * Update requirements.txt (cherry picked from commit db080d4) * Runtime timeouts (#1610) * Add run-time configuration for default query timeouts * Timeout for write queries that haven't committed changes * define TIMEOUT_NO_TIMEOUT * Refactor timeout logic * Address PR comments * Do not use timeouts for write queries Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> (cherry picked from commit 964b268) * Use master version of CircleCI config * bump version to 2.4.2 Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com> Co-authored-by: filipe oliveira <filipecosta.90@gmail.com>
* Add run-time configuration for default query timeouts * Timeout for write queries that haven't committed changes * define TIMEOUT_NO_TIMEOUT * Refactor timeout logic * Address PR comments * Do not use timeouts for write queries Co-authored-by: swilly22 <roi@redislabs.com> Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>
Resolves #1589