dp bugfix: wait till dp thread stops in thread cancel#9136
dp bugfix: wait till dp thread stops in thread cancel#9136kv2019i merged 1 commit intothesofproject:mainfrom
Conversation
kv2019i
left a comment
There was a problem hiding this comment.
Looks good, one clarifying question inline (plus few minor comments).
| return 0; | ||
| #endif | ||
| /* cancel task if DP task*/ | ||
| if (mod->dev->ipc_config.proc_domain == COMP_PROCESSING_DOMAIN_DP && mod->dev->task) |
There was a problem hiding this comment.
Is this check safe? I mean task is cancelled, where is task set to null? It's initially NULL, so this check will cover DP tasks never started, but how about a DP task that has already ended?
There was a problem hiding this comment.
If the task is cancel - its OK, calling cancel again is safe and won't have any effect
If task if freed:
static inline void comp_free(struct comp_dev *dev)
{
assert(dev->drv->ops.free);
/* free task if shared component or DP task*/
if ((dev->is_shared || dev->ipc_config.proc_domain == COMP_PROCESSING_DOMAIN_DP) &&
dev->task) {
schedule_task_free(dev->task);
rfree(dev->task);
dev->task = NULL;
}
dev->drv->ops.free(dev);
}
free and reset are both called from IPC thread, so it looks safe. Anyway, I can change it to safer:
volatile task * _task = dev->task;
dev->task = NULL;
schedule_task_free(_task);
rfree(_task);
EDIT:
is it safer? Can't the compiler change order of lines in optimalization? Is it needed?
Free is called when a module is really being destroyed, calling "reset" during or after free operation would be fatal anyway
There was a problem hiding this comment.
SOF is relying on IPC serialisation. So if both these functions are only called from within IPCs, they cannot race.
There was a problem hiding this comment.
@marcinszkudlinski @lyakh I was actually thinking of the case where "dev->task" is a dangling pointer. I couldn't immediately find where dev->task is zeroed, so not sure if this check can ensure task is a valid task object.
| @@ -336,6 +336,9 @@ int module_reset(struct processing_module *mod) | |||
| if (md->state < MODULE_IDLE) | |||
There was a problem hiding this comment.
minor: some typoes in commit, "s/posiibility/possibility", "s/sollution/solution/",
|
As a funny coincidence, this old IPC3 issue was just exposed now: |
lyakh
left a comment
There was a problem hiding this comment.
I see that errors in thread handling are just moved over from another location, but let's use this opportunity to fix them too
| return 0; | ||
| #endif | ||
| /* cancel task if DP task*/ | ||
| if (mod->dev->ipc_config.proc_domain == COMP_PROCESSING_DOMAIN_DP && mod->dev->task) |
There was a problem hiding this comment.
SOF is relying on IPC serialisation. So if both these functions are only called from within IPCs, they cannot race.
| schedule_task_cancel(&dp_sch->ll_tick_src); | ||
|
|
||
| /* if the task is waiting on a semaphore - let it run and self-terminate */ | ||
| k_sem_give(&pdata->sem); |
There was a problem hiding this comment.
if the task is a lower priority thread, however, this won't switch to running that other task immediately
There was a problem hiding this comment.
it won't at this point, it will when k_thread_join() is called
src/schedule/zephyr_dp_schedule.c
Outdated
| err: | ||
| /* cleanup - unlock and free all allocated resources */ | ||
| scheduler_dp_unlock(lock_key); | ||
| if (thread_id) |
There was a problem hiding this comment.
this check will always return true
There was a problem hiding this comment.
yes, but only because the first possible jump to err is after thread is initialized. I would leave this "overprotection" as is - cost is just a few bytes of code.
There was a problem hiding this comment.
@marcinszkudlinski but before it's initialised it's undefined, so this check would be meaningless too and the compiler would warn you about that.
c7a07ab to
5cfcb4e
Compare
kv2019i
left a comment
There was a problem hiding this comment.
Fine to me now, thanks.
|
changing to [DNM], waiting for logs from stress test with DP AEC |
Will tag for v2.10 so we can get this fix. |
| #endif | ||
| /* cancel task if DP task*/ | ||
| if (mod->dev->ipc_config.proc_domain == COMP_PROCESSING_DOMAIN_DP && mod->dev->task) | ||
| schedule_task_cancel(mod->dev->task); |
There was a problem hiding this comment.
do I understand it correctly, that previously DP tasks were terminated from comp_free() by calling schedule_task_free()? Now scheduler_dp_task_cancel() will (potentially) be called twice - from here and from schedule_task_free(). Should be safe for now, at least as long as list_del() also initialises the list item.
There was a problem hiding this comment.
Yes., it is called twice, nobody guarantees that cancel had been called before free.
as long as list_del() also initialises the list item
it does:
static inline void list_item_del(struct list_item *item)
{
item->next->prev = item->prev;
item->prev->next = item->next;
list_init(item);
}
There was a problem hiding this comment.
@marcinszkudlinski yes, sorry, that's exactly what I meant: as long as list_init() is there, then calling it twice is ok, but if we once decide to remove it since technically it isn't needed there, then this (and probably a couple of other places) will break
|
please go ahead with merging |
|
One more review needed to be able to merge. |
|
@wszypelt @marcinszkudlinski Can you take a look at the quickbuild build fail for this PR. This is fixing an issue that showing as quickbuild failure for other PRs frequently. |
@marcinszkudlinski any update, can we merged for v2.10 or still WIP ? |
|
checking CI, please wait |
5cfcb4e to
750b119
Compare
|
@marcinszkudlinski jenkins looks OK, looks like a long queue on internal CI ? |
|
@lgirdwood results from internal CI should be available within 40 minutes |
|
another problem in CI, fix in progress... |
There's a race posiibility in DP when stopping a pipeline: - dp starts processing - incoming IPC - pause the pipeline. IPC has higher priority than DP, so DP is preempted - pipeline is stopping, module "reset" is called. Some of resources may be freed here - when IPC finishes, DP thread continues processing Sollution: wait for DP to finish processing and terminate DP thread before calling "reset" method in module To do this: 1) call "thread cancel" before calling "reset"reset 2) modify "thread cancel" to mark the thread to terminate and execute k_thread_join() 3) terminated thread cannot be restarted, so thread creation must be moved from "init" to "schedule". There's no need to reallocate memory zephyr guarantees that resources may be re-used when a thread is terminated. Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
750b119 to
92b2e53
Compare
|
fix - crash if a task had been deleted before was scheduled. |
@marcinszkudlinski ok, do you mean now you are happy with stress testing and we are good to go ? |
|
@lgirdwood I believe so, pls go ahead with merging |
There's a race posiibility in DP when stopping a pipeline:
than DP, so DP is preempted
may be freed here
Sollution: wait for DP to finish processing and terminate DP thread
before calling "reset" method in module
To do this:
execute k_thread_join()
moved from "init" to "schedule". There's no need to reallocate memory
zephyr guarantees that resources may be re-used when a thread
is terminated.