Skip to content

cmd/tui/handler: avoid panic at runtime due to trying to copy into an inner nil slice#799

Merged
roosterfish merged 1 commit intocanonical:mainfrom
edlerd:fix-tui-handler
Jun 4, 2025
Merged

cmd/tui/handler: avoid panic at runtime due to trying to copy into an inner nil slice#799
roosterfish merged 1 commit intocanonical:mainfrom
edlerd:fix-tui-handler

Conversation

@edlerd
Copy link
Contributor

@edlerd edlerd commented May 22, 2025

Done

  • cmd/tui/handler: avoid panic at runtime due to trying to copy into an inner nil slice

see https://github.com/canonical/microcloud/actions/runs/15183238063/job/42697492228 for a failing test run due to panick at runtime

image

@roosterfish
Copy link
Contributor

Thanks for tackling this. I have recorded a few of those errors also in #653.
There already is a section in the code that initializes the slice and I think it's a race on the table.rawRows field.
Sometimes it is using locking but sometimes it doesn't.

@roosterfish
Copy link
Contributor

@edlerd I have created #807 to track the progress. Do you want to look into it? Please also check my comment above.

@edlerd
Copy link
Contributor Author

edlerd commented Jun 3, 2025

@edlerd I have created #807 to track the progress. Do you want to look into it? Please also check my comment above.

Sure I opened this to fix it :)

There already is a section in the code that initializes the slice and I think it's a race on the table.rawRows field.

Looking at the context of the change, I can't see how this would get initialized in another code path. The allRows object is defined within the function, and we need to allocate memory for the items that are copied in. I think this only ever passes without panic in case the passed in i.table.rawRows has no nested arrays. See below for full context of the function.

image

@roosterfish
Copy link
Contributor

Looking at the context of the change, I can't see how this would get initialized

I am talking about the struct's rawRows, not the function local allRows. Almost the same name though :)

There are the the two handleRemoveEvent and handleInsertEvent which both update the struct's rawRows without obtaining the tableMu lock. So I have the suspicion that one of those two funcs is called whilst the getAllRows is called which is causing the panic.

@edlerd
Copy link
Contributor Author

edlerd commented Jun 3, 2025

There are the the two handleRemoveEvent and handleInsertEvent which both update the struct's rawRows without obtaining the tableMu lock. So I have the suspicion that one of those two funcs is called whilst the getAllRows is called which is causing the panic.

Thanks for clarifying, I see what you mean. Both event handlers could mess up the length of the raw rows when changing them under the hood without locking.

@edlerd edlerd force-pushed the fix-tui-handler branch from d946159 to f499f40 Compare June 3, 2025 15:48
@edlerd
Copy link
Contributor Author

edlerd commented Jun 3, 2025

@roosterfish unfortunately we don't have access to the table mutex in the event handlers. But I think there is an easy way out. We only ever interest the number of the table rows, so we can avoid the copy all together and just return the count in our helper function.

@roosterfish
Copy link
Contributor

roosterfish commented Jun 3, 2025

Good catch that we don't even require the data from getAllRows but instead we just need the number.

Now I think we don't even need to obtain the tableMu lock. Also the only other user is the Render func which isn't called concurrently as we only ever render one table at a time.
So we can get rid of tableMu sync.Mutex completely on the handler.

Instead we should have a mutex on the selectableTable whenever we add/delete rows to the table. And then we can have another func on selectableTable which obtains the lock and gets the len of the rows. This func can then be called by countAllRows of the handler.
This way we always have the right information about len.

…ze of the rows and don't need a copy of the table rows. This avoids a kernel panic, suspected due to parallel processing of handleRemoveEvent or handleInsertEvent on the table

Signed-off-by: David Edler <david.edler@canonical.com>
@edlerd edlerd force-pushed the fix-tui-handler branch from f499f40 to aab8227 Compare June 3, 2025 16:58
@edlerd
Copy link
Contributor Author

edlerd commented Jun 3, 2025

Now I think we don't even need to obtain the tableMu lock. Also the only other user is the Render func which isn't called concurrently as we only ever render one table at a time. So we can get rid of tableMu sync.Mutex completely on the handler.

Instead we should have a mutex on the selectableTable whenever we add/delete rows to the table. And then we can have another func on selectableTable which obtains the lock and gets the len of the rows. This func can then be called by countAllRows of the handler. This way we always have the right information about len.

Right, adjusted the PR to the new mutex inside of the selectable table.

Copy link
Contributor

@roosterfish roosterfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@roosterfish roosterfish merged commit 55b2cb6 into canonical:main Jun 4, 2025
25 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants