win-sshproxy.tid created before thread id is available #433

Merged
openshift-merge-bot[bot] merged 1 commit into containers:main from lstocchi:i432
Nov 29, 2024

@lstocchi
Collaborator

This commit fixes a potential race condition that prevented the tests from succeeding when running in a GitHub workflow.
The thread ID was not yet available when it was written to the file, so a thread ID of 0 was written instead. As a result, when the tests tried to read the thread ID in order to send the WM_QUIT message, they failed.

This patch adds a check on the thread ID before writing it to the file. If the thread ID is 0, it keeps calling winquit to retrieve it. If this does not succeed within 10 seconds, it returns an error.

It resolves #432

Collaborator

@cfergeau cfergeau left a comment

Looking at the winquit code, calling NotifyOnQuit is supposed to guarantee that GetCurrentMessageLoopThreadId returns a non-zero value. However, the thread ID is set when NotifyOnQuit calls messageLoop(), and this call is made in a goroutine, so NotifyOnQuit can return before the goroutine runs and initializes the thread ID.

Some comments/suggestions, but I'm fine with the PR as is if you prefer to keep it this way.

ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

for {
Collaborator

maybe you could reuse this helper

func retry[T comparable](ctx context.Context, retryFunc func() (T, error), retryMsg string) (T, error) {
	var (
		returnVal T
		err       error
	)
	backoff := initialBackoff
loop:
	for i := 0; i < maxRetries; i++ {
		select {
		case <-ctx.Done():
			break loop
		default:
			// proceed
		}
		returnVal, err = retryFunc()
		if err == nil {
			return returnVal, nil
		}
		logrus.Debugf("%s (%s)", retryMsg, backoff)
		sleep(ctx, backoff)
		backoff = backOff(backoff)
	}
	return returnVal, fmt.Errorf("timeout: %w", err)
}

Collaborator Author

Updated. I moved the Retry func into a utils package so it can be reused.

defer file.Close()
tid := winquit.GetCurrentMessageLoopThreadId()

tid, err := getThreadId()
Collaborator

This will add a slight delay during win-sshproxy startup. Do you expect this delay to be problematic in typical use?

Collaborator Author

IMO no, but I have limited knowledge of how it is used. Locally, and for the things I do, I didn't even notice it.
It might be noticeable on a low-resource machine, but it is better to slow startup down a bit and be sure everything works, no?

Collaborator

I expect it won't be noticeable. However, if it were noticeable, it would affect the startup time of podman machine start, which can be problematic.
Since the thread ID is only needed when one wants to stop the podman machine VM, an alternative would be to do the waiting and the writing of the thread ID in a goroutine to avoid blocking.
However, podman would need to be ready for that and retry reading the file if it's missing, which is not the case at the moment.

With all that said, the current approach should be good enough for now.

@cfergeau
Collaborator

@n1hility fwiw, a small race in win-ssh-proxy/winquit.

This commit fixes a potential race condition that prevented the tests from
succeeding when running in a GitHub workflow.
The thread ID was not yet available when it was written to the file,
so a thread ID of 0 was written instead.
As a result, when the tests tried to read the thread ID in order to send
the WM_QUIT message, they failed.

This patch adds a check on the thread ID before writing
it to the file. If the thread ID is 0, it keeps calling winquit to
retrieve it. If this does not succeed within 10 seconds, it returns an error.

Signed-off-by: lstocchi <lstocchi@redhat.com>
@cfergeau
Collaborator

I've created containers/winquit#2 for the underlying winquit issue.

Collaborator

@cfergeau cfergeau left a comment

/lgtm
/approve

Thank you so much for making CI green!

@openshift-ci
Contributor

openshift-ci bot commented Nov 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfergeau, lstocchi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit fe2d5d2 into containers:main Nov 29, 2024
@lstocchi lstocchi deleted the i432 branch November 29, 2024 11:41
Development

Successfully merging this pull request may close these issues.

Fix failing win-sshproxy tests

2 participants