
[Windows] Closing connections properly on Windows#185

Merged
Tim-Zhang merged 2 commits into containerd:master from jsturtevant:fix-windows-closing
May 9, 2023

Conversation

@jsturtevant
Collaborator

The current closing logic was causing a deadlock because an unused pipe was being returned when close was called on the listener. This sets a flag so we can detect that we should not use that pipe connection, which wasn't actually connected to a client but was returned from the async call because we told it to return early.

more info

I was using the skeleton example to test rust-extensions and found that there was a race condition when shutting the shim down.

# start shim
cargo run --example skeleton -- -namespace default -id 1234 -address "\\.\pipe\containerd-containerd"  -publish-binary ./bin/containerd start

# send disconnect
cargo run --example shim-proto-connect \\.\pipe\containerd-shim-17630016127144989388-pipe
Connecting to \\.\pipe\containerd-shim-17630016127144989388-pipe...
Sending `Connect` request...
Connect response: ConnectResponse { shim_pid: 0, task_pid: 0, version: "example", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }
Sending `Shutdown` request...
Shutdown response: Empty { special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }

# shim is still running
ps *skeleton*

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
     71       5      812       4396       0.00   5932   1 skeleton

Logs (with added logging) show that there are two (?!) clients but the code only connects once.

 Get-Content log -Tail 200 -Wait
[INFO] server listen started
[INFO] server started
[INFO] Shim successfully started, waiting for exit signal...
[INFO] Got new client
[INFO] thread start for pipe 168 ThreadId(8)
[INFO] entered while loop 168
[INFO] thread start for pipe 168 ThreadId(9)
[INFO] entered while loop 168
[INFO] thread start for pipe 168 ThreadId(10)
[INFO] entered while loop 168
[INFO] waiting for control call from handler threads
[INFO] thread start for pipe 168 ThreadId(11)
[INFO] entered while loop 168
[INFO] thread start for pipe 168 ThreadId(13)
[INFO] entered while loop 168
[INFO] waiting for control call from handler threads
[INFO] Connect request
[INFO] thread start for pipe 168 ThreadId(12)
[INFO] entered while loop 168
[INFO] entered while loop 168
[INFO] thread exit for connection 168 ThreadId(8)
[INFO] Shutdown request
[INFO] Shutting down shim instance
[INFO] close monitor
[INFO] entered while loop 168
[ERROR] listener accept got Custom { kind: Other, error: "listener shutdown for quit flag" }
[INFO] ttrpc server listener stopped
[INFO] Got new client
[INFO] listener thread stopped
[INFO] begin to shutdown connection
[INFO] shutdown pip 168
[INFO] closing pipe 168
[INFO] shutdown pip 280
[INFO] waiting for control call from handler threads
[INFO] thread start for pipe 280 ThreadId(17)
[INFO] thread start for pipe 280 ThreadId(16)
[INFO] thread exit for connection 280 ThreadId(16)
[INFO] closing pipe 280
[INFO] thread exit for connection 280 ThreadId(17)
[INFO] thread exit for connection 168 ThreadId(10)
[INFO] quitting client thread.
[INFO] thread start for pipe 280 ThreadId(18)
[INFO] thread exit for connection 280 ThreadId(18)
[INFO] thread start for pipe 280 ThreadId(19)
[INFO] thread exit for connection 280 ThreadId(19)
[INFO] thread exit for connection 168 ThreadId(13)
[INFO] thread exit for connection 168 ThreadId(12)
[INFO] thread exit for connection 168 ThreadId(9)
[INFO] connections closed
[INFO] thread exit for connection 168 ThreadId(11)
[INFO] reaper recieved 168
[INFO] closing pipe 168
[INFO] closing pipe 168

You can see there are two (!?) connections being handled. This is because accept was returning an invalid handle during shutdown, causing the race condition.
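The flag-based fix can be sketched in a simplified, platform-neutral form (`PipeListener`, `PipeInstance`, and this `accept` signature are illustrative, not the crate's actual API): once the blocking connect call returns, accept consults the flag set by close() and discards the pipe instance instead of handing it to a handler thread.

```rust
use std::io::{Error, ErrorKind, Result};
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-in for a connected named-pipe instance.
struct PipeInstance(u32);

struct PipeListener {
    shutting_down: AtomicBool,
}

impl PipeListener {
    // After the overlapped connect returns, check the flag set by
    // close(). If it is set, the pipe we got back was released only
    // because close() signalled the event, not because a client
    // connected, so it must be discarded rather than served.
    fn accept(&self, pipe: PipeInstance) -> Result<PipeInstance> {
        if self.shutting_down.load(Ordering::SeqCst) {
            return Err(Error::new(
                ErrorKind::Other,
                "listener shutdown for quit flag",
            ));
        }
        Ok(pipe)
    }
}
```

With this check, the error path matches the "listener shutdown for quit flag" message in the logs, and no second phantom client is created.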

With this new logic we can see there is only ever one client (268):

[INFO] Shim successfully started, waiting for exit signal...
[INFO] listening...
[INFO] Got new client: 268
[INFO] waiting for control call from handler threads
[INFO] start handler for connection id 268 on thread ThreadId(8)
[INFO] entered while loop 268
[INFO] Connect request
[INFO] start handler for connection id 268 on thread ThreadId(9)
[INFO] entered while loop 268
[INFO] start handler for connection id 268 on thread ThreadId(15)
[INFO] entered while loop 268
[INFO] entered while loop 268
[INFO] waiting for control call from handler threads
[INFO] start handler for connection id 268 on thread ThreadId(13)
[INFO] entered while loop 268
[INFO] start handler for connection id 268 on thread ThreadId(12)
[INFO] entered while loop 268
[INFO] start handler for connection id 268 on thread ThreadId(10)
[INFO] entered while loop 268
[INFO] start handler for connection id 268 on thread ThreadId(11)
[INFO] entered while loop 268
[INFO] exit handler for connection id 268 on thread ThreadId(11)
[INFO] start handler for connection id 268 on thread ThreadId(14)
[INFO] entered while loop 268
[INFO] exit handler for connection id 268 on thread ThreadId(14)
[INFO] start handler for connection id 268 on thread ThreadId(16)
[INFO] entered while loop 268
[INFO] exit handler for connection id 268 on thread ThreadId(16)
[INFO] Shutdown request
[INFO] Shutting down shim instance
[INFO] close monitor
[INFO] entered while loop 268
[ERROR] listener accept got Custom { kind: Other, error: "listener shutdown for quit flag" }
[INFO] ttrpc server listener stopped
[INFO] exit handler for connection id 268 on thread ThreadId(9)
[INFO] listener thread stopped
[INFO] begin to shutdown connection
[INFO] connections closed
[INFO] exit handler for connection id 268 on thread ThreadId(15)
[INFO] quitting client thread.
[INFO] exit handler for connection id 268 on thread ThreadId(8)
[INFO] exit handler for connection id 268 on thread ThreadId(13)
[INFO] exit handler for connection id 268 on thread ThreadId(12)
[INFO] exit handler for connection id 268 on thread ThreadId(10)
[INFO] reaper recieved message for 268
[INFO] reaper thread exited
[INFO] reaper thread stopped
[INFO] server listen started
[INFO] server started
[INFO] Shim successfully started, waiting for exit signal...
[INFO] Connect request
[INFO] Shutdown request
[INFO] Shutting down shim instance
[INFO] close monitor
[ERROR] listener accept got Custom { kind: Other, error: "closing pipe" }
[INFO] ttrpc server listener stopped
[INFO] listener thread stopped
[INFO] begin to shutdown connection
[INFO] connections closed
[INFO] reaper thread exited
[INFO] reaper thread stopped

@codecov

codecov bot commented Apr 25, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (4661bd1) 24.39% compared to head (c9f67f5) 24.39%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #185   +/-   ##
=======================================
  Coverage   24.39%   24.39%           
=======================================
  Files          17       17           
  Lines        2529     2529           
=======================================
  Hits          617      617           
  Misses       1912     1912           
Impacted Files Coverage Δ
src/error.rs 55.55% <0.00%> (ø)


@jsturtevant jsturtevant changed the title Closing connections properly on Windows [Windows] Closing connections properly on Windows Apr 25, 2023
The current closing logic was causing a deadlock because an unused pipe was being returned when close was called on the listener. This sets a flag so we can detect that we should not use that pipe connection (which wasn't actually connected to a client but was returned from the async call because we told it to return early).

Signed-off-by: James Sturtevant <jstur@microsoft.com>
@jsturtevant jsturtevant force-pushed the fix-windows-closing branch 3 times, most recently from 053d478 to 9d251f9 Compare May 1, 2023 16:23
Collaborator

@quanweiZhou quanweiZhou left a comment


Thanks @jsturtevant, a few comments.


pub fn close(&self) -> Result<()> {
// release the ConnectNamedPipe thread by signaling the event and clean up event handle
self.shutting_down.store(true, Ordering::SeqCst);
Collaborator


I think when shutting_down == true, close() should return early and not call set_event() and close_handle.

Collaborator Author


We need both of these. set_event is what signals the thread that is suspended by GetOverlappedResult, and close_handle ensures that we don't leak the handle now that we've shut down. I might be able to move that to a Drop impl for the PipeListener, since I guess we could call accept again later after closing? I would need to think about that a bit more.
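That interplay can be modeled without the Win32 calls. A minimal sketch, assuming a Condvar stands in for the event handle that SetEvent signals (this `Listener` and its fields are hypothetical, not the real listener's layout):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Simplified model of the close()/accept() handshake. The Condvar
// plays the role of the Win32 event the real listener signals with
// SetEvent to release a thread suspended in GetOverlappedResult.
struct Listener {
    shutting_down: AtomicBool,
    event_fired: Mutex<bool>,
    event: Condvar,
}

impl Listener {
    fn close(&self) {
        // Set the flag first, so the accept() we are about to wake
        // knows its wake-up is not a real client connection...
        self.shutting_down.store(true, Ordering::SeqCst);
        // ...then signal the "event" (SetEvent in the real code) to
        // release the blocked connect call. Without this, the accept
        // thread would stay suspended and shutdown would deadlock.
        *self.event_fired.lock().unwrap() = true;
        self.event.notify_all();
        // The real close() also closes the event handle here
        // (CloseHandle) so it is not leaked after shutdown.
    }

    // Blocks until the event fires (a client connected, or close()
    // signalled it), then uses the flag to tell the two cases apart.
    fn accept(&self) -> Result<(), &'static str> {
        let mut fired = self.event_fired.lock().unwrap();
        while !*fired {
            fired = self.event.wait(fired).unwrap();
        }
        if self.shutting_down.load(Ordering::SeqCst) {
            return Err("listener shutdown for quit flag");
        }
        Ok(())
    }
}
```

Calling close() from one thread releases an accept() blocked on another, and the woken accept() reports shutdown instead of handing out an unconnected pipe, matching the "listener shutdown for quit flag" error in the logs above.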

The current closing logic was causing a deadlock because an unused pipe was being returned when close was called on the listener. This sets a flag so we can detect that we should not use that pipe connection, which wasn't actually connected to a client but was returned from the async call because we told it to return early.

Signed-off-by: James Sturtevant <jstur@microsoft.com>
@Tim-Zhang Tim-Zhang merged commit 891a155 into containerd:master May 9, 2023
KarstenB pushed a commit to KarstenB/ttrpc-rust that referenced this pull request May 1, 2025
[Windows] Closing connections properly on Windows
