Conversation

@smithdh
Contributor

@smithdh smithdh commented Sep 9, 2025

No description provided.

@smithdh smithdh changed the title from "[Xrootd] Copy Response with correct stream ID to XrdXrootdAioTask" to "[Xrootd] Copy Response with correct stream ID to XrdXrootdAioTask xrootd#2592" Sep 9, 2025
@smithdh
Contributor Author

smithdh commented Sep 9, 2025

This is a fix for #2592. The issue appeared to me to be an incorrect stream ID placed in responses sent on bound (i.e. sub) Links when the aio system is being used. (This happens for xcache, but not for a regular xrootd server, when the client is trying to use sub-streams.)

I think the problem is that a read/pgread command is read by the XrdXrootdProtocol object for the main (login) Link, and that Protocol has the correct stream ID set in its Response (the one the client set in the read/pgread command). The protocol object for the link of the corresponding return pathID is found ('pP'), and an XrdXrootdAioTask is allocated. But the AioTask is prepared with the stream ID in the protocol object for the bound link, which at this point may not be correct. The result is that the response is eventually sent on the correct Link, but possibly with the wrong streamId; the client therefore cannot match the response to the request, discards it, and keeps waiting for a response.
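
To make the mismatch described above concrete, here is a minimal, self-contained sketch. It is an illustration only, not the code in this PR: `StreamId`, `Response`, `Protocol`, `AioTask`, `makeTaskBuggy`, `makeTaskFixed`, `clientMatches` and `simulate` are all hypothetical stand-ins and do not reflect the real XrdXrootd class layouts. The only thing assumed from the protocol is that a stream ID is two bytes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; NOT the real XRootD classes.
using StreamId = std::array<std::uint8_t, 2>;  // a stream ID is two bytes

struct Response { StreamId sid{}; };      // carries the stream ID echoed to the client
struct Protocol { Response response; };   // one protocol object per link
struct AioTask  { Response response; };   // response used when the async read completes

// Behaviour as described in the comment above: the task inherits whatever
// stream ID the bound link's protocol object happens to hold.
AioTask makeTaskBuggy(const Protocol& /*mainP*/, const Protocol& boundP) {
    return AioTask{boundP.response};
}

// Assumed shape of the fix: copy the Response (and so the stream ID) from
// the protocol object that actually read the request.
AioTask makeTaskFixed(const Protocol& mainP, const Protocol& /*boundP*/) {
    return AioTask{mainP.response};
}

// The client can pair a response with its request only if the IDs agree.
bool clientMatches(const AioTask& task, const StreamId& requested) {
    return task.response.sid == requested;
}

// Runs one request through either variant and reports whether the client
// would be able to match the response.
bool simulate(bool fixed) {
    const StreamId requested{0x00, 0x07};         // ID the client put in the read request
    Protocol mainP;
    mainP.response.sid = requested;               // main link parsed the request
    Protocol boundP;
    boundP.response.sid = StreamId{0x00, 0x03};   // stale ID left on the bound link
    assert(mainP.response.sid != boundP.response.sid);  // scenario precondition

    const AioTask task = fixed ? makeTaskFixed(mainP, boundP)
                               : makeTaskBuggy(mainP, boundP);
    return clientMatches(task, requested);
}
```

In this sketch `simulate(false)` returns `false` (the client would discard the response and keep waiting), while `simulate(true)` returns `true`.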

@smithdh smithdh marked this pull request as ready for review September 9, 2025 14:10
@amadio amadio linked an issue Sep 9, 2025 that may be closed by this pull request
@abh3 abh3 self-requested a review September 9, 2025 23:18
Member

@abh3 abh3 left a comment

Good call David. From a historical perspective, the uncorrected code should have produced the same hang for a server as it does for Xcache. There is one big difference here. Regular servers no longer enable themselves for async I/O since the Linux implementation of disk async I/O is rather useless. However, Xcache servers do enable it because they necessarily bypass the Linux implementation and do the expected async operations. This happened in 5.3.0 (2021-07-09). So, something between then and now must have broken the original design, but we never noticed it because the feature was turned off for servers and, apparently, no one does multi-stream copies via xcache (though Atlas does do single-stream copies to scratch).

@smithdh
Contributor Author

smithdh commented Sep 10, 2025

Thanks for having a look @abh3

@amadio amadio added this to the 5.9.0 milestone Sep 10, 2025
@amadio amadio merged commit ef26ab3 into xrootd:master Sep 10, 2025
11 checks passed
Development

Successfully merging this pull request may close these issues.

xrdcp from an xcache does not work with streams option

3 participants