Skip to content

Conversation

@bbockelm
Copy link
Collaborator

@bbockelm bbockelm commented Jun 6, 2025

By default, the bridge will retry operations internally until the maximum delay is hit and then return an error. By default, the maximum delay is 3600 seconds; this is well beyond the timeouts of a typical HTTP client.

This makes the delay configurable for HTTP with the following:

http.maxdelay <seconds>

as in

http.maxdelay 10

It also tunes down the default delay to 30s.

Note: the response time can still be greater than 30s; this just limits how long the bridge will process "wait" responses for.

Copy link
Member

@abh3 abh3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is all about avoiding behaviour changes and using consistent handling of directives. It also winds up being less code.

@bbockelm bbockelm requested a review from abh3 June 18, 2025 18:25
@bbockelm
Copy link
Collaborator Author

@abh3 - I believe all your review comments have been addressed in the latest branch revision and requested a new review in the GitHub interface. Thanks!

Copy link
Member

@abh3 abh3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.

@amadio amadio added this to the 5.9.0 milestone Jun 23, 2025
@bbockelm
Copy link
Collaborator Author

@amadio - question about things tagged 5.9.0. Does that mean you're just waiting to branch 5.9.0 and then will merge them? That things are targeting 5.9.0 but some other work is needed?

I understand devel is for things for 6.x but keeping things that are targeted for 5.9.0 out of development branches makes it harder to track patchsets downstream. Less applicable for this issue but it also makes it difficult for any PRs that would stack on top of each other.

@amadio
Copy link
Member

amadio commented Aug 14, 2025

@amadio - question about things tagged 5.9.0. Does that mean you're just waiting to branch 5.9.0 and then will merge them? That things are targeting 5.9.0 but some other work is needed?

The tags don't say much about the state of the work per se. It works as a reminder for me not to forget to look at it when releasing that particular release, and that the current plan is to include it into that particular release, unless something prevents it from happening (could be because it's not ready yet, or some other reason, like needing to move the release earlier to push out important fixes). In any case, if a change tagged for 5.9 doesn't make, we'd tag it to the next release where we intend to include it.

As for branches, devel holds things that break ABI, and/or features meant for future releases. It's meant to be a staging ground where history can still change by reordering commits with the goal of keeping the history in the master branch linear. However, due to the inconveniences it creates, I think we should just sit down at the workshop and come up with a slightly different model, probably similar to curl's, where we have a structured release cycle with merge windows, etc. Then everything can just come as pull requests to master and I will manage when/where to merge them. The other alternative may be to not have linear history, with branches per minor release, as we had before. We can discuss details in the workshop (I assume you're coming!).

By default, the bridge will retry operations internally until the
maximum delay is hit and then return an error.  By default, the
maximum delay is 3600 seconds; this is well beyond the timeouts of
a typical HTTP client.

This makes the delay configurable for HTTP with the following:

```
http.maxdelay <seconds>
```

as in

```
http.maxdelay 10
```

It also tunes down the default delay to 30s.

Note: the response time can still be greater than 30s; this just limits
how long the bridge will process "wait" responses for.
@amadio amadio merged commit a4d6c4d into xrootd:master Sep 8, 2025
11 checks passed
@smithdh
Copy link
Contributor

smithdh commented Oct 8, 2025

Hi @bbockelm . I happened to also see some not desirable behvior around the wait retry, while testing fix(es) for #2602. We do not cancel or break out of the redrive incase the http client has gone away (the redrive job will wait for the delay period, reissue its XrdXrootdProtocol reqeust, and if the result is still kxr_wait another redrive is scheduled etc.). If there was a situation where say, Open was returning kxr_wait(some small value) for a period of e.g. 5 minutes, and there was a e.g. a population of 100 http clients reqeusting a file with a client-side timeout of 10 seconds with immediate retry, we chould have 3000 redrive jobs scheduled (and network fd's open).

I haven't found a proposal for properly detecting that the http client has gone. We could detect, e.g. POLLRDHUP from poll to indicate client gone, but that seems not a safe assuption, as an http client could have shutdown(fd,SHUT_WR) its connection, but still be reading for the request result on the half open conneciton.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants