Skip to content

Conversation

@jthiltges
Copy link
Contributor

Leave connection open after redirection if keepalive is enabled

At present, XRootD immediately closes HTTP connections when replying with a redirection, while still indicating keep-alive support in the reply. This is causing occasional problems for some HTTP clients, as they try to immediately reuse the connection, before the connection closure has been processed.

This change resolves the issue in local testing, but a thoughtful review would be appreciated to ensure connection reuse won't add other issues.

Copy link
Member

@abh3 abh3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty reasonable to me.; thanks @jthiltges; @ccaffy do you concurr?

@amadio
Copy link
Member

amadio commented Jul 4, 2024

@abh3 Looks like keeping CentOS 7 CI alive will not be so easy. Should we just drop it? I can check before releasing that everything still works on CentOS 7 and build the packages manually.

@abh3
Copy link
Member

abh3 commented Jul 4, 2024

Looks like there is a compiler mismatch. However, isn't it the case if we restrict ourselves to C++14 it will still work in CentOS17? If that is the case there is a trivial fix for that (I can put it in if we go this way) and hold off C++17 until the end of the year. How's that?

@amadio
Copy link
Member

amadio commented Jul 4, 2024

No, the problem is that nodejs needed in CI is now broken on CentOS 7, take a look at the log of the failed build, it seems there's a problem with the version of libc it expects.

@ccaffy
Copy link
Contributor

ccaffy commented Jul 10, 2024

Hi @jthiltges ,

Thanks for this contribution, for me it looks good as-is, I'll just try few things before approving. With which kind of setup did you try this ?
Also, I'm interested to understand, why would you need to keep the connection alive for the client as the latter is being redirected to a new host to perform the transfer? Does your client internally keeps the connection somewhere and re-use that to issue a brand new transfer ?

Thanks again!

@jthiltges
Copy link
Contributor Author

Hi @ccaffy,

With which kind of setup did you try this?

This PR came from investigating the uproot5 issue 1233 above. It includes a short reproducer.

Our server environment has a redirector and multiple data servers. The uproot client is retrieving files from the redirector via HTTPS. The client makes many range requests in short succession, reusing established connections where possible. Each request starts with the redirector.

Does your client internally keeps the connection somewhere and re-use that to issue a brand new transfer?

That's exactly it. The HTTP requests are handled by aiohttp, which keeps a connection pool.

Changing XRootD to reply to redirects with Connection: close should also resolve the issue, but would add overhead with each redirection needing a new SSL handshake.

Reproducing the uproot/aiohttp issue is a bit tricky due to timing issues. I was only able to reliably reproduce it when connecting from a client in the same datacenter. I'm guessing longer distances/delays allow the TCP stack to catch up and realize the connection is closed before the HTTP client reuses it.

The keep-alive behavior is thankfully easier to see with curl. The client opens the connection, but finds it dead:

$ curl -v https://xrootd-local.unl.edu:1094/store/user/AGC/samplegame/sample[1-2].root
...
* using HTTP/1.x
> GET /store/user/AGC/samplegame/sample1.root HTTP/1.1
> Host: xrootd-local.unl.edu:1094
> User-Agent: curl/8.6.0
> Accept: */*
> 
< HTTP/1.1 302 Redirect
< Connection: Keep-Alive
< Server: XrootD/v5.6.8
< Content-Length: 0
< Location: https://red-xfer6.unl.edu:1094/store/user/AGC/samplegame/sample1.root
< 
* Connection #0 to host xrootd-local.unl.edu left intact
* Found bundle for host: 0x55a79d363b40 [serially]
* Can not multiplex, even if we wanted to
* Connection 0 seems to be dead
* Closing connection

With the patch, curl is able to reuse the connection:

...
* Connection #0 to host xrootd-local.unl.edu left intact
* Found bundle for host: 0x55d7a8a60b40 [serially]
* Can not multiplex, even if we wanted to
* Re-using existing connection with host xrootd-local.unl.edu
> GET /store/user/AGC/samplegame/sample2.root HTTP/1.1

Regards,
John

@ccaffy
Copy link
Contributor

ccaffy commented Jul 11, 2024

Thanks a lot for the nice explanations!
I will test that today.

Copy link
Contributor

@ccaffy ccaffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I could test HTTP/TPC PULL/PUSH and HTTP GET/PUT,HEAD requests, all good!

@jthiltges
Copy link
Contributor Author

It seems that this patch does not correct handle the case where the client does not want (or support) keep-alive: the connection is left open. I'm switching the PR to draft while investigating.

I put this patch into production on our local redirector. It caused one of our HTTP monitoring checks to fail. Nagios check_http (also used by checkmk) uses HTTP 1.0 for requests, which does not support keep-alive. If the server keeps the connection open, the check eventually times out.

Ex.

$ check_http -u /user/ligo/test_access/access_unauth --ssl -e "302 Redirect" --sni -p 1094 129.93.239.164 -v
SSL initialized
GET /user/ligo/test_access/access_unauth HTTP/1.0
User-Agent: check_http/v2.2 (monitoring-plugins 2.2)
Connection: close


CRITICAL - Socket timeout after 10 seconds

@jthiltges jthiltges marked this pull request as draft July 18, 2024 18:35
@jthiltges
Copy link
Contributor Author

I believe I've found the problem. The reset() function clears the keepalive variable:

keepalive = true;

Then the following return keepalive; is always true.

In this PR, I'd followed the pattern used elsewhere. E.g.

reset();
return keepalive ? 1 : -1;

I'd be happy to update the PR to correct the problem. How best to solve it? Save keepalive in a temporary variable before resetting?

bool tmp_keepalive = keepalive;
reset();
return tmp_keepalive;

@abh3
Copy link
Member

abh3 commented Jul 18, 2024

It looks like using a temp variable is the simplest and thus best solution here. Anyway, the compiler will elide the variable.

@ccaffy
Copy link
Contributor

ccaffy commented Jul 19, 2024

Oh very nice catch! OK then please update this PR with that fix and add a comment explaining why you have this temporary variable! Thanks again for this work!

@jthiltges jthiltges marked this pull request as ready for review July 23, 2024 20:05
@jthiltges jthiltges requested review from abh3 and ccaffy July 23, 2024 20:05
@amadio amadio force-pushed the pr/redirkeepalive branch from 5761d2e to 389c797 Compare July 25, 2024 15:05
@amadio
Copy link
Member

amadio commented Jul 25, 2024

@jthiltges Please push at least one tag (e.g. with git fetch --tags && git push --tags jthiltges) so that the CI in your fork can work as well. Thank you!

@jthiltges
Copy link
Contributor Author

@jthiltges Please push at least one tag (e.g. with git fetch --tags && git push --tags jthiltges) so that the CI in your fork can work as well. Thank you!

Thank you. I'm hoping it's sorted out. Closing this PR and reopening to hopefully kick off the CI again.

@jthiltges jthiltges closed this Jul 25, 2024
@jthiltges jthiltges reopened this Jul 25, 2024
Copy link
Member

@amadio amadio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine as is, but I suggested a small improvement. Please consider it. Thank you!

jthiltges and others added 3 commits July 29, 2024 08:31
Leave connection open after redirection if keepalive is enabled
The reset function prepares the object for another request, including
clearing the keepalive variable. Save the keepalive value when it's
needed as a return value.
Co-authored-by: Guilherme Amadio <amadio@cern.ch>
@amadio amadio force-pushed the pr/redirkeepalive branch from dcfb2bf to 8ca30d6 Compare July 29, 2024 06:31
@amadio amadio added this to the 5.7.1 milestone Jul 29, 2024
@amadio amadio merged commit 81f39a6 into xrootd:devel Jul 29, 2024
@jthiltges jthiltges deleted the pr/redirkeepalive branch August 6, 2024 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants