-
Notifications
You must be signed in to change notification settings - Fork 166
[XrdHttp] Refactoring the read issuing for GET and fix issues 1976, 2076 #2072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // - 0: Perform a stat on the resource | ||
| // - 1: Perform a checksum request on the resource (only if requested in header; otherwise skipped) | ||
| // - 2: Perform an open request (dirlist as appropriate). | ||
| // - 3+: Reads from file; if at end, perform a close. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose reads are a lot more common than stats, checksums, or open requests, so maybe it's worth having that as if (reqstate >= 3) do_read(); else switch(reqstate) ...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it eases the reading of the code, understanding it's a state machine?
5a37d26 to
1c48ca3
Compare
b6e0f5c to
d4c108b
Compare
src/XrdHttp.cmake
Outdated
| XrdServer | ||
| XrdUtils | ||
| XrdCrypto | ||
| XrdCl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concert here is this new dependency on XrdCl (due to XrdCl::ChunkList). I will wait for the other reviewers to confirm this is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that dependency, but maybe @abh3 is not? Otherwise, we can put this typedef into a common library used by XrdCl and XrdHttp, for example XrdUtils?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the dependency is only for the ChunkInfo structure (and ChunkList, which is a vector of the former) as it had occurred to me reuse an exiting type of object, and on the client side this is used for almost exactly the purpose it is used here. I was going to use a utility function from XrdClUtils too, but it didn't do exactly what was needed. so there's really not much needed from XrdCl. maybe an alternative might be to use another structure from the server or other class, or just invent an own type somewhere in XrdHttp. (This PR already introduces a type XrdHttpReadRangeHandler::UserRange and UserRangeList for similar purpose).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's just a typedef, so you may add this typedef somewhere in XrdHttp --> I don't think it will cause any harm ;)
EDIT: Indeed, not just a typedef, I misread the code...
Thanks for all your work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely, the best long term solution is to move XrdClChunklist to the utility package and use it both in the client and the http server part. What do you think? I know it's more work, sigh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked around and saw he have XrdOucIOVec, which is used in several places server/plugin side. It's quite similar to XrdCl::ChunkInfo. So I updated this PR to use that (in fact XrdOucIOVec2, as that has a useful constructor added). The missing piece was a typedef for a container of those (to correspond to ChunkList); so I added a typedef for a std::vector in XrdHttpUtils.
We could move ChunkInfo/ChunkList to the utility package, and possibly try moving some uses of XrdOucIOVec to that, or simply kept both. But I thought perhaps it might be better to leave that out of this PR, given it seems quite alright to use XrdOucIOVec now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks David! I am perfectly fine with what you did :)
| } | ||
| } | ||
|
|
||
| TEST(XrdHttpTests, xrdHttpReadRangeHandlerTwoRangesOfSizeEqualToMaxChunkSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding many more tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cedric started adding those, and I added a few more. these are all tests of the range parsing and chunking logic. Your & Brian's comments on #2076 seems good idea too for future/enhancement. (i.e. tests from an independent client like curl towards a running XrdHttp server).
|
As a summary; range header parsing, chunking read requests for read/readv is refactored but still following similar method: (i) For GET requests with no range header, kXR_read is issued. Previously this was in 1MiB ranges, now changed to 2MiB, issuing as many as needed to cover the file. (ii) For requests with 1 range, kXR_read was also used to read the range, so still similar to (i) (iii) For requests with >1 range, kXR_readv was used, adding in the user's range requests as elements in a read list, and breaking up individual elements if they are too large. Previously only 1 kXR_readv could be issued, with a hard limit of 1024 on the number of elements, and 128KiB on their size. Now ranges are broken up in a similar way to before, but with adjusted limits, 1023 elements and 2MiB-16 on the chunk size. If the resulting list has more than one element kXR_readv is used, kXR_read otherwise. There's no limit on the number of kXR_readv that can be issued. The limit on the number of ranges should be that implied by the header parsing, (e.g. if the http header becomes too big because it's full of ranges it won't be processed). I tested with 10000 (numerically short ranges to keep the line length low), and a smaller number of large ranges. There's scope to do something more involved, e.g. in (iii). But I wasn't sure this would give an advantage, so I stopped with the above. If we'd like to change I can adjust. |
abh3
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment on adding an XrdCl dependency here. As we are not creating a circular dependency, it is technically correct. However, I did suggest to add a bit more effort to avoid it.
c454244 to
0e201f7
Compare
|
I had some worries about leaving the maximum readv chunk size and number of chunks at their limits; especially as the kXR_readv maximum element size supported can depend on the server buffer size configuration. (The limit can be discovered with a server query for "readv_ior_max"; but I'm not making that query). So I backed off the size limits used to 512 elements/ 512KiB max element size. And set the overall maximum request size issued (per kXR_read or kXR_readv) to 8MiB. In tests I'd noticed that http headers longer than 16KiB become unreliable (e.g. a header with a huge list of ranges "Range: bytes=aaa-bbb,ccc-ddd,..." might pass that limit). When larger than 16KiB it depends on whether the read from the network receives the http header bytes in one go. The request might be parsed and succeed, or XrdHttpProtocol can consider it invalid and close the connection. If that's a problem I suppose we open another ticket and think about what should be done. I'll move this PR out of draft now. |
|
I have two questions: 1) Why aren't you making the query so that you can use he actual limit values, and 2) if this is because the query is too complicated to process, would it help if the server simply set an envar with the actual limits so that it would be trivial to use the limits. Note that the limits are static, once established it is not changed for the duration of the server's lifetime. As far as invalid headers, this seems to be a common problem in HTTP. Why not set the header limit to one byte less than the buffer size. So, one reads a header longer than the effective limit (i.e. not the hard limit) reject the request. This prevents partially processing a header that would otherwise be invalid. |
|
Hi Andy. Thanks for looking into this. I can quickly try to answer you first questions: (1) it was a complication concern, but really nothing should prevent doing it; we'd have to send the query via the bridge, but I assume this would work. Then the XrdHttp part could keep it cached, as you mentioned it wouldn't change during the process lifetime. (2) Yes, that would be easier I think. But if you'd like to recommend the extra completeness of the query I will try to rework it in. (unless there's something unforeseen discovered) |
|
You are correct, it's just too complicated to use the bridge to just get two numbers. It's trivial to set the envar and trivial for you to pick it up and use the defaults you propose if it isn't set. So, let's call it XRD_READV_LIMITS and it will have two ints. The first will be the readv_ior_max value and the second will be the readv_iov_max value. OK? |
|
Hi. Thanks, that sounds good, definitely easier. So the content of the unix environment variable string will be decimal with - space(?) - separator, e.g. "2097136 1024" ? |
|
@smithdh if that can help, we pass configured checksum from the XRootD server to the XrdHttp layer via an environment variable that contains a string which is a comma-separated list of index:cksumName (https://github.com/xrootd/xrootd/blob/master/src/XrdXrootd/XrdXrootdConfig.cc#L338). Like |
|
hi @ccaffy ; thanks, comma separated sounds find. I didn't check if we already had a convention on that, makes sense to keep using the same. i'll adapt the pr. |
0e201f7 to
5618228
Compare
|
I just pushed the change to set the XRD_READV_LIMITS envar to a comma separated list of readv_ior_max readv_iov_max. I also made sure it's used consistently throughout the server. |
This change was mostly extracted from a larger pull request that included other features, xrootd#1957. The current change aims to be a minimal refactor in preparation for other work in the same area of the code. Co-authored-by: Brian Bockelman <bbockelman@morgridge.org>
…ng (xrootd#1976) Co-authored-by: Cedric Caffy <cedric.caffy@cern.ch>
5618228 to
6c202f7
Compare
|
Thank you Andy. I rebased this PR (and fixed a bug) and tested that I pick up the values you put in the XRD_READV_LIMITS ok. About the invalid headers; I think this is connected to #1304; where a change was added that was to limit the acceptable length of a single http header line to 16KB. But I saw that sometimes it will accept longer header lines and sometimes not. I bet your suggestion would work - but I didn't check carefully, with the circular buffer, so I'll leave out attempting to fix that from this PR. |
abh3
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. @amadio will do the final approval and merge.
This change was mostly extracted from a larger pull request that included other features, #1957. The current change aims to be a minimal refactor in preparation for other work in the same area of the code.
( The followup work should be a fix for #1976 )