Skip to content

Conversation

@smithdh
Copy link
Contributor

@smithdh smithdh commented Aug 17, 2023

This change was mostly extracted from a larger pull request that included other features, #1957. The current change aims to be a minimal refactor in preparation for other work in the same area of the code.

( The followup work should be a fix for #1976 )

// - 0: Perform a stat on the resource
// - 1: Perform a checksum request on the resource (only if requested in header; otherwise skipped)
// - 2: Perform an open request (dirlist as appropriate).
// - 3+: Reads from file; if at end, perform a close.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose reads are a lot more common than stats, checksums, or open requests, so maybe it's worth having that as if (reqstate >= 3) do_read(); else switch(reqstate) ...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it eases the reading of the code, understanding it's a state machine?

@smithdh
Copy link
Contributor Author

smithdh commented Sep 5, 2023

@amadio Thanks for looking at this. I'll put it back to draft now, and will update soon with a couple of commits to also address #1976 and #2076.

@smithdh smithdh marked this pull request as draft September 5, 2023 08:17
@smithdh smithdh changed the title [XrdHttp] Refactoring of the request statemachine for GET [XrdHttp] Refactoring of the GET request and fix issues 1976, 2076 Sep 5, 2023
@amadio amadio requested review from abh3 and ccaffy September 5, 2023 13:07
XrdServer
XrdUtils
XrdCrypto
XrdCl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concert here is this new dependency on XrdCl (due to XrdCl::ChunkList). I will wait for the other reviewers to confirm this is ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that dependency, but maybe @abh3 is not? Otherwise, we can put this typedef into a common library used by XrdCl and XrdHttp, for example XrdUtils?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the dependency is only for the ChunkInfo structure (and ChunkList, which is a vector of the former) as it had occurred to me reuse an exiting type of object, and on the client side this is used for almost exactly the purpose it is used here. I was going to use a utility function from XrdClUtils too, but it didn't do exactly what was needed. so there's really not much needed from XrdCl. maybe an alternative might be to use another structure from the server or other class, or just invent an own type somewhere in XrdHttp. (This PR already introduces a type XrdHttpReadRangeHandler::UserRange and UserRangeList for similar purpose).

Copy link
Contributor

@ccaffy ccaffy Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's just a typedef, so you may add this typedef somewhere in XrdHttp --> I don't think it will cause any harm ;)

EDIT: Indeed, not just a typedef, I misread the code...

Thanks for all your work!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely, the best long term solution is to move XrdClChunklist to the utility package and use it both in the client and the http server part. What do you think? I know it's more work, sigh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked around and saw he have XrdOucIOVec, which is used in several places server/plugin side. It's quite similar to XrdCl::ChunkInfo. So I updated this PR to use that (in fact XrdOucIOVec2, as that has a useful constructor added). The missing piece was a typedef for a container of those (to correspond to ChunkList); so I added a typedef for a std::vector in XrdHttpUtils.

We could move ChunkInfo/ChunkList to the utility package, and possibly try moving some uses of XrdOucIOVec to that, or simply kept both. But I thought perhaps it might be better to leave that out of this PR, given it seems quite alright to use XrdOucIOVec now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David! I am perfectly fine with what you did :)

}
}

TEST(XrdHttpTests, xrdHttpReadRangeHandlerTwoRangesOfSizeEqualToMaxChunkSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding many more tests!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cedric started adding those, and I added a few more. these are all tests of the range parsing and chunking logic. Your & Brian's comments on #2076 seems good idea too for future/enhancement. (i.e. tests from an independent client like curl towards a running XrdHttp server).

@smithdh
Copy link
Contributor Author

smithdh commented Sep 5, 2023

As a summary; range header parsing, chunking read requests for read/readv is refactored but still following similar method:

(i) For GET requests with no range header, kXR_read is issued. Previously this was in 1MiB ranges, now changed to 2MiB, issuing as many as needed to cover the file.

(ii) For requests with 1 range, kXR_read was also used to read the range, so still similar to (i)

(iii) For requests with >1 range, kXR_readv was used, adding in the user's range requests as elements in a read list, and breaking up individual elements if they are too large. Previously only 1 kXR_readv could be issued, with a hard limit of 1024 on the number of elements, and 128KiB on their size. Now ranges are broken up in a similar way to before, but with adjusted limits, 1023 elements and 2MiB-16 on the chunk size. If the resulting list has more than one element kXR_readv is used, kXR_read otherwise. There's no limit on the number of kXR_readv that can be issued. The limit on the number of ranges should be that implied by the header parsing, (e.g. if the http header becomes too big because it's full of ranges it won't be processed). I tested with 10000 (numerically short ranges to keep the line length low), and a smaller number of large ranges.

There's scope to do something more involved, e.g. in (iii). But I wasn't sure this would give an advantage, so I stopped with the above. If we'd like to change I can adjust.

Copy link
Member

@abh3 abh3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment on adding an XrdCl dependency here. As we are not creating a circular dependency, it is technically correct. However, I did suggest to add a bit more effort to avoid it.

@smithdh smithdh force-pushed the xrdhttpd_rangereqs branch 3 times, most recently from c454244 to 0e201f7 Compare September 11, 2023 09:51
@smithdh
Copy link
Contributor Author

smithdh commented Sep 12, 2023

I had some worries about leaving the maximum readv chunk size and number of chunks at their limits; especially as the kXR_readv maximum element size supported can depend on the server buffer size configuration. (The limit can be discovered with a server query for "readv_ior_max"; but I'm not making that query). So I backed off the size limits used to 512 elements/ 512KiB max element size. And set the overall maximum request size issued (per kXR_read or kXR_readv) to 8MiB.

In tests I'd noticed that http headers longer than 16KiB become unreliable (e.g. a header with a huge list of ranges "Range: bytes=aaa-bbb,ccc-ddd,..." might pass that limit). When larger than 16KiB it depends on whether the read from the network receives the http header bytes in one go. The request might be parsed and succeed, or XrdHttpProtocol can consider it invalid and close the connection. If that's a problem I suppose we open another ticket and think about what should be done.

I'll move this PR out of draft now.

@smithdh smithdh marked this pull request as ready for review September 12, 2023 07:17
@abh3
Copy link
Member

abh3 commented Sep 12, 2023

I have two questions: 1) Why aren't you making the query so that you can use he actual limit values, and 2) if this is because the query is too complicated to process, would it help if the server simply set an envar with the actual limits so that it would be trivial to use the limits. Note that the limits are static, once established it is not changed for the duration of the server's lifetime.

As far as invalid headers, this seems to be a common problem in HTTP. Why not set the header limit to one byte less than the buffer size. So, one reads a header longer than the effective limit (i.e. not the hard limit) reject the request. This prevents partially processing a header that would otherwise be invalid.

@smithdh
Copy link
Contributor Author

smithdh commented Sep 12, 2023

Hi Andy. Thanks for looking into this. I can quickly try to answer you first questions: (1) it was a complication concern, but really nothing should prevent doing it; we'd have to send the query via the bridge, but I assume this would work. Then the XrdHttp part could keep it cached, as you mentioned it wouldn't change during the process lifetime. (2) Yes, that would be easier I think. But if you'd like to recommend the extra completeness of the query I will try to rework it in. (unless there's something unforeseen discovered)

@abh3
Copy link
Member

abh3 commented Sep 12, 2023

You are correct, it's just too complicated to use the bridge to just get two numbers. It's trivial to set the envar and trivial for you to pick it up and use the defaults you propose if it isn't set. So, let's call it XRD_READV_LIMITS and it will have two ints. The first will be the readv_ior_max value and the second will be the readv_iov_max value. OK?

@smithdh
Copy link
Contributor Author

smithdh commented Sep 12, 2023

Hi. Thanks, that sounds good, definitely easier. So the content of the unix environment variable string will be decimal with - space(?) - separator, e.g. "2097136 1024" ?

@ccaffy
Copy link
Contributor

ccaffy commented Sep 13, 2023

@smithdh if that can help, we pass configured checksum from the XRootD server to the XrdHttp layer via an environment variable that contains a string which is a comma-separated list of index:cksumName (https://github.com/xrootd/xrootd/blob/master/src/XrdXrootd/XrdXrootdConfig.cc#L338). Like 0:adler32,1:sha512. So I believe a comma-separated list is good enough? Space also works anyway! I believe @abh 's point is that the two integers should be in the same environment variable that you parse :)

@smithdh
Copy link
Contributor Author

smithdh commented Sep 13, 2023

hi @ccaffy ; thanks, comma separated sounds find. I didn't check if we already had a convention on that, makes sense to keep using the same. i'll adapt the pr.

@abh3
Copy link
Member

abh3 commented Sep 14, 2023

I just pushed the change to set the XRD_READV_LIMITS envar to a comma separated list of readv_ior_max readv_iov_max. I also made sure it's used consistently throughout the server.

@smithdh smithdh changed the title [XrdHttp] Refactoring of the GET request and fix issues 1976, 2076 [XrdHttp] Refactoring the read issuing for GET and fix issues 1976, 2076 Sep 14, 2023
smithdh and others added 3 commits September 14, 2023 09:47
This change was mostly extracted from a larger pull request that
included other features, xrootd#1957. The current change aims to be
a minimal refactor in preparation for other work in the same area
of the code.

Co-authored-by: Brian Bockelman <bbockelman@morgridge.org>
…ng (xrootd#1976)

Co-authored-by: Cedric Caffy <cedric.caffy@cern.ch>


Also suppress Content-Length header for transfer-encoding chunked response.
@smithdh
Copy link
Contributor Author

smithdh commented Sep 14, 2023

Thank you Andy. I rebased this PR (and fixed a bug) and tested that I pick up the values you put in the XRD_READV_LIMITS ok.

About the invalid headers; I think this is connected to #1304; where a change was added that was to limit the acceptable length of a single http header line to 16KB. But I saw that sometimes it will accept longer header lines and sometimes not. I bet your suggestion would work - but I didn't check carefully, with the circular buffer, so I'll leave out attempting to fix that from this PR.

Copy link
Member

@abh3 abh3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @amadio will do the final approval and merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

XrdHttp: "TE" header + "X-Transfer-Status" + byte-range read results in bad transfer data

4 participants