Change content returned by warehouse raw data REST endpoint to help clients catch silent errors. #2058
Merged
aaronweeden merged 16 commits into ubccr:xdmod11.0 on Jul 3, 2025
ryanrath approved these changes on Jul 3, 2025
aaronweeden added a commit to aaronweeden/xdmod that referenced this pull request on Jul 3, 2025: …lients catch silent errors. (ubccr#2058)
Description
This PR changes the `/rest/warehouse/raw-data` endpoint to do proper streaming using chunked transfer encoding, wherein each row is sent as its own chunk and, after all rows have been sent, the terminating `0\r\n\r\n` is sent.
This allows clients to verify that all the rows were sent by checking for the final `0\r\n\r\n`.
While developing this PR, it was also noticed that some of the functions involved in getting the raw data can be made static since they don't manipulate instance variables; this PR updates those.
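The framing and the completeness check can be sketched as follows. This is a minimal illustration of standard HTTP/1.1 chunked transfer encoding (hex size, CRLF, data, CRLF, then a zero-size last chunk), not the code in this PR or in `xdmod-data`. The names `encode_chunk` and `decode_chunked` are hypothetical, the row payloads are placeholder JSON arrays rather than the endpoint's actual row encoding, and trailers are assumed to be absent.

```python
import json

CRLF = b"\r\n"
TERMINATOR = b"0\r\n\r\n"  # zero-size last chunk followed by the closing CRLF


def encode_chunk(payload: bytes) -> bytes:
    """Frame one payload as a chunk: hex size, CRLF, data, CRLF."""
    return format(len(payload), "x").encode("ascii") + CRLF + payload + CRLF


def decode_chunked(body: bytes):
    """Return (payloads, complete).

    complete is True only if the terminating zero-size chunk was seen,
    i.e. the sender finished rather than being cut off mid-stream.
    """
    payloads = []
    pos = 0
    while True:
        line_end = body.find(CRLF, pos)
        if line_end == -1:
            return payloads, False  # stream ended before the next size line
        # Ignore any chunk extensions after ';' (none are expected here).
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        try:
            size = int(size_field, 16)
        except ValueError:
            return payloads, False  # malformed size line
        data_start = line_end + len(CRLF)
        if size == 0:
            # Last chunk: with no trailers, the closing CRLF must follow.
            return payloads, body[data_start:data_start + 2] == CRLF
        data_end = data_start + size
        if body[data_end:data_end + 2] != CRLF:
            return payloads, False  # chunk data was truncated
        payloads.append(body[data_start:data_end])
        pos = data_end + len(CRLF)


# Placeholder rows (assumption: the real row format may differ).
rows = [json.dumps(r).encode("utf-8") for r in (["a", 1], ["b", 2])]
full_body = b"".join(encode_chunk(r) for r in rows) + TERMINATOR
cut_body = full_body[: -len(TERMINATOR)]  # simulates a dropped connection

print(decode_chunked(full_body))
print(decode_chunked(cut_body))
```

Note that many HTTP client libraries strip the chunk framing before handing data to the caller, so how a real client observes the final `0\r\n\r\n` depends on the library in use.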
This PR also fixes a bug when generating raw data regression test artifacts.
This PR also makes it so the raw data regression tests always sort the output before comparing it since the output is nondeterministic.
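The idea of sorting before comparing can be sketched like this. It is only an illustration of an order-insensitive comparison, assuming each row has already been serialized to a string; `rows_match` is a hypothetical helper, not a function from the XDMoD test suite.

```python
def rows_match(actual_rows, expected_rows):
    """Compare two lists of serialized rows, ignoring their order.

    Sorting both sides keeps duplicates significant, so a missing or
    extra copy of a row still causes a mismatch.
    """
    return sorted(actual_rows) == sorted(expected_rows)


# The same rows in a different order compare equal.
print(rows_match(['["b", 2]', '["a", 1]'], ['["a", 1]', '["b", 2]']))
```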
The CI tests for this PR depend on ubccr/xdmod-qa#41.
ubccr/xdmod-supremm#431 updates the regression test artifacts for `xdmod-supremm`.
ubccr/xdmod-data#73 updates `xdmod-data` to support the new response and warn if the `0\r\n\r\n` was not received.
Motivation and Context
The current endpoint (as implemented in #1858) sends data as a JSON text sequence, but it does not send any special chunk once all the rows have been sent. Thus, the client has no way of knowing if the connection was closed before all the data were sent (because the response is streaming, the status code 200 is sent prior to any data being sent, and the size of the content is not calculated ahead of time). In tandem with the fact that ACCESS XDMoD currently has a 30-minute limit on script execution time, this led to bugs in which `xdmod-data` could request raw data (e.g., `2024-01-01` through `2024-03-31` in the `SUPREMM` realm), but only some of the rows would be returned, the script would time out, and no errors or warnings would appear on the client side.
Tests performed
This PR and ubccr/xdmod-data#73 update the regression tests.
Checklist: