
Change content returned by warehouse raw data REST endpoint to help clients catch silent errors. #2058

Merged
aaronweeden merged 16 commits into ubccr:xdmod11.0 from aaronweeden:fix-raw-data-11.0
Jul 3, 2025

aaronweeden (Contributor) commented Jul 1, 2025

Description

This PR changes the /rest/warehouse/raw-data endpoint to do proper streaming using chunked transfer encoding wherein each row is sent as:

<hex size of row>\r\n<row>\r\n

and after all rows have been sent:

0\r\n\r\n

This allows clients to verify that all the rows were sent by checking for the final 0\r\n\r\n.
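The framing described above can be illustrated with a minimal Python sketch. It is not the code from this PR or from xdmod-data; `encode_rows` and `decode_chunked` are hypothetical helper names, rows are treated as opaque bytes, and chunk extensions and trailers are ignored. HTTP client libraries normally handle this framing themselves, so the sketch only shows why the final `0\r\n\r\n` makes truncation detectable.

```python
def encode_rows(rows):
    """Frame each row as one chunk, then append the terminating chunk."""
    body = b""
    for row in rows:
        # <hex size of row>\r\n<row>\r\n
        body += b"%x\r\n" % len(row) + row + b"\r\n"
    # Zero-size chunk marks the end of the response.
    return body + b"0\r\n\r\n"


def decode_chunked(body):
    """Return (rows, complete).

    complete is True only if the final zero-size chunk was seen;
    rows holds every row that was fully received before that point.
    """
    rows = []
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            return rows, False  # stream ended before a complete size line
        try:
            size = int(body[pos:eol], 16)
        except ValueError:
            return rows, False  # size line is empty or not hexadecimal
        if size == 0:
            # The terminator is the size line "0" followed by \r\n\r\n.
            return rows, body[eol:eol + 4] == b"\r\n\r\n"
        start = eol + 2
        end = start + size
        if body[end:end + 2] != b"\r\n":
            return rows, False  # row was cut off mid-chunk
        rows.append(body[start:end])
        pos = end + 2
```

For example, decoding a body whose last five bytes (`0\r\n\r\n`) are missing still yields every complete row, but `complete` comes back `False`, which is the signal a client can use to warn the user.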

While developing this PR, it was also noticed that some of the functions involved in getting the raw data can be made static since they don't manipulate instance variables; this PR updates those.

This PR also fixes a bug when generating raw data regression test artifacts.

This PR also makes it so the raw data regression tests always sort the output before comparing it since the output is nondeterministic.
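As a rough illustration of an order-insensitive comparison, here is a sketch in Python. The regression tests in this repository are not written this way; `rows_match` is a hypothetical name, and rows are assumed to be JSON-serializable lists.

```python
import json


def rows_match(actual_rows, expected_rows):
    """Compare two lists of rows without regard to row order."""
    def key(row):
        # Serialize each row so rows with mixed types (e.g. None and int)
        # can still be ordered consistently.
        return json.dumps(row, sort_keys=True)

    return sorted(actual_rows, key=key) == sorted(expected_rows, key=key)
```

Sorting both sides, rather than converting to sets, keeps duplicate rows significant: an output with a row repeated will not match an expected artifact that contains it only once.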

The CI tests for this PR depend on ubccr/xdmod-qa#41.

ubccr/xdmod-supremm#431 updates the regression test artifacts for xdmod-supremm.

ubccr/xdmod-data#73 updates xdmod-data to support the new response and warn if the 0\r\n\r\n was not received.

Motivation and Context

The current endpoint (as implemented in #1858) sends data as a JSON text sequence, but it does not send any special chunk once all the rows have been sent. Thus, the client has no way of knowing if the connection was closed before all the data were sent (because the response is streaming, the status code 200 is sent prior to any data being sent, and the size of the content is not calculated ahead of time). In tandem with the fact that ACCESS XDMoD currently has a 30-minute limit on script execution time, this led to bugs in which xdmod-data could request raw data (e.g., 2024-01-01 through 2024-03-31 in the SUPREMM realm), but only some of the rows would be returned, the script would time out, and no errors or warnings would appear on the client side.
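The silent failure can be reproduced in miniature. The Python sketch below assumes RFC 7464 framing for the JSON text sequence (each record starts with the record separator byte `0x1E` and ends with a line feed); the row contents and the `parse_json_seq` helper are made up for illustration and are not taken from XDMoD or xdmod-data.

```python
import json

RS = b"\x1e"  # record separator that starts each record in RFC 7464


def parse_json_seq(body):
    """Parse a JSON text sequence and return the decoded records."""
    return [json.loads(part) for part in body.split(RS) if part.strip()]


# Three hypothetical rows as the server would send them.
full = (
    RS + b'["job1", 10]\n'
    + RS + b'["job2", 20]\n'
    + RS + b'["job3", 30]\n'
)

# Simulate the connection closing right after the second row.
cut = full[: full.index(RS + b'["job3"')]

complete_rows = parse_json_seq(full)
partial_rows = parse_json_seq(cut)
```

Both calls succeed without raising: `partial_rows` simply holds two rows instead of three, and nothing in the stream itself tells the client that a third row was expected.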

Tests performed

This PR and ubccr/xdmod-data#73 update the regression tests.

Checklist:

  • The pull request description is suitable for a Changelog entry
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request

aaronweeden merged commit 929d50f into ubccr:xdmod11.0 on Jul 3, 2025
4 checks passed
aaronweeden deleted the fix-raw-data-11.0 branch on July 3, 2025 18:46
aaronweeden added a commit to aaronweeden/xdmod that referenced this pull request Jul 3, 2025

Labels

  • bug (Bugfixes)
  • Category: Data Analytics Framework
  • php (Pull requests that update Php code)
