Skip to content

Rewrote post analytics CSV export endpoint to stream results#27752

Merged
cmraible merged 27 commits into
mainfrom
chris-onc-1566-post-analytics-export-not-working-for-tangle
May 19, 2026
Merged

Rewrote post analytics CSV export endpoint to stream results#27752
cmraible merged 27 commits into
mainfrom
chris-onc-1566-post-analytics-export-not-working-for-tangle

Conversation

@cmraible

@cmraible cmraible commented May 7, 2026

Copy link
Copy Markdown
Collaborator

fixes https://linear.app/tryghost/issue/ONC-1566/

I recommend reviewing this PR one commit at a time.

Rewrites Ghost's post analytics CSV export to stream results to the client instead of buffering the entire CSV in memory before sending. This should fix the export endpoint timing out for sites with large post catalogs before returning even a single byte.

I've tested this manually with a large dataset, and the resulting CSV export is byte-for-byte identical with the current behavior on main. Even if this doesn't fix the timeouts, it is a step in the right direction to stream the CSV rather than buffering it all in memory.

@coderabbitai

coderabbitai Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

This PR converts the post analytics CSV export from a synchronous, single-page fetch to a streaming, paginated approach. A new createCSVTransform() Node.js stream transforms objects to CSV incrementally. The PostsExporter is refactored to use an async generator that pages through posts with stable ordering, extracting export context once and mapping each batch before yielding. The API serializer wires the exporter and transform together via stream.pipeline, setting appropriate response headers and forwarding errors. Stable ordering support is added to findPage to ensure deterministic pagination across batches. All existing tests are updated to work with the streaming API, and new tests validate transform behavior, pagination stability, and error handling.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The description clearly explains the purpose of the changes (streaming CSV export instead of buffering), references the related issue, notes testing verification, and provides context for reviewers.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title 'Rewrote post analytics CSV export endpoint to stream results' clearly and concisely describes the main change across the PR: converting the export endpoint from buffering the entire CSV in memory to streaming results to the client.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chris-onc-1566-post-analytics-export-not-working-for-tangle

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cmraible cmraible changed the title 🐛 Fixed post analytics export for large datasets 🐛 Fixed post analytics export for large datasets timing out May 7, 2026
@cmraible cmraible force-pushed the chris-onc-1566-post-analytics-export-not-working-for-tangle branch from 520d822 to 71e0faa Compare May 7, 2026 23:52
@cmraible cmraible marked this pull request as ready for review May 8, 2026 00:17

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (5)
ghost/core/test/unit/server/models/post.test.js (1)

188-215: ⚡ Quick win

Add a companion test for the “already stable” order case.

Please add a test where order already contains id (for example published_at desc, id desc) with stableOrder: true, and assert no duplicate id ordering is appended. This protects the hasStableOrderField branch against regressions.

Suggested test shape
+        it('does not append duplicate stable order field when id is already present', function () {
+            const queries = [];
+            tracker.install();
+
+            tracker.on('query', (query) => {
+                queries.push(query);
+                query.response([]);
+            });
+
+            return models.Post.findPage({
+                filter: 'published_at:>\'2015-07-20\'',
+                order: 'published_at desc, id desc',
+                limit: 1,
+                skipPagination: true,
+                stableOrder: true
+            }).then(() => {
+                assert.equal(queries.length, 1);
+                assert.equal(
+                    queries[0].sql,
+                    'select `posts`.* from `posts` where (`posts`.`published_at` > ? and (`posts`.`type` = ? and `posts`.`status` = ?)) order by `posts`.`published_at` DESC, `posts`.`id` DESC limit ?'
+                );
+            });
+        });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/test/unit/server/models/post.test.js` around lines 188 - 215, Add
a new unit test next to the existing "adds a stable order field..." test that
calls models.Post.findPage with stableOrder: true but with an order string that
already includes the id field (e.g. order: 'published_at desc, id desc');
install the tracker and capture the generated SQL and bindings, then assert that
the generated query does not append a duplicate `posts.id` ordering (SQL should
contain only one `posts.id` DESC and the bindings remain correct) and that
result.meta is still {}—this will exercise the hasStableOrderField branch in the
Post.findPage logic.
ghost/core/test/unit/server/services/posts/posts-exporter.test.js (2)

385-443: 💤 Low value

Default- and invalid-limit fallback coverage looks good.

Both tests correctly assert the 15-row default and the 'invalid' → default fallback. Minor optional cleanup: the three new pagination tests duplicate the Array.from({length: N}, ...) post-factory and the models.Post = { findPage: ... } mock; extracting a small buildPosts(n) / mockPaginatedPostModel(posts) helper inside this describe would shrink ~60 lines and make future additions cheaper. Not blocking.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/test/unit/server/services/posts/posts-exporter.test.js` around
lines 385 - 443, Extract the duplicated post factory and paginated model mock
into two helpers (e.g., buildPosts(count) that returns the Array.from(...) posts
using the existing post and email createModel shape, and
mockPaginatedPostModel(posts) that assigns models.Post = { findPage: async
(options) => { ... } } using the same pagination logic) and replace the repeated
blocks in the tests that call buildPosts(20) and mockPaginatedPostModel(posts)
before invoking exporter.export and collectStream; keep the helpers scoped
inside the describe so tests still reference post, exporter.export and
collectStream unchanged.

326-383: 💤 Low value

Snapshot test only exercises the “stable order requested” path.

The mock unconditionally appends id desc whenever options.stableOrder is set, and the exporter always passes stableOrder: true, so hasStableOrder is true on every call and the deliberately-swapped page-2 branch (Lines 351‑358) is never executed. The test still meaningfully snapshots the requested orders and confirms no duplicates/missing IDs, but it does not actually demonstrate that without stableOrder the export would be unstable. Consider adding a sibling case that calls into findPage without stableOrder (or removes the id desc injection) so the unstable branch is actually exercised — that would document why the option is required.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/test/unit/server/services/posts/posts-exporter.test.js` around
lines 326 - 383, The mock for models.Post.findPage always injects "id desc" when
options.stableOrder is true, and exporter.export currently always passes
stableOrder: true so the swapped-page (unstable) branch is never exercised; add
a sibling assertion that exercises the unstable path by calling the exporter (or
invoking findPage) without stableOrder: e.g. run a second collectStream(await
exporter.export(...)) call or temporarily override models.Post.findPage to
accept/see stableOrder: false so the hasStableOrder check is false and the
page-2 swap branch executes, then snapshot the requestedOrders/exportedIds for
that run to prove instability when stableOrder is not requested.
ghost/core/core/server/services/posts/posts-exporter.js (2)

19-35: 💤 Low value

parseExportLimit silently rewrites 0, negatives, and partially-numeric strings.

A few edge cases worth being aware of (not necessarily fixing):

  • parseExportLimit(0)1 and parseExportLimit(-5)1 because of Math.max(parsed, 1). A caller passing 0 thinking it means “unbounded” will silently get a single row.
  • Number.parseInt('15abc', 10) returns 15, so partially-numeric strings are accepted as their leading-numeric prefix rather than falling back to the default.

If the API contract is "be lenient, never error", the current behaviour is fine and consistent with the new tests. If you'd rather surface bad input, validating with Number.isFinite(Number(limit)) (or letting the input validator upstream reject it) would be more explicit. Flagging as optional.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/server/services/posts/posts-exporter.js` around lines 19 -
35, parseExportLimit currently accepts partially-numeric strings and uses
parseInt plus Math.max(parsed, 1), which silently coalesces inputs like "15abc",
0, and negatives; change parseExportLimit to validate strictly with
Number.isFinite(Number(limit)) and convert using Number(limit) (instead of
Number.parseInt) so "15abc" is rejected, then decide on the clamping behavior:
keep Math.max(parsed, 1) to preserve the current minimum-of-1 behavior or switch
to Math.max(parsed, 0) if you want to allow 0 as a valid value; reference
function parseExportLimit and DEFAULT_EXPORT_LIMIT when making this change.

104-137: 💤 Low value

One extra empty-page fetch when total rows are an exact multiple of pageLimit.

When requestedLimit === null (i.e. limit: 'all') and the underlying dataset size is an exact multiple of pageLimit, the loop won't break on the last non-empty page (since posts.data.length < pageLimit is false), so it issues one additional findPage call that returns an empty array before exiting via the data.length === 0 check on Line 119. Not a correctness bug, just one wasted DB roundtrip on the boundary. If you want to avoid it, you can track the previous page size or break when mapped.length is exactly the requested remainder. Otherwise, leave as-is — the cost is negligible and the code reads cleanly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/server/services/posts/posts-exporter.js` around lines 104 -
137, The loop can do one extra DB fetch when total rows == exact multiple of
pageLimit; fix by using the pagination metadata returned by findPage to detect
the last page and avoid the extra fetch: in `#streamPosts`, after receiving posts
from this.#models.Post.findPage, check posts.meta?.pagination?.pages (or
equivalent meta.pages) and if the current page >= that total pages, break the
while loop instead of relying solely on posts.data.length < pageLimit; this uses
the existing posts result (from `#streamPosts` and posts.data) to stop iterating
without an extra roundtrip.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@ghost/core/core/server/services/posts/posts-exporter.js`:
- Around line 19-35: parseExportLimit currently accepts partially-numeric
strings and uses parseInt plus Math.max(parsed, 1), which silently coalesces
inputs like "15abc", 0, and negatives; change parseExportLimit to validate
strictly with Number.isFinite(Number(limit)) and convert using Number(limit)
(instead of Number.parseInt) so "15abc" is rejected, then decide on the clamping
behavior: keep Math.max(parsed, 1) to preserve the current minimum-of-1 behavior
or switch to Math.max(parsed, 0) if you want to allow 0 as a valid value;
reference function parseExportLimit and DEFAULT_EXPORT_LIMIT when making this
change.
- Around line 104-137: The loop can do one extra DB fetch when total rows ==
exact multiple of pageLimit; fix by using the pagination metadata returned by
findPage to detect the last page and avoid the extra fetch: in `#streamPosts`,
after receiving posts from this.#models.Post.findPage, check
posts.meta?.pagination?.pages (or equivalent meta.pages) and if the current page
>= that total pages, break the while loop instead of relying solely on
posts.data.length < pageLimit; this uses the existing posts result (from
`#streamPosts` and posts.data) to stop iterating without an extra roundtrip.

In `@ghost/core/test/unit/server/models/post.test.js`:
- Around line 188-215: Add a new unit test next to the existing "adds a stable
order field..." test that calls models.Post.findPage with stableOrder: true but
with an order string that already includes the id field (e.g. order:
'published_at desc, id desc'); install the tracker and capture the generated SQL
and bindings, then assert that the generated query does not append a duplicate
`posts.id` ordering (SQL should contain only one `posts.id` DESC and the
bindings remain correct) and that result.meta is still {}—this will exercise the
hasStableOrderField branch in the Post.findPage logic.

In `@ghost/core/test/unit/server/services/posts/posts-exporter.test.js`:
- Around line 385-443: Extract the duplicated post factory and paginated model
mock into two helpers (e.g., buildPosts(count) that returns the Array.from(...)
posts using the existing post and email createModel shape, and
mockPaginatedPostModel(posts) that assigns models.Post = { findPage: async
(options) => { ... } } using the same pagination logic) and replace the repeated
blocks in the tests that call buildPosts(20) and mockPaginatedPostModel(posts)
before invoking exporter.export and collectStream; keep the helpers scoped
inside the describe so tests still reference post, exporter.export and
collectStream unchanged.
- Around line 326-383: The mock for models.Post.findPage always injects "id
desc" when options.stableOrder is true, and exporter.export currently always
passes stableOrder: true so the swapped-page (unstable) branch is never
exercised; add a sibling assertion that exercises the unstable path by calling
the exporter (or invoking findPage) without stableOrder: e.g. run a second
collectStream(await exporter.export(...)) call or temporarily override
models.Post.findPage to accept/see stableOrder: false so the hasStableOrder
check is false and the page-2 swap branch executes, then snapshot the
requestedOrders/exportedIds for that run to prove instability when stableOrder
is not requested.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 655f6412-f954-4b17-8245-c6d929773fd5

📥 Commits

Reviewing files that changed from the base of the PR and between c5066aa and 71e0faa.

⛔ Files ignored due to path filters (3)
  • ghost/core/test/e2e-api/admin/__snapshots__/post-analytics-export.test.js.snap is excluded by !**/*.snap
  • ghost/core/test/e2e-api/admin/__snapshots__/posts.test.js.snap is excluded by !**/*.snap
  • ghost/core/test/unit/server/services/posts/__snapshots__/posts-exporter.test.js.snap is excluded by !**/*.snap
📒 Files selected for processing (11)
  • ghost/core/core/server/api/endpoints/posts.js
  • ghost/core/core/server/api/endpoints/utils/serializers/output/posts-csv-transform.ts
  • ghost/core/core/server/api/endpoints/utils/serializers/output/posts.js
  • ghost/core/core/server/models/base/plugins/crud.js
  • ghost/core/core/server/models/base/plugins/sanitize.js
  • ghost/core/core/server/services/posts/posts-exporter.js
  • ghost/core/test/e2e-api/admin/post-analytics-export.test.js
  • ghost/core/test/e2e-api/admin/posts.test.js
  • ghost/core/test/unit/api/canary/utils/serializers/output/posts-export-csv.test.ts
  • ghost/core/test/unit/server/models/post.test.js
  • ghost/core/test/unit/server/services/posts/posts-exporter.test.js
💤 Files with no reviewable changes (1)
  • ghost/core/test/e2e-api/admin/posts.test.js

@cmraible cmraible requested a review from EvanHahn May 8, 2026 00:26
@cmraible cmraible changed the title 🐛 Fixed post analytics export for large datasets timing out Rewrote post analytics CSV export endpoint to stream results May 8, 2026
EvanHahn added a commit that referenced this pull request May 11, 2026
ref #27752

_This change should have no user impact._

We use `papaparse` in a few places. Let's install types for it. Luckily,
there were no type errors!
EvanHahn added a commit that referenced this pull request May 11, 2026
ref #27752

_This change should have no user impact._

We use `papaparse` in a few places. Let's install types for it. Luckily,
there were no type errors!

@EvanHahn EvanHahn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was great to review because

  • It was split up into easy-to-review chunks.
  • Streaming always makes me happy.
  • I learned a few things (got more practice with transform streams, learned about the performance of yield* versus yield, the *# syntax for a private generator).

Lots of little nitpicks, but looks good overall.

Comment thread ghost/core/core/server/services/posts/posts-exporter.js Outdated
Comment thread ghost/core/test/unit/server/services/posts/posts-exporter.test.js Outdated
Comment thread ghost/core/test/unit/server/services/posts/posts-exporter.test.js Outdated
Comment thread ghost/core/core/server/models/base/plugins/crud.js Outdated
Comment thread ghost/core/core/server/models/base/plugins/crud.js Outdated
@cmraible

Copy link
Copy Markdown
Collaborator Author

@EvanHahn thanks for the thorough review, lots of good stuff here! Planning to address these items tomorrow 👍

@cmraible cmraible force-pushed the chris-onc-1566-post-analytics-export-not-working-for-tangle branch from 71e0faa to ffef8fc Compare May 14, 2026 19:57
@EvanHahn

Copy link
Copy Markdown
Contributor

@cmraible Lemme know when you want me to take another look. (I think you're still working on this.)

@cmraible cmraible requested a review from EvanHahn May 14, 2026 23:21
@cmraible

cmraible commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

@EvanHahn thanks! I'm pretty sure I've addressed everything; ready for another look whenever you have time!

@EvanHahn EvanHahn left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few small nits/questions. Thanks for responding to all my comments!

Comment thread ghost/core/core/server/models/base/plugins/sanitize.js Outdated
Comment thread ghost/core/core/server/services/posts/posts-exporter.js
cmraible added 13 commits May 19, 2026 09:32
Introduces createCSVTransform(), an object-mode Transform stream that
emits header + first row on the first chunk and data-only rows
afterwards. Lets later commits stream post analytics directly to the
HTTP response instead of buffering the whole CSV in memory.
Marks the exportCSV endpoint as a stream so the framework defers body
serialization to the controller, and rewrites the serializer to pipe
models.data through the new CSV transform straight to res with the
appropriate download headers and pipeline error forwarding.

Note: this commit alone leaves the export endpoint non-functional. The
serializer now expects models.data to be a Readable, but the exporter
still returns a buffered array — the streaming exporter follows in the
next commit. The split is for review clarity; these will be squashed
before merging.
Pure refactor in preparation for streaming. Lifts EXPORT_WITH_RELATED
to a module constant, extracts the per-export setup work into
#getExportContext (with the three findAll calls parallelized via
Promise.all), and extracts the per-row mapping into #mapPosts.

No behavior change — export() still returns an array of mapped post
objects. All existing tests pass unchanged.
PostsExporter.export() now returns a Readable that yields posts
batch-by-batch via findPage with skipPagination, instead of loading
every post into memory at once. Adds a parseExportLimit helper so
'all' means unlimited and bad limits fall back to the default of 15.

Existing exporter tests are wrapped with collectStream(); new tests
cover multi-batch streaming and default-limit handling.
Adds an opt-in stableOrder option to findPage that appends
`<table>.id desc` as a tiebreaker when the requested order does not
already include id. The streaming exporter passes stableOrder: true so
that custom-ordered exports (e.g. 'published_at desc') don't duplicate
or skip rows when paginating across batches.

Covered by a model-level findPage test and an exporter-level snapshot
test that simulates an unstable tie order between pages.
Streamed responses no longer have content-length or a body-derived
etag, gain no-transform on Cache-Control (so proxies don't buffer or
recompress), and drop Accept-Encoding from Vary. Snapshots and the
header matchers in posts.test.js / post-analytics-export.test.js are
updated to match.
We're using the default value here, so we don't need to include this
option explicitly
cmraible added 14 commits May 19, 2026 09:32
One for the CSVTransform itself, one for the serializer
Replaced a local stream collector with the platform async iterable API so these tests no longer need a bespoke helper.

Kept this as a separate test-only cleanup from the exporter behavior commits.
Removed a leftover permitted option from the model sanitizer that is no longer needed after moving the ordering logic into the posts exporter.
Cleaned up the posts CSV transform after review because the explicit newline option was only set for subsequent rows and was not needed.
@cmraible cmraible force-pushed the chris-onc-1566-post-analytics-export-not-working-for-tangle branch from 3aa4dc1 to a87eaf8 Compare May 19, 2026 16:32
@cmraible cmraible merged commit 0be9f54 into main May 19, 2026
122 of 127 checks passed
@cmraible cmraible deleted the chris-onc-1566-post-analytics-export-not-working-for-tangle branch May 19, 2026 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants