Rewrote post analytics CSV export endpoint to stream results by cmraible · Pull Request #27752 · TryGhost/Ghost

cmraible · 2026-05-07T00:26:50Z

fixes https://linear.app/tryghost/issue/ONC-1566/

I recommend reviewing this PR one commit at a time.

Rewrites Ghost's post analytics CSV export to stream results to the client instead of buffering the entire CSV in memory before sending. This should fix the export endpoint timing out for sites with large post catalogs before returning even a single byte.

I've tested this manually with a large dataset, and the resulting CSV export is byte-for-byte identical with the current behavior on main. Even if this doesn't fix the timeouts, it is a step in the right direction to stream the CSV rather than buffering it all in memory.

coderabbitai · 2026-05-07T00:27:00Z

Walkthrough

This PR converts the post analytics CSV export from a synchronous, single-page fetch to a streaming, paginated approach. A new createCSVTransform() Node.js stream transforms objects to CSV incrementally. The PostsExporter is refactored to use an async generator that pages through posts with stable ordering, extracting export context once and mapping each batch before yielding. The API serializer wires the exporter and transform together via stream.pipeline, setting appropriate response headers and forwarding errors. Stable ordering support is added to findPage to ensure deterministic pagination across batches. All existing tests are updated to work with the streaming API, and new tests validate transform behavior, pagination stability, and error handling.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description clearly explains the purpose of the changes (streaming CSV export instead of buffering), references the related issue, notes testing verification, and provides context for reviewers.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title 'Rewrote post analytics CSV export endpoint to stream results' clearly and concisely describes the main change across the PR: converting the export endpoint from buffering the entire CSV in memory to streaming results to the client.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chris-onc-1566-post-analytics-export-not-working-for-tangle

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (5)

ghost/core/test/unit/server/models/post.test.js (1)
188-215: ⚡ Quick win

Add a companion test for the “already stable” order case.

Please add a test where order already contains id (for example published_at desc, id desc) with stableOrder: true, and assert no duplicate id ordering is appended. This protects the hasStableOrderField branch against regressions.
Suggested test shape
+        it('does not append duplicate stable order field when id is already present', function () {
+            const queries = [];
+            tracker.install();
+
+            tracker.on('query', (query) => {
+                queries.push(query);
+                query.response([]);
+            });
+
+            return models.Post.findPage({
+                filter: 'published_at:>\'2015-07-20\'',
+                order: 'published_at desc, id desc',
+                limit: 1,
+                skipPagination: true,
+                stableOrder: true
+            }).then(() => {
+                assert.equal(queries.length, 1);
+                assert.equal(
+                    queries[0].sql,
+                    'select `posts`.* from `posts` where (`posts`.`published_at` > ? and (`posts`.`type` = ? and `posts`.`status` = ?)) order by `posts`.`published_at` DESC, `posts`.`id` DESC limit ?'
+                );
+            });
+        });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/test/unit/server/models/post.test.js` around lines 188 - 215, Add
a new unit test next to the existing "adds a stable order field..." test that
calls models.Post.findPage with stableOrder: true but with an order string that
already includes the id field (e.g. order: 'published_at desc, id desc');
install the tracker and capture the generated SQL and bindings, then assert that
the generated query does not append a duplicate `posts.id` ordering (SQL should
contain only one `posts.id` DESC and the bindings remain correct) and that
result.meta is still {}—this will exercise the hasStableOrderField branch in the
Post.findPage logic.
ghost/core/test/unit/server/services/posts/posts-exporter.test.js (2)
385-443: 💤 Low value

Default- and invalid-limit fallback coverage looks good.

Both tests correctly assert the 15-row default and the 'invalid' → default fallback. Minor optional cleanup: the three new pagination tests duplicate the Array.from({length: N}, ...) post-factory and the models.Post = { findPage: ... } mock; extracting a small buildPosts(n) / mockPaginatedPostModel(posts) helper inside this describe would shrink ~60 lines and make future additions cheaper. Not blocking.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/test/unit/server/services/posts/posts-exporter.test.js` around
lines 385 - 443, Extract the duplicated post factory and paginated model mock
into two helpers (e.g., buildPosts(count) that returns the Array.from(...) posts
using the existing post and email createModel shape, and
mockPaginatedPostModel(posts) that assigns models.Post = { findPage: async
(options) => { ... } } using the same pagination logic) and replace the repeated
blocks in the tests that call buildPosts(20) and mockPaginatedPostModel(posts)
before invoking exporter.export and collectStream; keep the helpers scoped
inside the describe so tests still reference post, exporter.export and
collectStream unchanged.
326-383: 💤 Low value

Snapshot test only exercises the “stable order requested” path.

The mock unconditionally appends id desc whenever options.stableOrder is set, and the exporter always passes stableOrder: true, so hasStableOrder is true on every call and the deliberately-swapped page-2 branch (Lines 351‑358) is never executed. The test still meaningfully snapshots the requested orders and confirms no duplicates/missing IDs, but it does not actually demonstrate that without stableOrder the export would be unstable. Consider adding a sibling case that calls into findPage without stableOrder (or removes the id desc injection) so the unstable branch is actually exercised — that would document why the option is required.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/test/unit/server/services/posts/posts-exporter.test.js` around
lines 326 - 383, The mock for models.Post.findPage always injects "id desc" when
options.stableOrder is true, and exporter.export currently always passes
stableOrder: true so the swapped-page (unstable) branch is never exercised; add
a sibling assertion that exercises the unstable path by calling the exporter (or
invoking findPage) without stableOrder: e.g. run a second collectStream(await
exporter.export(...)) call or temporarily override models.Post.findPage to
accept/see stableOrder: false so the hasStableOrder check is false and the
page-2 swap branch executes, then snapshot the requestedOrders/exportedIds for
that run to prove instability when stableOrder is not requested.
ghost/core/core/server/services/posts/posts-exporter.js (2)
19-35: 💤 Low value

parseExportLimit silently rewrites 0, negatives, and partially-numeric strings.

A few edge cases worth being aware of (not necessarily fixing):

parseExportLimit(0) → 1 and parseExportLimit(-5) → 1 because of Math.max(parsed, 1). A caller passing 0 thinking it means “unbounded” will silently get a single row.

Number.parseInt('15abc', 10) returns 15, so partially-numeric strings are accepted as their leading-numeric prefix rather than falling back to the default.

If the API contract is "be lenient, never error", the current behaviour is fine and consistent with the new tests. If you'd rather surface bad input, validating with Number.isFinite(Number(limit)) (or letting the input validator upstream reject it) would be more explicit. Flagging as optional.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/server/services/posts/posts-exporter.js` around lines 19 -
35, parseExportLimit currently accepts partially-numeric strings and uses
parseInt plus Math.max(parsed, 1), which silently coalesces inputs like "15abc",
0, and negatives; change parseExportLimit to validate strictly with
Number.isFinite(Number(limit)) and convert using Number(limit) (instead of
Number.parseInt) so "15abc" is rejected, then decide on the clamping behavior:
keep Math.max(parsed, 1) to preserve the current minimum-of-1 behavior or switch
to Math.max(parsed, 0) if you want to allow 0 as a valid value; reference
function parseExportLimit and DEFAULT_EXPORT_LIMIT when making this change.
104-137: 💤 Low value

One extra empty-page fetch when total rows are an exact multiple of pageLimit.

When requestedLimit === null (i.e. limit: 'all') and the underlying dataset size is an exact multiple of pageLimit, the loop won't break on the last non-empty page (since posts.data.length < pageLimit is false), so it issues one additional findPage call that returns an empty array before exiting via the data.length === 0 check on Line 119. Not a correctness bug, just one wasted DB roundtrip on the boundary. If you want to avoid it, you can track the previous page size or break when mapped.length is exactly the requested remainder. Otherwise, leave as-is — the cost is negligible and the code reads cleanly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ghost/core/core/server/services/posts/posts-exporter.js` around lines 104 -
137, The loop can do one extra DB fetch when total rows == exact multiple of
pageLimit; fix by using the pagination metadata returned by findPage to detect
the last page and avoid the extra fetch: in `#streamPosts`, after receiving posts
from this.#models.Post.findPage, check posts.meta?.pagination?.pages (or
equivalent meta.pages) and if the current page >= that total pages, break the
while loop instead of relying solely on posts.data.length < pageLimit; this uses
the existing posts result (from `#streamPosts` and posts.data) to stop iterating
without an extra roundtrip.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@ghost/core/core/server/services/posts/posts-exporter.js`:
- Around line 19-35: parseExportLimit currently accepts partially-numeric
strings and uses parseInt plus Math.max(parsed, 1), which silently coalesces
inputs like "15abc", 0, and negatives; change parseExportLimit to validate
strictly with Number.isFinite(Number(limit)) and convert using Number(limit)
(instead of Number.parseInt) so "15abc" is rejected, then decide on the clamping
behavior: keep Math.max(parsed, 1) to preserve the current minimum-of-1 behavior
or switch to Math.max(parsed, 0) if you want to allow 0 as a valid value;
reference function parseExportLimit and DEFAULT_EXPORT_LIMIT when making this
change.
- Around line 104-137: The loop can do one extra DB fetch when total rows ==
exact multiple of pageLimit; fix by using the pagination metadata returned by
findPage to detect the last page and avoid the extra fetch: in `#streamPosts`,
after receiving posts from this.#models.Post.findPage, check
posts.meta?.pagination?.pages (or equivalent meta.pages) and if the current page
>= that total pages, break the while loop instead of relying solely on
posts.data.length < pageLimit; this uses the existing posts result (from
`#streamPosts` and posts.data) to stop iterating without an extra roundtrip.

In `@ghost/core/test/unit/server/models/post.test.js`:
- Around line 188-215: Add a new unit test next to the existing "adds a stable
order field..." test that calls models.Post.findPage with stableOrder: true but
with an order string that already includes the id field (e.g. order:
'published_at desc, id desc'); install the tracker and capture the generated SQL
and bindings, then assert that the generated query does not append a duplicate
`posts.id` ordering (SQL should contain only one `posts.id` DESC and the
bindings remain correct) and that result.meta is still {}—this will exercise the
hasStableOrderField branch in the Post.findPage logic.

In `@ghost/core/test/unit/server/services/posts/posts-exporter.test.js`:
- Around line 385-443: Extract the duplicated post factory and paginated model
mock into two helpers (e.g., buildPosts(count) that returns the Array.from(...)
posts using the existing post and email createModel shape, and
mockPaginatedPostModel(posts) that assigns models.Post = { findPage: async
(options) => { ... } } using the same pagination logic) and replace the repeated
blocks in the tests that call buildPosts(20) and mockPaginatedPostModel(posts)
before invoking exporter.export and collectStream; keep the helpers scoped
inside the describe so tests still reference post, exporter.export and
collectStream unchanged.
- Around line 326-383: The mock for models.Post.findPage always injects "id
desc" when options.stableOrder is true, and exporter.export currently always
passes stableOrder: true so the swapped-page (unstable) branch is never
exercised; add a sibling assertion that exercises the unstable path by calling
the exporter (or invoking findPage) without stableOrder: e.g. run a second
collectStream(await exporter.export(...)) call or temporarily override
models.Post.findPage to accept/see stableOrder: false so the hasStableOrder
check is false and the page-2 swap branch executes, then snapshot the
requestedOrders/exportedIds for that run to prove instability when stableOrder
is not requested.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 655f6412-f954-4b17-8245-c6d929773fd5

📥 Commits

Reviewing files that changed from the base of the PR and between c5066aa and 71e0faa.

⛔ Files ignored due to path filters (3)

ghost/core/test/e2e-api/admin/__snapshots__/post-analytics-export.test.js.snap is excluded by !**/*.snap
ghost/core/test/e2e-api/admin/__snapshots__/posts.test.js.snap is excluded by !**/*.snap
ghost/core/test/unit/server/services/posts/__snapshots__/posts-exporter.test.js.snap is excluded by !**/*.snap

📒 Files selected for processing (11)

ghost/core/core/server/api/endpoints/posts.js
ghost/core/core/server/api/endpoints/utils/serializers/output/posts-csv-transform.ts
ghost/core/core/server/api/endpoints/utils/serializers/output/posts.js
ghost/core/core/server/models/base/plugins/crud.js
ghost/core/core/server/models/base/plugins/sanitize.js
ghost/core/core/server/services/posts/posts-exporter.js
ghost/core/test/e2e-api/admin/post-analytics-export.test.js
ghost/core/test/e2e-api/admin/posts.test.js
ghost/core/test/unit/api/canary/utils/serializers/output/posts-export-csv.test.ts
ghost/core/test/unit/server/models/post.test.js
ghost/core/test/unit/server/services/posts/posts-exporter.test.js

💤 Files with no reviewable changes (1)

ghost/core/test/e2e-api/admin/posts.test.js

ref #27752 _This change should have no user impact._ We use `papaparse` in a few places. Let's install types for it. Luckily, there were no type errors!

EvanHahn

This was great to review because

It was split up into easy-to-review chunks.
Streaming always makes me happy.
I learned a few things (got more practice with transform streams, learned about the performance of yield* versus yield, the *# syntax for a private generator).

Lots of little nitpicks, but looks good overall.

cmraible · 2026-05-12T01:58:41Z

@EvanHahn thanks for the thorough review, lots of good stuff here! Planning to address these items tomorrow 👍

EvanHahn · 2026-05-14T23:02:44Z

@cmraible Lemme know when you want me to take another look. (I think you're still working on this.)

cmraible · 2026-05-14T23:22:06Z

@EvanHahn thanks! I'm pretty sure I've addressed everything; ready for another look whenever you have time!

EvanHahn

LGTM with a few small nits/questions. Thanks for responding to all my comments!

Introduces createCSVTransform(), an object-mode Transform stream that emits header + first row on the first chunk and data-only rows afterwards. Lets later commits stream post analytics directly to the HTTP response instead of buffering the whole CSV in memory.

Marks the exportCSV endpoint as a stream so the framework defers body serialization to the controller, and rewrites the serializer to pipe models.data through the new CSV transform straight to res with the appropriate download headers and pipeline error forwarding. Note: this commit alone leaves the export endpoint non-functional. The serializer now expects models.data to be a Readable, but the exporter still returns a buffered array — the streaming exporter follows in the next commit. The split is for review clarity; these will be squashed before merging.

Pure refactor in preparation for streaming. Lifts EXPORT_WITH_RELATED to a module constant, extracts the per-export setup work into #getExportContext (with the three findAll calls parallelized via Promise.all), and extracts the per-row mapping into #mapPosts. No behavior change — export() still returns an array of mapped post objects. All existing tests pass unchanged.

PostsExporter.export() now returns a Readable that yields posts batch-by-batch via findPage with skipPagination, instead of loading every post into memory at once. Adds a parseExportLimit helper so 'all' means unlimited and bad limits fall back to the default of 15. Existing exporter tests are wrapped with collectStream(); new tests cover multi-batch streaming and default-limit handling.

Adds an opt-in stableOrder option to findPage that appends `<table>.id desc` as a tiebreaker when the requested order does not already include id. The streaming exporter passes stableOrder: true so that custom-ordered exports (e.g. 'published_at desc') don't duplicate or skip rows when paginating across batches. Covered by a model-level findPage test and an exporter-level snapshot test that simulates an unstable tie order between pages.

Streamed responses no longer have content-length or a body-derived etag, gain no-transform on Cache-Control (so proxies don't buffer or recompress), and drop Accept-Encoding from Vary. Snapshots and the header matchers in posts.test.js / post-analytics-export.test.js are updated to match.

We're using the default value here, so we don't need to include this option explicitly

…anged `stream` to `node:stream`

…ith `await text(csvTransform)`

One for the CSVTransform itself, one for the serializer

already present

unnecessary null checks

Replaced a local stream collector with the platform async iterable API so these tests no longer need a bespoke helper. Kept this as a separate test-only cleanup from the exporter behavior commits.

…rter directly

Removed a leftover permitted option from the model sanitizer that is no longer needed after moving the ordering logic into the posts exporter.

Cleaned up the posts CSV transform after review because the explicit newline option was only set for subsequent rows and was not needed.

cmraible changed the title ~~🐛 Fixed post analytics export for large datasets~~ 🐛 Fixed post analytics export for large datasets timing out May 7, 2026

cmraible force-pushed the chris-onc-1566-post-analytics-export-not-working-for-tangle branch from 520d822 to 71e0faa Compare May 7, 2026 23:52

cmraible marked this pull request as ready for review May 8, 2026 00:17

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

cmraible requested a review from EvanHahn May 8, 2026 00:26

cmraible changed the title ~~🐛 Fixed post analytics export for large datasets timing out~~ Rewrote post analytics CSV export endpoint to stream results May 8, 2026

EvanHahn mentioned this pull request May 11, 2026

Added types for Papaparse #27810

Merged

EvanHahn added a commit that referenced this pull request May 11, 2026

Added types for Papaparse

1aef65b

ref #27752 _This change should have no user impact._ We use `papaparse` in a few places. Let's install types for it. Luckily, there were no type errors!

EvanHahn added a commit that referenced this pull request May 11, 2026

Added types for Papaparse (#27810)

3f4b9ad

ref #27752 _This change should have no user impact._ We use `papaparse` in a few places. Let's install types for it. Luckily, there were no type errors!

EvanHahn requested changes May 11, 2026

View reviewed changes

cmraible force-pushed the chris-onc-1566-post-analytics-export-not-working-for-tangle branch from 71e0faa to ffef8fc Compare May 14, 2026 19:57

cmraible requested a review from EvanHahn May 14, 2026 23:21

EvanHahn approved these changes May 18, 2026

View reviewed changes

Comment thread ghost/core/core/server/api/endpoints/utils/serializers/output/posts-csv-transform.ts Outdated

Comment thread ghost/core/core/server/models/base/plugins/sanitize.js Outdated

Comment thread ghost/core/core/server/services/posts/posts-exporter.js

cmraible added 13 commits May 19, 2026 09:32

Removed unnecessary comment in transform

64f2f5f

Refactored createCSVTransform: remove isFirstChunk variable

9793380

Reworded createCSVTransform comment; made it a docstring

4c02aed

Removed unnecessary newline option in papaparse.unparse() call

88f8020

We're using the default value here, so we don't need to include this option explicitly

Changed _encoding type from BufferEncoding to unknown

99d2292

Changed papaparse require to an import and removed eslint-disable; ch…

3e19d68

…anged `stream` to `node:stream`

Eliminated cast of err as Error in createCSVTransform

5009287

cmraible added 14 commits May 19, 2026 09:32

Refactored unit tests to eliminate done-style callbacks; simplified w…

7c78e0b

…ith `await text(csvTransform)`

Split posts exporter tests into two separate files

8d2b835

One for the CSVTransform itself, one for the serializer

Added happy path test for posts csv export serializer

ac3c6d9

Renamed "datetime" variable to "todayIsoDate"

3b8bcd9

Fixed cache-control potentially duplicating 'no-transform' directive if

7bf7766

already present

Changed Number.isNaN() to !Number.isSafeInteger()

65a2f02

Shortcutted extra work in #mapPosts

ea5d2a4

Changed null to Infinity when parsing the limit option; removed

79764e0

unnecessary null checks

Swapped for loop for yield* delegation syntax

9d53a26

Changed posts exporter tests to use Array.fromAsync

631653a

Replaced a local stream collector with the platform async iterable API so these tests no longer need a bespoke helper. Kept this as a separate test-only cleanup from the exporter behavior commits.

Changed instanceof assertion to Readable.isReadable(stream)

e6f6c80

Removed stableOrder option from findPage; moved logic into posts expo…

cc3c5ec

…rter directly

Reverted stableOrder sanitizer option

adbe7be

Removed a leftover permitted option from the model sanitizer that is no longer needed after moving the ordering logic into the posts exporter.

Removed redundant CSV newline option

a87eaf8

Cleaned up the posts CSV transform after review because the explicit newline option was only set for subsequent rows and was not needed.

cmraible force-pushed the chris-onc-1566-post-analytics-export-not-working-for-tangle branch from 3aa4dc1 to a87eaf8 Compare May 19, 2026 16:32

cmraible merged commit 0be9f54 into main May 19, 2026
122 of 127 checks passed

cmraible deleted the chris-onc-1566-post-analytics-export-not-working-for-tangle branch May 19, 2026 19:16

This was referenced Jun 2, 2026

Improved analytics CSV exports to include the site name #28308

Merged

Moved CSV export filename ownership to the API #28395

Merged

Uh oh!

Conversation

cmraible commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

EvanHahn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cmraible commented May 12, 2026

Uh oh!

EvanHahn commented May 14, 2026

Uh oh!

cmraible commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

EvanHahn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cmraible commented May 7, 2026 •

edited

Loading

coderabbitai Bot commented May 7, 2026 •

edited

Loading

cmraible commented May 14, 2026 •

edited

Loading