Rewrote post analytics CSV export endpoint to stream results#27752
Conversation
WalkthroughThis PR converts the post analytics CSV export from a synchronous, single-page fetch to a streaming, paginated approach. A new 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
520d822 to
71e0faa
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (5)
ghost/core/test/unit/server/models/post.test.js (1)
188-215: ⚡ Quick winAdd a companion test for the “already stable” order case.
Please add a test where
orderalready containsid(for examplepublished_at desc, id desc) withstableOrder: true, and assert no duplicateidordering is appended. This protects thehasStableOrderFieldbranch against regressions.Suggested test shape
+ it('does not append duplicate stable order field when id is already present', function () { + const queries = []; + tracker.install(); + + tracker.on('query', (query) => { + queries.push(query); + query.response([]); + }); + + return models.Post.findPage({ + filter: 'published_at:>\'2015-07-20\'', + order: 'published_at desc, id desc', + limit: 1, + skipPagination: true, + stableOrder: true + }).then(() => { + assert.equal(queries.length, 1); + assert.equal( + queries[0].sql, + 'select `posts`.* from `posts` where (`posts`.`published_at` > ? and (`posts`.`type` = ? and `posts`.`status` = ?)) order by `posts`.`published_at` DESC, `posts`.`id` DESC limit ?' + ); + }); + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/test/unit/server/models/post.test.js` around lines 188 - 215, Add a new unit test next to the existing "adds a stable order field..." test that calls models.Post.findPage with stableOrder: true but with an order string that already includes the id field (e.g. order: 'published_at desc, id desc'); install the tracker and capture the generated SQL and bindings, then assert that the generated query does not append a duplicate `posts.id` ordering (SQL should contain only one `posts.id` DESC and the bindings remain correct) and that result.meta is still {}—this will exercise the hasStableOrderField branch in the Post.findPage logic.ghost/core/test/unit/server/services/posts/posts-exporter.test.js (2)
385-443: 💤 Low valueDefault- and invalid-limit fallback coverage looks good.
Both tests correctly assert the 15-row default and the
'invalid'→ default fallback. Minor optional cleanup: the three new pagination tests duplicate theArray.from({length: N}, ...)post-factory and themodels.Post = { findPage: ... }mock; extracting a smallbuildPosts(n)/mockPaginatedPostModel(posts)helper inside thisdescribewould shrink ~60 lines and make future additions cheaper. Not blocking.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/test/unit/server/services/posts/posts-exporter.test.js` around lines 385 - 443, Extract the duplicated post factory and paginated model mock into two helpers (e.g., buildPosts(count) that returns the Array.from(...) posts using the existing post and email createModel shape, and mockPaginatedPostModel(posts) that assigns models.Post = { findPage: async (options) => { ... } } using the same pagination logic) and replace the repeated blocks in the tests that call buildPosts(20) and mockPaginatedPostModel(posts) before invoking exporter.export and collectStream; keep the helpers scoped inside the describe so tests still reference post, exporter.export and collectStream unchanged.
326-383: 💤 Low valueSnapshot test only exercises the “stable order requested” path.
The mock unconditionally appends
id descwheneveroptions.stableOrderis set, and the exporter always passesstableOrder: true, sohasStableOrderis true on every call and the deliberately-swapped page-2 branch (Lines 351‑358) is never executed. The test still meaningfully snapshots the requested orders and confirms no duplicates/missing IDs, but it does not actually demonstrate that without stableOrder the export would be unstable. Consider adding a sibling case that calls intofindPagewithoutstableOrder(or removes theid descinjection) so the unstable branch is actually exercised — that would document why the option is required.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/test/unit/server/services/posts/posts-exporter.test.js` around lines 326 - 383, The mock for models.Post.findPage always injects "id desc" when options.stableOrder is true, and exporter.export currently always passes stableOrder: true so the swapped-page (unstable) branch is never exercised; add a sibling assertion that exercises the unstable path by calling the exporter (or invoking findPage) without stableOrder: e.g. run a second collectStream(await exporter.export(...)) call or temporarily override models.Post.findPage to accept/see stableOrder: false so the hasStableOrder check is false and the page-2 swap branch executes, then snapshot the requestedOrders/exportedIds for that run to prove instability when stableOrder is not requested.ghost/core/core/server/services/posts/posts-exporter.js (2)
19-35: 💤 Low value
parseExportLimitsilently rewrites0, negatives, and partially-numeric strings.A few edge cases worth being aware of (not necessarily fixing):
parseExportLimit(0)→1andparseExportLimit(-5)→1because ofMath.max(parsed, 1). A caller passing0thinking it means “unbounded” will silently get a single row.Number.parseInt('15abc', 10)returns15, so partially-numeric strings are accepted as their leading-numeric prefix rather than falling back to the default.If the API contract is "be lenient, never error", the current behaviour is fine and consistent with the new tests. If you'd rather surface bad input, validating with
Number.isFinite(Number(limit))(or letting the input validator upstream reject it) would be more explicit. Flagging as optional.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/core/server/services/posts/posts-exporter.js` around lines 19 - 35, parseExportLimit currently accepts partially-numeric strings and uses parseInt plus Math.max(parsed, 1), which silently coalesces inputs like "15abc", 0, and negatives; change parseExportLimit to validate strictly with Number.isFinite(Number(limit)) and convert using Number(limit) (instead of Number.parseInt) so "15abc" is rejected, then decide on the clamping behavior: keep Math.max(parsed, 1) to preserve the current minimum-of-1 behavior or switch to Math.max(parsed, 0) if you want to allow 0 as a valid value; reference function parseExportLimit and DEFAULT_EXPORT_LIMIT when making this change.
104-137: 💤 Low valueOne extra empty-page fetch when total rows are an exact multiple of
pageLimit.When
requestedLimit === null(i.e.limit: 'all') and the underlying dataset size is an exact multiple ofpageLimit, the loop won't break on the last non-empty page (sinceposts.data.length < pageLimitis false), so it issues one additionalfindPagecall that returns an empty array before exiting via thedata.length === 0check on Line 119. Not a correctness bug, just one wasted DB roundtrip on the boundary. If you want to avoid it, you can track the previous page size or break whenmapped.lengthis exactly the requested remainder. Otherwise, leave as-is — the cost is negligible and the code reads cleanly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ghost/core/core/server/services/posts/posts-exporter.js` around lines 104 - 137, The loop can do one extra DB fetch when total rows == exact multiple of pageLimit; fix by using the pagination metadata returned by findPage to detect the last page and avoid the extra fetch: in `#streamPosts`, after receiving posts from this.#models.Post.findPage, check posts.meta?.pagination?.pages (or equivalent meta.pages) and if the current page >= that total pages, break the while loop instead of relying solely on posts.data.length < pageLimit; this uses the existing posts result (from `#streamPosts` and posts.data) to stop iterating without an extra roundtrip.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@ghost/core/core/server/services/posts/posts-exporter.js`:
- Around line 19-35: parseExportLimit currently accepts partially-numeric
strings and uses parseInt plus Math.max(parsed, 1), which silently coalesces
inputs like "15abc", 0, and negatives; change parseExportLimit to validate
strictly with Number.isFinite(Number(limit)) and convert using Number(limit)
(instead of Number.parseInt) so "15abc" is rejected, then decide on the clamping
behavior: keep Math.max(parsed, 1) to preserve the current minimum-of-1 behavior
or switch to Math.max(parsed, 0) if you want to allow 0 as a valid value;
reference function parseExportLimit and DEFAULT_EXPORT_LIMIT when making this
change.
- Around line 104-137: The loop can do one extra DB fetch when total rows ==
exact multiple of pageLimit; fix by using the pagination metadata returned by
findPage to detect the last page and avoid the extra fetch: in `#streamPosts`,
after receiving posts from this.#models.Post.findPage, check
posts.meta?.pagination?.pages (or equivalent meta.pages) and if the current page
>= that total pages, break the while loop instead of relying solely on
posts.data.length < pageLimit; this uses the existing posts result (from
`#streamPosts` and posts.data) to stop iterating without an extra roundtrip.
In `@ghost/core/test/unit/server/models/post.test.js`:
- Around line 188-215: Add a new unit test next to the existing "adds a stable
order field..." test that calls models.Post.findPage with stableOrder: true but
with an order string that already includes the id field (e.g. order:
'published_at desc, id desc'); install the tracker and capture the generated SQL
and bindings, then assert that the generated query does not append a duplicate
`posts.id` ordering (SQL should contain only one `posts.id` DESC and the
bindings remain correct) and that result.meta is still {}—this will exercise the
hasStableOrderField branch in the Post.findPage logic.
In `@ghost/core/test/unit/server/services/posts/posts-exporter.test.js`:
- Around line 385-443: Extract the duplicated post factory and paginated model
mock into two helpers (e.g., buildPosts(count) that returns the Array.from(...)
posts using the existing post and email createModel shape, and
mockPaginatedPostModel(posts) that assigns models.Post = { findPage: async
(options) => { ... } } using the same pagination logic) and replace the repeated
blocks in the tests that call buildPosts(20) and mockPaginatedPostModel(posts)
before invoking exporter.export and collectStream; keep the helpers scoped
inside the describe so tests still reference post, exporter.export and
collectStream unchanged.
- Around line 326-383: The mock for models.Post.findPage always injects "id
desc" when options.stableOrder is true, and exporter.export currently always
passes stableOrder: true so the swapped-page (unstable) branch is never
exercised; add a sibling assertion that exercises the unstable path by calling
the exporter (or invoking findPage) without stableOrder: e.g. run a second
collectStream(await exporter.export(...)) call or temporarily override
models.Post.findPage to accept/see stableOrder: false so the hasStableOrder
check is false and the page-2 swap branch executes, then snapshot the
requestedOrders/exportedIds for that run to prove instability when stableOrder
is not requested.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 655f6412-f954-4b17-8245-c6d929773fd5
⛔ Files ignored due to path filters (3)
ghost/core/test/e2e-api/admin/__snapshots__/post-analytics-export.test.js.snapis excluded by!**/*.snapghost/core/test/e2e-api/admin/__snapshots__/posts.test.js.snapis excluded by!**/*.snapghost/core/test/unit/server/services/posts/__snapshots__/posts-exporter.test.js.snapis excluded by!**/*.snap
📒 Files selected for processing (11)
ghost/core/core/server/api/endpoints/posts.jsghost/core/core/server/api/endpoints/utils/serializers/output/posts-csv-transform.tsghost/core/core/server/api/endpoints/utils/serializers/output/posts.jsghost/core/core/server/models/base/plugins/crud.jsghost/core/core/server/models/base/plugins/sanitize.jsghost/core/core/server/services/posts/posts-exporter.jsghost/core/test/e2e-api/admin/post-analytics-export.test.jsghost/core/test/e2e-api/admin/posts.test.jsghost/core/test/unit/api/canary/utils/serializers/output/posts-export-csv.test.tsghost/core/test/unit/server/models/post.test.jsghost/core/test/unit/server/services/posts/posts-exporter.test.js
💤 Files with no reviewable changes (1)
- ghost/core/test/e2e-api/admin/posts.test.js
ref #27752 _This change should have no user impact._ We use `papaparse` in a few places. Let's install types for it. Luckily, there were no type errors!
ref #27752 _This change should have no user impact._ We use `papaparse` in a few places. Let's install types for it. Luckily, there were no type errors!
EvanHahn
left a comment
There was a problem hiding this comment.
This was great to review because
- It was split up into easy-to-review chunks.
- Streaming always makes me happy.
- I learned a few things (got more practice with transform streams, learned about the performance of
yield*versusyield, the*#syntax for a private generator).
Lots of little nitpicks, but looks good overall.
|
@EvanHahn thanks for the thorough review, lots of good stuff here! Planning to address these items tomorrow 👍 |
71e0faa to
ffef8fc
Compare
|
@cmraible Lemme know when you want me to take another look. (I think you're still working on this.) |
|
@EvanHahn thanks! I'm pretty sure I've addressed everything; ready for another look whenever you have time! |
EvanHahn
left a comment
There was a problem hiding this comment.
LGTM with a few small nits/questions. Thanks for responding to all my comments!
Introduces createCSVTransform(), an object-mode Transform stream that emits header + first row on the first chunk and data-only rows afterwards. Lets later commits stream post analytics directly to the HTTP response instead of buffering the whole CSV in memory.
Marks the exportCSV endpoint as a stream so the framework defers body serialization to the controller, and rewrites the serializer to pipe models.data through the new CSV transform straight to res with the appropriate download headers and pipeline error forwarding. Note: this commit alone leaves the export endpoint non-functional. The serializer now expects models.data to be a Readable, but the exporter still returns a buffered array — the streaming exporter follows in the next commit. The split is for review clarity; these will be squashed before merging.
Pure refactor in preparation for streaming. Lifts EXPORT_WITH_RELATED to a module constant, extracts the per-export setup work into #getExportContext (with the three findAll calls parallelized via Promise.all), and extracts the per-row mapping into #mapPosts. No behavior change — export() still returns an array of mapped post objects. All existing tests pass unchanged.
PostsExporter.export() now returns a Readable that yields posts batch-by-batch via findPage with skipPagination, instead of loading every post into memory at once. Adds a parseExportLimit helper so 'all' means unlimited and bad limits fall back to the default of 15. Existing exporter tests are wrapped with collectStream(); new tests cover multi-batch streaming and default-limit handling.
Adds an opt-in stableOrder option to findPage that appends `<table>.id desc` as a tiebreaker when the requested order does not already include id. The streaming exporter passes stableOrder: true so that custom-ordered exports (e.g. 'published_at desc') don't duplicate or skip rows when paginating across batches. Covered by a model-level findPage test and an exporter-level snapshot test that simulates an unstable tie order between pages.
Streamed responses no longer have content-length or a body-derived etag, gain no-transform on Cache-Control (so proxies don't buffer or recompress), and drop Accept-Encoding from Vary. Snapshots and the header matchers in posts.test.js / post-analytics-export.test.js are updated to match.
We're using the default value here, so we don't need to include this option explicitly
…anged `stream` to `node:stream`
…ith `await text(csvTransform)`
One for the CSVTransform itself, one for the serializer
unnecessary null checks
Replaced a local stream collector with the platform async iterable API so these tests no longer need a bespoke helper. Kept this as a separate test-only cleanup from the exporter behavior commits.
Removed a leftover permitted option from the model sanitizer that is no longer needed after moving the ordering logic into the posts exporter.
Cleaned up the posts CSV transform after review because the explicit newline option was only set for subsequent rows and was not needed.
3aa4dc1 to
a87eaf8
Compare
fixes https://linear.app/tryghost/issue/ONC-1566/
I recommend reviewing this PR one commit at a time.
Rewrites Ghost's post analytics CSV export to stream results to the client instead of buffering the entire CSV in memory before sending. This should fix the export endpoint timing out for sites with large post catalogs before returning even a single byte.
I've tested this manually with a large dataset, and the resulting CSV export is byte-for-byte identical with the current behavior on main. Even if this doesn't fix the timeouts, it is a step in the right direction to stream the CSV rather than buffering it all in memory.