Skip to content

Conversation

@andy-paine
Copy link
Contributor

@andy-paine andy-paine commented Oct 1, 2021

A short explanation of the proposed change

Use the COUNT(*) OVER() window function to get the total results in the same query as selecting the actual objects

An explanation of the use cases your change solves

Running SELECT COUNT(*) for getting total results, followed by SELECT * means 2 potentially expensive queries. The count is just used to populate the total_results field in the pagination block of the list responses in the V3 API. This does everything in one query resulting in a ~40-50% query saving (DB caching meant the second query wasn't quite as expensive as the first but still a lot, especially for queries that required temp files/had too many results to efficiently cache)

The pagination extension does 3 main things:

  • Adds a limit/offset to get the correct "page" of results from the DB
  • Automatically calculates the total number of results using SELECT COUNT(*)
  • Extends the dataset with extra methods e.g. last_page?, current_page

As we don't use the dataset extensions (only the records are returned) then by switching to using the COUNT(*) OVER() window function, we can save the extra calculation query and by just doing the limit/offset ourselves we can remove this extension entirely, saving the overhead of extending the dataset.

Note: Whilst investigating this PR, @johha noticed that the sequel paginator wasn't respecting the position ordering of buildpacks. This has been addressed in a second commit in this PR.

Links to any other associated PRs

#2525 had the same end goal of improving the efficiency of this COUNT but this PR does it better and more generally

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

@andy-paine andy-paine force-pushed the use-window-functions-for-paginated-count branch 4 times, most recently from 278bdca to a85f2f3 Compare October 5, 2021 07:42
This window function (supported by both MySQL > v8 and all modern
Postgres versions) allows fetching of the total number of results a
query would return (OVER the whole window), ignoring the LIMIT/OFFSET
clauses from the pagination. This means that even if the pagination only
returns 50 results, the `total` field will contain e.g. 120 if that's
how many the filter would normally return. This means only a single
SELECT is needed rather than a SELECT COUNT(*) + SELECT which should
massively lower the load on the DB (and make us less reliant on cache
hits in the DB). As the pagination extension is only doing the
limit/offset in this scenario we can do it ourselves and avoid overhead
of extending the dataset to include the plugin.

The eager_graph plugin uses joins to fetch related data in a query and
then maps the joined entities into associations on the main object. As
part of doing this, it expects to know what all the fields of the base
model are so that it can tell what to extract from each row into the
base model. For eager_graph loaded queries, this was resulting the in
the :pagination_total_results field being thrown away. To avoid this
issue, we register the :pagination_total_results field as part of the
main table using add_graph_aliases then remove it before returning the
final records.

Additionally, as part of making this change, the get_page function has
been updated to use some nice Sequel constructs like order_append and
qualify to simplify the code and avoid string interpolation.

Co-Authored-By: Philipp Thun <philipp.thun@sap.com>
@andy-paine andy-paine force-pushed the use-window-functions-for-paginated-count branch from a85f2f3 to 88395f1 Compare October 5, 2021 09:52
The sequel paginator uses the .order function which overrides any
ordering already applied to the dataset. For buildpacks, the :list
plugin means the Buildpack.dataset has a default ORDER BY of position.
To avoid this, make sure the pagination_options contain the correct
ordering.

This breaks the test for the Kpack buildpacks list endpoint but this
test has always been wrong as the returned results should be ordered by
the order specified in the builder, not ordered by :id (which was the
default in pagination_options for buildpacks before)

Co-Authored-By: Johannes Haass <johannes.haass@sap.com>
@tjvman
Copy link
Contributor

tjvman commented Oct 7, 2021

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants