Skip to content

Conversation

@andy-paine
Copy link
Contributor

A short explanation of the proposed change:

Only count the primary key fields for DISTINCT queries

An explanation of the use cases your change solves

When paginating results, a COUNT is performed to get the total number of
results that would be returned by the query. For queries with a
DISTINCT in their SELECT, this requires the DB to perform a full
sort over the entire SELECT table.* columns (sometimes 15+) which can be
very slow. Since this is only being used for a COUNT, only a unique
identifier is needed to do the count which is already unique so does not
require sorting. This halves the EXPLAIN cost of some of the
service_plans queries since that is currently sorting over 15+ fields
just to return a single COUNT.

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

When paginating results, a COUNT is performed to get the total number of
results that would be returned by the query. For queries with a
DISTINCT in their SELECT, this requires the DB to perform a full
sort over the entire SELECT table.* columns (sometimes 15+) which can be
very slow. Since this is only being used for a COUNT, only a unique
identifier is needed to do the count which is already unique so does not
require sorting. This halves the EXPLAIN cost of some of the
service_plans queries since that is currently sorting over 15+ fields
just to return a single COUNT.

Co-Authored-By: Philipp Thun <philipp.thun@sap.com>
@andy-paine
Copy link
Contributor Author

This doesn't actually perform correctly if the SELECT DISTINCT query is selecting from multiple tables such as SELECT DISTINCT a.id, a.b, c.id, c.d) as it would count identical a.id rows as only 1 when in fact they are different as the c.id differs between them. Closing in favour of #2527 which achieves better performance improvements whilst keeping the same query semantics in all scenarios

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants