Skip to content

Conversation

@andy-paine
Copy link
Contributor

Overview

A short explanation of the proposed change:
Remove eager loading of service plan visibilities when fetching the list of service plans

Note: there are 2 commits here - the 1st one just adds a helpful rake task for populating the DB and isn't critical to the behaviour, we just included it to help test locally/test in the future. If whoever reviews this would rather just take the functional second commit please let us know

An explanation of the use cases your change solves
Users trying to query cf marketplace against deployments with large numbers of organisations and a reasonably large number of brokers/services/plans which thus have an enormous number of service plan visibilities

Links to any other associated PRs
Fixes #2213

More info

After a bunch of investigation we found that the service plan visibilities table was causing issues when eagerly loaded into the service plans endpoint, particularly when listing service plans (as cf marketplace does). Whilst the query that the ORM executes (SELECT * FROM service_plan_visibilities WHERE service_plan_id IN (<long list of IDs>)) isn't that expensive for a DB, it returns >10k rows in many cases and the process of parsing that to Ruby objects and linking it to the correct plan does seem to be pretty slow.

The only usage of this data in this endpoint appears to be checking whether there are any visibilities for this plan and only in the case where the plan is not public or not space-scoped. By not eagerly loading this table, the ORM instead issues a SELECT 1 ... WHERE service_plan_id = '1234' for each plan in the list but as this a) is only called if it is not public or space scoped and b) returns much less data it is far faster than the eager loading still. Potentially, this may remain true in environments with significant DB latency as more, smaller requests are needed but as the numbers below will say it would have to be really bad to be worse than it currently is.

Tests

Setting up a database as per this comment:

  • 22,000 orgs
  • 66,000 spaces
  • 300 service brokers
  • 600 services
  • 1200 service plans
  • 660,000 service plan visibilities (all plans have worst-case scenario of being organization visible)

We got the following timings for querying the service_plan endpoints before and after this change:

  • /v2/service_plans?results-per-page=50 - 0.12s
  • /v2/service_plans?results-per-page=100 - 0.17s
    Before change:
  • /v3/service_plans?per_page=50 - 1.24s
  • /v3/service_plans?per_page=100 - 2.21s
  • /v3/service_plans?per_page=5000 - 30.20s
    After change:
  • /v3/service_plans?per_page=50 - 0.16s
  • /v3/service_plans?per_page=100 - 0.22s
  • /v3/service_plans?per_page=5000 - 1.72s

Unfortunately this hasn't actually done much for CLI users yet as there seems to be something related to the space_guids filter (it hasn't made it worse) but at the very least this clearly makes the basic endpoint much faster for other users (e.g. Stratos, AppsMan)

Checklist

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Apr 21, 2021

CLA Signed

The committers are authorized under a signed CLA.

@andy-paine andy-paine force-pushed the speed-up-v3-service-plans branch 2 times, most recently from 28cd3c2 to 2f85338 Compare April 21, 2021 16:41
andy-paine and others added 2 commits April 21, 2021 17:47
For assessing performance of service related endpoints it is useful to
be able to generate large amounts of data about services, plans,
instances etc.

Add a rake task to do this to replace the slow script that was piped
into bin/console as this suffered from being a TTY/slow loading of
commands. To speed up inserts, <Class>.new is used instead of
<Class>.create as this way we can multi_insert which uses a single DB
query to do the whole insert rather than 1 per object which gets slow
from so many round-trips to the DB.

Co-Authored-By: Florian Braun <FloThinksPi@users.noreply.github.com>
The only place that the visibilities relationship is used is in
determining what "visibility_type" a plan has and that only needs to
check `.empty?` on the visibilties. By not eagerly loading the
visibilities we save having to query and parse a huge amount of data
(num_visibilities ~= num_orgs * num_plans). When this data is needed to
check for `empty?` the ORM will instead issue a `SELECT 1 ... WHERE
service_plan_id = '1234'` for each plan. Even in the scenario where a
query is needed for every plan (i.e. the plan is neither public nor
space-scoped), this is far more efficient than fetching and parsing the
actual table contents. We have seen an improvement from 90s for
/v3/service_plans?per_page=500 to 1.3s, even in the worst-case scenario
of every plan having visibility_type of organization.

Co-Authored-By: Florian Braun <FloThinksPi@users.noreply.github.com>
@andy-paine andy-paine force-pushed the speed-up-v3-service-plans branch from 2f85338 to 34817c0 Compare April 21, 2021 16:47
@sethboyles
Copy link
Member

Interesting! Thanks for the detailed investigation.

If the including the n+1 ends up being slow for some cases, we could consider creating a new association on the ServicePlan model :organization, through: :service_plan_visibilities, which would allow us to check the same information but reduce the scope from Org count * Service Plan count to just Org count. However, given that this query isn't needed for every ServicePlan, it still might not be worth it.

@andy-paine
Copy link
Contributor Author

After doing some more profiling with rbspy, it looks like the majority of the time is spent parsing the created_at and updated_at columns from the service_plan_visibilities table. If we don't want to remove eager loading entirely I think we could probably just add a :select to the visibilities association in service plans.

I tested this locally and it is not as big an improvement as removing the eager loading entirely but it is still ~6x faster I think (early numbers). That would suggest that even the worst case "I need to run 5000 SELECT 1 queries" is better but I think it is interesting to note that most of the processing time when parsing that table is dealing with the datetimes.

@sethboyles
Copy link
Member

Closing this as we cherry-picked the relevant commit: 2e363c6

We tested this out on one of our environments and everything looked good. FWIW we did quickly try a version of my above suggestion of eager-loading a newly defined organizations association and did not see a performance improvement. But I think we discovered why there was still an n+1, even after eager loading service_plan_visibilities--we believe it is because this method:

https://github.com/cloudfoundry/cloud_controller_ng/blob/main/app/models/services/service_plan.rb#L172

calls service_plan_visibilities_dataset and not service_plan_visibilities.

as a side note there is probably a minor perf improvement to call service_plan_visibilities_dataset.count > 0 instead of empty?, since empty? still loads the ruby models into memory and count executes a SQL COUNT statement.

Again, thanks for this investigation and PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

/v3/service_plans endpoint is slow in deployments with large numbers of service plan visibilities

3 participants