Skip to content

"cf service" slow when user is member of many orgs #1591

@stephanme

Description

@stephanme

Issue

"cf service" slow when user is member of many orgs.

Users complain about really slow "cf service " calls. Execution takes sometimes >5min.
Execution time seems to be related to the number of orgs that the user is member of. We have users with access to several thousand orgs.

Context

CF 12.36
CF API 2.146.0
cf version 6.50.0+4f0c3a2ce.2020-03-03

Our CF foundation is really large: ~800 cells, 90 CC, 37k app instances.
CC VMs are well loaded but not overloaded according to our monitoring.
CC DB: AWS RDS db.m5.12xlarge, 20..40% CPU

Steps to Reproduce

This is not exactly 'steps to reproduce' but more an analysis of a 'cf service postgres01' call.
The user in this example has access to 35 orgs.

Expected result

Execution is fast (a few seconds).

Current result

Execution took 52s.

I traced the CF API calls and the involved DB queries:

GET /v2/spaces/b79d0289-75dc-47bd-9460-7283508db450/service_instances?q=name%3Apostgres01&return_user_provided_service_instances=true HTTP/1.1
X-Vcap-Request-Id: 34b1ecb0-f22d-4ca8-473f-012d93d29f33::20e6abf8-4e51-4edc-8756-1c2e813812c9

  • 13s
  • results in 1 service instance (d1703063-df67-4952-ba09-8f615db92771)
  • 51 DB queries
  • 35x SELECT * FROM "service_plan_visibilities" WHERE ("service_plan_visibilities"."organization_id" = ?)
    • most time spent here (9s)
    • singly query took ~100ms

GET /v2/service_instances/d1703063-df67-4952-ba09-8f615db92771/shared_to HTTP/1.1
X-Vcap-Request-Id: 76563e6b-687b-4157-6671-17f584110e51::8af040ac-0009-441b-89c6-fc7b0a7b9102

  • 2s
  • 11 DB queries

GET /v2/service_plans/eefb5965-d893-4377-805d-876898887a28 HTTP/1.1
X-Vcap-Request-Id: 22465555-4ded-4f7b-4cb9-ad5ab695b931::a77c1b10-f7a1-4dec-b79a-9b3f3a79560f

  • 15s
  • 1178 DB queries
  • 1117x SELECT * FROM "service_plans" WHERE "id" = ?
    • took ~4s
    • single queries are fast
    • no delay between queries (= processing CC was fast)
  • 35x SELECT * FROM "service_plan_visibilities" WHERE ("service_plan_visibilities"."organization_id" = ?)
    • took 4s
  • 7s for result rendering (time from last DB query until request completion)

GET /v2/services/c4ec80ae-5949-4a79-99ef-e42a894d73d3 HTTP/1.1
X-Vcap-Request-Id: 8c7e2d5a-0ea9-4263-6332-a970b3f640cd::e33f399f-a533-4180-a6e0-b030d839a761

  • 7s
  • 46 DB queries
  • 35x SELECT * FROM "service_plan_visibilities" WHERE ("service_plan_visibilities"."organization_id" = ?)
    • took 5s

GET /v2/service_instances/d1703063-df67-4952-ba09-8f615db92771/service_bindings HTTP/1.1
X-Vcap-Request-Id: 6a9ddf12-e6db-48b5-6ebb-3c8ef2915c2a::631efc46-853c-4633-b8d5-c4bb5e4fa925

  • 4s
  • 18 DB queries, 5 of them with really long parameter lists

GET /v2/apps/d9652cff-5418-4442-9d60-37dc9e5f7685 HTTP/1.1
X-Vcap-Request-Id: 8bff6f9f-d881-4f22-4182-cb8b9b28421e::8afde353-3b2d-4a33-8800-a78afe8f589a

  • fast, <1s

Summary:

  • 3x35x SELECT * FROM "service_plan_visibilities" WHERE ("service_plan_visibilities"."organization_id" = ?)
    • this statement is invoked per org that the user has access
    • "cf service " invokes CF API requests that lead to 3 times runnning this query per org (= in my example 3x35 queries)
    • a single query takes ~100ms
      • table service_plan_visibilities has ~92k entries, there is a btree (organization_id, service_plan_id) index
      • query time varies from 10ms up to 1s, I might ran into a load peak, overall 90% of queries take <= 10ms
    • it looks like there is a lot of additional CC processing between the queries. The 35 queries in /v2/spaces/b79d0289-75dc-47bd-9460-7283508db450/service_instances take 9s overall, each single query was ~120ms in average summing up to ~4s.
    • 18s are spent on this query including processing time in CC which is a good 1/3 of the execution time of this "cf service". This time increases with the # orgs that user has access to.
  • 1117x SELECT * FROM "service_plans" WHERE "id" = ?
    • not sure why >1k plans have to be queried to find out the information about one service plan that is known by id (GET /v2/service_plans/eefb5965-d893-4377-805d-876898887a28)
    • maybe related to access rights/service enablements
    • this query is much faster compared to service_plan_visibilities (4s for ~1k queries)
  • GET /v2/service_plans/eefb5965-d893-4377-805d-876898887a28 takes 7s from the last query until the request is completed

Possible Fix

Unsure, just some ideas:

  • Use joins instead of single queries that increase with e.g. number of orgs that the user is member of.
  • Is the btree (organization_id, service_plan_id) index on service_plan_visibilities sufficient for the service_plan_visibilities queries? I guess so.
  • Consider scalibility aspects in the v3 implementation.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions