[4.0] memcached: Make config non-HA-aware (bsc#1038223)#1340

Closed
cmurphy wants to merge 1 commit into crowbar:stable/4.0 from cmurphy:fix-memcached-4.0

Conversation

@cmurphy
Contributor

@cmurphy cmurphy commented Oct 2, 2017

Without this patch, the keystone and nova barclamps set their cache
servers to all of the memcached servers in the cluster, in
lexicographical order. This is not an optimal way to configure memcached
servers: if part of the cluster is down, the memcached servers living on
it will be inaccessible. The python-memcached backend is not tied to
pacemaker and has no way of knowing that a server is down, so it
attempts to connect to each server serially, not trying the next one
until the first times out. The effect is that any query to the OpenStack
service takes a very long time. This patch fixes the issue by using only
the local memcached server for keystonemiddleware instead of all the
servers in the cluster. This means every controller in the cluster will
use only its own memcached server, similar to how it would work if it
were using an in-process cache.
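For illustration, the change amounts to something like the following (a minimal Python sketch; the function and node names are invented for illustration and are not the actual Chef recipe code):

```python
# Hypothetical sketch of the config change; names are our invention,
# not identifiers from the barclamp code.
def memcached_servers(cluster_nodes, local_node, port=11211, ha_aware=True):
    if ha_aware:
        # old behavior: every cluster member's memcached, sorted lexicographically
        return ",".join("%s:%d" % (n, port) for n in sorted(cluster_nodes))
    # new behavior: only the memcached running on this controller
    return "%s:%d" % (local_node, port)

nodes = ["node3", "node1", "node2"]
print(memcached_servers(nodes, "node2"))                  # node1:11211,node2:11211,node3:11211
print(memcached_servers(nodes, "node2", ha_aware=False))  # node2:11211
```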

@cmurphy
Contributor Author

cmurphy commented Oct 2, 2017

Cloud8 version is here: #1341 (not cherry-picked)

Contributor

@nicolasbock nicolasbock left a comment


If I understood this correctly, then we won't use several memcached instances within HA but only the on-node instance. That gives up quite a bit of potential performance, doesn't it? This sounds like a pretty fundamental limitation of the oslo.cache code.

@cmurphy
Contributor Author

cmurphy commented Oct 2, 2017

@nicolasbock what performance gains did we get from using multiple cache instances? I can't find information on how configuring memcached in a cluster improves performance, and in fact from https://github.com/memcached/memcached/wiki/Performance#maximum-number-of-nodes-in-a-cluster there is the potential for it to impede performance.

The issue is actually in python-memcached, not oslo.cache: https://github.com/linsomniac/python-memcached/blob/1.58/memcache.py#L444-L448
python-memcached is common to both the dogpile.cache.memcached and oslo_cache.memcache_pool backends of oslo.cache, so I don't think there's any way to tune oslo.cache for this. I guess it's worth noting that I reproduced this with cache backend oslo_cache.memcache_pool but Dirk reported the bug when we were still using dogpile.cache.memcache as the cache backend.
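The serial retry loop linked above can be modeled roughly like this (a heavily simplified toy, not the actual memcache.py code; `FakeServer` and `get_server` are invented names):

```python
import time

# Toy model (our simplification) of python-memcached's Client._get_server
# loop: on a failed connect it rehashes and tries the next server, paying
# a full connect timeout for every unreachable host before moving on.
SOCKET_TIMEOUT = 3.0  # python-memcached 1.58 defaults to a 3-second socket timeout

class FakeServer:
    def __init__(self, name, alive):
        self.name, self.alive = name, alive

    def connect(self):
        if not self.alive:
            time.sleep(SOCKET_TIMEOUT)  # stand-in for a blocking TCP timeout
            return False
        return True

def get_server(servers, key_hash, retries=3):
    """Serial failover: each retry targets the next server in hash order."""
    for attempt in range(retries):
        server = servers[(key_hash + attempt) % len(servers)]
        if server.connect():
            return server
    return None
```

With the lexicographically first server sitting on a downed cluster node, every request whose key hashes to it stalls for a full timeout before failing over, which matches the very slow OpenStack queries described above.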

@nicolasbock
Contributor

what performance gains did we get from using multiple cache instances?

In a multiprocess keystone setup in which all keystone instances access the same pool of memcached servers, I would expect a potential increase in keystone performance because of the shared cache. While you are correct to point out that a large number of servers can slow down a client, we don't know at what number that happens. And of course it would help if we had some actual benchmarks to show that we do end up with a performance gain. At this point I am just speculating. 😄

It sounds like the memcached design does not consider failed nodes, though (I couldn't find anything in their documentation), and waiting for a connection to time out definitely decreases performance. Since you found that we can't tune anything to change that behavior, I agree that it's better to simply use the local memcached only.

Thanks for the additional details!

nicolasbock
nicolasbock previously approved these changes Oct 2, 2017
@stefannica
Contributor

@cmurphy there were two `memcached_servers.join` calls in the nova config file

@cmurphy
Contributor Author

cmurphy commented Oct 4, 2017

@stefannica thanks, fixed

nicolasbock
nicolasbock previously approved these changes Oct 4, 2017
@cmurphy cmurphy force-pushed the fix-memcached-4.0 branch from 5562d2f to 764c1b9 Compare October 4, 2017 14:41
@cmurphy cmurphy changed the title memcached: Make config non-HA-aware (bsc#1038223) [4.0] memcached: Make config non-HA-aware (bsc#1038223) Oct 6, 2017
@cmurphy cmurphy added the wip label Oct 6, 2017
@cmurphy
Contributor Author

cmurphy commented Oct 6, 2017

I'm pretty sure the failure here must be caused by this change

@cmurphy
Contributor Author

cmurphy commented Oct 12, 2017

@nicolasbock after doing a lot of reading I understand better what you were saying about performance gains: if we have separate caches, then each controller has to cache everything itself, doubling or tripling the number of writes we have to do. Not ideal.
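That write amplification can be put in rough numbers (a back-of-the-envelope model, not a benchmark; the function name is invented):

```python
# Back-of-the-envelope model: cache fills needed to warm every controller
# for the same set of hot keys (e.g. validated keystone tokens).
def warmup_fills(hot_keys, controllers, shared_cache):
    if shared_cache:
        return hot_keys               # first miss anywhere fills it for all nodes
    return hot_keys * controllers     # every node misses and fills independently

print(warmup_fills(1000, 3, shared_cache=True))   # 1000
print(warmup_fills(1000, 3, shared_cache=False))  # 3000
```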

The HA job is failing here because when the ceph cookbook tries to make a role assignment, it makes two requests: one to GET the role assignments for the ceph user and one to PUT the new role assignment. As luck would have it, it was fairly consistent about which controller ended up receiving each request. A GET for role assignments would go to one controller and produce a cache hit containing just the member role, which comes from the ceph user having a default tenant set, and which is not the intended assignment of 'admin'. It would then issue a PUT to try to correct the role assignment and fail with an HTTP 409, because it had already created this role assignment in an earlier chef run; that request had simply gone to a different controller and was therefore only cached there.
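A toy simulation of that sequence (heavily simplified; the class and method names are invented, not keystone code):

```python
# Heavily simplified model of the failure: two controllers sharing one
# keystone database but each holding a private, never-invalidated cache.
class Controller:
    def __init__(self, db):
        self.db = db            # shared database: set of (user, role) grants
        self.cache = {}         # per-node memcached stand-in

    def get_assignments(self, user):
        if user not in self.cache:            # cache miss: read through to the DB
            self.cache[user] = {r for (u, r) in self.db if u == user}
        return self.cache[user]

    def put_assignment(self, user, role):
        if (user, role) in self.db:
            return 409                        # duplicate grant is rejected
        self.db.add((user, role))
        self.cache[user] = {r for (u, r) in self.db if u == user}
        return 201

db = set()
a, b = Controller(db), Controller(db)
b.get_assignments("ceph")                     # b now caches "no admin grant"
a.put_assignment("ceph", "admin")             # earlier chef run grants via a
# a later chef run reads via b, sees the stale cache, and retries the PUT:
assert "admin" not in b.get_assignments("ceph")
print(b.put_assignment("ceph", "admin"))      # -> 409
```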

I think this particular issue could be corrected by using the keystone v3 API for role assignments (which we already do in master) which wouldn't consider a default project to be a role assignment and would therefore have a cache miss and seek the role assignments from the database. But this illustrates the potential for a sort of split-brain problem that is not really acceptable, in addition to the performance hit.

I commented on the bug that I think the problem that prompted this is not really a problem any more since we switched to the memcache_pool backend, so closing this.

@cmurphy cmurphy closed this Oct 12, 2017
@nicolasbock
Contributor

That's interesting @cmurphy. Nice analysis!
