Skip to content

Conversation

@soloestoy
Copy link
Contributor

@soloestoy soloestoy commented Jul 18, 2023

A bug introduced in #11657 (7.2 RC1), causes client-eviction (#8687) and INFO to have inaccurate memory usage metrics of MONITOR clients.

Because the type in c->type and the type in getClientType() are confusing
(in the later, CLIENT_TYPE_NORMAL not CLIENT_TYPE_SLAVE), the comment
we wrote in updateClientMemUsageAndBucket was wrong, and in fact that function
didn't skip monitor clients.
And since it doesn't skip monitor clients, it was wrong to delete the call for it from
replicationFeedMonitors (it wasn't a NOP).
That deletion could mean that the monitor client memory usage is not always up to
date (updated less frequently, but still a candidate for client eviction).

@soloestoy soloestoy requested review from hpatro, oranagra and sundb July 18, 2023 11:04
@sundb
Copy link
Collaborator

sundb commented Jul 18, 2023

@soloestoy Sorry for my bad, you are right, thanks for figuring it out.

@soloestoy
Copy link
Contributor Author

@soloestoy Sorry for my bad, you are right, thanks for figuring it out.

don't worry, it's harmless 😄

@soloestoy
Copy link
Contributor Author

BTW @oranagra , I think we need backport #12025 to 7.0. Since now we only update client's memory usage via updateClientMemoryUsage in clientsCron when client eviction is disabled, we need fair iteration.

@oranagra
Copy link
Member

BTW @oranagra , I think we need backport #12025 to 7.0. Since now we only update client's memory usage via updateClientMemoryUsage in clientsCron when client eviction is disabled, we need fair iteration.

how is that different than what we had in 6.X?
i'd rather avoid backporting things with any potential for causing issues, unless the fix a serious problem or a recent regression.

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so because the type in c->type and the type in getClientType are confusing, the comment we wrote in updateClientMemUsageAndBucket was wrong, and in fact that function didn't skip monitor clients.
and since it doesn't skip monitor clients, it was wrong to delete the call for it from replicationFeedMonitors (it wasn't a NOP).
that deletion could mean that the monitor client memory usage is not always up to date (updated less frequently, but still a candidate for client eviction).
since it's just monitors (not a critical feature), with client eviction (off by default), it's not a critical bug IMHO.

however, now i wonder what's the right fix, maybe instead of reverting the change in replicationFeedMonitors and updating the comment in updateClientMemUsageAndBucket, we need to keep that comment and change the code so that it is true.
i.e. maybe ignoring monitors the same way as we ignore replicas is the right behavior?

same as replicas, monitors accumulate output buffers as a result of other clients work (or abuse), so disconnecting the monitor will not stop that work (or abuse).
but on the other hand, one possible way the monitor can accumulate a large output buffer is if the client doesn't read the data from the socket fast enough. and indeed disconnecting the monitor will release the memory.

I tend to agree with this PR, just want to make sure we fix it the right way this time.
WDYT?

@hpatro
Copy link
Contributor

hpatro commented Jul 21, 2023

I perceive MONITOR command as a debug command (shouldn't be used in regular case) and client memory usage tracking in realtime doesn't have a lot of value for it. accurate memory usage tracking in general is an expensive operation. I would rather prefer if we make the change to not track it accurately.

@oranagra oranagra merged commit 01eb939 into redis:unstable Jul 25, 2023
@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Jul 25, 2023
@oranagra oranagra mentioned this pull request Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants