
mgr/balancer: reduce info-level logs#66068

Merged
pdvian merged 3 commits into ceph:main from
timqn22:balancer-info-log-reduction
Dec 15, 2025
Conversation

@timqn22
Contributor

@timqn22 timqn22 commented Oct 27, 2025

This change demotes many info-level logs in the Balancer module to debug level and adds if/else logic so that info-level logs are emitted only when a balancing action is actually happening. This is in an effort to reduce the overall volume of logs the Balancer produces for clients.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2395894

Upstream Tracker: https://tracker.ceph.com/issues/73656

Signed-off-by: Timothy Q Nguyen timqn22@gmail.com
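The log-level gating pattern this PR describes can be sketched as follows. This is a minimal illustration with a hypothetical `report_changes` helper and logger name, not the actual balancer code:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger('balancer-sketch')


def report_changes(num_changes):
    """Log at info level only when balancing actually prepared changes;
    otherwise demote the message to debug level."""
    if num_changes > 0:
        log.info('prepared %d changes', num_changes)
        return True
    # 0 (or an unexpected negative count) means no optimization was needed,
    # so the message goes to debug to keep the info log quiet
    log.debug('prepared %d changes', num_changes)
    return False


report_changes(3)   # emits at info level
report_changes(0)   # emits at debug level only
```

With a cluster that is already balanced, the module would then produce no info-level output at all on an idle pass, which is the reduction the PR is after.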

@timqn22 timqn22 force-pushed the balancer-info-log-reduction branch 2 times, most recently from 7f2f80f to feccf2d Compare October 27, 2025 22:05
@ljflores ljflores added the core label Nov 14, 2025
@ljflores ljflores self-requested a review November 14, 2025 18:20
Member

@ljflores ljflores left a comment


Hey @timqn22, thanks for the improvements! I requested a few changes.

Comment on lines +1302 to +1306
self.log.debug('Skipping root %s (pools %s), total pgs %d '
               '< minimum %d (%d per osd)',
               root, pools,
               best_pe.total_by_root[root][key],
               min_pgs, min_pg_per_osd)
Member


Can you fix this so only "debug" is modified and not the spacing? This will just help minimize the overall number of changes.

Comment on lines 1143 to 1151
    self.no_optimization_needed = True
    self.log.debug('unable to balance reads.')
    return -errno.EALREADY, msg
self.log.info('prepared {} read changes'.format(total_num_changes))
if total_num_changes != 0:
    self.log.info('prepared {} read changes'.format(total_num_changes))
if total_num_changes == 0:
    self.log.debug('prepared {} read changes'.format(total_num_changes))
    self.no_optimization_needed = True
    return -errno.EALREADY, msg
Member


From lines 1142-1151, I think it would be cleaner to do this:

if total_num_changes > 0:
    self.log.info('prepared {} read changes'.format(total_num_changes))
else:
    self.log.debug('prepared {} read changes'.format(total_num_changes))
    self.no_optimization_needed = True
    return -errno.EALREADY, msg

When I originally wrote this, I differentiated between a negative number of changes and 0 changes. But, since it should be impossible to ever hit negative changes (we should still log it though if it somehow happens), I think it is better to combine it with the case of 0 total changes. In both cases, we set no_optimization_needed=True. And the error message applies to both cases. It would be obvious by the "prepared X changes" log message which case we hit.

Contributor Author


This sounds good; I will implement the change.

Comment on lines 1214 to 1221
if total_did != 0:
    self.log.info('prepared %d/%d upmap changes' % (total_did, max_optimizations))
if total_did == 0:
    self.log.debug('prepared %d/%d upmap changes' % (total_did, max_optimizations))
    self.no_optimization_needed = True
    return -errno.EALREADY, 'Unable to find further optimization, ' \
                            'or pool(s) pg_num is decreasing, ' \
                            'or distribution is already perfect'
Member


Suggested change
if total_did != 0:
    self.log.info('prepared %d/%d upmap changes' % (total_did, max_optimizations))
if total_did == 0:
    self.log.debug('prepared %d/%d upmap changes' % (total_did, max_optimizations))
    self.no_optimization_needed = True
    return -errno.EALREADY, 'Unable to find further optimization, ' \
                            'or pool(s) pg_num is decreasing, ' \
                            'or distribution is already perfect'

if total_did > 0:
    self.log.info('prepared %d/%d upmap changes' % (total_did, max_optimizations))
else:
    self.log.debug('prepared %d/%d upmap changes' % (total_did, max_optimizations))
    self.no_optimization_needed = True
    return -errno.EALREADY, 'Unable to find further optimization, ' \
                            'or pool(s) pg_num is decreasing, ' \
                            'or distribution is already perfect'

Same as with do_read_balancing. We only care about logging "info" when the changes are greater than 0. In the "else" case, we will log 0 and negative changes. Negative changes should not ever happen, but it's good to log it just in case.

Contributor Author


I will also implement this as part of a new commit.

Comment on lines +1308 to +1313
if left == max_iterations:
    self.log.info('Balancing root %s (pools %s) by %s' %
                  (root, pools, key))
else:
    self.log.debug('Balancing root %s (pools %s) by %s' %
                   (root, pools, key))
Member

@ljflores ljflores Nov 14, 2025


Is the idea here to only log the "balancing root" message as "info" one time per root at the beginning of the while loop, and then do the rest at "debug"?

Contributor Author


Yes, you are correct; that is the idea for this change.
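The once-per-root idea discussed here can be sketched roughly like this. This is a hypothetical simplified loop (the `balance_root` helper and return value are invented for illustration), not the actual balancer code:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger('balancer-sketch')


def balance_root(root, max_iterations=10):
    """Emit the 'Balancing root' message at info level only on the first
    iteration of the loop; repeats of the same message go to debug level.
    Returns the level used on each iteration, for illustration."""
    levels = []
    left = max_iterations
    while left > 0:
        if left == max_iterations:
            # first pass: the counter has not been decremented yet
            log.info('Balancing root %s', root)
            levels.append('info')
        else:
            # subsequent passes repeat the same information, so demote
            log.debug('Balancing root %s', root)
            levels.append('debug')
        left -= 1
    return levels
```

At the default info log level, a user would then see one "Balancing root" line per root rather than one per iteration.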

@ljflores
Member

jenkins retest this please

In order to reduce the overall log load at info level, logs that
were deemed unnecessary for users were moved to debug
level. This includes logs that repeat information and logs
that track no actual balancing changes.

Signed-off-by: Timothy Q Nguyen <timqn22@gmail.com>
Improved the quality of one of the info level logs by
adding more information to it and removing a
similar log in debug level. This is in an effort to
provide maximum information at info level without
clogging up the info level logs with multiple logs.
I've also added a check for the "balancing root"
logs which are repeated multiple times without
adding new information. Now only the first of the
repeating "balancing root" logs is logged at info
level, the rest will be logged in debug level.

Signed-off-by: Timothy Q Nguyen <timqn22@gmail.com>
Improved logic of log code by reducing redundant
lines and combining cases.

Signed-off-by: Timothy Q Nguyen <timqn22@gmail.com>
@ljflores ljflores force-pushed the balancer-info-log-reduction branch from 38e40c2 to 9ef4391 Compare November 17, 2025 17:40
@ljflores
Member

ljflores commented Nov 17, 2025

The branch needed to be rebased to fix the doc failure (unrelated to the changes in this PR).

Here are the steps I took:

cd ceph
git remote add timqn22 git@github.com:timqn22/ceph.git
git fetch timqn22
git checkout --track timqn22/balancer-info-log-reduction
git remote -v   # check which remote points to git@github.com:ceph/ceph.git; in my case, it's "origin"
git fetch origin
git pull origin main --rebase
git log         # verify that commits look okay
git push -f timqn22 balancer-info-log-reduction

@ljflores
Member

We can add the "needs-qa" label once the CI checks pass.

@Naveenaidu
Contributor

RADOS Approved: https://tracker.ceph.com/issues/73968#note-3

@pdvian pdvian merged commit 4a04625 into ceph:main Dec 15, 2025
13 checks passed
@ceph-jenkins
Collaborator

Can one of the admins verify this patch?
