
mgr/balancer: reduce info-level logs#66068

Merged
pdvian merged 3 commits into ceph:main from
timqn22:balancer-info-log-reduction
Dec 15, 2025
Conversation

@timqn22
Contributor

@timqn22 timqn22 commented Oct 27, 2025

This change demotes many info-level logs in the Balancer module to debug level and adds if/else logic so that info-level logs are emitted only when a balancing action is actually happening. This is in an effort to reduce the overall volume of logs the Balancer produces for clients.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2395894

Upstream Tracker: https://tracker.ceph.com/issues/73656

Signed-off-by: Timothy Q Nguyen timqn22@gmail.com
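The log-level gating pattern this PR describes can be sketched as follows. This is a minimal illustration with a hypothetical `report_changes` helper and logger name, not the actual balancer code:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger('balancer-sketch')


def report_changes(num_changes):
    """Log at info level only when balancing actually prepared changes;
    otherwise demote the message to debug level."""
    if num_changes > 0:
        log.info('prepared %d changes', num_changes)
        return True
    # 0 (or an unexpected negative count) means no optimization was needed,
    # so the message goes to debug to keep the info log quiet
    log.debug('prepared %d changes', num_changes)
    return False


report_changes(3)   # emits at info level
report_changes(0)   # emits at debug level only
```

With a cluster that is already balanced, the module would then produce no info-level output at all on an idle pass, which is the reduction the PR is after.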

@timqn22 timqn22 force-pushed the balancer-info-log-reduction branch 2 times, most recently from 7f2f80f to feccf2d Compare October 27, 2025 22:05
@ljflores ljflores added the core label Nov 14, 2025
@ljflores ljflores self-requested a review November 14, 2025 18:20
Member

@ljflores ljflores left a comment


Hey @timqn22, thanks for the improvements! I requested a few changes.

Comment on lines +1302 to +1306
self.log.debug('Skipping root %s (pools %s), total pgs %d '
               '< minimum %d (%d per osd)',
               root, pools,
               best_pe.total_by_root[root][key],
               min_pgs, min_pg_per_osd)
Member


Can you fix this so only "debug" is modified and not the spacing? This will just help minimize the overall number of changes.

Comment on lines 1143 to 1151
    self.no_optimization_needed = True
    self.log.debug('unable to balance reads.')
    return -errno.EALREADY, msg
self.log.info('prepared {} read changes'.format(total_num_changes))
if total_num_changes != 0:
    self.log.info('prepared {} read changes'.format(total_num_changes))
if total_num_changes == 0:
    self.log.debug('prepared {} read changes'.format(total_num_changes))
    self.no_optimization_needed = True
    return -errno.EALREADY, msg
Member


From lines 1142-1151, I think it would be cleaner to do this:

if total_num_changes > 0:
    self.log.info('prepared {} read changes'.format(total_num_changes))
else:
    self.log.debug('prepared {} read changes'.format(total_num_changes))
    self.no_optimization_needed = True
    return -errno.EALREADY, msg

When I originally wrote this, I differentiated between a negative number of changes and 0 changes. But, since it should be impossible to ever hit negative changes (we should still log it though if it somehow happens), I think it is better to combine it with the case of 0 total changes. In both cases, we set no_optimization_needed=True. And the error message applies to both cases. It would be obvious by the "prepared X changes" log message which case we hit.

Contributor Author


This sounds good; I will implement the change.

Comment on lines 1214 to 1221
if total_did != 0:
    self.log.info('prepared %d/%d upmap changes' % (total_did, max_optimizations))
if total_did == 0:
    self.log.debug('prepared %d/%d upmap changes' % (total_did, max_optimizations))
    self.no_optimization_needed = True
    return -errno.EALREADY, 'Unable to find further optimization, ' \
                            'or pool(s) pg_num is decreasing, ' \
                            'or distribution is already perfect'
Member


Suggested change
if total_did != 0:
    self.log.info('prepared %d/%d upmap changes' % (total_did, max_optimizations))
if total_did == 0:
    self.log.debug('prepared %d/%d upmap changes' % (total_did, max_optimizations))
    self.no_optimization_needed = True
    return -errno.EALREADY, 'Unable to find further optimization, ' \
                            'or pool(s) pg_num is decreasing, ' \
                            'or distribution is already perfect'

if total_did > 0:
    self.log.info('prepared %d/%d upmap changes' % (total_did, max_optimizations))
else:
    self.log.debug('prepared %d/%d upmap changes' % (total_did, max_optimizations))
    self.no_optimization_needed = True
    return -errno.EALREADY, 'Unable to find further optimization, ' \
                            'or pool(s) pg_num is decreasing, ' \
                            'or distribution is already perfect'

Same as with do_read_balancing. We only care about logging "info" when the changes are greater than 0. In the "else" case, we will log 0 and negative changes. Negative changes should not ever happen, but it's good to log it just in case.

Contributor Author


I will also implement this as part of a new commit.

Comment on lines +1308 to +1313
if left == max_iterations:
    self.log.info('Balancing root %s (pools %s) by %s' %
                  (root, pools, key))
else:
    self.log.debug('Balancing root %s (pools %s) by %s' %
                   (root, pools, key))
Member

@ljflores ljflores Nov 14, 2025


Is the idea here to only log the "balancing root" message as "info" one time per root at the beginning of the while loop, and then do the rest at "debug"?

Contributor Author


Yes, you are correct; that is the idea for this change.
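The once-per-root idea discussed here can be sketched roughly like this. This is a hypothetical simplified loop (the `balance_root` helper and return value are invented for illustration), not the actual balancer code:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger('balancer-sketch')


def balance_root(root, max_iterations=10):
    """Emit the 'Balancing root' message at info level only on the first
    iteration of the loop; repeats of the same message go to debug level.
    Returns the level used on each iteration, for illustration."""
    levels = []
    left = max_iterations
    while left > 0:
        if left == max_iterations:
            # first pass: the counter has not been decremented yet
            log.info('Balancing root %s', root)
            levels.append('info')
        else:
            # subsequent passes repeat the same information, so demote
            log.debug('Balancing root %s', root)
            levels.append('debug')
        left -= 1
    return levels
```

At the default info log level, a user would then see one "Balancing root" line per root rather than one per iteration.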

@ljflores
Member

jenkins retest this please

In order to reduce the overall log load at info level, logs that
were deemed unnecessary for users were moved to debug
level. This includes logs that repeat information and logs
that track no actual balancing changes.

Signed-off-by: Timothy Q Nguyen <timqn22@gmail.com>
Improved the quality of one of the info level logs by
adding more information to it and removing a
similar log in debug level. This is in an effort to
provide maximum information at info level without
clogging up the info level logs with multiple logs.
I've also added a check for the "balancing root"
logs which are repeated multiple times without
adding new information. Now only the first of the
repeating "balancing root" logs is logged at info
level, the rest will be logged in debug level.

Signed-off-by: Timothy Q Nguyen <timqn22@gmail.com>
Improved logic of log code by reducing redundant
lines and combining cases.

Signed-off-by: Timothy Q Nguyen <timqn22@gmail.com>
@ljflores ljflores force-pushed the balancer-info-log-reduction branch from 38e40c2 to 9ef4391 Compare November 17, 2025 17:40
@ljflores
Member

ljflores commented Nov 17, 2025

The branch needed to be rebased to fix the doc failure (unrelated to the changes in this PR).

Here are the steps I took:

cd ceph
git remote add timqn22 git@github.com:timqn22/ceph.git
git fetch timqn22
git checkout --track timqn22/balancer-info-log-reduction
git remote -v   # check which remote points to git@github.com:ceph/ceph.git; in my case, it's "origin"
git fetch origin
git pull origin main --rebase
git log         # verify that commits look okay
git push -f timqn22 balancer-info-log-reduction

@ljflores
Member

We can add the "needs-qa" label once the CI checks pass.

@Naveenaidu
Contributor

RADOS Approved: https://tracker.ceph.com/issues/73968#note-3

@pdvian pdvian merged commit 4a04625 into ceph:main Dec 15, 2025
13 checks passed
@ceph-jenkins
Collaborator

Can one of the admins verify this patch?
