Bug #68657
closed
squid: mgr/balancer preventing orchestrator and dashboard functionality
Description
Hi Team,
We have been seeing poor mgr/balancer (upmap mode) behaviour after upgrading from Reef 18.4.2 to Squid 19.2.0: when the balancer is turned on it instantly breaks the cephadm orchestrator (all ceph orch commands hang) and the dashboard (which freezes and ultimately returns a 503 error when loading). We run Docker daemons (Docker version 25.0.1, build 29cf629) on bare-metal Debian (5.10.0-27-amd64 #1 SMP Debian 5.10.205-2 x86_64 GNU/Linux). We first hit this during the upgrade itself, since mgr daemons are upgraded first: the Ceph dashboard UI broke right after the mgrs pulled and started using the 19.2.0 images. We had to turn off the balancer for the upgrade to complete and are keeping it off most of the time now. The balancer itself seems to work in the background, producing some backfills when turned on.
Detailed log trace below.
Our cluster health status (the issue happens with HEALTH_OK as well):
root@ceph-node001 ~ # ceph -s
cluster:
id: 96df99f6-fc1a-11ea-90a4-6cb3113cb732
health: HEALTH_WARN
1 backfillfull osd(s)
1 pgs not deep-scrubbed in time
14 pool(s) backfillfull
services:
mon: 5 daemons, quorum ceph-node004,ceph-node003,ceph-node001,ceph-node005,ceph-node002 (age 6h)
mgr: ceph-node001.hgythj(active, since 53m), standbys: ceph-node002.jphtvg
mds: 20/20 daemons up, 12 standby
osd: 384 osds: 384 up (since 6h), 384 in (since 10d); 9 remapped pgs
rbd-mirror: 2 daemons active (2 hosts)
rgw: 64 daemons active (32 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 14 pools, 8681 pgs
objects: 755.16M objects, 1.5 PiB
usage: 4.6 PiB used, 1.1 PiB / 5.6 PiB avail
pgs: 644454/2265477489 objects misplaced (0.028%)
7545 active+clean
583 active+clean+scrubbing
544 active+clean+scrubbing+deep
9 active+remapped+backfilling
io:
client: 4.8 GiB/s rd, 32 MiB/s wr, 34.33k op/s rd, 919 op/s wr
recovery: 111 MiB/s, 51 objects/s
Balancer configs are set as below:
global  advanced  mgr/balancer/active                   false
global  advanced  mgr/balancer/log_level                debug
global  advanced  mgr/balancer/log_to_cluster           true
mgr     advanced  mgr/balancer/log_to_cluster_level     debug
mgr     advanced  mgr/balancer/mode                     upmap
global  basic     mgr/balancer/upmap_max_optimizations  20
To keep track of the log entries here are the timestamps:
root@ceph-node001 ~ # date
Mon 21 Oct 2024 10:47:33 AM UTC
root@ceph-node001 ~ # ceph balancer on
root@ceph-node001 ~ # date
Mon 21 Oct 2024 10:55:12 AM UTC
root@ceph-node001 ~ # ceph balancer off
While the balancer is on, ceph orch commands either hang or return the following messages:
root@ceph-node001 ~ # ceph orch ls
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'orchestrator' is not enabled/loaded (required by command 'orch ls'): use `ceph mgr module enable orchestrator` to enable it
root@ceph-node001 ~ # ceph mgr module enable orchestrator
module 'orchestrator' is already enabled (always-on)
root@ceph-node001 ~ # ceph orch ls
Error ENOTSUP: Module 'orchestrator' is not enabled/loaded (required by command 'orch ls'): use `ceph mgr module enable orchestrator` to enable it
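As a stopgap while the modules are still loading, a transiently failing command like this can be wrapped in a small retry loop. This is an illustrative helper, not part of Ceph; the attempt count and delay are arbitrary placeholders:

```python
import subprocess
import time

def retry(cmd, attempts=10, delay=5):
    """Re-run cmd until it exits 0, retrying transient failures such as
    the ENOTSUP returned while mgr modules are still loading."""
    for _ in range(attempts):
        if subprocess.run(cmd).returncode == 0:
            return True
        time.sleep(delay)
    return False

# Against a live cluster this might be: retry(["ceph", "orch", "ls"])
```

This only papers over the symptom; if the module never finishes loading (as described below), the retries will simply keep failing.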
Active ceph-node001.hgythj mgr log:
debug 2024-10-21T10:47:39.667+0000 7f38cdbc7640 0 log_channel(audit) log [DBG] : from='client.22145222 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
debug 2024-10-21T10:47:40.499+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1618: 8681 pgs: 9 active+remapped+backfilling, 582 active+clean+scrubbing, 7545 active+clean, 545 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.5 GiB/s rd, 165 MiB/s wr, 39.61k op/s; 641560/2265483588 objects misplaced (0.028%); 120 MiB/s, 50 objects/s recovering
[21/Oct/2024:10:47:40] ENGINE Bus STOPPING
debug 2024-10-21T10:47:40.883+0000 7f38d6f59640 0 [balancer DEBUG root] Waking up [active, now 2024-10-21_10:47:40]
debug 2024-10-21T10:47:40.883+0000 7f38d6f59640 0 [balancer DEBUG root] Running
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option ssh_config_file = None
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option device_cache_timeout = 1800
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option device_enhanced_scan = False
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option inventory_list_all = False
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option daemon_cache_timeout = 600
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option facts_cache_timeout = 60
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option host_check_interval = 600
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option mode = root
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_base = quay.io/ceph/ceph
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_prometheus = quay.io/prometheus/prometheus:v2.43.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_nvmeof = quay.io/ceph/nvmeof:1.2.5
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_grafana = quay.io/ceph/grafana:9.4.12
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_alertmanager = quay.io/prometheus/alertmanager:v0.25.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_node_exporter = quay.io/prometheus/node-exporter:v1.5.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_loki = docker.io/grafana/loki:3.0.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_promtail = docker.io/grafana/promtail:3.0.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_haproxy = quay.io/ceph/haproxy:2.3
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_keepalived = quay.io/ceph/keepalived:2.2.4
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_snmp_gateway = docker.io/maxwo/snmp-notifier:v1.2.1
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_elasticsearch = quay.io/omrizeneva/elasticsearch:6.8.23
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_jaeger_agent = quay.io/jaegertracing/jaeger-agent:1.29
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_jaeger_collector = quay.io/jaegertracing/jaeger-collector:1.29
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_jaeger_query = quay.io/jaegertracing/jaeger-query:1.29
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_samba = quay.io/samba.org/samba-server:devbuilds-centos-amd64
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option warn_on_stray_hosts = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option warn_on_stray_daemons = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option warn_on_failed_host_check = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_to_cluster = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option allow_ptrace = False
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_init = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option prometheus_alerts_path = /etc/prometheus/ceph/ceph_default_alerts.yml
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option grafana_dashboards_path = /etc/grafana/dashboards/ceph-dashboard/
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option migration_current = 7
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option config_dashboard = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option manage_etc_ceph_ceph_conf = False
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option manage_etc_ceph_ceph_conf_hosts = *
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option registry_url = None
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option registry_username = None
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option registry_password = None
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option registry_insecure = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option use_repo_digest = True
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option config_checks_enabled = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option default_registry = docker.io
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option max_count_per_host = 10
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option autotune_memory_target_ratio = 0.7
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option autotune_interval = 600
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option use_agent = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option agent_refresh_rate = 20
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option agent_starting_port = 4721
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option agent_down_multiplier = 3.0
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option hw_monitoring = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option max_osd_draining_count = 10
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option service_discovery_port = 8765
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option cgroups_split = True
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_refresh_metadata = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option secure_monitoring_stack = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option default_cephadm_command_timeout = 900
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option cephadm_log_destination =
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option oob_default_addr = 169.254.1.1
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_level = debug
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_to_file = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_to_cluster_level = info
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer INFO root] Optimize plan auto_2024-10-21_10:47:40
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer INFO root] Mode upmap, max misplaced 0.050000
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer DEBUG root] unknown 0.000000 degraded 0.000000 inactive 0.000000 misplaced 0.000283189
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer INFO root] do_upmap
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer INFO root] pools ['cephfs.storage.data', 'my-k8s-cluster4.ceph-csi.rbd', 'europe-1.rgw.buckets.index', 'my-k8s-cluster1.ceph-csi.rbd', 'europe-1.rgw.log', 'my-k8s-cluster2.ceph-csi.rbd', 'my-k8s-cluster3.ceph-csi.rbd', '.rgw.root', 'europe-1.rgw.meta', 'europe-1.rgw.buckets.data', 'europe-1.rgw.buckets.non-ec', 'cephfs.storage.meta', '.mgr', 'europe-1.rgw.control']
debug 2024-10-21T10:47:41.183+0000 7f38d6f59640 0 [balancer INFO root] prepared 0/10 upmap changes
After this entry the log contains mostly pgmap entries. The orch ls commands are visible in between:
...
debug 2024-10-21T10:47:58.815+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1627: 8681 pgs: 9 active+remapped+backfilling, 582 active+clean+scrubbing, 7545 active+clean, 545 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.8 GiB/s rd, 150 MiB/s wr, 17.11k op/s; 640643/2265486588 objects misplaced (0.028%); 113 MiB/s, 52 objects/s recovering
debug 2024-10-21T10:47:59.595+0000 7f38cdbc7640 0 log_channel(audit) log [DBG] : from='client.22116924 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
debug 2024-10-21T10:48:00.847+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1628: 8681 pgs: 9 active+remapped+backfilling, 582 active+clean+scrubbing, 7545 active+clean, 545 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.4 GiB/s rd, 128 MiB/s wr, 15.22k op/s; 640534/2265486750 objects misplaced (0.028%); 112 MiB/s, 50 objects/s recovering
...
debug 2024-10-21T10:51:23.764+0000 7f38cdbc7640 0 log_channel(audit) log [DBG] : from='client.22114830 -' entity='client.admin' cmd=[{"prefix": "orch ls", "target": ["mon-mgr", ""]}]: dispatch
debug 2024-10-21T10:51:24.256+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1728: 8681 pgs: 9 active+remapped+backfilling, 583 active+clean+scrubbing, 7545 active+clean, 544 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.7 GiB/s rd, 216 MiB/s wr, 6.05k op/s; 629911/2265503763 objects misplaced (0.028%); 115 MiB/s, 53 objects/s recovering
...
debug 2024-10-21T10:53:36.474+0000 7f3976376640 0 client.0 ms_handle_reset on v2:162.55.93.25:6850/669498911
debug 2024-10-21T10:53:38.410+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1794: 8681 pgs: 9 active+remapped+backfilling, 583 active+clean+scrubbing, 7545 active+clean, 544 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.9 GiB/s rd, 34 MiB/s wr, 3.86k op/s; 622688/2265511929 objects misplaced (0.027%); 119 MiB/s, 56 objects/s recovering
debug 2024-10-21T10:53:40.430+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1795: 8681 pgs: 9 active+remapped+backfilling, 583 active+clean+scrubbing, 7545 active+clean, 544 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.4 GiB/s rd, 32 MiB/s wr, 3.20k op/s; 622688/2265511950 objects misplaced (0.027%); 99 MiB/s, 44 objects/s recovering
debug 2024-10-21T10:53:40.682+0000 7f38cdbc7640 0 log_channel(audit) log [DBG] : from='client.22145247 -' entity='client.admin' cmd=[{"prefix": "orch ls", "target": ["mon-mgr", ""]}]: dispatch
...
After a while the balancer kicks in again:
debug 2024-10-21T10:53:54.478+0000 7f38d6f59640 0 [balancer DEBUG root] Sleeping for 60
debug 2024-10-21T10:53:54.482+0000 7f38c0bad640 0 [orchestrator DEBUG root] _oremote orchestrator -> cephadm.describe_service(*(None, None), **{'refresh': False})
debug 2024-10-21T10:53:54.526+0000 7f38d6f59640 0 [balancer DEBUG root] Waking up [active, now 2024-10-21_10:53:54]
debug 2024-10-21T10:53:54.526+0000 7f38d6f59640 0 [balancer DEBUG root] Running
debug 2024-10-21T10:53:54.658+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1802: 8681 pgs: 9 active+remapped+backfilling, 583 active+clean+scrubbing, 7545 active+clean, 544 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 3.6 GiB/s rd, 23 MiB/s wr, 2.77k op/s; 621874/2265512487 objects misplaced (0.027%); 95 MiB/s, 45 objects/s recovering
debug 2024-10-21T10:53:54.714+0000 7f38ade88640 0 log_channel(stats) log [WRN] : cmdtag not found in client metadata
debug 2024-10-21T10:53:55.090+0000 7f38d6f59640 0 [balancer INFO root] Optimize plan auto_2024-10-21_10:53:54
debug 2024-10-21T10:53:55.090+0000 7f38d6f59640 0 [balancer INFO root] Mode upmap, max misplaced 0.050000
debug 2024-10-21T10:53:55.090+0000 7f38d6f59640 0 [balancer DEBUG root] unknown 0.000000 degraded 0.000000 inactive 0.000000 misplaced 0.000274496
debug 2024-10-21T10:53:55.090+0000 7f38d6f59640 0 [balancer INFO root] do_upmap
debug 2024-10-21T10:53:55.094+0000 7f38d6f59640 0 [balancer INFO root] pools ['europe-1.rgw.meta', 'europe-1.rgw.buckets.non-ec', 'my-k8s-cluster1.ceph-csi.rbd', 'cephfs.storage.data', 'my-k8s-cluster2.ceph-csi.rbd', 'europe-1.rgw.log', 'europe-1.rgw.control', 'europe-1.rgw.buckets.data', 'my-k8s-cluster3.ceph-csi.rbd', '.mgr', 'my-k8s-cluster4.ceph-csi.rbd', '.rgw.root', 'cephfs.storage.meta', 'europe-1.rgw.buckets.index']
debug 2024-10-21T10:53:56.402+0000 7f38c0bad640 0 [orchestrator DEBUG root] _oremote orchestrator -> cephadm.describe_service(*(None, None), **{'refresh': False})
debug 2024-10-21T10:53:56.466+0000 7f38d6f59640 0 [balancer INFO root] prepared 0/10 upmap changes
This repeats in an infinite loop roughly every 5 minutes. To get the orchestrator and dashboard working again we need to turn off the balancer and restart/fail over the mgr daemons.
The monitor logs don't seem to contain any useful information apart from differing entries about the balancer being turned on (not sure whether the difference between monitor daemons is expected).
Other than that they seem to be performing their usual tasks, with a lot of entries like "entity='mgr.ceph-node002.jphtvg' cmd=[{"prefix": "balancer status", "format": "json"}]: dispatch"
root@ceph-node001 ~ # grep mgr/balancer/active mon-log
debug 2024-10-21T10:47:39.703+0000 7f31d6d07640 0 mon.ceph-node001@2(peon) e253 handle_command mon_command([{prefix=config set, name=mgr/balancer/active}] v 0)
root@ceph-node001 ~ # grep "balancer on" mon-log
root@ceph-node001 ~ #
root@ceph-node002 ~ # grep mgr/balancer/active mon-log
root@ceph-node002 ~ # grep "balancer on" mon-log
root@ceph-node002 ~ #
root@ceph-node003 ~ # grep mgr/balancer/active mon-log
root@ceph-node003 ~ # grep "balancer on" mon-log
audit 2024-10-21T10:47:39.670403+0000 mgr.ceph-node001.hgythj (mgr.22123253) 2832 : audit [DBG] from='client.22145222 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
root@ceph-node003 ~ #
root@ceph-node004 ~ # grep mgr/balancer/active mon-log
debug 2024-10-21T10:47:39.953+0000 7fc9766eb640 0 mon.ceph-node004@0(leader) e253 handle_command mon_command([{prefix=config set, name=mgr/balancer/active}] v 0)
root@ceph-node004 ~ # grep "balancer on" mon-log
audit 2024-10-21T10:47:39.670403+0000 mgr.ceph-node001.hgythj (mgr.22123253) 2832 : audit [DBG] from='client.22145222 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
root@ceph-node004 ~ #
root@ceph-node005 ~ # grep mgr/balancer/active mon-log
root@ceph-node005 ~ # grep "balancer on" mon-log
audit 2024-10-21T10:47:39.670403+0000 mgr.ceph-node001.hgythj (mgr.22123253) 2832 : audit [DBG] from='client.22145222 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
root@ceph-node005 ~ #
If anyone could point out where to look further with this issue we would be beyond grateful, as we're currently stuck with a non-working balancer.
I'm attaching full logs for the managers and monitors (though I can't tell offhand which monitor was at which rank).
Thanks in advance,
Laimis J.
Updated by John Mulligan over 1 year ago
- Tracker changed from Support to Bug
- Project changed from Ceph to mgr
- Regression set to No
- Severity set to 3 - minor
Updated by John Mulligan over 1 year ago
Just to be clear: there are no actual "crashes" - no cores or python tracebacks that you see, correct?
This issue is that when the balancer module is loaded, the orchestrator and dashboard modules fail to function, right?
Updated by Laimis Juzeliunas over 1 year ago
- Subject changed from squid: mgr/balancer crashing orchestrator and dashboard to squid: mgr/balancer preventing orchestrator and dashboard functionality
John Mulligan wrote in #note-2:
Just to be clear: there are no actual "crashes" - no cores or python tracebacks that you see, correct?
This issue is that when the balancer module is loaded, the orchestrator and dashboard modules fail to function, right?
Yes, correct - we have not observed any crash tracebacks. The orchestrator fails to function and the dashboard returns a "503 Service Unavailable" message.
Apologies for the poor wording - changed it for the tracker.
Updated by Brad Hubbard over 1 year ago
- Related to Bug #68648: Commands using Mgr Modules fail with ENOTSUP if run immediately post a mgr failover/ restart added
Updated by Brad Hubbard over 1 year ago
It's my belief this should be returning EAGAIN, not ENOTSUP, and I have a fix for that to resolve Bug #68648:
Module 'orchestrator' is not enabled/loaded (required by command 'orch ls'): use `ceph mgr module enable orchestrator` to enable it
Updated by Brad Hubbard over 1 year ago
Laimis Juzeliunas wrote in #note-3:
John Mulligan wrote in #note-2:
Just to be clear: there are no actual "crashes" - no cores or python tracebacks that you see, correct?
This issue is that when the balancer module is loaded, the orchestrator and dashboard modules fail to function, right?

Yes, correct - we have not observed any crash tracebacks. Orchestrator fails to function and dashboard brings back "503 Service Unavailable" message.
Apologies for the poor wording - changed it for the tracker.
The current theory is that the loading of the other modules is serialised behind the loading of the balancer module, and the current avenue of investigation IMHO should be why the balancer is taking so long to finish loading.
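This serialisation can be sketched as a toy model (pure Python, purely illustrative; the startup times are made up, and the alphabetical ordering is per the analysis later in this thread): mgr modules load one at a time, so a single slow module delays every module that loads after it.

```python
import time

def load_modules(startup_times):
    """Load modules sequentially in alphabetical order; one slow module
    delays everything that loads after it."""
    ready = []
    for name in sorted(startup_times):
        time.sleep(startup_times[name])  # simulate the module's startup work
        ready.append(name)
    return ready

# 'balancer' sorts before 'cephadm', 'dashboard' and 'orchestrator', so a
# slow balancer startup stalls all three.
order = load_modules({"balancer": 0.2, "cephadm": 0.0,
                      "dashboard": 0.0, "orchestrator": 0.0})
```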
Updated by Laimis Juzeliunas over 1 year ago
Brad Hubbard wrote in #note-6:
Laimis Juzeliunas wrote in #note-3:
John Mulligan wrote in #note-2:
Just to be clear: there are no actual "crashes" - no cores or python tracebacks that you see, correct?
This issue is that when the balancer module is loaded, the orchestrator and dashboard modules fail to function, right?

Yes, correct - we have not observed any crash tracebacks. Orchestrator fails to function and dashboard brings back "503 Service Unavailable" message.
Apologies for the poor wording - changed it for the tracker.

Current theory is that the loading of the other modules is serialised behind the loading of the balancer module, and the current avenue of investigation IMHO should be why the balancer is taking so long to finish loading.
Agreed. One important note: we have observed some orchestrator commands actually completing after a long wait with the balancer turned on.
At this point it looks a bit random whether an orch command will complete or come back with the 'not enabled/loaded' message; I'll see if we can find some correlation.
Updated by Laura Flores over 1 year ago · Edited
@laimis9133 do these observations about #68648 ring true in your case?
- In our reproducer (described in #68648), this error message took up to 15-30 minutes (or longer) to hit on a freshly deployed test cluster.
- Once hit, the error presented much more frequently.
- The error only presents after failing the mgr, then immediately running a "ceph orch", "ceph progress", or "ceph crash" command.
- Even when the error does present, waiting a few seconds for the module to finish loading after failing the mgr allows the commands to work as expected.
Let me know if any of the above details differ from your experience.
I mainly want to confirm whether, rather than disabling the balancer, waiting several seconds after failing the mgr allows `ceph orch ls` to work as expected. It looks like that is the case from your comment "we have observed some orchestrator commands actually completing after a long wait with the balancer turned on."
Updated by Laura Flores over 1 year ago
- Status changed from New to In Progress
- Pull request ID set to 60523
I worked on reproducing this and found that after increasing the number of pgs on a cluster, the error reproduced much more consistently. I also tested turning off the balancer, which decreased the number of times the error came up.
After looking into why, I found it is related to this commit (https://github.com/ceph/ceph/commit/f3c74a7be4bb7502dc63aaeba8b0bac683c27dc0), in which I had added a "ceph balancer status detail" command that includes more details about which pgs were updated during the last optimization. The current implementation pulls the osdmap and iterates through every pg_upmap_items entry at one point, and that code is called from the "serve" function, which runs while the balancer module is loading. This increased the balancer loading time, which led to subsequent modules not loading in time after the mgr failover, since modules are loaded one at a time, alphabetically.
I'm testing a fix, which optimizes how we update the pg upmap activity.
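The cost difference can be sketched in plain Python (illustrative only; this is not the actual balancer code, and the data shapes and pool/PG identifiers are invented):

```python
def full_scan(pg_upmap_items):
    """Rebuild the whole upmap-activity table by walking every entry pulled
    from the osdmap -- roughly the shape of the original 'status detail'
    code, run on each serve() wakeup."""
    return {pg: list(mappings) for pg, mappings in pg_upmap_items.items()}

def incremental_update(activity, changed):
    """Apply only the entries touched by the last optimization -- the shape
    of the optimized approach."""
    activity.update(changed)
    return activity

# With thousands of PGs the full scan dominates module startup time,
# while the incremental update touches only what changed.
items = {("pool", i): [(i, i + 1)] for i in range(50_000)}
activity = full_scan(items)                                       # O(all PGs)
activity = incremental_update(activity, {("pool", 0): [(0, 7)]})  # O(changed)
```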
Updated by Laura Flores over 1 year ago
Upon hitting the error, I experimented with turning off the balancer, and while that greatly decreases the number of times the error is hit, it still comes up occasionally. Below, I had the balancer turned off and still hit the error after many iterations:
2024-10-28T22:38:42.422+0000 7f3c5a5c9640 -1 WARNING: all dangerous and experimental features are enabled.
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:44.423+0000 7fb467d6f640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:44.432+0000 7fb467d6f640 -1 WARNING: all dangerous and experimental features are enabled.
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:44.916+0000 7fc365f61640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:44.925+0000 7fc365f61640 -1 WARNING: all dangerous and experimental features are enabled.
Telemetry is off. Please consider opting-in with `ceph telemetry on`.
Preview sample reports with `ceph telemetry preview`.
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:46.984+0000 7fa685361640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:46.993+0000 7fa685361640 -1 WARNING: all dangerous and experimental features are enabled.
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:47.465+0000 7f4a9b6d4640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:47.474+0000 7f4a9b6d4640 -1 WARNING: all dangerous and experimental features are enabled.
^CInterrupted
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:49.774+0000 7ff289334640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:49.782+0000 7ff289334640 -1 WARNING: all dangerous and experimental features are enabled.
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:50.276+0000 7fc8e9e43640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:50.284+0000 7fc8e9e43640 -1 WARNING: all dangerous and experimental features are enabled.
^CInterrupted
^C
[lflores@folio01 build]$ ./bin/ceph balancer status
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:56.624+0000 7f6b962ea640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:56.633+0000 7f6b962ea640 -1 WARNING: all dangerous and experimental features are enabled.
{
"active": false,
"last_optimize_duration": "",
"last_optimize_started": "",
"mode": "upmap-read",
"no_optimization_needed": false,
"optimize_result": "",
"plans": []
}
This is why Brad's fix in #68648 is still a good idea to add.
Updated by Laimis Juzeliunas over 1 year ago
- File mgr-prom-metrics.png added
- File mgr-log-2 added
Laura Flores wrote in #note-9:
@laimis9133 do these observations about #68648 ring true in your case?
- In our reproducer (described in #68648), this error message took up to 15-30 minutes (or longer) to hit on a freshly deployed test cluster.
- Once hit, the error presented much more frequently.
- The error only presents after failing the mgr, then immediately running a "ceph orch", "ceph progress", or "ceph crash" command.
- Even when the error does present, waiting a few seconds for the module to finish loading after failing the mgr allows the commands to work as expected.
Let me know if any of the above details differ from your experience.
I mainly want to confirm if, rather than disabling the balancer, if waiting several seconds after failing the mgr allows `ceph orch ls` to work as expected. It looks like that is the case from your comment "we have observed some orchestrator commands actually completing after a long wait with the balancer turned on."
Thanks a lot for picking this up, Laura! The points you mention are very similar to our experience, but there are some differences in what we observe (they might be related to our somewhat different setup).
We keep the cluster running at all times, so what we observe with the balancer turned on is a mgr failover every 15-18 minutes. I'm attaching a visualisation from Prometheus: it shows that with the balancer on, cluster metrics are briefly exposed when the mgr starts after a failover, for a very short period. The dashboard UI also becomes available for a very short time before returning to 503 once the balancer kicks in. The logs for one such mgr cycle are also attached.
One observation is that roughly 5 minutes into the mgr failover, the active mgr throws this stacktrace:
debug 2024-10-26T07:05:29.041+0000 7f838fb8b640 0 [cephadm DEBUG root] Cherrypy engine started.
debug 2024-10-26T07:05:29.041+0000 7f838fb8b640 0 [cephadm DEBUG root] _kick_serve_loop
debug 2024-10-26T07:05:29.077+0000 7f8347932640 0 [cephadm ERROR cherrypy.error] [26/Oct/2024:07:05:29] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
self._connections.run(self.expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
self._run(expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
new_conn = self._from_server_socket(self.server.socket)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
s, ssl_env = self.server.ssl_adapter.wrap(s)
File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
s = self.context.wrap_socket(
File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
return self.sslsocket_class._create(
File "/lib64/python3.9/ssl.py", line 1074, in _create
self.do_handshake()
File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)
debug 2024-10-26T07:05:29.077+0000 7f8347932640 -1 log_channel(cephadm) log [ERR] : [26/Oct/2024:07:05:29] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
self._connections.run(self.expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
self._run(expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
new_conn = self._from_server_socket(self.server.socket)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
s, ssl_env = self.server.ssl_adapter.wrap(s)
File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
s = self.context.wrap_socket(
File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
return self.sslsocket_class._create(
File "/lib64/python3.9/ssl.py", line 1074, in _create
self.do_handshake()
File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)
Since these failovers happen frequently, we can't say that orch commands eventually begin to work - some of them do return results after a long delay, but this does not stay consistent.
I'll keep digging through the logs to see if I can find anything that gives a more consistent answer to your questions. Hope this complements the findings and the fix! Thanks again!
Updated by Laimis Juzeliunas over 1 year ago
- File cephadm serve cephadm serve added
Laura Flores wrote in #note-10:
I worked on reproducing this, and I found that after increasing the number of pgs on a cluster, the error reproduced a lot more consistently. I also tested out turning off the balancer, and this decreased the number of times the error came up.
After looking into why, I found it is related to this commit (https://github.com/ceph/ceph/commit/f3c74a7be4bb7502dc63aaeba8b0bac683c27dc0), in which I had added a "ceph balancer status detail" command that includes more details about which pgs were updated during the last optimization. The current implementation pulls the osdmap and, at one point, iterates through every pg_upmap_items entry; this runs during the "serve" function while the balancer module is loading. That increased the balancer's loading time, which meant subsequent modules did not load in time after the mgr failover, since modules are loaded one at a time, alphabetically.
I'm testing a fix, which optimizes how we update the pg upmap activity.
Attaching [cephadm DEBUG cephadm.serve] logs for 1 hour - they might be useful for investigation.
Updated by Tyler Stachecki over 1 year ago
I worked on reproducing this, and I found that after increasing the number of pgs on a cluster, the error reproduced a lot more consistently. I also tested out turning off the balancer, and this decreased the number of times the error came up.
After looking into why, I found it is related to this commit (https://github.com/ceph/ceph/commit/f3c74a7be4bb7502dc63aaeba8b0bac683c27dc0) in which I had added a "ceph balancer status detail" command that includes more details about which pgs were updated during the last optimization. The current implementation pulls the osdmap and iterates through every pg_upmap_items entry at one point...
Laura, could it be a time complexity problem?
Looking at the list comprehensions calculating the pgmap_upmap... definitions added in that commit as part of update_pg_upmap_activity, they appear to perform many membership tests against self.last_pg_upmap_primaries. Per the module definition, self.last_pg_upmap_primaries seems to be a List type. If so, each membership test is an O(n) scan, and the overall cost would become more apparent as N (the number of PGs) gets bumped.
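To illustrate the point above with a hedged sketch (the pgids and function names here are hypothetical, not the actual mgr/balancer code): a comprehension that tests membership against a list does an O(n) scan per test, making the whole pass O(n^2), whereas building a set once makes each test roughly O(1).

```python
# Illustrative only -- not the real balancer module code.

def changed_upmaps_list(current, last):
    # 'last' is a list: each 'in' test scans it, so this is O(n) per test,
    # O(n^2) over all PGs -- the pattern suspected in the report above.
    return [pg for pg in current if pg not in last]

def changed_upmaps_set(current, last):
    # Converting to a set once up front makes each membership test ~O(1).
    last_set = set(last)
    return [pg for pg in current if pg not in last_set]

# Hypothetical pgid strings, just to exercise both variants.
current = ["1.%x" % i for i in range(10000)]
last = current[:5000]
```

Both variants return the same result; only the second stays fast as the PG count grows, which matches the observation that raising the number of PGs made the mgr stall reproduce more consistently.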
Updated by Laura Flores over 1 year ago
Exactly, it's a time complexity problem. The fix I'm testing optimizes that, so we don't iterate through all the upmap entries.
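A hedged sketch of what such an optimization might look like (all names are hypothetical; the actual fix is in the linked commit history): keep the upmap activity keyed by pgid in dicts, so updating it after an optimization uses O(1) lookups instead of re-scanning every pg_upmap_items entry.

```python
# Illustrative only -- not the actual mgr/balancer fix. Shows computing
# upmap activity deltas with dict lookups rather than full-list scans.

def update_upmap_activity(prev, new_entries):
    """prev: dict pgid -> upmap items from the last optimization.
    new_entries: dict pgid -> upmap items from this optimization.
    Returns (added, removed, updated) pgid sets using O(1) lookups."""
    added = {pg for pg in new_entries if pg not in prev}
    removed = {pg for pg in prev if pg not in new_entries}
    updated = {pg for pg in new_entries
               if pg in prev and prev[pg] != new_entries[pg]}
    return added, removed, updated

# Hypothetical pgids and (from_osd, to_osd) item pairs.
prev = {"1.a": [(3, 7)], "1.b": [(2, 5)]}
new = {"1.b": [(2, 6)], "1.c": [(4, 1)]}
added, removed, updated = update_upmap_activity(prev, new)
```

With dict-based state, the work per optimization is proportional to the number of changed entries, not the total number of upmap entries in the osdmap.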
Updated by Chris Palmer over 1 year ago
Would this issue also explain the following problem I am having?
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ZL227CPAX32DHMYBD6ZUEFS7I265HPEO/
Updated by Laura Flores over 1 year ago
- Status changed from In Progress to Fix Under Review
Updated by Laura Flores over 1 year ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Laura Flores over 1 year ago
- Copied to Backport #68910: squid: squid: mgr/balancer preventing orchestrator and dashboard functionality added
Updated by Laura Flores over 1 year ago
- Status changed from Pending Backport to Fix Under Review
Updated by Laura Flores over 1 year ago
- Target version changed from v19.2.0 to v19.2.1
Updated by Laura Flores over 1 year ago
- Status changed from Fix Under Review to Resolved
Updated by Laura Flores over 1 year ago
- Related to Bug #69012: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart added
Updated by Laimis Juzeliunas about 1 year ago
Laura Flores wrote in #note-15:
Exactly, it's a time complexity problem. The fix I'm testing optimizes that, so we don't iterate through all the upmap entries.
Hi Laura,
Just wanted to ask if this fix will get backported to 19.2.0 once released?
We already have a few users hitting this, so it would be fair to warn them of potential balancer issues in https://docs.ceph.com/en/latest/releases/squid/#v19-2-0-squid the same way iSCSI users are warned.
Laimis J.
laimis.juzeliunas@oxylabs.io
Updated by Laura Flores about 1 year ago · Edited
Hi Laimis,
Laimis Juzeliunas wrote in #note-26:
Hi Laura,
Just wanted to ask if this fix will get backported to 19.2.0 once released?
The fix will be included in v19.2.1. Once a new point release is out, we don't redo point releases, so users may upgrade to v19.2.1 to get the fix.
We already have a few users hitting this, so it would be fair to warn them of potential balancer issues in https://docs.ceph.com/en/latest/releases/squid/#v19-2-0-squid the same way iSCSI users are warned.
Sure, we can include a note in the docs.
Updated by Laura Flores 9 months ago
- Related to Bug #71631: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart added
Updated by Upkeep Bot 8 months ago
- Merge Commit set to 5174b1f180323827daaf0422d2da51643326ea35
- Fixed In set to v19.3.0-6142-g5174b1f1803
- Upkeep Timestamp set to 2025-07-10T09:11:14+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-6142-g5174b1f1803 to v19.3.0-6142-g5174b1f180
- Upkeep Timestamp changed from 2025-07-10T09:11:14+00:00 to 2025-07-14T20:09:51+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~1604
- Upkeep Timestamp changed from 2025-07-14T20:09:51+00:00 to 2025-11-01T01:30:14+00:00