Bug #68657
closed
squid: mgr/balancer preventing orchestrator and dashboard functionality
Description
Hi Team,
We have been seeing poor mgr/balancer (upmap mode) behaviour after upgrading from Reef 18.4.2 to Squid 19.2.0: when the balancer is turned on it instantly breaks the cephadm orchestrator (all ceph orch commands hang) and the dashboard (which freezes and ultimately returns a 503 error when loading). We run Docker daemons (Docker version 25.0.1, build 29cf629) on bare-metal Debian (5.10.0-27-amd64 #1 SMP Debian 5.10.205-2 x86_64 GNU/Linux). We first hit this during the upgrade itself, since mgr daemons are upgraded first: the Ceph dashboard UI broke right after the mgrs pulled and started using the 19.2.0 images. We had to turn off the balancer for the upgrade to complete and are keeping it off most of the time now. The balancer itself seems to work in the background, producing some backfills when turned on.
Detailed log trace below.
Our cluster health status (the issue happens with HEALTH_OK as well):
root@ceph-node001 ~ # ceph -s
cluster:
id: 96df99f6-fc1a-11ea-90a4-6cb3113cb732
health: HEALTH_WARN
1 backfillfull osd(s)
1 pgs not deep-scrubbed in time
14 pool(s) backfillfull
services:
mon: 5 daemons, quorum ceph-node004,ceph-node003,ceph-node001,ceph-node005,ceph-node002 (age 6h)
mgr: ceph-node001.hgythj(active, since 53m), standbys: ceph-node002.jphtvg
mds: 20/20 daemons up, 12 standby
osd: 384 osds: 384 up (since 6h), 384 in (since 10d); 9 remapped pgs
rbd-mirror: 2 daemons active (2 hosts)
rgw: 64 daemons active (32 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 14 pools, 8681 pgs
objects: 755.16M objects, 1.5 PiB
usage: 4.6 PiB used, 1.1 PiB / 5.6 PiB avail
pgs: 644454/2265477489 objects misplaced (0.028%)
7545 active+clean
583 active+clean+scrubbing
544 active+clean+scrubbing+deep
9 active+remapped+backfilling
io:
client: 4.8 GiB/s rd, 32 MiB/s wr, 34.33k op/s rd, 919 op/s wr
recovery: 111 MiB/s, 51 objects/s
Balancer configs are set as below:
global  advanced  mgr/balancer/active                   false
global  advanced  mgr/balancer/log_level                debug
global  advanced  mgr/balancer/log_to_cluster           true
mgr     advanced  mgr/balancer/log_to_cluster_level     debug
mgr     advanced  mgr/balancer/mode                     upmap
global  basic     mgr/balancer/upmap_max_optimizations  20
To keep track of the log entries here are the timestamps:
root@ceph-node001 ~ # date
Mon 21 Oct 2024 10:47:33 AM UTC
root@ceph-node001 ~ # ceph balancer on
root@ceph-node001 ~ # date
Mon 21 Oct 2024 10:55:12 AM UTC
root@ceph-node001 ~ # ceph balancer off
While the balancer is on, ceph orch commands either hang or return the following messages:
root@ceph-node001 ~ # ceph orch ls
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'orchestrator' is not enabled/loaded (required by command 'orch ls'): use `ceph mgr module enable orchestrator` to enable it
root@ceph-node001 ~ # ceph mgr module enable orchestrator
module 'orchestrator' is already enabled (always-on)
root@ceph-node001 ~ # ceph orch ls
Error ENOTSUP: Module 'orchestrator' is not enabled/loaded (required by command 'orch ls'): use `ceph mgr module enable orchestrator` to enable it
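As a stopgap while the modules are still loading, a transiently failing command like this can be wrapped in a small retry loop. This is an illustrative helper, not part of Ceph; the attempt count and delay are arbitrary placeholders:

```python
import subprocess
import time

def retry(cmd, attempts=10, delay=5):
    """Re-run cmd until it exits 0, retrying transient failures such as
    the ENOTSUP returned while mgr modules are still loading."""
    for _ in range(attempts):
        if subprocess.run(cmd).returncode == 0:
            return True
        time.sleep(delay)
    return False

# Against a live cluster this might be: retry(["ceph", "orch", "ls"])
```

This only papers over the symptom; if the module never finishes loading (as described below), the retries will simply keep failing.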
Active ceph-node001.hgythj mgr log:
debug 2024-10-21T10:47:39.667+0000 7f38cdbc7640 0 log_channel(audit) log [DBG] : from='client.22145222 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
debug 2024-10-21T10:47:40.499+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1618: 8681 pgs: 9 active+remapped+backfilling, 582 active+clean+scrubbing, 7545 active+clean, 545 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.5 GiB/s rd, 165 MiB/s wr, 39.61k op/s; 641560/2265483588 objects misplaced (0.028%); 120 MiB/s, 50 objects/s recovering
[21/Oct/2024:10:47:40] ENGINE Bus STOPPING
debug 2024-10-21T10:47:40.883+0000 7f38d6f59640 0 [balancer DEBUG root] Waking up [active, now 2024-10-21_10:47:40]
debug 2024-10-21T10:47:40.883+0000 7f38d6f59640 0 [balancer DEBUG root] Running
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option ssh_config_file = None
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option device_cache_timeout = 1800
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option device_enhanced_scan = False
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option inventory_list_all = False
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option daemon_cache_timeout = 600
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option facts_cache_timeout = 60
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option host_check_interval = 600
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option mode = root
debug 2024-10-21T10:47:40.887+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_base = quay.io/ceph/ceph
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_prometheus = quay.io/prometheus/prometheus:v2.43.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_nvmeof = quay.io/ceph/nvmeof:1.2.5
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_grafana = quay.io/ceph/grafana:9.4.12
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_alertmanager = quay.io/prometheus/alertmanager:v0.25.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_node_exporter = quay.io/prometheus/node-exporter:v1.5.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_loki = docker.io/grafana/loki:3.0.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_promtail = docker.io/grafana/promtail:3.0.0
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_haproxy = quay.io/ceph/haproxy:2.3
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_keepalived = quay.io/ceph/keepalived:2.2.4
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_snmp_gateway = docker.io/maxwo/snmp-notifier:v1.2.1
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_elasticsearch = quay.io/omrizeneva/elasticsearch:6.8.23
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_jaeger_agent = quay.io/jaegertracing/jaeger-agent:1.29
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_jaeger_collector = quay.io/jaegertracing/jaeger-collector:1.29
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_jaeger_query = quay.io/jaegertracing/jaeger-query:1.29
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_image_samba = quay.io/samba.org/samba-server:devbuilds-centos-amd64
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option warn_on_stray_hosts = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option warn_on_stray_daemons = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option warn_on_failed_host_check = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_to_cluster = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option allow_ptrace = False
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option container_init = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option prometheus_alerts_path = /etc/prometheus/ceph/ceph_default_alerts.yml
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option grafana_dashboards_path = /etc/grafana/dashboards/ceph-dashboard/
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option migration_current = 7
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option config_dashboard = True
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option manage_etc_ceph_ceph_conf = False
debug 2024-10-21T10:47:40.891+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option manage_etc_ceph_ceph_conf_hosts = *
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option registry_url = None
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option registry_username = None
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option registry_password = None
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option registry_insecure = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option use_repo_digest = True
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option config_checks_enabled = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option default_registry = docker.io
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option max_count_per_host = 10
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option autotune_memory_target_ratio = 0.7
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option autotune_interval = 600
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option use_agent = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option agent_refresh_rate = 20
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option agent_starting_port = 4721
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option agent_down_multiplier = 3.0
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option hw_monitoring = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option max_osd_draining_count = 10
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option service_discovery_port = 8765
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option cgroups_split = True
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_refresh_metadata = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option secure_monitoring_stack = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option default_cephadm_command_timeout = 900
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option cephadm_log_destination =
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option oob_default_addr = 169.254.1.1
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_level = debug
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_to_file = False
debug 2024-10-21T10:47:40.895+0000 7f38ca3c0640 0 [cephadm DEBUG root] mgr option log_to_cluster_level = info
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer INFO root] Optimize plan auto_2024-10-21_10:47:40
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer INFO root] Mode upmap, max misplaced 0.050000
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer DEBUG root] unknown 0.000000 degraded 0.000000 inactive 0.000000 misplaced 0.000283189
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer INFO root] do_upmap
debug 2024-10-21T10:47:41.103+0000 7f38d6f59640 0 [balancer INFO root] pools ['cephfs.storage.data', 'my-k8s-cluster4.ceph-csi.rbd', 'europe-1.rgw.buckets.index', 'my-k8s-cluster1.ceph-csi.rbd', 'europe-1.rgw.log', 'my-k8s-cluster2.ceph-csi.rbd', 'my-k8s-cluster3.ceph-csi.rbd', '.rgw.root', 'europe-1.rgw.meta', 'europe-1.rgw.buckets.data', 'europe-1.rgw.buckets.non-ec', 'cephfs.storage.meta', '.mgr', 'europe-1.rgw.control']
debug 2024-10-21T10:47:41.183+0000 7f38d6f59640 0 [balancer INFO root] prepared 0/10 upmap changes
After this entry the log contains mostly pgmap entries. The orch ls commands are visible in between:
...
debug 2024-10-21T10:47:58.815+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1627: 8681 pgs: 9 active+remapped+backfilling, 582 active+clean+scrubbing, 7545 active+clean, 545 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.8 GiB/s rd, 150 MiB/s wr, 17.11k op/s; 640643/2265486588 objects misplaced (0.028%); 113 MiB/s, 52 objects/s recovering
debug 2024-10-21T10:47:59.595+0000 7f38cdbc7640 0 log_channel(audit) log [DBG] : from='client.22116924 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
debug 2024-10-21T10:48:00.847+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1628: 8681 pgs: 9 active+remapped+backfilling, 582 active+clean+scrubbing, 7545 active+clean, 545 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.4 GiB/s rd, 128 MiB/s wr, 15.22k op/s; 640534/2265486750 objects misplaced (0.028%); 112 MiB/s, 50 objects/s recovering
...
debug 2024-10-21T10:51:23.764+0000 7f38cdbc7640 0 log_channel(audit) log [DBG] : from='client.22114830 -' entity='client.admin' cmd=[{"prefix": "orch ls", "target": ["mon-mgr", ""]}]: dispatch
debug 2024-10-21T10:51:24.256+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1728: 8681 pgs: 9 active+remapped+backfilling, 583 active+clean+scrubbing, 7545 active+clean, 544 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.7 GiB/s rd, 216 MiB/s wr, 6.05k op/s; 629911/2265503763 objects misplaced (0.028%); 115 MiB/s, 53 objects/s recovering
...
debug 2024-10-21T10:53:36.474+0000 7f3976376640 0 client.0 ms_handle_reset on v2:162.55.93.25:6850/669498911
debug 2024-10-21T10:53:38.410+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1794: 8681 pgs: 9 active+remapped+backfilling, 583 active+clean+scrubbing, 7545 active+clean, 544 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.9 GiB/s rd, 34 MiB/s wr, 3.86k op/s; 622688/2265511929 objects misplaced (0.027%); 119 MiB/s, 56 objects/s recovering
debug 2024-10-21T10:53:40.430+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1795: 8681 pgs: 9 active+remapped+backfilling, 583 active+clean+scrubbing, 7545 active+clean, 544 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 4.4 GiB/s rd, 32 MiB/s wr, 3.20k op/s; 622688/2265511950 objects misplaced (0.027%); 99 MiB/s, 44 objects/s recovering
debug 2024-10-21T10:53:40.682+0000 7f38cdbc7640 0 log_channel(audit) log [DBG] : from='client.22145247 -' entity='client.admin' cmd=[{"prefix": "orch ls", "target": ["mon-mgr", ""]}]: dispatch
...
After a while the balancer kicks in again:
debug 2024-10-21T10:53:54.478+0000 7f38d6f59640 0 [balancer DEBUG root] Sleeping for 60
debug 2024-10-21T10:53:54.482+0000 7f38c0bad640 0 [orchestrator DEBUG root] _oremote orchestrator -> cephadm.describe_service(*(None, None), **{'refresh': False})
debug 2024-10-21T10:53:54.526+0000 7f38d6f59640 0 [balancer DEBUG root] Waking up [active, now 2024-10-21_10:53:54]
debug 2024-10-21T10:53:54.526+0000 7f38d6f59640 0 [balancer DEBUG root] Running
debug 2024-10-21T10:53:54.658+0000 7f38ccbc5640 0 log_channel(cluster) log [DBG] : pgmap v1802: 8681 pgs: 9 active+remapped+backfilling, 583 active+clean+scrubbing, 7545 active+clean, 544 active+clean+scrubbing+deep; 1.5 PiB data, 4.6 PiB used, 1.1 PiB / 5.6 PiB avail; 3.6 GiB/s rd, 23 MiB/s wr, 2.77k op/s; 621874/2265512487 objects misplaced (0.027%); 95 MiB/s, 45 objects/s recovering
debug 2024-10-21T10:53:54.714+0000 7f38ade88640 0 log_channel(stats) log [WRN] : cmdtag not found in client metadata
debug 2024-10-21T10:53:55.090+0000 7f38d6f59640 0 [balancer INFO root] Optimize plan auto_2024-10-21_10:53:54
debug 2024-10-21T10:53:55.090+0000 7f38d6f59640 0 [balancer INFO root] Mode upmap, max misplaced 0.050000
debug 2024-10-21T10:53:55.090+0000 7f38d6f59640 0 [balancer DEBUG root] unknown 0.000000 degraded 0.000000 inactive 0.000000 misplaced 0.000274496
debug 2024-10-21T10:53:55.090+0000 7f38d6f59640 0 [balancer INFO root] do_upmap
debug 2024-10-21T10:53:55.094+0000 7f38d6f59640 0 [balancer INFO root] pools ['europe-1.rgw.meta', 'europe-1.rgw.buckets.non-ec', 'my-k8s-cluster1.ceph-csi.rbd', 'cephfs.storage.data', 'my-k8s-cluster2.ceph-csi.rbd', 'europe-1.rgw.log', 'europe-1.rgw.control', 'europe-1.rgw.buckets.data', 'my-k8s-cluster3.ceph-csi.rbd', '.mgr', 'my-k8s-cluster4.ceph-csi.rbd', '.rgw.root', 'cephfs.storage.meta', 'europe-1.rgw.buckets.index']
debug 2024-10-21T10:53:56.402+0000 7f38c0bad640 0 [orchestrator DEBUG root] _oremote orchestrator -> cephadm.describe_service(*(None, None), **{'refresh': False})
debug 2024-10-21T10:53:56.466+0000 7f38d6f59640 0 [balancer INFO root] prepared 0/10 upmap changes
This repeats in an infinite loop roughly every 5 minutes. To get the orchestrator and dashboard working again we need to turn off the balancer and restart/fail over the mgr daemons.
The monitor logs don't seem to contain any useful information apart from differing entries about the balancer being turned on (not sure whether the difference between monitor daemons is expected).
Other than that they seem to be performing their usual tasks, with a lot of entries like "entity='mgr.ceph-node002.jphtvg' cmd=[{"prefix": "balancer status", "format": "json"}]: dispatch"
root@ceph-node001 ~ # grep mgr/balancer/active mon-log
debug 2024-10-21T10:47:39.703+0000 7f31d6d07640 0 mon.ceph-node001@2(peon) e253 handle_command mon_command([{prefix=config set, name=mgr/balancer/active}] v 0)
root@ceph-node001 ~ # grep "balancer on" mon-log
root@ceph-node001 ~ #
root@ceph-node002 ~ # grep mgr/balancer/active mon-log
root@ceph-node002 ~ # grep "balancer on" mon-log
root@ceph-node002 ~ #
root@ceph-node003 ~ # grep mgr/balancer/active mon-log
root@ceph-node003 ~ # grep "balancer on" mon-log
audit 2024-10-21T10:47:39.670403+0000 mgr.ceph-node001.hgythj (mgr.22123253) 2832 : audit [DBG] from='client.22145222 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
root@ceph-node003 ~ #
root@ceph-node004 ~ # grep mgr/balancer/active mon-log
debug 2024-10-21T10:47:39.953+0000 7fc9766eb640 0 mon.ceph-node004@0(leader) e253 handle_command mon_command([{prefix=config set, name=mgr/balancer/active}] v 0)
root@ceph-node004 ~ # grep "balancer on" mon-log
audit 2024-10-21T10:47:39.670403+0000 mgr.ceph-node001.hgythj (mgr.22123253) 2832 : audit [DBG] from='client.22145222 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
root@ceph-node004 ~ #
root@ceph-node005 ~ # grep mgr/balancer/active mon-log
root@ceph-node005 ~ # grep "balancer on" mon-log
audit 2024-10-21T10:47:39.670403+0000 mgr.ceph-node001.hgythj (mgr.22123253) 2832 : audit [DBG] from='client.22145222 -' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mon-mgr", ""]}]: dispatch
root@ceph-node005 ~ #
If anyone could point out where to look further with this issue we would be beyond grateful, as we're currently stuck with a non-working balancer.
I'm attaching full logs for the managers and monitors (though I can't tell offhand which monitor was at which rank).
Thanks in advance,
Laimis J.
Updated by John Mulligan over 1 year ago
- Tracker changed from Support to Bug
- Project changed from Ceph to mgr
- Regression set to No
- Severity set to 3 - minor
Updated by John Mulligan over 1 year ago
Just to be clear: there are no actual "crashes" - no cores or python tracebacks that you see, correct?
This issue is that when the balancer module is loaded, the orchestrator and dashboard modules fail to function, right?
Updated by Laimis Juzeliunas over 1 year ago
- Subject changed from squid: mgr/balancer crashing orchestrator and dashboard to squid: mgr/balancer preventing orchestrator and dashboard functionality
John Mulligan wrote in #note-2:
Just to be clear: there are no actual "crashes" - no cores or python tracebacks that you see, correct?
This issue is that when the balancer module is loaded, the orchestrator and dashboard modules fail to function, right?
Yes, correct - we have not observed any crash tracebacks. The orchestrator fails to function and the dashboard returns a "503 Service Unavailable" message.
Apologies for the poor wording - changed it for the tracker.
Updated by Brad Hubbard over 1 year ago
- Related to Bug #68648: Commands using Mgr Modules fail with ENOTSUP if run immediately post a mgr failover/ restart added
Updated by Brad Hubbard over 1 year ago
It's my belief this should be returning EAGAIN, not ENOTSUP, and I have a fix for that to resolve Bug #68648:
Module 'orchestrator' is not enabled/loaded (required by command 'orch ls'): use `ceph mgr module enable orchestrator` to enable it
Updated by Brad Hubbard over 1 year ago
Laimis Juzeliunas wrote in #note-3:
John Mulligan wrote in #note-2:
Just to be clear: there are no actual "crashes" - no cores or python tracebacks that you see, correct?
This issue is that when the balancer module is loaded, the orchestrator and dashboard modules fail to function, right?

Yes, correct - we have not observed any crash tracebacks. Orchestrator fails to function and dashboard brings back "503 Service Unavailable" message.
Apologies for the poor wording - changed it for the tracker.
The current theory is that the loading of the other modules is serialised behind the loading of the balancer module, and the current avenue of investigation IMHO should be why the balancer is taking so long to finish loading.
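This serialisation can be sketched as a toy model (pure Python, purely illustrative; the startup times are made up, and the alphabetical ordering is per the analysis later in this thread): mgr modules load one at a time, so a single slow module delays every module that loads after it.

```python
import time

def load_modules(startup_times):
    """Load modules sequentially in alphabetical order; one slow module
    delays everything that loads after it."""
    ready = []
    for name in sorted(startup_times):
        time.sleep(startup_times[name])  # simulate the module's startup work
        ready.append(name)
    return ready

# 'balancer' sorts before 'cephadm', 'dashboard' and 'orchestrator', so a
# slow balancer startup stalls all three.
order = load_modules({"balancer": 0.2, "cephadm": 0.0,
                      "dashboard": 0.0, "orchestrator": 0.0})
```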
Updated by Laimis Juzeliunas over 1 year ago
Brad Hubbard wrote in #note-6:
Laimis Juzeliunas wrote in #note-3:
John Mulligan wrote in #note-2:
Just to be clear: there are no actual "crashes" - no cores or python tracebacks that you see, correct?
This issue is that when the balancer module is loaded, the orchestrator and dashboard modules fail to function, right?

Yes, correct - we have not observed any crash tracebacks. Orchestrator fails to function and dashboard brings back "503 Service Unavailable" message.
Apologies for the poor wording - changed it for the tracker.

Current theory is that the loading of the other modules is serialised behind the loading of the balancer module, and the current avenue of investigation IMHO should be why the balancer is taking so long to finish loading.
Agreed. One important note: we have observed some orchestrator commands actually completing after a long wait with the balancer turned on.
At this point it looks a bit random whether an orch command will complete or come back with the 'not enabled/loaded' message; I'll see if we can find some correlation.
Updated by Laura Flores over 1 year ago · Edited
@laimis9133 do these observations about #68648 ring true in your case?
- In our reproducer (described in #68648), this error message took up to 15-30 minutes (or longer) to hit on a freshly deployed test cluster.
- Once hit, the error presented much more frequently.
- The error only presents after failing the mgr, then immediately running a "ceph orch", "ceph progress", or "ceph crash" command.
- Even when the error does present, waiting a few seconds for the module to finish loading after failing the mgr allows the commands to work as expected.
Let me know if any of the above details differ from your experience.
I mainly want to confirm whether, rather than disabling the balancer, waiting several seconds after failing the mgr allows `ceph orch ls` to work as expected. It looks like that is the case from your comment "we have observed some orchestrator commands actually completing after a long wait with the balancer turned on."
Updated by Laura Flores over 1 year ago
- Status changed from New to In Progress
- Pull request ID set to 60523
I worked on reproducing this and found that after increasing the number of pgs on a cluster, the error reproduced much more consistently. I also tested turning off the balancer, which decreased the number of times the error came up.
After looking into why, I found it is related to this commit (https://github.com/ceph/ceph/commit/f3c74a7be4bb7502dc63aaeba8b0bac683c27dc0), in which I had added a "ceph balancer status detail" command that includes more details about which pgs were updated during the last optimization. The current implementation pulls the osdmap and iterates through every pg_upmap_items entry at one point, and that code is called from the "serve" function, which runs while the balancer module is loading. This increased the balancer loading time, which led to subsequent modules not loading in time after the mgr failover, since modules are loaded one at a time, alphabetically.
I'm testing a fix, which optimizes how we update the pg upmap activity.
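The cost difference can be sketched in plain Python (illustrative only; this is not the actual balancer code, and the data shapes and pool/PG identifiers are invented):

```python
def full_scan(pg_upmap_items):
    """Rebuild the whole upmap-activity table by walking every entry pulled
    from the osdmap -- roughly the shape of the original 'status detail'
    code, run on each serve() wakeup."""
    return {pg: list(mappings) for pg, mappings in pg_upmap_items.items()}

def incremental_update(activity, changed):
    """Apply only the entries touched by the last optimization -- the shape
    of the optimized approach."""
    activity.update(changed)
    return activity

# With thousands of PGs the full scan dominates module startup time,
# while the incremental update touches only what changed.
items = {("pool", i): [(i, i + 1)] for i in range(50_000)}
activity = full_scan(items)                                       # O(all PGs)
activity = incremental_update(activity, {("pool", 0): [(0, 7)]})  # O(changed)
```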
Updated by Laura Flores over 1 year ago
Upon hitting the error, I experimented with turning off the balancer, and while that greatly decreases the number of times the error is hit, it still comes up occasionally. Below, I had the balancer turned off and still hit the error after many iterations:
2024-10-28T22:38:42.422+0000 7f3c5a5c9640 -1 WARNING: all dangerous and experimental features are enabled.
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:44.423+0000 7fb467d6f640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:44.432+0000 7fb467d6f640 -1 WARNING: all dangerous and experimental features are enabled.
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:44.916+0000 7fc365f61640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:44.925+0000 7fc365f61640 -1 WARNING: all dangerous and experimental features are enabled.
Telemetry is off. Please consider opting-in with `ceph telemetry on`.
Preview sample reports with `ceph telemetry preview`.
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:46.984+0000 7fa685361640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:46.993+0000 7fa685361640 -1 WARNING: all dangerous and experimental features are enabled.
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:47.465+0000 7f4a9b6d4640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:47.474+0000 7f4a9b6d4640 -1 WARNING: all dangerous and experimental features are enabled.
^CInterrupted
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:49.774+0000 7ff289334640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:49.782+0000 7ff289334640 -1 WARNING: all dangerous and experimental features are enabled.
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:50.276+0000 7fc8e9e43640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:50.284+0000 7fc8e9e43640 -1 WARNING: all dangerous and experimental features are enabled.
^CInterrupted
^C
[lflores@folio01 build]$ ./bin/ceph balancer status
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-10-28T22:38:56.624+0000 7f6b962ea640 -1 WARNING: all dangerous and experimental features are enabled.
2024-10-28T22:38:56.633+0000 7f6b962ea640 -1 WARNING: all dangerous and experimental features are enabled.
{
"active": false,
"last_optimize_duration": "",
"last_optimize_started": "",
"mode": "upmap-read",
"no_optimization_needed": false,
"optimize_result": "",
"plans": []
}
This is why Brad's fix in #68648 is still a good idea to add.
Updated by Laimis Juzeliunas over 1 year ago
- File mgr-prom-metrics.png added
- File mgr-log-2 added
Laura Flores wrote in #note-9:
@laimis9133 do these observations about #68648 ring true in your case?
- In our reproducer (described in #68648), this error message took up to 15-30 minutes (or longer) to hit on a freshly deployed test cluster.
- Once hit, the error presented much more frequently.
- The error only presents after failing the mgr, then immediately running a "ceph orch", "ceph progress", or "ceph crash" command.
- Even when the error does present, waiting a few seconds for the module to finish loading after failing the mgr allows the commands to work as expected.
Let me know if any of the above details differ from your experience.
I mainly want to confirm if, rather than disabling the balancer, if waiting several seconds after failing the mgr allows `ceph orch ls` to work as expected. It looks like that is the case from your comment "we have observed some orchestrator commands actually completing after a long wait with the balancer turned on."
Thanks a lot for picking this up, Laura! The points you mention are very similar to our experience, but there are some differences in what we observe (they might be related to our somewhat different setup).
We keep the cluster running at all times, so what we observe with the balancer turned on is a mgr failover every 15-18 minutes. I'm attaching a visualisation from Prometheus: it shows that with the balancer on, cluster metrics are briefly exposed when the mgr starts after a failover, for a very short period. The dashboard UI also becomes available for a very short time before returning to 503 once the balancer kicks in. The logs for one such mgr cycle are also attached.
One observation is that roughly 5 minutes into the mgr failover, the active mgr throws this stacktrace:
debug 2024-10-26T07:05:29.041+0000 7f838fb8b640 0 [cephadm DEBUG root] Cherrypy engine started.
debug 2024-10-26T07:05:29.041+0000 7f838fb8b640 0 [cephadm DEBUG root] _kick_serve_loop
debug 2024-10-26T07:05:29.077+0000 7f8347932640 0 [cephadm ERROR cherrypy.error] [26/Oct/2024:07:05:29] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
self._connections.run(self.expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
self._run(expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
new_conn = self._from_server_socket(self.server.socket)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
s, ssl_env = self.server.ssl_adapter.wrap(s)
File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
s = self.context.wrap_socket(
File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
return self.sslsocket_class._create(
File "/lib64/python3.9/ssl.py", line 1074, in _create
self.do_handshake()
File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)
debug 2024-10-26T07:05:29.077+0000 7f8347932640 -1 log_channel(cephadm) log [ERR] : [26/Oct/2024:07:05:29] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
self._connections.run(self.expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
self._run(expiration_interval)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
new_conn = self._from_server_socket(self.server.socket)
File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
s, ssl_env = self.server.ssl_adapter.wrap(s)
File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
s = self.context.wrap_socket(
File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
return self.sslsocket_class._create(
File "/lib64/python3.9/ssl.py", line 1074, in _create
self.do_handshake()
File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)
Since these failovers happen frequently, we can't say that orch commands eventually begin to work - some of them do return results after a long delay, but this does not stay consistent.
I'll keep digging through the logs to see if I can find anything that gives a more consistent answer to your questions. Hope this complements the findings and the fix! Thanks again!
Updated by Laimis Juzeliunas over 1 year ago
- File cephadm serve cephadm serve added
Laura Flores wrote in #note-10:
I worked on reproducing this, and I found that after increasing the number of pgs on a cluster, the error reproduced a lot more consistently. I also tested out turning off the balancer, and this decreased the number of times the error came up.
After looking into why, I found it is related to this commit (https://github.com/ceph/ceph/commit/f3c74a7be4bb7502dc63aaeba8b0bac683c27dc0), in which I had added a "ceph balancer status detail" command that includes more details about which pgs were updated during the last optimization. The current implementation pulls the osdmap and, at one point, iterates through every pg_upmap_items entry; this runs during the "serve" function while the balancer module is loading. That increased the balancer's loading time, which meant subsequent modules did not load in time after the mgr failover, since modules are loaded one at a time, alphabetically.
I'm testing a fix, which optimizes how we update the pg upmap activity.
Attaching [cephadm DEBUG cephadm.serve] logs for 1 hour - they might be useful for investigation.
Updated by Tyler Stachecki over 1 year ago
I worked on reproducing this, and I found that after increasing the number of pgs on a cluster, the error reproduced a lot more consistently. I also tested out turning off the balancer, and this decreased the number of times the error came up.
After looking into why, I found it is related to this commit (https://github.com/ceph/ceph/commit/f3c74a7be4bb7502dc63aaeba8b0bac683c27dc0) in which I had added a "ceph balancer status detail" command that includes more details about which pgs were updated during the last optimization. The current implementation pulls the osdmap and iterates through every pg_upmap_items entry at one point...
Laura, could it be a time complexity problem?
Looking at the list comprehensions calculating the pgmap_upmap... definitions added in that commit as part of update_pg_upmap_activity, they appear to perform many membership tests against self.last_pg_upmap_primaries. Per the module definition, self.last_pg_upmap_primaries seems to be a List type. If so, each membership test is an O(n) scan, and the overall cost would become more apparent as N (the number of PGs) gets bumped.
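To illustrate the point above with a hedged sketch (the pgids and function names here are hypothetical, not the actual mgr/balancer code): a comprehension that tests membership against a list does an O(n) scan per test, making the whole pass O(n^2), whereas building a set once makes each test roughly O(1).

```python
# Illustrative only -- not the real balancer module code.

def changed_upmaps_list(current, last):
    # 'last' is a list: each 'in' test scans it, so this is O(n) per test,
    # O(n^2) over all PGs -- the pattern suspected in the report above.
    return [pg for pg in current if pg not in last]

def changed_upmaps_set(current, last):
    # Converting to a set once up front makes each membership test ~O(1).
    last_set = set(last)
    return [pg for pg in current if pg not in last_set]

# Hypothetical pgid strings, just to exercise both variants.
current = ["1.%x" % i for i in range(10000)]
last = current[:5000]
```

Both variants return the same result; only the second stays fast as the PG count grows, which matches the observation that raising the number of PGs made the mgr stall reproduce more consistently.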
Updated by Laura Flores over 1 year ago
Exactly, it's a time complexity problem. The fix I'm testing optimizes that, so we don't iterate through all the upmap entries.
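A hedged sketch of what such an optimization might look like (all names are hypothetical; the actual fix is in the linked commit history): keep the upmap activity keyed by pgid in dicts, so updating it after an optimization uses O(1) lookups instead of re-scanning every pg_upmap_items entry.

```python
# Illustrative only -- not the actual mgr/balancer fix. Shows computing
# upmap activity deltas with dict lookups rather than full-list scans.

def update_upmap_activity(prev, new_entries):
    """prev: dict pgid -> upmap items from the last optimization.
    new_entries: dict pgid -> upmap items from this optimization.
    Returns (added, removed, updated) pgid sets using O(1) lookups."""
    added = {pg for pg in new_entries if pg not in prev}
    removed = {pg for pg in prev if pg not in new_entries}
    updated = {pg for pg in new_entries
               if pg in prev and prev[pg] != new_entries[pg]}
    return added, removed, updated

# Hypothetical pgids and (from_osd, to_osd) item pairs.
prev = {"1.a": [(3, 7)], "1.b": [(2, 5)]}
new = {"1.b": [(2, 6)], "1.c": [(4, 1)]}
added, removed, updated = update_upmap_activity(prev, new)
```

With dict-based state, the work per optimization is proportional to the number of changed entries, not the total number of upmap entries in the osdmap.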
Updated by Chris Palmer over 1 year ago
Would this issue also explain the following problem I am having?
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ZL227CPAX32DHMYBD6ZUEFS7I265HPEO/
Updated by Laura Flores over 1 year ago
- Status changed from In Progress to Fix Under Review
Updated by Laura Flores over 1 year ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Laura Flores over 1 year ago
- Copied to Backport #68910: squid: squid: mgr/balancer preventing orchestrator and dashboard functionality added
Updated by Laura Flores over 1 year ago
- Status changed from Pending Backport to Fix Under Review
Updated by Laura Flores over 1 year ago
- Target version changed from v19.2.0 to v19.2.1
Updated by Laura Flores over 1 year ago
- Status changed from Fix Under Review to Resolved
Updated by Laura Flores over 1 year ago
- Related to Bug #69012: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart added
Updated by Laimis Juzeliunas about 1 year ago
Laura Flores wrote in #note-15:
Exactly, it's a time complexity problem. The fix I'm testing optimizes that, so we don't iterate through all the upmap entries.
Hi Laura,
Just wanted to ask if this fix will get backported to 19.2.0 once released?
We already have a few users hitting this, so it would be fair to warn them of potential balancer issues in https://docs.ceph.com/en/latest/releases/squid/#v19-2-0-squid the same way iSCSI users are warned.
Laimis J.
laimis.juzeliunas@oxylabs.io
Updated by Laura Flores about 1 year ago · Edited
Hi Laimis,
Laimis Juzeliunas wrote in #note-26:
Hi Laura,
Just wanted to ask if this fix will get backported to 19.2.0 once released?
The fix will be included in v19.2.1. Once a new point release is out, we don't redo point releases, so users may upgrade to v19.2.1 to get the fix.
We already have a few users hitting this, so it would be fair to warn them of potential balancer issues in https://docs.ceph.com/en/latest/releases/squid/#v19-2-0-squid the same way iSCSI users are warned.
Sure, we can include a note in the docs.
Updated by Laura Flores 9 months ago
- Related to Bug #71631: Commands using Mgr Modules fail if run immediately post a mgr failover/ restart added
Updated by Upkeep Bot 8 months ago
- Merge Commit set to 5174b1f180323827daaf0422d2da51643326ea35
- Fixed In set to v19.3.0-6142-g5174b1f1803
- Upkeep Timestamp set to 2025-07-10T09:11:14+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-6142-g5174b1f1803 to v19.3.0-6142-g5174b1f180
- Upkeep Timestamp changed from 2025-07-10T09:11:14+00:00 to 2025-07-14T20:09:51+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~1604
- Upkeep Timestamp changed from 2025-07-14T20:09:51+00:00 to 2025-11-01T01:30:14+00:00