mgr/cephadm: fixing prometheus port handling #45241
seems to work in testing
f32b13e to e10a788
@adk3798 when I change the port everything works correctly. However, if I then disable and re-enable the cephadm mgr module, I see the following error, and I'm not sure whether it has to do with cephadm or with the prometheus mgr module itself.
e10a788 to 4542a5c
4542a5c to 395b298
jenkins retest this please
395b298 to 9dbe8d4
jenkins retest this please
9dbe8d4 to 3328d54
jenkins retest this please
p-se left a comment
This code works well, except that the result of ceph mgr services is not updated after the port has changed. This may lead to issues with config generation, as that result is used in cephadm to retrieve the active services.
➜ build git:(rkachach-fix_issue_51072) ✗ ceph config set mgr mgr/prometheus/server_port 8888
➜ build git:(rkachach-fix_issue_51072) ✗ curl -fSsl http://home:8888/metrics | wc -l
2561
➜ build git:(rkachach-fix_issue_51072) ✗ ceph mgr services
{
"dashboard": "https://192.168.1.2:41481/",
"prometheus": "http://home:7777/"
}
I'm not sure if the result of ceph mgr services can be updated without a restart after the port has changed. Should that not work, we'd need to ensure the mgr_map returned reflects the actual configuration, which might make it necessary to not dynamically restart cherrypy on change of a port (in notify_config). At least for this setting. When the mgr is restarted (like in ceph mgr module disable ... && ceph mgr module enable ...), then the change is reflected properly.
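For anyone who wants to check for this mismatch without eyeballing the output, here is a rough sketch that compares the ports advertised by ceph mgr services with the ports that were configured. The helper name stale_service_ports and the expected_ports mapping are made up for illustration and are not part of cephadm or the prometheus module; the sample JSON is the output quoted above.

```python
import json
from urllib.parse import urlsplit


def stale_service_ports(services_json: str, expected_ports: dict) -> dict:
    """Return {service: (advertised_port, expected_port)} for every mismatch.

    Hypothetical helper: services_json is the raw JSON printed by
    `ceph mgr services`, expected_ports maps a service name to the port
    that was set in the configuration.
    """
    services = json.loads(services_json)
    stale = {}
    for name, expected in expected_ports.items():
        uri = services.get(name)
        if uri is None:
            continue  # service is not advertised at all, nothing to compare
        advertised = urlsplit(uri).port
        if advertised != expected:
            stale[name] = (advertised, expected)
    return stale


# Output of `ceph mgr services` as quoted in the review comment.
sample = """
{
    "dashboard": "https://192.168.1.2:41481/",
    "prometheus": "http://home:7777/"
}
"""

# 8888 is the value set with `ceph config set mgr mgr/prometheus/server_port 8888`.
print(stale_service_ports(sample, {"prometheus": 8888}))
```

With the sample above this reports prometheus as stale (advertised 7777, configured 8888), which is the situation described in the comment. It only inspects the advertised URIs and does not say anything about whether the endpoint is actually reachable.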
3328d54 to b09012a
@p-se Thanks for reviewing this. I fixed the issue (basically a call to:
On mgr active node (ceph-node-0):
From testing host:
Force a failover:
On the new active node (ceph-node-1):
From testing host:
jenkins test api
Fixes: https://tracker.ceph.com/issues/51072
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
b09012a to 8eb1397
2 failures, caused by a wrong error code from the host add command due to another PR included in the run.