
mon/LogMonitor: Use generic cluster log level config #47502

Merged
yuriw merged 1 commit into ceph:main from pdvian:wip-clog-level
Apr 17, 2024


@pdvian
Contributor

@pdvian pdvian commented Aug 8, 2022

We do not control the verbosity of the LogEntry
messages that are logged to stderr, graylog and
journald. This floods /var/log with log messages
and causes the filesystem to fill up quickly.
We also have separate config variables, namely
mon_cluster_log_file_level and mon_cluster_log_to_syslog_level,
to control verbosity for the cluster log file and
syslog respectively. Add a generic cluster log
level config variable which controls cluster log
verbosity for all external entities.

Fixes: https://tracker.ceph.com/issues/57061

Signed-off-by: Prashant D <pdhange@redhat.com>
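
The idea in the description, one verbosity setting shared by every external log destination, can be illustrated with a minimal sketch. This is not the LogMonitor code from the patch: `LogEntry`, `should_emit`, `dispatch`, the level names and the sink names used here are all hypothetical and only show how a single threshold could gate several sinks at once.

```python
from dataclasses import dataclass

# Severity order from least to most severe. These names are assumptions
# made for this sketch, not the identifiers used in the Ceph sources.
LEVELS = ["debug", "info", "warn", "error"]


@dataclass
class LogEntry:
    level: str
    message: str


def should_emit(entry, threshold):
    """Return True when the entry is at least as severe as the threshold."""
    return LEVELS.index(entry.level) >= LEVELS.index(threshold)


def dispatch(entries, threshold, sinks):
    """Append each entry that passes the shared threshold to every sink.

    `sinks` maps a hypothetical destination name (for example "stderr")
    to a list that collects the messages sent to it.
    """
    for entry in entries:
        if should_emit(entry, threshold):
            for collected in sinks.values():
                collected.append(entry.message)
    return sinks
```

With a threshold of `"info"`, a `"debug"` entry is dropped for all sinks in one place, instead of each destination needing its own level option.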


@pdvian
Contributor Author

pdvian commented Aug 10, 2022

jenkins test windows

@neha-ojha neha-ojha requested a review from sseshasa August 16, 2022 08:03
Contributor

@sseshasa sseshasa left a comment


LGTM. I am just wondering if a standalone/teuthology test can be added to verify the logging levels. A standalone test appears to be the easiest. If it's time consuming or too much effort, it can be added in a separate PR.

Contributor

@badone badone left a comment


LGTM

@vumrao
Contributor

vumrao commented Aug 17, 2022

jenkins retest this please

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@pdvian
Contributor Author

pdvian commented Aug 31, 2022

> LGTM. I am just wondering if a standalone/teuthology test can be added to verify the logging levels. A standalone test appears to be the easiest. If it's time consuming or too much effort, it can be added in a separate PR.

Sure Sridhar. Let me add a testcase for it.

@pdvian
Contributor Author

pdvian commented Sep 29, 2022

jenkins test make check

Comment thread qa/standalone/mon/mon-cluster-log.sh Outdated
Comment thread qa/standalone/mon/mon-cluster-log.sh Outdated
Comment thread qa/standalone/mon/mon-cluster-log.sh Outdated
Comment thread qa/standalone/mon/mon-cluster-log.sh Outdated
Comment thread src/common/options/mon.yaml.in
Comment thread src/mon/LogMonitor.cc Outdated
Comment thread src/mon/LogMonitor.cc Outdated
@pdvian
Contributor Author

pdvian commented Sep 30, 2022

Thanks @idryomov. I saw your review comments a bit later. I have added the relevant section in PendingReleaseNotes for removing mon_cluster_log_*_level options. Let me address your review comments for standalone testcases.

@neha-ojha
Member

jenkins test make check

Comment thread qa/standalone/mon/mon-cluster-log.sh Outdated
@pdvian
Contributor Author

pdvian commented Jan 8, 2024

The make check is failing due to tracker#63950. I have opened PR#55070 to fix it.

@rzarzynski
Contributor

@ljflores, @yuriw: ping.

@ljflores
Member

ljflores commented Jan 9, 2024

@rzarzynski ACK

@NitzanMordhai
Contributor

@pdvian can you please review: /a/yuriw-2024-01-10_19:18:18-rados-wip-yuri3-testing-2024-01-10-0735-distro-default-smithi/7512554
it failed when trying to find the debugging message

@ljflores
Member

@pdvian looks like this needs some changes. Feel free to re-add "needs-qa" when this is ready!

@pdvian
Contributor Author

pdvian commented Feb 7, 2024

> @pdvian can you please review: /a/yuriw-2024-01-10_19:18:18-rados-wip-yuri3-testing-2024-01-10-0735-distro-default-smithi/7512554
> it failed when trying to find the debugging message

Hi @NitzanMordhai, this testcase failed because the ceph cluster log messages were missing the log level in human-readable format. I had opened PR#49730 to fix this issue, but it did not get merged into main.

The cluster logs were printing the log level in numeric format:

2024-02-07T18:51:42.108862+0000 mon.a (mon.0) 903 : cluster 1 Health check cleared: OSDMAP_FLAGS (was: nodeep-scrub flag(s) set)
2024-02-07T18:51:42.116129+0000 mon.a (mon.0) 905 : cluster 0 osdmap e148: 8 total, 4 up, 6 in

instead of:

2024-02-07T18:51:42.108862+0000 mon.a (mon.0) 903 : cluster [INF] Health check cleared: OSDMAP_FLAGS (was: nodeep-scrub flag(s) set)
2024-02-07T18:51:42.116129+0000 mon.a (mon.0) 905 : cluster [DBG] osdmap e148: 8 total, 4 up, 6 in

I have re-opened PR#49730 to address this issue.
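
The difference between the two formats can be made concrete with a small sketch that rewrites the numeric form into the tagged form. This is only an illustration and not the change made in PR#49730. It covers just the two mappings visible in the excerpts above (`0` to `[DBG]`, `1` to `[INF]`); the function name `humanize_level` and the regular expression are assumptions of this sketch.

```python
import re

# Only the two mappings that appear in the log excerpts are included;
# any other numeric level is left untouched by this sketch.
LEVEL_TAGS = {"0": "[DBG]", "1": "[INF]"}

# Matches " : <channel> <number> ", i.e. the channel name and numeric
# level that follow the " : " separator in a cluster log line.
_LEVEL_FIELD = re.compile(r"( : \w+ )(\d+)( )")


def humanize_level(line):
    """Replace a known numeric log level with its bracketed tag."""
    def replace(match):
        tag = LEVEL_TAGS.get(match.group(2))
        if tag is None:
            return match.group(0)  # unknown level: keep the line as-is
        return match.group(1) + tag + match.group(3)

    return _LEVEL_FIELD.sub(replace, line, count=1)
```

A test that searches the cluster log for a `[DBG]` entry would find nothing in the numeric form, which matches the failure reported above.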

@rzarzynski
Contributor

@pdvian, @ljflores: what's the status of this PR? It looks like the issue mentioned above has been addressed by #49730 (already in QA). Is there anything more to do here?

@pdvian pdvian added needs-qa and removed TESTED labels Mar 13, 2024
We do not control the verbosity of the LogEntry
messages that are logged to stderr, graylog and
journald. This floods /var/log with log messages
and causes the filesystem to fill up quickly.
We also have separate config variables, namely
mon_cluster_log_file_level and mon_cluster_log_to_syslog_level,
to control verbosity for the cluster log file and
syslog respectively. Add a generic cluster log
level config variable which controls cluster log
verbosity for all external entities.

Additionally, this patch addresses the regression where the
`mon_cluster_log_file_level` option does not take effect
because of the refactoring of LogMonitor::update_from_paxos
(commit: 7c84e06).

Fixes: https://tracker.ceph.com/issues/57061
Fixes: https://tracker.ceph.com/issues/57049

Signed-off-by: Prashant D <pdhange@redhat.com>
@pdvian
Contributor Author

pdvian commented Mar 14, 2024

jenkins test api

@rzarzynski
Contributor

Not sure it's related:

ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest)

⇒ Check for [already reported issues](https://tracker.ceph.com/search?q=ERROR:%20test_list_enabled_module%20%28tasks.mgr.dashboard.test_mgr_module.MgrModuleTest%29&open_issues=1) on "ERROR: test_list_enabled_module (tasks.mgr.dashboard.test_mgr_module.MgrModuleTest)" in Ceph Tracker...
... or open [a new issue](https://tracker.ceph.com/projects/dashboard/issues/new?issue[subject]=ERROR:%20test_list_enabled_module%20%28tasks.mgr.dashboard.test_mgr_module.MgrModuleTest%29&issue[category_id]=212&issue[assigned_to_id]=&issue[description]=%3Cpre%3EPlease%20copy%20here%20the%20console%20error%20message%3C/pre%3E).

2024-03-14 09:42:15,172.172 INFO:__main__:----------------------------------------------------------------------
2024-03-14 09:42:15,172.172 INFO:__main__:Traceback (most recent call last):
2024-03-14 09:42:15,172.172 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/connection.py", line 203, in _new_conn
2024-03-14 09:42:15,172.172 INFO:__main__:    sock = connection.create_connection(
2024-03-14 09:42:15,172.172 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
2024-03-14 09:42:15,172.172 INFO:__main__:    raise err
2024-03-14 09:42:15,172.172 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
2024-03-14 09:42:15,172.172 INFO:__main__:    sock.connect(sa)
2024-03-14 09:42:15,172.172 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/gevent/_socketcommon.py", line 590, in connect
2024-03-14 09:42:15,172.172 INFO:__main__:    self._internal_connect(address)
2024-03-14 09:42:15,172.172 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/gevent/_socketcommon.py", line 634, in _internal_connect
2024-03-14 09:42:15,172.172 INFO:__main__:    raise _SocketError(err, strerror(err))
2024-03-14 09:42:15,172.172 INFO:__main__:ConnectionRefusedError: [Errno 111] Connection refused
2024-03-14 09:42:15,172.172 INFO:__main__:
2024-03-14 09:42:15,173.173 INFO:__main__:The above exception was the direct cause of the following exception:
2024-03-14 09:42:15,173.173 INFO:__main__:
2024-03-14 09:42:15,173.173 INFO:__main__:Traceback (most recent call last):
2024-03-14 09:42:15,173.173 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 791, in urlopen
2024-03-14 09:42:15,173.173 INFO:__main__:    response = self._make_request(
2024-03-14 09:42:15,173.173 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 492, in _make_request
2024-03-14 09:42:15,173.173 INFO:__main__:    raise new_e
2024-03-14 09:42:15,173.173 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 468, in _make_request
2024-03-14 09:42:15,173.173 INFO:__main__:    self._validate_conn(conn)
2024-03-14 09:42:15,173.173 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1097, in _validate_conn
2024-03-14 09:42:15,173.173 INFO:__main__:    conn.connect()
2024-03-14 09:42:15,173.173 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/connection.py", line 611, in connect
2024-03-14 09:42:15,173.173 INFO:__main__:    self.sock = sock = self._new_conn()
2024-03-14 09:42:15,173.173 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/connection.py", line 218, in _new_conn
2024-03-14 09:42:15,173.173 INFO:__main__:    raise NewConnectionError(
2024-03-14 09:42:15,174.174 INFO:__main__:urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [Errno 111] Connection refused
2024-03-14 09:42:15,174.174 INFO:__main__:
2024-03-14 09:42:15,174.174 INFO:__main__:The above exception was the direct cause of the following exception:
2024-03-14 09:42:15,174.174 INFO:__main__:
2024-03-14 09:42:15,174.174 INFO:__main__:Traceback (most recent call last):
2024-03-14 09:42:15,174.174 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
2024-03-14 09:42:15,174.174 INFO:__main__:    resp = conn.urlopen(
2024-03-14 09:42:15,174.174 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 845, in urlopen
2024-03-14 09:42:15,174.174 INFO:__main__:    retries = retries.increment(
2024-03-14 09:42:15,174.174 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
2024-03-14 09:42:15,174.174 INFO:__main__:    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
2024-03-14 09:42:15,174.174 INFO:__main__:urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.21.3.227', port=7789): Max retries exceeded with url: /api/mgr/module (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
2024-03-14 09:42:15,174.174 INFO:__main__:
2024-03-14 09:42:15,174.174 INFO:__main__:During handling of the above exception, another exception occurred:
2024-03-14 09:42:15,174.174 INFO:__main__:
2024-03-14 09:42:15,174.174 INFO:__main__:Traceback (most recent call last):
2024-03-14 09:42:15,175.175 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/test_mgr_module.py", line 58, in test_list_enabled_module
2024-03-14 09:42:15,175.175 INFO:__main__:    data = self._get('/api/mgr/module')
2024-03-14 09:42:15,175.175 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 341, in _get
2024-03-14 09:42:15,175.175 INFO:__main__:    return cls._request(url, 'GET', params=params, version=version,
2024-03-14 09:42:15,175.175 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-api/qa/tasks/mgr/dashboard/helper.py", line 313, in _request
2024-03-14 09:42:15,175.175 INFO:__main__:    cls._resp = cls._session.get(url, params=params, verify=False,
2024-03-14 09:42:15,175.175 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
2024-03-14 09:42:15,175.175 INFO:__main__:    return self.request("GET", url, **kwargs)
2024-03-14 09:42:15,175.175 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
2024-03-14 09:42:15,175.175 INFO:__main__:    resp = self.send(prep, **send_kwargs)
2024-03-14 09:42:15,175.175 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
2024-03-14 09:42:15,175.175 INFO:__main__:    r = adapter.send(request, **kwargs)
2024-03-14 09:42:15,175.175 INFO:__main__:  File "/tmp/tmp.iDrxhKZ2Mf/venv/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
2024-03-14 09:42:15,175.175 INFO:__main__:    raise ConnectionError(e, request=request)
2024-03-14 09:42:15,175.175 INFO:__main__:requests.exceptions.ConnectionError: HTTPSConnectionPool(host='172.21.3.227', port=7789): Max retries exceeded with url: /api/mgr/module (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
2024-03-14 09:42:15,175.175 INFO:__main__:
2024-03-14 09:42:15,176.176 INFO:__main__:> ip netns list
2024-03-14 09:42:15,180.180 INFO:__main__:> sudo ip link delete ceph-brx
Cannot find device "ceph-brx"
2024-03-14 09:42:15,195.195 INFO:__main__:
2024-03-14 09:42:15,195.195 INFO:__main__:----------------------------------------------------------------------
2024-03-14 09:42:15,195.195 INFO:__main__:Ran 93 tests in 1322.196s
2024-03-14 09:42:15,195.195 INFO:__main__:
2024-03-14 09:42:15,196.196 INFO:__main__:

@rzarzynski
Contributor

jenkins test api

@rzarzynski
Contributor

@yuriw, @ljflores: we need to rerun the QA.

@yuriw
Contributor

yuriw commented Apr 17, 2024
