Skip to content

mgr/dashboard: provide the service events when showing a service in the UI#40328

Merged
epuertat merged 1 commit intoceph:masterfrom
rhcs-dashboard:fix-49262-master
May 20, 2021
Merged

mgr/dashboard: provide the service events when showing a service in the UI#40328
epuertat merged 1 commit intoceph:masterfrom
rhcs-dashboard:fix-49262-master

Conversation

@aaSharma14
Copy link
Contributor

@aaSharma14 aaSharma14 commented Mar 23, 2021

When service deployment failures occur, events are generated and associated with the service. This PR intends to add these events as Daemon Events and Service Events to the UI for troubleshooting and doing diagnostics for the same.

Fixes: https://tracker.ceph.com/issues/49262
Signed-off-by: Aashish Sharma aasharma@redhat.com

Service Events-
Screenshot from 2021-05-06 17-19-46

Daemon Events-
Screenshot from 2021-05-06 17-19-38

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@aaSharma14 aaSharma14 requested review from a team, avanthakkar, callithea and s0nea and removed request for a team March 23, 2021 04:53
@aaSharma14 aaSharma14 requested a review from a team as a code owner March 23, 2021 04:55
@tchaikov
Copy link
Contributor

@aaSharma14 sorry for aborting your "make check" test. as it will fail anyway as i am testing the "make check" run focal test node.

@aaSharma14
Copy link
Contributor Author

@aaSharma14 sorry for aborting your "make check" test. as it will fail anyway as i am testing the "make check" run focal test node.

No issues @tchaikov . Thanks for informing.

@tchaikov
Copy link
Contributor

jenkins test make check

@alfonsomthd alfonsomthd added the skip-teuthology For PRs whose changes do not have an effect on QA runs/changes are not being tested in Teuthology label Mar 23, 2021
@alfonsomthd alfonsomthd requested a review from nizamial09 March 23, 2021 16:15
@alfonsomthd alfonsomthd requested a review from epuertat March 24, 2021 09:01
@aaSharma14 aaSharma14 requested a review from alfonsomthd March 24, 2021 09:04
@epuertat
Copy link
Member

@pcuzner we'd love to hear your thoughts about this one!

@pcuzner
Copy link
Contributor

pcuzner commented Mar 24, 2021

Great to see this!

Couple of questions.

  • service logs = service events right? Is there a reason that we're calling them logs in the GUI and events CLI?
  • when there are multiple events for a service, do the 'cell' just contain all entries - ie the service 'logs' view will only ever be a single row, single cell display for content?
  • daemon logs - there's no example of the content here, what is this. As an admin if you say daemon log I'm expecting to see the same sort of display as from journalctl -u ?
    @aaSharma14 ^^

@aaSharma14 aaSharma14 force-pushed the fix-49262-master branch 2 times, most recently from fa7f7ee to eed7e16 Compare March 26, 2021 07:25
@aaSharma14
Copy link
Contributor Author

Great to see this!
Thanks @pcuzner .

Couple of questions.

  • service logs = service events right? Is there a reason that we're calling them logs in the GUI and events CLI?

No specific reason for this. Just thought that Logs might be a more familiar term for the user. However changed it to Events now.

  • when there are multiple events for a service, do the 'cell' just contain all entries - ie the service 'logs' view will only ever be a single row, single cell display for content?

One cell would contain all the entries. New event would start from new line. Like in this screenshot there are two daemon events for one daemon..same would be for multiple service events.
Screenshot from 2021-03-26 12-43-15

  • daemon logs - there's no example of the content here, what is this. As an admin if you say daemon log I'm expecting to see the same sort of display as from journalctl -u ?

I have updated the screenshot for Daemon events with the content. It is basically the events that we are getting from ceph orch ps --service_name="" --format yaml

@aaSharma14 ^^

@aaSharma14
Copy link
Contributor Author

jenkins test make check

@pcuzner
Copy link
Contributor

pcuzner commented Mar 30, 2021

@aaSharma14 thanks!

</cd-table>
</div>

<ng-template #elseBlock>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO "elseBlock" is an undescriptive name for the template, maybe sth like "serviceDetailsTpl" would be better.

@aaSharma14 aaSharma14 requested a review from a team April 27, 2021 15:14
Copy link
Member

@epuertat epuertat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The screenshot looks great (I know there's a lot of work here, so thank you @aaSharma14!), but I think the code generating that needs some refinement yet.

I added my comments on the performance. It'd be great if you could provide the ng.profiler.timeChangeDetection({record : true}) stats before & after this change, so we can see how worse this gets.

I still think that splitting the log lines in the back-end is preferred (especially when it's already done there), but exploring the OnPush strategy is a must, same as trying the template Cell instead of the current Pipe approach.

osdspec_affinity: Optional[str] = None,
last_deployed: Optional[datetime.datetime] = None,
events: Optional[List['OrchestratorEvent']] = None,
event_dict: Optional[List[dict]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avanthakkar, the idea of something like to_json(True) returning anything other than a JSON string makes me shiver... We shouldn't bury ad-hoc functionality inside the existing methods... It costs nothing to extract the common code and to come up with 2 separate methods (to_json() and to_dict()), but we should dispel the idea of reusing functions for doing many unrelated things.

Copy link
Member

@epuertat epuertat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really improved now (IMHO)! What do you think @aaSharma14 ? I just left a few extra improvements.

Comment on lines +1131 to +1146
status = {
'container_image_id': self.container_image_id,
'container_image_name': self.container_image_name,
'rados_config_location': self.rados_config_location,
'service_url': self.service_url,
'size': self.size,
'running': self.running,
'last_refresh': self.last_refresh,
'created': self.created,
'virtual_ip': self.virtual_ip,
'ports': self.ports if self.ports else None,
}
for k in ['last_refresh', 'created']:
if getattr(self, k):
status[k] = datetime_to_str(getattr(self, k))
status = {k: v for (k, v) in status.items() if v is not None}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's just me but I prefer to have clear declarative data struct definitions rather than ones that are dynamically stretched:

Suggested change
status = {
'container_image_id': self.container_image_id,
'container_image_name': self.container_image_name,
'rados_config_location': self.rados_config_location,
'service_url': self.service_url,
'size': self.size,
'running': self.running,
'last_refresh': self.last_refresh,
'created': self.created,
'virtual_ip': self.virtual_ip,
'ports': self.ports if self.ports else None,
}
for k in ['last_refresh', 'created']:
if getattr(self, k):
status[k] = datetime_to_str(getattr(self, k))
status = {k: v for (k, v) in status.items() if v is not None}
status = {
'container_image_id': self.container_image_id,
'container_image_name': self.container_image_name,
'rados_config_location': self.rados_config_location,
'service_url': self.service_url,
'size': self.size,
'running': self.running,
'last_refresh': datetime_to_str(self.last_refresh),
'created': datetime_to_str(self.created),
'virtual_ip': self.virtual_ip,
'ports': self.ports if self.ports else None,
}
# REMOVE the following?
# status = {k: v for (k, v) in status.items() if v is not None}

Why do we remove the keys that are None? For data sharing it's better to have stable/fixed data structure that ones that randomly show some properties...

@sebastian-philipp @jmolmo what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This is new code. We should deliver clean new code, and not follow suboptimal code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epuertat your suggestion is also favoring explicit over implicit, which is something we should follow :)

@aaSharma14 aaSharma14 requested a review from epuertat May 6, 2021 12:14
@aaSharma14 aaSharma14 force-pushed the fix-49262-master branch 2 times, most recently from 9115c28 to 22da72f Compare May 6, 2021 13:12
Copy link
Member

@epuertat epuertat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More suggestions going there! :D

Comment on lines +63 to +65
<div *ngIf="value.length != 0 && value != undefined"
class="ul-margin">
<ul *ngFor="let event of value">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this work without the previous div? I'd assume that if value.length is 0 or undefined or null, the ngFor won't be displayed either:

Suggested change
<div *ngIf="value.length != 0 && value != undefined"
class="ul-margin">
<ul *ngFor="let event of value">
<ul *ngFor="let event of events" class="ul-margin">

Additionally, ngfors can be computationally expensive (on every change cycle, Angular will probably re-generate the whole loop). To avoid that, ngfor has an extra option trackBy, which is a function (perhaps an inline one works) returning a unique field for each component to identify if it changed (in our case, do we have a uuid for each event? perhaps the timestamp?).

BTW, I'd also suggest to change value with events as it makes clearer what the variable is about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @epuertat . I have updated this with a trackBy function and have changed value to events. Regarding removing the previous div - I tried removing it but it shows both the 'event details' and 'No data avialable' in this case, so decided to keep it as it is.
Screenshot from 2021-05-07 14-25-08

Comment on lines +976 to +1008
def to_dict(self) -> dict:
out: Dict[str, Any] = OrderedDict()
out['daemon_type'] = self.daemon_type
out['daemon_id'] = self.daemon_id
out['hostname'] = self.hostname
out['container_id'] = self.container_id
out['container_image_id'] = self.container_image_id
out['container_image_name'] = self.container_image_name
out['container_image_digests'] = self.container_image_digests
out['memory_usage'] = self.memory_usage
out['memory_request'] = self.memory_request
out['memory_limit'] = self.memory_limit
out['version'] = self.version
out['status'] = self.status.value if self.status is not None else None
out['status_desc'] = self.status_desc
if self.daemon_type == 'osd':
out['osdspec_affinity'] = self.osdspec_affinity
out['is_active'] = self.is_active
out['ports'] = self.ports
out['ip'] = self.ip

for k in ['last_refresh', 'created', 'started', 'last_deployed',
'last_configured']:
if getattr(self, k):
out[k] = datetime_to_str(getattr(self, k))

if self.events:
out['events'] = [e.to_dict() for e in self.events]

empty = [k for k, v in out.items() if v is None]
for e in empty:
del out[e]
return out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And same for this, sorry!

return {
  'daemon_type': ...

If we drop support for Python 3.6 (I think CentOS 8 supports 3.8), we could use dataclasses.asdict().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactoring actually requires a lot of code change in the cephadm code. There are a number of tests that are impacted by this. It would be better if we take this as an improvement in a separate PR as those changes would be unrelated to the purpose that this PR is serving.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the dataclasses? Because a refactoring like the one suggested shouldn't impact any existing tests given this is a new method and I don't see any unit test testing this.

BTW, no need to use an OrderedDict here as dict's in Py3.6 are by default ordered:

Suggested change
def to_dict(self) -> dict:
out: Dict[str, Any] = OrderedDict()
out['daemon_type'] = self.daemon_type
out['daemon_id'] = self.daemon_id
out['hostname'] = self.hostname
out['container_id'] = self.container_id
out['container_image_id'] = self.container_image_id
out['container_image_name'] = self.container_image_name
out['container_image_digests'] = self.container_image_digests
out['memory_usage'] = self.memory_usage
out['memory_request'] = self.memory_request
out['memory_limit'] = self.memory_limit
out['version'] = self.version
out['status'] = self.status.value if self.status is not None else None
out['status_desc'] = self.status_desc
if self.daemon_type == 'osd':
out['osdspec_affinity'] = self.osdspec_affinity
out['is_active'] = self.is_active
out['ports'] = self.ports
out['ip'] = self.ip
for k in ['last_refresh', 'created', 'started', 'last_deployed',
'last_configured']:
if getattr(self, k):
out[k] = datetime_to_str(getattr(self, k))
if self.events:
out['events'] = [e.to_dict() for e in self.events]
empty = [k for k, v in out.items() if v is None]
for e in empty:
del out[e]
return out
def to_dict(self) -> dict:
return {
'daemon_type': self.daemon_type,
'daemon_id': self.daemon_id,
'hostname': self.hostname,
...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tests in cephadm and orchestrator which are affected :

def test_service_ls(self, cephadm_module):

@pytest.mark.parametrize(

Since in the approach that you are suggesting, we are not removing the attributes that are 'None', so we need to change the expected schemas in all these tests. And in-fact in each case there are 5-10 values that are expected as None.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I now see where it fails. My bad. Sorry @aaSharma14. I don't think it's a good practice to have a dynamic schema 'because some values are None' (in that case, the Schema should define that a property can either have a value or be None).

In any case, let's go with it. We have enough code to clean-up in dashboard without having to deal with orchestrator too.

@aaSharma14 aaSharma14 requested a review from epuertat May 17, 2021 08:24
@aaSharma14 aaSharma14 force-pushed the fix-49262-master branch 4 times, most recently from a6ca029 to 7e26f7f Compare May 17, 2021 13:29
…he UI

When service deployment failures occur, events are generated and associated with the service. This PR intends to add these events as Daemon Logs and Service Logs to the UI for troubleshooting and doing diagnostics for the same.

Fixes: https://tracker.ceph.com/issues/49262
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Comment on lines +976 to +1008
def to_dict(self) -> dict:
out: Dict[str, Any] = OrderedDict()
out['daemon_type'] = self.daemon_type
out['daemon_id'] = self.daemon_id
out['hostname'] = self.hostname
out['container_id'] = self.container_id
out['container_image_id'] = self.container_image_id
out['container_image_name'] = self.container_image_name
out['container_image_digests'] = self.container_image_digests
out['memory_usage'] = self.memory_usage
out['memory_request'] = self.memory_request
out['memory_limit'] = self.memory_limit
out['version'] = self.version
out['status'] = self.status.value if self.status is not None else None
out['status_desc'] = self.status_desc
if self.daemon_type == 'osd':
out['osdspec_affinity'] = self.osdspec_affinity
out['is_active'] = self.is_active
out['ports'] = self.ports
out['ip'] = self.ip

for k in ['last_refresh', 'created', 'started', 'last_deployed',
'last_configured']:
if getattr(self, k):
out[k] = datetime_to_str(getattr(self, k))

if self.events:
out['events'] = [e.to_dict() for e in self.events]

empty = [k for k, v in out.items() if v is None]
for e in empty:
del out[e]
return out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the dataclasses? Because a refactoring like the one suggested shouldn't impact any existing tests given this is a new method and I don't see any unit test testing this.

BTW, no need to use an OrderedDict here as dict's in Py3.6 are by default ordered:

Suggested change
def to_dict(self) -> dict:
out: Dict[str, Any] = OrderedDict()
out['daemon_type'] = self.daemon_type
out['daemon_id'] = self.daemon_id
out['hostname'] = self.hostname
out['container_id'] = self.container_id
out['container_image_id'] = self.container_image_id
out['container_image_name'] = self.container_image_name
out['container_image_digests'] = self.container_image_digests
out['memory_usage'] = self.memory_usage
out['memory_request'] = self.memory_request
out['memory_limit'] = self.memory_limit
out['version'] = self.version
out['status'] = self.status.value if self.status is not None else None
out['status_desc'] = self.status_desc
if self.daemon_type == 'osd':
out['osdspec_affinity'] = self.osdspec_affinity
out['is_active'] = self.is_active
out['ports'] = self.ports
out['ip'] = self.ip
for k in ['last_refresh', 'created', 'started', 'last_deployed',
'last_configured']:
if getattr(self, k):
out[k] = datetime_to_str(getattr(self, k))
if self.events:
out['events'] = [e.to_dict() for e in self.events]
empty = [k for k, v in out.items() if v is None]
for e in empty:
del out[e]
return out
def to_dict(self) -> dict:
return {
'daemon_type': self.daemon_type,
'daemon_id': self.daemon_id,
'hostname': self.hostname,
...
}

Comment on lines +1131 to +1146
status = {
'container_image_id': self.container_image_id,
'container_image_name': self.container_image_name,
'rados_config_location': self.rados_config_location,
'service_url': self.service_url,
'size': self.size,
'running': self.running,
'last_refresh': self.last_refresh,
'created': self.created,
'virtual_ip': self.virtual_ip,
'ports': self.ports if self.ports else None,
}
for k in ['last_refresh', 'created']:
if getattr(self, k):
status[k] = datetime_to_str(getattr(self, k))
status = {k: v for (k, v) in status.items() if v is not None}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This is new code. We should deliver clean new code, and not follow suboptimal code.

@alfonsomthd alfonsomthd requested a review from epuertat May 20, 2021 07:46
Copy link
Member

@epuertat epuertat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work, @aaSharma14 ! 🥳

@epuertat epuertat dismissed jmolmo’s stale review May 20, 2021 09:33

No further requests.

@epuertat epuertat merged commit fb7175c into ceph:master May 20, 2021
@epuertat epuertat deleted the fix-49262-master branch May 20, 2021 09:33
Comment on lines +976 to +986
def to_dict(self) -> dict:
out: Dict[str, Any] = OrderedDict()
out['daemon_type'] = self.daemon_type
out['daemon_id'] = self.daemon_id
out['hostname'] = self.hostname
out['container_id'] = self.container_id
out['container_image_id'] = self.container_image_id
out['container_image_name'] = self.container_image_name
out['container_image_digests'] = self.container_image_digests
out['memory_usage'] = self.memory_usage
out['memory_request'] = self.memory_request
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually no, you can't just copy the to_json method and rename it to_dict:

(Diff at the time you copied the method)

image

because over time they will diverge:

(Diff on current master HEAD)

image

Copying such a large method is just bad practice. sorry that I haven't noticed this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it a wrongly resolved conflict or what? So it's just a matter of adding the missing fields?

@aaSharma14 please have a look at this, please? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epuertat, No its not a wrongly resolved commit. At that time we introduced a new to_dict method. Over the time new fields were added to the to_json method..but those were not added to the to_dict method. So, now both these two methods are not in sync with each other as there are different number of fields in both. A quick solution would be add those new fields to to_dict method as well. Or if we want to refactor the code..then I will need to find a better solution for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epuertat, No its not a wrongly resolved commit. At that time we introduced a new to_dict method. Over the time new fields were added to the to_json method.

Right. That's how it is. If you want to avoid having to maintain this method every now and then, you have to unify those two again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aaSharma14! I remember I already asked this, but isn't there a way to avoid this by reusing the same method? I find quite unusual to have 2 methods that do almost the same (I'd remove the to_json() one as it's not JSON (text) what that's returning but a dict with some subfields in JSON format...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I avoided calling those methods to_dict is, that they actually sometimes don't return a dict:

➜  src git:(_check_for_moved_osds) ✗ find * -name '*.py' | grep -v -e rook_client -e tox -e test | xargs egrep 'def.*to_json'
ceph-volume/venv/lib/python3.9/site-packages/pip/_internal/models/direct_url.py:    def to_json(self):
pybind/mgr/cephadm/services/osd.py:    def to_json(self) -> dict:
pybind/mgr/cephadm/configchecks.py:    def to_json(self) -> Dict[str, Any]:
pybind/mgr/cephadm/configchecks.py:    def to_json(self) -> List[Dict[str, str]]:
pybind/mgr/cephadm/upgrade.py:    def to_json(self) -> dict:
pybind/mgr/cephadm/inventory.py:    def to_json(self) -> Dict[str, Any]:
pybind/mgr/orchestrator/module.py:    def to_json_1(obj: Any) -> Any:
pybind/mgr/orchestrator/module.py:    def to_json_n(objs: List) -> List:
pybind/mgr/orchestrator/_interface.py:    def to_json(self) -> dict:
pybind/mgr/orchestrator/_interface.py:    def to_json(self) -> OrderedDict:
pybind/mgr/orchestrator/_interface.py:    def to_json(self) -> dict:
pybind/mgr/orchestrator/_interface.py:    def to_json(self) -> str:
pybind/mgr/progress/module.py:    def to_json(self):
pybind/mgr/progress/module.py:    def to_json(self):
pybind/mgr/rbd_support/schedule.py:    def to_json(self) -> str:
pybind/mgr/rbd_support/task.py:    def to_json(self) -> str:
pybind/mgr/volumes/fs/volume.py:def name_to_json(names):
python-common/ceph/deployment/inventory.py:    def to_json(self):
python-common/ceph/deployment/inventory.py:    def to_json(self):
python-common/ceph/deployment/hostspec.py:    def to_json(self) -> Dict[str, Any]:
python-common/ceph/deployment/drive_group.py:    def to_json(self):
python-common/ceph/deployment/service_spec.py:    def to_json(self) -> str:
python-common/ceph/deployment/service_spec.py:    def to_json(self) -> dict:
python-common/ceph/deployment/service_spec.py:    def to_json(self):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cephadm dashboard feature orchestrator pybind skip-teuthology For PRs whose changes do not have an effect on QA runs/changes are not being tested in Teuthology

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

10 participants