Skip to content

mgr/dashboard: Refactor grpc proto files generation. #61967

Closed
pecastro wants to merge 5 commits intoceph:mainfrom
pecastro:mgr-proto-refactor
Closed

mgr/dashboard: Refactor grpc proto files generation. #61967
pecastro wants to merge 5 commits intoceph:mainfrom
pecastro:mgr-proto-refactor

Conversation

@pecastro
Copy link
Copy Markdown
Contributor

  • Rewrite grpc proto code generation logic to not depend on a venv installed
    during the build process.
  • Ensure OS packages are installed to support development.
  • Extend generation to cater with multiple OSs.
  • Move grpc generation it's own CmakeLists.txt.
  • Notes added to the generation script.

Fixes: #61089
Alternative fix to: https://tracker.ceph.com/issues/69995 & #61846

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@pecastro pecastro requested a review from a team as a code owner February 23, 2025 23:50
@pecastro pecastro requested review from aaSharma14 and cloudbehl and removed request for a team February 23, 2025 23:50
@pecastro
Copy link
Copy Markdown
Contributor Author

jenkins retest this please


# Set defaults for variables
set(GRPC_TOOLS_EXPERIMENTAL_FLAG "")
set(GRPC_TOOLS_NVME_PREPARE_SOURCE_CMD cp ${CEPH_NVME_GATEWAY_PROTO_FILE} ${CEPH_DASHBOARD_GATEWAY_PROTO_FILE})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we set a symlink in the git tree instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@baum had actually suggested that in the previous MR.
This time around I left it like this because I was either copying or seding the optional keyword out.
Check the comment below for the sed reasons.

Comment on lines +53 to +55
set(UBUNTU_22_04_GRPC_TOOLS_VERSION "3.5.1")
set(CENTOS_9_GRPC_TOOLS_VERSION "3.14.0")
set(FEDORA_40_GRPC_TOOLS_VERSION "3.19.6")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than explicitly hard-coding distro-versions (this is going to complicate maintenance, and excludes distros-versions: F41, Ubuntu 24.04, ...). We should only distinguish by breaking changes libproto versions:

  • <3.12: optional not supported (remove optional from proto file).
  • >=3.12: add --experimental_allow_proto3_optional (from 3.14 it's not required, but it's still accepted, so let's simplify and add it until some future 3.* breaks if this is present).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are hardcoded in the variable name only, and you are right it was the bit I least liked about it but it helped me navigate the minefield.
If you follow the logic a bit, below you'll see the F41 and higher are supported but just marked as Unknown.
They can definitely be renamed to match the libproto version.

Regarding the optional keyword and <3.12 ...
There's a reason why I'm doing that, the current version of grpc_tools shipped by Ubuntu22.04 can't handle the current gateway.proto as is.

I'll simplify in line with the suggestions.


elseif(LIBPROTOC_VERSION VERSION_LESS_EQUAL UBUNTU_22_04_GRPC_TOOLS_VERSION)
message(DEBUG "System likely to be Ubuntu 22.04 or earlier. libprotoc ${LIBPROTOC_VERSION}")
set(GRPC_TOOLS_NVME_PREPARE_SOURCE_CMD sed 's|optional ||g' ${CEPH_NVME_GATEWAY_PROTO_FILE} > ${CEPH_DASHBOARD_GATEWAY_PROTO_FILE})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbregman @baum FYI: is this ok (removing optional from the protobuf file)? Could it be problematic or cause issues?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epuertat it might cause comparability issues when mixing different versions of gateway and CLI.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, grpc APIs, as long as they are in the same protobuf version (v2 or v3), are compatible no matter the grpc/protobuf library version.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about compatibility between the gateway and the CLI, not the protocol layer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a straightforward way to test that ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about compatibility between the gateway and the CLI, not the protocol layer.

But if they're not colocated, they could (and probably will) run different versions of the gRPC libs (GW runs inside a ubi9 container, while Ceph Dashboard and CLI might run on Ubuntu 22/24,for those users running Deb packages).

@pecastro pecastro force-pushed the mgr-proto-refactor branch 3 times, most recently from da0ec73 to 6c84559 Compare March 2, 2025 19:31
@pecastro pecastro requested a review from a team as a code owner March 2, 2025 19:31
@pecastro pecastro force-pushed the mgr-proto-refactor branch 5 times, most recently from 5dd3f87 to 516af47 Compare March 2, 2025 23:42
@pecastro
Copy link
Copy Markdown
Contributor Author

pecastro commented Mar 2, 2025

Hi folks.

The grpc build code from the previous MR that this current MR is trying to fix is breaking the main build for some of the more bleeding edge users. :(
That crappy setup venv code and install all deps from hell needs to be removed and we either revert that original MR or we progress this one, or else.

If the only contention is regarding the Ubuntu and the removal of the optional keyword, one of the way outs could be to introduce those hardcoded files ( the old manually generated ones ) back into the repository.

I could retrofit that logic in, if ubuntu22.04 copy over the hardcoded generated files from ubuntu22.04 and move on.
Else, keep the current logic and regenerate the files ? Thoughts ?

In the meantime, I'm fighting the unintended consequences of sitepackages = true.
Adding this to tox makes some of the Ubuntu dashboards tests unhappy and right now it's a question of dealing with a fake import-error which if it wasn't for the whack-a-mole behaviour of adding pylint: disable=import-error to having first flake8 complaint of a too long a line and then isort complaint of an incorrectly sorted and/or formatted import this would be passing tests.

@pecastro pecastro force-pushed the mgr-proto-refactor branch from 516af47 to 59299be Compare March 3, 2025 22:39
@nizamial09
Copy link
Copy Markdown
Member

nizamial09 commented Mar 4, 2025

I am also seeing that in the latest main, the grpc proto files are generated using grpc version

╰─$ podman run -it quay.ceph.io/ceph-ci/ceph:main cat /usr/share/ceph/mgr/dashboard/services/proto/gateway_pb2_grpc.py                                                                                                                 255 ↵
# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
"""Client and server classes corresponding to protobuf-defined services."""
import grpc
import warnings

from dashboard.services.proto import gateway_pb2 as dashboard_dot_services_dot_proto_dot_gateway__pb2

GRPC_GENERATED_VERSION = '1.70.0'
GRPC_VERSION = grpc.__version__
_version_not_supported = False

which I don't remember seeing before IIRC. And that broke the nvmeof in main repo because our grpc version in centos is 1.45.x.

Probably the sitepackages = true might fix that thing.

@pecastro Shall we revert the original commit in the main branch for now and then bring it over here as this PRs base commit? That way people can build ceph main for now and test nvmeof stuffs as well. Meanwhile we can test the entire new changes here and merge it only when the supported distros (newer versions of it too?) are a pass. WDYT?

@epuertat ^

@pecastro
Copy link
Copy Markdown
Contributor Author

pecastro commented Mar 4, 2025

which I don't remember seeing before IIRC. And that broke the nvmeof in main repo because our grpc version in centos is 1.45.x.

I did mention this would happen in the original PR. #61089 (comment)
Maybe I should have been more explicit. 😞

I believe this existing PR will fix all the underlying issues. But given what has happen maybe caution is the way to go.

@nizamial09 I don't mind if you revert those back to the original hardcoded files and mechanism whilst we work through this one properly this time, it might even be the better option other than the one on #61954

@nizamial09 nizamial09 mentioned this pull request Mar 4, 2025
14 tasks
@nizamial09
Copy link
Copy Markdown
Member

Thanks @pecastro , i agree with it. I opened the Revert PR here: #62103. I think reverting it would help us focus on the actual issue rather than getting diverted into other distro specific issues for now.

I did mention this would happen in the original PR. #61089 (comment)
Maybe I should have been more explicit. 😞

Ah yeah 😕 I should have looked past the dashboard error and understood the issue of the nvmeof controller itself. I missed it. Sorry for that.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 4, 2025

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@nizamial09
Copy link
Copy Markdown
Member

@pecastro the revert is merged in main now.

@pecastro
Copy link
Copy Markdown
Contributor Author

If the only contention is regarding the Ubuntu and the removal of the optional keyword, one of the way outs could be to introduce those hardcoded files ( the old manually generated ones ) back into the repository.

I could retrofit that logic in, if ubuntu22.04 copy over the hardcoded generated files from ubuntu22.04 and move on. Else, keep the current logic and regenerate the files ? Thoughts ?

Any thoughts on this ?

@pecastro
Copy link
Copy Markdown
Contributor Author

On second though, looking into this bit https://github.com/ceph/ceph/blob/main/src/CMakeLists.txt#L907 ) which I'm not sure how I missed it in the original work but that's a different question ...) to try to factor in WITH_NVMEOF_GATEWAY_MONITOR_CLIENT , it looks like this is only ever built for Redhat based systems.

So my question is, for the dashboard generated protos, do we want to honour this variable or are we rebuilding the protos regardless ? And if so, do we want to to keep the new logic of removing the optional keyword or shall we reuse the originals just in the Ubuntu case.

@nizamial09
Copy link
Copy Markdown
Member

So my question is, for the dashboard generated protos, do we want to honour this variable or are we rebuilding the protos regardless ?

to be honest here, I've only ever tested the nvmeof against the centos8/9 only. And I am almost sure that our APIs won't be there for ubuntu releases because ubuntu has a different grpcio version which is not supported with the one that we used to generate the proto files with. And if the core nvmeof is also doing the same thing because of the dependency issue, I am okay with it for the time being. If the monitor is required for starting the nvmeof gateways, I don't see a reason why we would need to build the proto files in ubuntu...

@pecastro pecastro force-pushed the mgr-proto-refactor branch from 09d6e15 to 10e189c Compare March 12, 2025 00:00
@pecastro pecastro force-pushed the mgr-proto-refactor branch 2 times, most recently from 5eba9f7 to 8846e2b Compare March 13, 2025 21:28
Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
* Rewrite grpc proto code generation logic to not depend on a venv installed
during the build process.
* Ensure OS packages are installed to support development.
* Extend generation to cater with multiple OSs.
* Move grpc generation it's own CmakeLists.txt.
* Notes added to the generation script.

Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
openapi-check: install_deps /ceph/src/pybind/mgr/dashboard> python -I -m pip install -r requirements.txt -r requirements-test.txt -c constraints.txt
Looking in links: /ceph/src/pybind/mgr/dashboard/wheelhouse
Obtaining file:///ceph/src/python-common (from -r requirements.txt (line 7))
  Preparing metadata (setup.py) ... error

  error: subprocess-exited-with-error

  _ python setup.py egg_info did not run successfully.
  │ exit code: 1
  _─> [1 lines of output]
      ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

_ Encountered error while generating package metadata.
_─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
openapi-check: exit 1 (0.31 seconds) /ceph/src/pybind/mgr/dashboard> python -I -m pip install -r requirements.txt -r requirements-test.txt -c constraints.txt pid=270650
  openapi-check: FAIL code 1 (0.33 seconds)
  evaluation failed :( (0.36 seconds)

Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
These all came about with the introduction of `sitepackages = true` in
tox to allow the use system grpc python packages.
Unfortunately these errors only seem to show up in old systems like
Ubuntu 22.04 due most likely to the legacy version of some of the
python libraries used.

```
10: Requirement already satisfied: distlib<1,>=0.3.7 in /ceph/build/mgr-dashboard-lint-virtualenv/lib/python3.10/site-packages (from virtualenv>=20.27.1->tox) (0.3.9)
10: lint: install_deps /ceph/src/pybind/mgr/dashboard> python -I -m pip install -r requirements.txt -r requirements-lint.txt -c constraints.txt
10: lint: commands[0] /ceph/src/pybind/mgr/dashboard> flake8
10: lint: commands[1] /ceph/src/pybind/mgr/dashboard> flake8 --config=tox.ini ../../../../qa/tasks/mgr/dashboard
10: lint: commands[2] /ceph/src/pybind/mgr/dashboard> isort ../../../../qa/tasks/mgr/dashboard --check
10: lint: commands[3] /ceph/src/pybind/mgr/dashboard> pylint -rn --rcfile=.pylintrc --jobs=1 . api controllers plugins services tests ../../../../qa/tasks/mgr/dashboard
10: ************* Module controllers.osd
10: controllers/osd.py:9:0: E0401: Unable to import 'ceph.deployment.drive_group' (import-error)
10: ************* Module plugins.motd
10: plugins/motd.py:8:0: E0401: Unable to import 'ceph.utils' (import-error)
10: ************* Module services.orchestrator
10: services/orchestrator.py:7:0: E0401: Unable to import 'ceph.deployment.service_spec' (import-error)
10: ************* Module tests.test_osd
10: tests/test_osd.py:7:0: E0401: Unable to import 'ceph.deployment.drive_group' (import-error)
10: tests/test_osd.py:8:0: E0401: Unable to import
'ceph.deployment.service_spec' (import-error)
...
```

Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
@pecastro pecastro force-pushed the mgr-proto-refactor branch from 8846e2b to 55576f9 Compare March 16, 2025 23:16
@pecastro
Copy link
Copy Markdown
Contributor Author

I don't see a reason why we would need to build the proto files in ubuntu...

I thought so as well but ...

The naive approach of just honouring this WITH_NVMEOF_GATEWAY_MONITOR_CLIENT flag will mean the ubuntu builds will start failing tests.
The absense of the generated proto files will mean a mismatch of what the dashboards API will generate https://github.com/ceph/ceph/blob/main/src/pybind/mgr/dashboard/tox.ini#L173 versus the API expectation and the test run-tox-mgr-dashboard-openapi fails.
I can't at this moment see a way out without adding a bunch of extra complexity to handle this only for the ubuntu case which I don't believe is worth it.
Also, Alex gripe with it seemed to be solely around the failures he was seeing which were related with the earlier implementation of this work and the xmlsec dep.

So, I can leave as is ( doing the "naughty" optional keyword removal ) or I can add another commit so it uses the existing generated files that exist in the repo ONLY for the Ubuntu case.

@batrick batrick changed the title mgr/dashboard: Refactor grpc proto files generation. mgr/dashboard: Refactor grpc proto files generation. Apr 28, 2025
@github-actions
Copy link
Copy Markdown

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Aug 19, 2025
@github-actions
Copy link
Copy Markdown

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Sep 18, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Ceph-Dashboard Sep 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants