feat(integration-tests): Add port range assignment based on a configurable base port to avoid package spin-up failures.#1662
Conversation
…ig file; address comments.
…er-level clp-config.yml
…E_CONFIGS other than clp-text and clp-json
quinntaylormitchell
left a comment
There was a problem hiding this comment.
Thanks for the review Bill.
re. the replies that say "see top-level comment":
The first issue with using the required_components list in assign_ports_from_base is that the component names on the list are all in their Docker container name forms (with - characters and no _), whereas the names of the ClpConfig class attributes are all in _ form.
In addition, it seems to me like it's more future-proof to iterate through the attributes of the ClpConfig class, as this code is currently doing. That way, the only thing we have to keep track of as we maintain the code in the future is the list of components that each operating mode needs (which is human-level anyway).
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
integration-tests/tests/utils/port_utils.py (1)
13-14: Document the origin ofREDUCER_MAX_PORTS = 128.Add a comment explaining where this value comes from (e.g., reducer configuration or implementation constraint). This will help maintainers understand why 128 ports are reserved and keep the constant in sync with any upstream changes.
# CLP constants. +# Maximum number of consecutive ports the reducer component requires. +# This value should match the reducer's implementation that binds to a port range. REDUCER_MAX_PORTS = 128
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
integration-tests/tests/fixtures/package_config.py(3 hunks)integration-tests/tests/utils/port_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (2)
integration-tests/tests/fixtures/package_config.py (2)
integration-tests/tests/utils/port_utils.py (1)
assign_ports_from_base(21-67)integration-tests/tests/utils/clp_mode_utils.py (1)
get_clp_config_from_mode(77-88)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpConfig(705-955)
🔇 Additional comments (5)
integration-tests/tests/fixtures/package_config.py (2)
31-38: LGTM!The base port parsing with clear error handling and the subsequent port assignment call are well-implemented. The error message is descriptive and will help users identify configuration issues.
43-49: LGTM!The
base_portis correctly passed to thePackageConfigconstructor, enabling downstream components to access the configured base port.integration-tests/tests/utils/port_utils.py (3)
32-52: Component discovery approach is reasonable.The dynamic discovery via
vars(clp_config)provides flexibility and avoids maintaining a separate list of components. Since Python 3.7+,vars()returns attributes in definition order, making the port assignment deterministic for a givenClpConfigstructure.
54-67: LGTM!The validation-before-assignment pattern ensures that either all ports are available and assigned, or an error is raised before any changes are made. This atomic approach prevents partial configuration states.
98-134: LGTM!The validation functions provide clear, actionable error messages that include both the problematic port and the expected range. The helper function
_format_inclusive_port_rangekeeps the output consistent and readable.
Bill-hbrhbr
left a comment
There was a problem hiding this comment.
I agree with your statement on inspecting the whole CLPConfig instead of the required component list.
Have some final comments on assign_ports_from_base, the main function of this PR.
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
integration-tests/tests/utils/port_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (1)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpConfig(779-1060)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: build (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (3)
integration-tests/tests/utils/port_utils.py (3)
8-20: Port constants and reducer block sizing look reasonable for the test contextThe valid port range and
REDUCER_MAX_PORTSconstant read clearly and give a sensible safety margin for reducer instances in tests. No functional concerns here.
69-78: Range pretty-printer is correct and keeps error messages readableThe helper correctly interprets Python’s half-open
rangeas inclusive on both ends for display, and the output format reads well in your error messages. Looks good as-is.
115-133: Port-range validation is correct and produces helpful diagnosticsThe inclusive bound check against
VALID_PORT_RANGEis logically correct, and the error message constructed with_format_inclusive_port_rangegives the user both the derived and valid ranges, which should make misconfigured--clp-base-portissues easy to spot. No issues here.
Bill-hbrhbr
left a comment
There was a problem hiding this comment.
passing on to @junhaoliao
|
Hey @quinntaylormitchell, while reviewing your PR, I'd suggest the following code changes: 👉 i tried refactoring this file a bit You can also review and apply these suggestions locally on your machine. Learn more about GitKraken Code Suggest
Join your team on GitKraken to speed up PR review. |
BASE_PORT to avoid package spin-up failure.There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
integration-tests/tests/utils/port_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T06:29:59.610Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: .github/workflows/clp-core-build.yaml:294-332
Timestamp: 2025-08-25T06:29:59.610Z
Learning: In the CLP project, Bill-hbrhbr prefers a "fail fast" approach for CI workflows - allowing potential command availability issues (like getconf in musllinux) to surface through CI failures rather than preemptively adding fallback logic, as they will fix issues when they occur.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.
Applied to files:
integration-tests/tests/utils/port_utils.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: build (macos-15)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (7)
integration-tests/tests/utils/port_utils.py (7)
12-21: Port constant definitions and discovery keys look solid.Non-privileged/maximum port bounds and VALID_PORT_RANGE are consistent, and PORT_ATTR_NAMES aligns with how components are discovered later; no issues here.
24-36: PortAssignment dataclass is clear and appropriate.The fields and docstring line up and make the per-component requirements easy to reason about.
64-79: Port availability check and error messaging look good.Linear scan over the small reserved range is fine for tests, and the error message (including the formatted range) should make port conflicts easy to diagnose.
82-116: Port-assignment discovery logic is clean and maintainable.Iterating vars(clp_config), skipping private/None attributes, and probing for PORT_ATTR_NAMES keeps this generic and avoids hard-coding component lists. The reducer special-case via REDUCER_COMPONENT_NAME and REDUCER_MAX_PORTS is also clear. As long as components that need ports continue to expose exactly one of these attributes directly, this should remain robust.
119-128: Range pretty-printer is concise and readable.Nicely encapsulates the inclusive start/end representation used in error messages; no changes needed.
131-144: _is_port_free helper is straightforward and suitable for tests.The bind-based check is simple and keeps the port-availability logic easy to follow; behaviour matches what _check_ports_available expects.
147-166: Port-range validation logic and messaging look correct.The bounds check against VALID_PORT_RANGE and the use of _format_port_range for both required and valid ranges make invalid-configuration failures clear and debuggable.
…rable base port to avoid package spin-up failures. (y-scope#1662) Co-authored-by: Bingran Hu <bingran.hu@yscope.com>
…rable base port to avoid package spin-up failures. (y-scope#1662) Co-authored-by: Bingran Hu <bingran.hu@yscope.com>
…rable base port to avoid package spin-up failures. (y-scope#1662) Co-authored-by: Bingran Hu <bingran.hu@yscope.com>
Description
This PR adds a system that assigns a port to each component of the clp package. The intent is to avoid port conflicts on package startup.
--base-portis an optional CLI flag which has a default value of55000if it is not specified. A port range is constructed starting at that base port and stretching to accommodate the maximum number of ports needed by the CLP package described in a givenClpConfigobject. The system checks whether or not each port within this range is available, and if they are all available, the system assigns ports to the components appropriately.PR dependencies
This PR depends on the following being merged:
Checklist
breaking change.
Validation performed
Ran the following command from
clp/integration-tests:uv run pyetst -m 'package'Both the
clp-jsonandclp-texttests passed.Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.