Skip to content

feat(integration-tests): Add port range assignment based on a configurable base port to avoid package spin-up failures.#1662

Merged
quinntaylormitchell merged 68 commits into
y-scope:mainfrom
quinntaylormitchell:testing-start-stop-5
Dec 12, 2025
Merged

feat(integration-tests): Add port range assignment based on a configurable base port to avoid package spin-up failures.#1662
quinntaylormitchell merged 68 commits into
y-scope:mainfrom
quinntaylormitchell:testing-start-stop-5

Conversation

@quinntaylormitchell

@quinntaylormitchell quinntaylormitchell commented Nov 25, 2025

Copy link
Copy Markdown
Collaborator

Description

This PR adds a system that assigns a port to each component of the clp package. The intent is to avoid port conflicts on package startup.

--base-port is an optional CLI flag which has a default value of 55000 if it is not specified. A port range is constructed starting at that base port and stretching to accommodate the maximum number of ports needed by the CLP package described in a given ClpConfig object. The system checks whether or not each port within this range is available, and if they are all available, the system assigns ports to the components appropriately.

PR dependencies

This PR depends on the following being merged:

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Ran the following command from clp/integration-tests:

uv run pyetst -m 'package'

Both the clp-json and clp-text tests passed.

Summary by CodeRabbit

  • New Features

    • Added a CLI option --base-port (default: 55000) for integration tests.
    • Automatic assignment of consecutive service ports derived from the chosen base port and recorded in test configuration.
    • Added port-management utilities to allocate and verify port ranges before test startup.
  • Bug Fixes

    • Clearer startup errors indicating port conflicts and how to override the base port.
    • Validation and descriptive errors for invalid or unavailable base-port ranges.

✏️ Tip: You can customize this high-level summary in your review settings.

Comment thread integration-tests/tests/fixtures/package_config.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/fixtures/package_config.py

@quinntaylormitchell quinntaylormitchell left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review Bill.

re. the replies that say "see top-level comment":

The first issue with using the required_components list in assign_ports_from_base is that the component names on the list are all in their Docker container name forms (with - characters and no _), whereas the names of the ClpConfig class attributes are all in _ form.

In addition, it seems to me like it's more future-proof to iterate through the attributes of the ClpConfig class, as this code is currently doing. That way, the only thing we have to keep track of as we maintain the code in the future is the list of components that each operating mode needs (which is human-level anyway).

Comment thread integration-tests/tests/fixtures/package_config.py Outdated
Comment thread integration-tests/tests/fixtures/package_config.py
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
integration-tests/tests/utils/port_utils.py (1)

13-14: Document the origin of REDUCER_MAX_PORTS = 128.

Add a comment explaining where this value comes from (e.g., reducer configuration or implementation constraint). This will help maintainers understand why 128 ports are reserved and keep the constant in sync with any upstream changes.

 # CLP constants.
+# Maximum number of consecutive ports the reducer component requires.
+# This value should match the reducer's implementation that binds to a port range.
 REDUCER_MAX_PORTS = 128
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a37994 and 0afc57b.

📒 Files selected for processing (2)
  • integration-tests/tests/fixtures/package_config.py (3 hunks)
  • integration-tests/tests/utils/port_utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (2)
integration-tests/tests/fixtures/package_config.py (2)
integration-tests/tests/utils/port_utils.py (1)
  • assign_ports_from_base (21-67)
integration-tests/tests/utils/clp_mode_utils.py (1)
  • get_clp_config_from_mode (77-88)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpConfig (705-955)
🔇 Additional comments (5)
integration-tests/tests/fixtures/package_config.py (2)

31-38: LGTM!

The base port parsing with clear error handling and the subsequent port assignment call are well-implemented. The error message is descriptive and will help users identify configuration issues.


43-49: LGTM!

The base_port is correctly passed to the PackageConfig constructor, enabling downstream components to access the configured base port.

integration-tests/tests/utils/port_utils.py (3)

32-52: Component discovery approach is reasonable.

The dynamic discovery via vars(clp_config) provides flexibility and avoids maintaining a separate list of components. Since Python 3.7+, vars() returns attributes in definition order, making the port assignment deterministic for a given ClpConfig structure.


54-67: LGTM!

The validation-before-assignment pattern ensures that either all ports are available and assigned, or an error is raised before any changes are made. This atomic approach prevents partial configuration states.


98-134: LGTM!

The validation functions provide clear, actionable error messages that include both the problematic port and the expected range. The helper function _format_inclusive_port_range keeps the output consistent and readable.

Comment thread integration-tests/tests/utils/port_utils.py

@Bill-hbrhbr Bill-hbrhbr left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your statement on inspecting the whole CLPConfig instead of the required component list.
Have some final comments on assign_ports_from_base, the main function of this PR.

Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0afc57b and bbc4780.

📒 Files selected for processing (1)
  • integration-tests/tests/utils/port_utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (1)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpConfig (779-1060)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: build (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (3)
integration-tests/tests/utils/port_utils.py (3)

8-20: Port constants and reducer block sizing look reasonable for the test context

The valid port range and REDUCER_MAX_PORTS constant read clearly and give a sensible safety margin for reducer instances in tests. No functional concerns here.


69-78: Range pretty-printer is correct and keeps error messages readable

The helper correctly interprets Python’s half-open range as inclusive on both ends for display, and the output format reads well in your error messages. Looks good as-is.


115-133: Port-range validation is correct and produces helpful diagnostics

The inclusive bound check against VALID_PORT_RANGE is logically correct, and the error message constructed with _format_inclusive_port_range gives the user both the derived and valid ranges, which should make misconfigured --clp-base-port issues easy to spot. No issues here.

Comment thread integration-tests/tests/utils/port_utils.py
Comment thread integration-tests/tests/utils/port_utils.py Outdated
Bill-hbrhbr
Bill-hbrhbr previously approved these changes Dec 8, 2025

@Bill-hbrhbr Bill-hbrhbr left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing on to @junhaoliao

@junhaoliao

Copy link
Copy Markdown
Member

Hey @quinntaylormitchell, while reviewing your PR, I'd suggest the following code changes:

👉 i tried refactoring this file a bit

You can also review and apply these suggestions locally on your machine.

Learn more about GitKraken Code Suggest

Code Suggest liberates your code reviews from GitHub's restrictive, comment-only feedback style. As simple as suggesting changes in a Google-doc, provide real code suggestions from where you code, e.g. your IDE, and on anything in your project — not just on the lines of code changed in the PR.

Join your team on GitKraken to speed up PR review.

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the title, how about:

feat(integration-tests): Add port range assignment to avoid package spin-up failures based on a configurable base port.

Comment thread integration-tests/tests/utils/port_utils.py
Comment thread integration-tests/tests/conftest.py Outdated
@quinntaylormitchell quinntaylormitchell changed the title feat(integration-tests): Assign ports based on BASE_PORT to avoid package spin-up failure. feat(integration-tests): Add port range assignment based on a configurable base port to avoid package spin-up failures. Dec 9, 2025

@junhaoliao junhaoliao left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest lgtm

Comment thread integration-tests/tests/utils/port_utils.py Outdated
Comment thread integration-tests/tests/utils/port_utils.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9922b08 and c7909f4.

📒 Files selected for processing (1)
  • integration-tests/tests/utils/port_utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T06:29:59.610Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: .github/workflows/clp-core-build.yaml:294-332
Timestamp: 2025-08-25T06:29:59.610Z
Learning: In the CLP project, Bill-hbrhbr prefers a "fail fast" approach for CI workflows - allowing potential command availability issues (like getconf in musllinux) to surface through CI failures rather than preemptively adding fallback logic, as they will fix issues when they occur.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2024-11-18T16:49:20.248Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-07-01T14:51:19.172Z
Learnt from: jackluo923
Repo: y-scope/clp PR: 1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (7)
integration-tests/tests/utils/port_utils.py (7)

12-21: Port constant definitions and discovery keys look solid.

Non-privileged/maximum port bounds and VALID_PORT_RANGE are consistent, and PORT_ATTR_NAMES aligns with how components are discovered later; no issues here.


24-36: PortAssignment dataclass is clear and appropriate.

The fields and docstring line up and make the per-component requirements easy to reason about.


64-79: Port availability check and error messaging look good.

Linear scan over the small reserved range is fine for tests, and the error message (including the formatted range) should make port conflicts easy to diagnose.


82-116: Port-assignment discovery logic is clean and maintainable.

Iterating vars(clp_config), skipping private/None attributes, and probing for PORT_ATTR_NAMES keeps this generic and avoids hard-coding component lists. The reducer special-case via REDUCER_COMPONENT_NAME and REDUCER_MAX_PORTS is also clear. As long as components that need ports continue to expose exactly one of these attributes directly, this should remain robust.


119-128: Range pretty-printer is concise and readable.

Nicely encapsulates the inclusive start/end representation used in error messages; no changes needed.


131-144: _is_port_free helper is straightforward and suitable for tests.

The bind-based check is simple and keeps the port-availability logic easy to follow; behaviour matches what _check_ports_available expects.


147-166: Port-range validation logic and messaging look correct.

The bounds check against VALID_PORT_RANGE and the use of _format_port_range for both required and valid ranges make invalid-configuration failures clear and debuggable.

Comment thread integration-tests/tests/utils/port_utils.py
@quinntaylormitchell quinntaylormitchell merged commit 71daa70 into y-scope:main Dec 12, 2025
19 checks passed
davidlion pushed a commit to davidlion/clp that referenced this pull request Jan 17, 2026
…rable base port to avoid package spin-up failures. (y-scope#1662)

Co-authored-by: Bingran Hu <bingran.hu@yscope.com>
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…rable base port to avoid package spin-up failures. (y-scope#1662)

Co-authored-by: Bingran Hu <bingran.hu@yscope.com>
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…rable base port to avoid package spin-up failures. (y-scope#1662)

Co-authored-by: Bingran Hu <bingran.hu@yscope.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants