Skip to content

feat(clp-package): Add Spider and its dependencies to the build.#1318

Merged
LinZhihao-723 merged 51 commits into
y-scope:mainfrom
sitaowang1998:spider-setup
Sep 25, 2025
Merged

feat(clp-package): Add Spider and its dependencies to the build.#1318
LinZhihao-723 merged 51 commits into
y-scope:mainfrom
sitaowang1998:spider-setup

Conversation

@sitaowang1998

@sitaowang1998 sitaowang1998 commented Sep 21, 2025

Copy link
Copy Markdown
Contributor

Description

This PR

  • Adds yscope-spider-py as a dependency of job-orchestration.
  • Changes to install c++ libraries in build/deps/cpp.
  • Adds tasks to install mariadb-connector-cpp and spider.

Notice that spider and mariadb-connector-cpp are both Ubuntu only tasks because of the build issue of mariadb-connector-cpp on other platforms and that we don't need to use spider on other platforms.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • spider C++ package installed successfully in Ubuntu 24.04.
  • Running deps:core and then deps:spider or vice versa does not build libraries again in Ubuntu 24.04.
  • job-orchestration builds.
  • GitHub workflows pass.

Summary by CodeRabbit

  • New Features

    • Added Spider build/orchestration tasks and a Spider build-dir variable; added yscope-spider-py dependency.
  • Refactor

    • Migrated public/config namespace from “core” to “cpp” (paths, checksum handling, CMake settings, tasks and artifact outputs).
  • Chores

    • Standardized parallelism env var to CLP_CPP_MAX_PARALLELISM_PER_BUILD_TASK across workflows and tasks.
    • Tightened job-orchestration Python constraint to >=3.9,<4.0.

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner September 21, 2025 20:55
@coderabbitai

coderabbitai Bot commented Sep 21, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Migrates public/task/config surfaces from the "core" namespace to "cpp": adds G_DEPS_CPP_* variables, renames core tasks to cpp, adds spider-related tasks, updates CI env-var names to CLP_CPP_*, adjusts a CMake deps include to use build/deps/cpp, and tightens/extends a Python project's dependencies.

Changes

Cohort / File(s) Summary
Deps taskfile namespace migration (core → cpp)
taskfiles/deps/main.yaml, taskfiles/deps/utils.yaml
Added G_DEPS_CPP_* variables (DIR, CHECKSUMS_DIR, CMAKE_SETTINGS_DIR, CHECKSUM_FILE); replaced G_DEPS_CORE_* references with G_DEPS_CPP_*; renamed checksum and helper tasks from corecpp; updated CMake settings, build/install/checksum paths and public artifact references to the CPP namespace; added/wired spider-related public tasks (spider, spider-dep, spider-all-parallel).
Root taskfile vars and usage
taskfile.yaml
Added G_SPIDER_BUILD_DIR; replaced G_CORE_MAX_PARALLELISM_PER_BUILD_TASK with G_CPP_MAX_PARALLELISM_PER_BUILD_TASK and environment default reference CLP_CPP_MAX_PARALLELISM_TASK; updated core-generate/core-build and package flows to use G_DEPS_CPP_* and include spider build/rsync steps.
CI workflows env var rename & spider invocation
.github/workflows/clp-artifact-build.yaml, .github/workflows/clp-core-build-macos.yaml
Replaced CLP_CORE_MAX_PARALLELISM_PER_BUILD_TASK with CLP_CPP_MAX_PARALLELISM_PER_BUILD_TASK in run_command/env contexts; updated workflow steps that invoke deps tasks to use the CPP-named parallelism variable; artifact build workflow adjusted to invoke CPP/spider deps in relevant job(s).
CMake deps path update
components/core/CMakeLists.txt
Changed deps include path from ../../build/deps/core to ../../build/deps/cpp so optional include loads cmake-settings from the cpp deps dir.
Job orchestration Python project
components/job-orchestration/pyproject.toml
Tightened Python requirement from >= 3.9 to >= 3.9,<4.0 and added dependency yscope-spider-py = "0.1.0".

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CI as CI Workflow
  participant Tasks as Taskfiles (deps)
  participant CMake as CMake (build/deps/cpp)
  participant Spider as Spider build

  CI->>Tasks: run deps:cpp / cpp-all-parallel
  Note right of Tasks #E7F6FF: Tasks now use G_DEPS_CPP_* and\nG_CPP_MAX_PARALLELISM_PER_BUILD_TASK
  Tasks->>CMake: configure / build / install (CMAKE_SETTINGS_DIR=.../cpp)
  CMake-->>Tasks: deps installed under build/deps/cpp

  CI->>Tasks: run spider / spider-all-parallel
  Tasks->>Spider: invoke spider build with cpp CMake args
  Spider-->>Tasks: spider artifacts produced

  CI->>CoreComponent: configure components/core
  CoreComponent->>CMake: optionally include ../../build/deps/cpp/cmake-settings/all.cmake
  CoreComponent-->>CI: continue build using cpp deps if included
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The title focuses solely on adding Spider and its dependencies but omits the significant refactoring that migrates the core namespace to C++ (the G_DEPS_CPP_* variables and related tasks). As a result, it does not fully summarise the primary scope of changes, which also include extensive updates to the build layout and namespace for C++ libraries. Consider revising the title to reflect both the core-to-CPP namespace migration and the addition of Spider build tasks, for example: “feat(deps): migrate core to cpp namespace and add Spider build tasks and dependencies.” This will ensure the title conveys the full scope of the changeset at a glance.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
taskfiles/lint.yaml (2)

763-765: Include spider-orchestration in UV-driven lint/type-check passes.

It’s added to the first loop (black/ruff), but not to UV_PYTHON_PROJECTS, so it will miss ruff format and mypy under uv.

   py:
     internal: true
     vars:
       UV_PYTHON_PROJECTS:
-        - "{{.G_INTEGRATION_TESTS_DIR}}"
+        - "{{.G_INTEGRATION_TESTS_DIR}}"
+        - "components/spider-orchestration"

769-779: Optional: run tools from the project root for consistency.

You currently lint the package subdir "components/spider-orchestration/spider_orchestration". Consider running from the project root to let tools read pyproject.toml configs consistently.

-          - "components/spider-orchestration/spider_orchestration"
+          - "components/spider-orchestration"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b83e3cf and d5751df.

⛔ Files ignored due to path filters (1)
  • components/spider-orchestration/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • components/spider-orchestration/README.md (1 hunks)
  • components/spider-orchestration/pyproject.toml (1 hunks)
  • taskfiles/deps/main.yaml (9 hunks)
  • taskfiles/lint.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-08-20T05:38:30.720Z
Learnt from: junhaoliao
PR: y-scope/clp#1242
File: taskfile.yaml:545-547
Timestamp: 2025-08-20T05:38:30.720Z
Learning: The python component directories in the y-scope/clp repository are kept very clean and only contain essential files: source code directories, pyproject.toml, poetry.lock, README.md, and the dist/ directory. No cache directories, virtual environments, or other transient files exist in the component directories, so additional exclude patterns beyond "dist/**/*" are unnecessary.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • components/spider-orchestration/README.md
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • taskfiles/deps/main.yaml
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • taskfiles/deps/main.yaml
📚 Learning: 2025-09-15T18:53:15.844Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1226
File: taskfiles/deps/main.yaml:522-527
Timestamp: 2025-09-15T18:53:15.844Z
Learning: When ystdlib is built with -Dystdlib_BUILD_TESTING=OFF in taskfiles/deps/main.yaml, it does not require Catch2 settings to be injected via CMAKE_GEN_ARGS since testing is disabled and Catch2 is not needed during the build process.

Applied to files:

  • taskfiles/deps/main.yaml
🪛 markdownlint-cli2 (0.18.1)
components/spider-orchestration/README.md

3-3: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (7)
components/spider-orchestration/pyproject.toml (1)

13-22: Validate dependency pins and pre-release usage.

Confirm that yscope-spider-py "~0.0.1-beta.2" is intended (pre-release) and compatible with Python 3.10+, and that Pydantic v1 is required (v2 is common now). If not intentional, update pins accordingly.

taskfiles/deps/main.yaml (6)

21-29: LGTM: centralizing LIB_NAME variables improves maintainability.

Good move; reduces duplication and eases future renames.


84-92: Verify CI behaviour when spider/mariadb tasks are Linux/Ubuntu-gated.

They’re declared as deps of core-all-parallel. Ensure the platform gating won’t hard-fail on non-Linux runners (Task should skip cleanly). If not, gate at the caller or split platform-specific aggregates.


578-593: LGTM: ystdlib PIC enabled and LIB_NAME centralized.

Matches prior learning about testing OFF; PIC is a good default for linkers.


210-217: LGTM: adding Boost.Process target.

Likely needed by Spider; no issues.


454-471: LGTM: spdlog configured to use external fmt and centralized LIB_NAME.

Consistent with fmt dep; good.


409-421: LGTM: msgpack-cxx C++20 and CMP0074 NEW with centralized LIB_NAME.

Settings look correct.

Comment thread components/spider-orchestration/pyproject.toml Outdated
Comment thread components/spider-orchestration/pyproject.toml Outdated
Comment thread components/spider-orchestration/README.md Outdated
Comment thread taskfiles/deps/main.yaml
Comment thread taskfiles/deps/main.yaml Outdated
@sitaowang1998 sitaowang1998 changed the title feat: Add project setup for spider orchestration. feat: Add project setup for spider-orchestration. Sep 21, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5751df and 8a33b1c.

⛔ Files ignored due to path filters (1)
  • components/spider-orchestration/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • components/spider-orchestration/README.md (1 hunks)
  • components/spider-orchestration/pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • components/spider-orchestration/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
components/spider-orchestration/README.md (3)

1-1: Header reads well

Title is clear and scoped to Spider usage.


3-3: Remove trailing whitespace (markdownlint MD009).

There’s one trailing space at the end of Line 3.

-This Python module contains CLP's scheduler and worker to handle distributed compression.·
+This Python module contains CLP's scheduler and worker to handle distributed compression.

3-4: Fix README scope — don't claim the package ships a scheduler/worker.
This package orchestrates CLP via Spider and does not ship the scheduler or worker; tighten wording accordingly.

-This Python module contains CLP's scheduler and worker to handle distributed compression.
-CLP's Compression Job Handler can be used to interface and submit compression jobs to the CLP scheduler.
+This Python module orchestrates CLP compression jobs via Spider, integrating with the CLP scheduler and workers.
+The Compression Job Handler interfaces with the CLP scheduler to submit compression jobs.
components/spider-orchestration/pyproject.toml (3)

11-13: Confirm minimum Python version policy (3.9 EOL in Oct 2025).

If you don’t need 3.9, consider bumping to ">=3.10,<4.0" to align with Ubuntu 22.04+ and reduce maintenance.

Would you like a follow-up patch to raise and add classifiers accordingly?


1-3: Avoid duplicate Python version sources; drop the unused [project] block.

With poetry-core, metadata comes from [tool.poetry]. Keeping [project] risks conflicting signals.

Apply:

-[project]
-requires-python = ">= 3.9, <4.0"
-

9-9: README present — no action required.
components/spider-orchestration/README.md exists (verified); pyproject.toml's readme = "README.md" is valid.

Comment on lines +4 to +9
[tool.poetry]
name = "spider-orchestration"
version = "0.5.1-dev"
description = "Scheduler and worker cluster for CLP's distributed architecture."
authors = ["YScope Inc. <dev@yscope.com>"]
readme = "README.md"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Add license metadata.

Include the project licence in package metadata.

Apply:

 readme = "README.md"
+license = "Apache-2.0"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[tool.poetry]
name = "spider-orchestration"
version = "0.5.1-dev"
description = "Scheduler and worker cluster for CLP's distributed architecture."
authors = ["YScope Inc. <dev@yscope.com>"]
readme = "README.md"
[tool.poetry]
name = "spider-orchestration"
version = "0.5.1-dev"
description = "Scheduler and worker cluster for CLP's distributed architecture."
authors = ["YScope Inc. <dev@yscope.com>"]
readme = "README.md"
license = "Apache-2.0"
🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around lines 4-9, the Poetry
project metadata is missing license information; add a license entry (use the
repository's SPDX identifier, e.g. license = "MIT") and include license-file =
"LICENSE" under the [tool.poetry] table so the package metadata exposes the
project's license and points to the LICENSE file.


[tool.poetry]
name = "spider-orchestration"
version = "0.5.1-dev"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use a PEP 440–compliant version.

"0.5.1-dev" isn’t valid per PEP 440 and can break build/publish. Use a .devN pre-release.

Apply:

-version = "0.5.1-dev"
+version = "0.5.1.dev0"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
version = "0.5.1-dev"
version = "0.5.1.dev0"
🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around line 6, the version
"0.5.1-dev" is not PEP 440–compliant; replace it with a PEP 440 .dev pre-release
(for example "0.5.1.dev0" or "0.5.1.dev1") by updating the version field
accordingly so packaging and publishing tools accept it.

Comment on lines +14 to +18
# MariaDB version must be compatible with libmariadb-dev <3.3.1 installed in the runtime
# environment. The default version of libmariadb-dev in Ubuntu 22.04 is 3.2.6.
# See https://mariadb.com/docs/connectors/mariadb-connector-python/install#build-prerequisites
mariadb = "~1.0.11"
msgpack = "^1.0.7"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Gate mariadb install to Linux to prevent cross‑platform failures.

The MariaDB Python package often relies on system libmariadb; on macOS/Windows this can fail unexpectedly.

Apply:

-mariadb = "~1.0.11"
+mariadb = { version = "~1.0.11", markers = "sys_platform == 'linux'" }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# MariaDB version must be compatible with libmariadb-dev <3.3.1 installed in the runtime
# environment. The default version of libmariadb-dev in Ubuntu 22.04 is 3.2.6.
# See https://mariadb.com/docs/connectors/mariadb-connector-python/install#build-prerequisites
mariadb = "~1.0.11"
msgpack = "^1.0.7"
# MariaDB version must be compatible with libmariadb-dev <3.3.1 installed in the runtime
# environment. The default version of libmariadb-dev in Ubuntu 22.04 is 3.2.6.
# See https://mariadb.com/docs/connectors/mariadb-connector-python/install#build-prerequisites
mariadb = { version = "~1.0.11", markers = "sys_platform == 'linux'" }
msgpack = "^1.0.7"
🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around lines 14 to 18, the
mariadb dependency must be gated to Linux to avoid install failures on
macOS/Windows; change the mariadb entry to be platform‑marked so it only
installs on Linux (use a PEP 508 marker such as sys_platform == "linux" on the
mariadb requirement), leaving msgpack unchanged.

Comment on lines +14 to +16
# MariaDB version must be compatible with libmariadb-dev <3.3.1 installed in the runtime
# environment. The default version of libmariadb-dev in Ubuntu 22.04 is 3.2.6.
# See https://mariadb.com/docs/connectors/mariadb-connector-python/install#build-prerequisites

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

🧩 Analysis chain

Validate the libmariadb note and keep it current.

The comment references specific libmariadb-dev versions on Ubuntu 22.04. Please verify these constraints against the current MariaDB Connector/Python 1.0.x guidance.


🌐 Web query:

What libmariadb (Connector/C) versions are compatible with MariaDB Connector/Python 1.0.11 on Ubuntu 22.04, and is there any documented upper bound (e.g., <3.3.1)?

💡 Result:

Short answer

  • MariaDB Connector/Python 1.0.11 requires MariaDB Connector/C (libmariadb) >= 3.1.5 per the official install docs for the 1.0.x line. [1][2]
  • There was a reported installer/runtime check that required >= 3.2.4 for some installs (Jira CONPY-215), so in practice you may need Connector/C ≥ 3.2.4 on some systems. [3]
  • There is no documented upper‑bound (e.g. “< 3.3.1”) listed for Connector/Python 1.0.11; later Connector/Python 1.1.x simply raised the documented minimum to Connector/C ≥ 3.3.1. [1][4]

Sources

  1. MariaDB Connector/Python installation / dependencies (shows 3.1.5 minimum for <1.1.0 and 3.3.1 for 1.1+). [1]
  2. Release notes for Connector/Python 1.0.11. [2]
  3. MariaDB Jira CONPY‑215 (installer reported requirement >= 3.2.4 for 1.0.11). [3]
  4. Connector/Python 1.1 install docs (Connector/C >= 3.3.1). [4]

If you want, I can check which libmariadb package versions are available on Ubuntu 22.04 (apt names/versions) and recommend a specific package to install.


Update libmariadb note: remove the '<3.3.1' upper bound and state the correct Connector/C minima
Connector/Python 1.0.11 documents MariaDB Connector/C (libmariadb) ≥ 3.1.5 (some installs reported needing ≥ 3.2.4); there is no documented upper bound such as "<3.3.1" — Connector/Python 1.1+ raises the minimum to ≥ 3.3.1. Update components/spider-orchestration/pyproject.toml (lines 14–16) to reflect this.

🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around lines 14 to 16, remove
the erroneous "<3.3.1" upper bound and update the note to state the correct
Connector/C (libmariadb) minima: for Connector/Python 1.0.11 require libmariadb
≥ 3.1.5 (with some installs reporting need for ≥ 3.2.4), and note that
Connector/Python 1.1+ raises the minimum to ≥ 3.3.1; keep the Ubuntu 22.04
default version note and link reference intact.

Comment on lines +25 to +27
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api" No newline at end of file

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Pin a minimum poetry-core for reproducibility.

Unpinned build backends can change behaviour subtly across environments.

Apply:

-requires = ["poetry-core"]
+requires = ["poetry-core>=1.8.0"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
[build-system]
requires = ["poetry-core>=1.8.0"]
build-backend = "poetry.core.masonry.api"
🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around lines 25 to 27, pin the
poetry-core build dependency to a minimum version to ensure reproducible builds;
change the requires entry from ["poetry-core"] to a minimum-pinned spec such as
["poetry-core>=1.5.0"] (keep build-backend = "poetry.core.masonry.api"
unchanged).

# CLP Job Orchestration using Spider

This Python module contains CLP's scheduler and worker to handle distributed compression.
CLP's Compression Job Handler can be used to interface and submit compression jobs to the CLP scheduler.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

Add platform/build note for Spider and MariaDB Connector C++.

Readers should know these C++ deps are built only on Ubuntu as per this PR.

Suggested addition (new paragraph after Line 4):

+Note: As part of project setup, the Spider library and MariaDB Connector/C++ are built and installed on Ubuntu platforms only. Other platforms skip these native builds.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CLP's Compression Job Handler can be used to interface and submit compression jobs to the CLP scheduler.
CLP's Compression Job Handler can be used to interface and submit compression jobs to the CLP scheduler.
Note: As part of project setup, the Spider library and MariaDB Connector/C++ are built and installed on Ubuntu platforms only. Other platforms skip these native builds.
🤖 Prompt for AI Agents
In components/spider-orchestration/README.md around line 4, add a new paragraph
after the existing sentence stating that the Spider and MariaDB Connector C++
dependencies are built only on Ubuntu (per the referenced PR); briefly note the
supported OS (Ubuntu), any required build/toolchain/version constraints if
known, and recommend using Ubuntu or an Ubuntu-based Docker image for local
builds or CI; include a link or reference to the PR for details.

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we proceed: do we need a new namespace under components? I thought spider related subcomponents can also be located in the current job-orchestration: the current job orchestration already contains non-celery related components.
But as I'm also new to the package side of thing, let's wait for @junhaoliao @kirkrodrigues to confirm.

@kirkrodrigues

Copy link
Copy Markdown
Member

Before we proceed: do we need a new namespace under components? I thought spider related subcomponents can also be located in the current job-orchestration: the current job orchestration already contains non-celery related components. But as I'm also new to the package side of thing, let's wait for @junhaoliao @kirkrodrigues to confirm.

Yeah, name-wise, I think it makes more sense to be under job-orchestration. I guess Sitao was making it a separate "component" since the package will only run one or the other (job-orchestration or spider-orchestration)?

@sitaowang1998

Copy link
Copy Markdown
Contributor Author

I have two reasons to use a separate "component":

  1. The task function name is already occupied. I can go with spider_compress though.
  2. It will be easier to do the switch later if we have a component that does not have celery dependency.

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add deps:spider to the package workflow? It's better that the changes to this type of PR can be tested immediately on the CI.

Comment thread taskfiles/deps/main.yaml
Comment thread taskfiles/deps/main.yaml Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc443f0 and 98d6563.

📒 Files selected for processing (1)
  • taskfiles/deps/main.yaml (17 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • taskfiles/deps/main.yaml
📚 Learning: 2025-05-28T18:33:30.155Z
Learnt from: anlowee
PR: y-scope/clp#925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.

Applied to files:

  • taskfiles/deps/main.yaml
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Applied to files:

  • taskfiles/deps/main.yaml
📚 Learning: 2025-09-15T18:53:15.844Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1226
File: taskfiles/deps/main.yaml:522-527
Timestamp: 2025-09-15T18:53:15.844Z
Learning: When ystdlib is built with -Dystdlib_BUILD_TESTING=OFF in taskfiles/deps/main.yaml, it does not require Catch2 settings to be injected via CMAKE_GEN_ARGS since testing is disabled and Catch2 is not needed during the build process.

Applied to files:

  • taskfiles/deps/main.yaml
📚 Learning: 2025-08-19T07:08:15.583Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1229
File: taskfiles/deps/main.yaml:494-504
Timestamp: 2025-08-19T07:08:15.583Z
Learning: In zlib v1.3.1's CMake build system, only ZLIB_BUILD_EXAMPLES is a valid option. Other ZLIB_BUILD_* flags (MINIZIP, SHARED, STATIC, TESTING) and ZLIB_INSTALL are not defined in the CMakeLists.txt and have no effect. Use BUILD_SHARED_LIBS=ON for shared library control and ZLIB_BUILD_EXAMPLES=OFF to disable examples.

Applied to files:

  • taskfiles/deps/main.yaml
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • taskfiles/deps/main.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: ubuntu-jammy-deps-image
  • GitHub Check: musllinux_1_2-x86_64-deps-image
  • GitHub Check: centos-stream-9-deps-image
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (6)
taskfiles/deps/main.yaml (6)

10-18: LGTM! Directory structure changes are consistent.

The migration from core to cpp naming is applied consistently across all directory variables and paths. This aligns with the PR objective to reorganize the build layout.


91-111: LGTM! Spider task implementation is correct.

The spider task correctly uses preconditions instead of status, restricts to Ubuntu as intended, and includes proper dependency management through the spider-dep task. The CMake arguments are appropriate for the spider build configuration.


113-125: LGTM! Spider dependency task follows established patterns.

The spider-dep task correctly mirrors the structure of the core task, ensuring spider dependencies are built and configured properly before the main spider build.


127-139: LGTM! Spider parallel dependencies are correctly defined.

The dependency list for spider-all-parallel includes all necessary libraries for building spider, including the MariaDB connector which is appropriately Ubuntu-gated.


391-411: LGTM! MariaDB connector task correctly implements Ubuntu gating.

The task properly uses preconditions with detailed comments explaining the Ubuntu limitation and DEB install layout requirement. The implementation addresses previous review feedback effectively.


596-596: Remove unnecessary CMAKE_POSITION_INDEPENDENT_CODE flag.

Based on the past review comments indicating this flag is not needed, consider removing it unless spider specifically requires position-independent code for shared library usage.

Apply this diff to remove the unnecessary flag:

          CMAKE_GEN_ARGS:
            - "-C {{.G_DEPS_CPP_CMAKE_SETTINGS_DIR}}/{{.G_BOOST_LIB_NAME}}.cmake"
-            - "-DCMAKE_POSITION_INDEPENDENT_CODE=ON"
            - "-Dystdlib_BUILD_TESTING=OFF"

Comment thread taskfiles/deps/main.yaml
TARGETS:
- "filesystem"
- "iostreams"
- "process"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick

❓ Verification inconclusive

Verify spider dependency on boost::process.

The addition of "process" to the boost targets suggests spider requires this library. Ensure this dependency is actually needed by spider's codebase.
The search results don't provide specific information about y-scope spider's boost::process dependency. Let me check for more specific information about the spider project directly.


🏁 Script executed:

#!/bin/bash
# Check if spider repository contains any references to boost process
rg -i "process" --type cpp --type hpp -C3 -g '!test*' | head -20

Length of output: 27


🏁 Script executed:

#!/bin/bash
# Search for any inclusion or direct usage of Boost.Process
rg -n "#include.*boost/process.hpp" -C2 -g '!test*'
rg -n "boost::process::" -C2 -g '!test*'
rg -n "Boost::process" -C2 -g '!test*'

Length of output: 446


🏁 Script executed:

#!/bin/bash
# Search entire repo for Boost.Process includes and namespace usages without file‐type filters
grep -RIn "#include\s*<boost/process.hpp>" .
grep -RIn "boost::process::" .

Length of output: 0


Remove unused Boost.Process dependency
Search across the repository (code and build files) shows no #include <boost/process.hpp> or boost::process usages; drop "process" from the Boost targets in taskfiles/deps/main.yaml.

🤖 Prompt for AI Agents
In taskfiles/deps/main.yaml around line 254, the Boost target list includes
"process" but there are no usages of Boost.Process in the repository; remove the
"process" entry from the Boost targets list in that file so the dependency is no
longer declared in your taskfiles/deps/main.yaml.

Comment thread .github/workflows/clp-artifact-build.yaml Outdated

@LinZhihao-723 LinZhihao-723 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

feat(clp-package): Add Spider and its dependencies to the build.

@sitaowang1998 sitaowang1998 changed the title feat: Add spider build task; Add yscope-spider-py as job-orchestration dependency. feat(clp-package): Add Spider and its dependencies to the build. Sep 24, 2025
Comment thread taskfile.yaml
- "core"
- "clp-package-utils"
- "clp-py-utils"
- "deps:spider"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should let deps:spider compute a checksum file and let add the checksum file to the package task's source list; otherwise, the package might not be re-run if there are changes in the artifacts generated.

alternatively, we can simply add spider_scheduler and spider_worker to the package task's source list.

@coderabbitai create an issue to track

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create GitHub issue to track spider artifact checksum problem

gh issue create \
  --title "Add checksum tracking for spider artifacts in package task" \
  --body "## Problem

The \`package\` task in \`taskfile.yaml\` currently depends on \`deps:spider\` but does not track changes to spider artifacts through checksums or direct source dependencies. This means the \`package\` task might not re-run when spider artifacts (\`spider_scheduler\` and \`spider_worker\`) change.

## Proposed Solutions

1. **Option 1**: Let \`deps:spider\` compute a checksum file and add it to the \`package\` task's \`source\` list
2. **Option 2**: Add \`spider_scheduler\` and \`spider_worker\` binaries directly to the \`package\` task's \`source\` list

## Context

- PR: https://github.com/y-scope/clp/pull/1318
- Comment: https://github.com/y-scope/clp/pull/1318#discussion_r2425015201
- Requested by: @junhaoliao" \
  --assignee junhaoliao \
  --label "enhancement"

Length of output: 42


@junhaoliao, I've created issue #1409 to track adding checksum tracking for spider artifacts in the package task: #1409

The issue includes both proposed solutions and references this PR and comment as backlinks.

Comment thread taskfile.yaml

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(first reported by @sitaowang1998 )

it was found that libmariadbcpp.so, which both spider_scheduler and spider_worker dynamically link to, are not added into the package execution environment (i.e., the clp-package Docker execution image)

@coderabbitai create an issue to track

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Create a GitHub issue for the missing libmariadbcpp.so library
gh issue create \
  --title "Missing libmariadbcpp.so in clp-package Docker execution image" \
  --body "## Description

The shared library \`libmariadbcpp.so\`, which both \`spider_scheduler\` and \`spider_worker\` dynamically link to, is not included in the package execution environment (the \`clp-package\` Docker execution image). This will cause runtime failures when attempting to execute these binaries.

## Context

- Originally reported by: @sitaowang1998
- Reported by: @junhaoliao
- Related PR: #1318
- Comment: https://github.com/y-scope/clp/pull/1318#discussion_r2425018371

## Impact

The spider scheduler and worker binaries cannot run in the Docker execution environment without this library present." \
  --assignee "junhaoliao"

Length of output: 42


@junhaoliao, I've created issue #1410 to track the missing libmariadbcpp.so library in the clp-package Docker execution environment. The issue includes references to this PR and the original reporters.

🐰✨

junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026
…cope#1318)

Co-authored-by: LinZhihao-723 <zh.lin@mail.utoronto.ca>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants