Skip to content

fix(deployment): Enforce read-only permissions on logs input and stream directories (fixes #1689).#1690

Merged
junhaoliao merged 7 commits into
y-scope:mainfrom
junhaoliao:fix-ro
Dec 8, 2025
Merged

fix(deployment): Enforce read-only permissions on logs input and stream directories (fixes #1689).#1690
junhaoliao merged 7 commits into
y-scope:mainfrom
junhaoliao:fix-ro

Conversation

@junhaoliao

@junhaoliao junhaoliao commented Nov 28, 2025

Copy link
Copy Markdown
Member

Description

Service Directory (container path) Change in diff
compression-scheduler /mnt/logs Added :ro (read-only)
compression-worker /mnt/logs Added :ro (read-only)
query-worker /var/data/archives Added :ro (read-only)
webui /var/data/streams Added :ro (read-only)

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

task

cd build/clp-package
./sbin/start-clp.sh
./sbin/compress.sh --timestamp-key=Timestamp ~/samples/spark-event-logs/

Then, in the webui, performed search with "1" and observed results returned. Clicked the "Original file" link in any results and observed the log viewer is opened with the log context shown.

Summary by CodeRabbit

  • Chores
    • Tightened deployment configuration: made logging and output directory mounts read-only for compression-scheduler, compression-worker, webui and query-worker to enforce stricter file access permissions.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Nov 28, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

The pull request tightens bind mount access in the Docker Compose file by appending :ro to multiple host:container binds (CLP_LOGS_INPUT_DIR, CLP_STREAM_OUTPUT_DIR, CLP_ARCHIVE_OUTPUT_DIR) across services (compression-scheduler, compression-worker, webui, query-worker). No wiring or public API signatures changed.

Changes

Cohort / File(s) Summary
Docker Compose Configuration
tools/deployment/package/docker-compose-all.yaml
Append :ro to bind mount specifications for CLP_LOGS_INPUT_DIR, CLP_STREAM_OUTPUT_DIR, and CLP_ARCHIVE_OUTPUT_DIR in services compression-scheduler, compression-worker, webui, and query-worker, converting those mounts to read-only access.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Review the affected service mount lines in tools/deployment/package/docker-compose-all.yaml to confirm no mounts that require write access were made read-only (especially for any service performing output/archive operations).
  • Verify runtime behaviour in environments where those directories are used (e.g., ensure workers or scheduler do not need write permissions).

Possibly related issues

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: enforcing read-only permissions on logs input and stream directories in the deployment configuration.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3697a34 and b6f07bc.

📒 Files selected for processing (1)
  • tools/deployment/package/docker-compose-all.yaml (4 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.

Applied to files:

  • tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Applied to files:

  • tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • tools/deployment/package/docker-compose-all.yaml
📚 Learning: 2025-10-07T07:54:32.427Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-py-utils/clp_py_utils/clp_config.py:47-47
Timestamp: 2025-10-07T07:54:32.427Z
Learning: In components/clp-py-utils/clp_py_utils/clp_config.py, the CONTAINER_AWS_CONFIG_DIRECTORY constant is intentionally set to pathlib.Path("/") / ".aws" (i.e., `/.aws`) rather than a user-specific home directory. This hardcoded path is part of the container orchestration design.

Applied to files:

  • tools/deployment/package/docker-compose-all.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
tools/deployment/package/docker-compose-all.yaml (1)

267-267: LGTM—read-only enforcement applied consistently.

The :ro flags are correctly applied across services to enforce the principle of least privilege:

  • compression-scheduler (line 267) and compression-worker (line 302): Logs input mounted read-only—these services only read source logs for compression.
  • query-worker (line 484): Archives mounted read-only—query-worker reads archives but does not modify them; line 487 correctly preserves read-write access to /var/data/streams since query-worker writes to that directory (per learnings).
  • webui (line 370): Streams mounted read-only—the UI reads streams for display only.

The PR also correctly leaves spider-compression-worker's logs mount (line 334) unchanged, consistent with the noted intent in the summary. All write-capable mounts (compression-worker archives, staged-archives, query-worker staged-streams, garbage-collector streams) remain correctly unpinned from read-only constraints.

Also applies to: 302-302, 370-370, 484-484


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao marked this pull request as ready for review November 28, 2025 20:44
@junhaoliao junhaoliao requested a review from a team as a code owner November 28, 2025 20:44
Comment thread tools/deployment/package/docker-compose-all.yaml

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a7c1e5 and 3697a34.

📒 Files selected for processing (1)
  • tools/deployment/package/docker-compose-all.yaml (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Applied to files:

  • tools/deployment/package/docker-compose-all.yaml
🔇 Additional comments (1)
tools/deployment/package/docker-compose-all.yaml (1)

239-239: All four specified read-only mount changes are correctly implemented.

The PR successfully adds :ro flags to:

  • compression-scheduler /mnt/logs (line 239)
  • compression-worker /mnt/logs (line 273)
  • query-worker /var/data/archives (line 424) — addresses prior reviewer concern
  • webui /var/data/streams (line 310)

The changes align with the PR objectives and validation. The prior reviewer's question about query-worker archive access has been resolved by adding :ro.

Also applies to: 273-273, 310-310, 424-424

Comment on lines +424 to 427
- "${CLP_ARCHIVE_OUTPUT_DIR_HOST:-empty}:/var/data/archives:ro"
- "${CLP_AWS_CONFIG_DIR_HOST:-empty}:/opt/clp/.aws:ro"
- "${CLP_STAGED_STREAM_OUTPUT_DIR_HOST:-empty}:/var/data/staged-streams"
- "${CLP_STREAM_OUTPUT_DIR_HOST:-empty}:/var/data/streams"

@coderabbitai coderabbitai Bot Dec 3, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider enforcing read-only for query-worker's streams directory in a follow-up.

Query-worker reads archives (line 424, now :ro) and streams (line 427, no :ro), but writes only to staged-streams (line 426). If query-worker doesn't modify /var/data/streams, the mount at line 427 should also be read-only for consistency with webui (line 310) and defensive security posture.

This is outside the current PR scope but worth validating for a future enhancement.

🤖 Prompt for AI Agents
In tools/deployment/package/docker-compose-all.yaml around lines 424 to 427, the
query-worker service mounts /var/data/streams without read-only flag while
archives and staged-streams are mounted read-only; verify whether query-worker
ever writes to /var/data/streams and if it does not, change the bind for
CLP_STREAM_OUTPUT_DIR_HOST to be read-only (add :ro) for consistency and
improved security; run a quick grep/search and a local integration run to
confirm no writes occur, then update the docker-compose file and any relevant
docs/tests in a follow-up commit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the query worker writes to /var/data/streams

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao, thank you for the clarification! That makes perfect sense—if the query-worker writes to /var/data/streams, then it correctly needs read-write access, and the current configuration is appropriate. I appreciate you confirming this architectural detail.


✏️ Learnings added
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.214Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

@kirkrodrigues kirkrodrigues left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. Can you resolve the conflict?

# Conflicts:
#	tools/deployment/package/docker-compose-all.yaml
@junhaoliao junhaoliao merged commit fb9eb12 into y-scope:main Dec 8, 2025
19 checks passed
@junhaoliao junhaoliao deleted the fix-ro branch May 7, 2026 19:46
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants