-
Notifications
You must be signed in to change notification settings - Fork 136
[linstor] Build linstor-server with custom patches #1726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughThis pull request introduces a new piraeus-server Docker image build process integrated into the system build pipeline, implements DRBD stall detection logic in satellite operations, adds multiple LINSTOR Server patches for disk toggle handling, and migrates Helm templates from piraeus-operator image config references to explicit image repository/tag values. Changes
Sequence DiagramsequenceDiagram
participant Satellite as Satellite Loop
participant DRBD as DRBD Status
participant State as Persistent State
participant Action as Disconnect/Reconnect
rect rgb(200, 220, 240)
Note over Satellite,Action: DRBD Stall Detection & Recovery Cycle
loop Every INTERVAL_SEC
Satellite->>DRBD: Query DRBD status
DRBD-->>Satellite: Resource list with peers, sync %
Satellite->>Satellite: drbd_stall_candidates()<br/>(identify Inconsistent+SyncTarget)
rect rgb(240, 200, 200)
Note over State: Per-Resource Stall Tracking
Satellite->>State: drbd_stall_load_state(resource)
State-->>Satellite: Previous sync % (or none)
Satellite->>Satellite: Compare current % vs previous %
alt Sync % unchanged after STALL_ITERS
Satellite->>Action: drbd_stall_act(peer)<br/>(disconnect → reconnect)
Action-->>Satellite: Peer reconnected
Satellite->>State: drbd_stall_save_state(resource)<br/>(reset progress)
else Sync % progressing
Satellite->>State: drbd_stall_save_state(resource)<br/>(update %)
end
end
Satellite->>Satellite: Continue loop<br/>(sleep INTERVAL_SEC)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @kvaps, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates the build process for the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the build process for linstor-server by adding a new Dockerfile and updating the Makefiles. The overall approach of using a multi-stage build to compile from source is sound. However, my review identifies several critical and high-severity issues related to error handling in the build scripts. Many commands use || true, which dangerously suppresses errors and can lead to silent build failures, producing broken container images. I've provided suggestions to make the build process more robust by failing fast on errors. Additionally, I've included recommendations to improve maintainability by parameterizing hardcoded versions and simplifying the build steps.
Add patches for piraeus-server: - adjust-on-resfile-change: handle resource file changes - allow-toggle-disk-retry: enable disk operation retries - force-metadata-check-on-disk-add: verify metadata on disk addition Update plunger-satellite script and values.yaml for new build. Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
packages/system/linstor/images/piraeus-server/Dockerfile (3)
27-31: Git commit in Docker build adds complexity but may be necessary.The git commit after applying patches appears to be required for
dpkg-buildpackageto work correctly with modified source. If the build requires a clean git state, this is acceptable. Otherwise, consider ifgit apply --indexalone would suffice.
131-134: Package installation pattern is acceptable but could be clearer.The
dpkg -i ... || apt-get install -f -ypattern is standard Debian practice:dpkgmay fail on missing dependencies, andapt-get install -fresolves them. The preceding validation at lines 68-70 ensures packages exist in the builder stage.However, if the
COPY --from=builderfails silently, thels packages/*.deb >/dev/nullwould fail and skip installation. Consider adding explicit validation:🔎 Suggested improvement
- && ls packages/*.deb >/dev/null && (dpkg -i packages/*.deb || apt-get install -f -y) \ + && test -n "$(ls packages/*.deb 2>/dev/null)" || (echo "ERROR: No .deb packages found" && exit 1) \ + && (dpkg -i packages/*.deb || apt-get install -f -y) \
158-167: Consider centralizing version constants as build args for consistency.
K8S_AWAIT_ELECTION_VERSIONandLOSETUP_CONTAINER_VERSIONare hardcoded here. For consistency withLINSTOR_VERSION(which is passed via build arg from the Makefile), consider exposing these as build args too. This centralizes version management.Current approach is functional; this is a maintainability suggestion.
packages/system/linstor/Makefile (1)
9-23: Verify values.yaml initialization for yq commands.The image target builds and tags the piraeus-server image correctly, then updates
values.yamlwith the repository and digest-pinned tag. However, ifvalues.yamldoesn't exist or is not valid YAML, theyq -icommands on lines 20 and 22 will fail.Consider adding a check or initialization step to ensure
values.yamlexists:🔎 Proposed initialization check
image: + test -f values.yaml || echo '{}' > values.yaml docker buildx build images/piraeus-server \ --build-arg LINSTOR_VERSION=$(LINSTOR_VERSION) \packages/system/linstor/images/piraeus-server/patches/force-metadata-check-on-disk-add.diff (1)
20-32: Metadata creation logic is correct, but consider extracting the flag check.The condition properly allows metadata creation during toggle-disk operations. However, the flag check pattern is repeated across multiple hunks.
💡 Optional refactoring to reduce duplication
Consider extracting the repeated flag check into a helper method in
DrbdLayer.java:private boolean isDiskAdding(AccessContext workerCtx, DrbdRscData<Resource> drbdRscData) throws AccessDeniedException { return drbdRscData.getAbsResource().getStateFlags() .isSomeSet(workerCtx, Resource.Flags.DISK_ADD_REQUESTED, Resource.Flags.DISK_ADDING); }This would reduce duplication and improve maintainability, though given this is a patch to upstream code, minimizing the change footprint may be intentional.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
Makefilepackages/system/linstor/Makefilepackages/system/linstor/hack/plunger/plunger-satellite.shpackages/system/linstor/images/piraeus-server/Dockerfilepackages/system/linstor/images/piraeus-server/patches/README.mdpackages/system/linstor/images/piraeus-server/patches/adjust-on-resfile-change.diffpackages/system/linstor/images/piraeus-server/patches/allow-toggle-disk-retry.diffpackages/system/linstor/images/piraeus-server/patches/force-metadata-check-on-disk-add.diffpackages/system/linstor/images/piraeus-server/patches/skip-adjust-when-device-inaccessible.diffpackages/system/linstor/templates/_helpers.tplpackages/system/linstor/templates/cluster.yamlpackages/system/linstor/templates/satellites-cozy.yamlpackages/system/linstor/templates/satellites-plunger.yamlpackages/system/linstor/values.yaml
💤 Files with no reviewable changes (1)
- packages/system/linstor/templates/_helpers.tpl
🧰 Additional context used
📓 Path-based instructions (3)
**/*.md
📄 CodeRabbit inference engine (AGENTS.md)
Git commits should follow
[component] Descriptionformat and include--signoffflag
Files:
packages/system/linstor/images/piraeus-server/patches/README.md
packages/system/**
📄 CodeRabbit inference engine (AGENTS.md)
Use Helm Chart umbrella pattern with vendored upstream charts in
charts/directory
Files:
packages/system/linstor/images/piraeus-server/patches/README.mdpackages/system/linstor/Makefilepackages/system/linstor/images/piraeus-server/Dockerfilepackages/system/linstor/images/piraeus-server/patches/skip-adjust-when-device-inaccessible.diffpackages/system/linstor/values.yamlpackages/system/linstor/images/piraeus-server/patches/force-metadata-check-on-disk-add.diffpackages/system/linstor/templates/cluster.yamlpackages/system/linstor/images/piraeus-server/patches/adjust-on-resfile-change.diffpackages/system/linstor/templates/satellites-cozy.yamlpackages/system/linstor/hack/plunger/plunger-satellite.shpackages/system/linstor/templates/satellites-plunger.yamlpackages/system/linstor/images/piraeus-server/patches/allow-toggle-disk-retry.diff
**/*.yaml
📄 CodeRabbit inference engine (AGENTS.md)
Do NOT directly edit upstream Helm charts in vendored charts directory; use proper chart vendoring mechanisms
Files:
packages/system/linstor/values.yamlpackages/system/linstor/templates/cluster.yamlpackages/system/linstor/templates/satellites-cozy.yamlpackages/system/linstor/templates/satellites-plunger.yaml
🧠 Learnings (2)
📚 Learning: 2025-06-26T04:29:24.830Z
Learnt from: NickVolynkin
Repo: cozystack/cozystack PR: 1117
File: packages/apps/mysql/Makefile:8-8
Timestamp: 2025-06-26T04:29:24.830Z
Learning: The cozystack project uses yq v4+ on their CI runner, so yq v4 syntax (-o json --indent 4) is compatible and version checks are not needed.
Applied to files:
packages/system/linstor/Makefile
📚 Learning: 2025-07-14T16:23:12.803Z
Learnt from: NickVolynkin
Repo: cozystack/cozystack PR: 1196
File: packages/apps/http-cache/Makefile:24-27
Timestamp: 2025-07-14T16:23:12.803Z
Learning: In the cozystack repository, the `readme-generator` tool removes enum contents from values.schema.json files during its operation. Therefore, when using readme-generator in Makefiles, any enum values need to be injected back into the schema using yq commands after readme-generator has run, not before.
Applied to files:
packages/system/linstor/Makefile
🪛 checkmake (0.2.2)
packages/system/linstor/Makefile
[warning] 9-9: Target body for "image" exceeds allowed length of 5 (14).
(maxbodylength)
🪛 YAMLlint (1.37.1)
packages/system/linstor/templates/cluster.yaml
[error] 31-31: syntax error: expected , but found ''
(syntax)
packages/system/linstor/templates/satellites-cozy.yaml
[error] 16-16: syntax error: expected , but found ''
(syntax)
packages/system/linstor/templates/satellites-plunger.yaml
[error] 14-14: syntax error: expected , but found ''
(syntax)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build
🔇 Additional comments (17)
packages/system/linstor/images/piraeus-server/patches/adjust-on-resfile-change.diff (1)
1-48: LGTM!The patch correctly addresses two toggle-disk related issues:
- The
isDiskAddingflag check ensures the actual device path is used during disk-add operations instead of incorrectly treating it as diskless.- The fixed LUKS2 data offset (16 MiB = 32768 sectors) ensures consistent header sizing across nodes, preventing "Low.dev. smaller than requested DRBD-dev. size" errors during toggle-disk.
Both changes are well-documented with inline comments explaining the rationale.
packages/system/linstor/images/piraeus-server/patches/skip-adjust-when-device-inaccessible.diff (1)
1-93: LGTM!The patch adds defensive guards to prevent errors when underlying devices (e.g., LUKS volumes) are inaccessible during resource deletion. The
canRegenerateResFileandcanAdjustchecks properly verify device accessibility before attempting regeneration or adjustment operations.Note: The duplicate checking pattern between lines 13-28 and 44-65 could theoretically be extracted to a helper, but since this is an upstream patch maintaining parity with the LINBIT PR, keeping it as-is is appropriate.
packages/system/linstor/hack/plunger/plunger-satellite.sh (3)
59-65: Consider separating disconnect and connect for better error handling.If
drbdadm disconnectfails, the&&preventsconnectfrom running, which is correct. However, after a failed disconnect attempt, logging "action failed" may not provide enough context. Consider adding explicit error handling or retrying on next iteration.The current implementation is acceptable since failures are logged and the next iteration will re-evaluate the stall condition.
67-126: LGTM - Well-designed stall detection logic.The state machine correctly tracks resync progress across iterations:
- Persists
(resource, peer, percent, count, last_action_time)state- Increments count when percent unchanged, resets on progress
- Triggers reconnect after
STALL_ITERSconsecutive stalls- Uses atomic file writes (
mvfrom tmp) for state persistence
128-159: LGTM!The main loop integration is clean:
- Configurable interval via
INTERVAL_SEC- Stall detection runs after existing loop detachment and secondary volume handling
- Error suppression (
|| true) ondrbd_fix_stalled_syncprevents the plunger from exiting on transient errorspackages/system/linstor/images/piraeus-server/Dockerfile (1)
68-70: Good addition of package validation.The explicit check for built
.debpackages prevents silent build failures, addressing a previously identified concern.packages/system/linstor/images/piraeus-server/patches/allow-toggle-disk-retry.diff (1)
1-235: LGTM - Comprehensive retry/cancel logic for toggle-disk operations.The patch properly handles several edge cases:
- Cancel add-disk: Reverts to diskless using the existing removal flow for proper satellite cleanup
- Retry add-disk: Cleans up partial layer data and recreates to ensure a fresh state
- Cancel remove-disk: Simply clears the request flag
- Retry remove-disk: Continues the pending operation
- Orphaned layers: Detects and cleans up storage layers on resources marked as diskless
The
hasNonDisklessStorageLayerhelper correctly traverses the layer tree recursively to detect non-diskless storage providers.packages/system/linstor/templates/satellites-cozy.yaml (1)
15-17: LGTM!The image field addition is consistent with other template changes in this PR. The YAMLlint syntax error is a false positive—the linter doesn't understand Helm templating syntax (
{{ }}), but the template will render correctly.packages/system/linstor/images/piraeus-server/patches/README.md (1)
1-12: LGTM - Good documentation of patches with upstream references.The README properly documents each patch with:
- Brief description of what it fixes
- Upstream PR references for tracking when patches can be dropped
All documented patches exist in the directory, confirming the documentation is accurate and helps maintainability by making it clear when to remove patches after upstream merges them.
packages/system/linstor/templates/satellites-plunger.yaml (1)
14-14: LGTM! Explicit image references improve clarity.The migration from template helper functions to explicit
.Values.piraeusServer.imagereferences makes the image sources clearer and more maintainable. Both container image references are consistent with the new approach.Note: The YAMLlint syntax error is a false positive—it doesn't recognize Helm template syntax.
Also applies to: 51-51
Makefile (1)
21-21: LGTM! Logical integration into build sequence.The linstor image build step is appropriately placed in the build sequence and aligns with the PR's objective to build piraeus-server with custom patches.
packages/system/linstor/templates/cluster.yaml (1)
30-31: LGTM! Consistent image reference migration.The addition of the
linstor-controllercontainer and the update to theplungercontainer both use explicit.Values.piraeusServer.imagereferences, which is consistent with the migration pattern across other templates. This improves clarity and maintainability.Note: The YAMLlint syntax error on line 31 is a false positive—it doesn't recognize Helm template syntax.
Also applies to: 33-33
packages/system/linstor/values.yaml (1)
1-4: Initial image configuration added correctly.The new
piraeusServer.imagevalues provide the image configuration referenced by the updated templates. The tag formatlatest@sha256:...is valid and pins the image to a specific digest while maintaining the "latest" label.Note: These values serve as the initial state and will be updated by the Makefile during the build process (lines 19-22 of
packages/system/linstor/Makefile).packages/system/linstor/images/piraeus-server/patches/force-metadata-check-on-disk-add.diff (4)
5-19: LGTM! Logic for processing children during disk-add is correct.The patch correctly extends
processChildrento handle toggle-disk operations by detectingDISK_ADD_REQUESTEDorDISK_ADDINGstates. This ensures child layers are processed when converting from diskless to diskful.
33-44: LGTM! Metadata check gating includes disk-add scenario.The expanded condition correctly includes
DISK_ADD_REQUESTEDandDISK_ADDINGflags to ensure metadata is checked during toggle-disk operations, complementing the existingDISK_REMOVINGcheck.
45-63: LGTM! Metadata verification properly handles toggle-disk state.The condition correctly forces metadata verification during disk-add operations, including retry scenarios where storage may exist but metadata doesn't. The detailed comment clearly explains the three gating conditions.
1-63: Patch is sound and consistent with documented feature set.The flags
DISK_ADD_REQUESTEDandDISK_ADDINGare confirmed to exist in Resource.Flags (used consistently across related patches in this series). The patch format is valid with correct file path and standard StateFlags method usage (.isSet(), .isSomeSet()). This patch is part of the documented toggle-disk feature set for linstor-server v1.32.3 (upstream PR #474), coordinated with three other patches that all use these same flags consistently.
|
/retest |
|
Successfully created backport PR for |
## What this PR does Build piraeus-server (linstor-server) from source with custom patches: - **adjust-on-resfile-change.diff** — Use actual device path in res file during toggle-disk; fix LUKS data offset - Upstream: [#473](LINBIT/linstor-server#473), [#472](LINBIT/linstor-server#472) - **allow-toggle-disk-retry.diff** — Allow retry and cancellation of failed toggle-disk operations - Upstream: [#475](LINBIT/linstor-server#475) - **force-metadata-check-on-disk-add.diff** — Create metadata during toggle-disk from diskless to diskful - Upstream: [#474](LINBIT/linstor-server#474) - **skip-adjust-when-device-inaccessible.diff** — Skip DRBD adjust/res file regeneration when child layer device is inaccessible - Upstream: [#471](LINBIT/linstor-server#471) Also updates plunger-satellite script and values.yaml for the new build. ### Release note ```release-note [linstor] Build linstor-server with custom patches for improved disk handling ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added automatic DRBD stall detection and recovery, improving storage resync resilience without manual intervention. * Introduced configurable container image references via Helm values for streamlined deployment. * **Bug Fixes** * Enhanced disk toggle operations with retry and cancellation support for better error handling. * Improved metadata creation during disk state transitions. * Added device accessibility checks to prevent errors when underlying storage devices are unavailable. * Fixed LUKS encryption header sizing for consistent deployment across nodes. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does Build piraeus-server (linstor-server) from source with custom patches: - **adjust-on-resfile-change.diff** — Use actual device path in res file during toggle-disk; fix LUKS data offset - Upstream: [#473](LINBIT/linstor-server#473), [#472](LINBIT/linstor-server#472) - **allow-toggle-disk-retry.diff** — Allow retry and cancellation of failed toggle-disk operations - Upstream: [#475](LINBIT/linstor-server#475) - **force-metadata-check-on-disk-add.diff** — Create metadata during toggle-disk from diskless to diskful - Upstream: [#474](LINBIT/linstor-server#474) - **skip-adjust-when-device-inaccessible.diff** — Skip DRBD adjust/res file regeneration when child layer device is inaccessible - Upstream: [#471](LINBIT/linstor-server#471) Also updates plunger-satellite script and values.yaml for the new build. ### Release note ```release-note [linstor] Build linstor-server with custom patches for improved disk handling ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added automatic DRBD stall detection and recovery, improving storage resync resilience without manual intervention. * Introduced configurable container image references via Helm values for streamlined deployment. * **Bug Fixes** * Enhanced disk toggle operations with retry and cancellation support for better error handling. * Improved metadata creation during disk state transitions. * Added device accessibility checks to prevent errors when underlying storage devices are unavailable. * Fixed LUKS encryption header sizing for consistent deployment across nodes. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## What this PR does Build piraeus-server (linstor-server) from source with custom patches: - **adjust-on-resfile-change.diff** — Use actual device path in res file during toggle-disk; fix LUKS data offset - Upstream: [#473](LINBIT/linstor-server#473), [#472](LINBIT/linstor-server#472) - **allow-toggle-disk-retry.diff** — Allow retry and cancellation of failed toggle-disk operations - Upstream: [#475](LINBIT/linstor-server#475) - **force-metadata-check-on-disk-add.diff** — Create metadata during toggle-disk from diskless to diskful - Upstream: [#474](LINBIT/linstor-server#474) - **skip-adjust-when-device-inaccessible.diff** — Skip DRBD adjust/res file regeneration when child layer device is inaccessible - Upstream: [#471](LINBIT/linstor-server#471) Also updates plunger-satellite script and values.yaml for the new build. ### Release note ```release-note [linstor] Build linstor-server with custom patches for improved disk handling ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added automatic DRBD stall detection and recovery, improving storage resync resilience without manual intervention. * Introduced configurable container image references via Helm values for streamlined deployment. * **Bug Fixes** * Enhanced disk toggle operations with retry and cancellation support for better error handling. * Improved metadata creation during disk state transitions. * Added device accessibility checks to prevent errors when underlying storage devices are unavailable. * Fixed LUKS encryption header sizing for consistent deployment across nodes. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
What this PR does
Build piraeus-server (linstor-server) from source with custom patches:
Also updates plunger-satellite script and values.yaml for the new build.
Release note
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.