[kubernetes] Fix endpoints for cilium-gateway #1729
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

This change adds a fallback mechanism to handle endpoints without NodeName in the KubeVirt cloud provider's endpoint discovery. When endpoints lack NodeName, the controller now retrieves endpoints from all VMIs via a new helper function instead of filtering by node. Logging is updated to reflect the fallback behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Ctrl as Controller
    participant Api as Kubernetes API
    participant Log as Logger
    rect rgb(200, 220, 255)
        Note over Ctrl,Api: Original Flow: Endpoints with NodeName
        Ctrl->>Api: Get endpoints by NodeName
        Api-->>Ctrl: Return filtered endpoints
        Ctrl->>Log: Log "Desired nodes"
    end
    rect rgb(255, 220, 200)
        Note over Ctrl,Api: Fallback Flow: Endpoints without NodeName
        Ctrl->>Ctrl: Check hasEndpointsWithoutNodeName
        alt NodeName endpoints empty AND orphan endpoints exist
            Ctrl->>Api: getAllVMIEndpoints() - list all VMIs
            Api-->>Ctrl: Return all VMIs from infra namespace
            Ctrl->>Ctrl: Construct endpoints from VMI IPs<br/>(Ready, Serving, Terminating conditions)
            Ctrl->>Log: Emit fallback notice
            Ctrl->>Log: Log current node set
        else Standard path sufficient
            Ctrl->>Log: Normal logging
        end
    end
```
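To make the fallback rule concrete, here is a minimal, hedged Go sketch of the decision described above. It is not the patch's code: the helper name `shouldFallBackToAllVMIs` and its signature are invented for illustration; only the rule itself (fall back exactly when no endpoint carries a NodeName but NodeName-less endpoints exist) comes from the walkthrough and the diagram.

```go
package main

import (
	"fmt"

	discovery "k8s.io/api/discovery/v1"
)

// shouldFallBackToAllVMIs mirrors the described rule: the controller switches to
// getAllVMIEndpoints() only when none of the tenant endpoints carry a NodeName,
// yet endpoints without NodeName do exist. Illustrative only, not the patch's API.
func shouldFallBackToAllVMIs(eps []discovery.Endpoint) bool {
	hasEndpointsWithoutNodeName := false
	nodesSeen := 0
	for _, ep := range eps {
		if ep.NodeName == nil || *ep.NodeName == "" {
			// In the patch such endpoints are skipped with a warning.
			hasEndpointsWithoutNodeName = true
			continue
		}
		nodesSeen++
	}
	return nodesSeen == 0 && hasEndpointsWithoutNodeName
}

func main() {
	node := "worker-1"
	// No endpoint has a NodeName -> fall back to all VMIs (Cluster-like distribution).
	fmt.Println(shouldFallBackToAllVMIs([]discovery.Endpoint{{Addresses: []string{"10.0.0.1"}}}))
	// At least one endpoint has a NodeName -> keep the normal per-node path.
	fmt.Println(shouldFallBackToAllVMIs([]discovery.Endpoint{{Addresses: []string{"10.0.0.2"}, NodeName: &node}}))
}
```

When the fallback does trigger, the controller builds the endpoints itself via `getAllVMIEndpoints()`, which is quoted in full in the review below.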
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ❌ Failed checks (1 warning) · ✅ Passed checks (2 passed)
Summary of Changes

Hello @kvaps, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request refines the Kubernetes endpoint management within a KubeVirt environment.
Code Review
The pull request modifies kubevirteps_controller.go to introduce a fallback mechanism in getDesiredEndpoints. This mechanism, triggered when no endpoints with NodeName are found but endpoints without NodeName exist, calls a new getAllVMIEndpoints function. The new function lists all VMIs in the infra namespace and converts them into discovery.Endpoint objects. A review comment suggests refactoring this change by extracting the common logic for converting a VirtualMachineInstance to a discovery.Endpoint into a shared helper function to avoid code duplication between getDesiredEndpoints and getAllVMIEndpoints.
```go
// getAllVMIEndpoints returns endpoints for all VMIs in the infra namespace.
// This is used as a fallback when tenant endpoints don't have NodeName specified,
// similar to ExternalTrafficPolicy=Cluster behavior where traffic is distributed to all nodes.
func (c *Controller) getAllVMIEndpoints() []*discovery.Endpoint {
	var endpoints []*discovery.Endpoint

	// List all VMIs in the infra namespace
	vmiList, err := c.infraDynamic.
		Resource(kubevirtv1.VirtualMachineInstanceGroupVersionKind.GroupVersion().WithResource("virtualmachineinstances")).
		Namespace(c.infraNamespace).
		List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		klog.Errorf("Failed to list VMIs in namespace %q: %v", c.infraNamespace, err)
		return endpoints
	}

	for _, obj := range vmiList.Items {
		vmi := &kubevirtv1.VirtualMachineInstance{}
		err = runtime.DefaultUnstructuredConverter.FromUnstructured(obj.Object, vmi)
		if err != nil {
			klog.Errorf("Failed to convert Unstructured to VirtualMachineInstance: %v", err)
			continue
		}

		if vmi.Status.NodeName == "" {
			klog.Warningf("Skipping VMI %s/%s: NodeName is empty", vmi.Namespace, vmi.Name)
			continue
		}
		nodeNamePtr := &vmi.Status.NodeName

		ready := vmi.Status.Phase == kubevirtv1.Running
		serving := vmi.Status.Phase == kubevirtv1.Running
		terminating := vmi.Status.Phase == kubevirtv1.Failed || vmi.Status.Phase == kubevirtv1.Succeeded

		for _, i := range vmi.Status.Interfaces {
			if i.Name == "default" {
				if i.IP == "" {
					klog.Warningf("VMI %s/%s interface %q has no IP, skipping", vmi.Namespace, vmi.Name, i.Name)
					continue
				}
				endpoints = append(endpoints, &discovery.Endpoint{
					Addresses: []string{i.IP},
					Conditions: discovery.EndpointConditions{
						Ready:       &ready,
						Serving:     &serving,
						Terminating: &terminating,
					},
					NodeName: nodeNamePtr,
				})
				break
			}
		}
	}

	klog.Infof("Fallback: created %d endpoints from all VMIs in namespace %s", len(endpoints), c.infraNamespace)
	return endpoints
}
```
This new function getAllVMIEndpoints duplicates a significant amount of logic from getDesiredEndpoints for converting a VirtualMachineInstance into a discovery.Endpoint. To improve maintainability and avoid code duplication, I recommend extracting this logic into a separate helper function. This function could accept a *kubevirtv1.VirtualMachineInstance and return a *discovery.Endpoint, or nil if conversion is not possible.
Actionable comments posted: 0
🧹 Nitpick comments (2)
packages/apps/kubernetes/images/kubevirt-cloud-provider/patches/379.diff (2)
99-155: Consider extracting shared endpoint creation logic. The endpoint creation logic (NodeName validation, phase checks, interface iteration, IP validation, Endpoint construction) is duplicated between `getAllVMIEndpoints` and the main loop in `getDesiredEndpoints`. Extracting a helper would reduce duplication and ensure consistent behavior:
```go
// createEndpointFromVMI creates a discovery.Endpoint from a VMI, or returns nil if the VMI is not ready
func (c *Controller) createEndpointFromVMI(vmi *kubevirtv1.VirtualMachineInstance) *discovery.Endpoint {
	if vmi.Status.NodeName == "" {
		klog.Warningf("Skipping VMI %s/%s: NodeName is empty", vmi.Namespace, vmi.Name)
		return nil
	}

	ready := vmi.Status.Phase == kubevirtv1.Running
	serving := vmi.Status.Phase == kubevirtv1.Running
	terminating := vmi.Status.Phase == kubevirtv1.Failed || vmi.Status.Phase == kubevirtv1.Succeeded

	for _, i := range vmi.Status.Interfaces {
		if i.Name == "default" {
			if i.IP == "" {
				klog.Warningf("VMI %s/%s interface %q has no IP, skipping", vmi.Namespace, vmi.Name, i.Name)
				return nil
			}
			return &discovery.Endpoint{
				Addresses: []string{i.IP},
				Conditions: discovery.EndpointConditions{
					Ready:       &ready,
					Serving:     &serving,
					Terminating: &terminating,
				},
				NodeName: &vmi.Status.NodeName,
			}
		}
	}
	return nil
}
```
169-219: Good test coverage for the skip behavior. The test correctly validates that endpoints and VMIs without NodeName are skipped. Consider adding an additional test case for the fallback path (`getAllVMIEndpoints`) to ensure full coverage of the new functionality, specifically testing that when `hasEndpointsWithoutNodeName` is true and `nodeSet` is empty, the fallback returns endpoints from all valid VMIs.
📜 Review details
📒 Files selected for processing (1)
packages/apps/kubernetes/images/kubevirt-cloud-provider/patches/379.diff (4 hunks)
🔇 Additional comments (4)
packages/apps/kubernetes/images/kubevirt-cloud-provider/patches/379.diff (4)
17-28: LGTM - Proper handling for endpoints without NodeName. The flag tracking and skip logic with appropriate warning logging is well implemented. This enables the fallback behavior when needed.
31-36: Well-designed fallback mechanism. The conditional correctly ensures fallback only when endpoints exist but lack NodeName information. The comment explaining the ExternalTrafficPolicy=Cluster similarity is helpful for future maintainers.
56-67: Good reliability improvement - removing Fatal crash. Replacing `klog.Fatal(err)` with `continue` is an important fix. The controller will now gracefully skip problematic VMIs instead of crashing entirely. The added NodeName validation provides additional robustness.
76-92: Correct semantic change from `continue` to `break`. The IP validation is a good defensive addition. Changing to `break` is semantically correct - once the "default" interface is processed, there's no need to iterate remaining interfaces.
Actionable comments posted: 2
🧹 Nitpick comments (2)
hack/e2e-apps/gateway-api.bats (2)
159-165: Consider increasing request sample size for load balancing verification. With only 10 requests, there's a small probability that both backends might not be hit, especially if the load balancer has warm-up behavior or sticky sessions. Consider increasing to 20-30 requests to improve test reliability.
🔎 Proposed enhancement
```diff
 # Test load balancing
 responses=""
-for i in {1..10}; do
+for i in {1..20}; do
   response=$(curl -s --max-time 5 $test_url)
   responses="$responses$response\n"
 done

 # Check that both backends are hit
 echo -e "$responses" | grep -q "Hello from backend1"
 echo -e "$responses" | grep -q "Hello from backend2"
```
167-173: Consider trap-based cleanup for robustness. The current cleanup only runs if the test completes successfully. If the test fails earlier, resources may be leaked. Consider using a `trap` to ensure cleanup runs even on test failure.
🔎 Example trap-based cleanup pattern
```bash
@test "test gateway api tcp load balancer in tenant cluster" {
  name=test

  # Setup trap for cleanup
  cleanup() {
    if [ -n "$PF_PID" ]; then kill $PF_PID 2>/dev/null || true; fi
    kubectl delete deployment backend1 backend2 --namespace tenant-test --ignore-not-found
    kubectl delete configmap backend1-config backend2-config --namespace tenant-test --ignore-not-found
    kubectl delete gateway ${name}-gateway --namespace tenant-test --ignore-not-found
    kubectl delete tcproute ${name}-tcproute --namespace tenant-test --ignore-not-found
    kubectl delete service backend-svc --namespace tenant-test --ignore-not-found
  }
  trap cleanup EXIT

  # ... rest of test ...
}
```
📜 Review details
📒 Files selected for processing (2)
hack/e2e-apps/gateway-api.bats
hack/e2e-install-cozystack.bats
🔇 Additional comments (3)
hack/e2e-install-cozystack.bats (1)
127-129: LGTM! Gateway API enablement is correctly positioned. The configmap patch appropriately enables Gateway API as a precondition before tenant configuration, ensuring the feature is available for the subsequent tests.
hack/e2e-apps/gateway-api.bats (2)
151-151: Verify netcat availability in test environment. The `nc -z` command requires netcat to be installed. Ensure it's available in the test environment, or consider using an alternative like checking the port-forward process health or using a curl retry loop.
Alternative approach without nc dependency
```diff
-timeout 30 sh -ec "until nc -z localhost 8080; do sleep 1; done"
+timeout 30 sh -ec "until curl -s --max-time 1 http://localhost:8080 > /dev/null 2>&1; do sleep 1; done"
```
149-149: No issues found. The service name `cilium-gateway-${name}-gateway` follows the official Cilium Gateway API naming convention `<gatewayClassName>-gateway-<gatewayName>`, so the port-forward fallback will correctly target the service created by Cilium.
hack/e2e-apps/gateway-api.bats
Outdated
```yaml
  gatewayClassName: cilium
  listeners:
  - name: tcp
    hostname: "*.example.com"
```
🧩 Analysis chain
🌐 Web query:
Kubernetes Gateway API hostname field TCP protocol listener support
💡 Result:
Short answer: No — the Listener.hostname is ignored for TCP listeners. Hostname matching is defined for TLS/HTTP/HTTPS (SNI/Host header) in the Gateway API; protocols that don't require hostname‑based matching (e.g., TCP, UDP) ignore the hostname field. [1][2]
Sources:
- Gateway API Listener spec (hostname rules: applied for TLS/HTTP/HTTPS; ignored for protocols that don’t require hostname matching). [1]
- Gateway API TCP guide and TCPRoute examples (shows TCP listeners identified by port/sectionName; no hostname matching). [2]
References:
[1] gateway-api.sigs.k8s.io — Listener / Hostname description.
[2] gateway-api.sigs.k8s.io — TCP routing guide (TCPRoute examples).
Remove hostname field from TCP listener.
The hostname: "*.example.com" field has no effect on TCP listeners. According to the Kubernetes Gateway API specification, hostname matching applies only to TLS/HTTP/HTTPS protocols; TCP listeners ignore this field since they operate at L4 and don't perform hostname-based routing. Remove it to avoid confusion.
🤖 Prompt for AI Agents
In hack/e2e-apps/gateway-api.bats around line 98, the TCP listener includes a
hostname: "*.example.com" field which has no effect for L4/TCP listeners; remove
the hostname line from the TCP listener definition so the listener contains only
fields valid for TCP (port, protocol, routes/etc.) to avoid misleading
configuration and match the Gateway API spec.
hack/e2e-apps/gateway-api.bats
Outdated
```bash
# Test load balancing
responses=""
for i in {1..10}; do
  response=$(curl -s $test_url)
```
🛠️ Refactor suggestion | 🟠 Major
Add timeout to curl command.
The curl request lacks a timeout, which could cause the test to hang indefinitely if the service is unresponsive.
🔎 Proposed fix
```diff
-response=$(curl -s $test_url)
+response=$(curl -s --max-time 5 $test_url)
```
🤖 Prompt for AI Agents
In hack/e2e-apps/gateway-api.bats around line 160, the curl invocation uses no
timeout and can hang; update the curl command to include timeouts (for example
add a short connect timeout and a total/max-time, e.g., --connect-timeout 5 and
--max-time 10), keep silent output (-s) but consider using -S/--fail so failures
return non-zero, and ensure any new flags still capture response in the response
variable so the test fails fast instead of hanging.
Actionable comments posted: 0
🧹 Nitpick comments (1)
hack/e2e-install-cozystack.bats (1)
127-129: LGTM! Gateway API enablement is correctly placed. The ConfigMap patch to enable Gateway API is syntactically correct and appropriately positioned before the tenant patch. The change aligns with the PR objective to fix cilium-gateway endpoints and supports the new gateway API tests.
Optional: Add verification step
While the current pattern is consistent with other ConfigMap patches in this file (lines 148, 175), you could optionally add a verification step to ensure the Gateway API feature is fully reconciled before proceeding:
```diff
 # Enable Gateway API for testing
 kubectl patch configmap/cozystack -n cozy-system --type merge -p '{"data":{"addons.gatewayAPI.enabled":"true"}}'
+
+# Wait for Gateway API resources to be available
+timeout 60 sh -ec 'until kubectl get crd gateways.gateway.networking.k8s.io >/dev/null 2>&1; do sleep 1; done'
```
This would guard against potential race conditions, though the current implementation may be sufficient if the Gateway API controller reconciles quickly.
📜 Review details
📒 Files selected for processing (1)
hack/e2e-install-cozystack.bats
Actionable comments posted: 0
🧹 Nitpick comments (3)
hack/e2e-apps/loadbalancer-service.bats (3)
31-36: Consider increasing readiness probe delays. The readiness probe has aggressive timing (`initialDelaySeconds: 2`, `periodSeconds: 2`). In slower test environments or under load, nginx might not be fully ready within 2 seconds, potentially causing test flakiness.
🔎 Suggested adjustment
```diff
 readinessProbe:
   httpGet:
     path: /
     port: 80
-  initialDelaySeconds: 2
-  periodSeconds: 2
+  initialDelaySeconds: 5
+  periodSeconds: 5
```
111-114: Consider adding functional verification of the LoadBalancer service. The test verifies that endpoints are registered but doesn't confirm the LoadBalancer service is actually reachable and functional. For a complete LoadBalancer test, consider making an HTTP request through the LB IP/hostname to verify end-to-end connectivity.
🔎 Suggested addition after endpoint verification
```bash
# Get LoadBalancer IP or hostname
lb_address=$(kubectl get svc $name -n $ns \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if [ -z "$lb_address" ]; then
  lb_address=$(kubectl get svc $name -n $ns \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
fi

# Verify service is reachable through LoadBalancer
timeout 30 sh -ec "
  until curl -sf http://$lb_address/ | grep -q 'Hello from backend'; do
    sleep 2
  done
"
```
116-118: Consider more robust cleanup. The cleanup doesn't use `--ignore-not-found`, which could cause errors if resources weren't created due to earlier test failures. While BATS typically doesn't fail on cleanup errors, adding this flag improves robustness.
🔎 Suggested improvement
```diff
 # Cleanup
-kubectl delete deployment backend1 backend2 -n $ns
-kubectl delete service $name -n $ns
+kubectl delete deployment backend1 backend2 -n $ns --ignore-not-found=true
+kubectl delete service $name -n $ns --ignore-not-found=true
```
📜 Review details
📒 Files selected for processing (1)
hack/e2e-apps/loadbalancer-service.bats
🔇 Additional comments (4)
hack/e2e-apps/loadbalancer-service.bats (4)
83-97: LGTM! The LoadBalancer service configuration is correct. The selector properly targets both backend deployments.
99-101: LGTM! Properly waits for both deployments to be available with reasonable timeouts.
103-109: LGTM! The LoadBalancer provisioning wait logic is correct and uses appropriate timeout and polling intervals.
111-114: Endpoint verification assumes all endpoints are ready. The jsonpath query `{.subsets[0].addresses[*].ip}` only counts endpoints in the `addresses` array, which contains ready endpoints. If endpoints exist but are not ready (in `notReadyAddresses`), this check will incorrectly pass or fail.
Given the PR's focus on fixing endpoint resolution and the controller's handling of Ready/Serving/Terminating conditions (as mentioned in the AI summary), this verification should explicitly confirm endpoint readiness.
🔎 Proposed fix to verify ready endpoints
```diff
-# Ensure exactly 2 ready endpoints
-endpoints_count=$(kubectl get endpoints $name -n $ns \
-  -o jsonpath='{.subsets[0].addresses[*].ip}' | wc -w)
-[ "$endpoints_count" -eq 2 ]
+# Ensure exactly 2 ready endpoints
+ready_endpoints=$(kubectl get endpoints $name -n $ns \
+  -o jsonpath='{range .subsets[*].addresses[*]}{.ip}{"\n"}{end}')
+endpoints_count=$(echo "$ready_endpoints" | grep -c '^')
+[ "$endpoints_count" -eq 2 ]
```
Likely an incorrect or invalid review comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
hack/e2e-apps/run-kubernetes.sh (1)
188-197: Consider fixing indentation for consistency. These lines lack the 2-space indentation used elsewhere in the function, making the code harder to read.
🔎 Proposed fix
```diff
-LB_ADDR=$(
-  kubectl get svc --kubeconfig tenantkubeconfig-${test_name} "${test_name}-backend" \
-    -n tenant-test \
-    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
-)
-
-if [ -z "$LB_ADDR" ]; then
-  echo "LoadBalancer address is empty" >&2
-  exit 1
-fi
+  LB_ADDR=$(
+    kubectl get svc --kubeconfig tenantkubeconfig-${test_name} "${test_name}-backend" \
+      -n tenant-test \
+      -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
+  )
+
+  if [ -z "$LB_ADDR" ]; then
+    echo "LoadBalancer address is empty" >&2
+    exit 1
+  fi
```
📜 Review details
📒 Files selected for processing (1)
hack/e2e-apps/run-kubernetes.sh
🔇 Additional comments (4)
hack/e2e-apps/run-kubernetes.sh (4)
75-75: LGTM! Increasing the timeout from 2m to 5m for the Kubernetes resources readiness check is reasonable for E2E environments.
103-103: LGTM! Creating the namespace after nodes are ready is the correct sequence.
177-186: LGTM! The wait logic for deployment availability and LoadBalancer provisioning is correct.
199-216: LGTM! The health check using a temporary curl pod with retries is a reasonable approach for validating LoadBalancer connectivity. The cleanup properly removes the test resources.
```diff
 # Set up port forwarding to the Kubernetes API server for a 200 second timeout
-bash -c 'timeout 300s kubectl port-forward service/kubernetes-'"${test_name}"' -n tenant-test '"${port}"':6443 > /dev/null 2>&1 &'
+bash -c 'timeout 500s kubectl port-forward service/kubernetes-'"${test_name}"' -n tenant-test '"${port}"':6443 > /dev/null 2>&1 &'
```
Stale comment: timeout value doesn't match.
The comment says "200 second timeout" but the actual timeout is 500s.
🔎 Proposed fix
```diff
-# Set up port forwarding to the Kubernetes API server for a 200 second timeout
+# Set up port forwarding to the Kubernetes API server for a 500 second timeout
 bash -c 'timeout 500s kubectl port-forward service/kubernetes-'"${test_name}"' -n tenant-test '"${port}"':6443 > /dev/null 2>&1 &'
```
🤖 Prompt for AI Agents
In hack/e2e-apps/run-kubernetes.sh around lines 89 to 90, the inline comment
says "200 second timeout" but the command uses timeout 500s; update one of them
so they match — either change the comment to "500 second timeout" or change the
timeout invocation to "200s" depending on the intended behavior; keep wording
precise and consistent with the chosen timeout value.
Actionable comments posted: 1
♻️ Duplicate comments (2)
hack/e2e-apps/run-kubernetes.sh (2)
89-90: Stale comment: timeout value doesn't match. The comment says "200 second timeout" but the actual timeout is `500s`. This was previously flagged in past reviews but remains unfixed.
🔎 Proposed fix
```diff
-# Set up port forwarding to the Kubernetes API server for a 200 second timeout
+# Set up port forwarding to the Kubernetes API server for a 500 second timeout
 bash -c 'timeout 500s kubectl port-forward service/kubernetes-'"${test_name}"' -n tenant-test '"${port}"':6443 > /dev/null 2>&1 &'
```
128-174: Critical: Missing namespace and service selector mismatch. Two issues prevent this LoadBalancer test from working:

1. Missing namespace creation: The Deployment and Service reference `namespace: tenant-test` (lines 134, 166) in the tenant cluster, but this namespace is never created. The `kubectl apply` commands (lines 129, 161) will fail because the namespace doesn't exist in the tenant cluster.
2. Service selector mismatch: The Service selector `app: "${test_name}-backend"` (line 170) won't match the pod labels `app: backend` and `backend: "${test_name}-backend"` (lines 139-140, 144-145). The LoadBalancer will have no endpoints and health checks will fail. This issue was flagged in previous reviews but remains unfixed.

🔎 Proposed fixes
Fix 1: Create the namespace in the tenant cluster
Add this before line 128:
```diff
+# Create namespace in tenant cluster for backend test
+kubectl create namespace tenant-test --kubeconfig tenantkubeconfig-${test_name} --dry-run=client -o yaml | \
+  kubectl apply --kubeconfig tenantkubeconfig-${test_name} -f-
+
 # Backend 1
```
Fix 2: Align service selector with pod labels
```diff
 selector:
-  app: "${test_name}-backend"
+  app: backend
+  backend: "${test_name}-backend"
```
📜 Review details
📒 Files selected for processing (1)
hack/e2e-apps/run-kubernetes.sh
🔇 Additional comments (1)
hack/e2e-apps/run-kubernetes.sh (1)
75-75: LGTM: Reasonable timeout increase. Increasing the readiness timeout from 2m to 5m provides more buffer for resource provisioning in E2E tests.
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
691d126 to f596652 (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/apps/kubernetes/images/kubevirt-cloud-provider/patches/379.diff (1)
99-155: Address code duplication as previously noted. This function duplicates significant logic from `getDesiredEndpoints` (lines 40-94). The conversion of a VMI to an Endpoint object is repeated in both places. Consider extracting the VMI-to-Endpoint conversion logic into a separate helper function as suggested in the previous review. This would improve maintainability and reduce the risk of divergence between the two code paths.
🔎 Example helper function structure
```go
// vmiToEndpoint converts a VirtualMachineInstance to a discovery.Endpoint.
// Returns nil if the VMI cannot be converted (missing NodeName, no default interface, etc.)
func vmiToEndpoint(vmi *kubevirtv1.VirtualMachineInstance) *discovery.Endpoint {
	if vmi.Status.NodeName == "" {
		klog.Warningf("Skipping VMI %s/%s: NodeName is empty", vmi.Namespace, vmi.Name)
		return nil
	}

	ready := vmi.Status.Phase == kubevirtv1.Running
	serving := vmi.Status.Phase == kubevirtv1.Running
	terminating := vmi.Status.Phase == kubevirtv1.Failed || vmi.Status.Phase == kubevirtv1.Succeeded

	for _, i := range vmi.Status.Interfaces {
		if i.Name == "default" {
			if i.IP == "" {
				klog.Warningf("VMI %s/%s interface %q has no IP, skipping", vmi.Namespace, vmi.Name, i.Name)
				return nil
			}
			nodeNamePtr := &vmi.Status.NodeName
			return &discovery.Endpoint{
				Addresses: []string{i.IP},
				Conditions: discovery.EndpointConditions{
					Ready:       &ready,
					Serving:     &serving,
					Terminating: &terminating,
				},
				NodeName: nodeNamePtr,
			}
		}
	}
	return nil
}
```
Then use it in both functions:
```go
// In getDesiredEndpoints
if endpoint := vmiToEndpoint(vmi); endpoint != nil {
	desiredEndpoints = append(desiredEndpoints, endpoint)
}

// In getAllVMIEndpoints
if endpoint := vmiToEndpoint(vmi); endpoint != nil {
	endpoints = append(endpoints, endpoint)
}
```
🧹 Nitpick comments (1)
packages/apps/kubernetes/images/kubevirt-cloud-provider/patches/379.diff (1)
169-219: Consider adding test coverage for the fallback behavior. This test validates that endpoints without NodeName are skipped, but it doesn't test the new fallback logic (lines 31-36) where `getAllVMIEndpoints()` is called. Consider adding additional test cases to cover:

- The fallback to `getAllVMIEndpoints()` when all tenant endpoints lack NodeName
- Direct testing of the `getAllVMIEndpoints()` function with various VMI states
- Scenarios with valid VMIs that have both NodeName and IP addresses
📜 Review details
📒 Files selected for processing (1)
packages/apps/kubernetes/images/kubevirt-cloud-provider/patches/379.diff
🔇 Additional comments (5)
packages/apps/kubernetes/images/kubevirt-cloud-provider/patches/379.diff (5)
56-62: Good improvement to error handling. Replacing `klog.Fatal(err)` with `continue` is a critical fix. Fatal would crash the entire controller process, whereas continue allows the controller to skip problematic VMIs and continue processing others.
64-68: Good validation added for NodeName. The validation check prevents creating endpoints with nil NodeName pointers, and the early continue ensures the pointer created on line 68 always points to a non-empty string.
76-79: Good validation added for IP address. Skipping interfaces without IP addresses prevents creating invalid endpoints.
91-91: Correct use of break instead of continue. Using `break` here is correct since the code should stop iterating through interfaces once the "default" interface is found and processed.
31-36: The fallback mechanism is intentional and defensive. It only triggers when no endpoints with NodeName are found in the tenant slices AND endpoints without NodeName exist—meaning there are no valid endpoints to route traffic to anyway. This is not a change to routing behavior but rather a safeguard against incomplete endpoint data.The patch documents this explicitly: "distribute traffic to all VMIs (similar to ExternalTrafficPolicy=Cluster behavior)" and the test case validates that endpoints lacking NodeName are properly skipped. When the fallback does activate, it re-applies the same validation (skipping VMIs without NodeName and IPs), ensuring consistency.
This behavior does not conflict with ExternalTrafficPolicy; it only applies when the normal endpoint selection path produces no results due to malformed data.
Successfully created backport PR for
Signed-off-by: Andrei Kvapil kvapss@gmail.com
What this PR does
Integrate fix: kubevirt/cloud-provider-kubevirt#379

Release note
[kubernetes] Fix endpoints for cilium-gateway

Summary by CodeRabbit
Bug Fixes
- Improved handling of incomplete endpoint data by introducing fallback detection mechanisms.
- Enhanced service discovery to gather endpoints from all available resources when standard detection fails.
- Updated logging to provide better visibility into fallback operations and current resource status.