[v1.16] azure/ipam: Replace subscription-wide VNet enumeration with targeted subnet queries #41554
Closed
yuecong wants to merge 1 commit into cilium:v1.16
…subnet queries

Backport to v1.16: This change optimizes Azure IPAM by replacing subscription-wide VNet enumeration with targeted subnet queries to eliminate Azure NRP throttling. Uses a three-phase strategy: discover subnet IDs from instances, query only referenced subnets, then re-parse instances with subnet details.

Key improvements:
- Replace GetVpcsAndSubnets() with GetNodesSubnets() for targeted queries
- Add parseSubnetID() with regex validation for subnet ID parsing
- Remove unused VNet tracking, reducing API calls significantly
- Add extractSubnetIDs() with deduplication for efficient discovery

This reduces Azure API calls from O(n*m) where n=VNets and m=subnets to O(k) where k=unique subnets actually in use, significantly improving performance in large Azure environments and eliminating NRP throttling issues.

Testing:
- Added unit tests for parseSubnetID() (8 test cases)
- Added TestExtractSubnetIDs() validating deduplication
- Updated all existing tests to work with new implementation

Signed-off-by: Cong Yue <cong@databricks.com>
Signed-off-by: yuecong <cong@databricks.com>
yuecong added a commit to yuecong/cilium that referenced this pull request Sep 6, 2025
Renamed TestGetVpcsAndSubnets to TestSubnetDiscovery and updated test expectations to match the new targeted subnet discovery behavior. The optimization only discovers subnets that are actually used by node instances, not all subnets subscription-wide. This aligns with the same test fix applied in PR cilium#41554 for v1.16 branch.
yuecong added a commit to yuecong/cilium that referenced this pull request Sep 6, 2025
…xcept Azure SDK v2

This change ensures perfect alignment between PR cilium#41555 (main branch) and PR cilium#41554 (v1.16 branch):
- Fixed GetNodesSubnets signature to return only SubnetMap (not VNetMap + SubnetMap)
- Updated extractSubnetIDs to use memory-efficient map[string]struct{} instead of map[string]bool
- Aligned error handling in resyncInstances to continue with empty subnets on GetNodesSubnets failure
- Added missing TestExtractSubnetIDs test for deduplication validation
- Maintained three-phase subnet discovery optimization identical to PR cilium#41554

The only differences between the PRs are now Azure SDK version upgrades (v1 → v2).

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
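The deduplication step mentioned in this commit can be illustrated with a minimal Go sketch. This is not the code from the PR: the `nodeInterface` type and the sorted return value are assumptions made for illustration, and only the use of `map[string]struct{}` as a set comes from the commit message.

```go
package main

import "sort"

// nodeInterface is a hypothetical stand-in for the per-instance network
// interface data; the real types live in Cilium's Azure IPAM code.
type nodeInterface struct {
	SubnetID string
}

// extractSubnetIDs returns the unique, non-empty subnet IDs referenced by
// the given interfaces. A map[string]struct{} serves as the set because the
// empty struct occupies no storage, unlike a bool value.
func extractSubnetIDs(ifaces []nodeInterface) []string {
	seen := make(map[string]struct{}, len(ifaces))
	for _, iface := range ifaces {
		if iface.SubnetID == "" {
			continue // skip interfaces without a subnet reference
		}
		seen[iface.SubnetID] = struct{}{}
	}

	ids := make([]string, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	// Sorting is an assumption of this sketch; it only makes the result
	// deterministic, since Go map iteration order is randomized.
	sort.Strings(ids)
	return ids
}
```

With 100 interfaces spread over two subnets, a function like this would return two IDs, which matches the deduplication scenario that TestExtractSubnetIDs is described as covering.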
yuecong added a commit to yuecong/cilium that referenced this pull request Sep 6, 2025
Final alignment between PR cilium#41555 (main branch) and PR cilium#41554 (v1.16 branch):

ALIGNED FEATURES:
- Pre-compiled regex pattern with expectedCaptureGroups constant
- getSubnetWithPagination method for accurate IP configuration counting
- Enhanced parseSubnetID function with proper error handling
- Subscription ID tracking in Client struct
- Proper logrus-based logging in GetNodesSubnets
- Three-phase subnet discovery optimization identical to PR cilium#41554

SDK V2 ADAPTATIONS:
- Uses armnetwork.SubnetsClient instead of network.SubnetsClient
- Simplified pagination logic using SDK v2 built-in capabilities
- Updated ARM response handling for result.Subnet access
- Removed SDK v1-specific Azure API version detection (not applicable)

VERIFICATION:
- TestSubnetDiscovery: ✅ PASS
- TestExtractSubnetIDs: ✅ PASS
- TestParseSubnetID: ✅ PASS
- Azure API mock tests: ✅ PASS

Both PRs now implement identical optimization logic while using appropriate Azure SDK versions.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
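For readers unfamiliar with the approach, here is a rough Go sketch of what a pre-compiled pattern with an `expectedCaptureGroups` constant might look like. The names `parseSubnetID` and `expectedCaptureGroups` appear in the commit message; the exact pattern, the `subnetRef` return type, and the error text are assumptions of this sketch and may differ from the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// expectedCaptureGroups is the full match plus the four captured segments.
const expectedCaptureGroups = 5

// subnetIDPattern is compiled once at package initialization. It follows the
// standard ARM resource ID layout for a subnet; matching case-insensitively
// is an assumption of this sketch.
var subnetIDPattern = regexp.MustCompile(
	`(?i)^/subscriptions/([^/]+)/resourceGroups/([^/]+)` +
		`/providers/Microsoft\.Network/virtualNetworks/([^/]+)/subnets/([^/]+)$`)

// subnetRef is a hypothetical holder for the parsed components.
type subnetRef struct {
	SubscriptionID string
	ResourceGroup  string
	VNetName       string
	SubnetName     string
}

// parseSubnetID splits a full subnet resource ID into its components and
// returns an error when the ID does not match the expected layout.
func parseSubnetID(id string) (subnetRef, error) {
	m := subnetIDPattern.FindStringSubmatch(id)
	if len(m) != expectedCaptureGroups {
		return subnetRef{}, fmt.Errorf("invalid subnet ID format: %q", id)
	}
	return subnetRef{
		SubscriptionID: m[1],
		ResourceGroup:  m[2],
		VNetName:       m[3],
		SubnetName:     m[4],
	}, nil
}
```

Compiling the pattern at package level avoids recompiling it for every node interface, and checking the submatch length against a named constant keeps the validation explicit.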
Member
Hi, we don't accept patches like this directly into v1.16 branch at this time. Thanks.
yuecong added a commit to yuecong/cilium that referenced this pull request Oct 17, 2025
…subnet queries to eliminate NRP throttling

This PR implements a targeted subnet discovery optimization for Azure IPAM that eliminates Azure NRP throttling by replacing subscription-wide VNet enumeration with targeted subnet queries. This PR is the same fix as cilium#41554 but ported to use Azure SDK v2.

Key Changes:

THREE-PHASE SUBNET DISCOVERY STRATEGY:
1. Query all node instances (existing method)
2. Extract unique subnet IDs from node network interfaces
3. Query only the specific subnets that nodes actually use

AZURE SDK V2 MIGRATION:
- Updated from SDK v1 to SDK v2 with armnetwork clients
- Added SubnetsClient for targeted subnet queries
- Implemented Pager pattern for pagination
- Added parseSubnetID function with regex validation

PERFORMANCE IMPROVEMENTS:
- Reduces API calls from O(n*m) to O(k) where k = unique subnets used by nodes
- Eliminates subscription-wide VNet enumeration that causes throttling
- Maintains backward compatibility with fallback to full VNet discovery

ALIGNED FEATURES WITH PR cilium#41554:
- Pre-compiled regex pattern with expectedCaptureGroups constant
- getSubnetWithPagination method for accurate IP configuration counting
- Enhanced parseSubnetID function with proper error handling
- Subscription ID tracking in Client struct
- Proper logrus-based logging in GetNodesSubnets
- Three-phase subnet discovery optimization

TEST RESULTS:
- All API tests pass including new parseSubnetID validation tests
- Azure IPAM functionality preserved with enhanced logging
- Existing IPAM allocation tests demonstrate continued functionality

BREAKING CHANGES: None - maintains full backward compatibility with existing IPAM behavior.

PERFORMANCE IMPACT: Significantly improved performance in large Azure environments with many VNets while maintaining the same IPAM functionality. The optimization shows "targeted_subnets" count in logs for visibility.

Signed-off-by: Cong Yue <yuecong1104@gmail.com>
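The three-phase flow described in this commit message could be sketched roughly as follows. Everything here is hypothetical scaffolding for illustration: the `instance`, `subnet`, and `subnetGetter` types, the `discoverSubnets` function, and the in-memory `fakeAPI` are not taken from the PR, and the sketch omits the pagination, logging, and fallback behavior the commit mentions.

```go
package main

import (
	"context"
	"fmt"
)

// instance and subnet are hypothetical, simplified stand-ins for the
// IPAM types used in the real implementation.
type instance struct {
	ID        string
	SubnetIDs []string // subnet IDs referenced by the node's interfaces
}

type subnet struct {
	ID string
}

// subnetGetter is an assumed abstraction over a per-subnet lookup call.
type subnetGetter interface {
	GetSubnet(ctx context.Context, subnetID string) (subnet, error)
}

// discoverSubnets takes the instances from phase 1, collects the unique
// subnet IDs they reference (phase 2), and fetches only those (phase 3).
func discoverSubnets(ctx context.Context, instances []instance, api subnetGetter) (map[string]subnet, error) {
	unique := make(map[string]struct{})
	for _, inst := range instances {
		for _, id := range inst.SubnetIDs {
			unique[id] = struct{}{}
		}
	}

	subnets := make(map[string]subnet, len(unique))
	for id := range unique {
		s, err := api.GetSubnet(ctx, id)
		if err != nil {
			return nil, fmt.Errorf("get subnet %s: %w", id, err)
		}
		subnets[id] = s
	}
	return subnets, nil
}

// fakeAPI is an in-memory fake that counts lookups, for illustration only.
type fakeAPI struct {
	calls int
}

func (f *fakeAPI) GetSubnet(_ context.Context, subnetID string) (subnet, error) {
	f.calls++
	return subnet{ID: subnetID}, nil
}
```

In this sketch the number of lookups equals the number of unique subnet IDs referenced by the nodes, which is the O(k) behavior the commit message claims, regardless of how many VNets exist in the subscription.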
Summary
Backport to v1.16: This PR optimizes Azure IPAM by replacing subscription-wide VNet enumeration with targeted subnet queries to eliminate Azure Network Resource Provider (NRP) throttling and significantly improve performance in large Azure environments.
Problem Statement
The current Azure IPAM implementation in v1.16 enumerates all VNets in a subscription, which causes:
Solution
Implements a three-phase strategy for efficient subnet discovery:
Key Changes
- Replace GetVpcsAndSubnets() with GetNodesSubnets() for targeted subnet queries
- Add parseSubnetID() with regex validation for subnet ID parsing
- Add extractSubnetIDs() with deduplication for efficient subnet discovery
Performance Impact
Before (Subscription-wide enumeration)
After (Targeted subnet queries)
Real-world Impact
In production environments with hundreds of VNets but only using a few subnets for Kubernetes:
Testing
- Unit tests for parseSubnetID() with 8 test cases covering valid/invalid formats
- TestExtractSubnetIDs() validating deduplication (100 instances → 2 unique subnet IDs)
- TestSubnetDiscovery() ensuring only referenced subnets are queried
Why v1.16?
This PR targets the v1.16 branch because:
Checklist
Related Issues
This addresses common Azure IPAM issues in v1.16 deployments:
Please ensure your pull request adheres to the following guidelines:
description and a Fixes: #XXX line if the commit addresses a particular GitHub issue.