Azure IPAM: Replace subscription-wide VNet enumeration with targeted subnet queries to eliminate NRP throttling#41555
Conversation
|
Commit addd599 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
|
Commits addd599, 6ecbdd0 do not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
…xcept Azure SDK v2 This change ensures perfect alignment between PR cilium#41555 (main branch) and PR cilium#41554 (v1.16 branch): - Fixed GetNodesSubnets signature to return only SubnetMap (not VNetMap + SubnetMap) - Updated extractSubnetIDs to use memory-efficient map[string]struct{} instead of map[string]bool - Aligned error handling in resyncInstances to continue with empty subnets on GetNodesSubnets failure - Added missing TestExtractSubnetIDs test for deduplication validation - Maintained three-phase subnet discovery optimization identical to PR cilium#41554 The only differences between the PRs are now Azure SDK version upgrades (v1 → v2). 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Commits addd599, 6ecbdd0, 172f4ce do not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
Final alignment between PR cilium#41555 (main branch) and PR cilium#41554 (v1.16 branch): ALIGNED FEATURES: - Pre-compiled regex pattern with expectedCaptureGroups constant - getSubnetWithPagination method for accurate IP configuration counting - Enhanced parseSubnetID function with proper error handling - Subscription ID tracking in Client struct - Proper logrus-based logging in GetNodesSubnets - Three-phase subnet discovery optimization identical to PR cilium#41554 SDK V2 ADAPTATIONS: - Uses armnetwork.SubnetsClient instead of network.SubnetsClient - Simplified pagination logic using SDK v2 built-in capabilities - Updated ARM response handling for result.Subnet access - Removed SDK v1-specific Azure API version detection (not applicable) VERIFICATION: - TestSubnetDiscovery: ✅ PASS - TestExtractSubnetIDs: ✅ PASS - TestParseSubnetID: ✅ PASS - Azure API mock tests: ✅ PASS Both PRs now implement identical optimization logic while using appropriate Azure SDK versions. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Commits addd599, 6ecbdd0, 172f4ce, 0b4a995 do not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
|
Thanks for the submission. Apologies for the delay in review, but at this time we cannot accept the PR due to provenance concerns. I think it is in the interests of the project to improve this area of the codebase, but we'll need to figure out the right approach. I'll mark the PR as draft for now. |
|
@yuecong Thanks for your PR. I would like to understand a bit more context on this PR. Please feel free to reach out to me over slack directly, we can discuss a bit more on this. Also fix your commits as its missing signed signature |
0b4a995 to
7b739d4
Compare
|
Talked with @yuecong offline. Overall intention lgtm as it tries to avoid expensive listall virtual networks in a subscription and iterate through all subnets. Instead his PR tries to get subnet info for subnet id already retrieved from getInstances api call. Few things to make sure if this is supported construct from azure and azure sdk will populate subnetID in getInstance API in all cases.. cc @hemanthmalla |
|
Commit e703a97 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
e703a97 to
b35d525
Compare
b35d525 to
d60623b
Compare
b1258c1 to
6fce46d
Compare
6fce46d to
50f277c
Compare
41fc38a to
ca53852
Compare
could you help run /test again |
ca53852 to
bc253a8
Compare
bc253a8 to
5c6ec23
Compare
pippolo84
left a comment
There was a problem hiding this comment.
Overall LGTM, thanks!
Left a couple of nits and a question inline.
|
/test |
The CI failures are unrelated to the Azure IPAM changes in this PR:
Neither error touches Azure IPAM code ( |
5c6ec23 to
65a29d7
Compare
|
/test |
|
@yuecong looks like you need to run |
65a29d7 to
2bbdc2e
Compare
This PR implements a targeted subnet discovery optimization for Azure IPAM that eliminates Azure NRP throttling by replacing subscription-wide VNet enumeration with targeted subnet queries. Key Changes: - Three-phase subnet discovery strategy: 1. Query all node instances (existing method) 2. Extract unique subnet IDs from node network interfaces 3. Query only the specific subnets that nodes actually use - Performance improvements: - Reduces API calls from O(n*m) to O(k) where k = unique subnets used by nodes - Eliminates subscription-wide VNet enumeration that causes throttling - Maintains backward compatibility with fallback to full VNet discovery Test Results: - All API tests pass including new parseSubnetID validation tests - Azure IPAM functionality preserved with enhanced logging - Existing IPAM allocation tests demonstrate continued functionality Breaking Changes: None - maintains full backward compatibility with existing IPAM behavior. Performance Impact: Significantly improved performance in large Azure environments with many VNets while maintaining the same IPAM functionality. The optimization shows "targeted_subnets" count in logs for visibility. Test plan: - Unit tests for parseSubnetID function with 8 validation scenarios - Azure API tests pass - Mock implementation updated for GetNodesSubnets - IPAM allocation tests demonstrate preserved functionality I deployed this change to the main branch to our internal clusters and confirmed that this works. Also for the 1.16 version, we have used it in production for more than 2 months in ~100 clusters Signed-off-by: yuecong <cong@databricks.com>
2bbdc2e to
712e904
Compare
|
/test |
|
(I'll prime the PR for merging, but there's still an unresolved comment somewhere. Please make sure that it's addressed) |
@tamilmani1989 could you check the 2 comments not resolved if that lgty? if so, can you please resolve those? thanks |
|
The Failed Test
can someone help trigger to rerun this test maybe it is just flaky |
Summary
This PR implements a targeted subnet discovery optimization for Azure IPAM that eliminates Azure NRP throttling by replacing subscription-wide VNet enumeration with targeted subnet queries.
Key Changes
Three-phase subnet discovery strategy:
Performance improvements:
Test Results
Breaking Changes
None - maintains full backward compatibility with existing IPAM behavior.
Performance Impact
Significantly improved performance in large Azure environments with many VNets while maintaining the same IPAM functionality. The optimization shows "targeted_subnets" count in logs for visibility.
Test plan
I deployed this change to the main branch to our internal clusters and confirmed that this works. Also for the 1.16 version, we have used it in production for more than 2 months in ~100 clusters