Skip to content

Dynamic BGP peer update and delete support#1963

Merged
prsunny merged 9 commits intosonic-net:masterfrom
anish-n:dynPeerUpdateHld
Oct 23, 2025
Merged

Dynamic BGP peer update and delete support#1963
prsunny merged 9 commits intosonic-net:masterfrom
anish-n:dynPeerUpdateHld

Conversation

@anish-n
Copy link
Contributor

@anish-n anish-n commented Apr 12, 2025

This PR contains the HLD for Dynamic BGP Peer modification, specifically it lists the design for the following:

  • Making dynamic peers' listen ranges modifiable at runtime such that ranges can be added and removed(changing today's behavior of being a create only attribute)
  • Modifying configurations like route-maps, peer-groups, prefix-lists when dynamic peers are updated and by rendering BGP/FRR template files
  • Ability to track the configuration of a BGP peer via a State DB entry so that it is query-able by an SDN controller which programs the table
  • CLI commands to show VRF/VNET BGP neighbors

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

### New changes:
An update.conf.j2 and delete.conf.j2 will be defined in the same folder path structure as above, and similar branch out logic like the existing j2 files can be used with update.conf.j2 and delete.conf.j2. The update.conf.j2 and delete.conf.j2 will host the required logic to not only modify and delete the peers respectively, but also modify/delete associated configurations like route maps, peer groups, prefix lists etc. For the the purposes of this design we aim to simplify and only have a single j2 for update and delete each, which implies that update.j2 and delete.j2 will contain configuration that spans across all 3 types of config: instance, peer-group and policies. The alternate approach we considered was defining an update/delete version for each type of config, thereby defining multiple templates as follows: instance.update.conf.j2, policies.update.conf.j2, peer-group.update.conf.j2, instances.delete.conf.j2, policies.delete.conf.j2, peer-group.delete.conf.j2. We preferred the former approach of a single update.conf.j2 and delete.conf.j2 because of the simplicity it brings, moving to the latter model can be taken up in the future if use-cases for update and delete expand beyond and it becomes important to have more segregation of configuration across instance, policies and peer groups for the newly added update and delete operations.

# 3 Test Plan
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is some issue or limitation in new feature, is it possible to disable it at runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to remove the template file, upon removing the template file associated with update/delete, the code will not be activated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a config flag to disable this feature?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this capability is already controlled by update.j2 and delete.j2, if not present, will retain the existing behavior. Prefer not to add extra knobs. If a config-db approach is preferred, then we should skip the template approach altogether. I think this can be added as phase 2.

StormLiangMS pushed a commit to sonic-net/sonic-buildimage that referenced this pull request May 10, 2025
Why I did it
This change is to support enhanced CRUD operations on dynamic bgp peers using a standard template-based mechanism.
Here is the High level design document explaining this feature:
sonic-net/SONiC#1963

Work item tracking
Microsoft ADO (number only):
How I did it
Upon a delete peer ocurring, we render the delete template(instead of executing the current behavior of no neighbor {{ neighbor addr}}), on the other hand if a delete template is not defined then the default behavior of no neighbor {{ neighbor addr}} applies as usual, thereby making this change backward compatible.

How to verify it
Create a dynamic peer like so in config_db.json:

"BGP_NEIGHBOR": {
"10.0.0.62": {
"admin_status": "up",
"asn": "65100",
"holdtime": "10",
"keepalive": "3",
"local_addr": "10.0.0.63",
"name": "vlab-01",
"nhopself": "0",
"rrclient": "0"
}
}

Once the peer is up, delete the neighbor (eg. config bgp neighbor remove ). Verify that the neighbor is deleted.
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

## 2.3 CLI
The following CLIs will be added, each of these will have similar CLI outputs to their ```default VRF``` counterparts which are widely in use today:
```
1. show ip bgp vrf {vrf_name} summary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iso of adding new command can we add an option existing command show ip bgp summary --vrf <vrf_name> this way existing command show ip bgp summary will work for default VRF and can user can pass the vrf_name to get output per vrf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh the existing command show ip bgp summary will still work for default vrf, intention is to have all existing commands work as-is, and support the extra vrf option in a way that it matches vtysh. Doing it this way will match exactly how the command is structured in vtysh, making it helpful for everyone who is used to that structure of cli.

### New changes:
An update.conf.j2 and delete.conf.j2 will be defined in the same folder path structure as above, and similar branch out logic like the existing j2 files can be used with update.conf.j2 and delete.conf.j2. The update.conf.j2 and delete.conf.j2 will host the required logic to not only modify and delete the peers respectively, but also modify/delete associated configurations like route maps, peer groups, prefix lists etc. For the the purposes of this design we aim to simplify and only have a single j2 for update and delete each, which implies that update.j2 and delete.j2 will contain configuration that spans across all 3 types of config: instance, peer-group and policies. The alternate approach we considered was defining an update/delete version for each type of config, thereby defining multiple templates as follows: instance.update.conf.j2, policies.update.conf.j2, peer-group.update.conf.j2, instances.delete.conf.j2, policies.delete.conf.j2, peer-group.delete.conf.j2. We preferred the former approach of a single update.conf.j2 and delete.conf.j2 because of the simplicity it brings, moving to the latter model can be taken up in the future if use-cases for update and delete expand beyond and it becomes important to have more segregation of configuration across instance, policies and peer groups for the newly added update and delete operations.

# 3 Test Plan
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a config flag to disable this feature?

```
self.templates["update"] = self.fabric.from_file(update_template_path)
```
3. If an update template is supported and the peer_type is dynamic, we will identify the set of ip_ranges which are added, and those which are deleted as part of the operation, and pass in those as additional kwargs and render the update template, sample code:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will identify the set of ip_ranges which are added, and those which are deleted as part of the operation

Is this done in bgpcfgd? how is this done? Can you add more details?

Copy link
Contributor Author

@anish-n anish-n May 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is done in bgpcfgd, it is done by querying vtysh to find the current set of ranges, and diff-ing that against the newly configured set of ranges as received from config_db, the added/deleted ranges are then passed into an update.j2 template to render the required config changes. PR: sonic-net/sonic-buildimage#22261

I will update the HLD PR to add some more details on how it is done in bgpcfgd

prsunny
prsunny previously approved these changes May 27, 2025
@prsunny
Copy link
Contributor

prsunny commented Jun 2, 2025

@StormLiangMS , @arlakshm , would you approve this?

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

###### Table 2: Scalability
| Component | Expected value |
|--------------------------|-----------------------------|
| Number of Dynamic BGP Peers each with route-map, prefix-list and peer-group|2k|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anish-n Could you clarify what the "2K" refers to in this context? A scale of 2,000 is significantly larger compared to 64 neighbors and could potentially lead to performance issues or even system crashes. I recommend explicitly defining what the 2K and 4K entries represent in the table to avoid confusion.

Updated scalability requirements for dynamic BGP peers and listen ranges.
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

# 3 Test Plan
Bgpcfgd, CLI and template changes will have corresponding UTs.

New SONiC-mgmt tests will be introduced to test the following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @anish-n Do we have a more detailed test plan for this feature?

  1. Scale testing – 2K peers is non-trivial, especially considering the scaled device.
  2. Stress testing – including reloads and reboots.
  3. Dynamic update/delete scenarios – what operations are supported? Could we add one section in this doc to explicitly describe what is supported and what is not? For example:
    a. Updating the IP range of an existing dynamic peer.
    b. Adding a new dynamic peer with route-map/prefix-list configurations.
    c. Deleting an existing dynamic peer.
    d. Updating the route-map/prefix-list of an existing dynamic peer.

And could you provide an update on the progress of this feature and clarify which release it is planned for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@StormLiangMS,
Added high level tests which will be added for scale and stress test. The intent of this HLD is supporting dynamic bgp peer changes, detailed stress test plan can be added to sonic-mgmt if required.

In terms of operations supported, this list is present in the requirements section, from config db perspective there is no change in operations beyond allowing a change in ip_ranges passed in. We handle the change in ip range and then FRR templates will render associated change in route map/prefix list as present in template, exactly like how the add peer/delete peers function today, so operationally it would be identical.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

StormLiangMS
StormLiangMS previously approved these changes Oct 19, 2025
Copy link
Contributor

@StormLiangMS StormLiangMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@StormLiangMS
Copy link
Contributor

hi @prsunny @arlakshm @lipxu would you approve this one?

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

Copy link
Contributor

@prsunny prsunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved by Storm

@prsunny prsunny merged commit 47cfaf3 into sonic-net:master Oct 23, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants