Skip to content

[FRR]: Upgrade FRR to version 10.3#22267

Merged
lguohan merged 25 commits intosonic-net:masterfrom
cscarpitta:frr_10.3_upgrade
May 8, 2025
Merged

[FRR]: Upgrade FRR to version 10.3#22267
lguohan merged 25 commits intosonic-net:masterfrom
cscarpitta:frr_10.3_upgrade

Conversation

@cscarpitta
Copy link
Copy Markdown
Contributor

@cscarpitta cscarpitta commented Apr 8, 2025

New patches that were added:
Patch FRR Pull request
0086-isisd-lib-add-some-codepoints-usually-shared-with-other-vendors.patch FRRouting/frr#17957
0087-staticd-Add-support-for-SRv6-uA-behavior.patch FRRouting/frr#18198
Removed patches:
Patch FRR commit / Pull request
0025-bgp-community-memory-leak-fix.patch FRRouting/frr@e613e12
0028-zebra-fix-parse-attr-problems-for-encap.patch FRRouting/frr@ba5a353 FRRouting/frr@569f9e4 FRRouting/frr@bd4fca1
0030-zebra-backpressure-Zebra-push-back-on-Buffer-Stream-.patch FRRouting/frr@a8efa99
0031-bgpd-backpressure-Add-a-typesafe-list-for-Zebra-Anno.patch FRRouting/frr@705fed7
0033-bgpd-backpressure-cleanup-bgp_zebra_XX-func-args.patch FRRouting/frr@5f379be
0034-gpd-backpressure-Handle-BGP-Zebra-Install-evt-Creat.patch FRRouting/frr@ccfe452
0035-bgpd-backpressure-Handle-BGP-Zebra-EPVN-Install-evt-.patch FRRouting/frr@a07df6f
0036-zebra-backpressure-Fix-Null-ptr-access-Coverity-Issu.patch FRRouting/frr@ed7005d
0037-bgpd-Increase-install-uninstall-speed-of-evpn-vpn-vn.patch FRRouting/frr@9edf45b
0038-zebra-Actually-display-I-O-buffer-sizes.patch FRRouting/frr@8d8f12b
0039-zebra-Actually-display-I-O-buffer-sizes-part-2.patch FRRouting/frr@33dccbe
0040-bgpd-backpressure-Fix-to-withdraw-evpn-type-5-routes.patch FRRouting/frr@f4ba472
0041-bgpd-backpressure-Fix-to-avoid-CPU-hog.patch FRRouting/frr@920bf45
0042-zebra-Use-built-in-data-structure-counter.patch FRRouting/frr@a23a938
0043-zebra-Use-the-ctx-queue-counters.patch FRRouting/frr@34670c4
0044-zebra-Modify-dplane-loop-to-allow-backpressure-to-fi.patch FRRouting/frr@3af381b
0045-zebra-Limit-queue-depth-in-dplane_fpm_nl.patch FRRouting/frr@8926ac1
0046-zebra-Modify-show-zebra-dplane-providers-to-give-mor.patch FRRouting/frr@98b11de
0047-bgpd-backpressure-fix-evpn-route-sync-to-zebra.patch FRRouting/frr@b47a92e
0048-bgpd-backpressure-fix-to-properly-remove-dest-for-bg.patch FRRouting/frr@4395fcd
0049-bgpd-backpressure-Improve-debuggability.patch FRRouting/frr@186db96
0050-bgpd-backpressure-Avoid-use-after-free.patch FRRouting/frr@40965e5
0051-bgpd-backpressure-fix-ret-value-evpn_route_select_in.patch FRRouting/frr@c4bbb5b
0052-bgpd-backpressure-log-error-for-evpn-when-route-inst.patch FRRouting/frr@6cf5b79
0055-bgpd-lib-Include-SID-structure-in-seg6local-nexthop.patch FRRouting/frr@0402551
0059-Fix-BGP-reset-on-suppress-fib-pending-configuration.patch FRRouting/frr#17487
0060-bgpd-Validate-both-nexthop-information-NEXTHOP-and-N.patch FRRouting/frr@a0d2734
0061-dont-print-warning-if-not-a-daemon.patch FRRouting/frr@cecf571
0062-zebra-lib-use-internal-rbtree-per-ns.patch FRRouting/frr#17297
0064-SRv6-BGP-SID-reachability.patch FRRouting/frr#14810
0065-zebra-display-srv6-encapsulation-source-address-when-configured.patch FRRouting/frr@890b67d
0066-lib-fix-srv6-locator-flags-propagated-to-isis.patch FRRouting/frr@03d2ad0
0067-Add-support-for-SRv6-SID-Manager.patch FRRouting/frr#15604
0068-bgpd-Extend-BGP-to-communicate-with-the-SRv6-SID-Manager-to-allocate-release-SRv6-SIDs.patch FRRouting/frr#15676
0069-lib-nexthop-code-should-use-uint16_t-for-nexthop-cou.patch FRRouting/frr@0bc79f5
0070-Allow-16-bit-size-for-nexthops.patch FRRouting/frr@9f8968f
0071-zebra-Only-notify-dplane-work-pthread-when-needed.patch FRRouting/frr#17062
0072-Fix-up-improper-handling-of-nexthops-for-nexthop-tra.patch FRRouting/frr#17076
0073-remove-in6addr-cmp.patch FRRouting/frr#17312
0074-bgp-best-port-reordering.patch FRRouting/frr#15572
0075-bgp-mp-info-changes.patch FRRouting/frr#16961
0076-Optimizations-and-problem-fixing-for-large-scale-ecmp-from-bgp.patch FRRouting/frr#17229
0077-frr-vtysh-dependencies-for-srv6-static-patches.patch FRRouting/frr@fd8edc3
0078-vtysh-de-conditionalize-and-reorder-install-node.patch FRRouting/frr@e26c580
0079-staticd-add-support-for-srv6.patch FRRouting/frr#16894
0081-bgpd-Optimize-evaluate-paths-for-a-peer-going-down.patch FRRouting/frr@9f55368
Realigned patches:
Patch
0001-Reduce-severity-of-Vty-connected-from-message.patch
0002-Allow-BGP-attr-NEXT_HOP-to-be-0.0.0.0-due-to-allevia.patch
0003-nexthops-compare-vrf-only-if-ip-type.patch
0004-frr-remove-frr-log-outchannel-to-var-log-frr.log.patch
0005-Add-support-of-bgp-l3vni-evpn.patch
0006-Link-local-scope-was-not-set-while-binding-socket-for-bgp-ipv6-link-local-neighbors.patch
0007-ignore-route-from-default-table.patch
0008-Use-vrf_id-for-vrf-not-tabled_id.patch
0010-bgpd-Change-log-level-for-graceful-restart-events.patch
0021-Disable-ipv6-src-address-test-in-pceplib.patch
0022-cross-compile-changes.patch
0054-build-dplane-fpm-sonic-module.patch
0056-zebra-do-not-send-local-routes-to-fpm.patch
0057-Adding-changes-to-write-ip-nht-resolve-via-default-c.patch
0058-When-the-file-is-config-replayed-we-cannot-handle-th.patch
0061-Set-multipath-to-514-and-disable-bgp-vnc-for-optimiz.patch
0063-Patch-to-send-tag-value-associated-with-route-via-ne.patch
0080-SRv6-vpn-route-and-sidlist-install.patch
0082-Revert-bgpd-upon-if-event-evaluate-bnc-with-matching.patch
0083-staticd-add-cli-to-support-steering-of-ipv4-traffic-over-srv6-sid-list.patch
0084-lib-Return-duplicate-prefix-list-entry-test.patch
0085-This-error-happens-when-we-try-to-write-to-a-socket.patch

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@ahsalam
Copy link
Copy Markdown

ahsalam commented Apr 8, 2025

@qiluo-msft qiluo-msft requested a review from StormLiangMS April 8, 2025 22:56
@BYGX-wcr
Copy link
Copy Markdown
Contributor

BYGX-wcr commented Apr 9, 2025

@cscarpitta, @ahsalam, the image with the FRR upgrade cannot pass the sanity check and test_pretest.py which seems to be caused by BGP container down. Please check locally.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

…group”.

Assume we have a BGP peer group called `BGPSLBPassive`.
This peer group is attached to a listen range `192.168.0.0/21`.

CONFIG_DB configuration:

> $ sonic-db-cli CONFIG_DB hgetall "BGP_PEER_RANGE|BGPSLBPassive"
> {'ip_range@': '192.168.0.0/21', 'name': 'BGPSLBPassive', 'src_address': '10.1.0.32'}

FRR configuration:

> ...
> router bgp 65100
>  neighbor BGPSLBPassive peer-group
>  bgp listen range 192.168.0.0/21 peer-group BGPSLBPassive
> ...

Now, we delete the peer group `BGPSLBPassive` from CONFIG_DB:

> $ sonic-db-cli CONFIG_DB del BGP_PEER_RANGE|BGPSLBPassive

In response to this, bgpcfgd tries to run the following vtysh command
to delete the peer group from the FRR configuration:

> vtysh -c "no neighbor BGPSLBPassive peer-group"

But the above vtysh command returns an error:

> %%Peer-group BGPSLBPassive is attached to 1 listen-range(s), delete them first

The problem is due to a change in recent versions of FRR.

Starting with FRR 10.1, if a peer group is attached to a "listen range",
the range must be removed before the peer group can be deleted.

In order to fix this issue, this commit does a change in bgpcfgd.

When bgpcfgd has to delete a peer group, it first runs the command
"no bgp listen range ..." to remove the "listen range" associated with
the peer group, and only then proceed with deleting the peer group.

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@cscarpitta
Copy link
Copy Markdown
Contributor Author

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Commenter does not have sufficient privileges for PR 22267 in repo sonic-net/sonic-buildimage

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@cscarpitta
Copy link
Copy Markdown
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Copy Markdown
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@cscarpitta
Copy link
Copy Markdown
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Copy Markdown
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@eddieruan-alibaba
Copy link
Copy Markdown
Collaborator

@StormLiangMS, @liushilongbuaa saw the following message in the error . Can you help to rerun it?

Conflict already exists in master
Please wait a few hours to run ms_conflict again!
'/azpw ms_conflict'

@liushilongbuaa
Copy link
Copy Markdown
Contributor

/azpw ms_conflict

@cscarpitta
Copy link
Copy Markdown
Contributor Author

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Commenter does not have sufficient privileges for PR 22267 in repo sonic-net/sonic-buildimage

@cscarpitta
Copy link
Copy Markdown
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Copy Markdown
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@cscarpitta
Copy link
Copy Markdown
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Copy Markdown
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

@BYGX-wcr BYGX-wcr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lguohan lguohan merged commit dbe20fd into sonic-net:master May 8, 2025
19 checks passed
@liuh-80
Copy link
Copy Markdown
Contributor

liuh-80 commented May 12, 2025

@cscarpitta , all PR validation in https://github.com/sonic-net/sonic-swss/pulls failed because this change.

Here is how I validate it:

  1. Checkout latest sonic-buildimage master branch code, build docker-sonic-vs with "make target/docker-sonic-vs.gz" command
  2. Checkout latest https://github.com/sonic-net/sonic-swss code, setup test with https://github.com/sonic-net/sonic-swss/blob/master/tests/README.md
  3. Load docker-sonic-vs.gz in step1 with 'docker load < ./target/docker-sonic-vs.gz' command
  4. Run a failed test case, for example: 'sudo py.test -v --force-flaky --force-recreate-dvs --imgname=docker-sonic-vs:latest test_srv6.py::TestSrv6MySidFpmsyncd::test_Srv6MySidUNTunnelDscpMode'
  5. Check failed test:
    def test_Srv6MySidUNTunnelDscpMode(self, dvs, testlog):
......
        # Create MySID entry with dscp_mode pipe
        self.add_mysid_vtysh(dvs, "loc2", mysid2)
    
>       self.pdb.wait_for_entry("SRV6_MY_SID_TABLE", f"32:16:0:0:{mysid2}")
  1. Revert this PR by 'git revert dbe20fd', there is a merge conflict, fix it and commit revert change.
  2. Remove some package to make it rebuild: 'rm target/debs/bookworm/frr_10.3-sonic-0_amd64.deb' and 'rm target/docker-sonic-vs.gz'
  3. Build docker-sonic-vs again, then import the new docker image and run tests again, the failed test case passed.
  4. Enable swss debug log with following change in test_srv6.py and compare success and failed test log, I found following log are missing when test case failed:
    def setup_srv6(self, dvs):
        self.setup_db(dvs)
        dvs.runcmd(f"swssloglevel -d")
        dvs.runcmd(f"swssloglevel -l DEBUG -a") <== enable debug log

Missing log:

May 11 09:35:51.710186 5a14d7a0ce5f INFO #fpmsyncd: :- onSrv6MySidMsg: Rx MsgType:1000 SidValue:fd00:0:1::
May 11 09:35:51.710202 5a14d7a0ce5f INFO #fpmsyncd: :- parseSrv6MySidFormat: Rx Srv6 MySid block_len:32 node_len:16 func_len:0 arg_len:0
May 11 09:35:51.710208 5a14d7a0ce5f INFO #fpmsyncd: :- parseSrv6MySid: Rx block_len:32 node_len:16 func_len:0 arg_len:0 action:un vrf: adj:

Above just 1 example, there are many test cases failed.

@cscarpitta
Copy link
Copy Markdown
Contributor Author

Hi @liuh-80 Thanks for reporting the issue.

I raised two PRs to fix the test failures:

if (CHECK_FLAG(attr->flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP)))
- nh_invalid = (attr->nexthop.s_addr == INADDR_ANY ||
- !ipv4_unicast_valid(&attr->nexthop) ||
+ if (!ipv4_unicast_valid(&attr->nexthop) ||
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in FRRouting 10.0 or 8.5, the code is
if (attr->nexthop.s_addr == INADDR_ANY || !ipv4_unicast_valid(&attr->nexthop) || bgp_nexthop_self(bgp, afi, type, stype, attr, dest)) return true;
in Frrouting 10.3, the code is updated to
if (CHECK_FLAG(attr->flag, ATTR_FLAG_BIT(BGP_ATTR_NEXT_HOP))) nh_invalid = (attr->nexthop.s_addr == INADDR_ANY || !ipv4_unicast_valid(&attr->nexthop) || bgp_nexthop_self(bgp, afi, type, stype, attr, dest));
so the patch for Frr10.3 should be like this
nh_invalid = (!ipv4_unicast_valid(&attr->nexthop) || bgp_nexthop_self(bgp, afi, type, stype, attr, dest));
not be like this:
if (!ipv4_unicast_valid(&attr->nexthop) || bgp_nexthop_self(bgp, afi, type, stype, attr, dest));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cscarpitta Hi, could you please check this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ouxlwhu
Thanks for reporting this.

I opened a PR to fix the issue: #22910

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants