
BFD support #4852

Merged
rcgoodfellow merged 9 commits into main from bfd
Jan 31, 2024
rcgoodfellow commented Jan 19, 2024
Contributor

This is a staging PR and should most likely be pulled into

Here we simply pass BFD commands through to the underlying mgd daemons on the switches. No attempt is made to add BFD to the database schema or to persist BFD information, as that would likely conflict with #4822.

The purpose of this PR is to set up the scaffolding and API interfaces for BFD to work end-to-end, and to do some interim testing without the benefit of persistence.

Depends on

rcgoodfellow commented Jan 19, 2024
Contributor Author

I've discussed this PR with @internet-diglett. We'd like to get this reviewed and merged into main in its partially complete form (sans db persistence); he'll then pick it up in #4822. This avoids building db persistence for BFD that would just have to be torn down and redone in #4822.

This PR now comes with database plumbing and an RPW that manages BFD on the rack switches.

@rcgoodfellow

Contributor Author

Testing notes. On a4x2 I am able to test this as follows.

Set up BFD via Omicron API

oxide system networking bfd enable --detection-threshold 3 --mode single_hop --remote 198.51.101.1 --required-rx 1000000 --switch switch0
oxide system networking bfd enable --detection-threshold 3 --mode single_hop --remote 198.51.101.9 --required-rx 1000000 --switch switch0
oxide system networking bfd enable --detection-threshold 3 --mode single_hop --remote 198.51.101.13 --required-rx 1000000 --switch switch1
oxide system networking bfd enable --detection-threshold 3 --mode single_hop --remote 198.51.101.5 --required-rx 1000000 --switch switch1
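For a rough sense of what these parameters imply, here is a minimal sketch of the resulting detection time. It assumes `--required-rx` is given in microseconds (as intervals are in RFC 5880) and that both ends are configured symmetrically, so the detection time is simply the threshold times the receive interval. The function name `detection_time` is hypothetical and not part of this PR.

```rust
use std::time::Duration;

/// Approximate time a session may go without receiving a control packet
/// before it is declared down.
///
/// Assumption: `required_rx_micros` is in microseconds and the peer uses the
/// same threshold and interval. Real BFD negotiates the interval with the peer.
fn detection_time(detection_threshold: u8, required_rx_micros: u64) -> Duration {
    // saturating_mul avoids overflow on absurdly large intervals.
    Duration::from_micros(u64::from(detection_threshold).saturating_mul(required_rx_micros))
}
```

With the values used above (threshold 3, interval 1000000), this works out to three seconds under those assumptions.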

Query BFD

oxide system networking bfd status
success
[
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.1,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.9,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.13,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.5,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
]

Testing BFD link detection

From the host machine running the falcon topology

pfexec dladm set-linkprop a4x2_g3_sn_vnic7 -p maxbw=0

Note that one BFD session (peer 198.51.101.5 on switch1) is now down

oxide system networking bfd status
success
[
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.1,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.9,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.13,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.5,
        required_rx: 1000000,
        state: Down,
        switch: Name(
            "switch1",
        ),
    },
]
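When checking output like this by hand gets tedious, the sessions that are not up can be picked out programmatically. The following is only a sketch: `BfdState` and `BfdStatus` here are simplified, hypothetical stand-ins that mirror a few of the fields printed above, not the types from the Oxide SDK, and `sessions_not_up` is an illustrative helper.

```rust
use std::net::IpAddr;

/// Hypothetical mirror of the session states (the four states from RFC 5880).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BfdState {
    AdminDown,
    Down,
    Init,
    Up,
}

/// Hypothetical, trimmed-down version of the status entries shown above.
#[derive(Debug, Clone)]
struct BfdStatus {
    peer: IpAddr,
    state: BfdState,
    switch: String,
}

/// Return every session whose state is anything other than `Up`.
fn sessions_not_up(statuses: &[BfdStatus]) -> Vec<&BfdStatus> {
    statuses.iter().filter(|s| s.state != BfdState::Up).collect()
}
```

Applied to the four sessions in the output above, this would return only the entry for peer 198.51.101.5 on switch1.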

Restore the link

pfexec dladm reset-linkprop a4x2_g3_sn_vnic7 -p maxbw

All sessions should be back up

oxide system networking bfd status
success
[
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.1,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.9,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch0",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.13,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
    BfdStatus {
        detection_threshold: 3,
        local: Some(
            0.0.0.0,
        ),
        mode: SingleHop,
        peer: 198.51.101.5,
        required_rx: 1000000,
        state: Up,
        switch: Name(
            "switch1",
        ),
    },
]

internet-diglett left a comment
Contributor

Looks good to me, verified functionality in a4x2. Really cool seeing all of this come together!

impl From<BfdSession> for BfdSessionKey {
    fn from(value: BfdSession) -> Self {
        Self {
            switch: value.switch.parse().unwrap(), //TODO unwrap

Contributor

Are we handling this in a follow-up issue?
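One possible shape for such a follow-up, sketched under assumptions: replace the infallible `From` with `TryFrom` so a bad switch name surfaces as an error instead of a panic. The types below are simplified, hypothetical stand-ins (the real `BfdSession`, `BfdSessionKey`, and switch type in this PR have more fields and may differ), and `ParseSwitchError` is invented for illustration.

```rust
use std::convert::TryFrom;
use std::net::IpAddr;
use std::str::FromStr;

/// Hypothetical switch identifier; the real type in the codebase may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SwitchLocation {
    Switch0,
    Switch1,
}

/// Hypothetical error carrying the unrecognized switch name.
#[derive(Debug, PartialEq, Eq)]
struct ParseSwitchError(String);

impl FromStr for SwitchLocation {
    type Err = ParseSwitchError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "switch0" => Ok(SwitchLocation::Switch0),
            "switch1" => Ok(SwitchLocation::Switch1),
            other => Err(ParseSwitchError(other.to_string())),
        }
    }
}

/// Simplified stand-in for the database row type.
struct BfdSession {
    switch: String,
    remote: IpAddr,
}

/// Simplified stand-in for the session key.
#[derive(Debug, PartialEq, Eq, Hash)]
struct BfdSessionKey {
    switch: SwitchLocation,
    remote: IpAddr,
}

impl TryFrom<BfdSession> for BfdSessionKey {
    type Error = ParseSwitchError;

    fn try_from(value: BfdSession) -> Result<Self, Self::Error> {
        Ok(Self {
            // `?` propagates the parse failure instead of panicking.
            switch: value.switch.parse()?,
            remote: value.remote,
        })
    }
}
```

The trade-off is that every call site then has to handle the `Result`, which is the point: a malformed row is reported to the caller rather than taking down the process.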
