Skip to content

Silent path loss during RIB update for distinct peers with identical path attributes #649

@taspelund

Description

@taspelund

When multiple BGP unnumbered peers exist and have the same link-local IP, it is possible for insertion of a RIB path to overwrite an existing path instead of being inserted alongside it. This was noticed at a customer site where 6 BGP unnumbered peers were established and advertising us default routes (v4 and v6), but mgadm rib status imported showed only 1 or 2 paths instead of 6.

When digging into this further, I found that the issue is in the Ord impl for the rdb Path type.
The Ord impl looks like this:

impl Ord for Path {
    fn cmp(&self, other: &Self) -> Ordering {
        // Paths from the same source are considered equal for set membership.
        // This enables BTreeSet::replace() to update paths from the same source.
        //
        // BGP paths: identified by peer IP
        if let (Some(a), Some(b)) = (&self.bgp, &other.bgp)
            && a.peer == b.peer
        {
            return Ordering::Equal;
        }

        // Static paths: identified by (nexthop, nexthop_interface, vlan_id)
        if self.bgp.is_none()
            && other.bgp.is_none()
            && self.nexthop == other.nexthop
            && self.nexthop_interface == other.nexthop_interface
            && self.vlan_id == other.vlan_id
        {
            return Ordering::Equal;
        }
        if self.nexthop != other.nexthop {
            return self.nexthop.cmp(&other.nexthop);
        }
        if self.shutdown != other.shutdown {
            return self.shutdown.cmp(&other.shutdown);
        }
        if self.rib_priority != other.rib_priority {
            return self.rib_priority.cmp(&other.rib_priority);
        }
        self.bgp.cmp(&other.bgp)
    }
}

impl PartialOrd for BgpPathProperties {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for BgpPathProperties {
    fn cmp(&self, other: &Self) -> Ordering {
        if self.origin_as != other.origin_as {
            return self.origin_as.cmp(&other.origin_as);
        }
        if self.id != other.id {
            return self.id.cmp(&other.id);
        }
        // MED should *not* be used as a basis for comparison. Paths with
        // distinct MED values are not distinct paths.
        if self.as_path != other.as_path {
            return self.as_path.cmp(&other.as_path);
        }
        self.stale.cmp(&other.stale)
    }
}

and when BGP routes are inserted, we call .replace() on the BTreeSet to insert a path and overwrite it upon conflict.
The result is that a Path .replace() call will overwrite any other Paths anytime we have multiple BGP paths with the following properties:

DIFFERENT:

  • PeerId (unnumbered interface or peer ip)

SAME:

  • nexthop
  • shutdown (bool for BFD status, not used for BGP)
  • rib_priority (currently set to 20 for all BGP routes)
  • origin_as (peer's ASN)
  • id (peer's Router ID)
  • as_path
  • stale (Option wrapping timestamp for route refresh reaping)

This impacts all insertions into rib_in (the adj-rib-in which collects all valid paths regardless of selection for use in the data plane), which thus limits the paths seen by bestpath + installed into the rib_loc (the paths selected for installation into ddm/dpd).

Metadata

Metadata

Assignees

Labels

BugbgpBorder Gateway ProtocolcustomerFor any bug reports or feature requests tied to customer requestsmgdMaghemite daemonrelease notesreminder to include this in the release notes

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions