Skip to content

daemon: filter networks before converting to API types#50860

Merged
corhere merged 3 commits intomoby:masterfrom
corhere:network-filter-iface
Sep 4, 2025
Merged

daemon: filter networks before converting to API types#50860
corhere merged 3 commits intomoby:masterfrom
corhere:network-filter-iface

Conversation

@corhere
Copy link
Contributor

@corhere corhere commented Aug 30, 2025

Compared to those PRs, this one does not change the observable behaviour of the Engine API and refactors the code with fewer throwaway intermediate changes.

- What I did

  • Improved the code to filter a list of networks by a predicate specified in an Engine API request. It is simpler, more readable, more future-proof and more efficient.
    • Networks are filtered before they are converted to API types.
  • Replaced the buggy internal uses of the hidden idOrName term (which matched all networks irrespective of value, as if the term wasn't present) with an implementation that actually works as intended.
  • Moved network filter term validation from the api module to the daemon.

- How I did it
Internally a network is represented by either a libnetwork.Network or a swarmapi.Network. The daemon functions backing the API server map these values to the Engine API network.Inspect type on demand. Since they have to convert, the functions to get a list of networks have to loop over the slice of Networks and append them to a slice of network.Inspect values.

The function used to filter the list of networks by a user-supplied predicate takes a []network.Inspect and returns a shorter slice. Therefore the daemon functions backing the API server have to loop through the list twice: once to convert, and again inside the FilterNetworks function to delete networks from the slice which do not match the filter predicate. Each time an item is deleted from a slice, all items at higher indices need to be copied to lower indices in the backing array to close the hole.

Replace FilterNetworks with a function that accepts a single interface-valued network and returns a boolean. Amend libnetwork.Network and write a thin adapter for swarmapi.Network so both implement the aforementioned interface. The daemon functions can thus filter networks before projecting the values into API structs, and can completely skip over non-matching networks, which cuts down on a nontrivial amount of copying.

Split the validation of the filter predicate from filter evaluation to both make it more ergonomic to use inside loops, and to make invalid states (a filter with an ill-formed predicate) unrepresentable.

See the individual commit messages for more detail.

- How to verify it
Existing unit and integration tests.

- Human readable description for the release notes

- A picture of a cute animal (not mandatory but encouraged)

@corhere corhere added this to the 29.0.0 milestone Aug 30, 2025
@corhere corhere added area/api API kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. area/networking Networking kind/refactor PR's that refactor, or clean-up code labels Aug 30, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the network filtering implementation to improve performance and code clarity. It moves network filter validation from the API module to the daemon and replaces a two-pass filtering approach with a single-pass solution.

  • Filter networks before converting to API types rather than after conversion
  • Replace buggy idOrName filter term with proper id filter that can optionally match names
  • Move filter validation and creation logic to the daemon layer

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
daemon/server/router/network/network_routes.go Updates network list and retrieval routes to use new filter interface
daemon/server/router/network/backend.go Changes backend interface to use new Filter type instead of filters.Args
daemon/prune.go Refactors network pruning to use new filter system
daemon/network/filter_test.go Updates tests to work with new filter interface and mock objects
daemon/network/filter.go Complete rewrite implementing new Filter struct and filtering logic
daemon/network.go Updates GetNetworks to filter before conversion to API types
daemon/libnetwork/network.go Adds Driver and ContainerAttachments methods for filter interface
daemon/libnetwork/agent.go Adds ServiceAttachments method for filter interface
daemon/cluster/networks.go Updates cluster network handling to use new filter system
daemon/cluster/convert/network.go Adds FilterNetwork adapter for swarmapi.Network
daemon/cluster.go Updates NetworkManager interface to use new Filter type
api/types/network/network.go Removes ValidateFilters function moved to daemon layer

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +280 to +285
// Not tracked in swarmkit
return 0
}

func (nw FilterNetwork) ServiceAttachments() int {
// Not tracked in swarmkit
Copy link

Copilot AI Aug 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments indicate these methods always return 0 because the data is not tracked in swarmkit. This could lead to incorrect filtering behavior when using the 'dangling' filter on swarm networks. Consider adding a more detailed comment explaining the implications of this limitation.

Suggested change
// Not tracked in swarmkit
return 0
}
func (nw FilterNetwork) ServiceAttachments() int {
// Not tracked in swarmkit
// Always returns 0 because the number of container attachments is not tracked in swarmkit.
// WARNING: This means that any filtering or logic (such as the 'dangling' filter) that relies on
// the attachment count will not work correctly for swarm networks. Users and maintainers should
// be aware that this limitation can lead to incorrect results when filtering or querying swarm networks.
return 0
}
func (nw FilterNetwork) ServiceAttachments() int {
// Always returns 0 because the number of service attachments is not tracked in swarmkit.
// WARNING: This means that any filtering or logic (such as the 'dangling' filter) that relies on
// the attachment count will not work correctly for swarm networks. Users and maintainers should
// be aware that this limitation can lead to incorrect results when filtering or querying swarm networks.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@robmry robmry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +542 to +545
// ServiceAttachments returns len(n.Services()).
func (n *Network) ServiceAttachments() int {
return len(n.Services())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is doing a lot of work to return a count, and the count isn't needed - a bool HasServiceAttachments function could just make the GetTableByNetwork calls n.Services makes, and return true as soon as it sees a non-empty result? (No need for unmarshalling, digging out ports, or grouping tasks by services.)

Maybe for a follow-up? But could make it a bool now - or, perhaps just change the name to make it clearer that it's returning a count, not the attachments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call out on these; I wonder if we should start looking at replacing some of these operations with Iterators; in that case we could terminate early without already having paid the cost for collecting / constructing all the data, And for cases where we only need to know if "more than 0" attachments, can do something like;

// ServiceAttachments returns len(n.Services()).
func (n *Network) HasServiceAttachments() bool {
    for range n.ServiceAttachments() {
        return true
    }
    return false
}

Comment on lines 6 to 12
"strings"
"testing"

"gotest.tools/v3/assert"
is "gotest.tools/v3/assert/cmp"

"github.com/moby/moby/api/types/filters"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually only have two groups of imports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but: #50394

@robmry robmry added the release-blocker PRs we want to block a release on label Sep 3, 2025
@corhere corhere force-pushed the network-filter-iface branch from aaf2d6d to 07eafa8 Compare September 3, 2025 22:26
Comment on lines +597 to +603
if filter.Matches(n) {
nr := buildNetworkResource(n)
if config.Detailed {
nr.Containers = buildContainerAttachments(n)
if config.Verbose {
nr.Services = buildServiceAttachments(n)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! On topic of performance; this information had some implications as it was a breaking change and I think we even silently backported this change to the Docker EE releases (or had some tweak to make it configurable there) https://docs.docker.com/reference/api/engine/version-history/#v128-api-changes

The default was for docker network ls to collect all the same information as docker inspect would give, which included details for all containers and services attached. This brought down some UCP clusters, which (I think) was polling networks through Swarm V1, which meant; aggregate all networks, for all nodes in the cluster, which by default included all containers attached; that .... didn't work well with a cluster with a couple of thousand containers, which either resulted in out of memory, or the time to collect the information taking too long, causing UCP to consider the cluster to be down, and starting to kill nodes 😬

Comment on lines +78 to +80
if f.args.Contains("idOrName") &&
!f.args.Match("name", nw.Name()) &&
!f.args.Match("id", nw.Name()) {
Copy link
Member

@thaJeztah thaJeztah Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun discovery here that either I wasn't aware of, or forgotten; looks like Match is not just a "Match full name / id or prefix, but allows for regular expressions!". I'm not even sure if that's documented

// Match returns true if any of the values at key match the source string
func (args Args) Match(field, source string) bool {
if args.ExactMatch(field, source) {
return true
}
fieldValues := args.fields[field]
for name2match := range fieldValues {
match, err := regexp.MatchString(name2match, source)

Not even sure if that's fully working as expected; as it would to an exact match on a string that could technically be a regex (i.e., filtering on (foo|bar) would match an object literally named like that (we don't allow that for most objects, so probably not a real issue).

I think that means we're compiling the same regex for every network in the list while filtering (worse; could be multiple name2match theoretically, and those are not combined into a single regex) 🤔

Match as a name is definitely poorly chosen; even more ironic is that FuzzyMatch is then less "fuzzy" than Match() (and maybe should've been named MatchPrefix?). If the regex is not documented, possibly we could even use FuzzyMatch (or MatchPrefix if we decide to rename) to avoid the regexp.MatchString

// FuzzyMatch returns true if the source matches exactly one value, or the
// source has one of the values as a prefix.
func (args Args) FuzzyMatch(key, source string) bool {
if args.ExactMatch(key, source) {
return true
}
fieldValues := args.fields[key]
for prefix := range fieldValues {
if strings.HasPrefix(source, prefix) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure if that's documented

Looks like we documented the "observed behavior" (or at least, I have a vague memory of a PR updating the documentation when a user reported a bug that it didn't just match the name / prefix, but also if the string contained the name, which was surprising behavior); https://docs.docker.com/reference/cli/docker/network/ls/#filter

The id filter matches on all or part of a network's ID.
...
The name filter matches on all or part of a network's name.

The following filter matches all networks with a name containing the foobar string.

if err != nil {
return Filter{}, errdefs.InvalidParameter(err)
}
seconds, nanoseconds, err := timestamp.ParseTimestamps(ts, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

off-topic; perhaps we should rename this to timestamp.Parse() - I thought we did when moving, but maybe we left that for a follow-up exercise 😂

Internally a network is represented by either a libnetwork.Network or a
swarmapi.Network. The daemon functions backing the API server map
these values to the Engine API network.Inspect type on demand. Since
they have to convert, the functions to get a list of networks have to
loop over the slice of Networks and append them to a slice of
network.Inspect values.

The function used to filter the list of networks by a user-supplied
predicate takes a []network.Inspect and returns a shorter slice.
Therefore the daemon functions backing the API server have to loop
through the list twice: once to convert, and again inside the
FilterNetworks function to delete networks from the slice which do not
match the filter predicate. Each time an item is deleted from a slice,
all items at higher indices need to be copied to lower indices in the
backing array to close the hole.

Replace FilterNetworks with a function that accepts a single
interface-valued network and returns a boolean. Amend libnetwork.Network
and write a thin adapter for swarmapi.Network so both implement the
aforementioned interface. The daemon functions can thus filter networks
before projecting the values into API structs, and can completely skip
over non-matching networks, which cuts down on a nontrivial amount of
copying.

Split the validation of the filter predicate from filter evaluation to
both make it more ergonomic to use inside loops, and to make invalid
states (a filter with an ill-formed predicate) unrepresentable.

Signed-off-by: Cory Snider <csnider@mirantis.com>
Construct a network.Filter from the filters.Args only once per API
request so we don't waste cycles re-validating an already validated
filter. Since (*Daemon).NetworksPrune is implemented in terms of
(Cluster).GetNetworks, that method now accepts a network.Filter instead
of a filter.Args. Change the signature of (*Daemon).GetNetworks for
consistency as both of the GetNetworks methods are used by network API
route handlers.

Signed-off-by: Cory Snider <csnider@mirantis.com>
Filter-term validation does not belong in the API module. Clients should
not be making any assumptions about which terms the daemon understands.
Users should not need to upgrade their clients to use filter terms
introduced in a newer daemon. Move the network filter validation from
the api module into the daemon.

Split network.NewFilter into network.NewListFilter and
network.NewPruneFilter constructors which validate the filter terms,
enforcing the invariant that any network.Filter is a well-formed filter
for networks.

The network route handlers have been leveraging a hidden 'idOrName'
filter term that is not listed in the set of accepted filters and
therefore not accepted in API client requests. And it's a good thing
that it was never part of the API: it is completely broken and not fit
for purpose! When a filter contains an idOrName term, the term values
are ignored and instead the filter tests whether either the 'id' or
'name' terms match the Name of the network. Unless the filter contains
both 'id' and 'name' terms, the match will evaluate to true for all
networks! None of the daemon-internal users of 'idOrName' set either
of those terms, therefore it has the same effect as if the filter did
not contain the 'idOrName' term in the first place.

Filtering networks by id-or-name is a quirky thing that the daemon needs
to do to uphold its end of the Engine API contract, not something that
would be of use to clients. Fixing up the idOrName filter would
necessitate adding it to the list of accepted terms so the filter passes
validaton, which would have the side effect of also making the filter
available to API clients. Instead, add an exported field to the Filter
struct so that daemon code can opt into the internal-only behaviour of
having the 'id' term match on either the network Name or ID.

Signed-off-by: Cory Snider <csnider@mirantis.com>
@corhere corhere force-pushed the network-filter-iface branch from c68f73e to f8bd170 Compare September 4, 2025 16:49
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@corhere corhere merged commit 291e129 into moby:master Sep 4, 2025
177 of 178 checks passed
@corhere corhere deleted the network-filter-iface branch September 4, 2025 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/api API area/networking Networking kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. kind/refactor PR's that refactor, or clean-up code release-blocker PRs we want to block a release on

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants