daemon: filter networks before converting to API types#50860
daemon: filter networks before converting to API types#50860corhere merged 3 commits intomoby:masterfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the network filtering implementation to improve performance and code clarity. It moves network filter validation from the API module to the daemon and replaces a two-pass filtering approach with a single-pass solution.
- Filter networks before converting to API types rather than after conversion
- Replace buggy
idOrNamefilter term with properidfilter that can optionally match names - Move filter validation and creation logic to the daemon layer
Reviewed Changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| daemon/server/router/network/network_routes.go | Updates network list and retrieval routes to use new filter interface |
| daemon/server/router/network/backend.go | Changes backend interface to use new Filter type instead of filters.Args |
| daemon/prune.go | Refactors network pruning to use new filter system |
| daemon/network/filter_test.go | Updates tests to work with new filter interface and mock objects |
| daemon/network/filter.go | Complete rewrite implementing new Filter struct and filtering logic |
| daemon/network.go | Updates GetNetworks to filter before conversion to API types |
| daemon/libnetwork/network.go | Adds Driver and ContainerAttachments methods for filter interface |
| daemon/libnetwork/agent.go | Adds ServiceAttachments method for filter interface |
| daemon/cluster/networks.go | Updates cluster network handling to use new filter system |
| daemon/cluster/convert/network.go | Adds FilterNetwork adapter for swarmapi.Network |
| daemon/cluster.go | Updates NetworkManager interface to use new Filter type |
| api/types/network/network.go | Removes ValidateFilters function moved to daemon layer |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| // Not tracked in swarmkit | ||
| return 0 | ||
| } | ||
|
|
||
| func (nw FilterNetwork) ServiceAttachments() int { | ||
| // Not tracked in swarmkit |
There was a problem hiding this comment.
The comments indicate these methods always return 0 because the data is not tracked in swarmkit. This could lead to incorrect filtering behavior when using the 'dangling' filter on swarm networks. Consider adding a more detailed comment explaining the implications of this limitation.
| // Not tracked in swarmkit | |
| return 0 | |
| } | |
| func (nw FilterNetwork) ServiceAttachments() int { | |
| // Not tracked in swarmkit | |
| // Always returns 0 because the number of container attachments is not tracked in swarmkit. | |
| // WARNING: This means that any filtering or logic (such as the 'dangling' filter) that relies on | |
| // the attachment count will not work correctly for swarm networks. Users and maintainers should | |
| // be aware that this limitation can lead to incorrect results when filtering or querying swarm networks. | |
| return 0 | |
| } | |
| func (nw FilterNetwork) ServiceAttachments() int { | |
| // Always returns 0 because the number of service attachments is not tracked in swarmkit. | |
| // WARNING: This means that any filtering or logic (such as the 'dangling' filter) that relies on | |
| // the attachment count will not work correctly for swarm networks. Users and maintainers should | |
| // be aware that this limitation can lead to incorrect results when filtering or querying swarm networks. |
5c25e1d to
aaf2d6d
Compare
daemon/libnetwork/agent.go
Outdated
| // ServiceAttachments returns len(n.Services()). | ||
| func (n *Network) ServiceAttachments() int { | ||
| return len(n.Services()) | ||
| } |
There was a problem hiding this comment.
It looks like this is doing a lot of work to return a count, and the count isn't needed - a bool HasServiceAttachments function could just make the GetTableByNetwork calls n.Services makes, and return true as soon as it sees a non-empty result? (No need for unmarshalling, digging out ports, or grouping tasks by services.)
Maybe for a follow-up? But could make it a bool now - or, perhaps just change the name to make it clearer that it's returning a count, not the attachments?
There was a problem hiding this comment.
Good call out on these; I wonder if we should start looking at replacing some of these operations with Iterators; in that case we could terminate early without already having paid the cost for collecting / constructing all the data, And for cases where we only need to know if "more than 0" attachments, can do something like;
// ServiceAttachments returns len(n.Services()).
func (n *Network) HasServiceAttachments() bool {
for range n.ServiceAttachments() {
return true
}
return false
}
daemon/network/filter_test.go
Outdated
| "strings" | ||
| "testing" | ||
|
|
||
| "gotest.tools/v3/assert" | ||
| is "gotest.tools/v3/assert/cmp" | ||
|
|
||
| "github.com/moby/moby/api/types/filters" |
There was a problem hiding this comment.
We usually only have two groups of imports?
aaf2d6d to
07eafa8
Compare
| if filter.Matches(n) { | ||
| nr := buildNetworkResource(n) | ||
| if config.Detailed { | ||
| nr.Containers = buildContainerAttachments(n) | ||
| if config.Verbose { | ||
| nr.Services = buildServiceAttachments(n) | ||
| } |
There was a problem hiding this comment.
Oh! On topic of performance; this information had some implications as it was a breaking change and I think we even silently backported this change to the Docker EE releases (or had some tweak to make it configurable there) https://docs.docker.com/reference/api/engine/version-history/#v128-api-changes
The default was for docker network ls to collect all the same information as docker inspect would give, which included details for all containers and services attached. This brought down some UCP clusters, which (I think) was polling networks through Swarm V1, which meant; aggregate all networks, for all nodes in the cluster, which by default included all containers attached; that .... didn't work well with a cluster with a couple of thousand containers, which either resulted in out of memory, or the time to collect the information taking too long, causing UCP to consider the cluster to be down, and starting to kill nodes 😬
daemon/network/filter.go
Outdated
| if f.args.Contains("idOrName") && | ||
| !f.args.Match("name", nw.Name()) && | ||
| !f.args.Match("id", nw.Name()) { |
There was a problem hiding this comment.
Fun discovery here that either I wasn't aware of, or forgotten; looks like Match is not just a "Match full name / id or prefix, but allows for regular expressions!". I'm not even sure if that's documented
moby/vendor/github.com/moby/moby/api/types/filters/parse.go
Lines 159 to 167 in 6c7e290
Not even sure if that's fully working as expected; as it would to an exact match on a string that could technically be a regex (i.e., filtering on (foo|bar) would match an object literally named like that (we don't allow that for most objects, so probably not a real issue).
I think that means we're compiling the same regex for every network in the list while filtering (worse; could be multiple name2match theoretically, and those are not combined into a single regex) 🤔
Match as a name is definitely poorly chosen; even more ironic is that FuzzyMatch is then less "fuzzy" than Match() (and maybe should've been named MatchPrefix?). If the regex is not documented, possibly we could even use FuzzyMatch (or MatchPrefix if we decide to rename) to avoid the regexp.MatchString
moby/vendor/github.com/moby/moby/api/types/filters/parse.go
Lines 228 to 237 in 6c7e290
There was a problem hiding this comment.
I'm not even sure if that's documented
Looks like we documented the "observed behavior" (or at least, I have a vague memory of a PR updating the documentation when a user reported a bug that it didn't just match the name / prefix, but also if the string contained the name, which was surprising behavior); https://docs.docker.com/reference/cli/docker/network/ls/#filter
The
idfilter matches on all or part of a network's ID.
...
Thenamefilter matches on all or part of a network's name.The following filter matches all networks with a name containing the
foobarstring.
| if err != nil { | ||
| return Filter{}, errdefs.InvalidParameter(err) | ||
| } | ||
| seconds, nanoseconds, err := timestamp.ParseTimestamps(ts, 0) |
There was a problem hiding this comment.
off-topic; perhaps we should rename this to timestamp.Parse() - I thought we did when moving, but maybe we left that for a follow-up exercise 😂
07eafa8 to
c68f73e
Compare
Internally a network is represented by either a libnetwork.Network or a swarmapi.Network. The daemon functions backing the API server map these values to the Engine API network.Inspect type on demand. Since they have to convert, the functions to get a list of networks have to loop over the slice of Networks and append them to a slice of network.Inspect values. The function used to filter the list of networks by a user-supplied predicate takes a []network.Inspect and returns a shorter slice. Therefore the daemon functions backing the API server have to loop through the list twice: once to convert, and again inside the FilterNetworks function to delete networks from the slice which do not match the filter predicate. Each time an item is deleted from a slice, all items at higher indices need to be copied to lower indices in the backing array to close the hole. Replace FilterNetworks with a function that accepts a single interface-valued network and returns a boolean. Amend libnetwork.Network and write a thin adapter for swarmapi.Network so both implement the aforementioned interface. The daemon functions can thus filter networks before projecting the values into API structs, and can completely skip over non-matching networks, which cuts down on a nontrivial amount of copying. Split the validation of the filter predicate from filter evaluation to both make it more ergonomic to use inside loops, and to make invalid states (a filter with an ill-formed predicate) unrepresentable. Signed-off-by: Cory Snider <csnider@mirantis.com>
Construct a network.Filter from the filters.Args only once per API request so we don't waste cycles re-validating an already validated filter. Since (*Daemon).NetworksPrune is implemented in terms of (Cluster).GetNetworks, that method now accepts a network.Filter instead of a filter.Args. Change the signature of (*Daemon).GetNetworks for consistency as both of the GetNetworks methods are used by network API route handlers. Signed-off-by: Cory Snider <csnider@mirantis.com>
Filter-term validation does not belong in the API module. Clients should not be making any assumptions about which terms the daemon understands. Users should not need to upgrade their clients to use filter terms introduced in a newer daemon. Move the network filter validation from the api module into the daemon. Split network.NewFilter into network.NewListFilter and network.NewPruneFilter constructors which validate the filter terms, enforcing the invariant that any network.Filter is a well-formed filter for networks. The network route handlers have been leveraging a hidden 'idOrName' filter term that is not listed in the set of accepted filters and therefore not accepted in API client requests. And it's a good thing that it was never part of the API: it is completely broken and not fit for purpose! When a filter contains an idOrName term, the term values are ignored and instead the filter tests whether either the 'id' or 'name' terms match the Name of the network. Unless the filter contains both 'id' and 'name' terms, the match will evaluate to true for all networks! None of the daemon-internal users of 'idOrName' set either of those terms, therefore it has the same effect as if the filter did not contain the 'idOrName' term in the first place. Filtering networks by id-or-name is a quirky thing that the daemon needs to do to uphold its end of the Engine API contract, not something that would be of use to clients. Fixing up the idOrName filter would necessitate adding it to the list of accepted terms so the filter passes validaton, which would have the side effect of also making the filter available to API clients. Instead, add an exported field to the Filter struct so that daemon code can opt into the internal-only behaviour of having the 'id' term match on either the network Name or ID. Signed-off-by: Cory Snider <csnider@mirantis.com>
c68f73e to
f8bd170
Compare
Compared to those PRs, this one does not change the observable behaviour of the Engine API and refactors the code with fewer throwaway intermediate changes.
- What I did
idOrNameterm (which matched all networks irrespective of value, as if the term wasn't present) with an implementation that actually works as intended.- How I did it
Internally a network is represented by either a
libnetwork.Networkor aswarmapi.Network. The daemon functions backing the API server map these values to the Engine APInetwork.Inspecttype on demand. Since they have to convert, the functions to get a list of networks have to loop over the slice ofNetworksand append them to a slice ofnetwork.Inspectvalues.The function used to filter the list of networks by a user-supplied predicate takes a
[]network.Inspectand returns a shorter slice. Therefore the daemon functions backing the API server have to loop through the list twice: once to convert, and again inside theFilterNetworksfunction to delete networks from the slice which do not match the filter predicate. Each time an item is deleted from a slice, all items at higher indices need to be copied to lower indices in the backing array to close the hole.Replace FilterNetworks with a function that accepts a single interface-valued network and returns a boolean. Amend
libnetwork.Networkand write a thin adapter forswarmapi.Networkso both implement the aforementioned interface. The daemon functions can thus filter networks before projecting the values into API structs, and can completely skip over non-matching networks, which cuts down on a nontrivial amount of copying.Split the validation of the filter predicate from filter evaluation to both make it more ergonomic to use inside loops, and to make invalid states (a filter with an ill-formed predicate) unrepresentable.
See the individual commit messages for more detail.
- How to verify it
Existing unit and integration tests.
- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)