docs(ADR): decouple flag sources and flag sets #1644
Conversation
✅ Deploy Preview for polite-licorice-3db33c canceled.
|
b95bc5b to
ddd56aa
Compare
Signed-off-by: Hugo Huang <lorqor@gmail.com>
ddd56aa to
a8c9d09
Compare
|
|
||
| Assumptions of the current model | ||
|
|
||
| - `flagSetId`s must be unique across different sources or the configuration is considered invalid. |
There was a problem hiding this comment.
What does this mean exactly? Does it mean that a flag set id is only allowed to be contained in a single source?
There was a problem hiding this comment.
This is the assumption held in PR #1634.
My interpretation for this is: it's an undefined behavior if multiple sources provides different flag set IDs for the same flag key.
There was a problem hiding this comment.
According to flagd syncs merge strategy I would expect the last sync to win.
Maybe we should add an additional underlying assumption, that the flagId must be unique across different flag sets as well.
There was a problem hiding this comment.
Yes, this is the current behavior as there lacks a definition of order/priority of sources/flagsets. It's not a very good approach as the result is NOT deterministic and depends on the order of concurrent events.
There was a problem hiding this comment.
Yes I agree this is a limitation in our current modal which we need to make more robust.
|
|
||
| #### Flag Configuration Schema | ||
|
|
||
| Add an optional field `flagsetID` under `flag` or `flag.metadata`. The flag set ID cannot be specified if a flag set ID is specified for the config. |
There was a problem hiding this comment.
This nicely fits into the existing flagSetId defined in the schema.
| selector: source=override | ||
|
|
||
| # Flags from the flag set `project-42` | ||
| selector: flagsetID=project-42 |
There was a problem hiding this comment.
If we use the flagSetId from the schema it should be something like metadata.flagSetId=project42.
There was a problem hiding this comment.
I've considered the option and didn't chose it, mostly to keep the syntax simple for now (so it can extended later in a flexible way).
| selector: override | ||
|
|
||
| # Flags from the source `override` | ||
| selector: source=override |
There was a problem hiding this comment.
Wondering if we should inject the source as metadata prop into every flag. Then we wouldn't have to distinguish between selecting a source, or flagSetId, or anything else.
There was a problem hiding this comment.
This cannot be done in the flag config as the source itself does not know its name. In the flag store, it's currently under model.flag as a separate field and metadata is a map. On the technically perspective keeping it as a field (especially it's required) it is a better option to me. On the semantic level (for selector), metadata.source makes more sense to me.
|
|
||
| ## Background | ||
|
|
||
| Flagd daemon syncs flag configurations from multiple sources. A single source provides a single config, which has an optional flag set ID that may or may not change in the following syncs of the same source. |
There was a problem hiding this comment.
which has an optional flag set ID that may or may not change in the following syncs of the same source.
This is a good point. I think we need to make sure we do support changing this value and all the expected behavior associated with that.
There was a problem hiding this comment.
@tangenti for example, if the flagSetId is removed from a flag, we probably need to recompute the sets for listeners on that flagSet not to include that flag now. To put it another way, changes in the flag set could result in effective removal from listener sets. This isn't a problem, but we need to make sure we implement it this way, I think.
|
|
||
| #### Flag Configuration Schema | ||
|
|
||
| Add an optional field `flagsetID` under `flag` or `flag.metadata`. The flag set ID cannot be specified if a flag set ID is specified for the config. |
There was a problem hiding this comment.
| Add an optional field `flagsetID` under `flag` or `flag.metadata`. The flag set ID cannot be specified if a flag set ID is specified for the config. | |
| Add an optional field `flagsetId` under `metadata` or `flags.{flagKey}.metadata`. The flag set ID cannot be specified if a flag set ID is specified for the config. |
Feel free to accept or not, but this is what we currently support in our config, it just doesn't have most of the behavior you describe, but I think such behavior could be added using these existing fields. See playground.
Also here.
There was a problem hiding this comment.
I'm aware of that, and this is the config-level flag set ID that I mentioned in the second sentence.
It will be used as the flag set ID for every flag in the config, if the flag level flag set ID is not set.
| ### Flagd Daemon Storage | ||
|
|
||
| 1. Flagd will have separate stores for `flags` and `sources` | ||
|
|
||
| 2. `selector` will be removed from the store | ||
|
|
||
| 3. `flagSetID` will be added as part of `model.Flag` or under `model.Flag.Metadata` for better consistency with the API. |
There was a problem hiding this comment.
From an implementation perspective, do you see a benefit in changing flagd's internal representation of flags to support the flagsetID=project-42 use-case optimally? If we expect this to be a common selection, we could benefit from actually "storing" flags in memory as a "map of maps" (a map of flagSets) so that sets can easily be "plucked" instead of iterating.
Not critical for this ADR, just wondering.
There was a problem hiding this comment.
we could benefit from actually "storing" flags in memory as a "map of maps" (a map of flagSets) so that sets can easily be "plucked" instead of iterating.
I have the same idea as what Guido brought up in this comment - flag set ID is just as the other metadata of a flag, and I don't see a good reason to structure the internal storage by flag set ID.
Indexing is a different problem than storing. We could build indexing over flag sets and sources if that iterating flags turns out to be a performance issue (which I doubt).
toddbaert
left a comment
There was a problem hiding this comment.
I can support this proposal. I like the query expression idea. My main concern is if this can/should be done with the existing flagSetId field. I would certainly advocate for that to reduce breakage and redundancy. See https://github.com/open-feature/flagd/pull/1644/files#r2150722702.
Thanks Todd. This proposal is adding another more granularity to the existing flag set ID - it will be kept and used if the server would like to keep a config-level flag set ID. |
|
The more I think about this proposal, the more I like it. They don't need to be supported now, but I can think of a few very realistic "selector queries" beyond the examples you've posted:
That said, I do also have some implementation concerns which I think should be discussed briefly before we can accept it, even if we are only currently supporting However, this made me wonder if perhaps we should use some kind of in-memory DB (for example https://github.com/hashicorp/go-memdb)... this way we could offload all sorts of query code to that, benefiting from the performance of that DB, and accept various sorts of expressions as the selector. cc @beeme1mr @chrfwow @alexandraoberaigner @guidobrei @aepfli |
|
Thanks @toddbaert, When you talk about a large amount of flags, do you have a solid number in mind of what this means? 1k flags and 1M flags require different optimizations for implementations. I think a good approach here is to do a quick measurement with a reasonable number of flags on the critical path, especially the time it takes to recalculate the flags for each sub/listener. For the idea of an in-memory DB, I think there are more questions to answered:
|
|
|
||
| Sync server would count the extended syntax of `selector` and filter the list of flags on-the-fly answering the requests from the providers. | ||
|
|
||
| The existing conflict resolving based on sources remains the same. Resyncs on removing flags remains unchanged as well. |
There was a problem hiding this comment.
The existing conflict resolving based on sources remains the same.
I think this may be a problem. Iirc, our current conflict resolving strategy is to only use the last flag definition with a key, overriding flags from other flag sets if they share the same key. We could fix this by e.g. prepending the flag set id to the flag key, but I would like a solution that can handle duplicate flag keys in different flag sets natively.
I'd say on the order of 10k flags for our use case.
Agreed. But brings up another point: What this ADR doesn't fully address is still a means to define multiple sets of colliding flags in a single sync resource. At least for our internal purposes, while we can use the {
"$schema": "https://flagd.dev/schema/v0/flags.json",
"flags": {
"flag1": {
"state": "ENABLED",
"defaultVariant": "true",
"variants": {
"true": true,
"false": false
},
"targeting": {}
},
"flag1": {
"state": "ENABLED",
"defaultVariant": "false",
"variants": {
"true": true,
"false": false
},
"targeting": {}
}
}
}Currently I think others are dealing with this by prepending some sort of prefix to flag keys and then removing it, but that's exactly the thing we'd hoped the proposal here would remove the need for. cc @juanparadox as I believe this is your challenge too. Could we support the selector expressions in this proposal, but the schema changes from #1634 ? |
This is a new requirement and seems not brought up in #1634? I'm not so sure if this is broadly applicable to most user cases. I think we'll need more discussions to understand the requirement better, in particular
I believe multi-tenancy support is needed, but the current approach increases the complexity a lot and doesn't provide full isolation from flagd: the resources are not isolated so a high QPS from one tenant can make flagd unavailable for all other tenants; data level isolation seems not guaranteed as well |
Perhaps we did a poor job communicating some of the requirements. This is one of the main challenges we're facing at Dynatrace, and I think @juanparadox is having the same issue with their adoption. We tried to describe that here but I agree it wasn't made very obvious.
I'm not sure why it's implied that the flags must be the same type when their keys match across sets, maybe I'm missing something... I think the set you select comes with implicit assumptions about the flag types within that set. I'm not sure I see a problem with a situation where This just seems like a special case of flag-keys existing in multiple sets. 🤔 If it were to happen, you'd have to correct your selector to get the flag you expect to evaluate.
I think some kind of client rate-limiting could be discussed but it's out of scope of this proposal. "Tenants" in our case are just different teams and they are already sharing a central flagd implementation that's not isolating "noisy neighbors". All our clients are doing local evaluation anyway. I think a possible solution could be to accept the selector proposal here, and the schema changes in #1634. With that, we'd have:
If you think a meeting is necessary I'm open to that. |
The goal of this decision document is to establish flag sets as a first class concept in `flagd`, and support the dynamic addition/update/removal of flag sets at runtime. See document for all justifications and background. @dominikhaska @tangenti @cupofcat Alternative proposal here: #1644 (comment) --------- Signed-off-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com> Signed-off-by: Todd Baert <todd.baert@dynatrace.com> Co-authored-by: Alexandra Oberaigner <alexandra.oberaigner@dynatrace.com> Co-authored-by: Simon Schrottner <simon.schrottner@dynatrace.com>
|
|
||
| #### Flag Configuration Schema | ||
|
|
||
| Add an optional field `flagsetID` under `flag` or `flag.metadata`. The flag set ID cannot be specified if a flag set ID is specified for the config. |
There was a problem hiding this comment.
Unless I'm misunderstanding, this already exists.
beeme1mr
left a comment
There was a problem hiding this comment.
@toddbaert showed me a working prototype that convinced me this was the best approach. I think it strikes a nice balance between power when necessary and simplicity when it isn't.
This PR contains no behavioral changes, and no breaking changes. It lays the groundwork for some of the upcoming improvements we want in flagd, as specified in recent ADRs. It does this by re-implementing our storage layer to use [go-memdb](https://github.com/hashicorp/go-memdb), an open source in-memory database developed by Hashicorp and used in [Consul](https://developer.hashicorp.com/consul) (as well as MANY other projects). ### Why? This will allow us to easily support: - duplicate flag keys (namespaced by flagSetId), as mentioned in [this ADR](https://github.com/open-feature/flagd/blob/main/docs/architecture-decisions/duplicate-flag-keys.md) - robust flag selectors, as mentioned in [this ADR](#1644), by supporting "watchers" which allow us to "listen" to change in flags returned by a given query 🕺 - robust (and possibly, in the future, even customizable) indexing on arbitrary attributes (to easily support fetching "all boolean flags", or "flags with metadata = xyz", etc) - cross-"table" transactions I have already PoC'd that these are all practical. ### Changes in implementation - the `store` package's `State` is no longer just a struct; it's a object with methods wrapping the internal db - the `store` package's `State` was renamed to `Store` and the file was renamed from `flags.go` to `store.go`, since it's ceased to be a simple stateful object, and for consistency - a non-serialized (used only internally) `Key` field was added to the flag type (for indexing) - a new constructor for the `Store` was added which takes a logger and returns an error, the old was deprecated to avoid breaking changes in consumers (the Go flagd provider, mostly) Note that the go-memdb dependency is MPL2, which is not allowed by the CNCF, however, go-memdb is already used in CNCF projects and has a [special exception](https://github.com/cncf/foundation/blob/347a55dc0a6c6dc3d55a9a782e5701080a8ec43b/license-exceptions/cncf-exceptions-2023-08-31.json#L7-L11). ### Perfromance There was no significant change in performance, see benchmark diff vs main below: <details> <summary>Benchmark diff vs main</summary> ```diff -BenchmarkFractionalEvaluation/test_c@faas.com-16 559051 13832 ns/op 7229 B/op 135 allocs/op -BenchmarkFractionalEvaluation/test_d@faas.com-16 611665 13106 ns/op 7229 B/op 135 allocs/op -BenchmarkFractionalEvaluation/test_a@faas.com-16 383074 13433 ns/op 7229 B/op 135 allocs/op -BenchmarkFractionalEvaluation/test_b@faas.com-16 529185 12190 ns/op 7229 B/op 135 allocs/op -BenchmarkResolveBooleanValue/test_staticBoolFlag-16 3929409 1712 ns/op 1008 B/op 11 allocs/op -BenchmarkResolveBooleanValue/test_targetingBoolFlag-16 812671 10276 ns/op 6065 B/op 87 allocs/op -BenchmarkResolveBooleanValue/test_staticObjectFlag-16 4398327 1700 ns/op 1008 B/op 11 allocs/op -BenchmarkResolveBooleanValue/test_missingFlag-16 4541409 1494 ns/op 784 B/op 12 allocs/op -BenchmarkResolveBooleanValue/test_disabledFlag-16 2998599 1815 ns/op 1072 B/op 13 allocs/op -BenchmarkResolveStringValue/test_staticStringFlag-16 4378830 1698 ns/op 1040 B/op 13 allocs/op -BenchmarkResolveStringValue/test_targetingStringFlag-16 849668 9165 ns/op 6097 B/op 89 allocs/op -BenchmarkResolveStringValue/test_staticObjectFlag-16 4560192 1363 ns/op 1008 B/op 11 allocs/op -BenchmarkResolveStringValue/test_missingFlag-16 5283511 1196 ns/op 784 B/op 12 allocs/op -BenchmarkResolveStringValue/test_disabledFlag-16 4393116 1446 ns/op 1072 B/op 13 allocs/op -BenchmarkResolveFloatValue/test:_staticFloatFlag-16 4264772 1501 ns/op 1024 B/op 13 allocs/op -BenchmarkResolveFloatValue/test:_targetingFloatFlag-16 776436 8191 ns/op 6081 B/op 89 allocs/op -BenchmarkResolveFloatValue/test:_staticObjectFlag-16 4685841 1285 ns/op 1008 B/op 11 allocs/op -BenchmarkResolveFloatValue/test:_missingFlag-16 5001636 1376 ns/op 784 B/op 12 allocs/op -BenchmarkResolveFloatValue/test:_disabledFlag-16 3707120 1897 ns/op 1072 B/op 13 allocs/op -BenchmarkResolveIntValue/test_staticIntFlag-16 3770362 1677 ns/op 1008 B/op 11 allocs/op -BenchmarkResolveIntValue/test_targetingNumberFlag-16 739861 11142 ns/op 6065 B/op 87 allocs/op -BenchmarkResolveIntValue/test_staticObjectFlag-16 4221418 1913 ns/op 1008 B/op 11 allocs/op -BenchmarkResolveIntValue/test_missingFlag-16 4289516 1488 ns/op 768 B/op 12 allocs/op -BenchmarkResolveIntValue/test_disabledFlag-16 4027533 1829 ns/op 1072 B/op 13 allocs/op -BenchmarkResolveObjectValue/test_staticObjectFlag-16 1588855 3880 ns/op 2243 B/op 37 allocs/op -BenchmarkResolveObjectValue/test_targetingObjectFlag-16 562364 11580 ns/op 7283 B/op 109 allocs/op -BenchmarkResolveObjectValue/test_staticBoolFlag-16 5026976 1791 ns/op 1008 B/op 11 allocs/op -BenchmarkResolveObjectValue/test_missingFlag-16 4254043 1553 ns/op 784 B/op 12 allocs/op -BenchmarkResolveObjectValue/test_disabledFlag-16 3051976 2250 ns/op 1072 B/op 13 allocs/op +BenchmarkFractionalEvaluation/test_a@faas.com-16 478593 14527 ns/op 7467 B/op 143 allocs/op +BenchmarkFractionalEvaluation/test_b@faas.com-16 429560 14728 ns/op 7467 B/op 143 allocs/op +BenchmarkFractionalEvaluation/test_c@faas.com-16 574078 14230 ns/op 7467 B/op 143 allocs/op +BenchmarkFractionalEvaluation/test_d@faas.com-16 411690 15296 ns/op 7467 B/op 143 allocs/op +BenchmarkResolveBooleanValue/test_staticBoolFlag-16 4133443 1973 ns/op 960 B/op 18 allocs/op +BenchmarkResolveBooleanValue/test_targetingBoolFlag-16 822934 10981 ns/op 6033 B/op 94 allocs/op +BenchmarkResolveBooleanValue/test_staticObjectFlag-16 3955728 1964 ns/op 976 B/op 18 allocs/op +BenchmarkResolveBooleanValue/test_missingFlag-16 3068791 2294 ns/op 1064 B/op 21 allocs/op +BenchmarkResolveBooleanValue/test_disabledFlag-16 3500334 2225 ns/op 1024 B/op 20 allocs/op +BenchmarkResolveStringValue/test_staticStringFlag-16 3935048 1781 ns/op 1008 B/op 20 allocs/op +BenchmarkResolveStringValue/test_targetingStringFlag-16 770565 10765 ns/op 6065 B/op 96 allocs/op +BenchmarkResolveStringValue/test_staticObjectFlag-16 3896060 2084 ns/op 976 B/op 18 allocs/op +BenchmarkResolveStringValue/test_missingFlag-16 3103950 2141 ns/op 1064 B/op 21 allocs/op +BenchmarkResolveStringValue/test_disabledFlag-16 3717013 2092 ns/op 1024 B/op 20 allocs/op +BenchmarkResolveFloatValue/test:_staticFloatFlag-16 3971438 2003 ns/op 976 B/op 20 allocs/op +BenchmarkResolveFloatValue/test:_targetingFloatFlag-16 782996 10153 ns/op 6049 B/op 96 allocs/op +BenchmarkResolveFloatValue/test:_staticObjectFlag-16 3469644 2115 ns/op 976 B/op 18 allocs/op +BenchmarkResolveFloatValue/test:_missingFlag-16 3376167 2157 ns/op 1064 B/op 21 allocs/op +BenchmarkResolveFloatValue/test:_disabledFlag-16 3610095 2032 ns/op 1024 B/op 20 allocs/op +BenchmarkResolveIntValue/test_staticIntFlag-16 3883299 1941 ns/op 960 B/op 18 allocs/op +BenchmarkResolveIntValue/test_targetingNumberFlag-16 823038 10725 ns/op 6033 B/op 94 allocs/op +BenchmarkResolveIntValue/test_staticObjectFlag-16 3697454 2028 ns/op 976 B/op 18 allocs/op +BenchmarkResolveIntValue/test_missingFlag-16 3326895 1986 ns/op 1048 B/op 21 allocs/op +BenchmarkResolveIntValue/test_disabledFlag-16 3327046 2142 ns/op 1024 B/op 20 allocs/op +BenchmarkResolveObjectValue/test_staticObjectFlag-16 1534627 4885 ns/op 2211 B/op 44 allocs/op +BenchmarkResolveObjectValue/test_targetingObjectFlag-16 509614 14640 ns/op 7251 B/op 116 allocs/op +BenchmarkResolveObjectValue/test_staticBoolFlag-16 3871867 1978 ns/op 960 B/op 18 allocs/op +BenchmarkResolveObjectValue/test_missingFlag-16 3484065 2080 ns/op 1064 B/op 21 allocs/op +BenchmarkResolveObjectValue/test_disabledFlag-16 4013230 2158 ns/op 1024 B/op 20 allocs/op PASS -ok github.com/open-feature/flagd/core/pkg/evaluator 233.286s +ok github.com/open-feature/flagd/core/pkg/evaluator 261.212s ? github.com/open-feature/flagd/core/pkg/evaluator/mock [no test files] PASS -ok github.com/open-feature/flagd/core/pkg/logger 0.003s +ok github.com/open-feature/flagd/core/pkg/logger 0.002s ? github.com/open-feature/flagd/core/pkg/model [no test files] ? github.com/open-feature/flagd/core/pkg/service [no test files] PASS -ok github.com/open-feature/flagd/core/pkg/service/ofrep 0.003s +ok github.com/open-feature/flagd/core/pkg/service/ofrep 0.002s PASS ok github.com/open-feature/flagd/core/pkg/store 0.002s ? github.com/open-feature/flagd/core/pkg/sync [no test files] @@ -51,9 +51,9 @@ PASS ok github.com/open-feature/flagd/core/pkg/sync/builder 0.020s ? github.com/open-feature/flagd/core/pkg/sync/builder/mock [no test files] PASS -ok github.com/open-feature/flagd/core/pkg/sync/file 1.007s +ok github.com/open-feature/flagd/core/pkg/sync/file 1.006s PASS -ok github.com/open-feature/flagd/core/pkg/sync/grpc 8.014s +ok github.com/open-feature/flagd/core/pkg/sync/grpc 8.013s PASS ok github.com/open-feature/flagd/core/pkg/sync/grpc/credentials 0.004s ? github.com/open-feature/flagd/core/pkg/sync/grpc/credentials/mock [no test files] @@ -61,10 +61,10 @@ ok github.com/open-feature/flagd/core/pkg/sync/grpc/credentials 0.004s PASS ok github.com/open-feature/flagd/core/pkg/sync/grpc/nameresolvers 0.002s PASS -ok github.com/open-feature/flagd/core/pkg/sync/http 4.008s +ok github.com/open-feature/flagd/core/pkg/sync/http 4.007s ? github.com/open-feature/flagd/core/pkg/sync/http/mock [no test files] PASS -ok github.com/open-feature/flagd/core/pkg/sync/kubernetes 0.015s +ok github.com/open-feature/flagd/core/pkg/sync/kubernetes 0.016s ? github.com/open-feature/flagd/core/pkg/sync/testing [no test files] PASS ok github.com/open-feature/flagd/core/pkg/telemetry 0.016s ``` </details> --------- Signed-off-by: Todd Baert <todd.baert@dynatrace.com>

The ADR proposes a different approach to solve the problems stated in #1634.
It's quite concise and simple at the point, assuming the reviewers already have the context from the original discussions.
Will add more details after the initial feedback and inputs.