Signed-off-by: clemensv <clemensv@microsoft.com>
schemaregistry/schemaregistry.yaml
type: integer
tags:
- 'schemas'
/schemagroups/{group-name}/schemas/{schema-name}/versions/{version-number}:
There might be a case for metadata to live at the version level of a schema. For example, when the schema was created, who created it, etc. I wonder if we need separate endpoints for the metadata vs fetching the raw schema itself.
I think those extra metadata items can be exposed through metadata annotations or through an OPTIONS call if they are not part of the schema itself.
schemaregistry/schemaregistry.md
- Type: `Integer`
- Description: The version of the schema. This is a simple counter and tracks
the version in the scope of this schema within the schema group. The schema
document MAY indicate a schema that follows a different versioning scheme.
How do we envision this working exactly? Does this just mean implementations might follow a scheme that isn't simply a monotonically increasing Integer? This wording feels like we're opening it up for implementations to diverge from the spec without being specific about how they should do that.
This means that the schema document itself can use semver or some other "embedded" versioning notion, while the API versions strictly by the order in which changes have been added.
Understood, but wouldn't semver (or other versioning schemes) imply a different data type?
I was wondering the same.
schemaregistry/schemaregistry.md
For simple scenarios, the API allows for version management to be automatic and
transparent. Whenever a schema is updated, a new version number is assigned and
prior schema versions are retained. The latest available schema is always the
seems there should be a MUST in here about "no version == latest"
document MAY indicate a schema that follows a different versioning scheme.
- Constraints:
- REQUIRED
- Assigned by server.
I think I'm on the same track as Ryan; I'm wondering if we need to allow the author to decide the version string so they can choose a simple int or a semver pattern.
The goal of the server-assigned integer is for versioning not to complicate the API model. With automatic numbers, you can make all updates a plain POST on the schema URI and you can enforce the compatibility rules you want using service-side logic as the update happens.
Introducing breaking changes should really require the pain of a wholly new schema with its own backcompat versioning sequence.
Semver 1.x, 2.x, 3.x is really better captured by
/schemagroups/myapp/schemas/foo.1/versions/{n}
/schemagroups/myapp/schemas/foo.2/versions/{n}
/schemagroups/myapp/schemas/foo.3/versions/{n}
than by
/schemagroups/myapp/schemas/foo/versions/1.1
/schemagroups/myapp/schemas/foo/versions/2.2
/schemagroups/myapp/schemas/foo/versions/3.0
Those things under foo are not the same if they don't describe structurally compatible data.
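A minimal sketch of the model described above, assuming an in-memory store: every update is a plain POST that appends a document, the server assigns the next integer, and a pluggable compatibility check turns a policy violation into a conflict (HTTP 409). The class, method and hook names are hypothetical, not the spec's API. Following the suggestion above, a breaking change goes into a new schema (`foo.2`) with its own counter rather than into a new version of `foo`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class Conflict(Exception):
    """Stands in for an HTTP 409 Conflict response."""


# Hypothetical compatibility hook: (latest_document, new_document) -> accepted?
CompatCheck = Callable[[str, str], bool]


@dataclass
class SchemaVersions:
    documents: List[str] = field(default_factory=list)
    check: Optional[CompatCheck] = None

    def post(self, document: str) -> int:
        """Append a version; the server assigns the number (1, 2, 3, ...)."""
        if self.documents and self.check and not self.check(self.documents[-1], document):
            raise Conflict("update violates the compatibility policy")
        self.documents.append(document)
        return len(self.documents)

    def get(self, version: int) -> str:
        if not 1 <= version <= len(self.documents):
            raise KeyError(version)
        return self.documents[version - 1]


# Major versions live side by side as separate schemas, each with its own counter.
schemas = {"foo.1": SchemaVersions(), "foo.2": SchemaVersions()}
```

The compatibility logic stays entirely on the server side; clients only ever see a new number or a conflict.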
By doing this you're basically asking the impl to be a document management system. Would it be so bad if we left that up to the author of the schema files and they could pretty much pick any URL pattern they wanted (within their permissions/scope)? Meaning, if they PUT over an existing one, it updates it. If they PUT to a new URL, they're creating a new one. Then the impl doesn't need any versioning at all, no saving of history, etc.; that's left up to systems that are built for that kind of thing, and the results are pushed here for sharing/viewing.
Given the ability and suggestion to semver the dataschema or schema id, complex versioning schemes within a schema seem less necessary. A server-side check to prevent footguns should be possible even without versions, and this API doesn't actually prevent replacing a specified version with an incompatible document at the same URL (delete the existing version and/or schema, then re-create a schema with the same name and re-insert schemas until the version lines up, but the content doesn't).
Just discovered this project. Evaluating whether we could use this instead of writing our own schema registry. We want to be able to use semver to represent versions for a schema, where foo v1.y can be used as-is instead of foo v1.x (for y >= x), but foo v1.x and foo v2 are incompatible representations of the same data (and then maybe there'll be a converter registered in our system, etc.).
It's not clear to me which direction the conversation is leaning, but it'd be really lovely if it was possible to control the version id assigned to a new version of a schema. Otherwise, I mean, we can always add a proxy in front that remaps foo/versions/x.y to foo.x/versions/y, I guess. But it would be simpler if we could just control the version id to be created as we want. This is a very localized configuration point, too.
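The proxy idea can be sketched as a single rewrite rule. This is hypothetical and not part of the spec: it turns a client-facing `foo/versions/x.y` path into the spec's `foo.x/versions/y` layout and passes any other path through unchanged.

```python
import re

# Hypothetical client-facing layout: .../schemas/{name}/versions/{major}.{minor}
_SEMVER_PATH = re.compile(
    r"^(?P<prefix>.*/schemas/)(?P<name>[^/]+)/versions/(?P<major>\d+)\.(?P<minor>\d+)$"
)


def remap(path: str) -> str:
    """Rewrite foo/versions/x.y to foo.x/versions/y; leave other paths untouched."""
    m = _SEMVER_PATH.match(path)
    if m is None:
        return path
    return f"{m['prefix']}{m['name']}.{m['major']}/versions/{m['minor']}"
```

A minor version of 0 would map to version 0, which a server counter starting at 1 never assigns, so a real proxy would still need to decide on an offset.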
One interesting thing I was thinking about with respect to semver support for versioning, is that compatibility checking could perhaps be customized with a different configuration based on a real understanding of semver. So if a client tries to update the content of a schema that is currently at version 2.7.3, uploading content to new version 2.7.4 then that is a patch version release and the compatibility checker can be very strict. But if the client does the same thing to new version 3.0.0 then the compatibility checker can e.g. allow breaking changes.
Just a thought. :)
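One way the idea above could look, assuming plain `MAJOR.MINOR.PATCH` strings without pre-release or build tags: the size of the bump from the current to the proposed version selects how strict the compatibility check is. The mode names, and the middle setting for minor bumps, are hypothetical.

```python
from typing import Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release or build tags)."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def compat_mode(current: str, proposed: str) -> str:
    """Pick a (hypothetical) compatibility mode from the size of the version bump."""
    cur, new = parse_semver(current), parse_semver(proposed)
    if new <= cur:
        raise ValueError("the proposed version must be greater than the current one")
    if new[0] > cur[0]:
        return "none"      # major bump: breaking changes allowed
    if new[1] > cur[1]:
        return "backward"  # minor bump: hypothetical middle ground
    return "strict"        # patch bump: strictest checking
```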
@chrish42 @EricWittmann the model here allows for a "trivial" case where the version is just a server-assigned counter, similar to how a GitHub commit identifier is a server-assigned identifier for an entry in the sequential commit log. That identifier also has no deeper meaning. (The commit id is not a plain counter, for different reasons.) If you want to do something more sophisticated, you can always manage schema versions explicitly as "myschema:v1.1" and "myschema:v1.2" as separate schemas within the group, and if you want the "latest" functionality, you also maintain a "myschema" that always returns the latest version. An implementation behind the protocol could easily provide that.
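The explicit-naming pattern can be sketched as follows, assuming schema ids of the form `name:label` and a per-family alias that tracks the most recently added schema (not the highest semver). The class is hypothetical; an implementation could just as well compute the alias on read.

```python
from typing import Dict


class AliasedGroup:
    """Explicitly versioned schemas plus a hypothetical "latest" alias per family."""

    def __init__(self) -> None:
        self.schemas: Dict[str, str] = {}  # e.g. "myschema:v1.2" -> document
        self.latest: Dict[str, str] = {}   # e.g. "myschema" -> "myschema:v1.2"

    def add(self, name: str, label: str, document: str) -> str:
        schema_id = f"{name}:{label}"
        self.schemas[schema_id] = document
        self.latest[name] = schema_id      # most recently added wins
        return schema_id

    def get(self, schema_id: str) -> str:
        # A bare family name resolves through the alias; explicit ids resolve directly.
        return self.schemas[self.latest.get(schema_id, schema_id)]
```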
schemaregistry/schemaregistry.md
`/schemagroups/{group-id}/schemas/{schema-id}/versions/{version}`

The name of the first segment of the path is a suggestion and MAY differ between
How do you see the query for all groups being interoperable without agreement on the first segment?
"schemagroups" is really part of the path to the registry itself; everything defined here sits under that path. the segment could even be empty or could be multiple segments. I don't see there being interop issues, because you would always have it being at the root of the URI.
ah, I didn't realize you meant for "schemagroups" to be an impl choice - we might want to make that clearer throughout the entire doc
Also update the OpenAPI to remove /schemagroups from paths. Would make it easier to generate servers/clients from the OpenAPI at some future point.
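To illustrate that the first segment is a deployment choice rather than part of the protocol, here is a hypothetical URL builder in which the registry root prefix may be `schemagroups`, empty, or several segments. It assumes ids already conform to the `segment-nz-nc` syntax, so it does no percent-encoding; the base URLs are placeholders.

```python
from typing import Optional


def schema_url(base: str, prefix: str, group: str, schema: str,
               version: Optional[int] = None) -> str:
    """Build a schema URL below a deployment-specific registry root.

    `prefix` stands in for the "schemagroups" segment and may be empty
    or span several segments. Ids are assumed to be valid path segments
    already, so nothing is percent-encoded here.
    """
    segments = [s for s in prefix.strip("/").split("/") if s]
    segments += [group, "schemas", schema]
    if version is not None:
        segments += ["versions", str(version)]
    return base.rstrip("/") + "/" + "/".join(segments)
```

Everything after the prefix keeps the same shape, which is what clients need for interop.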
gunnarmorling left a comment
Really welcoming this initiative! I like where it's going; put in a few comments.
description: Schema group already exists
tags:
- 'groups'
delete:
Should DELETE actually be allowed? Did you consider a "decommission" or "soft delete" option as an alternative? That would, e.g., prohibit producing new events that reference a schema taken out of service, while existing events could still be decoded.
Yes, in general deletes are problematic and should IMO be either discouraged or prevented. Immutability has been a useful property in our registry for schema versions - we can deprecate and disable versions. And I wish we had done the same for the entire schema (deprecate/disable) rather than allow deletes.
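A soft delete along the lines suggested above could be modeled as a state change instead of a removal. In this hypothetical sketch, a disabled schema is refused to producers but still served to consumers decoding existing events; the state names and methods are assumptions, not spec terms.

```python
from dataclasses import dataclass
from enum import Enum


class SchemaState(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"  # still usable; clients may warn
    DISABLED = "disabled"      # taken out of service


@dataclass
class SchemaEntry:
    document: str
    state: SchemaState = SchemaState.ACTIVE

    def delete(self) -> None:
        """Soft delete: keep the document so existing events remain decodable."""
        self.state = SchemaState.DISABLED

    def for_producer(self) -> str:
        if self.state is SchemaState.DISABLED:
            raise PermissionError("schema is taken out of service")
        return self.document

    def for_consumer(self) -> str:
        return self.document
```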
schemaregistry/schemaregistry.md
> 2) Since the above strategy is truly RESTful, but quite esoteric if you've not
> grown up as a RESTafarian, the alternative strategy for concurrently
> handling multiple schema formats is much simpler: Constrain each schema
> group to a single format.
This seems simple and pragmatic. Does it make sense to start with this approach?
What about constraining on a per-schema basis (giving each schema a "format" or "type"). The reason here is that I can imagine a logical group of schemas that aren't all the same technology. Especially if this were ever to expand beyond schemas and into e.g. API Designs as well - I would want to have a group that included perhaps multiple OpenAPI documents as well as some JSON Schemas... and my OpenAPI would likely have $refs to the JSON Schemas.
@EricWittmann I will add that as an XOR option, i.e. you can either define the formats at the group or schema level. If you define it at the group level, that is binding, meaning you can't override.
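The XOR rule can be stated as a small resolution function; the format strings used with it are placeholders. Setting a format on both levels is rejected outright, which is one reading of "binding" at the group level.

```python
from typing import Optional


def effective_format(group_format: Optional[str], schema_format: Optional[str]) -> Optional[str]:
    """Resolve the format that applies to a schema (group XOR schema level)."""
    if group_format is not None and schema_format is not None:
        # A group-level format is binding; a schema may not set its own.
        raise ValueError("format is fixed by the schema group and cannot be overridden")
    return group_format if group_format is not None else schema_format
```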
schemaregistry/schemaregistry.md
- Constraints:
- REQUIRED
- MUST be a non-empty string
- MUST conform with RFC3986/3.3 `segment-nz-nc` syntax
Would it be more familiar to specify this as a hostname / reg-name in section 3.2.2 ("Host"), rather than a path segment (which is slightly more permissive)?
I think segment-nz-nc better supports some common ways you might want to name your groups to indicate a hierarchy where none exists. Or an Organization + Project format - that sort of thing. I'm thinking things like how a lot of NPM packages are now being named...
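For reference, the RFC 3986 `segment-nz-nc` production can be checked with a single regular expression; this is a sketch and `is_valid_group_id` is a hypothetical helper. The `reg-name` alternative from section 3.2.2 differs only in that it excludes `@` and allows an empty string, so both reject `:` and `/`.

```python
import re

# RFC 3986, section 3.3:
#   segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
#   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
#   pct-encoded   = "%" HEXDIG HEXDIG
#   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
_SEGMENT_NZ_NC = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=@]|%[0-9A-Fa-f]{2})+")


def is_valid_group_id(value: str) -> bool:
    return _SEGMENT_NZ_NC.fullmatch(value) is not None
```

An npm-style `@acme/orders` would not pass because of the `/`, while something like `@acme.orders` would.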
A schema version is a document. The "body" of a schema version MAY be a text
document or binary stream. An implementation SHOULD validate whether a
schema version is valid according to the rules of its format, for instance
whether it is a valid Avro schema document when the format is Apache Avro.
How do recipients know what type of content to expect in the body? Is this based on the datacontenttype in the received message (plus some sort of lookup table to map datacontenttype to format)?
If so, making format the same value as datacontenttype would simplify the API.
@EricWittmann "SHOULD" leaves that up to what you think is right.
schemaregistry/schemaregistry.yaml
- 'groups'
put:
summary: Create schema group
description: Create schema group with specified format format in registry namespace.
Suggested change (drop the duplicated "format"):
description: Create schema group with specified format in registry namespace.
operationId: getLatestSchema
responses:
'200':
$ref: '#/components/responses/SchemaBytePayloadResponse'
Do you want to include the ID that was served in this response, so that clients can retrieve the value again on subsequent calls?
schemaregistry/schemaregistry.md
@@ -0,0 +1,257 @@
# CNCF Schema Registry API Version 0.1-rc01s
maybe "CNCF Schema Registry API - wip" since it's not an 'rc' yet.
This section further describes the elements enumerated in the introduction.

### 2.1. Schema Group
Just wanted to add that groups is a great concept that we don't (yet) have in Apicurio Registry, but may have been a mistake to not include. It's important I think to organize these things into groupings. I guess I'm just saying +1 to this concept from me. :)
This specification does not define management constructs for such access control
rules.

A newer schema version might introduce breaking changes or it might only
introduce careful changes that preserve compatibility. These strategies are not
subject of this specification, but the API provides a conflict handling
mechanism that allows an implementation to reject updates that do not comply
with a compatibility policy, if one has been implemented.
Schema evolution is a pretty significant concept in a schema registry. I haven't seen how the spec facilitates configuring the compatibility (or validity) rules. This paragraph mentions a compatibility policy, but if that's mentioned anywhere else I missed it. :(
This might be an important enough feature to include in the spec. For Apicurio Registry we have the concept of "rules" that can be configured globally or per-artifact (in this spec I imagine rules could also be configured at the schemagroup level). Right now we have only two rules: Validity and Compatibility. But perhaps "rule" is a concept to consider?
@EricWittmann I believe this is an important feature of the registry, but not an important feature of the protocol. At the protocol level, a policy violation bubbles up as a plain conflict. If we were designing an implementation and the management API for that implementation, I would agree.
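If rules were ever exposed, one plausible resolution order is "most specific wins". The sketch below is hypothetical, loosely modeled on the global and per-artifact rules described above, with the schema group as an extra level; rule names and values are placeholders. At the protocol level a violation would still surface only as a plain conflict.

```python
from typing import Dict, Optional

Rules = Dict[str, str]  # e.g. {"compatibility": "backward", "validity": "syntax"}


def effective_rules(global_rules: Rules, group_rules: Optional[Rules] = None,
                    schema_rules: Optional[Rules] = None) -> Rules:
    """Most specific wins: schema overrides group, group overrides global."""
    merged: Rules = dict(global_rules)
    merged.update(group_rules or {})
    merged.update(schema_rules or {})
    return merged
```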
described data structure. All documents coexisting within the same version
SHOULD describe the exact same data structure.

### 2.2.2. Schema attributes
It would be useful to keep some other metadata for each artifact, particularly if anyone ever wants to create a Registry UI: name, description, labels, creationTime, modifiedTime, etc.
A schema version is a document. The "body" of a schema version MAY be a text
document or binary stream. An implementation SHOULD validate whether a
schema version is valid according to the rules of its format, for instance
whether it is a valid Avro schema document when the format is Apache Avro.
Can validity be disabled or configured (e.g. syntax vs. semantic validity)? Treating Validity and Compatibility in similar ways has proven useful in Apicurio Registry.
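A sketch of validity as a configurable level, assuming Avro schemas stored as JSON text: the syntax level only requires the document to parse, while the shallow semantic level also checks the top-level shape Avro allows (a type-name string, an object with a `type` attribute, or an array for a union). This is far from full Avro validation, and the levels are hypothetical.

```python
import json
from enum import Enum


class Validity(Enum):
    NONE = 0      # accept any document
    SYNTAX = 1    # document must parse as JSON
    SEMANTIC = 2  # document must also have an Avro top-level shape (shallow)


def validate_avro(document: str, level: Validity) -> bool:
    if level is Validity.NONE:
        return True
    try:
        parsed = json.loads(document)
    except json.JSONDecodeError:
        return False
    if level is Validity.SYNTAX:
        return True
    # Avro schemas are a JSON string (type name), an object with "type", or an array (union).
    if isinstance(parsed, (str, list)):
        return True
    return isinstance(parsed, dict) and "type" in parsed
```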
schemaregistry/schemaregistry.md
- Type: `String`
- Description:
- Constraints:
- OPTIONAL. Can be used if and only if not format has been set for the schema
Should be: if and only if format has not been set for the schema
Signed-off-by: clemensv <clemensv@microsoft.com>
2254b64 to c33abad
retention policy, but implementations MAY retire and remove outdated schema
versions.

The latest available schema is always the default version that is retrieved when
Might want to add a MUST in here someplace...
When the URL to a schema is used without a version string, the implementation MUST return the latest version of that schema.
perhaps?
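The proposed MUST can be pinned down with a resolver like the hypothetical one below. Since the text above lets implementations retire and remove outdated versions, version numbers may have gaps, so "latest" is taken as the highest remaining number rather than the count of versions.

```python
from typing import Dict, Optional


class SchemaNotFound(KeyError):
    """Stands in for an HTTP 404 response."""


def resolve(versions: Dict[int, str], version: Optional[int] = None) -> str:
    """Return the requested version, or the latest remaining one if none is given."""
    if not versions:
        raise SchemaNotFound("schema has no versions")
    if version is None:
        version = max(versions)  # highest number, even if older ones were retired
    if version not in versions:
        raise SchemaNotFound(f"version {version} not found")
    return versions[version]
```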
- Description: Instant when the schema was added to the registry.
- Constraints:
- OPTIONAL
- Assigned by the server.
My OCD is kicking in... sometimes you have periods, sometimes you don't on bulleted lists :-) Can we choose one? I'd prefer no periods.
I am referring to the CE spec for the data types now. Do you think we need to copy them?
I was just referring to the lack of consistency on the bulleted lists - some ending in a period and some not. I don't have an opinion on copying vs referencing the data types
- Description: Instant when the schema was added to the registry.
- Constraints:
- OPTIONAL
- Assigned by the server.
as above, I think we need a MUST here specifying the format/syntax - or define "Timestamp" in some kind of "data types" section
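If the spec relies on the CloudEvents data types, the `Timestamp` type is encoded as an RFC 3339 string. A minimal sketch of producing one for a server-assigned creation instant; the helper name is hypothetical.

```python
from datetime import datetime, timezone


def to_timestamp(instant: datetime) -> str:
    """Render an aware datetime as an RFC 3339 UTC timestamp with millisecond precision."""
    if instant.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = instant.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```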
schemaregistry/schemaregistry.md
schema version is valid according to the rules of its format, for instance
whether it is a valid Avro schema document when the format is Apache Avro.

Within the scope of the schema set, the version is identified by the combination
use of the word "set" here might confuse people since it's new. Did you mean "group" or "set of versions for one particular schema" ?
schemaregistry/schemaregistry.md
Within the scope of the schema set, the version is identified by the combination
of a version number and an optional format identifier. The schema version MAY
also have an additional, optional unique identifier within the scope of the
Can you elaborate on this last sentence? Do you mean they can add extensions that are unique identifier values? If so, why did you call this out? Just curious.
Any concrete schema document may also have a unique identifier for itself. There is either a path /group/xyz/schemas/abc/versions/1 or you could just address that exact doc with its ID. We want that to enable a URL shortener function.
- 1
- 2

#### id
How do you see this being used by a consumer of the registry?
aka.ms/s/{id} URL shortener option for greedy protocols.
Why isn't that just part of the URL shortener function's logic? E.g. tinyurl.com doesn't ask for an id - I just give it a full URL
schemaregistry/schemaregistry.md
These dependencies are reflected in the path structure:

`[/schemagroups]/{group-id}/schemas/{schema-id}/versions/{version}`
Not sure if it matters, but the use of the word id got me wondering if it should be name instead. While the semantics would be the same either way, all examples of these IDs appear to be human-friendly names rather than IDs (e.g. GUIDs). So while it's not totally human-friendly (e.g. no spaces), if we expect people to use meaningful "words" and not "random chars", then perhaps name would help guide them in that direction.
Just a thought
Do we need to add something here to indicate that the /versions/{version} part is optional?
[/schemagroups]/{group-id}/schemas/{schema-id}[/versions/{version}]
or text?
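Making the `/versions/{version}` part optional could look like this on the server side, assuming the registry root prefix has already been stripped from the path; a missing version comes back as `None`, meaning "latest". The names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumes the registry root prefix (e.g. "/schemagroups") was already stripped.
_PATH = re.compile(
    r"^/(?P<group>[^/]+)/schemas/(?P<schema>[^/]+)(?:/versions/(?P<version>\d+))?/?$"
)


class SchemaRef(NamedTuple):
    group: str
    schema: str
    version: Optional[int]  # None means "latest"


def parse(path: str) -> SchemaRef:
    m = _PATH.match(path)
    if m is None:
        raise ValueError(f"not a schema path: {path}")
    version = int(m["version"]) if m["version"] is not None else None
    return SchemaRef(m["group"], m["schema"], version)
```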
@clemensv since you haven't gotten to the REST API yet, let me ask: do you see people being able to do a PUT to replace an existing version of the schema, or will it always have to be a POST that creates a new version? I'm hoping they can do a PUT because I can imagine cases where people need to make non-breaking changes (e.g. fixing typos in comments in the schema) and don't necessarily want to expose the equivalent of all their "git commits" to their users.
@duglin Speaking for myself, for deployments of this, I would be more concerned with reproducibility than the ability to fix typos in comments, etc. So unless there's some kind of "pre-commit hook" to check that a PUT does not affect the schema, I'd much rather everything go through POST and create a new version number.
@chrish42 sure - and as the owner of that schema you should have that choice... meaning you could always use a POST. But should we then ban someone from using a PUT if they want to hide their typos (or anything else they want to remove from the old versions)? Hmm, I guess this might be less of a concern if the spec supports DELETE, but it's not in the list of ops for a version yet. So, imagine a rogue employee uploading something bad on the way out the door and there being no way to fix it.
@duglin I'm building a data system where reproducibility is a top concern. That doesn't work if you leave it up to the owner of each schema in the system. I understand this may not be as strong a concern for everyone, so it might need to be made optional. But for me, I need a way to deploy this that is "append-only" at the schema registry level. A "retract" feature could be useful in this mode (hide a schema version from LIST operations, but continue to allow access to it for operations that fetch that exact version). But that can probably wait until after the first release. Anyway, I'll let others chime in here.
@chrish42 gotcha! Thanks for the insight!
Yeah we've found that immutability has worked better than allowing DELETE and PUT for versions - once a version is added it can't be removed or modified. We have a state that can be 'deprecated' or 'disabled' to handle cases where a version shouldn't be used, or shouldn't be listed. It would be awesome if the spec at least allowed for that concept/mode.
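The append-only mode with retraction discussed in this thread can be sketched as follows, assuming server-assigned version numbers: existing versions can never be replaced, and a retracted version disappears from listings but can still be fetched by its exact number. The names are hypothetical; a "disabled" state could be layered on in the same way.

```python
from dataclasses import dataclass, field
from typing import List, Set


class Immutable(Exception):
    """Stands in for rejecting a PUT or DELETE on an existing version."""


@dataclass
class AppendOnlyVersions:
    documents: List[str] = field(default_factory=list)
    retracted: Set[int] = field(default_factory=set)

    def add(self, document: str) -> int:
        self.documents.append(document)
        return len(self.documents)  # server-assigned version number

    def replace(self, version: int, document: str) -> None:
        raise Immutable(f"version {version} cannot be modified")

    def retract(self, version: int) -> None:
        self.get(version)  # raises KeyError for unknown versions
        self.retracted.add(version)

    def listed(self) -> List[int]:
        """Versions shown in LIST operations (retracted ones are hidden)."""
        return [v for v in range(1, len(self.documents) + 1) if v not in self.retracted]

    def get(self, version: int) -> str:
        """Exact-version fetch still works for retracted versions."""
        if not 1 <= version <= len(self.documents):
            raise KeyError(version)
        return self.documents[version - 1]
```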
Signed-off-by: Clemens Vasters <clemensv@microsoft.com>
Approved on 6/18 call
hi @chrish42, good question. I think this spec (and subscriptions) should follow the same path as discovery. Meaning, people should review it, open issues/PRs for changes, and then start to implement it to expose gaps. We've already started down this path for discovery, but not for subscriptions yet. I think people are just overloaded so it might take a while for the ball to really get rolling.
@duglin So, just to be sure I understand, the CloudEvents project itself is not working on an implementation of this spec, but member companies (maybe Microsoft given that the initiator @clemensv works there) potentially are writing their own implementation of this spec. Correct? I'm trying to figure out how best to learn about implementations of this as they appear, and maybe collaborate on one if it pops up early enough and our needs are aligned. Hence the questions. Thanks!
Hello all, I really like the approach of standardizing this big missing topic (at an enterprise level). May I propose including an explicit metadata resource that additionally allows schema labels, so you can use a selector on the list operations (like K8s does in a very flexible way: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors):

/schemagroups/mygroup/schemas/{schema-id}/metadata

This way anyone could just query

So my interest is really in the "label selection" part (to group things into common taxonomies), but there might be more common things to put in a metadata resource (though you could probably "misuse" it to store any additional information as key/values).
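A minimal sketch of the proposed label filtering, supporting only equality terms such as `team=payments,tier=gold`. Kubernetes selectors also support `!=` and set-based terms, which this hypothetical helper rejects rather than misreading; the label keys and values are placeholders.

```python
from typing import Dict, List

Labels = Dict[str, str]


def parse_selector(selector: str) -> Labels:
    """Parse equality terms like "team=payments,tier=gold" into required labels."""
    required: Labels = {}
    for term in (t.strip() for t in selector.split(",")):
        if not term:
            continue
        key, sep, value = term.partition("=")
        key, value = key.strip(), value.strip()
        # Reject "!=", "==" and set-based terms instead of misreading them.
        if not sep or not key or key.endswith("!") or value.startswith("="):
            raise ValueError(f"unsupported selector term: {term!r}")
        required[key] = value
    return required


def select(schemas: Dict[str, Labels], selector: str) -> List[str]:
    """Return the names of schemas whose labels satisfy every term."""
    required = parse_selector(selector)
    return sorted(name for name, labels in schemas.items()
                  if all(labels.get(k) == v for k, v in required.items()))
```

An empty selector matches every schema, mirroring how an empty label selector behaves in Kubernetes list calls.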
* Registry baseline Signed-off-by: clemensv <clemensv@microsoft.com> * first pass incorporating feedback in the narrative Signed-off-by: clemensv <clemensv@microsoft.com> * Inc. feedback and adding non-normative desc HTTP Signed-off-by: Clemens Vasters <clemensv@microsoft.com> Co-authored-by: clemensv <clemensv@microsoft.com>
Hi. I don't see this spec in the master branch anymore. Did it move to a different repo or project?
It's there - see: https://github.com/cloudevents/spec/tree/master/schemaregistry
Signed-off-by: clemensv clemensv@microsoft.com
For initial review. I'm still updating both documents including changing some names, but the combination of OpenAPI doc and the spec doc should already tell a fairly complete story.
Microsoft proposal for #610