Use JSON Schema to define schemas for our Controllers #759
Conflicts: lib/class-wp-json-base-posts-controller.php
|
FWIW, we used to have a schema for a bunch of this (https://github.com/WP-API/WP-API/blob/04ce8e36c5180f3a51d4559cd203876f4e91143a/docs/schema.json). Problem is, it never got updated. If we work that into the code, it'll help with that, but it's also a potential barrier to entry for developers, as it is quite complicated.
plugin.php
@danielbachhuber Appending an 's' will not always result in the plural form, or even make sense, for many other languages.
Yeah, should've marked that as @todo — it was a hack I used just to get it working, and should be a point of discussion today.
Perhaps the Doctrine/Inflector library would be helpful for pluralization?
Hmm, not such a fan of doing anything automatic here - if you register a post type through register_post_type, and you never explicitly declare the json route url, it doesn't really follow that the path to your CPT is something you never entered. Auto pluralisation is just a bit too magic for my liking.
I like having some default behavior if you've set access_in_json => true, with the secondary ability to specify the slug if you want.
|
We should make sure that we allow adding fields without needing to register them, as clients need to handle this case for forward-compatibility already. Adding fields is harmless to the rest of the data, it's only changing or removing fields that should need to fail validation. That said, should be simple for developers to register their fields if they want to opt-in to the validation on their own as well. |
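The distinction drawn above (adding fields is harmless, changing or removing them is not) can be sketched as a comparison between two versions of a schema. This is only an illustration, not anything from the PR: `breaking_changes` and the sample field names are made up, and the schemas are reduced to a `properties` map with a `type` per field.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List the ways new_schema would break a client written against old_schema.

    Hypothetical helper: only removed fields and changed types are reported.
    """
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    problems = []
    for name, old_def in old_props.items():
        if name not in new_props:
            problems.append(f"removed field: {name}")
        elif new_props[name].get("type") != old_def.get("type"):
            problems.append(f"changed type of field: {name}")
    # Fields that exist only in new_schema are additions and are ignored on purpose.
    return problems
```

Under this sketch a plugin that only adds a field produces an empty list, so it would never need to register anything to stay compatible.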
|
@justinshreve Any thoughts or feedback here? |
Conflicts: lib/class-wp-json-base-posts-controller.php
Core doesn't support `sticky` on other post types
@WP-API/amigos hawtness!
|
I'll start some quick inline comments, with a more meaty discussion below. :)
Properly written clients need to deal with this possibility anyway. Blindly trusting received data could cause the client to crash if a proxy interferes (or if you're interacting with a different WP version).
I've programmed in statically typed languages (and am working on projects right now using them), so I understand the concern here. However, typing is built into JSON itself, so the schema isn't adding anything here. The schema declares fields as strings or numbers, but JSON natively encodes those. Enums are one of the "special" types that provide more data, but WP is extensible enough that these are practically useless (custom post types, custom post statuses). There are other types available here, such as datetimes, but these can be handled by pretty much any parser anyway.
Our central use case is the site's users, and everything follows from this. Both of these use cases (custom APIs and distributed APIs) need to be considered, but the obvious one to prioritise is distributed APIs. The long tail of sites contains the majority of users, so it's where we need to focus attention. My concern is more that having fluid schemas (as proposed here) is actually worse for this. We increase client complexity for (as far as I can tell) no real benefit. I think I may have miscommunicated with my comment, but @bkirby989 summarised it effectively:
I absolutely agree with having a fixed schema if we can, because it means you can do minimal validation on the client side (note: you still need at least some validation, but with a fixed schema, this can easily be provided by a library or other tool). My main issue is with having a schema that can be changed; that is, a schema generated by the server itself, rather than a "standard" schema that it conforms to. If the server is generating the schema, it doesn't help clients, because clients need to first parse the schema to understand what fields are available. Let's say I add a field, and the corresponding field is then added to the schema. How does this help the client? Yes, they can now check the schema to see "oh, this field is now there", but they could just check the response for that anyway. They don't gain meaningful information, apart from a human-readable string about the field. (No one dislikes documentation, mind you, the argument is that we can explore non-schema-related ways to handle this.) On the flip side, let's say a field is removed from the schema and hence from the response. Couldn't you just check whether that field exists on the response instead? The way I'd prefer us to handle the schema is similar to the HTML specification. It defines what you should expect to get back, as well as indicating what the parts inside it actually mean. However, it also provides a robust error-handling model to follow, which is a lesson learned the hard way (everyone loves browser compatibility). We would return requests saying "the response should be handled as per this schema", indicating that we believe it should conform. Clients still need to take responsibility for ensuring it does, but the way to handle failures is clearly defined. To those commenting on this issue: thanks for your feedback! 
Do keep in mind though that this pull request relates specifically to adding generated schemas returned by the API, which can be modified by plugins similarly to the response itself. |
|
Argument for WP-API schemas To make sure my points are appropriately communicated for tomorrow's conversation, here's a summary of what I'm proposing:
Why I'm proposing these points:
Note that querying collections of resources isn't covered in this proposal, but I suspect we can implement it as some derivative of the schema implementation. I'm not proposing these at this time, but they could be cool later:
It's important to note that no one has distributed an API to 23% of the web before. We're in uncharted territory to some degree, and we should keep that in mind as we discuss what's ideal. |
|
I do agree with what @danielbachhuber stated here:
There does need to be a way for clients to discover custom content types and create them. I don't think anyone will disagree with that statement, even those who have voiced disagreement with a schema here. What I do disagree with is what the original PR presented.
I do think that
|
The primary scenario that I've brought up in previous conversations: mobile clients (esp. the official Automattic-developed apps) and a future WP-API-powered WP admin need a way to dynamically generate custom field input controls. Having a machine-readable schema makes this orders of magnitude easier and more user-friendly. I spent four months back in 2012 looking at adding custom field support to the official Android WP app using the XML-RPC API. There's basically no way to do post meta editing in any reasonable manner without a schema; the Poster app made the best attempt to date by allowing the end user to manually configure their custom fields, but this doesn't scale across a team very well and is a hassle to users. You dismiss the value of enums and other simple enhancements over JSON's data types, but even small amounts of additional schema metadata allow a client to produce a more accurate and user-friendly interface. I don't understand the appeal of manually written docs over a self-documenting API, but that's really a secondary benefit of schemas IMO. I wrote all of the XML-RPC API documentation on the Codex (including retroactively documenting the previous 8+ years of its existence) and I would never wish for a person to have to do the same for a new API. Sarah has done a great job, but it's not a scalable process and I can't imagine developers are ever going to do an adequate job publicly documenting their API customizations.
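To make the "dynamically generate custom field input controls" scenario concrete, here is a rough sketch of how a client might map schema properties to input widgets. Everything in it is an assumption for illustration: the function name `controls_from_schema`, the widget names, and the idea that custom fields are described with `type`, `enum`, `format` and `description` keywords.

```python
def controls_from_schema(schema: dict) -> list:
    """Derive one input control description per schema property (hypothetical client helper)."""
    controls = []
    for name, prop in schema.get("properties", {}).items():
        if "enum" in prop:
            # A closed list of values maps naturally to a dropdown.
            control = {"widget": "select", "options": list(prop["enum"])}
        elif prop.get("type") == "boolean":
            control = {"widget": "checkbox"}
        elif prop.get("type") in ("integer", "number"):
            control = {"widget": "number"}
        elif prop.get("format") == "date-time":
            control = {"widget": "datetime"}
        else:
            # Anything unrecognised falls back to a plain text input.
            control = {"widget": "text"}
        control["name"] = name
        control["label"] = prop.get("description", name)
        controls.append(control)
    return controls
```

Without the schema, the same client would have to ask the end user to describe each field by hand, which is roughly the approach attributed to the Poster app above.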
One of the reasons for this is that core WP doesn't support field types. If/When WP core supports different types here, it would make sense to then expose this via the API, but that's an issue the metadata team was looking at. We don't want to bite off more than we can chew here.
My issue with enums specifically is that they can lie to the client. For example, saying that I'd love to give more information to clients, but it's important to never give information to clients that's straight up wrong. |
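One way an enum could end up wrong, sketched under assumptions: the schema below advertises a fixed list of post statuses, and a plugin later registers a custom status that the schema was never told about. The schema contents, the `archived` status, and both helper functions are invented for this illustration.

```python
# Hypothetical schema fragment advertising a closed set of statuses.
SCHEMA = {
    "properties": {
        "status": {"type": "string", "enum": ["publish", "draft", "pending", "private"]}
    }
}


def strict_accepts(post: dict, schema: dict) -> bool:
    """A client that trusts the enum completely and rejects anything else."""
    return post["status"] in schema["properties"]["status"]["enum"]


def tolerant_status(post: dict, schema: dict, fallback: str = "unknown") -> str:
    """A client that treats the enum as a hint and degrades gracefully."""
    allowed = schema["properties"]["status"]["enum"]
    return post["status"] if post["status"] in allowed else fallback
```

A post with the made-up status `archived` is perfectly valid on the server, yet the strict client refuses it, while the tolerant client falls back to a generic label.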
|
Just wanted to call out what I see as a difference of opinion in how this would be used/useful:
From my perspective, the virtue of this proposal is not a universe teeming with automagically-adapting clients; though schemas would be a pre-req for this, the singularity is still a ways off. 😃 Though clients that adapt to a range of expected circumstances (e.g. a mobile app for managing custom posts) would be possible, and pretty awesome, I don't see this as the common case. We certainly wouldn't want to burden developers with having to always query a schema before querying for an actual response. That's the notion of over-abstraction that nobody likes.

Schemas and Custom API / Client Development

As pointed out, the bulk of implementations are in the long tail; to me that reads as one-off APIs and one-off clients. When those are inevitably developed by different parties at different times, schemas can be a godsend. The main imminently tangible virtue, expressed as a user story, goes like this: as a client developer, when I want to integrate with WordPress sites, I have a common, reliable way to discover a schema that is communicated in web-standard terms. The alternative of "checking the response" to see what an API does is effectively reverse-engineering, which is not the best way to develop an API client. We've all probably done it, but it's not the path to widespread adoption.

Schemas and realistic automation

Further, beyond the mobile app for custom posts, there is one even more obvious, high-value, and highly viable client type that actually could make programmatic use of a schema immediately: a test. Having schemas makes mocking responses from an API a quick bit of work; ditto generating basic client test cases. Trying to backfill these things without the benefit of automation is no bueno.

All that said, I'm certainly sensitive to what @rmccue is saying in terms of "this information could be wrong". The question there boils down to design tolerance. It needn't be 100% correct to provide a lot of value, just close enough, especially if the ways that it can be incorrect are understood and documented. Given my ignorance, I don't know how frequently a schema might be rendered inaccurate due to internals, but maybe this could become something of a standard practice? "Don't hack core" <---> "Don't break the schema"?

(edited for spelling, grammar, clarity; will finish coffee before commenting next time)
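As a rough illustration of the "a test is a client" point, a mock response can be derived mechanically from a schema. This sketch assumes a small subset of JSON Schema (`type`, `enum`, `properties`, `items`); `mock_from_schema` and the placeholder values it picks are assumptions, not an existing tool.

```python
def mock_from_schema(schema: dict):
    """Build a placeholder value that matches the shape described by the schema."""
    if "enum" in schema:
        # Any listed value is valid, so take the first one.
        return schema["enum"][0]
    kind = schema.get("type")
    if kind == "object":
        return {
            name: mock_from_schema(sub)
            for name, sub in schema.get("properties", {}).items()
        }
    if kind == "array":
        # One element is enough to exercise list-handling code in a client.
        return [mock_from_schema(schema.get("items", {}))]
    placeholders = {"string": "", "integer": 0, "number": 0.0, "boolean": False}
    return placeholders.get(kind)  # None for "null" or unrecognised types
```

A test suite could feed such a mock to client code to check that parsing does not crash, without needing a live site.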
|
I think providing a schema is a good idea and does more good than harm. I've only consumed the current WP-API (1.1.1) with one-off clients written specifically to handle one-off scenarios on individual sites. This is the kind of work that doesn't need a schema, though I don't think it would have gotten in the way. I imagine there are many other complex scenarios waiting to happen. To me, the real power of a new API for WordPress is in handling these scenarios, not necessarily in providing for my use case. To write a client that can be used or reused on many different sites, having a schema would mean less uncertainty overall. This example might be a stretch, but it's the one I've been using when thinking this through. 😄 If I have a client app that shows all items from all public post types that either have a post thumbnail or could be assigned one.
So, +1 for JSON schema from me. Auto generated API docs are a nice side benefit. As far as validation, etc... I'm still thinking through my thoughts on that. |
|
Update with where we left off in our discussion yesterday. Things we all (seemed to) agree upon:
Where there was a differing of opinion:
Where we ended up: After some respectful discussion, we decided to hit pause (for now) on the concept of a dynamic strict schema. Discussions can continue around this concept, but we should extract the non-controversial pieces of this PR and continue moving forward. |
|
Here's a new pull request for the non-controversial parts #820 Once that's merged, I'll put together a fully-functional PR for JSON Schema. I think part of the problem is that we're still debating a lot of hypotheticals — it will be easier to discuss when everyone can see the code. |
|
In my mind, the remaining open question is: Should WordPress support (and expose) dynamic schemas, or should the core schema be fixed, and any modifications to the schema sandboxed? I've updated the description to this effect. |
|
I'm late to the party, but since it seems that this conversation will continue in the future, I'll jot down some of my thoughts. I'm mostly responding to themes in this thread since I have the advantage of reading a month's worth of discussion before responding.

Usability/User Experience

I'm approaching this issue from the perspective of someone who has worked with many external JSON APIs, but surprisingly, has not worked with JSON schema. Thus, I've had to do research into the concept in order to inform my opinion. I think this is an important perspective as we will have a lot of devs approaching the JSON schema issue in our API from this same perspective. JSON schema is easy to grok from an implementation standpoint. With about 5-10 minutes of reading, I could easily understand it and felt confident in being able to implement it. With a solid API UX for updating our schema, I think that developers will not have a problem updating JSON schema for their bespoke needs. As we've seen in the WP community, an authoritative article on the subject with good placement in the community can easily train devs. I do not think that the increased complexity from a UX perspective is a good reason not to implement this schema.

Documentation

I really like the idea of using this for documenting the API for an individual site. The docs are accessible for developers reading code and via the API. Additionally, I think this is easier to standardize than PHPDoc. Individual developer use of DocBlocks is all over the place. PHP doesn't care how you implement them so it is hard to enforce them. If we have an API that is more forceful about the implementation, we can almost guarantee doc quality. I think it's a win/win from a docs standpoint.

Dynamic vs. Static Schema

I really like @maxcutler's comments about the two different use cases for the API (bespoke vs. network of JSON APIs). The API should be designed for both use cases, because, damn, that's powerful. My question is then, can we have both static and dynamic schema? Having both helps for the two use cases. Let's say I'm developing a hot new Blackberry WP app and I want to be able to connect to any WordPress site to read data. I want to make this easy, so I decide to only handle core content (e.g., posts, pages) via the API. Couldn't we have a canonical source for the core schema provided by WordPress.org? The app could rely on that source for verifying the JSON structure returned by individual blogs. This would help verify that a site has readable data in the format expected and my app could read from that blog. Now let's say @danielbachhuber is getting his Fusion on and designing an iPad app to read comics and only comics from Fusion's API. We need a way to extend the JSON schema to support this use case (although, in bespoke apps, the JSON schema may be of less utility) for custom content. Dynamic schema would help with this. Imagine if they then open sourced their comic server and client: having the schema would be extremely useful in developing an app that discovers the comic superpowers of a WordPress site and allows for aggregation in a single app. I think there are good reasons to have static and dynamic schemas and we could probably support both.

Data Validation

As I am just learning about JSON schema, I am really worried about the numerous tutorials and references to using JSON schema to validate API data. When digesting data from an API, I usually go into tinfoil hat mode. An angry employee, a compromised network, a partial response, or a simple human error can all result in API data that can cause strange or even malicious behavior in a client. If the JSON schema is used as the source of validation for this data, I am not sure how your app could be secure given that it is susceptible to these same data integrity and authenticity issues. Even digesting these APIs over secure connections is susceptible to MitM attacks and all of the issues mentioned above. I just cannot envision a world where I could trust an API to explain the API to me in such a way that I would programmatically use it for validation. I would prefer to validate for my specific use case. Note that I'm not advocating that we don't use schema for discoverability or for suggestions about the conformity of data. I just do not think it should be used for the type of validation and escaping that one might normally do when constructing HTML documents. My main concern is about messaging around the JSON schema implementation. I hope that we are careful about how we talk about its security implications.

Conclusion

I am in favor of JSON schema for the WP API in general and think that it is worth the increased effort for building the API. I do not see how it hurts the API other than keeping it from being integrated into WP core because of the extra work needed to build the API.
|
All for the basic ability for developers to extend/create query-based API data discovery. The more accessible and easy the API is, the more traction it will generate. I urge everyone to grab a big coffee and watch this video: http://www.infoq.com/presentations/web-api-html (it's about an HTML schema, but still a good talk). P.S. I have worked with two large APIs recently (closed-source commerce applications), both of which had poor documentation and were rigid. The biggest problem was that they had no way to customize data validation, which hurt the revenue stream and user experience a great deal.
|
Thanks for the video, @wycks. We'll check it out. |
|
@tollmanz Not sure I follow re: data validation. In cases where JSON schemas are used to determine whether an API response contains what it says it should contain, it's all simple type checking (is this field present and is it an integer?) with no opportunities for arbitrary code execution or anything of the sort. You're not handing over the keys to your computer to the schema :-) Of course it's smart to layer your own validation on top of that, using the schema-based validation as just a preliminary sanity check. API response validation is really just one use case. Another that I find more interesting is e.g. using a JSON schema on the client side for instant feedback when filling out a form in a custom web application built on top of this (or indeed in WordPress itself). In this case, too, the client-side validation serves as user feedback but no more; the real validation should happen on the server where it can't be tampered with. Heck, not just validation, you can even use it for form generation: https://github.com/joshfire/jsonform and https://github.com/jdorn/json-editor
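The layering described above (schema check as a preliminary sanity check, application rules on top) might look something like the following. It is a simplified sketch that only understands `required`, `properties` and `type`; the helper names and the "id must be positive" rule are assumptions chosen for the example.

```python
# Map JSON Schema type names to the Python types produced by a JSON parser.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def schema_errors(data: dict, schema: dict) -> list:
    """Layer 1: is each field present, and does it have the declared type?"""
    errors = [f"missing field: {name}" for name in schema.get("required", []) if name not in data]
    for name, prop in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(prop.get("type"))
        if name not in data or expected is None:
            continue
        value = data[name]
        # bool is a subclass of int in Python, so rule it out for non-boolean fields.
        is_stray_bool = isinstance(value, bool) and prop.get("type") != "boolean"
        if is_stray_bool or not isinstance(value, expected):
            errors.append(f"wrong type for field: {name}")
    return errors


def validate(data: dict, schema: dict) -> list:
    """Run the schema check first, then the application's own rules (layer 2)."""
    errors = schema_errors(data, schema)
    if errors:
        return errors
    if "id" in data and data["id"] <= 0:
        return ["id must be a positive integer"]
    return []
```

The schema layer never executes anything from the response; it only compares shapes, which is why it works as a sanity check but cannot replace the second layer.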
|
I'm not a part of the WordPress community, but I've done a lot of API work in the past, so Daniel asked me if I could share some thoughts. Many people have shared examples of the power of using schemas: automatic documentation generation and even automatic API explorers (see e.g. some of the projects at http://raml.org/projects.html), shared client- and server-side input validation, a foundation to speed up the development of API clients and so on. It's DRY nirvana in that sense. And not a dream either, this is exactly how http://www.django-rest-framework.org/ and other REST frameworks do things today. On the other hand, don't underestimate the engineering effort. Not just in setting up the schemas and in providing a good way for other developers to extend those schemas or even provide their own ones (which, for non-content API endpoints, makes sense), but in making sure that all of those wonderful advantages that exist in theory (from validation to form generation to practically-for-free API clients) will also exist in practice. There might be more work on the tooling than you think. And if none of the supporting tools are there, you're really just asking people to do extra work for no return. So the big question, to me, is not a technical one and it's also not a question of which is the superior approach, but rather: to what extent is the broader WordPress community willing to support the better-but-harder approach? Just my two cents. |
|
Resource schemas dropped in #844 They're generally exposed for documentation purposes at We're still discussing how new fields should be registered, and whether data could be added to a response without registering appropriate schema #875 Schemas will also likely be used to perform basic request validation #871 |
|
Thanks for looping me in @danielbachhuber. Your point about CPTs being the core use case for API-driven sites is key to me. That doesn't mean the majority of all WP sites, but it's bound to be "most" sites with enough dev effort put in to use the API. Then @maxcutler's point about two philosophies is notable. However, coming from a React & golang microservice world lately, I don't think those ends are distinct at all. A huge speed-up on my recent project has come from the tools available in golang & React for generating and consuming Swagger docs. We can generate swagger.json dynamically from our build process and both philosophies are sped up: Ph1 is sped up through generation of human-readable docs for reference (or copy-paste if devs prefer that approach). Ph2 is sped up by having swagger.json on a consistently available path and pointing code generators at it. We could have maybe 60-80% of the models & handlers needed for an API consumer to work off Daniel's Fusion site in seconds. Plus, this level of discoverability opens up the possibility of Google or (ideally) new players offering Internet-wide functionality searches, instead of just content searches.

Validation seems like something best done in custom ways. It should be possible, but I can't think of too many instances where a site can trust JS validation end-to-end.

Another benefit of doc generation not noted above, and one that's valuable for maintaining a stable microservice platform: versioning. Versioned endpoints have been massively important, in my experience, to maintain the kind of flexibility Max's Ph1 aims for. This also addresses the easy vs. hard changes @rachelbaker summarized above. Without versioned endpoints, adding one param to a verb that's not idempotent forces us to update and test 2-3 UX projects and multiple dependent services. With them, the old endpoint keeps working and new ones add filters and params as needed. No forced refactors of the dependent services or apps (barring changes to the data requirements, like required fields, that force old contract rules to expire). Plus, machine-readable doc formats might be discovered in a CI build using something like Dredd. Then you can test if new endpoint versions will regress other apps.

Maybe the age of this ticket means more tooling needs to be built: a good PHP Swagger/API Blueprint/Slate parser and static doc sample would help get some of this discussion out of text and into code. But we could skip the larger amounts of implementation code until there's some boilerplate to reference.
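The versioning argument above can be illustrated with a tiny dispatcher in which a new parameter arrives as a new endpoint version while the old one keeps its contract. The route table, handler names, and the `context` parameter are all hypothetical and not taken from WP-API.

```python
def get_post_v1(params: dict) -> dict:
    """Original contract: callers only ever pass an id."""
    return {"id": params["id"]}


def get_post_v2(params: dict) -> dict:
    """Newer contract: adds an optional parameter without touching v1."""
    return {"id": params["id"], "context": params.get("context", "view")}


# Each (version, resource) pair keeps its own handler, so old clients are unaffected.
ROUTES = {
    ("v1", "posts"): get_post_v1,
    ("v2", "posts"): get_post_v2,
}


def dispatch(version: str, resource: str, params: dict) -> dict:
    handler = ROUTES.get((version, resource))
    if handler is None:
        raise LookupError(f"no handler for /{version}/{resource}")
    return handler(params)
```

Clients pinned to `v1` see exactly the response they were tested against, while new clients opt in to `v2` when they are ready.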
Publishing schemas for each item (e.g. Post, Term, Comment, User) represented by a controller will allow clients to better consume the API. Because our schema will be standardized, based on the existing JSON Schema, WordPress developers can make modifications to the controllers without breaking clients. In fact, properly written clients will know how to interpret the modification, provide UI for it, etc.
Each controller's schema will be used to:
- Validate request input.
- Ensure response data conforms to a schema.
Schemas will be optional for custom endpoints, but enforced for core endpoints. In order for a WordPress developer to add or remove a field, they'll first need to modify the schema.
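The "modify the schema first" rule could work roughly as in the sketch below, where a controller only emits fields that its schema declares. This is written in Python purely for illustration; the class, its methods, and the `word_count` field are hypothetical and do not reflect the plugin's actual PHP controllers.

```python
class ItemController:
    """Hypothetical controller whose schema is the single source of truth for fields."""

    def __init__(self, schema: dict):
        self.schema = schema
        self._getters = {}

    def add_field(self, name: str, definition: dict, getter) -> None:
        # The schema entry is written first; the callback alone would not expose anything.
        self.schema["properties"][name] = definition
        self._getters[name] = getter

    def remove_field(self, name: str) -> None:
        self.schema["properties"].pop(name, None)
        self._getters.pop(name, None)

    def prepare_response(self, item: dict) -> dict:
        # Only fields declared in the schema reach the response.
        response = {}
        for name in self.schema["properties"]:
            if name in self._getters:
                response[name] = self._getters[name](item)
            elif name in item:
                response[name] = item[name]
        return response
```

In this design an undeclared key on the underlying item is silently dropped, which is the strict behaviour the description proposes for core endpoints.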
Update 1/31: For those just coming into the conversation, the entire thread has a good amount of history. At this time the large open question can be summarized as: should WordPress support (and expose) dynamic schemas, or should the core schema be fixed, and any modifications to the schema sandboxed?
See #718