Conversation

@guybe7 (Collaborator) commented Feb 9, 2022

Work in progress towards implementing a reply schema as part of COMMAND DOCS, see #9845.
Since ironing out the details of the reply schema of each and every command can take a long time, we would like to merge this PR when the infrastructure is ready, and let it mature in the unstable branch.
Meanwhile, the changes of this PR are internal: they are part of the repo, but do not affect the produced build.

Background

In #9656 we added a lot of information about Redis commands, but we are missing information about the replies.

Motivation

  1. Documentation. This is the primary goal.
  2. It should be possible, based on the output of COMMAND, to generate client code in typed languages. In order to do that, we need Redis to tell us, in detail, what each reply looks like.
  3. We would like to build a fuzzer that verifies the reply structure (for now we use the existing testsuite; see the "Testing" section).

Schema

The idea is to supply some sort of schema for the various replies of each command.
The schema will describe the conceptual structure of the reply (for generated clients), as defined in RESP3.
Note that the reply structure itself may change, depending on the arguments (e.g. XINFO STREAM, with and without the FULL modifier).
We decided to use the standard json-schema (see https://json-schema.org/) as the reply-schema.

Example for BZPOPMIN:

"reply_schema": {
    "oneOf": [
        {
            "description": "Timeout reached and no elements were popped.",
            "type": "null"
        },
        {
            "description": "The keyname, popped member, and its score.",
            "type": "array",
            "minItems": 3,
            "maxItems": 3,
            "items": [
                {
                    "description": "Keyname",
                    "type": "string"
                },
                {
                    "description": "Member",
                    "type": "string"
                },
                {
                    "description": "Score",
                    "type": "number"
                }
            ]
        }
    ]
}
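
To make the oneOf semantics concrete, here is a minimal hand-rolled sketch (not the project's actual tooling, which uses a standard json-schema validator) of how a reply, converted to JSON, could be checked against the schema above. It supports only the keywords this example uses:

```python
# Minimal json-schema subset checker: supports only "oneOf", "type",
# "minItems"/"maxItems" and positional "items" -- just enough for the
# BZPOPMIN schema above. Illustrative sketch, not the real validator.
def validates(schema, value):
    if "oneOf" in schema:
        # oneOf: exactly one sub-schema must match.
        return sum(validates(s, value) for s in schema["oneOf"]) == 1
    t = schema.get("type")
    if t == "null":
        return value is None
    if t == "string":
        return isinstance(value, str)
    if t == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if t == "array":
        if not isinstance(value, list):
            return False
        if len(value) < schema.get("minItems", 0):
            return False
        if len(value) > schema.get("maxItems", len(value)):
            return False
        items = schema.get("items")
        if isinstance(items, list):  # positional (tuple) validation
            return all(validates(s, v) for s, v in zip(items, value))
        if isinstance(items, dict):
            return all(validates(items, v) for v in value)
        return True
    return True  # unknown keywords are ignored, as in json-schema

bzpopmin_schema = {
    "oneOf": [
        {"type": "null"},
        {"type": "array", "minItems": 3, "maxItems": 3,
         "items": [{"type": "string"}, {"type": "string"}, {"type": "number"}]},
    ]
}

assert validates(bzpopmin_schema, None)                     # timeout
assert validates(bzpopmin_schema, ["zset", "member", 1.5])  # popped
assert not validates(bzpopmin_schema, ["zset", "member"])   # too short
```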

Notes

  1. It is ok that some commands' reply structure depends on the arguments, and it's the caller's responsibility to know which is the relevant one. This follows other request-reply systems like OpenAPI, where the reply schema can also be a oneOf and the caller is responsible for knowing which schema is the relevant one.
  2. The reply schemas will describe RESP3 replies only. Even though RESP3 is structured, we want to use the reply schema for documentation (and possibly to create a fuzzer that validates the replies).
  3. For documentation, the description field will include an explanation of the scenario in which the reply is sent, including any relation to arguments. For example, for ZRANGE's two schemas we will need to state that one is with WITHSCORES and the other is without.
  4. For documentation, there will be another optional field, "notes", in which we will add a short description of the representation in RESP2, in case it's not trivial (RESP3's ZRANGE's nested array vs. RESP2's flat array, for example).

Given the above:

  1. We can generate the "return" section of all commands in redis-doc (given that "description" and "notes" are comprehensive enough)
  2. We can generate a client in a strongly typed language (but the return type could be a conceptual union, and the caller needs to know which schema is relevant). See the section below for RESP2 support.
  3. We can create a fuzzer for RESP3.

Limitations (because we are using the standard json-schema)

The problem is that Redis' replies are more diverse than what the JSON format allows. This means that, when we convert a reply to JSON (in order to validate the schema against it), we lose information (see the "Testing" section below).
The other option would have been to extend the standard json-schema (and the JSON format) to include things like sets, bulk-strings, error-strings, etc., but that would also mean extending the schema validator. That seemed like too much work, so we decided to compromise.

Examples:

  1. We cannot tell the difference between an "array" and a "set".
  2. We cannot tell the difference between a simple-string and a bulk-string.
  3. We cannot verify true uniqueness of items in commands like ZRANGE: json-schema doesn't cover the case of two identical members with different scores (e.g. [["m1",6],["m1",7]]) because uniqueItems compares (member,score) tuples and not just the member name.
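
The third limitation can be shown in a few lines (illustrative sketch): uniqueItems deduplicates whole items, so two entries that share a member name but differ in score still count as "unique":

```python
# A reply json-schema's uniqueItems would accept, although a real
# sorted set can never contain the same member twice:
reply = [["m1", 6], ["m1", 7]]

# What uniqueItems compares: the whole (member, score) pairs.
# The pairs differ, so as far as json-schema is concerned the array IS unique:
assert len({tuple(item) for item in reply}) == len(reply)

# What we would actually like to enforce: uniqueness of member names.
# That duplication goes undetected by the schema:
members = [item[0] for item in reply]
assert len(set(members)) != len(members)
```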

Testing

This commit includes some changes inside Redis in order to verify that the schemas (existing and future ones) are indeed correct (i.e. describe the actual response of Redis).
To do that, we added a debugging feature to Redis that causes it to produce a log of all the commands it executed and their replies.
For that, Redis needs to be compiled with -DLOG_REQ_RES and run with --req-res-logfile <file> --client-default-resp 3 (the testsuite already does that if you run it with --log-req-res --force-resp3).
You should run the testsuite with the above args (and --dont-clean) in order to make Redis generate .reqres files (in the same dir as the stdout files), which contain request-response pairs.
These files are later processed by ./utils/req-res-log-validator.py, which:

  1. Goes over the req-res files generated by the redis-servers spawned by the testsuite (see logreqres.c).
  2. For each request-response pair, validates the response against the request's reply_schema (obtained from the extended COMMAND DOCS).
  3. Note: in order to get good coverage of the Redis commands and all their different replies, we chose to use the existing Redis test suite rather than attempt to write a fuzzer.

Notes about RESP2

  1. We will not be able to use the testing tool to verify RESP2 replies (we are ok with that; it's time to accept RESP3 as the future RESP).
  2. Since the majority of the test suite is using RESP2, and we want the server to reply with RESP3 so that we can validate it, we will need to know how to convert the actual reply to the expected one.
    • Numbers and booleans are always strings in RESP2, so the conversion is easy.
    • Objects (maps) are always a flat array in RESP2.
    • Others (like the nested array in RESP3's ZRANGE) will need some special per-command handling (so the client will not be totally auto-generated).
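
As a sketch of the first two conversion rules (helper names invented here; the real response transformers live in the test suite):

```python
def convert_double(resp2_value):
    # Rule 1: RESP2 returns doubles as strings; RESP3 has a native double type.
    return float(resp2_value)

def convert_flat_map(resp2_array):
    # Rule 2: RESP2 flattens maps into [k1, v1, k2, v2, ...]; rebuild the map.
    return dict(zip(resp2_array[::2], resp2_array[1::2]))

assert convert_double("1.5") == 1.5
assert convert_flat_map(["a", "1", "b", "2"]) == {"a": "1", "b": "2"}
```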

Example for ZRANGE:

"reply_schema": {
    "anyOf": [
        {
            "description": "A list of member elements",
            "type": "array",
            "uniqueItems": true,
            "items": {
                "type": "string"
            }
        },
        {
            "description": "Members and their scores. Returned in case `WITHSCORES` was used.",
            "notes": "In RESP2 this is returned as a flat array",
            "type": "array",
            "uniqueItems": true,
            "items": {
                "type": "array",
                "minItems": 2,
                "maxItems": 2,
                "items": [
                    {
                        "description": "Member",
                        "type": "string"
                    },
                    {
                        "description": "Score",
                        "type": "number"
                    }
                ]
            }
        }
    ]
}

Other changes

  1. Some tests that behave differently depending on the RESP version are now tested with both versions, regardless of the special log-req-res mode ("Pub/Sub PING", for example).
  2. Updated the history field of CLIENT LIST.
  3. Added basic tests for commands that were not covered at all by the testsuite.

TODO

@guybe7 guybe7 requested review from itamarhaber and oranagra April 6, 2022 15:46
@oranagra (Member) left a comment


LGTM.
please add a better top comment before we call others to take a look.
it should include some description of the purpose and design / decisions we already took.
you can maybe copy some from this comment.

also, in order to better judge this, it would be nice to look at what the user facing documentation is gonna look like, but maybe a detailed top comment can fill that place for now.

please also add some todo list which states the status of the task and the remaining issues (like completing the data for all commands, providing module API, and writing docs.)

@oranagra (Member)

@redis/core-team please have a look and comment.
The design is now complete, and the "coding" for now covers only part of the command set (the ones that are important in order to judge this feature and run into all the complicated cases).
i'd like to get a conceptual approval for that, and then merge it into unstable after 7.0 is stable, and before starting to merge any command 7.2 related changes into unstable.

@oranagra oranagra added the approval-needed Waiting for core team approval to be merged label Apr 26, 2022
@valkum (Contributor) commented Jul 17, 2022

Just my 2 cents from a first glance. Overall I think this will help to generate type-safe interfaces for client libraries.
The only things I would nag about are:

Variants

It is ok that some commands' reply structure depends on the arguments and it's the caller's responsibility to know which is the relevant one. this comes after looking at other request-reply systems like OpenAPI, where the reply schema can also be oneOf and the caller is responsible to know which schema is the relevant one.

To get a clean high-level API I would generate one method for each command (maybe this would be considered a medium-level API), and if a subset of arguments causes a specific reply schema to be used, I would create a method for that too (this would be the high-level API).
If I am only interested in one kind of reply schema variant, I will most certainly always pass Option::None or null to the arguments that are not needed for this reply schema variant. So for a very ergonomic high-level interface, a single method for each of those argument => response variant combinations would be desired.
The question is how one could represent these dependencies between arguments and reply schema variants.

Getting to my favorite example: CLIENT KILL
Here we have a clear mapping of old-format vs. new-format (now a top-level oneOf argument type) to one of the two reply schema variants.
Basically we have two options: annotate the arguments, or annotate the reply schema's top-level oneOf variants.

While annotating the argument for the CLIENT KILL command seems to be the easier solution, as you can simply say e.g. "yields: variant a of the reply_schema's top-level oneOf", this could get messy if the dependency is not as clear as with CLIENT KILL, e.g. when there is no top-level oneOf argument type.

Thus, I guess augmenting the reply schema would be the better (more general) solution, as you can directly annotate which arguments in which combination yield the specific variant.
There are a couple of open questions though: how would you address the specific arguments (they currently do not have unique identifiers)? And what is the most powerful argument dependency for a response variant (something that is not a simple "if argument x is passed, this is the result")?

I mean, in the end we (as in, client library authors) can aggregate this information ourselves and feed it into the code generator, but I think having this upstream would be a benefit to all client lib generators.
Maybe, so as not to block this PR, a new issue to discuss different solutions for this problem is desired?
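
As an illustration of the idea (hypothetical generated code; all names here are invented), a generator could emit one method per argument => reply-variant combination, so each method gets a precise return type instead of a union:

```python
# Hypothetical output of a client generator, sketched in Python:
# one method per argument => reply-variant combination.
from typing import List, Tuple

class GeneratedClient:
    def _execute(self, *args):
        # Stand-in for the real protocol layer.
        raise NotImplementedError

    def zrange(self, key: str, start: int, stop: int) -> List[str]:
        # Variant 1: plain member list.
        return self._execute("ZRANGE", key, start, stop)

    def zrange_withscores(self, key: str, start: int,
                          stop: int) -> List[Tuple[str, float]]:
        # Variant 2: WITHSCORES implies the (member, score) pair reply.
        return self._execute("ZRANGE", key, start, stop, "WITHSCORES")
```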

Unique ids

Another thing I would like to see is $ids for the different responses. This could help with associating generated types with resources in the reply schema (putting that in the docs of that type, and maybe using it to derive a name for the data type).

@oranagra (Member)

I agree we better solve this mapping of arguments to reply types.
i think it could be nice to give a unique name (ID) to both args and reply types, but i fear that, specifically for the args, it could be a lot of work, so maybe it should be optional and we'll add it only to the ones needed.
The IDs for the reply type could just be indexes (like we have for key-specs), since they're not nested.
What worries me is that it's hard to add a reference on each and every arg or combination as to what reply type it generates. maybe we should have some way to define a default and a list of exceptions?

@oranagra (Member) commented Mar 8, 2023

i see we have an issue with the 32bit build and one of the new GEO tests.
the reclaim and freebsd failures are probably unrelated.
waiting for valgrind, but i don't expect anything to fail there.

@guybe7 (Collaborator, Author) commented Mar 9, 2023

with fix to GEO on 32bit: https://github.com/guybe7/redis/actions/runs/4374580524

@oranagra (Member) commented Mar 9, 2023

@redis/core-team FYI, i would like to merge this one tomorrow.. let me know if you have any concerns.

@madolson (Contributor) commented Mar 9, 2023

I have no concerns, the actual change to the core seems small.

@soloestoy (Contributor)

didn't CR, only a question about the test part: does it validate the key_specs and other sections?

@guybe7 (Collaborator, Author) commented Mar 10, 2023

@soloestoy no, it only validates the reply schema.
Validating the reply schemas is kinda easy, because the SOT (source of truth) is the existing Redis replies, so we just have to adjust the schemas according to them (the replies themselves don't rely on the reply-schema).
Validating the key specs is a bit harder, because we don't have an independent SOT (COMMAND GETKEYS relies on the key-specs).

@oranagra (Member)

We can log key lookups of each command and then try to match these. But it's a topic for a different campaign..

@oranagra oranagra merged commit 4ba47d2 into redis:unstable Mar 11, 2023
oranagra pushed a commit that referenced this pull request Mar 15, 2023
Redis build runs `utils/generate-command-code.py` if there is a change in `src/commands/*.json` files. 

In #10273, we used f-string formatting in this script. The f-string feature was introduced in python 3.6,
so if a system has an earlier python version, the build might fail.

Added some changes to make the script compatible with earlier python versions.
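
For illustration only (this is not the actual diff), the kind of rewrite involved when dropping the python >= 3.6 requirement:

```python
# Illustrative example of removing an f-string for older-python compatibility.
name, arity = "GET", 2

new_style = f"{name} has arity {arity}"           # f-string, python >= 3.6
old_style = "%s has arity %d" % (name, arity)     # works on older pythons
also_old = "{} has arity {}".format(name, arity)  # str.format, python >= 2.6

assert new_style == old_style == also_old == "GET has arity 2"
```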
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Mar 20, 2023
The reason is that in the reply-schemas-validator mode, the resp of the
client we create will be client_default_resp (currently 3):
```
client *createClient(connection *conn) {
    client *c = zmalloc(sizeof(client));
#ifdef LOG_REQ_RES
    reqresReset(c, 0);
    c->resp = server.client_default_resp;
#else
    c->resp = 2;
#endif
    /* ... */
}
```

But current_resp3 in redis-cli will be inconsistent with it,
so the test adds a simple `hello 3` to avoid this failure (the
test was added in redis#11873).

Added help descriptions for the log-req-res, force-resp3 and
dont-pre-clean options, which were added in redis#10273.
oranagra pushed a commit that referenced this pull request Mar 20, 2023
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
Work in progress towards implementing a reply schema as part of COMMAND DOCS, see redis#9845
### TODO

- [x] (maybe a different PR) add a "condition" field to anyOf/oneOf schemas that refers to args. e.g.
  when `SET` return NULL, the condition is `arguments.get||arguments.condition`, for `OK` the condition
  is `!arguments.get`, and for `string` the condition is `arguments.get` - redis#11896
- [x] (maybe a different PR) also run `runtest-cluster` in the req-res logging mode
- [x] add the new tests to GH actions (i.e. compile with `-DLOG_REQ_RES`, run the tests, and run the validator)
- [x] (maybe a different PR) figure out a way to warn about (sub)schemas that are uncovered by the output
  of the tests - redis#11897
- [x] (probably a separate PR) add all missing schemas
- [x] check why "SDOWN is triggered by misconfigured instance replying with errors" fails with --log-req-res
- [x] move the response transformers to their own file (run both regular, cluster, and sentinel tests - need to
  fight with the tcl including mechanism a bit)
- [x] issue: module API - redis#11898
- [x] (probably a separate PR): improve schemas: add `required` to `object`s - redis#11899

Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
Co-authored-by: Hanna Fadida <hanna.fadida@redislabs.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
Co-authored-by: Shaya Potter <shaya@redislabs.com>
@enjoy-binbin (Contributor)

do we need to add reply_schema for command.json? (the same reply as COMMAND INFO)

@oranagra (Member)

i guess that technically we do. but:

  1. it's a huge pile of lines to clone
  2. i hate that COMMAND command (the fact that it has sub-commands but also works as a plain command), and wish we could someday deprecate it.
  3. given that we don't need it (yet, for the tests), maybe we can hold this back for now?

@guybe7 maybe you remember some other reason why we didn't do that?
