Skip to content

Feature: Asynchronous validators#565

Closed
rmedaer wants to merge 20 commits intopython-jsonschema:masterfrom
rmedaer:master
Closed

Feature: Asynchronous validators#565
rmedaer wants to merge 20 commits intopython-jsonschema:masterfrom
rmedaer:master

Conversation

@rmedaer
Copy link

@rmedaer rmedaer commented May 23, 2019

This (WIP) pull-request contains a new feature ! It allows user to write (custom) asynchronous validators.
It has been already discussed in #499 . It is here an attempt to integrate the mechanism without having to rewrite the whole library.

@Julian
Copy link
Member

Julian commented May 23, 2019

Hi! Thanks so much for trying to tackle this. I need to spend some time reading through exactly what you're trying to make work, but I can say from a first look that this as is looks like a backwards incompatible change -- so at least, there's that to work through.

E.g., existing validators, that previously only dealt with exceptions, now might be being passed through your new sentinel objects.

Will try to have a closer look though.

@rmedaer
Copy link
Author

rmedaer commented May 24, 2019

Thanks so much for trying to tackle this.

You're welcome ! ;-)

(...) is looks like a backwards incompatible change

I'm not sure there is incompatible change. Here is an explanation of the way it works. I hope it will help you (or anyone) to understand the mechanism and how to use it...

My idea is to use a kind of generator-based coroutine. It's basically a way to pause/break code execution using the yield keyword. Some frameworks (like Tornado) had to use this mechanism when async/await keywords were not available (< 3.5). This kind of code interruption is exactly what I'm looking for in the validation process to use with asynchronous validator.

Hopefully in @Julian's jsonschema implementation, the validation is a big generator which iterates recursively on each schema keyword.
Thanks to this implementation, I "only" have to implement my own Future/coroutine object which will be yielded when we detect an asynchronous validator. This object contains the asynchronous validator to call with its arguments.
Obviously we don't want to forward this object to the user as ValidationError. Instead we intercept it in async_iter_errors which is running asynchronously the validator in current async context.
Although if user is calling iter_errors, this object is not intercepted. It has to be handled as a SchemaError.
The limitation I detected for now is the "error buffering" (aka not transversal). Meaning somewhere in the code where the error are not directly forwarded to the user code. I detected and fixed 2 validators where I have to forward error if and only if it's a AsyncValidationError.

Here are illustrations of code flow with asynchronous validator compared to synchronous one:

Synchronous validator with async_iter_errors in between:
async_validation (1)

Asynchronous validator flow:
async_validation (2)

@Julian
Copy link
Member

Julian commented Jun 4, 2019

Thanks! That diagram is helpful. Apologies for still not getting a chance to look through this more carefully then, but trust me I haven't forgotten it.

@rmedaer
Copy link
Author

rmedaer commented Oct 18, 2019

Hi Julian,

Did you have some times to review this merge request ? Is there anything I can do to help you ?

Kind regards,

@Julian
Copy link
Member

Julian commented Oct 20, 2019 via email

@Julian
Copy link
Member

Julian commented Nov 3, 2019

So I'm finally getting at least a moment to even read through this -- first most basic question -- I see you implemented it for anyOf -- is your initial goal specifically for "compound" validators? i.e. anyOf, allOf, oneOf?

And you're trying to get the event loop a chance to run in between each of the branches?

Or is there some other reason anyOf is special here?

@rmedaer
Copy link
Author

rmedaer commented Nov 4, 2019

So I'm finally getting at least a moment to even read through this -- first most basic question -- I see you implemented it for anyOf -- is your initial goal specifically for "compound" validators? i.e. anyOf, allOf, oneOf?

I had to do changes for "compound" validators because of following kind of code branching:

errs = list(validator.descend(instance, subschema, schema_path=index))
if not errs:
    break

Indeed, if an AsyncValidationBreakpoint is yielded by children (aka descendant) validators, I have first to forward it before testing there is actually an error...

And you're trying to get the event loop a chance to run in between each of the branches?

It's exactly what I'm doing with the AsyncValidationBreakpoint mechanism. When a validator is asynchronous, it raises an exception AsyncValidationBreakpoint which ascends up to async_iter_errors function.

I also implemented it for oneOf.

For allOf: not any change is required since all the errors are just yielded

@rmedaer
Copy link
Author

rmedaer commented Nov 29, 2019

Hi @Julian,

Do you need more details for this pull request ? Can I help you in any way ?

@Julian
Copy link
Member

Julian commented Nov 30, 2019

Hi @rmedaer -- really sorry this keeps dragging, but still have too many things on my plate -- the next short term goal for jsonschema is still to get support for the new draft out [2019-09] (which is by now no longer new :/)

I hope after that I can finally think about whether I agree with the approach here or not.

Sorry again... too many things, too little time :(

@Julian
Copy link
Member

Julian commented Nov 30, 2019

The one other thing that occurs to me immediately though is -- async_iter_errors should be fairly easy to implement externally as a normal function to start, no?

If the only long operation you're interrupting here is anyOf and oneOf, you'll spend longer in those two methods than you might want to, but that may be ok depending on your async app.

Which is part of what makes it not immediately obvious to me that this is the correct approach -- it doesn't actually get us very much async -- e.g., there still wouldn't be an easy way to inject in asyncronous ref resolution.

The only real internal async thing this would do is let event loops run in between iterations of any/oneOf, which doesn't seem like a huge win if you don't have the other stuff over just running the event loop between iterations of iter_errors.

(All of which I meant to elaborate on at some point but yeah see above on not enough time :( :( )

@Julian
Copy link
Member

Julian commented Jan 16, 2020

@rmedaer any thoughts?

@rmedaer
Copy link
Author

rmedaer commented Jan 17, 2020

Hi @Julian

The one other thing that occurs to me immediately though is -- async_iter_errors should be fairly easy to implement externally as a normal function to start, no?

Not sure I fully understand. What do you mean by "fairly easy to implement externally as a normal function to start" ?

Would you mean that we should have a equivalent to validate function for async_iter_errors ?
Or would you mean that we should "externalize" async_iter_errors in another file/project/repo ?

If the only long operation you're interrupting here is anyOf and oneOf, you'll spend longer in those two methods than you might want to, but that may be ok depending on your async app.

Actually I'm not interrupting anyOf and oneOf more than other validator. anyOf and oneOf are just compound validators which needed some refactoring to allow AsyncValidationBreakpoint interruptions.

(...) it doesn't actually get us very much async -- e.g., there still wouldn't be an easy way to inject in asyncronous ref resolution.

I can implement asynchronous ref resolution (and others) if needed. This PR is a first step in async validation. Btw, if we decide to go to async (and lazy-loaded) ref resolution, we will need this PR anyway.

The only real internal async thing this would do is let event loops run in between iterations of any/oneOf

Apologize if I misspoke. With this PR you can use async "anywhere" with "any validator". Not only between iterations of any/oneOf.

For instance:

async def async_validator(validator, value, instance, schema):
    await asyncio.sleep(0)
    if not value:
        yield exceptions.ValidationError(u"Async whoops!")

all_validators = dict(validators.Draft7Validator.VALIDATORS)
all_validators['async_valid'] = async_validator
validator = validators.create(
    meta_schema=validators.Draft7Validator.META_SCHEMA,
    validators=all_validators,
)({})

instance = {}
schema = {
    {
        "type": "object",
        "properties": {
            "example": {
                u"async_valid": True
            }
        }
    },
}
errors = [e async for e in validator.async_iter_errors(instance, schema)]

rmedaer added 7 commits March 4, 2020 08:24
Implementing mechanism explained in
python-jsonschema#499 (comment)
to introduce async validators.

I refactored bufferized validator such as `anyOf` and `oneOf` to pipe
AsyncValidationBreakpoint until async_iter_errors.

If iter_errors is called (instead of async_iter_errors) and async
validator is called, it will raise AsyncValidationBreakpoint which is a
SchemaError.
@codecov
Copy link

codecov bot commented Mar 4, 2020

Codecov Report

Merging #565 into master will decrease coverage by 0.1%.
The diff coverage is 91.54%.

@@            Coverage Diff             @@
##           master     #565      +/-   ##
==========================================
- Coverage   94.98%   94.87%   -0.11%     
==========================================
  Files          18       19       +1     
  Lines        2433     2498      +65     
  Branches      309      321      +12     
==========================================
+ Hits         2311     2370      +59     
- Misses        101      103       +2     
- Partials       21       25       +4

@rmedaer
Copy link
Author

rmedaer commented Mar 23, 2020

Hi @Julian ,

I try to fix the tests, however in list of interpreters there is still py2. Is it a mistake ? Otherwise how would you like to organize tests to run some of them only from minimum python version (3.6 in my case) ?

@rmedaer rmedaer marked this pull request as ready for review March 23, 2020 16:29
@Julian
Copy link
Member

Julian commented Mar 25, 2020

@rmedaer will try to make some time in the next week or two to give this another pass -- thanks truly again for being on top of it.

On the py2 thing specifically, jsonschema still supports Py2 (at least versions of Py2 that are still supported, i.e. PyPy) -- this will change at some point but no timeline yet. From what I saw I think what you did looked reasonable, but will have a look.

@rmedaer
Copy link
Author

rmedaer commented May 6, 2020

Hi @Julian ,

will try to make some time in the next week or two to give this another pass

Did you had some time to a new pass ?

@Julian
Copy link
Member

Julian commented May 21, 2020

Hey @rmedaer thanks for being so on top of this... I did, but honestly it hasn't yet sat fully well with me as-is given that it handles only the validator side and not the ref resolution side. I still don't really have a full idea of what to do about that yet. Would love to see comments from any other users certainly who want this or otherwise have an opinion.

@Julian Julian closed this Oct 20, 2020
@Julian Julian deleted the branch python-jsonschema:master October 20, 2020 16:37
Julian added a commit that referenced this pull request Jun 30, 2022
1047a1aa Merge pull request #565 from json-schema-org/update-test-schema
7967a87b Merge pull request #566 from json-schema-org/tidy-unevaluatedItems
b45a7878 Remove unnecessary type validation from unevaluatedItems tests.
2b536c02 Update the test schema to Draft 2020, and fix a bug in it.

git-subtree-dir: json
git-subtree-split: 1047a1aa225bab43c50978555a990bfff81f6023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants