Skip to content

Porter - return meaningful error if there are not enough Ursulas#2772

Merged
derekpierre merged 9 commits intonucypher:mainfrom
piotr-roslaniec:handle-missing-ursulas
Aug 26, 2021
Merged

Porter - return meaningful error if there are not enough Ursulas#2772
derekpierre merged 9 commits intonucypher:mainfrom
piotr-roslaniec:handle-missing-ursulas

Conversation

@piotr-roslaniec
Copy link
Contributor

@piotr-roslaniec piotr-roslaniec commented Aug 13, 2021

Type of PR:

  • Bugfix
  • Feature
  • Documentation
  • Other

Required reviews:

  • 1
  • 2
  • 3

What this does:

  • Currently, if there are not enough Ursulas in reservoir, Porter will return 0 total failures recorded error from concurrency.py
  • This PR checks for the number of Ursulas before contacting them; if the number of Ursulas is too low, it returns a meaningful error message to the user

Notes for reviewers:

  • Found this to be useful while testing on Rinkeby

@derekpierre
Copy link
Member

Interesting - block_until_number_of_known_nodes_is() typically handles that case:

Ran the following on Lynx.

In [2]: payload = {'quantity': 20, 'duration_periods': 5}

In [4]: r = requests.get("http://localhost:9155/get_ursulas", json=payload)

In [5]: r
Out[5]: <Response [500]>

In [6]: r.text
Out[6]: "After 10 seconds and 30 rounds, didn't find 20 nodes"

How did you test it? It may be related to Ibex being a bit of a shit show for nodes.

@derekpierre
Copy link
Member

derekpierre commented Aug 13, 2021

I see why. On Ibex:

In [16]: payload = {'quantity': 20, 'duration_periods': 1000}

In [17]: r = requests.get("http://localhost:9155/get_ursulas", json=payload)

In [18]: r
Out[18]: <Response [500]>

In [19]: r.text
Out[19]: '0 total failures recorded'

The WorkerPool used by Porter will return either TimedOut or OutOfValues when block_until_target_successes() is called - https://github.com/nucypher/nucypher/blob/main/nucypher/utilities/concurrency.py#L233.

We probably have a better shot of more generally catching those exceptions instead, and raising something more informative. Analogous situation when publishing the treasure map - https://github.com/nucypher/nucypher/blob/main/nucypher/utilities/porter/porter.py#L129.

@piotr-roslaniec
Copy link
Contributor Author

@derekpierre I see, let me rewrite this. Btw. there are like 2 good Ursulas on Ibex

@derekpierre
Copy link
Member

derekpierre commented Aug 17, 2021

I had a few thoughts after thinking about this some more:

  1. The messaging from the WorkerPool could be improved
  2. The WorkerPool is used as part of other calls within these functions that aren't caught by this change such as _make_staker_reservoir() and block_until_number_of_known_nodes_is() so the exceptions of the pool can still creep into the response from Porter.
  3. I'm concerned about returning RuntimeError from the Porter learner class, since technically it can also be called via the Python API and returning RuntimeError is faulty in that case i.e. not really a runtime error scenario if get_ursulas can't return the expected number of ursulas or it times out. Potentially we could return a PorterException (and/or subclasses of PorterException) that wraps any underlying execution exception that needs to be raised by Porter or something like that. However, when I originally wrote Porter I didn't want to create a PorterException because I wanted similar exceptions that were thrown by individual characters Alice/Bob to be also thrown by Porter on the python side and RPC side.

Played around with modifying the WorkerPool messaging which I think was too vague to begin with for all uses of it (not just Porter) and this should help the original problem you had. See the PR I created to merge into this branch - piotr-roslaniec#1. Interested to get your thoughts.

@derekpierre
Copy link
Member

^ @KPrasch , @vepkenez , @fjarri feel free to weigh in as well.

@derekpierre
Copy link
Member

derekpierre commented Aug 18, 2021

@piotr-roslaniec I fixed the failing concurrency tests due to my changes. Could you add a newsfragment?

Since both of us have changes in this PR, let's get others to review.

@derekpierre derekpierre added Porter ux design User experience enhancements labels Aug 18, 2021
@piotr-roslaniec
Copy link
Contributor Author

I liked the changes to the WorkerPool and I think it's a step in the right direction for error handling in Porter.

I was wondering how much Porter API consumers should know about the reasons behind API call failures. For example (correct me if I'm wrong) the response will now contain WorkerPool error messages that are fairly vague to the caller ("something timed out", "something failed"). Is it the "right amount" of information? We could provide more ("Urulas timed-out"), or less (just an HTTP error code, 404).

I prefer the latter (less complexity for the caller), with the assumptions that Porter API calls are "robust" or "idempotent" (API caller can just repeat the call without unintended consequences).

Maybe I am overthinking this. Any thoughts?

@derekpierre
Copy link
Member

I was wondering how much Porter API consumers should know about the reasons behind API call failures. For example (correct me if I'm wrong) the response will now contain WorkerPool error messages that are fairly vague to the caller ("something timed out", "something failed"). Is it the "right amount" of information? We could provide more ("Urulas timed-out"), or less (just an HTTP error code, 404).

I prefer the latter (less complexity for the caller), with the assumptions that Porter API calls are "robust" or "idempotent" (API caller can just repeat the call without unintended consequences).

I hear ya. It's a tough balance - you want to return something to the client in the error case, that provides some context but doesn't overwhelm them. The "timed out" case is simple and in most cases users will understand and trying again suffices, so saying "Timed out" type message is probably sufficient. The "out of values" case is trickier because retrying doesn't necessarily mean it will work - depends on the reason - but we don't want the entire stack trace which the WorkerPool was doing previously.

The current Porter API functions are idempotent, so retrying does not have unintended consequences, it's just whether you can identify the reason, and do something about it. It will also help us when people come to us and say Porter returned some status code, and we have no idea what to tell them because there is only a status code. Remember that clients that use a Porter service may not have access to Porter's logs because the service is being run by a separate entity.

@derekpierre derekpierre force-pushed the handle-missing-ursulas branch from 2314092 to ba3ad64 Compare August 20, 2021 18:40
@derekpierre
Copy link
Member

@KPrasch , @fjarri , @vepkenez: ready for review. We can be even more specific for Porter, but all REST endpoints will have to possibly deal with WorkerPool exceptions so just generalizing it for now. Thoughts?

piotr-roslaniec and others added 4 commits August 25, 2021 19:52
…eptions raised. If callers need more context they can use the get_tracebacks() function on the exception to obtain all execution failure tracebacks.
…response message - ensure WorkerPoolExceptions provide access to relevant data.
@derekpierre derekpierre force-pushed the handle-missing-ursulas branch from ba3ad64 to 2620d55 Compare August 26, 2021 00:02
@derekpierre derekpierre requested a review from fjarri August 26, 2021 13:00
@derekpierre derekpierre merged commit bcd6071 into nucypher:main Aug 26, 2021
derekpierre added a commit to derekpierre/nucypher that referenced this pull request Aug 30, 2021
@piotr-roslaniec piotr-roslaniec deleted the handle-missing-ursulas branch September 2, 2021 09:04
derekpierre added a commit to derekpierre/nucypher that referenced this pull request Sep 8, 2021
derekpierre added a commit to derekpierre/nucypher that referenced this pull request Sep 9, 2021
derekpierre added a commit to derekpierre/nucypher that referenced this pull request Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ux design User experience enhancements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants