Use default/dynamic active staker pagination size and standardize Porter execution timeouts by derekpierre · Pull Request #2732 · nucypher/nucypher

derekpierre · 2021-06-30T18:22:23Z

Type of PR:

Bugfix
Feature
Documentation
Other

Required reviews:

1
2
3

What this does:
Based on #2721 (Initial Porter Documentation), so only the latest commit is additional work.

Default/Dynamic Pagination Size

When attempting to get active stakers from StakingEscrow as part of the sampling logic, the eth_call can be quite heavy due to network size and can sometimes time out:

In [51]: r = requests.get("http://127.0.0.1:9155/get_ursulas", json=payload)

In [52]: r
Out[52]: <Response [500]>

In [53]: r.text
Out[53]: "{'code': -32000, 'message': 'execution aborted (timeout = 5s)'}"

This causes Porter to sometimes fail when attempting to sample Ursulas. The timeout occurs intermittently, probably because of some caching done by the underlying eth node. In my testing I've been using Infura, which uses Geth. Geth has a hard-coded timeout of 5s for contract calls: https://github.com/ethereum/go-ethereum/blob/master/internal/ethapi/api.go#L950.

I assume that since Porter and Alice use the same code, and since Alice doesn't specify a pagination size either, Alice will encounter this same problem. So instead of just fixing it for Porter, I made a generalized default solution.

If no pagination size is specified, the pagination size used is 0 (unlimited); this is currently the case for both Porter and Alice when creating the staker reservoir. The call therefore attempts to obtain all active stakers at once, which is heavy and sometimes times out. Instead, I propose that we use a default size when no pagination value is provided to the API (i.e. pagination_size is None). A value of 0 will still mean unlimited if it is explicitly provided as the pagination size.

The right value for this size is tricky: it can vary by node type (Geth etc.) and by whether the node is a light node. Consequently, the pagination sizes for regular and light nodes have default values (1000 for a regular node, 30 for a light node) that can be overridden via environment variables. The environment variables give us flexibility, so we don't have to determine the "correct" value for every scenario.
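
A minimal sketch of how the size could be resolved under these rules. The environment variable names and the helper name are hypothetical placeholders, not necessarily what the PR uses; only the defaults (1000 and 30) and the None-versus-0 behavior come from the description above.

```python
import os
from typing import Mapping, Optional

# Default values taken from the PR description.
DEFAULT_PAGINATION_SIZE = 1000
DEFAULT_PAGINATION_SIZE_LIGHT_NODE = 30

# Hypothetical environment variable names, used only for illustration.
PAGINATION_SIZE_ENVVAR = "EXAMPLE_PAGINATION_SIZE"
PAGINATION_SIZE_LIGHT_NODE_ENVVAR = "EXAMPLE_PAGINATION_SIZE_LIGHT_NODE"


def resolve_pagination_size(pagination_size: Optional[int],
                            is_light_node: bool,
                            environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the pagination size to use for the active stakers call."""
    if pagination_size is not None:
        if pagination_size < 0:
            raise ValueError("pagination size must be >= 0")
        # An explicit value always wins, including 0 (unlimited).
        return pagination_size

    # Nothing was specified: fall back to the (overridable) default.
    environ = os.environ if environ is None else environ
    if is_light_node:
        name, default = PAGINATION_SIZE_LIGHT_NODE_ENVVAR, DEFAULT_PAGINATION_SIZE_LIGHT_NODE
    else:
        name, default = PAGINATION_SIZE_ENVVAR, DEFAULT_PAGINATION_SIZE
    return int(environ.get(name, default))
```

Passing the environment as a parameter is only a convenience for testing the sketch; reading os.environ directly would behave the same way.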

Additionally, to provide some dynamism, if getting the active stakers fails because the pagination size used is too large, the size is halved and the call is retried. This is repeated for a maximum of 3 total attempts.

For example, if the pagination size used is 1000:

try a pagination size of 1000 (attempt 1)
if 1000 fails, retry with a pagination size of 500 (attempt 2)
if 500 fails, retry with a pagination size of 250 (attempt 3)

If attempt 3 fails, a failure exception is raised. Note that this sequence is attempted for any non-zero pagination size provided, including the defaults.
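
The halving sequence can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in agents.py: the function name is hypothetical, `fetch` stands in for the paginated contract call, and every exception is treated as size-related.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_ATTEMPTS = 3  # total attempts, as in the description above


def call_with_shrinking_pages(fetch: Callable[[int], T],
                              pagination_size: int,
                              max_attempts: int = MAX_ATTEMPTS) -> T:
    """Call fetch(size), halving the size after each failed attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if pagination_size == 0:
        # 0 means unlimited: there is nothing to halve, so call once.
        return fetch(0)

    size = pagination_size
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(size)
        except Exception:
            # Assumption: any failure here is caused by the page size.
            # Give up on the last attempt or when the size cannot shrink.
            if attempt == max_attempts or size == 1:
                raise
            size = max(size // 2, 1)
    raise AssertionError("unreachable: the loop returns or raises")
```

Starting from 1000, a `fetch` that only succeeds at 250 or below is called with 1000, 500, and 250 in that order; if the third call also fails, its exception propagates to the caller.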

Porter Standardized Execution Timeout

Porter now uses a standardized execution timeout: all executions that take a timeout use the same default (10s). It felt weird for some executions to use different values than others.

Issues fixed/closed:
Related to #1424.

Notes for reviewers:

Thoughts about the environment variable approach?
Thoughts about how dynamic to make the retries and pagination size? - I went for simplicity
Thoughts on a Porter default execution timeout, as opposed to using default timeout values where available?

KPrasch · 2021-07-03T23:09:12Z

nucypher/blockchain/eth/agents.py

+                                                                              start_index,
+                                                                              pagination_size).call()
+                except Exception as e:
+                    if 'timeout = 5s' not in str(e):


This seems a bit fragile, since it depends on the exception containing this string exactly. Not an RFC, just commentary.

derekpierre · 2021-07-03

I hear ya. It feels like we can do one of the following:

simply check for the word "timeout" instead of including the duration (it is currently a Geth setting that can't be changed)
OR
don't perform any specific check on the text, and assume that any exception is caused by the timeout (i.e. remove the "if" check)
OR
...?
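
As a rough illustration of the first option, a looser check might look like the sketch below. The helper name is hypothetical; the sample message in the usage note is the one returned in the 500 response from the PR description.

```python
def is_timeout_error(error: Exception) -> bool:
    """Loose check: does the error message mention a timeout at all?

    Matching only the word "timeout" avoids depending on the exact
    duration (e.g. "timeout = 5s") reported by the node.
    """
    return "timeout" in str(error).lower()
```

With this predicate, an error such as "execution aborted (timeout = 5s)" would trigger a retry, while an unrelated failure would still be re-raised immediately.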

KPrasch · 2021-07-03T23:40:36Z

Thoughts about the environment variable approach?

Seems like a good approach for a minimal configuration. Let's not overthink it for now until we start getting some actual usage.

Thoughts about how dynamic to make the retries and pagination size? - I went for simplicity

Simplicity is the best principle at this stage of Porter's development 👍🏻

Thoughts on a Porter default execution timeout, as opposed to using default timeout values where available?

Same as above. Use a high timeout by default, and let usage dictate optimization.

Properly handle pagination sizes when getting active stakers from Sta…

779ec2e

…kingEscrow, and standardize porter execution timeouts.

derekpierre added labels Jun 30, 2021: Web (Webpages); Alice 👩 (Effects the "Alice" development area); ux design (User experience enhancements)

derekpierre added this to the Porter v1 (MVP) milestone Jun 30, 2021

derekpierre requested review from KPrasch, cygnusv, piotr-roslaniec and vzotova June 30, 2021 18:22

derekpierre self-assigned this Jun 30, 2021

This was referenced Jul 2, 2021

[EPIC] Porter MVP - "Infura for NuCypher" #2664

Merged

Publish Porter docker image #2733

Merged

KPrasch changed the base branch from porter to main July 3, 2021 23:04

KPrasch changed the base branch from main to porter July 3, 2021 23:04

KPrasch reviewed Jul 3, 2021

KPrasch approved these changes Jul 3, 2021

derekpierre requested a review from vepkenez July 3, 2021 23:45

vzotova approved these changes Jul 4, 2021

Make pagination exception check a little less specific.

d0a28f5

derekpierre merged commit 25c406f into nucypher:porter Jul 5, 2021

derekpierre merged 2 commits into nucypher:porter from derekpierre:timeouts

3 participants
