Make update_execution() atomic by khushboobhatia01 · Pull Request #5358 · StackStorm/st2

khushboobhatia01 · 2021-09-14T05:22:28Z

Making update_execution() atomic to avoid concurrent updates causing inconsistency.

In the screenshot below, two concurrent updates to the execution object resulted in inconsistent data: the overall execution status is set to running, but the log field shows that the execution had succeeded.

How did this happen?
Two interleaved update_execution() calls can cause this.

| P1 | P2 |
| --- | --- |
| liveaction.status = running | liveaction.status = succeeded |
| execution.status = running | execution.status = running |
| if condition is not valid. Move forward. | if condition is not valid. Move forward. |
| if condition is not valid. No log is pushed | if condition is valid. `succeeded` log is pushed |
| | Action update happens |
| Action update happens | |

After the above interleaved execution, P1 overwrites the overall execution status (succeeded, set by P2) with running, while the `succeeded` log entry pushed by P2 remains.
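The interleaving described above can be replayed deterministically. The sketch below is a minimal, hypothetical model and not the StackStorm code: `FakeExecution`, `plan_update`, `apply_update`, and the in-process `threading.Lock` are all stand-ins (the PR itself takes a coordinator lock). It also assumes that the first `if` is a completed-state check and the second is the status comparison that decides whether a log entry is pushed.

```python
import threading

# Assumption: statuses treated as "completed" in this toy model.
COMPLETED = {"succeeded", "failed"}


class FakeExecution:
    """Hypothetical stand-in for the execution record."""

    def __init__(self, status):
        self.status = status
        self.log = [status]


def plan_update(execution, liveaction_status):
    """Read phase: decide what to write based on the record as seen now."""
    if execution.status in COMPLETED:
        return None  # completed executions are left untouched
    push_log = liveaction_status != execution.status
    return liveaction_status, push_log


def apply_update(execution, plan):
    """Write phase: apply a plan that may have been computed from stale data."""
    if plan is None:
        return
    status, push_log = plan
    execution.status = status
    if push_log:
        execution.log.append(status)


# Unprotected interleaving: both processes read before either one writes.
racy = FakeExecution("running")
p1_plan = plan_update(racy, "running")
p2_plan = plan_update(racy, "succeeded")
apply_update(racy, p2_plan)  # P2 writes "succeeded" and pushes the log entry
apply_update(racy, p1_plan)  # P1 then overwrites the status with "running"

# Serialized version: read and write happen under one lock.
lock = threading.Lock()


def update_execution(execution, liveaction_status):
    with lock:
        apply_update(execution, plan_update(execution, liveaction_status))


safe = FakeExecution("running")
update_execution(safe, "succeeded")
update_execution(safe, "running")  # stale update is skipped: already completed

print(racy.status, racy.log)
print(safe.status, safe.log)
```

In this model the serialized version ends in `succeeded` whichever update runs first, because the late `running` update sees the completed status and is skipped.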

CLAassistant · 2021-09-15T12:45:23Z

All committers have signed the CLA.


cognifloyd

So basically, all you did was add one line, the with ...get_lock(...): line, and then indent the block of code to protect it within the lock. And then add some more to the tests. Nice.

Why did adding 1 lock in 1 method increase the number of lock side effects so much in the tests?

cognifloyd · 2021-10-02T04:25:56Z

st2common/st2common/services/executions.py

-        with Timer(key="action.executions.calculate_result_size"):
-            result_size = len(
-                ActionExecutionDB.result._serialize_field_value(liveaction_db.result)
+    with coordination.get_coordinator().get_lock(liveaction_db.id):


Could we add a prefix to this to scope the change just to this block? Or are there other places already using the live action id as the lock name that you also want to block?
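To illustrate the prefix idea, here is a minimal sketch with an in-memory stand-in for the coordinator. `InMemoryCoordinator`, `execution_lock_name`, the `update_execution:` prefix, and the sample id are all hypothetical and are not taken from the PR or from st2.

```python
import threading
from collections import defaultdict


class InMemoryCoordinator:
    """Hypothetical stand-in for a coordinator: one lock object per name."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)

    def get_lock(self, name):
        return self._locks[name]


def execution_lock_name(liveaction_id):
    # Hypothetical prefix that scopes the lock to the execution update.
    return "update_execution:%s" % liveaction_id


coordinator = InMemoryCoordinator()
liveaction_id = "example-liveaction-id"  # made-up id for illustration

bare = coordinator.get_lock(liveaction_id)
scoped = coordinator.get_lock(execution_lock_name(liveaction_id))

# Holding a lock named by the bare id does not block the prefixed lock.
with bare:
    acquired = scoped.acquire(blocking=False)
    if acquired:
        scoped.release()

print(acquired)
```

With distinct names, other code that happens to lock on the bare id would not contend with the execution update.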

khushboobhatia01

@cognifloyd No, get_lock() is not being used anywhere else.
And to answer why the tests are failing: https://github.com/StackStorm/st2/blob/master/st2actions/tests/unit/test_workflow_engine.py#L155-L159 mocks get_lock() to return ToozConnectionError, in order to test this line (st2/st2common/st2common/services/workflows.py, line 941 in 007beed):

with coord_svc.get_coordinator(start_heart=True).get_lock(wf_ex_id):

Because of the above mocking, action execution update calls will fail with my changes and cause the test assertions to fail. I'm working on the fix for this.
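The effect of such a mock can be reproduced in isolation. This is a minimal sketch using only `unittest.mock`. `FakeConnectionError`, `FakeCoordinator`, and `update_record` are hypothetical stand-ins for the real error type, coordinator, and update function.

```python
import threading
from unittest import mock


class FakeConnectionError(Exception):
    """Hypothetical stand-in for the coordinator's connection error."""


class FakeCoordinator:
    """Hypothetical coordinator that hands out an in-process lock."""

    def get_lock(self, name):
        return threading.Lock()


def update_record(coordinator, record):
    # Any code path that takes the lock now depends on get_lock() working.
    with coordinator.get_lock("record-lock"):
        record["updated"] = True
    return record


# Without the mock, the update goes through.
unmocked = update_record(FakeCoordinator(), {})

# With get_lock() patched to raise, the same update fails before it starts.
failing_get_lock = mock.MagicMock(side_effect=FakeConnectionError("foobar"))
with mock.patch.object(FakeCoordinator, "get_lock", failing_get_lock):
    try:
        update_record(FakeCoordinator(), {})
        mocked_failed = False
    except FakeConnectionError:
        mocked_failed = True

print(unmocked, mocked_failed)
```

Once a second code path takes the same kind of lock, a test that patches `get_lock()` globally affects that path too, which is why existing assertions can start failing.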

khushboobhatia01 · 2021-10-05T07:01:23Z

@cognifloyd All the tests are passing now. Could you please review and merge the request? Thanks

cognifloyd

LGTM. I'm not sure if this will get merged for 3.6.0 or if we'll wait till 3.7.0 to include it.

arm4b · 2021-10-05T19:15:20Z

I remember there is another, more involved PR related to changing the behavior in st2workflowengine: #5367.
This one, #5358, looks like an easy hotfix; the tests are good and there are no blockers here. It would be really nice if we could include it in v3.6.0.

@khushboobhatia01 Could you please include the Changelog record as well to make this PR complete?

cognifloyd · 2021-10-05T20:13:06Z

I added a changelog entry

arm4b · 2021-10-05T21:21:22Z

Thanks, looks like we're all good ✔️ on this one!

Make update_execution() atomic

14cf46c

pull-request-size bot added the size/M label (PR that changes 30-99 lines. Good size to review.) Sep 14, 2021

khushboobhatia01 and others added 2 commits September 14, 2021 11:46

Black reformat

20045dc

Merge branch 'master' into bkhushboo/race_condition

5fa3ecd

khushboobhatia01 added 3 commits September 15, 2021 18:21

Fix chunk 2 test cases

a33c319

Merge branch 'bkhushboo/race_condition' of github.com:khushboobhatia01/st2 into bkhushboo/race_condition

4250e23

Fix chunk 2 unit test cases

7c42aad

pull-request-size bot added the size/L label (PR that changes 100-499 lines. Requires some effort to review.) and removed the size/M label (PR that changes 30-99 lines. Good size to review.) Sep 15, 2021

khushboobhatia01 and others added 5 commits September 15, 2021 20:30

Retrigger CI

1d12cfe

Retrigger CI

67cc829

Merge branch 'master' into bkhushboo/race_condition

ba3ad95

Retrigger CI

d6a999c

Retrigger CI

3ce5819

cognifloyd reviewed Oct 2, 2021

khushboobhatia01 and others added 3 commits October 5, 2021 11:40

Fix test case

4eb19d5

Reformat

44f691e

Merge branch 'master' into bkhushboo/race_condition

7b365f1

cognifloyd approved these changes Oct 5, 2021

Merge branch 'master' into bkhushboo/race_condition

2944aed

cognifloyd added this to the 3.6.0 milestone Oct 5, 2021

add changelog entry

ec52b89

cognifloyd merged commit f4fcf13 into StackStorm:master Oct 5, 2021

cognifloyd mentioned this pull request Oct 7, 2021

Refactor FileWatchSensor to remove logshipper #5096

Draft

arm4b mentioned this pull request Nov 21, 2021

Add Khushboo Bhatia (@khushboobhatia01) as a Contributor #5450

Merged

cognifloyd mentioned this pull request Dec 7, 2021

Executions can not be proceeded with errors in st2actionrunner and st2scheduler #5483

Closed

arm4b mentioned this pull request Dec 7, 2021

Fix etcd lock error #5484

Merged

cognifloyd merged 16 commits into StackStorm:master from khushboobhatia01:bkhushboo/race_condition


4 participants
