Skip to content

Support for safely shutting down the NeoSystem#574

Closed
jsolman wants to merge 15 commits intomasterfrom
SafeShutdown
Closed

Support for safely shutting down the NeoSystem#574
jsolman wants to merge 15 commits intomasterfrom
SafeShutdown

Conversation

@jsolman
Copy link
Copy Markdown
Contributor

@jsolman jsolman commented Jan 24, 2019

We need to ensure the actors fully stop in the right order disposing the ActorSystem. This change uses CoordinatedShutdown and registers a tasks to wait for the LocalNode and Blockchain to actually stop before continuing with shutdown.

Fixes #570

@jsolman jsolman changed the title Support for safely shutting down the Neo. Support for safely shutting down Neo Jan 25, 2019
@jsolman jsolman changed the title Support for safely shutting down Neo Support for safely shutting down the NeoSystem Jan 25, 2019
@erikzhang
Copy link
Copy Markdown
Member

https://getakka.net/articles/actors/coordinated-shutdown.html

Can we just use the "Coordinated Shutdown" for safely shutting down the NeoSystem?

@jsolman
Copy link
Copy Markdown
Contributor Author

jsolman commented Jan 25, 2019

Looks like it is available in the core parts of Akka. I'll work on switching this to use CoordinatedShutdown.

@jsolman
Copy link
Copy Markdown
Contributor Author

jsolman commented Jan 25, 2019

@erikzhang The solution is clean now and uses CoordinatedShutdown

@vncoelho
Copy link
Copy Markdown
Member

Sounds good to me, Jeff.
Just wonder if it is possible to test this following that strategy used for the Consensus, but here starting everything and closing unexpectedly.

@jsolman
Copy link
Copy Markdown
Contributor Author

jsolman commented Jan 27, 2019

but here starting everything and closing unexpectedly.

@vncoelho It's working fine from my testing, can you give more details about the error you are getting?

@vncoelho
Copy link
Copy Markdown
Member

vncoelho commented Jan 28, 2019

I did not get errors, I did not have time to check yet. I was mentioning about the possibility of a Unit Test.

@jsolman
Copy link
Copy Markdown
Contributor Author

jsolman commented Jan 28, 2019

I was mentioning about the possibility of a Unit Test.

Well, there is certainly the possibility, and it would be nice to improve unit test coverage for the Blockchain class in general. For now I think it is relying a lot on manual integration testing. I suppose a unit test for the behavior of ShutdownAndWaitForIdle shouldn't be too difficult to add at some point, though not sure I will add it right now.

I've done manual integration testing of these changes with neo-cli.

@vncoelho
Copy link
Copy Markdown
Member

vncoelho commented Jan 28, 2019

That is it, maybe following this template:

TestProbe subscriber = CreateTestProbe();

TestActorRef<ConsensusService> actorConsensus = ActorOfAsTestActorRef<ConsensusService>(

We can create and initialize the actors and, then, kill/stop it.
I can try to do it in a couple of days.

@jsolman
Copy link
Copy Markdown
Contributor Author

jsolman commented Jan 29, 2019

@erikzhang I made changes to the PR so it no longer waits for the Blockchain to be idle. Now it just ensures the actors are stopped in the right order. It adds two CoordinatedShutdown phases the user can add tasks to in order to delay shutdown according to their needs. Akka doesn't seem to provide any way to tell when the Actor is stopped, except it provides the PostStop lifecycle hook in the actor, so I had to expose methods that use a semaphore to be able to wait asynchronously until the actor is stopped.

@jsolman
Copy link
Copy Markdown
Contributor Author

jsolman commented Feb 20, 2019

Should be good now, just needs a bit of testing.

@erikzhang
Copy link
Copy Markdown
Member

I think stopping an actor and waiting for it to complete can be done with GracefulStop().

https://getakka.net/articles/actors/receive-actor-api.html#graceful-stop

GracefulStop is useful if you need to wait for termination or compose ordered termination of several actors.

@jsolman
Copy link
Copy Markdown
Contributor Author

jsolman commented Feb 20, 2019

@erikzhang that would require it processing all the messages in the queue to get to the message that stops it gracefully, which you said you didn't want to do earlier. If we want to throw the current messages in the actor queues away, then that doesn't work for us.

protected override void OnReceive(object message)
{
if (message is Terminated t && ActorSemaphores.TryGetValue(t.ActorRef, out SemaphoreSlim semWait))
semWait.Release(short.MaxValue);
Copy link
Copy Markdown
Contributor Author

@jsolman jsolman Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably call Context.Unwatch(t.ActorRef) here also, but perhaps it doesn't matter since we are shutting down anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants