Tested with tendermint 0.31.5.
The issue is that when something goes wrong in an ABCI app, for example in the InitChain method, we are forced to panic, because there is no other way to return an error and stop the app.
When this happens we cannot recover, because these panics happen in a goroutine
func (s *SocketServer) handleRequests(
which in turn is being spawned by
func (s *SocketServer) acceptConnectionsRoutine()
As a result, the socket is never closed and cleaned up, so once the application has failed it cannot start again until the stale socket file is removed manually, which is basically rm /tmp/app.socket.
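The stale-socket symptom can be reproduced without tendermint at all. The following is a minimal sketch using only the Go standard library, assuming a Unix-like system; the function name staleSocketDemo and the temporary directory are made up for illustration, and the crash is only simulated by closing the listener with unlinking disabled.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// staleSocketDemo (hypothetical helper) returns the error from binding over a
// leftover socket file, and the error from binding again after removing it.
func staleSocketDemo() (error, error) {
	dir, err := os.MkdirTemp("", "abci")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "app.socket")

	first, err := net.Listen("unix", path)
	if err != nil {
		return err, err
	}
	// Simulate a crash: the listener goes away but the socket file stays.
	first.(*net.UnixListener).SetUnlinkOnClose(false)
	first.Close()

	// Binding over the leftover file is expected to fail.
	var staleErr error
	if l, err := net.Listen("unix", path); err != nil {
		staleErr = err
	} else {
		l.Close()
	}

	// The manual step: remove the stale file, then bind again.
	os.Remove(path)
	second, cleanErr := net.Listen("unix", path)
	if cleanErr == nil {
		second.Close()
	}
	return staleErr, cleanErr
}

func main() {
	staleErr, cleanErr := staleSocketDemo()
	fmt.Println("bind over stale socket:", staleErr)
	fmt.Println("bind after removing it:", cleanErr)
}
```

The second bind fails only because the file is still on disk, which is why removing it by hand lets the app start again.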
There are different options to fix that:
- Recover from the panic, log the error, and allow the main thread to exit gracefully, which would let the calling code look like this:
svr, err := server.NewServer(addr, "socket", app)
if err != nil {
	return errors.Wrap(err, "failed to create a listener")
}
svr.SetLogger(logger.With("module", "abci-server"))

// Buffered so cleanupCallback does not block when nobody is receiving yet.
done := make(chan bool, 1)
cleanupCallback := func() {
	// Stop the server so the socket is closed and cleaned up.
	_ = svr.Stop()
	done <- true
}
cmn.TrapSignal(logger, cleanupCallback)

defer func() {
	if err := recover(); err != nil {
		logger.Error("recovered from panic", "err", err)
		cleanupCallback()
	}
}()

err = svr.Start()
if err != nil {
	return errors.Wrap(err, "failed to start a server")
}

// Wait until cleanupCallback signals completion.
<-done
return nil
so that the main thread can do proper cleanup.
- Alternatively, allow passing a cleanup callback to the socket server, call it on recover(), and then propagate the panic. In our case the callback would still be something like func() { svr.Stop() }. I think the first solution is preferable, because it is easier to clean up in the main thread as shown above.
- Yet another solution is to modify the ABCI interface so that errors can be returned in responses or alongside them. Then, instead of panicking, the app could just return an error.
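To illustrate the first option in isolation, here is a minimal standard-library sketch of recovering inside the goroutine and handing the failure to the main thread, which then does the cleanup. It is not the tendermint code: runRecovered is a hypothetical helper, the handler stands in for handleRequests, and the cleanup closure stands in for svr.Stop().

```go
package main

import "fmt"

// runRecovered (hypothetical helper) runs handle in its own goroutine and
// reports a panic as an error on the returned channel instead of crashing.
func runRecovered(handle func()) <-chan error {
	errCh := make(chan error, 1) // buffered so the goroutine never blocks
	go func() {
		defer func() {
			if r := recover(); r != nil {
				errCh <- fmt.Errorf("recovered from panic: %v", r)
				return
			}
			errCh <- nil
		}()
		handle()
	}()
	return errCh
}

func main() {
	cleaned := false
	cleanup := func() { cleaned = true } // stands in for svr.Stop()

	// The handler panics, as an ABCI app might in InitChain.
	err := <-runRecovered(func() { panic("InitChain failed") })
	if err != nil {
		fmt.Println(err)
		cleanup()
	}
	fmt.Println("cleaned up:", cleaned)
}
```

Because recover() only works in the goroutine that panicked, the recovery has to live where the handler runs; the channel is just one way to let the main thread decide how to shut down.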