The ability to clean up when the application panics in any of the goroutines spawned by s.acceptConnectionsRoutine() #3800

@ruseinov

Tested with tendermint 0.31.5.

The issue is that when something goes wrong in the ABCI app, for example in the InitChain method, we are forced to panic, because there is no other way to return an error and stop the app.

When this happens we cannot recover, because these panics happen in a goroutine

func (s *SocketServer) handleRequests(

which in turn is being spawned by

func (s *SocketServer) acceptConnectionsRoutine()

As a result, the socket is never closed and cleaned up, so once the application has failed it cannot start again without a manual step, which is basically rm /tmp/app.socket.
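The stale-socket effect described above can be reproduced with the standard library alone. The following is a minimal sketch, not tendermint code: the helper name restartAfter and the temporary directory are hypothetical, and SetUnlinkOnClose(false) is used only to mimic a process that dies without removing its socket file.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// restartAfter listens on a Unix socket, shuts the listener down either
// cleanly or "crash style" (socket file left on disk), and then tries to
// listen on the same path again. restartErr is the result of that second
// Listen; err reports problems with the demo setup itself.
func restartAfter(cleanShutdown bool) (restartErr, err error) {
	dir, err := os.MkdirTemp("", "abci")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "app.socket")

	ln, err := net.Listen("unix", path)
	if err != nil {
		return nil, err
	}
	if !cleanShutdown {
		// Assumption for the demo: keep the socket file on Close, which is
		// what is left behind when the process dies before cleanup.
		if ul, ok := ln.(*net.UnixListener); ok {
			ul.SetUnlinkOnClose(false)
		}
	}
	if err := ln.Close(); err != nil {
		return nil, err
	}

	// Simulated restart: bind the same path again.
	second, restartErr := net.Listen("unix", path)
	if restartErr != nil {
		return restartErr, nil
	}
	return nil, second.Close()
}

func main() {
	for _, clean := range []bool{true, false} {
		restartErr, err := restartAfter(clean)
		if err != nil {
			fmt.Println("setup failed:", err)
			return
		}
		fmt.Printf("clean shutdown=%v, restart error: %v\n", clean, restartErr)
	}
}
```

With a clean shutdown the second Listen should succeed; when the file is left behind, the bind fails until the file is removed by hand.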

There are different options to fix that:

  1. Recover from the panic, log the error, and allow the main thread to exit gracefully, which would allow code like this:
	svr, err := server.NewServer(addr, "socket", app)
	if err != nil {
		return errors.Wrap(err, "failed to create a listener")
	}

	svr.SetLogger(logger.With("module", "abci-server"))

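	// Buffered so that cleanupCallback never blocks, even when it is called
	// from the deferred recover below, in the same goroutine that reads done.
	done := make(chan bool, 1)
	cleanupCallback := func() {
		// Cleanup
		_ = svr.Stop()
		done <- true
	}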

	cmn.TrapSignal(logger, cleanupCallback)

	defer func() {
		if err := recover(); err != nil {
			logger.Error("recovered from panic", "err", err)
			cleanupCallback()
		}
	}()

	err = svr.Start()
	if err != nil {
		return errors.Wrap(err, "failed to start a server")
	}
	// wait forever
	<-done

	return nil

to do proper cleanup.

  2. Alternatively, allow passing a cleanup callback to the socket server, call it on recover(), and then propagate the panic. In our case such a callback would still be something like
    func() {svr.Stop()}. I think solution number one is preferred, because it would be easier to clean up in the main thread as shown above.

  3. Yet another solution is to modify the ABCI interface so that errors can be returned in responses or alongside them. Then, instead of panicking, the app could just return an error.
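To make the first two options more concrete, here is a rough sketch of the general pattern, independent of the actual SocketServer code: each spawned goroutine recovers its own panic and hands it to the caller as an error, so the caller can run cleanup (for example svr.Stop()) before exiting. The names safeGo and run are hypothetical and are not part of tendermint.

```go
package main

import "fmt"

// safeGo runs fn in its own goroutine. A panic inside fn is recovered in
// that goroutine and delivered on the returned channel as an error; a
// normal return delivers nil.
func safeGo(fn func()) <-chan error {
	errCh := make(chan error, 1) // buffered: the goroutine never blocks on send
	go func() {
		defer func() {
			if r := recover(); r != nil {
				errCh <- fmt.Errorf("recovered from panic: %v", r)
				return
			}
			errCh <- nil
		}()
		fn()
	}()
	return errCh
}

// run waits for the handler to finish or panic, then always calls cleanup
// on the calling goroutine and returns the handler's error, if any.
func run(handler func(), cleanup func()) error {
	err := <-safeGo(handler)
	cleanup()
	return err
}

func main() {
	cleaned := false
	err := run(
		func() { panic("InitChain failed") }, // stands in for the ABCI app
		func() { cleaned = true },            // stands in for svr.Stop()
	)
	fmt.Println(err, cleaned)
}
```

The difference between options 1 and 2 is mostly where cleanup runs: here it runs in the caller after the error arrives, whereas a callback passed into the server would run inside the recovering goroutine before the panic is re-raised.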

Labels: C:abci (Component: Application Blockchain Interface)