Cluster management (membership & cache query)



## Design

### Membership
- Worker nodes periodically report the following info to the master:
-- worker ID: a string unique within the cluster, e.g. `workerhost01-containerd-overlay`
-- connection info the master uses to reach the worker: implementation-specific, e.g. `tcp://workerhost01:12345` or `unix:///run/buildkit/instance01.sock`
--- UNIX socket support should be useful for testing
--- not unique: multiple workers can share one connection, so the master puts the worker ID in every request message
-- performance stats: loadavg, disk quota usage, and so on
-- annotations

e.g.
```json
{
  "worker_id": "workerhost01-containerd-overlay",
  "connections":[
    {
      "type": "grpc.v0",
      "socket": "tcp://workerhost01:12345"
    }
  ],
  "stats":[
    {
      "type": "cpu.v0",
      "loadavg": [0.01, 0.02, 0.01]
    }
  ],
  "annotations": {
    "os": "linux",
    "arch": "amd64",
    "executor": "containerd",
    "snapshotter": "overlay",
    "com.example.userspecific": "blahblahblah"
  }
}
```
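
A minimal Go sketch of how the master might decode such a report. The type and function names (`WorkerReport`, `Connection`, `Stat`, `ParseReport`) are hypothetical and only mirror the JSON fields above; rejecting a report without a worker ID is an assumption of this sketch, not a decided behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Connection describes one way for the master to reach a worker.
type Connection struct {
	Type   string `json:"type"`   // e.g. "grpc.v0"
	Socket string `json:"socket"` // e.g. "tcp://workerhost01:12345"
}

// Stat is one performance sample. Only the loadavg field from the
// example report is modeled here.
type Stat struct {
	Type    string    `json:"type"`
	LoadAvg []float64 `json:"loadavg,omitempty"`
}

// WorkerReport is the periodic message a worker sends to the master.
type WorkerReport struct {
	WorkerID    string            `json:"worker_id"`
	Connections []Connection      `json:"connections"`
	Stats       []Stat            `json:"stats"`
	Annotations map[string]string `json:"annotations"`
}

// ParseReport decodes a JSON report and rejects one without a worker ID
// (an assumption of this sketch).
func ParseReport(data []byte) (WorkerReport, error) {
	var r WorkerReport
	if err := json.Unmarshal(data, &r); err != nil {
		return WorkerReport{}, err
	}
	if r.WorkerID == "" {
		return WorkerReport{}, fmt.Errorf("report has no worker_id")
	}
	return r, nil
}

func main() {
	raw := []byte(`{
	  "worker_id": "workerhost01-containerd-overlay",
	  "connections": [{"type": "grpc.v0", "socket": "tcp://workerhost01:12345"}],
	  "stats": [{"type": "cpu.v0", "loadavg": [0.01, 0.02, 0.01]}],
	  "annotations": {"os": "linux", "arch": "amd64"}
	}`)
	r, err := ParseReport(raw)
	if err != nil {
		fmt.Println("invalid report:", err)
		return
	}
	fmt.Println(r.WorkerID, r.Connections[0].Socket, r.Annotations["arch"])
}
```

Keeping `annotations` as a free-form string map lets user-specific keys such as `com.example.userspecific` pass through without schema changes.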

### Cache query
- With the connection info above, the master can ask a worker whether it has the cache for a given `CacheKey`.
-- the answer does not need to be 100% accurate.
-- How to transfer the cache data is another topic: #224
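
Because the answer may be inaccurate, the caller can treat it as a hint rather than a guarantee. The sketch below illustrates that idea under assumptions: the `Worker` interface, the `memWorker` stand-in, and `CacheHint` are hypothetical names, `CacheKey` is modeled as a plain string, and counting an unreachable worker as "no cache" is one possible policy, not a decided one.

```go
package main

import (
	"errors"
	"fmt"
)

// CacheKey identifies a cache record. Modeling it as a string is an
// assumption of this sketch.
type CacheKey string

// Worker is a hypothetical client-side view of one worker.
type Worker interface {
	// HasCache reports whether the worker believes it holds the cache for key.
	HasCache(key CacheKey) (bool, error)
}

// memWorker is an in-memory stand-in for a remote worker.
type memWorker struct {
	keys map[CacheKey]bool
	down bool // simulates an unreachable worker
}

func (w *memWorker) HasCache(key CacheKey) (bool, error) {
	if w.down {
		return false, errors.New("worker unreachable")
	}
	return w.keys[key], nil
}

// CacheHint treats the worker's answer as best effort: any error is
// counted as "no cache" instead of failing the caller.
func CacheHint(w Worker, key CacheKey) bool {
	ok, err := w.HasCache(key)
	return err == nil && ok
}

func main() {
	w := &memWorker{keys: map[CacheKey]bool{"sha256:aaaa": true}}
	fmt.Println(CacheHint(w, "sha256:aaaa")) // key present
	fmt.Println(CacheHint(w, "sha256:bbbb")) // key absent
	w.down = true
	fmt.Println(CacheHint(w, "sha256:aaaa")) // unreachable worker counts as a miss
}
```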


## Initial naive implementation

- Stateless master
-- When the master dies, the orchestrator (k8s/swarm) restarts it, and the membership info is lost
-- Multiple masters could be started, but they would not communicate with each other
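
A stateless master could hold membership purely in memory and rebuild it from the workers' periodic reports after a restart. The following is a sketch under assumptions: `Registry`, `defaultTTL`, and the helper `at` are hypothetical, and the 60-second expiry is an arbitrary value chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// defaultTTL is an arbitrary expiry chosen for this sketch.
const defaultTTL = 60 * time.Second

// Registry keeps membership only in memory, so a restarted master starts
// empty and repopulates as workers send their next periodic reports.
type Registry struct {
	mu       sync.Mutex
	ttl      time.Duration
	lastSeen map[string]time.Time // worker ID -> time of last report
}

func NewRegistry(ttl time.Duration) *Registry {
	return &Registry{ttl: ttl, lastSeen: make(map[string]time.Time)}
}

// Report records that workerID sent a report at time now.
func (r *Registry) Report(workerID string, now time.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.lastSeen[workerID] = now
}

// Alive drops workers whose last report is older than the TTL and
// returns the remaining worker IDs in sorted order.
func (r *Registry) Alive(now time.Time) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	ids := make([]string, 0, len(r.lastSeen))
	for id, seen := range r.lastSeen {
		if now.Sub(seen) > r.ttl {
			delete(r.lastSeen, id)
			continue
		}
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// at builds a timestamp a given number of seconds after the Unix epoch,
// so the example is deterministic.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	r := NewRegistry(defaultTTL)
	r.Report("workerhost01-containerd-overlay", at(0))
	r.Report("workerhost02-containerd-overlay", at(50))
	// At t=70s the first worker has been silent for longer than the TTL.
	fmt.Println(r.Alive(at(70)))
}
```

Passing the timestamp explicitly, instead of calling `time.Now()` inside the registry, keeps the sketch easy to test.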

- Worker connects to the master using gRPC
-- the master address(es) can be specified via the daemon CLI flag: `--join tcp://master:12345`

- Master connects to all workers using gRPC for querying cache existence
-- does not scale to dozens of nodes, but is probably acceptable for the initial work
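
The fan-out described above can be sketched as one concurrent query per known worker. `QueryFunc` and `WorkersWithCache` are hypothetical names, and `QueryFunc` stands in for the real gRPC call, whose API is not defined in this proposal. The sketch also makes the scaling concern visible: every lookup costs one request per worker.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// QueryFunc asks one worker whether it has the cache for cacheKey.
// It is a stand-in for the real gRPC call (an assumption of this sketch).
type QueryFunc func(workerID, cacheKey string) (bool, error)

// WorkersWithCache queries every worker concurrently and returns the
// sorted IDs of those that answered yes. Errors are treated as misses,
// since the answer does not need to be fully accurate.
func WorkersWithCache(workerIDs []string, cacheKey string, query QueryFunc) []string {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		hits []string
	)
	for _, id := range workerIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			ok, err := query(id, cacheKey)
			if err != nil || !ok {
				return
			}
			mu.Lock()
			hits = append(hits, id)
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	sort.Strings(hits)
	return hits
}

func main() {
	// Fake cluster state: which worker holds which cache key.
	cached := map[string]string{
		"workerhost01-containerd-overlay": "sha256:aaaa",
		"workerhost02-containerd-overlay": "sha256:bbbb",
	}
	query := func(workerID, cacheKey string) (bool, error) {
		return cached[workerID] == cacheKey, nil
	}
	workers := []string{"workerhost01-containerd-overlay", "workerhost02-containerd-overlay"}
	fmt.Println(WorkersWithCache(workers, "sha256:bbbb", query))
}
```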


## Future possible implementation

- Use IPFS (or just [libp2p DHT library](https://github.com/libp2p/go-libp2p-kad-dht)) for querying cache existence (and also transfer)? 
-- Membership state can be saved to IPFS as well?
-- or Infinit? (is it still active?)

