
Cluster management (membership & cache query) #231

@AkihiroSuda


Design

Membership

  • Worker nodes periodically report the following info to the master:
    -- worker ID: a string unique within the cluster, e.g. workerhost01-containerd-overlay
    -- connection info for the master to reach the worker: implementation-specific; probably something like tcp://workerhost01:12345 or unix:///run/buildkit/instance01.sock
    --- Support for UNIX sockets should be useful for testing purposes
    --- Not unique; a connection can be shared among multiple workers (the master includes the worker ID in every request message)
    -- performance stats: load average, disk quota usage, and so on
    -- annotations

For example:

{
  "worker_id": "workerhost01-containerd-overlay",
  "connections":[
    {
      "type": "grpc.v0",
      "socket": "tcp://workerhost01:12345"
    }
  ],
  "stats":[
    {
      "type": "cpu.v0",
      "loadavg": [0.01, 0.02, 0.01]
    }
  ],
  "annotations": {
    "os": "linux",
    "arch": "amd64",
    "executor": "containerd",
    "snapshotter": "overlay",
    "com.example.userspecific": "blahblahblah"
  }
}
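Below is a minimal Go sketch of how the master might decode such a report. It assumes JSON on the wire, although the actual wire format is not decided here. The type names (`WorkerReport`, `Connection`, `Stat`) and the helpers `DecodeReport` and `SplitSocket` are hypothetical. `SplitSocket` also shows why a UNIX socket URL needs three slashes: with `unix://run/...`, `net/url` parses `run` as the host rather than as part of the path.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Connection is one way for the master to reach a worker (hypothetical type).
type Connection struct {
	Type   string `json:"type"`   // e.g. "grpc.v0"
	Socket string `json:"socket"` // e.g. "tcp://workerhost01:12345"
}

// Stat models only the "cpu.v0" example; other stat types would add fields.
type Stat struct {
	Type    string    `json:"type"`
	LoadAvg []float64 `json:"loadavg,omitempty"`
}

// WorkerReport mirrors the example report (hypothetical type).
type WorkerReport struct {
	WorkerID    string            `json:"worker_id"`
	Connections []Connection      `json:"connections"`
	Stats       []Stat            `json:"stats"`
	Annotations map[string]string `json:"annotations"`
}

// DecodeReport parses a report and rejects one without a worker ID,
// since the ID is the field that identifies the worker in the cluster.
func DecodeReport(data []byte) (*WorkerReport, error) {
	var r WorkerReport
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	if r.WorkerID == "" {
		return nil, fmt.Errorf("report has no worker_id")
	}
	return &r, nil
}

// SplitSocket converts a socket URL into the (network, address) pair that
// net.Dial accepts. Only tcp:// and unix:// are handled in this sketch.
func SplitSocket(socket string) (network, address string, err error) {
	u, err := url.Parse(socket)
	if err != nil {
		return "", "", err
	}
	switch {
	case u.Scheme == "tcp" && u.Host != "":
		return "tcp", u.Host, nil
	case u.Scheme == "unix" && u.Host == "" && u.Path != "":
		return "unix", u.Path, nil
	default:
		return "", "", fmt.Errorf("unsupported socket %q", socket)
	}
}
```

The resulting pair can be passed to `net.Dial(network, address)`. The same helper could also parse a master address given to a `--join` flag.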

Cache query

  • Using the connection info above, the master can ask a worker whether it has the cache for a given CacheKey.
    -- The answer does not need to be 100% accurate.
    -- How to transfer the cache data is a separate topic: Cache transfer #224
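The following sketch covers the worker side of a cache query. It assumes one daemon serves several workers behind a single socket, which the membership section allows because connection info is not unique. Since the connection is shared, the request carries the worker ID and the daemon routes on it. `CacheQuery`, `CacheIndex`, and `Daemon` are hypothetical names, and a real implementation would sit behind gRPC rather than a plain method call.

```go
package main

import "fmt"

// CacheQuery is a hypothetical request message. WorkerID is included because
// one socket may serve several workers.
type CacheQuery struct {
	WorkerID string
	CacheKey string
}

// CacheIndex is a hypothetical local view of which cache keys a worker holds.
type CacheIndex map[string]bool

// Daemon serves several workers (e.g. one per snapshotter) behind one socket.
type Daemon struct {
	workers map[string]CacheIndex // worker ID -> that worker's cache index
}

// HasCache routes the query to the worker named in the request. The answer
// is treated as a hint by the master, so a plain map lookup suffices here.
func (d *Daemon) HasCache(q CacheQuery) (bool, error) {
	idx, ok := d.workers[q.WorkerID]
	if !ok {
		return false, fmt.Errorf("unknown worker %q", q.WorkerID)
	}
	return idx[q.CacheKey], nil
}
```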

Initial naive implementation

  • Stateless master
    -- When the master dies, the orchestrator (Kubernetes/Swarm) restarts it, and the membership info is lost until workers report again
    -- Multiple masters could be started, but with no connection between them

  • Workers connect to the master using gRPC
    -- The master address(es) can be specified via the daemon CLI flag: --join tcp://master:12345

  • The master connects to all workers using gRPC to query cache existence
    -- This does not scale to dozens of nodes, but is probably acceptable for the initial work

Future possible implementation

  • Use IPFS (or just the libp2p DHT library) for querying cache existence (and also for transfer)?
    -- Membership state could be saved to IPFS as well?
    -- Or Infinit? (is it still active?)
