Skip to content

Conversation

@csegarragonz
Copy link
Collaborator

@csegarragonz csegarragonz commented Apr 4, 2023

We were using Redis to keep track of the available hosts. This sometimes causes problems when hosts fail, as they are not removed from the redis queue unless flushed. Even though we could implement a heart-beat mechanism within redis, I wanted to take this opportunity to implement a satndalone, centralised, component in faabric.

In this PR I introduce the planner (name can be changed), a standalone component that has a centralised view of the state of a distributed faabric deployment. Thus far, we only use it to keep track of the available hosts, but more functionality will follow in subsequent PRs.

@csegarragonz csegarragonz changed the title Centralised Control-Plane In-house host book-keeping Apr 4, 2023
@csegarragonz csegarragonz self-assigned this Apr 6, 2023
# Vim
*.swp

# Protobuf
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this here from ./include/faabric/proto/.gitignore.

Ideally, we would add a generated include dir as we do for faasm, instead of having the generated header files in the main source tree, but maybe in a separate PR.

#include <google/protobuf/util/json_util.h>

namespace faabric::util {
void messageToJsonPb(const google::protobuf::Message& msg,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods use protobuf's embedded JSON (de)-serialisation.

It is a very big improvement in terms of LoC and complexity versus the current RapidJSON approach.

I implement it here as the RapidJSON approach only works for faabric::Message whereas protobuf's works for any class inheriting from google::protobuf::Message.

However, removing the old version is not immediate as all CLI and experiment code relies on the syntax we define when (de)-serialising messages to JSON, so I'd rather add this here, and remove RapidJSON in a separate PR.

int port,
int threadCount,
std::shared_ptr<HttpRequestHandler> requestHandlerIn = nullptr);
FaabricEndpoint(int port,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, we allowed users of the FaabricEndpoint not to specify a request handler, and we would use FaabricEndpointHandler by default.

Unfortunately, this means that any other class implementing a different endpoint (thus using FaabricEndpoint but a different endpoint handler than FaabricEndpointHandler) still has a link-time dependency to FaabricEndpointHandler (i.e. needs to provide all the symbols used by FaabricEndpointHandler). In turn, this means that we need to link with faabric::scheduler even if not using any symbols therein defined.

The aim of this PR is to split from the scheduler (eventually turning the existing scheduler into a form of local scheduler), so we don't want to take that link-time dependency.

@csegarragonz csegarragonz marked this pull request as ready for review April 11, 2023 17:01
image: redis
UBSAN_OPTIONS: print_stacktrace=1:halt_on_error=1
steps:
- name: "Install Conan"
Copy link
Collaborator Author

@csegarragonz csegarragonz Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I move the tests job from running inside a container with additional services to using docker compose manually.

The planner runs in a separate container, but may be updated frequently. To avoid having to re-tag the code everytime, we want to mount the binaries we build as part of the tests (with the latest checkout of the code) and use those to run the planner.

The first problem this arises is that, even though GHA's service containers support mounting volumes, the planner service requires the planner_server binary to be there before it can start. And once it is running it can not be restarted from inside the tests container. (Maybe we could mount the docker daemon inside the container, but this solution sounded way too hacky).

To overcome this problem, we add a custom entrypoint for the planner that allows setting the directory where it will look for the planner_server binary. If its not there, it will wait until it becomes available.

Then, we come to a second problem. GHA's service containers don't allow overwriting the entrypoint. See here for a further discussion.

The solution I have found is to use docker compose manually.

@csegarragonz csegarragonz changed the title In-house host book-keeping In-house host membership accounting Apr 12, 2023
@csegarragonz csegarragonz requested a review from Shillaker April 12, 2023 12:32
using namespace rapidjson;

namespace faabric::util {
void messageToJsonPb(const google::protobuf::Message& msg, std::string* jsonStr)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update this after #302 is merged in

@csegarragonz csegarragonz requested review from Shillaker and removed request for Shillaker April 17, 2023 14:10
@csegarragonz csegarragonz merged commit 6fa0fdb into main May 4, 2023
@csegarragonz csegarragonz deleted the planner branch May 4, 2023 12:25
@csegarragonz csegarragonz mentioned this pull request May 4, 2023
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants