Initial File System - Bounty-to-Hire#1032
Conversation
/// Key that can be used to identify a component file.
/// All files with the same content will have the same key.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize, NewType)]
pub struct InitialComponentFileKey(pub String);
The design that I went for is that all initial component files are stored based on their hash in the blob storage.
This way we can have good sharing of data without making the implementation more complex.
The hashes are computed server-side, so we are resistant to attacks via client-supplied hashes.
Collisions for sha-256 basically do not exist in practice.
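For illustration, a minimal sketch of the content-addressing idea. A toy FNV-1a hash stands in for the server-side sha-256, and `BlobStore` is a hypothetical in-memory stand-in for the blob storage, not the actual golem type:

```rust
use std::collections::HashMap;

// Toy stand-in for the server-side hash (FNV-1a); the real design uses
// sha-256, but the content-addressing logic is the same.
fn content_key(bytes: &[u8]) -> String {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    format!("{hash:016x}")
}

// Hypothetical content-addressed store: identical file contents map to the
// same key, so the bytes are stored only once and sharing comes for free.
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl BlobStore {
    // Returns the key under which the content is stored.
    pub fn put(&mut self, bytes: &[u8]) -> String {
        let key = content_key(bytes);
        self.blobs
            .entry(key.clone())
            .or_insert_with(|| bytes.to_vec());
        key
    }

    pub fn len(&self) -> usize {
        self.blobs.len()
    }
}
```

Uploading the same content twice yields the same key and a single stored blob, which is the sharing property described above.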
golem-service-base = { path = "../golem-service-base" }
golem-rib = { path = "../golem-rib" }
golem-wasm-ast = { workspace = true }
golem-worker-executor-base = { path = "../golem-worker-executor-base", version = "0.0.0" }
This dependency is here for the blob storage. It feels very bad to have this dependency instead of some common crate though. Update: extracted to golem_service_base.
/// Update a component
#[oai(
    path = "/:component_id/updates",
    method = "post",
Update (the existing one too) is not idempotent, so imo POST is a better fit than PUT.
The old one is left for backwards compat, as we now need to be multipart here too.
Maybe you are right, but I am having my doubts :)
So let me confirm: the existing /:component_id/upload with PUT doesn't work here?
I guess update should merge with the previous files if there are no conflicts.
@afsalthaj the IFS is tied to a specific component version. No merging.
self.fail_if_read_only(&fd)?;
record_host_function_call("filesystem::types::descriptor", "write_via_stream");
HostDescriptor::write_via_stream(&mut self.as_wasi_view(), self_, offset)
HostDescriptor::write_via_stream(&mut self.as_wasi_view(), fd, offset)
I played around a bit with wasmtime and I saw a few issues with preopened directories:
- Preopened directories in the root don't work well. If I have a preopen for '/' and '/data' and list descriptors for '/', I don't see the descriptor for /data. This is imo going to be very confusing for users.
- Preopen only works on the level of directories, meaning that we would have to make the whole dir readonly. This does not really work well with the format that is specced for the files property in the application manifest; we would need to specify read-only directories explicitly in some way.
- When playing around with complex setups I saw bad file descriptor errors a few times, but I'm not sure how to consistently trigger them.
Based on these I decided to implement this slightly differently.
1: Each worker maintains a shared cache of read-only files /tmp/foo/{hash}. When a worker gets scheduled, all required files are hard-linked from the cache to the worker tmpdir. All read-write files are copied into the appropriate location.
2: The worker maintains a hashset of readonly paths that were loaded. All mutating operations in the HostDescriptor resolve the host path via the ResourceTable and check the hashset. If the path is present we fail with permission denied. This is very similar to how the preopen feature is implemented in wasmtime.
Maybe you can get rid of the need to do hard links as well by adding an extra path mapping in each filesystem host function. (Just an idea, not necessarily the way to go)
The hope was that with the hard links and the permission overlay we can keep the implementation on our side as simple as possible. I also saw that wasmtime has very similar infrastructure internally, so maybe we can get them to expose an API that lets us hand the file permissions over to them.
owned_worker_id.worker_id, worker_config.deleted_regions
);

let read_only_paths = prepare_filesystem(file_loader, temp_dir.path(), &component_metadata.files).await?;
My proposal for handling files during updates / replays is the following:
- Automatic: We replay from the beginning using the new initial file set. If we encounter errors the user will have to use a snapshot based update. This is similar to how other incompatibilities in the component are being handled from what I can tell.
- Snapshot: We rely on the user-defined save-snapshot function to save all data that the user wants to persist across updates. This could either be implemented by using the blob storage apis directly, or we could allow returning paths to files that should be persisted from the snapshot function. The user-defined load-snapshot function is then responsible for reinitializing state on top of the new initial filesystem. From what I can tell this is consistent with how other state in the component is handled, so this should be least surprising to users.
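To make the snapshot flow concrete, a hedged sketch of what the contract could look like. The trait and names are illustrative Rust, not the actual guest-side API:

```rust
// Hypothetical shape of the snapshot-based update contract sketched above.
pub trait SnapshotUpdate {
    // Invoked on the old component version; returns opaque bytes (or, per
    // the alternative above, paths to files that should be persisted).
    fn save_snapshot(&self) -> Vec<u8>;

    // Invoked on the new component version after its initial filesystem has
    // been fully materialized, so it can reinitialize state on top of it.
    fn load_snapshot(&mut self, snapshot: &[u8]) -> Result<(), String>;
}

// Toy worker that persists one blob of state across an update.
pub struct ToyWorker {
    pub state: Vec<u8>,
}

impl SnapshotUpdate for ToyWorker {
    fn save_snapshot(&self) -> Vec<u8> {
        self.state.clone()
    }

    fn load_snapshot(&mut self, snapshot: &[u8]) -> Result<(), String> {
        self.state = snapshot.to_vec();
        Ok(())
    }
}
```

The important property is the ordering: load-snapshot runs only after the new initial file set is in place, so the user restores on top of it rather than fighting with it.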
@mschuwalow Congratulations on submitting your solution for our Bounty-to-Hire challenge (#1004)! We have reviewed your pull request and believe that it is very promising. While not yet satisfying all of the requirements of #1004, we believe there is a short and well-defined path to get there, and so we are delighted to announce that you are a finalist in our Bounty-to-Hire event! Next steps:
If you no longer wish to work on this pull request, please close it, and we will reach out about the interview. On the other hand, if you wish to continue pushing this pull request forward, then please let us know here and continue your work. Congratulations again on being a finalist for Bounty-to-Hire!
Force-pushed ccddada to ff20373
string componentName = 2;
optional ComponentType componentType = 3;
// All files need to be uploaded to the blob storage before providing them here
repeated InitialComponentFile files = 4;
It feels a bit wasteful to upload the files through grpc if we are already in the 'trusted' context of the application and have direct access to the blob_storage, so I went with this approach.
I did not think of the gRPC API as something that is only available for internal communication (but it is right that nothing external uses it at the moment).
);

let blob_storage = Arc::new(FileSystemBlobStorage::new(&PathBuf::from("/tmp/ittest-local-object-store/golem")).await.unwrap());
This is consistent with what the other services use for their blob_storage, but this looks a bit broken to me. I think it would be better to have something like minio running and use the s3 based blob storage for all services. Definitely leaving that for a followup though.
The test framework is used for multiple things:
- Worker executor tests (golem-worker-executor-tests/tests) for testing various properties of the executor (only). For these, minimal dependencies and as fast a startup time as possible are the most important for me, so here we use file system / in-memory implementations
- Integration tests involving all services (integration-tests/tests). For these I agree that it would be better to use the s3 based blob storage.
- CLI tests (golem-cli/tests) are like integration tests but doing operations through invoking the CLI itself. For historical reasons many things we should have as integration tests are implemented as CLI tests now - and for those I would also use S3 instead of file system storage, but if we think about it as just tests for the CLI layer itself it should not matter
- Benchmarks - we are running those either against a locally started golem cluster, where I think using minio instead of the local file system would not be very useful (not interested in the performance of minio itself). And of course we plan to be able to run them against a real cluster deployed somewhere (but currently not doing it)

That said there are various ways to set up the test dependencies, as you see, and the cli.rs one is only used for benchmarks at the moment.
What I mean by broken here is that we set up the blob storage as using a filesystem path inside the containers (/tmp/ittest-local-object-store/golem), but I don't see any volume mounts that would allow sharing this directory between containers.
Note this only applies to the docker and k8s based test environments. I also didn't get around to trying this out, so maybe I'm missing something and it works already 👍
pub async fn put_if_not_exists(
    &self,
    bytes: &Bytes,
In a followup this should be changed to instead stream the data so we can handle files that are larger than memory.
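A possible shape for that streaming follow-up, sketched with std::io::Read; `Store` and a HashMap stand in for the blob storage, and the key is assumed to be computed elsewhere:

```rust
use std::collections::HashMap;
use std::io::Read;

// Hypothetical streaming variant of put_if_not_exists. A real implementation
// would forward each chunk to the blob storage; the in-memory stand-in here
// buffers into a Vec only because there is no storage backend in the sketch.
pub struct Store {
    pub blobs: HashMap<String, Vec<u8>>,
}

impl Store {
    // Returns Ok(true) if the blob was written, Ok(false) if it already existed.
    pub fn put_if_not_exists(&mut self, key: &str, mut data: impl Read) -> std::io::Result<bool> {
        if self.blobs.contains_key(key) {
            return Ok(false);
        }
        let mut stored = Vec::new();
        let mut chunk = [0u8; 8192];
        loop {
            // Consume the payload chunk by chunk through Read, so the caller
            // never has to hand over one contiguous in-memory buffer.
            let n = data.read(&mut chunk)?;
            if n == 0 {
                break;
            }
            stored.extend_from_slice(&chunk[..n]);
        }
        self.blobs.insert(key.to_string(), stored);
        Ok(true)
    }
}
```

Taking `impl Read` instead of `&Bytes` is the whole point: the signature no longer forces the file to fit in memory.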
)
}

fn dump_component_info(path: &Path) -> golem_common::model::component_metadata::ComponentMetadata {
This got fully moved to the file system implementation of the component service, so doing it as part of the dsl is unneeded now
I don't think we want this dump to happen in the prod service, but that can be achieved by using log levels or some configuration, so I'm ok with it.
I meant the test-framework ComponentService, not the prod one :)
Force-pushed 9e2a535 to b8ecd8e
Force-pushed 8e75f91 to f8bc645
/// - not contain "." components
/// - use '/' as a separator
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub struct ComponentFilePath(Utf8UnixPathBuf);
As this guy is getting serialized and we might have windows binaries at some point, I went with this instead of a normal path.
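The invariants from the doc comment can be sketched as a validation function. This is illustrative only: a plain &str stands in for Utf8UnixPathBuf, and the requirement that paths be absolute is my assumption, not something stated in the quoted doc:

```rust
// Sketch of the ComponentFilePath invariants quoted above. Assumption: the
// path is also required to start with '/'; only the "." rule and the '/'
// separator are taken from the doc comment itself.
pub fn validate_component_file_path(path: &str) -> Result<(), String> {
    if !path.starts_with('/') {
        return Err(format!("path must start with '/': {path}"));
    }
    // '/' is the only separator, so splitting on it yields the components.
    for component in path.split('/').skip(1) {
        if component == "." {
            return Err(format!("path must not contain '.' components: {path}"));
        }
    }
    Ok(())
}
```

Validating on construction means every ComponentFilePath in the system can be serialized and later interpreted identically on unix and windows hosts.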
#[allow(clippy::large_enum_variant)]
#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(tag = "type", content = "config")]
pub enum BlobStorageConfig {
No changes here, just moved to the service-base crate
golem-service-base is a carry-forward of a very early version of golem and may get deprecated soon. I think I introduced this module, but these days I was hoping everything that's common just goes back to golem-common.
The migration of BlobStorageConfig back here may hinder the avoidance of golem-service-base in the near future. Is it mandatory to have it here?
Thanks for the comments though; they clarify the context.
It's good to move this here for now.
#[async_trait]
impl<Ctx: WorkerCtx + DurableWorkerCtxView<Ctx>> FileSystemReading for DurableWorkerCtx<Ctx> {
    async fn list_directory(&self, path: &ComponentFilePath) -> Result<ListDirectoryResult, GolemError> {
We don't need to bother with any locking here. There is some machinery higher up that will ensure these are only called when the worker is not executing anything (and potentially mutating the files)
use testcontainers_modules::minio::MinIO;
use uuid::Uuid;

macro_rules! test_blob_storage {
I think these tests (and the other two suites) should be moved to the crate where the implementation is now.
Ah, I realized you only moved the blob storage, not the other two. In that case I'm only talking about this file.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum BlobStorageNamespace {
    CompilationCache,
    InitialComponentFiles,
Let's add AccountId to this namespace too (and use it in the path just like with the oplog related blob stores). This makes it easier to collect / delete a specific account's data.
What should be used as the account_id? From what I can tell, neither the API nor the gRPC API for component creation currently supports an account_id.
Yeah it is a bit tricky because there are no accounts in the OSS part of Golem. But if you check worker executor, it propagates an account id everywhere, and various upper layers are trying to abstract it. (It's a bit of a mess right now).
For example https://github.com/golemcloud/golem/blob/main/golem-worker-service/src/lib.rs#L11 and in the component service it's represented by the "namespace" (but the DefaultNamespace here https://github.com/golemcloud/golem/blob/main/golem-service-base/src/auth.rs does not contain anything, as there are no accounts in the open-source version).
As you probably already figured out, the reason for all these abstractions, and different -base crates etc is that we have some advanced features built on top of this codebase which require these features.
Of course it's hard to figure out the best way to do this as you don't see the other half.
For this PR I suggest the following:
- add AccountId to this namespace, because it already has it for OplogPayload, for example
- in worker executor you already have account id available everywhere (sometimes hidden in an OwnedWorkerId)
- in component service, you need to extract it from the namespace which is propagated to the service level. It does not have a way to extract it right now, but you can define a trait with a Self -> Option<AccountId> extractor and require it for the namespace here: https://github.com/golemcloud/golem/blob/main/golem-component-service-base/src/service/component.rs#L390
- if there is no account id, use "-1" (because worker service does it already in one of the above links)
I'm surprised it "leaked" into so many places like API definition models. Is that necessary? Worker service was already able to address workers somehow, which means it already had the mechanism to get the necessary context (account id etc.) without these changes.
I would like to revisit this and, if possible, make a less spread-out change.
Other than this from my side the PR is ready to go.
The main reason this leaked into so many places is that I need to use the correct accountId for an api deployment if I understand the code correctly.
The way it was done before, was that namespace information was included when creating the ApiDefinition, but it was serialized to a string and never deserialized again. Deserializing and carrying the information around as a proper type makes it show up in a lot of places
Let me try to limit it to only the compiled versions of the definitions.
Force-pushed 30bd2c8 to 1182760
I think I addressed all of the comments. Please do another round of reviews :)
vigoo
left a comment
I have one last request (#1032 (comment)) then it's good to go.
We should merge when each reviewer approves it as we reviewed different parts.
all comments are resolved on cli side 👍

@vigoo what is missing here / anything to do from my side to unblock merging?
Hopefully nothing, I just want an approval from @afsalthaj too
@mschuwalow If you could just make this small change, I think we're good to merge
ah sorry, missed that one. Will adjust 👍
@afsalthaj Done, please take a look 625786a

I was thinking to add a comment to all grpc definitions that (directly or transitively) get stored in the database, so other people don't run into the same problem. This would affect quite a lot of types like rib expressions though (due to how big CompiledHttpApiDefinition is). Something like … Want me to go for it?
We are still thinking as to how to manage this.

Thanks for addressing the review comments.
* Initial version: cli upload and allow workers to access component files * Add support for initial files to test framework & worker-executor-tests * Expose worker files through rest and grpc * regenerate openapi and client * Add fileserver binding type to api definitions (#4) * wip * wip * wip * wip * wip * wip * wip\ * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * wip * adjust example configs * adjust docker-compose settings * integration tests * add e2e tests for file-server api deployments * remove duplicate code * fmt and fix * address comments * improve file_loader * download files concurrently * make code more readable * Add accountId to blobstorage namespace: (#5) * fix test * refactor cli * less namespace * keep backwards compat for grpc messages
Initial Component Files
/claim #1004
/resolves #1004
Features:
Note the comment I left regarding updating workers with initial component files present. There are tests present for correct behavior in automatic update mode.
For manual update I would argue that we should give control to the user and have them explicitly store / restore files from the blob storage instead of trying to do it for them.
I.e. at the time when load-snapshot gets called the user will have a fully initialized filesystem with the files declared for the new component version and can restore whatever is needed from the old files.
Alternatively, if we want to simplify this for users as much as possible, we could always restore the previous r+w files to a special, reserved descriptor in read-only mode after a manual update. I.e. something like '/@previous'. This will not show up in normal directory listings for '/' if it's added as a separate preopen, so it should not interfere with normal operations.
If desired I can add this behavior here / as a followup.
Obviously this is a very chunky boy. I tried to make it somewhat reviewable commit by commit, maybe that will help a bit.