
Initial File System - Bounty-to-Hire#1032

Merged
afsalthaj merged 21 commits into golemcloud:main from mschuwalow:bounty-2-hire-2
Nov 9, 2024

Conversation

Contributor

@mschuwalow mschuwalow commented Oct 27, 2024

Initial Component Files

/claim #1004
/resolves #1004


Features:

  • Uploading initial component files during creation via cli
  • Uploading initial component files during update via cli
  • Initial component files get loaded by workers during execution
  • Permissions of initial component files are respected by workers
  • Support for initial component files is added to the test framework
  • Worker files are exposed via worker executor api
  • Worker files are exposed via worker api
  • Worker files are served via the gateway
  • Read-only files are directly loaded from the blob storage when accessed via the gateway

Note the comment I left regarding updating workers with initial component files present. There are tests present for correct behavior in automatic update mode.

For manual update I would argue that we should give control to the user and have them explicitly store / restore files from the blob storage instead of trying to do it for them.

I.e. at the time when load-snapshot gets called the user will have a fully initialized filesystem with the files declared for the new component version and can restore whatever is needed from the old files.

Alternatively, if we want to simplify this for users as much as possible, we could always restore the previous r+w files to a special, reserved descriptor in read-only mode after a manual update. I.e. something like '/@previous'. This will not show up in normal directory listings for '/' if it's added as a separate preopen, so it should not interfere with normal operations.
If desired I can add this behavior here / as a followup.


Obviously this is a very chunky boy. I tried to make it somewhat reviewable commit by commit, maybe that will help a bit.

/// Key that can be used to identify a component file.
/// All files with the same content will have the same key.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize, NewType)]
pub struct InitialComponentFileKey(pub String);
Contributor Author

@mschuwalow mschuwalow Oct 27, 2024

The design that I went for is that all initial component files are stored based on their hash in the blob storage.
This way we can have good sharing of data without making the implementation more complex.

The hashes are computed server-side so we are resistant to any attacks.
Collisions for sha-256 should basically not exist in practice.
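A minimal sketch of the content-addressed key derivation described above. The production code computes a SHA-256 digest server-side; here std's `DefaultHasher` stands in so the sketch stays dependency-free, and `key_for_content` is a hypothetical helper name:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Key that can be used to identify a component file.
/// All files with the same content will have the same key.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct InitialComponentFileKey(pub String);

/// Derive the key from the file's bytes. Identical content yields
/// an identical key, so identical blobs are stored only once.
/// (Stand-in hash; the real implementation uses SHA-256.)
pub fn key_for_content(bytes: &[u8]) -> InitialComponentFileKey {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    InitialComponentFileKey(format!("{:016x}", hasher.finish()))
}

fn main() {
    // Same content, same key: the blob storage shares the data.
    assert_eq!(key_for_content(b"hello"), key_for_content(b"hello"));
    assert_ne!(key_for_content(b"hello"), key_for_content(b"world"));
    println!("ok");
}
```

Because the hash is computed server-side, a client cannot claim an arbitrary key for malicious content.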

golem-service-base = { path = "../golem-service-base" }
golem-rib = { path = "../golem-rib" }
golem-wasm-ast = { workspace = true }
golem-worker-executor-base = { path = "../golem-worker-executor-base", version = "0.0.0" }
Contributor Author

@mschuwalow mschuwalow Oct 27, 2024

This dependency is here for the blob storage. It feels very bad to have this dependency instead of some common crate though. Extracted it to golem_service_base.

Contributor

That's a better place :)

/// Update a component
#[oai(
path = "/:component_id/updates",
method = "post",
Contributor Author

@mschuwalow mschuwalow Oct 27, 2024

Update (the existing one too) is not idempotent, so imo POST is a better fit than PUT.

This old one is left for backwards compat, as we now need to be multipart here too.

Contributor

Maybe you are right, but I am having my doubts :)
So let me confirm, the existing /:component_id/upload with PUT doesn't work here?
I guess update should merge with the previous files if there are no conflicts.

Contributor

@afsalthaj the IFS is tied for a specific component version. No merging.

self.fail_if_read_only(&fd)?;
record_host_function_call("filesystem::types::descriptor", "write_via_stream");
HostDescriptor::write_via_stream(&mut self.as_wasi_view(), self_, offset)
HostDescriptor::write_via_stream(&mut self.as_wasi_view(), fd, offset)
Contributor Author

@mschuwalow mschuwalow Oct 28, 2024

I played around a bit with wasmtime and I saw a few issues with preopened directories:

  1. Preopened directories in the root don't work well. If I have a preopen for '/' and '/data' and list descriptors for '/', I don't see the descriptor for /data. This is imo going to be very confusing for users.
  2. Preopen only works at the level of directories, meaning that we would have to make the whole dir read-only. This does not really work well with the format that is specced for the files property in the application manifest / we would need to specify read-only directories explicitly in some way.
  3. When playing around with complex setups I saw bad file descriptor errors a few times, but I'm not sure how to consistently trigger them.

Based on these I decided to implement this slightly differently.

1: Each worker maintains a shared cache of read only files /tmp/foo/{hash}. When a worker gets scheduled all required files are hard linked from the cache to the worker tmpdir. All read-write files are copied into the appropriate location.
2: The worker maintains a hashset of readonly paths that were loaded. All mutating operations in the HostDescriptor resolve the host path via the ResourceTable and check the hashset. If the path is present we fail with permission denied. This is very similar to how the preopen feature is implemented in wasmtime.
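The permission overlay in point 2 can be sketched as follows. This is a hypothetical, simplified standalone version of the mechanism described above (names like `ReadOnlyOverlay` and `fail_if_read_only` are illustrative, not the actual API):

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Permission overlay: the set of host paths that were materialized
/// read-only when the worker's filesystem was prepared.
pub struct ReadOnlyOverlay {
    read_only_paths: HashSet<PathBuf>,
}

impl ReadOnlyOverlay {
    pub fn new(paths: impl IntoIterator<Item = PathBuf>) -> Self {
        Self {
            read_only_paths: paths.into_iter().collect(),
        }
    }

    /// Called by every mutating HostDescriptor operation after the
    /// descriptor has been resolved to its host path. If the path was
    /// loaded read-only, the operation fails with permission denied.
    pub fn fail_if_read_only(&self, host_path: &Path) -> Result<(), String> {
        if self.read_only_paths.contains(host_path) {
            Err("permission denied".to_string())
        } else {
            Ok(())
        }
    }
}

fn main() {
    let overlay = ReadOnlyOverlay::new([PathBuf::from("/work/data/config.toml")]);
    // Writes to a read-only file are rejected; other paths pass through.
    assert!(overlay.fail_if_read_only(Path::new("/work/data/config.toml")).is_err());
    assert!(overlay.fail_if_read_only(Path::new("/work/data/scratch.txt")).is_ok());
    println!("ok");
}
```

Keeping the check in the host functions rather than in the OS filesystem permissions means the hard-linked cache files can stay writable on the host while still being immutable from the guest's point of view.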

Contributor

Maybe you can get rid of the need to do hard links as well by adding an extra path mapping in each filesystem host function. (Just an idea, not necessarily the way to go)

Contributor Author

@mschuwalow mschuwalow Nov 4, 2024

The hope was that with the hard links and the permission overlay we can keep the implementation on our side as simple as possible. I also saw that they have very similar infrastructure in wasmtime, so maybe we can get them to expose an api that allows us to just give the file permission to them.

owned_worker_id.worker_id, worker_config.deleted_regions
);

let read_only_paths = prepare_filesystem(file_loader, temp_dir.path(), &component_metadata.files).await?;
Contributor Author

@mschuwalow mschuwalow Oct 28, 2024

My proposal for handling files during updates / replays is the following:

  1. Automatic: We replay from the beginning using the new initial file set. If we encounter errors the user will have to use a snapshot based update. This is similar to how other incompatibilities in the component are being handled from what I can tell.
  2. Snapshot: We rely on the user-defined save-snapshot function to save all data that the user wants to persist across updates. This could either be implemented by using the blob storage apis directly, or we could allow returning paths to files that should be persisted from the snapshot function. The user-defined load-snapshot function is then responsible for reinitializing state on top of the new initial filesystem. From what I can tell this is consistent with how other state in the component is handled, so this should be least surprising to users.

Contributor

jdegoes commented Oct 31, 2024

@mschuwalow Congratulations on submitting your solution for our Bounty-to-Hire challenge (#1004)!

We have reviewed your pull request and believe that it is very promising. While not yet satisfying all of the requirements of #1004, we believe there is a short and well-defined path to get there, and so we are delighted to announce that you are a finalist in our Bounty-to-Hire event!

Next steps:

  1. You are guaranteed an interview based on your work thus far.
  2. Should you continue to work on this pull request, and get it into a state that can be merged to satisfy all requirements, you stand a reasonable chance of winning the bounty.

If you no longer wish to work on this pull request, please close it, and we will reach out about the interview. On the other hand, if you wish to continue pushing this pull request forward, then please let us know here and continue your work.

Congratulations again on being a finalist for Bounty-to-Hire!

string componentName = 2;
optional ComponentType componentType = 3;
// All files need to be uploaded to the blob storage before providing them here
repeated InitialComponentFile files = 4;
Contributor Author

It feels a bit wasteful to upload the files through grpc if we are already in the 'trusted' context of the application and have direct access to the blob_storage, so I went with this approach.

Contributor

I did not think of the gRPC API as something that is only available for internal communication (but it is right that nothing external uses it at the moment).

);


let blob_storage = Arc::new(FileSystemBlobStorage::new(&PathBuf::from("/tmp/ittest-local-object-store/golem")).await.unwrap());
Contributor Author

@mschuwalow mschuwalow Nov 1, 2024

This is consistent with what the other services use for their blob_storage, but this looks a bit broken to me. I think it would be better to have something like MinIO running and use the s3 based blob storage for all services. Definitely leaving that for a followup though.

Contributor

The test framework is used for multiple things:

  • Worker executor tests (golem-worker-executor-tests/tests) for testing various properties of the executor (only). For this minimal dependencies and as fast startup time as possible are the most important for me so here we use file system / in memory implementations
  • Integration tests involving all services (integration-tests/tests). For these I agree that it would be better to use the s3 based blob storage.
  • CLI tests (golem-cli/tests) are like integration tests but doing operations through invoking the CLI itself. For historical reasons many things we should have as integration tests are implemented as CLI tests now - and for those I would also use S3 instead of file system storage, but if we think about it as just tests for the CLI layer itself it should not matter
  • Benchmarks - we are running those either against a locally started golem cluster, where I think using MinIO instead of the local file system would not be very useful (we are not interested in the performance of MinIO itself). And of course we plan to be able to run them against a real cluster deployed somewhere (but are currently not doing it)

That said there are various ways to set up the test dependencies, as you see, and the cli.rs one is only used for benchmarks at the moment.

Contributor Author

@mschuwalow mschuwalow Nov 1, 2024

What I mean by broken here is that we set up the blob storage to use a filesystem path inside the containers, /tmp/ittest-local-object-store/golem, but I don't see any volume mounts that would allow sharing this directory between containers.

Note this only applies to the docker and k8s based test environments. I also didn't get around to trying this out, so maybe I'm missing something and it works already 👍


pub async fn put_if_not_exists(
&self,
bytes: &Bytes,
Contributor Author

In a followup this should be changed to stream the data instead, so we can handle files that are larger than memory.
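A rough sketch of the chunked approach such a followup could take, using bounded buffers instead of an in-memory `Bytes` value (the function name and shape are illustrative, not the actual blob-storage API):

```rust
use std::io::{Read, Write};

/// Copy from a reader to a writer in fixed-size chunks, so memory use
/// stays bounded no matter how large the file is. Returns the total
/// number of bytes copied.
pub fn stream_copy<R: Read, W: Write>(mut src: R, mut dst: W) -> std::io::Result<u64> {
    let mut buf = [0u8; 8192]; // bounded buffer, independent of file size
    let mut total = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}

fn main() {
    let data = vec![42u8; 100_000];
    let mut out = Vec::new();
    let copied = stream_copy(&data[..], &mut out).expect("copy failed");
    assert_eq!(copied, 100_000);
    assert_eq!(out, data);
    println!("ok");
}
```

In the real service this would be paired with incremental hashing, so the content key can be computed while the upload streams through.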

)
}

fn dump_component_info(path: &Path) -> golem_common::model::component_metadata::ComponentMetadata {
Contributor Author

This got fully moved to the file system implementation of the component service, so doing it as part of the DSL is no longer needed.

Contributor

I don't think we want this dump to happen in the prod service, but that can be achieved by using log levels or some configuration, so I'm ok with it.

@mschuwalow mschuwalow requested a review from vigoo November 3, 2024 17:43
/// - not contain "." components
/// - use '/' as a separator
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub struct ComponentFilePath(Utf8UnixPathBuf);
Contributor Author

@mschuwalow mschuwalow Nov 4, 2024

As this guy is getting serialized and we might have windows binaries at some point, I went with this instead of a normal path.
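The invariants listed in the doc comment can be sketched as a plain-string check. This is a hypothetical, self-contained illustration; the real `ComponentFilePath` wraps `Utf8UnixPathBuf` (from the typed-path family of crates) so serialized paths stay platform-independent even on Windows hosts:

```rust
/// Validate the invariants of a component file path:
/// it must be absolute, use '/' as the separator on every platform,
/// and contain no "." (or "..") components.
fn validate_component_file_path(path: &str) -> Result<(), String> {
    if !path.starts_with('/') {
        return Err("path must be absolute".to_string());
    }
    // Splitting on '/' is valid because '/' is the separator regardless
    // of the host platform; "." and ".." components are rejected.
    if path.split('/').any(|c| c == "." || c == "..") {
        return Err("path must not contain '.' or '..' components".to_string());
    }
    Ok(())
}

fn main() {
    assert!(validate_component_file_path("/data/config.toml").is_ok());
    assert!(validate_component_file_path("data/config.toml").is_err());
    assert!(validate_component_file_path("/data/./config.toml").is_err());
    println!("ok");
}
```

Normalizing to Unix-style paths at the type level means a component built on Windows serializes the same path bytes as one built on Linux.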

#[allow(clippy::large_enum_variant)]
#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(tag = "type", content = "config")]
pub enum BlobStorageConfig {
Contributor Author

No changes here, just moved to the service-base crate

Contributor

@afsalthaj afsalthaj Nov 5, 2024

golem-service-base is a carry-over from a very early version of golem and may get deprecated soon. I think I introduced this module, but these days I was hoping everything that's common would move back to golem-common.

The migration of BlobStorageConfig back here may hinder the avoidance of golem-service-base in the near future. Is it mandatory to have it here?

Thanks for the comments though, they clarify the context.

Contributor

It's good to move this here for now.


#[async_trait]
impl<Ctx: WorkerCtx + DurableWorkerCtxView<Ctx>> FileSystemReading for DurableWorkerCtx<Ctx> {
async fn list_directory(&self, path: &ComponentFilePath) -> Result<ListDirectoryResult, GolemError> {
Contributor Author

We don't need to bother with any locking here. There is some machinery higher up that will ensure these are only called when the worker is not executing anything (and potentially mutating the files)

use testcontainers_modules::minio::MinIO;
use uuid::Uuid;

macro_rules! test_blob_storage {
Contributor

I think these tests (and the other two suites) should be moved to the crate where the implementation is now

Contributor

Ah I realized you only moved the blob storage not the other two. In that case I'm only talking about this file.

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum BlobStorageNamespace {
CompilationCache,
InitialComponentFiles,
Contributor

Let's add AccountId to this namespace too (and use it in the path just like with the oplog related blob stores). This makes it easier to collect / delete a specific account's data.

Contributor Author

@mschuwalow mschuwalow Nov 5, 2024

What should be used as the account_id? From what I can tell, neither the rest api nor the grpc api for component creation currently supports an account_id.

Contributor

Yeah it is a bit tricky because there are no accounts in the OSS part of Golem. But if you check worker executor, it propagates an account id everywhere, and various upper layers are trying to abstract it. (It's a bit of a mess right now).

For example https://github.com/golemcloud/golem/blob/main/golem-worker-service/src/lib.rs#L11 and in the component service it's represented by the "namespace" (but the DefaultNamespace here https://github.com/golemcloud/golem/blob/main/golem-service-base/src/auth.rs does not contain anything, as there are no accounts in the open-source version).

As you probably already figured out, the reason for all these abstractions, and different -base crates etc is that we have some advanced features built on top of this codebase which require these features.

Of course it's hard to figure out the best way to do this as you don't see the other half.
For this PR I suggest the following:

  • add AccountId to this namespace, because it already has it for OplogPayload, for example
  • in worker executor you already have account id available everywhere (sometimes hidden in an OwnedWorkerId)
  • in component service, you need to extract it from the namespace which is propagated to the service level. It does not have a way to extract it right now, but you can define a trait with a Self -> Option<AccountId> extractor and require it for the namespace here: https://github.com/golemcloud/golem/blob/main/golem-component-service-base/src/service/component.rs#L390
  • if there is no account id, use "-1" (because worker service does it already in one of the above links)
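The suggested extractor trait could look roughly like this. A hypothetical sketch under the assumptions in the list above (trait and function names are illustrative; the OSS `DefaultNamespace` carries no account, so the "-1" fallback applies):

```rust
/// Minimal stand-in for golem's AccountId newtype.
#[derive(Debug, Clone, PartialEq)]
pub struct AccountId(pub String);

/// Extractor trait required of the namespace type: a namespace may or
/// may not be able to produce an AccountId.
pub trait HasAccountId {
    fn account_id(&self) -> Option<AccountId>;
}

/// The open-source DefaultNamespace contains no account information.
pub struct DefaultNamespace;

impl HasAccountId for DefaultNamespace {
    fn account_id(&self) -> Option<AccountId> {
        None
    }
}

/// Account segment used in blob-storage paths; "-1" when no account
/// id is available, mirroring what worker service already does.
pub fn account_prefix<N: HasAccountId>(ns: &N) -> String {
    ns.account_id().map(|a| a.0).unwrap_or_else(|| "-1".to_string())
}

fn main() {
    assert_eq!(account_prefix(&DefaultNamespace), "-1");
    println!("ok");
}
```

The advanced builds can then implement `HasAccountId` for their richer namespace types without the OSS code needing to know about accounts.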

Contributor Author

@mschuwalow mschuwalow Nov 6, 2024

@vigoo done. Please take a look at whether this is what you had in mind:
8115556

Contributor

I'm surprised it "leaked" into so many places like API definition models. Is that necessary? Worker service was already able to address workers somehow, which means it already had the mechanism to get the necessary context (account id etc.) without these changes.

I would like to revisit this and, if possible, make a less spread-out change.

Other than this from my side the PR is ready to go.

Contributor Author

@mschuwalow mschuwalow Nov 7, 2024

The main reason this leaked into so many places is that I need to use the correct accountId for an api deployment if I understand the code correctly.

The way it was done before, was that namespace information was included when creating the ApiDefinition, but it was serialized to a string and never deserialized again. Deserializing and carrying the information around as a proper type makes it show up in a lot of places

Let me try to limit it to only the compiled versions of the definitions.

Contributor Author

@vigoo Done. I think this looks much better 8573696.

Contributor Author
mschuwalow commented Nov 6, 2024

I think I addressed all of the comments. Please do another round of reviews :)

Contributor

@vigoo vigoo left a comment

I have one last request (#1032 (comment)) then it's good to go.

We should merge when each reviewer approves it as we reviewed different parts.

@mschuwalow mschuwalow requested a review from vigoo November 7, 2024 10:33
Contributor

noise64 commented Nov 7, 2024

all comments are resolved on cli side 👍

Contributor Author

@mschuwalow

@vigoo what is missing here / anything to do from my side to unblock merging?

Contributor

vigoo commented Nov 8, 2024

@vigoo what is missing here / anything to do from my side to unblock merging?

Hopefully nothing, I just want an approval from @afsalthaj too

Contributor

afsalthaj commented Nov 8, 2024

#1032 (comment)

@mschuwalow If you could just make this small change, I think we are good to merge.
Maybe proto3 is internally optional by default, but we have been following a safer approach of making things explicitly optional to keep backward compatibility.

Contributor Author

@mschuwalow

#1032 (comment)

@mschuwalow If you could just make this small change, I think we are good to merge. Maybe proto3 is internally optional by default, but we have been following a safer approach of making things explicitly optional to keep backward compatibility.

ah sorry, missed that one. Will adjust 👍

Contributor Author

@mschuwalow

#1032 (comment)

@mschuwalow If you could just make this small change, I think we are good to merge. Maybe proto3 is internally optional by default, but we have been following a safer approach of making things explicitly optional to keep backward compatibility.

@afsalthaj Done, please take a look 625786a
Only CompiledHttpApiDefinition and ComponentMetadata seem to be getting stored as a protobuf, so the rest should be fine.

I was thinking of adding a comment to all grpc definitions that (directly or transitively) get stored in the database so other people don't run into the same problem. This would affect quite a lot of types, like rib expressions, though (due to how big CompiledHttpApiDefinition is). Something like

// Note: This message is used to store serialized data in the database.
// Make sure not to break backward compatibility when changing this message.

Want me to go for it?

Contributor

@afsalthaj

We are still thinking about how to manage this: bincode vs proto. Something for us to improve.
The advantage of proto is that we push the backward-compatibility concerns there. We will see.

Thanks for addressing the review comments.

@afsalthaj afsalthaj merged commit df73784 into golemcloud:main Nov 9, 2024
vigoo pushed a commit that referenced this pull request Aug 15, 2025
* Initial version: cli upload and allow workers to access component files

* Add support for initial files to test framework & worker-executor-tests

* Expose worker files through rest and grpc

* regenerate openapi and client

* Add fileserver binding type to api definitions (#4)

* adjust example configs

* adjust docker-compose settings

* integration tests

* add e2e tests for file-server api deployments

* remove duplicate code

* fmt and fix

* address comments

* improve file_loader

* download files concurrently

* make code more readable

* Add accountId to blobstorage namespace: (#5)

* fix test

* refactor cli

* less namespace

* keep backwards compat for grpc messages

Successfully merging this pull request may close these issues: Initial File System - Bounty-to-Hire.