ACL for meta resources
Hello,
I'm currently implementing a SOLID server in OCaml (https://github.com/zoggy/ocaml-solid).
From https://github.com/solid/solid-spec/blob/master/content-representation.md#metadata, I understand that a client can modify a metadata resource, which is fine. But how are permissions on this metadata defined?
I see two possibilities:
- permissions on metadata of resource R are the same as the permissions on R
- permissions on metadata of resource R can be different from the permissions on R and so must be explicitly defined in another resource, which can be the same ACL resource of another one (.meta.acl?)
Sorry if this is stated somewhere.
I have argued that all resources can/should have at least one acl Link header relation. https://github.com/solid/solid/issues/61 At present I think one needs to implement whatever is deployed and documented, but I think one could come to some consensus as to how one could go on in the future versions, and discuss that.
@zoggy I believe the rough consensus is the first one :
permissions on metadata of resource R are the same as the permissions on R
@melvincarvalho wrote:
@zoggy I believe the rough consensus is the first one :
permissions on metadata of resource R are the same as the permissions on R
That does not seem right. I have been told, for example, that most applications don't allow acls to be written to, and there were even claims that many servers did not want them to be readable (see for example the conversation that led to issue https://github.com/solid/solid/issues/120). Given that acls were for a long time part of the metadata (there was a discussion some time back where acls were taken out of the meta file), I'd be surprised if there is much consensus there...
In any case, having implicit rules where some metadata resources follow some acls, and others follow others, doesn't seem to me to follow the SoLiD philosophy correctly. Much better to allow each server to specify these things explicitly by having resources point to the correct acl. That makes it much easier to write clients, which then don't need to guess what type of resource they are looking at. (Don't forget, on the web you can reach a resource from anywhere.)
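To illustrate the "resources point to the correct acl" idea from the client side, here is a minimal sketch of reading the ACL location from a `Link` response header instead of guessing it from the URL. The function name, the example URLs and the simplified header grammar (no commas or semicolons inside quoted parameter values) are assumptions made for illustration; this is not any particular server's or library's API.

```python
import re
from urllib.parse import urljoin

# One link-value: <target> followed by zero or more ";param" parts.
# Simplification (assumption): parameter values contain no ',' or ';'.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_REL = re.compile(r'rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def link_targets(resource_url, link_header, rel):
    """Return the absolute URLs advertised with the given rel type."""
    targets = []
    for match in _LINK.finditer(link_header):
        target, params = match.group(1), match.group(2)
        rel_match = _REL.search(params)
        # A rel value may list several space-separated relation types.
        if rel_match and rel in rel_match.group(1).split():
            # Targets may be relative to the resource that was requested.
            targets.append(urljoin(resource_url, target))
    return targets


# Hypothetical header, as a server might send it for a photo.
header = '<photo.jpg.acl>; rel="acl", <photo.jpg.meta>; rel="describedby"'
acl_urls = link_targets("https://example.org/pics/photo.jpg", header, "acl")
```

With this approach the client never needs to know whether the resource it fetched is a metadata resource, an ACL, or an opaque-named file: it just follows whatever `rel="acl"` link the server advertises.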
I understand that permissions on ACL is being debated, and I tend to agree that ACLs should have ACLs (and having all them in the same ACL resource seems fine to me).
But for metadata, i.e. R.meta for resource R, I do not see a use case where having different ACL for R.meta and R would be useful.
But for metadata, i.e. R.meta for resource R, I do not see a use case where having different ACL for R.meta and R would be useful.
There are two points to be made:
- It does not help to make exceptions and have implicit relations for some resources and not others, as that starts complicating code in the client in ways that may be impossible to resolve. E.g. it may not always be possible for a client to know that a resource is a metadata resource of another one. Perhaps in high-security environments, all files have base64 names of the form https://domain.name/aGVsbG8gaG93IGFyZSB5b3U
- It is always possible to think of use cases. Why would an administrator not want to prevent people who edit the file from changing the metadata, which could contain information about where the previous version was, who wrote it, who reviewed it, etc.? Nobody has, to my knowledge, clearly delimited what metadata is, so I can't see how one can exclude the use case where the ACL should be different.
I agree, especially with the "regularity" argument :-)
Hi @zoggy - thanks for the question!
Currently, ACLs for .meta resources are handled in the same way that other resources are -- they get their own .acl. For example, when you create a new account on solid-server, you get a .meta file in your root storage container, as well as a corresponding .meta.acl.
However, we intend to make .meta resources special-case resources, so that only the server is able to write to them (not the user, directly). For example, we're planning to store the WebID of the author/creator of a resource in the .meta. (See issues https://github.com/solid/solid/issues/111 and https://github.com/solid/node-solid-server/issues/407). Once that gets spec'd and implemented, it will make sense for the (read) ACLs of .meta resources to have the same values as the ACLs of their parent resources.
If the .meta can be written only by the server, where is the user supposed to store additional metadata associated to a resource in a way that the server will advertise them when the resource is requested?
So I think having the .meta file writable by the server only is a bad idea. One can specify that the server should keep additional metadata, but not by preventing any other metadata from being added.
I see at least two possible solutions:
- when the .meta is requested, the server adds additional information to the graph which is sent to the client. This implies that the content of .meta is (syntactically) correct, so that operations (insertions) can be done on the graph before sending it to the client. (This is what I implemented, responding with an error if the client does not send a valid graph (currently Turtle or RDF/XML));
- have another "type" in Link: headers for the server to advertise "the metadata kept by the server".
Regarding where to store this "metadata writable only by server", this could be left unspecified and left to the implementation. Personally, in my implementation and to keep things simple (i.e. not use a database...), I'm ready to sacrifice another extension to store this information next to the resource, like using ._meta or any extension unlikely to be used by clients (and the server would reply with a 403 Forbidden if the client tries to PUT on these resources).
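As a rough sketch of the first solution combined with the reserved extension, assuming triples are modelled as plain `(subject, predicate, object)` tuples and that `._meta` is the server-only suffix (both are illustrative assumptions, and the function names are hypothetical):

```python
# Hypothetical suffix reserved for server-maintained metadata.
RESERVED_SUFFIX = "._meta"


def client_may_put(path):
    """False for server-reserved resources; the server would answer 403."""
    return not path.endswith(RESERVED_SUFFIX)


def meta_view(user_triples, server_triples):
    """Graph returned on GET of R.meta: user statements plus server statements.

    The two sets stay separate in storage; they are only merged in the
    response, so client writes never touch the server-maintained part.
    """
    return set(user_triples) | set(server_triples)


# Example data with made-up IRIs.
user = {("https://example.org/doc", "http://purl.org/dc/terms/title", "Draft")}
server = {("https://example.org/doc", "http://purl.org/dc/terms/creator",
           "https://alice.example/profile/card#me")}
merged = meta_view(user, server)
```

Merging only at read time is what makes the syntactic validity of the user-supplied `.meta` matter: the server has to parse it into a graph before it can add its own statements.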
If the .meta can be written only by the server, where is the user supposed to store additional metadata associated to a resource in a way that the server will advertise them when the resource is requested?
So, this depends. We essentially have 3 different cases to handle.
- Non-RDF Resources (images, PDFs, other binary blobs). I believe the current practice (and spec, though this may need to be clarified) is that if you do an RDF PATCH to a non-RDF resource, it gets added to the .meta. So, in that regard, you sort of get the best of both worlds - it's user-editable in the sense that the user PATCHes the resource directly, but it's also server-controlled (in that the server redirects it to the .meta instead).
- RDF Resources that are server "views", like Container listings. Again, the semantics here are the same as for Non-RDF Resources -- if you do a PATCH to add some metadata to a Container, the server actually adds it to the container's .meta. (With the interesting addition that a container's .meta gets transparently appended to the container listing when a client requests the listing.) So, the user never really interacts with the .meta resource directly -- they both read and write it via the container itself.
- Regular RDF Resources. This is the case that has resulted in most debates. On the one hand, it's an RDF resource, you should just add metadata triples to the resource itself. (I believe the Hydra community advocates this.) On the other hand -- maybe there are cases where a user would want to add metadata to the RDF resource that is separate from the resource itself? I haven't encountered such use cases, but I can see how it'd be possible. However, keeping .meta user-editable just to fit this one fairly exotic use case - I'm not sure that's worth it. If all else fails, app developers can do this in the application logic, no server support necessary.
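The three cases above amount to a routing decision for incoming PATCH requests. A minimal sketch, assuming containers are recognised by a trailing slash, metadata lives at `R.meta` (or `.meta` inside a container), and only Turtle and RDF/XML count as RDF; the function name and these conventions are illustrative assumptions, not spec text:

```python
# Media types treated as RDF in this sketch (assumption: a real server
# would likely recognise more serializations).
RDF_TYPES = {"text/turtle", "application/rdf+xml"}


def patch_target(path, content_type):
    """Return the path whose graph an RDF PATCH to `path` should modify."""
    if path.endswith("/"):
        # Container: the listing is a server-generated view, so the
        # triples go to the container's own .meta resource.
        return path + ".meta"
    if content_type not in RDF_TYPES:
        # Non-RDF resource (image, PDF, ...): redirect to its .meta.
        return path + ".meta"
    # Regular RDF resource: patch the resource itself.
    return path
```

For example, a PATCH to `/pics/cat.jpg` stored as `image/jpeg` would land in `/pics/cat.jpg.meta`, while a PATCH to `/notes/a.ttl` stored as `text/turtle` would modify that document directly.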
Thanks for your explanations. Using PATCH to update metadata seems simple and elegant (though not the most intuitive way at first sight for non-RDF resources, but no problem).
Np! This is definitely an interesting topic, one that we need to think through and clarify.
We've got several design challenges:
Server-only metadata
We definitely need some facility for protected / server-only metadata. (Somewhere to store creator IDs, various timestamps, formats, etc). As you said, the storage mechanism for this doesn't need to be specified (can be left to implementors), but the mechanism for retrieval by clients does need to be specified.
Our rough options are:
- Specify a separate Link: rel header for server-only metadata (a third one, in addition to the ACL link rel and the .meta link rel). The main downside to this is that this sort of metadata (i.e. who created the resource) is frequently needed by the client, but requires a separate HTTP call to retrieve it. So we're essentially forcing 2 GET calls for each resource.
- Return this server-only metadata as part of the regular (user-editable) .meta resource. And have the server be in charge of keeping the server-only statements separate from the user-added statements, but return both sets of statements on a GET to the .meta resource. This has the same downside as above (needing 2 separate HTTP calls for each resource), with the additional complexity of the server having to keep the two sets of metadata separate.
- Return the server-only metadata in (specified) HTTP headers. I like this option the best, actually, since there isn't /that/ much server-only metadata, and it avoids a separate round-trip request.
User-editable metadata
We do still want users to be able to add metadata (especially to non-RDF resources, or to containers etc). Here, we probably need to spec both the write and read mechanisms. The current (informal) spec is that clients can edit .meta resources directly, plus there's the implicit editing of non-RDF and Container metadata via PATCH to the parent resource. And to retrieve that metadata, the client parses the 'describedby' link rel and fetches the .meta directly (though in the case of Containers, it's automatically appended to the container listing results, without a separate call).
This system mostly works fine, although I haven't seen anybody actually use it for anything. (We do keep a solid:account triple in the account's root .meta, actually. But that's mostly because we didn't know where else to put it, and needed it for account discovery.)
There have been several user requests for clients being able to specify custom HTTP headers to be returned with resources. So there might be some benefit to handling user-editable metadata in the same way as the third option above for server-only metadata -- specify some mapping of RDF triples to HTTP headers, and have servers return metadata with the resource itself, in the headers.
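To make the "mapping of RDF triples to HTTP headers" idea concrete, here is a small sketch. The header names `X-Meta-Creator` and `X-Meta-Created` are entirely hypothetical (nothing in the thread or any spec defines them), and triples are again modelled as plain tuples for illustration:

```python
DCT = "http://purl.org/dc/terms/"

# Hypothetical predicate-to-header mapping; a spec would have to fix it.
HEADER_FOR_PREDICATE = {
    DCT + "creator": "X-Meta-Creator",
    DCT + "created": "X-Meta-Created",
}


def metadata_headers(resource_iri, triples):
    """Turn mapped statements about the resource into (name, value) headers."""
    headers = []
    for subject, predicate, obj in triples:
        name = HEADER_FOR_PREDICATE.get(predicate)
        # Only statements about this resource, with a mapped predicate,
        # are exposed; everything else stays in the .meta graph.
        if subject == resource_iri and name is not None:
            headers.append((name, obj))
    return headers


# Example data with made-up IRIs.
doc = "https://example.org/doc"
triples = [
    (doc, DCT + "creator", "https://alice.example/profile/card#me"),
    (doc, DCT + "title", "Draft"),  # unmapped predicate, not exposed
]
extra_headers = metadata_headers(doc, triples)
```

The same function would work for server-only and user-editable metadata alike; the difference is only in which set of triples is passed in and who is allowed to write it.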
Metadata ACLs
However we decide to handle user-editable and server-only metadata, your original question about ACLs for it still stands. Currently, we treat .meta resources the same as any other, so they can have their own ACLs. This is a benefit in terms of consistency and conceptual simplicity, but in practice, it puts an additional burden on the client (app developer) to manage yet another set of ACLs.
At the very least, we should specify some default semantics such as "Unless specifically overridden by the client creating a dedicated R.meta.acl, the .meta's permissions are the same as the resource."
Other options include - a) leaving it unchanged, b) forcing the .meta permissions to be the same as the parent resource, regardless, or c) extending the acl:Control permission mode to also cover the .meta resource.
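The proposed default ("use R.meta.acl if the client created one, otherwise the ACL of R") could be sketched as a simple lookup. This assumes the `.acl` and `.meta` suffix naming used in this thread and a set of existing paths standing in for the storage layer; the function name is hypothetical, and inheritance from parent containers is deliberately left out:

```python
def effective_acl(path, existing_paths):
    """Return the ACL resource governing `path`, or None if none is found."""
    # A dedicated ACL always wins (R.acl, or R.meta.acl for a .meta).
    candidates = [path + ".acl"]
    if path.endswith(".meta"):
        # Fallback for metadata: the ACL of the resource it describes.
        candidates.append(path[: -len(".meta")] + ".acl")
    for candidate in candidates:
        if candidate in existing_paths:
            return candidate
    return None
```

Option b) above (forcing the parent's permissions regardless) would correspond to dropping the first candidate for `.meta` paths, and option a) (leaving things unchanged) to dropping the fallback.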
How to handle inline metadata
An additional complicating factor is - how do we handle metadata for Globbing / inline GETs?
If we decide to return server-only (or even user-editable) metadata in HTTP headers, should we include those headers in glob results?
Similarly, if we keep metadata in separate .meta resources, do we append them to glob results?
Thanks @dmitrizagidulin for this panoramic view on open questions. I'll keep them in mind to keep doors open while implementing.
Just a remark
This has the same downside as above (needing 2 separate HTTP calls for each resource), with the additional complexity of the server having to keep the two sets of metadata separate.
The complexity implied by the server having to keep two sets of metadata separate stands for the 3 options, doesn't it ? (except if the server could analyze PATCH/sparql update statements to preserve server-only metadata and update only user metadata, but this may become even more complicated...).
The complexity implied by the server having to keep two sets of metadata separate stands for the 3 options, doesn't it ?
I think I just meant that if the two types of metadata live in separate resources, there isn't as much custom logic on the part of the server. But yeah.
Regarding user metadata, here is a use case I'm encountering right now.
I'm developing a SOLID application to edit files (in markdown or XML format for example) stored in my workspace. When I compile such a file, I get an XHTML document and an RDF graph. The user gets a preview of the XHTML and when she thinks it's okay, she can PUT it on the server to an IRI of her choice.
I'd like to store the associated RDF graph in the user metadata, which seems the right place as it contains... metadata, like the IRI or the "source" file, the compiler used to get the final XHTML document, etc.
So here is my question: should the server delete the metadata associated to a document when this document is PUT on the server and replaces a previous one? If not, I have to delete all previous metadata (but not the metadata the server created) before adding the new metadata.
This use case seems an argument for having separate metadata: server metadata and user metadata.
Will be handled in https://github.com/solid/specification/issues/63