RFC: batch locking API#6021
Conversation
3d89582 to
84a5812
Compare
|
Hey, thank you very much for this proposal! Without having delved into the specifics yet, introducing an optional batch locking API seems like an entirely reasonable and desirable goal to me. We've accumulated a small queue of PRs to review and so it might take me a while to circle around to this one and give it a closer look, but I will absolutely do so as soon as possible, and I apologize in advance for the delay. Thank you so much for drafting such a detailed proposal; just from a first read it looks like you've covered many of the potential edge cases. One aspect of the design which caught my eye is that, if I understand correctly, server implementations should process the lock changes for all the paths listed in a request in a single transaction. That would allow them to return a consistent error message if the requested changes can not be made for any one of the given paths. Does that accord with your conception of how the API might be implemented on the server side? Thank you again very much for this idea, and for taking the time to write it up so carefully. We really appreciate these types of contributions from our development community! |
Yes, it is all-or-nothing. |
chrisd8088
left a comment
There was a problem hiding this comment.
Hey, thank you so much for this great proposal!
I've made a few notes and suggestions, mostly trying to think through how best to handle things like batch queries for multiple lock IDs and reporting multiple errors from batch lock requests.
One issue I didn't touch on in a specific review comment is that we now have a "pure" SSH-only Git LFS transfer protocol that includes a set of lock, list-lock, and unlock commands. These are sent to the Git LFS server over an SSH connection, and replicate the options provided by the legacy HTTP-based Git LFS File Locking API.
Once we have settled on the design of a new Batch File Locking API using HTTP, we'll need to mirror those changes into the SSH version of the Git LFS transfer protocol. Among other things, the sshLockClient adapter in the Git LFS client will then need to implement the same batch operations as the httpLockClient adapter.
Thank you for your patience in waiting for a first response from us! It's very exciting to have a new proposal like this underway, one that will improve performance and offer real benefits to our users.
| * [#2978: File locks are extremely slow and basically unusable](https://github.com/git-lfs/git-lfs/issues/2978) | ||
| * [#3066: Feature: Add Batch API for locking and unlocking](https://github.com/git-lfs/git-lfs/issues/3066) | ||
|
|
||
| This proposal introduces a batch API for locking and unlocking with Subversion-compatible semantics. |
There was a problem hiding this comment.
Although it's mentioned below in a couple of places that the server should return a 404 if it doesn't support this batch locking API, we should probably include an explicit note here in the introductory section which explains that this is an optional extension to the Git LFS File Locking API, and that servers are not required to implement this extension.
I'd also suggest including a note that if a server does implement this API extension, they should take care to never generate a lock ID which happens to be the word batch, since then a regular unlock request for that lock ID would have a URI path ending with /locks/batch/unlock, which would conflict with the URI path used by this API extension.
Finally, you mention "Subversion-compatible semantics", which is very interesting and took me down memory road thinking about WebDAV and so forth! However, for the sake of others reading this API documentation, if we're going to include this comment, we should expand on it with specific references.
Are you thinking about Subversion's custom v2 protocol and its lock-many command and unlock-many command? Or the multi-resource LOCK request defined by WebDAV? Or both, or something else?
There was a problem hiding this comment.
they should take care to never generate a lock ID which happens to be the word batch
This is so weird... :D Maybe we want to move batch API out of /locks/ URL to avoid this?
There was a problem hiding this comment.
Are you thinking about Subversion's custom v2 protocol and its lock-many command and unlock-many command?
Yes, these.
That's what I figured! Would you mind adding a bit of text here just to clarify that connection?
This is so weird... :D Maybe we want to move batch API out of
/locks/URL to avoid this?
Yes, maybe, although it's also the obvious URL path for a Batch File Locking API.
Personally I don't think it's a major issue, as servers are really unlikely to be using the word batch as a lock identifier! I'd just make a note in this proposal that if a server wants to implement this API, they have to avoid using batch as a lock ID, and leave it at that. It's not ideal, but I'm not too fond of a URL path like /locks-batch or whatever. If you have a good idea for another URL though, which avoids this conflict, let us know!
There was a problem hiding this comment.
Added references to SVN commands.
There was a problem hiding this comment.
Added references to SVN commands.
Thank you very much!
If you don't mind also adding a note that servers should refrain from using batch as a lock ID, if they also implement this API extension, I think that will close out this thread.
There was a problem hiding this comment.
I'm still thinking about introducing a separate non-conflicting endpoint, to be honest...
| * `operation` - should be `unlock` | ||
| * `ref` - Optional object describing the server ref that the locks belong to. | ||
| * `name` - Fully-qualified server refspec. | ||
| * `locks` - Array of lock IDs that should be unlocked. |
There was a problem hiding this comment.
How should the client acquire these lock IDs?
At the moment, if you run a command like git lfs unlock a.bin b.bin c.bin, it iterates over the list of file path arguments and calls the UnlockFile() method of our locking package's Client structure for each one. That method uses the lockIdFromPath() method to retrieve the ID of the lock from the server, and then assuming an ID of an active lock is found, calls UnlockFileById() to ask the server to remove the lock. So we make two API requests for every file listed on the command line.
The goal of this batch locking API is, of course, to improve performance, so I would imagine we'd want to have a way to query for the lock IDs of multiple file paths at once.
The "List Locks" endpoint of our existing File Locking API is rather under-specified, but most server implementations are only going to accept a single path query argument, I believe. At least some implementations will match this against leading portions of a lock's path, and so will return multiple locks in the reply in that case. For instance, if you send path=/dir/ to GitHub's version of the API, it will return IDs for file paths such as /dir/a.bin, /dir/b.bin, etc.
However, that's not going to be sufficient in the generic use case envisioned by this proposed new API, since it won't always be true that all of the file paths the user wants to unlock at once will happen to have the same path prefix.
Since we don't want to slow down the batch unlock operation by having to make individual API requests for each lock, before issuing the final unlock request, we probably need define a way to query for the IDs of multiple paths at once.
Perhaps this could maybe be as simple as stating that the existing GET .../locks endpoint of the File Locking API may, optionally, accept multiple path (or id) query arguments, and return all the matching locks in its response.
Likewise, the SSH version of the API could optionally accept multiple path (or id) arguments to its list-lock command.
There was a problem hiding this comment.
How should the client acquire these lock IDs?
I'd prefer this to be out of scope of current PR.
Client could remember what locks it created earlier. It could GET .../locks without specifying any paths to get all locks. It could group paths to reduce number of GET .../locks requests.
There was a problem hiding this comment.
I take your point that it would easier if this was out of scope for now!
However, I feel as though we at least need to explain how this should work, even if it is an optional part of the specification, and I don't think it should be too difficult if we just state that servers would be free to accept multiple path or id query arguments and return all matching locks in reply.
There was a problem hiding this comment.
I am afraid we cannot just add multiple path to existing /locks endpoint. Because servers that do not handle it will only reply with a single lock (possibly, either the first one or the last one, depending on how it is implemented). And in this situation, client will be unable to distinguish why server only responsed with one lock - because other paths are not locked or because server does not support multiple path in query.
There was a problem hiding this comment.
Because servers that do not handle it will only reply with a single lock (possibly, either the first one or the last one, depending on how it is implemented).
That's a good point. Well, I suppose we may end up something like /locks/batch?id=<id1>&id=<id2>&... and/or /locks/batch?path=<path1>&path=<path2>&... in this API extension.
However, I know you want to try developing a prototype implementation without a multiple-lock query endpoint, so I'm happy to merge this PR without something fully specified, but I'd like to leave this conversation thread open as a topic we may need to revisit before the proposal reaches its final version. I'll be interested to hear what the performance is like using the existing single-lock query endpoint in your intended use cases!
There was a problem hiding this comment.
/locks/batch?path=&path=&...
That could work! Let me think about it for a couple of days.
| // HTTP/1.1 409 Conflict | ||
| // Content-Type: application/vnd.git-lfs+json | ||
| { | ||
| "lock": { | ||
| // details of existing lock | ||
| }, | ||
| "message": "already created lock", | ||
| "documentation_url": "https://lfs-server.com/docs/errors", | ||
| "request_id": "123" | ||
| } |
There was a problem hiding this comment.
This example raises the question of how the server, and client, should handle the failure to lock some of the resources from a batch lock request. (The same question applies to unlock operations too, of course.)
Suppose the client sends a list of 100 paths to be locked, and several of them are already locked. If the server just sends back the details of the one of locked resource, the client (or user) might then remove that from their request and try again, at which point they'll get another error for the second of the locked resources.
For simplicity, let's assume the client doesn't automatically retry the lock request after removing the already-locked resource from its list, but just reports the problem to the user. The user could then run git lfs locks, perhaps with the --verify option, and try to figure out which paths to remove from their next request.
That's going to be a somewhat manual process, though. The client could try to assist the user in this case by automatically performing a "list locks" operation, determining from that which locks were already held from the list the user requested, and reporting those to the user. Still, that would only be a guess on the part of the client, since locks might have changed on the server between its two requests, and this also wouldn't cover cases where a lock was refused for some other reason (e.g., an invalid path).
I'm inclined to think this is a situation where we might want to adopt one of the design elements of the WebDAV specification, specifically the 207 Multi-Status status code, but with a modification to the format of the response body to use JSON rather than XML. RFC 4918 defines the 207 status code as follows:
A Multi-Status response conveys information about multiple resources in situations where multiple status codes might be appropriate. The default Multi-Status response body is a text/xml or application/xml HTTP entity with a 'multistatus' root element. Further elements contain 200, 300, 400, and 500 series status codes generated during the method invocation. 100 series status codes SHOULD NOT be recorded in a 'response' XML element. Although '207' is used as the overall response status code, the recipient needs to consult the contents of the multistatus response body for further information about the success or failure of the method execution. The response MAY be used in success, partial success and also in failure situations.
This is written in the context of the WebDAV specification and therefore assumes the use of XML in the response body. However, it only states that the "default Multi-Status response body" is XML, not that it must be XML. If our batch locking API specifies that the response body should be JSON and have our usual application/vnd.git-lfs+json content type, I think that should be acceptable.
I feel relatively certain that proxies and firewalls out in the wild are not going to tamper with an HTTP response just because it has an uncommon 2xx status code (whereas they might balk at a request with an uncommon HTTP method like LOCK), so I think we'd be pretty safe using 207 status codes in some responses.
We would then want to define the JSON response format so it could return the status for the specific locks which could not be acquired due to errors. Perhaps something like:
{
"multi-status": [
{
"path": "foo/bar.zip",
"status": 409,
"lock": {
// details of existing lock, i.e., "id" and "locked_at" elements
},
"message": "already created lock"
},
{
"path": "//////",
"status": 400,
"message": "invalid lock path"
}
],
"message": "no locks created due to 2 errors",
"documentation_url": "https://lfs-server.com/docs/errors",
"request_id": "123"
}Following another principle from the WebDAV specification, I'd suggest that the server does not need to include the status for all the paths that would have been locked successfully if some others paths hadn't encountered errors. This is what the WebDAV specification calls "error minimization", meaning only the errors need to be reported, and the client can deduce that any paths which aren't listed in the response would have otherwise been locked successfully.
As I noted above, this use of the 207 status code and JSON format would also apply to unlock requests where some of the locks could not be removed. The per-lock response elements and status codes would allow the server to distinguish between locks that don't exist (404s) and locks the user lacks permission to remove (403s).
One caveat to this idea is that the use of a 2xx response to mean "we made no changes due to some errors" is a little unconventional, but as the specification says about the 207 status code, it "MAY be used in success, partial success and also in failure situations", so it explicitly allows the use of a 207 response in a failure situation.
That all said, our HTTP-based Batch API for Git LFS object transfers just specifies the use of a 200 OK response under similar conditions, with JSON that includes per-object code elements for the status code relevant to that object.
So it might be more in keeping with our existing APIs to return a 200 response, even when no locks are created (or removed, in the case of unlock requests which encounter errors).
There was a problem hiding this comment.
So it might be more in keeping with our existing APIs to return a 200 response, even when no locks are created (or removed, in the case of unlock requests which encounter errors).
I think this is the right option, all things considered, because it aligns with how we're already handling multi-status responses in the Batch API.
I'm reminded that a key aspect of this proposal is that if even one lock can't be acquired (or deleted), the entire request should be denied, and no locks changed on the server.
Given that, multi-status responses are still important, because we want to document the locks which couldn't be acquired or released, and why, so I still think the JSON should include per-object status codes. To keep in sync with how these are reported in the regular object transfer Batch API, let's use code: as the JSON element names. So something like:
{
"locks": [
{
"path": "foo/bar.zip",
"error": {
"code": 409,
"message": "already created lock",
"lock": {
// details of existing lock, i.e., "id" and "locked_at" elements
}
}
},
{
"path": "//////",
"error": {
"code": 400,
"message": "invalid lock path"
}
}
],
"message": "no locks created due to 2 errors",
"documentation_url": "https://lfs-server.com/docs/errors",
"request_id": "123"
}But as for what the overall HTTP status code should be in the server response, since the entire request is being denied, returning a 200 like the object transfer Batch API does seems, on further reflection, just wrong. I guess 409 Conflict is about the best option here, so let's leave that as-is.
My apologies for making this more complicated than necessary! :-)
There was a problem hiding this comment.
I believe this thread was addressed. And I am explicitly +1 on responding with 409 Conflict (or whatever 4xx code) instead of 200 if operation failed.
There was a problem hiding this comment.
I am explicitly +1 on responding with
409 Conflict
Agreed!
I believe this thread was addressed.
The Bad Response: Some of the locks could not be deleted section looks great!
Would you mind updating also updating this Bad Response: Lock Exists section to match that as well? Maybe something like:
* `locks` - List of descriptions of each lock that prevented operation from completing successfully.
* `id` - String ID of the Lock.
* `error`
* `code` - HTTP error code for this lock.
Should be the same as user would get if tried to acquire this lock individually.
* `message` - String error message.
* `lock` - Optional details of existing lock.
* `message` - String error message.
* `request_id` - Optional String unique identifier for the request.
Useful for debugging.
* `documentation_url` - Optional String to give the user a place to report errors.
```
// HTTP/1.1 409 Conflict
// Content-Type: application/vnd.git-lfs+json
{
"locks": [
{
"id": "some-uuid",
"error": {
"code": 409,
"message": "lock already exists",
"lock": {
// optional details of existing lock, i.e., "id" and "locked_at" elements
}
}
},
{
"id": "//some-another-uuid//",
"error": {
"code": 400,
"message": "invalid lock path"
}
}
],
"message": "some locks could not be acquired",
"documentation_url": "https://lfs-server.com/docs/errors",
"request_id": "123"
}
```
That may seem redundant, but I think it allows for servers to return non-409 code values if they need to express a error other than "this lock already exists" which is specific to a single lock.
32e4027 to
89645b2
Compare
|
Thanks for all the comments and feedback! I will aim to take another look at this proposal draft over the weekend, unless something unexpected comes up. |
958779b to
3868eb5
Compare
3868eb5 to
bc6c368
Compare
|
Whoops, I somehow forgot about this. |
See #3066
I don't want to create an implementation yet, first would like to discuss core API structure.