Add an additional_info_url fields to ExecuteResponse. by EdSchouten · Pull Request #36 · bazelbuild/remote-apis

EdSchouten · 2018-11-05T15:54:47Z

As mentioned in bazelbuild/bazel#6591, Bazel
Buildbarn now ships with a web service that can be used to explore data
stored in the CAS and AC.

By adding an extra field to ExecuteResponse, we can let Bazel
automatically report URLs to the user pointing to the correct page.
These URLs could, for example, be printed for build steps that fail.

buchgr · 2018-11-06T08:50:29Z

build/bazel/remote/execution/v2/remote_execution.proto

+
+  // Similar to `server_logs`, an optional list of URLs the user may visit to
+  // obtain more information about the execution of the action.
+  repeated string additional_info_url = 5;


Why repeated? Do you have a use case for this already?

I can only think this is due to logs from the worker side and logs from the execution side?

That is a very good question. I have to say I didn't give this a lot of thought. "Why not?" must have been my reasoning.

Right now it's just the Bazel Buildbarn Browser to which we link. That said, there is a fair chance the scheduler processes will at some point gain a status page with info on workers, etc. Maybe there is a use for emitting links to that as well?

That's all speculating. If there's a preference for a non-repeated field (or the other extreme: a string keyed map, just like the server logs), I can change this. :-)

The more generic version of the additional_info_url field would be a

string message = 5;

with the expectation being that a server can put an arbitrary message that a client is expected to display to the user. I do prefer the additional_info_url field because it's simple, no surprises and allows a client to have more control over how it presents the information to the user.

Right now it's just the Bazel Buildbarn Browser to which we link. That said, there is a fair chance the scheduler processes will at some point gain a status page with info on workers, etc. Maybe there is a use for emitting links to that as well?

Wouldn't this require either a key or a description for each URL to make this usable? A list of URLs without any metadata doesn't sound user-friendly.

Yeah, exactly. Let's switch this to a map then.

Right now it's just the Bazel Buildbarn Browser to which we link. That said, there is a fair chance the scheduler processes will at some point gain a status page with info on workers, etc. Maybe there is a use for emitting links to that as well?

In that case couldn't you still only post a single link that takes the user to a web interface and link to other things from there? One link seems simple and powerful enough to fulfill all of these use cases.

finnball · 2018-11-06T09:08:45Z

Having read the original work this is based on, this is a nice proposal and I really like the website style that produces easily readable information for users.

Though I'm unsure if support for URLs with additional information really fits into the Remote Execution API here.

Perhaps we could expand the current ExecutionMetadata here. Potentially, as you pointed out, including come kind of map. I already find just including the stdout_stream_name to be limiting in that it doesn't provide the ByteStream URL.

~~The only problem with my suggestion is through duplication of Actions. The API specifies that this be: // The details of the execution that originally produced this result.~~ That's the ExecutedActionMetadata. I knew I was going to make this mistake at some point!

I still feel like this is more of a metadata result and very much optional which is what makes me feel uncomfortable having it written in the current form it is.

buchgr · 2018-11-06T09:29:07Z

I'm unsure if support for URLs with additional information really fits into the Remote Execution API. It may be an idea to have this as some kind of Any field whereby you can pack in your own implementation specific protos.

Do you have specific concerns? I don't like Any in that it hurts cross compatibility between clients and services - it's poking a hole into a well specified protocol.

finnball · 2018-11-06T09:33:45Z

I'm unsure if support for URLs with additional information really fits into the Remote Execution API. It may be an idea to have this as some kind of Any field whereby you can pack in your own implementation specific protos.

Do you have specific concerns? I don't like Any in that it hurts cross compatibility between clients and services - it's poking a hole into a well specified protocol.

Apologies, I amended my answer as you posted after realising it wasn't a sensible suggestion. For me, this is more of a metadata result than something essential to the ExecutionResponse.

buchgr · 2018-11-06T09:54:45Z

My rational for including it in the ExecuteRequest would be that this URL fulfills the same purpose as server_logs, except that a URL can point to a rich (web) interface that can do a better job displaying information and potentially offer additional functionality. I think both server_logs and additional_info_url do make sense and can be useful. I think the human_readable field in LogFile will turn out to have been a bad idea, as it seems not realistic for a client to display arbitrary, server defined log files in it's UI.

I would argue against ExecuteOperationMetadata as it's not clear yet whether this whole message will turn out to be useful. It seems a bit complicated. We don't use it in Bazel and have no intention of doing so.

finnball · 2018-11-06T10:18:13Z

My rational for including it in the ExecuteRequest would be that this URL fulfills the same purpose as server_logs, except that a URL can point to a rich (web) interface that can do a better job displaying information and potentially offer additional functionality. I think both server_logs and additional_info_url do make sense and can be useful. I think the human_readable field in LogFile will turn out to have been a bad idea, as it seems not realistic for a client to display arbitrary, server defined log files in it's UI.

s/ExecuteRequest/ExecuteResponse :)

I have no problem with including this extra information and I rather like the idea. I just question if we should be shoving this field into the ExecuteResponse as it currently stands.

So if this is a type of log as you say, would it be better to refactor the message LogFile than adding another field whilst keeping a field that as you suggest, could be improved?

I would argue against ExecuteOperationMetadata as it's not clear yet whether this whole message will turn out to be useful. It seems a bit complicated. We don't use it in Bazel and have no intention of doing so.

I find this field useful for builds that take a long time. I like how it allows the user to optionally stream logs back to the client and lets them know the status of the build. There are certain limitations about it which annoy me and I do question if we should be using the long-running-api or a modified REAPI version instead though that's for a separate discussion.

I've also made a mistake in my above comment, when I spoke about // The details of the execution that originally produced this result. . This is for the ExecutedActionMetadata and not ExecuteOperationMetadata. So we could actually move this field into the metadata if we wanted.

I would therefore argue that this field somehow be included as an improved logging message or in a metadata field.

EdSchouten · 2018-11-06T14:17:31Z

Moving it into the ExecutedActionMetadata would be a bit odd in Buildbarn's use case specifically. ExecutedActionMetadata is stored within ActionResult. The page that I've made for Buildbarn is used to inspect ActionResult objects specifically. Upon successful builds, ActionResults are normally stored in the AC. Because we care about the failures in particular, we store them in a special message in the CAS to not interfere:

https://github.com/EdSchouten/bazel-buildbarn/blob/master/pkg/proto/failure/failure.proto

The URL that we return to the user is not much more than a digest of this message stored in the CAS. This is where it becomes problematic: by basing the URL on the digest of this message and storing that URL within the message, we introduce a cycle. This cycle could be avoided by only adding the URL to the ActionResult to the message returned to Bazel and omitting it from the ActionResult that is stored, but that feels sloppy.

Another thing that is worrisome about storing this underneath ActionResult is that it now also ends up in the AC. This means that an administrator of a build cluster has to purge the entire AC in case the URLs change (e.g., switching hostnames, upgrading to a newer version of Buildbarn Browser that has a different URL scheme).

finnball · 2018-11-06T14:20:46Z

Moving it into the ExecutedActionMetadata would be a bit odd in Buildbarn's use case specifically. ExecutedActionMetadata is stored within ActionResult. The page that I've made for Buildbarn is used to inspect ActionResult objects specifically. Upon success, ActionResults are normally stored in the AC. Because we care about the failures in particular, we store them in a special message in the CAS:

https://github.com/EdSchouten/bazel-buildbarn/blob/master/pkg/proto/failure/failure.proto

The URL that we return to the user is not much more than a digest of this message stored in the CAS. This is where it becomes problematic: by basing the URL on the digest of this message and storing that URL within the message, we introduce a cycle. This cycle could be avoided by only adding the URL to the ActionResult to the message returned to Bazel and omitting it from the ActionResult that is stored, but that feels sloppy.

Another thing that is worrisome about storing this underneath ActionResult is that it now also ends up in the AC. This means that an administrator of a build cluster has to purge the entire AC in case the URLs change (e.g., switching hostnames, upgrading to a newer version of Buildbarn that has a different URL scheme).

Agreed, I don't think this should be in the ExecutedActionMetadata.

To be clear, I was suggesting we store these keys in ExecuteOperationMetadata, not ExecutedActionMetadata. Or even some other metadata field yet to be defined.

EdSchouten · 2018-11-06T14:47:41Z

Ah, gotcha. So it's either ExecuteOperationMetadata or ExecuteResponse. I think deciding between these two should be done by looking at the behaviour of Bazel.

For a single ExecuteRequest, Bazel fires off one or more operations. Operations may fail or disappear for many odd reasons (e.g., workers dying). This causes Bazel to create brand new operations for the same ExecuteRequest until it ends up receiving an ExecuteResponse. For each of these operations, ExecuteOperationMetadata is permitted to contain data that is different (e.g., different std{out,err}_stream_names).

This means that if these URLs are stored in the ExecuteOperationMetadata, Bazel might end up with a pile of different URLs to display for a single build step. Do we want Bazel to potentially print multiple similar URLs for every build step? If so, the URLs should go in ExecuteOperationMetadata. If not, they should likely go in ExecuteResponse.

For Buildbarn Browser, the latter is sufficient.

finnball · 2018-11-06T15:09:39Z

Ah, gotcha. So it's either ExecuteOperationMetadata or ExecuteResponse. I think deciding between these two should be done by looking at the behaviour of Bazel.

Bazel isn't the only client here. A better approach might be to look at what the API specifies. Currently when sending an ExecuteRequest, you receive a longrunning.Operation as shown here with a metadata field described here. If this needs changing then we should start a discussion.

For a single ExecuteRequest, Bazel fires off one or more operations. Operations may fail or disappear for many odd reasons (e.g., workers dying). This causes Bazel to create brand new operations for the same ExecuteRequest until it ends up receiving an ExecuteResponse. For each of these operations, ExecuteOperationMetadata is permitted to contain data that is different (e.g., different std{out,err}_stream_names).

In this rare scenario, if a worker dies, could the server not retry the job internally, keeping the same operation_id as before? Or could the client just retry the job and use the new operation id it is given?

This means that if these URLs are stored in the ExecuteOperationMetadata, Bazel might end up with a pile of different URLs to display for a single build step. Do we want Bazel to potentially print multiple similar URLs for every build step? If so, the URLs should go in ExecuteOperationMetadata. If not, they should likely go in ExecuteResponse.

For Buildbarn Browser, the latter is sufficient.

For jobs that take a long time, I wouldn't mind clicking on a URL to see what was happening. This URL could be retrieved from the metadata streamed from the first request you make. If you want to retrieve it from the last, you could still retrieve it from the metadata you receive from the final response you get since the ExecuteResponse is supposed to be in the response field of the long-running-api.

It is a little subjective, though I feel that this data is metadata and should be made available from the first call and not just the last.

bergsieker · 2018-11-08T20:22:11Z

Coming late to this very active party. :) I think that Finn has things roughly correct: the ExecuteResponse is intended to capture "end state" metadata for completed Actions, while the ExecuteOperationMetadata is intended to capture "status update" metadata for ongoing Actions. I think there are valid use cases for each type of metadata (and some data belongs in multiple places, e.g., stdout/stderr is provided as a stream handle in ExecuteOperationMetadata, but also as a digest in ActionResult). So, perhaps we should consider designating a place for this type of metadata in both ExecuteOperationMetadata and ExecuteResponse.

Note that Bazel doesn't currently use ExecuteOperationMetadata, and that for Bazel, which is generally used to dispatch 100s or 1000s or actions, printing metadata for each ongoing action probably isn't the right thing to do. Other clients may have different usage patterns, though, so printing per-action metadata for ongoing actions may make more sense.

In my opinion, the bigger question is how to capture and represent this data. I'd suggest that server_logs may be the right place after all, but that LogFile may need to be extended to support richer processing. For example, maybe it should have either a digest or embedded text. I think it should also specify some degree of handling preferences--something like "always print," "print on failure," or "never print"--as a hint to the client.

buchgr · 2018-11-12T09:04:40Z

Note that Bazel doesn't currently use ExecuteOperationMetadata, and that for Bazel, which is generally used to dispatch 100s or 1000s or actions, printing metadata for each ongoing action probably isn't the right thing to do. Other clients may have different usage patterns, though, so printing per-action metadata for ongoing actions may make more sense.

YAGNI. If there is no current use case I'd argue we should revisit this once there is.

I'd suggest that server_logs may be the right place after all, but that LogFile may need to be extended to support richer processing. For example, maybe it should have either a digest or embedded text.

Works for me.

I think it should also specify some degree of handling preferences--something like "always print," "print on failure," or "never print"--as a hint to the client.

YAGNI.

finnball · 2018-11-12T10:51:50Z

Thankyou for you responses :)

I'd suggest that server_logs may be the right place after all, but that LogFile may need to be extended to support richer processing. For example, maybe it should have either a digest or embedded text. I think it should also specify some degree of handling preferences--something like "always print," "print on failure," or "never print"--as a hint to the client.

Adding or amending to the LogFile map feels to me like a bandaid over a field which isn't that useful in its current form anyway. We're now going to be in a situation where we have logs contained in the result and logs in the metadata.

for this type of metadata in both ExecuteOperationMetadata and ExecuteResponse.

I still feel that we are better off having a single metadata field instead of repeating ourselves. Bazel surely still uses the long-running-api to some extent in that it needs to look into the operation.result field for the ExecuteResponse as stated here.

If you take a look at what is returned from an ExecuteRequest, this format to me mostly seems sensible. It includes the result, the done flag and importantly a metadata field. Surely Bazel could just check that done flag, if true, then scrape any optional metadata from the metadata field? This allows us for example, to pass any meaningful links to the user from the first request, allowing the client to dictate if it makes this visible from the start or not, instead of allowing specific implementations to dictate the API.

buchgr · 2018-11-13T16:59:00Z

@finnball are you suggesting of moving the server_logs field to the ExecuteOperationMetadata? I agree that this information should be contained in either one or the other but not both.

finnball · 2018-11-14T11:45:52Z

Unless there's a good reason not to, I think it's better to move the mapping discussed for this MR to the ExecuteOpertationMetadata as it gives us more flexibility for when the client chooses to display this data to the user.

I'm also suggesting in the future and in a separate MR, we could look at removing the LogField and replace it with something more generic in the ExecutionoperationMetadata that would also encompass the stdout_stream_name, as it seems a little odd that we have two potential places for logging :)

ola-rozenfeld · 2018-11-14T23:14:59Z

Sorry for being even later to the party!

Agree with Steven on the difference between the metadata and the response -- ExecuteOperationMetadata describes an in-progress operation, with fields such as stream handles for the outputs (and maybe logs, if we need them), while ExecuteResponse describes the completed action, with all the outputs already digested, and is stored persistently.

I see a use-case for both here -- if the URL can be used to view the progress of an already executing action, or to ssh to the worker, or things like that -- these items belong in ExecuteOperationMetadata. But if the URL is used for debugging a failed action, then it belongs in the ExecuteResponse. Re: loop -- yes, the response digest cannot be part of the key in this case, it would have to be action-id+timestamp or some such.

Btw, disagree with Jakob on ExecuteOperationMetadata -- it is the only way to get streamed logs, and we will definitely need to use it in Bazel if we want --test_output=streamed to work as intended with RE.

As to the field names: I am ambivalent about assuming that a server will always communicate in urls, I prefer the message field which clients can display to the user, it's just more generic. But I'm not opposed to a bunch of urls either.

buchgr · 2018-11-15T09:27:08Z

In order to make progress on solving this particular use case of displaying a url in case of a failed action. Would anyone strongly disagree with adding a simple string message field to ExecuteResponse whose intend is to be displayed to and read by a human?

For the other points raised here, let's talk about them in separate issues? I am happy to create the issues and start the discussion.

EdSchouten · 2018-11-15T09:29:50Z

Sounds good to me!

This field may be used by a remote builder to inject additional lines of text into the output of the build client. Buildbarn may use this to print links to Buildbarn Browser, so that users can inspect build failures in more detail.

bergsieker · 2018-11-15T14:57:45Z

SGTM

ola-rozenfeld · 2018-11-15T16:12:45Z

SGTM

buchgr · 2018-11-16T14:22:26Z

paging @finnball :-)

finnball · 2018-11-19T16:07:58Z

Yup, fine by me. Thankyou.

Noteworthy changes include: * From remote_execution.proto: - New message field in ExecuteResponse [1]. - Output file and dir. parent folders clarifications [2]. - Better doc. for do_not_cache/skip_cache_lookup [3]. - Documention updates. * From semver.proto: - Documentation updates. [1] bazelbuild/remote-apis#36 [2] bazelbuild/remote-apis#42 [3] bazelbuild/remote-apis#55

googlebot added the cla: yes Pull requests whose authors are covered by a CLA with Google. label Nov 5, 2018

EdSchouten mentioned this pull request Nov 5, 2018

Add an option for printing selected server logs on failure. bazelbuild/bazel#6591

Closed

buchgr reviewed Nov 6, 2018

View reviewed changes

buchgr requested review from aiuto and ola-rozenfeld November 6, 2018 08:50

buchgr removed the request for review from aiuto November 6, 2018 09:26

Add a message field to ExecuteResponse.

61c3b3f

This field may be used by a remote builder to inject additional lines of text into the output of the build client. Buildbarn may use this to print links to Buildbarn Browser, so that users can inspect build failures in more detail.

EdSchouten force-pushed the additional-info-urls branch from bef1c02 to 61c3b3f Compare November 15, 2018 09:43

buchgr approved these changes Nov 22, 2018

View reviewed changes

buchgr merged commit ed48498 into bazelbuild:master Nov 22, 2018

Conversation

EdSchouten commented Nov 5, 2018

Uh oh!

buchgr Nov 6, 2018

Choose a reason for hiding this comment

Uh oh!

finnball Nov 6, 2018

Choose a reason for hiding this comment

Uh oh!

EdSchouten Nov 6, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

buchgr Nov 6, 2018

Choose a reason for hiding this comment

Uh oh!

juergbi Nov 6, 2018

Choose a reason for hiding this comment

Uh oh!

EdSchouten Nov 6, 2018

Choose a reason for hiding this comment

Uh oh!

buchgr Nov 6, 2018

Choose a reason for hiding this comment

Uh oh!

finnball commented Nov 6, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

buchgr commented Nov 6, 2018

Uh oh!

finnball commented Nov 6, 2018

Uh oh!

buchgr commented Nov 6, 2018

Uh oh!

finnball commented Nov 6, 2018

Uh oh!

EdSchouten commented Nov 6, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

finnball commented Nov 6, 2018

Uh oh!

EdSchouten commented Nov 6, 2018

Uh oh!

finnball commented Nov 6, 2018

Uh oh!

bergsieker commented Nov 8, 2018

Uh oh!

buchgr commented Nov 12, 2018

Uh oh!

finnball commented Nov 12, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

buchgr commented Nov 13, 2018

Uh oh!

finnball commented Nov 14, 2018

Uh oh!

ola-rozenfeld commented Nov 14, 2018

Uh oh!

buchgr commented Nov 15, 2018

Uh oh!

EdSchouten commented Nov 15, 2018

Uh oh!

bergsieker commented Nov 15, 2018

Uh oh!

ola-rozenfeld commented Nov 15, 2018

Uh oh!

buchgr commented Nov 16, 2018

Uh oh!

finnball commented Nov 19, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

EdSchouten Nov 6, 2018 •

edited

Loading

finnball commented Nov 6, 2018 •

edited

Loading

EdSchouten commented Nov 6, 2018 •

edited

Loading

finnball commented Nov 12, 2018 •

edited

Loading