Add query watches api to retrieve multiple watches.#64582
Add query watches api to retrieve multiple watches.#64582martijnvg merged 18 commits intoelastic:masterfrom
Conversation
This api supports pagination (from / size) and querying and sorting by watch _id and watcher metadata. This avoids using .watch index directly. Relates elastic#62501
|
Pinging @elastic/es-core-features (:Core/Features/Watcher) |
|
@martijnvg I think this API will work for the Watcher UI, but I wanted to do a bit of a brain dump, both to keep a record of my thoughts and also to get your eyes on my thought process in case I'm missing or misunderstanding anything. Scrolling and paginationCurrently, Watcher fetches watches by specifying My question is: can we replace this logic with something similar using the export const fetchAllWatches = async (
dataClient: ILegacyScopedClusterClient,
from: number,
size: number,
accumulatedHits: any[] = []
): Promise<any> => {
const response = await dataClient.callAsCurrentUser('watcher.listWatches', {
body: {
from,
size,
},
ignore: [404],
});
const hits = get(response, 'hits.hits', []);
accumulatedHits.push(...hits);
// If we asked for a number of results equal to `size`, and got back that amount, then there
// may be more to retrieve.
if (hits.length === size) {
return fetchAllWatches(dataClient, from + size, size, accumulatedHits);
}
return accumulatedHits;
};
// Begin series of paginated requests.
fetchAllWatches(dataClient, 0, 100, []);I'm basing this off the search pagination docs. This is generally how pagination works with the Ignoring 404sThe original |
|
@cjcenizal This api is a wrapper around the _search api. Supporting scroll would mean we would also need to write a wrapper around the search scroll api. This felt a bit too much to me, sense a scroll search tends to be used when paging through large amounts of documents and this doesn't apply to watches. I think we can expose search after in the list watcher api, which allows to page through the watches in a convenient and efficient way like scroll, but this can then all be done via the list watches api. Would that make things easier on the UI side?
I don't that particular option is part of ES. In ES there is |
|
Thanks for the context @martijnvg. When we talk about scroll being suited for "large numbers of documents", what's the rough order of magnitude we have in mind? Tens of thousands? Millions? The PR description mentions that this API supports In terms of Also, I just wanted to note that we're using |
I think in most cases there will not be more than a few thousands watches. So I think regular pagination should be fine.
Search after is like scroll search in the sense that it is optimised for deep pagination. With the sort values of the last hit, you can paginate through all the hits like is possible with from / size pagination. The main difference with scroll and search after is that scroll search takes into account the latest updates to an index and therefor the sort order may change during pagination, whereas with scroll search the view of the index remain the same during the lifetime of a scroll id. However search after can be used in combination with the point in time api to preserve the state of an index while searches happen to paginate though the results.
I see. I think just returning an empty response is ok when there are no watches or there is no .watches index. |
|
Thanks Martijn! This is great info.
Did you mean to say "search after takes into account the latest updates to an index"?
Agreed! This is much more ergonomic for us as consumers. The absence of watches or a |
Whoops, yes, this is what I meant. |
|
@elasticmachine run elasticsearch-ci/docs |
jakelandis
left a comment
There was a problem hiding this comment.
The _id field is only a keyword, and the metadata.* are optional (and probably not used too heavily). I wonder if exposing query makes sense for this API since those are the two fields that you can query ?
I think exposing the query syntax here might be more confusing then the value it provides. Would it make sense to have this API ONLY return the watch IDs, so that a consumer would need go back to the /_watch/id endpoint ?
| @@ -0,0 +1,25 @@ | |||
| { | |||
| "watcher.list_watches":{ | |||
There was a problem hiding this comment.
need to cc client's team for awareness
| watcherMetadata = (Map<?, ?>) response.getWatches().get(1).getSource().getAsMap().get("metadata"); | ||
| assertThat(watcherMetadata.get("key2"), equalTo(5)); | ||
|
|
||
| request = new ListWatchesAction.Request(2, 2, null, List.of(new FieldSortBuilder("metadata.key1")), null); |
There was a problem hiding this comment.
Should we also include a search_after here ? (or in a seperate test)
| public static Watch createTestWatch(String watchName, Client client, HttpClient httpClient, EmailService emailService, | ||
| WatcherSearchTemplateService searchTemplateService, Logger logger) throws AddressException { | ||
| WatcherSearchTemplateService searchTemplateService, Logger logger) { | ||
| ActionThrottler actionThrottler = new ActionThrottler(Clock.systemUTC(), null, mock(XPackLicenseState.class)); |
There was a problem hiding this comment.
What is this change for ?
There was a problem hiding this comment.
So the reason this change is needed, is because the ListWatchesResponseTests creates watch instances for testing xcontent serialization and a couple more components are needed to make a watch xcontent serialization work without NPE's being thrown.
We don't know how widely
How the query syntax behaves is the same whether users query the |
Agree we don't know, but I think we can we can also agree that there is probably not a lot of searching happening given that the only 2 (logical) fields available to search, and one is a keyword and the other may not exist.
Assuming this is 8.0, we should look to subtract complexity for areas that are not really used and there are viable alternatives.
The syntax is the same, but there is subtle difference with the privilege model of
Agree... however, I think this speaks to my point about it really being useful.
I disagree here. This is a fairly common REST pattern (get IDs then go get additional info) and don't think it is that complex. I actually think using this API that supports search is more complex then a basic "list all of the ids". To list the watches I need to a) understand the API well enough to know about pagination or the default count returned (else I only get partial results) b) optionally understand the query syntax as well as the internal mappings (and that the only thing that i can search is metadata and IDs). If we were to build this from scratch, would we expose the ability to search .watches by ID or metadata ? IMO, probably not. We would either not expose search, or index additional values. I don't think the .watches was ever really intended to be searched, hence the mappings. If we go this path (only returning IDs from list), I do think we should extend the GET /_watcher/id to accept multiple id's. |
The supported options are the same as in the search api, so no need to learn new syntax. Users that are querying the metadata fields are already aware of the mappings. I don't think we build this api for new users. New users should use Alerting instead. If the list watch api would return just all the IDs of watches then we would still use the search api. The search api by default already includes the source of a watch. The api instead either doesn't request the _source or omits it, but then in a later point in time, any consumer of this API would then use the get watch api to get the watch source, which we could have returned in list watch response. Also in the case that the list watch api returns all IDs then we would maybe need some form of pagination. I believe there are use cases were there is a watch per user model and in those cases there may be many watches.
If I recall correctly, the
This can be many fields defined by watch authors, since the
I initially wanted to extend the get watch api, but the response currently doesn't allow returning multiple watches in backwards compatible way. We need api versioning in order to change the get watch api response. |
My concern isn't so much about the syntax of the query, but rather needing to read the documentation for this API well enough to know If we changed the name to
I think returning a thousand IDs is still sane, and let the client do the pagination if necessary. If there are real cases of tens/hundreds of thousands watches (which is possible, but likely rare) we could introduce pagination as option, where the default is no pagination or a huge default value. Which IMO would be more inline with the semantics with
I don't like it ... but ... we could in 7.x continue to return a single object for a single id (maintaining passivity), and an array of objects for multiple ids (new functionality). Then in 8.0 consolidate on a array objects and implement the compatible API the preserve the 7.x shape for single ids. @jethr0null - Would you share your opinion ? |
That makes sense. Maybe
It is rare compared to all clusters that use Watcher, but there are cases with more then 10000 watches and even beyond 100000 watches or more, so I think we should take these clusters into account.
For 7.x, perhaps we can switch to object array in the response if no watch id is provided? So when |
Not germane to this discussion, but the Watcher UI uses |
|
Thanks CJ, I wasn't aware that the UI used this [1]. It appears that it simply read (as opposed to searched) for this use case ? However, this does provide a nice example of how it could be used for something non-trival as part of a larger system. @martijnvg - Can we change the API to be [1] example usage: "metadata" : {
"name" : "test1",
"watcherui" : {
"trigger_interval_unit" : "h",
"agg_type" : "count",
"time_field" : "@timestamp",
"trigger_interval_size" : 1,
"term_size" : 5,
"time_window_unit" : "m",
"threshold_comparator" : ">",
"term_field" : null,
"index" : [
"logstash-2020.11.30-000001"
],
"time_window_size" : 5,
"threshold" : 1000,
"agg_field" : null
},
"xpack" : {
"type" : "threshold"
}
} |
jakelandis
left a comment
There was a problem hiding this comment.
LGTM thanks for the name change.
One minor point, however, I don't have a strong preference (only minor) would be for the path to _watcher/_query/watches I think this read a bit better and sets a bit cleaner precedence for custom queries against system indices.
This api supports pagination (from / size) and querying and sorting by watch _id and watcher metadata. This avoids using .watch index directly. On a per watch basis the same information that the get watch api returns is returned, except version. Relates elastic#62501
This api supports pagination (from / size) and
querying and sorting by watch _id and watcher metadata.
This avoids using .watch index directly.
On a per watch basis the same information that the get watch api returns is returned,
except
version.Example 1:
Example 2:
Relates #62501