[es] disable wildcards in destructive actions #88986
Conversation
Is it possible to add a `refresh: wait_for` here for us since we're looping now? It should be OK for us and our users and not a big breaking change. It might make tests and other things less non-deterministic and give us a better chance of getting this right on the first loop around.
The index delete API doesn't support refresh, and AFAIK it is a "synchronous" action: the entire cluster will refuse to search across shards for deleted indexes immediately.
> "synchronous" action and the entire cluster will refuse to search across shards for deleted indexes immediately
Do you mean, when we execute a request like DELETE /delme-idx-1,delme-idx-2, before returning "acknowledged" : true every node in the cluster must acknowledge the deletion and disable all other actions with the indices - synchronously? To me "acknowledged" semantically means "ok, received your request, will process it" w/o time guarantees, so I wouldn't personally rely on that. But I might be wrong.
I think my question here is: after deleting indices in await callWithRequest('indices.delete', can we still get the same indices from await callWithRequest('indices.getAlias' in the next iteration? If the cluster is under load or whatever? Can we catch an error due to a race condition? Because if it's possible, on the next iteration when trying to delete the same index that was already asked to be deleted, we can receive an error.
I tried this:
PUT delme-idx-1/_doc/1
{ "foo": "bar" }
DELETE /delme-idx-1,delme-idx-2
where deletion returns an error
{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [delme-idx-2]",
        "index_uuid" : "_na_",
        "resource.type" : "index_or_alias",
        "resource.id" : "delme-idx-2",
        "index" : "delme-idx-2"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [delme-idx-2]",
    "index_uuid" : "_na_",
    "resource.type" : "index_or_alias",
    "resource.id" : "delme-idx-2",
    "index" : "delme-idx-2"
  },
  "status" : 404
}
and then
GET /delme-idx-*/_alias
returns
{
  "delme-idx-1" : {
    "aliases" : { }
  }
}
Which means that DELETE /delme-idx-1,delme-idx-2 did nothing.
Is a similar situation possible due to a race condition? And would await callWithRequest('indices.delete' throw an exception in this case?
And just thinking out loud.. Maybe specifying a timeout + exp wait would do the job?
namesToDelete = get index names from the pattern
send the delete request
while (not timed out) {
  existingNames = get index names
  if (namesToDelete not in existingNames) {
    return
  }
  get next delay // exponential
  wait
}
throw timed out
Here the difference is: this will delete only those specific indices which existed at the moment of receiving the request. So if a new index matching the pattern is added during the deletion (while loop), it will be ignored. Which can be good or bad depending on the case.
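The pseudocode above could look roughly like this in TypeScript. `getIndexNames` and `deleteIndices` are hypothetical stand-ins for the real `callWithRequest('indices.getAlias')` / `callWithRequest('indices.delete')` calls; this is a sketch of the proposal, not actual Kibana code:

```typescript
// A runnable sketch of the timeout + exponential backoff idea above.
// getIndexNames and deleteIndices are hypothetical stand-ins for the real
// callWithRequest('indices.getAlias') / callWithRequest('indices.delete') calls.
type GetIndexNames = () => Promise<string[]>;
type DeleteIndices = (names: string[]) => Promise<void>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function deleteAndWait(
  getIndexNames: GetIndexNames,
  deleteIndices: DeleteIndices,
  timeoutMs: number,
  initialDelayMs = 10
): Promise<void> {
  // Only the indices that existed when the request was received are deleted;
  // indices created while we wait are intentionally ignored.
  const namesToDelete = await getIndexNames();
  if (namesToDelete.length === 0) return;
  await deleteIndices(namesToDelete);

  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (Date.now() < deadline) {
    const existing = new Set(await getIndexNames());
    if (!namesToDelete.some((name) => existing.has(name))) {
      return; // every originally-matched index is gone
    }
    await sleep(delay);
    delay *= 2; // exponential backoff
  }
  throw new Error('Timed out waiting for indices to be deleted');
}
```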
@banderror You definitely have a point. I am pretty darn confident that a successful response from Elasticsearch means the cluster state has been updated and all nodes have acknowledged the update; that said, there isn't any reason not to protect ourselves from a situation like this. To do so I've updated the indices.delete call to use the `ignore_unavailable=true` query string param, which will ignore concrete indices that have already been deleted, preventing the 404 you showed, and will instead just delete the indices that do exist.
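That approach (resolve the wildcard to concrete indices, delete them with `ignore_unavailable=true`, repeat until nothing matches) could be sketched roughly like this. The `callWithRequest`-style signature is an assumption modeled on the legacy client, not the PR's exact code:

```typescript
// Sketch of the delete loop described above: resolve the wildcard to concrete
// indices, delete them with ignore_unavailable=true (so an index removed by a
// concurrent deletion can't cause a 404), and repeat until nothing matches.
// The callWithRequest signature below is an assumption, not Kibana's exact API.
type CallWithRequest = (
  endpoint: string,
  params: Record<string, unknown>
) => Promise<Record<string, unknown>>;

async function deleteAllMatching(
  callWithRequest: CallWithRequest,
  pattern: string,
  maxAttempts = 5
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // indices.getAlias resolves the wildcard to the currently-existing indices.
    const aliasMap = await callWithRequest('indices.getAlias', { index: pattern });
    const indices = Object.keys(aliasMap);
    if (indices.length === 0) return; // wildcard no longer matches anything
    await callWithRequest('indices.delete', {
      index: indices,
      ignore_unavailable: true, // skip indices already deleted out from under us
    });
  }
  throw new Error(`Indices matching "${pattern}" still exist after ${maxAttempts} attempts`);
}
```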
WRT exponential backoff/timeout, I'm not totally opposed, but I also feel like it's a good deal more complicated to implement and I don't really see the value personally. If you specify a wildcard to this API and new indexes show up that match the wildcard while it's running, I think it makes sense to delete them too. Additionally, if new indexes keep showing up, leading to a maximum number of iterations, that's an issue which I don't think this API is intended to solve. Finally, picking a timeout seems like a very challenging task, as operations to ES can take quite some time in certain scenarios, and I think we should rely on the configured request timeout rather than defining a new timeout for this operation specifically.
ignore_unavailable=true sounds good, thank you! 👍
Pinging @elastic/kibana-operations (Team:Operations)
YulNaumenko left a comment:
Alerting related changes LGTM.
jportner left a comment:
Code review only - Security integration test changes LGTM.
FrankHassanabad left a comment:
These changes are good with me too.
💔 Build Failed

Failed CI Steps / Test Failures (showing the first 3 of 4):

- Kibana Pipeline / general / Chrome X-Pack UI Functional Tests, x-pack/test/functional/apps/ml/data_frame_analytics/feature_importance.ts: machine learning data frame analytics total feature importance panel and decision path popover binary classification job should display the feature importance decision path in the data grid
- Kibana Pipeline / general / X-Pack Detection Engine API Integration Tests, x-pack/test/detection_engine_api_integration/security_and_spaces/tests/create_index.ts: detection engine api security and spaces enabled create_index t1_analyst should NOT be able to create a signal index when it has not been created yet. Should return a 403 and error that the user is unauthorized (this failure is reported twice)
```diff
- const esArgs = options.esArgs || [];
+ const esArgs = ['action.destructive_requires_name=true', ...(options.esArgs || [])];
```
I'd prefer to add an action to node scripts/es_archiver or something which deletes all indices with a wildcard like https://github.com/elastic/kibana/blob/master/test/common/services/es_delete_all_indices.ts
Developing against an ES cluster with this setting `=false` feels a little risky, but I hear you, and we could definitely make that an option.
Example of adding a command to the CLI: https://github.com/elastic/kibana/blob/master/packages/kbn-es-archiver/src/cli.ts#L260-L267, and if you wanted to move the majority of the logic from that FTR service to the esArchiver and then import it we could share the implementation.
Oh interesting. This is the first I'd heard of es_archiver. I'll check that out. Thanks!
I spent some time playing with this today and came up with #112376 but running it is a little cumbersome for things like local docker testing envs.
Would you have any interest in also making es_archiver able to read kibana.dev.yml?
Needing to replicate my connection configuration into a format that es-archiver can work with means it might just be easier to override `destructive_requires_name` (via `_cluster/settings`) when I need this sort of thing.
I'm definitely interested in es_archiver being able to read kibana.dev.yml. All of that is handled in packages/kbn-es-archiver/src/cli.ts, and shouldn't really impact the EsArchiver class. Reading the kibana.yml and kibana.dev.yml files can be a little tricky as we technically support compound keys:

```yaml
nested.setting: foo
# and
nested:
  setting: foo
```

There is an implementation in https://github.com/elastic/kibana/blob/72dd0578eadf395897903075a2ae4992fb52c577/packages/kbn-apm-config-loader/src/utils/read_config.ts#L41-L54 but maybe @pgayvallet has a better example in mind.
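One way to normalize the two spellings is to flatten nested objects into dot-delimited keys after parsing the YAML. A minimal sketch of that idea (not the linked `read_config.ts` implementation, which handles more cases):

```typescript
// Flatten a parsed YAML object so that both `nested: { setting: 'foo' }` and
// `'nested.setting': 'foo'` end up under the key 'nested.setting'.
// A sketch of the idea only, not the kbn-apm-config-loader implementation.
function flattenKeys(obj: Record<string, unknown>, prefix = ''): Record<string, unknown> {
  const flat: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value && typeof value === 'object' && !Array.isArray(value)) {
      // Recurse into nested objects, building up the dotted path.
      Object.assign(flat, flattenKeys(value as Record<string, unknown>, path));
    } else {
      flat[path] = value;
    }
  }
  return flat;
}
```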
Nice, thanks! If we can do that I can probably get something like node scripts/es_archiver.js delete-indices '.monitoring-*' which would be better than manually deleting specific indices via dev tools.
In elastic/elasticsearch#66908 ES is going to change the default value for the `action.destructive_requires_name` setting to `true`, which will remove our ability to use wildcards in destructive API calls like `indices.delete()`. This PR enables the setting when running Kibana in most dev scenarios as well as on CI, so that we can make sure we're ready for it.

Many places in the functional tests were deleting indices with wildcards, and those instances have been migrated to use the `esDeleteAllIndices` service, which resolves the wildcard to a list of concrete indices to delete and deletes them, in a loop, until the wildcard no longer resolves to any indices.

The detection engine code seems to be the only place we were supporting deletion of wildcard indices server side, and that code has been updated to use a similar strategy.
Release note:

Enables support for ES clusters using `action.destructive_requires_name=true`.