[savedObjects] wait for Kibana index on every write#14202
[savedObjects] wait for Kibana index on every write#14202spalger merged 8 commits intoelastic:masterfrom
Conversation
a593f98 to
311a62d
Compare
c9d017b to
495ddc0
Compare
b813cdf to
059b7ac
Compare
059b7ac to
d2166d0
Compare
epixa
left a comment
There was a problem hiding this comment.
I haven't run this yet, but I did go through the implementation and left some comments.
| clusterHealth.onCall(2).returns(Promise.resolve({ status: 'green' })); | ||
|
|
||
| plugin.status.once = function (event, handler) { | ||
| expect(event).to.be('green'); |
There was a problem hiding this comment.
The problem with burying assertions like this inside a callback is that they don't fail if the callback isn't invoked at all. If I do a refactor on the status lifecycle or something that results in once not being called, or in this case simply changing the name of the function, this whole test would continue to pass even though the status may not actually be green.
src/server/config/schema.js
Outdated
| }).default(), | ||
|
|
||
| savedObjects: Joi.object({ | ||
| indexCheckTimeout: Joi.string().default('5s').regex(/^\d+(\.\d)?[sm]$/, 'number with either s (seconds) or m (minutes) suffix') |
There was a problem hiding this comment.
It's pretty well established at this point that our configurable timeouts are number of seconds, so let's just keep up that trend. This means slightly fewer moving parts that can break on us anyway.
There was a problem hiding this comment.
I assume you mean milliseconds?
| index, | ||
| mappings, | ||
| callCluster, | ||
| onBeforeWrite = noop, |
There was a problem hiding this comment.
Nit: How about a short function syntax here rather than importing noop from lodash?
|
|
||
| try { | ||
| await adminCluster.callWithInternalUser('cluster.health', { | ||
| timeout: server.config().get('savedObjects.indexCheckTimeout'), |
There was a problem hiding this comment.
Rather than polling for the health of the index, why not just do a simple 404 check on the index itself? The write request itself doesn't poll or anything, so it doesn't seem especially important to poll for availability before it.
There was a problem hiding this comment.
The cluster.health call confirms not only that the index exists, but also that it is usable. When the index is first created it goes through a phase of being red before it turns yellow and is actually ready for use. This is why we have the same cluster.health check in the healthcheck, because indexing or searching for a config doc immediately after the index is created produces its own set of errors.
I also don't think it's especially important to wait for the index, but the idea is that if the healthCheck is left at 2.5 seconds then writes to the savedObjects should just always succeed.
That said, once we have an index template the check will just ensure that the index template exists and be instant.
| const initialIndex = await getIndex(callCluster, indexName); | ||
|
|
||
| // delete the kibana index and run the test, we have about 2 seconds | ||
| // before the healthCheck runs again, that SHOULD be enough time |
There was a problem hiding this comment.
Oh dear... will this hackery disappear when the health check is gone?
There was a problem hiding this comment.
The whole point of these tests is to verify how the uiSettings behave between healthcheck runs, once it is gone these tests can go too (though we might still want to know how it behaves when there isn't an index template, which will probably require similar but less timing dependent hackery)
|
Alright @epixa, feedback addressed. |
|
LGTM |
tylersmalley
left a comment
There was a problem hiding this comment.
Overall, looks good - just a things I found
| }); | ||
| } catch (error) { | ||
| if (error && error.body && error.body.status === 'red') { | ||
| server.log(['debug', 'savedObjects'], `Attempted to write to the Kibana index when it didn't exist.`); |
There was a problem hiding this comment.
Is it possible the status is red for a reason other than the index not existing?
There was a problem hiding this comment.
Yeah, I suppose it is. I'll update the message.
There was a problem hiding this comment.
I did a little digging and confirmed that we can detect when a red status is because there are uninitialized or otherwise unavailable shards. In those cases we will no longer throw a 404, instead we will let the request fail on its own and provide its own descriptive error.
red status when all data nodes are gone but index does exist

red status when es is healthy but Kibana index doesn't exist

| } | ||
|
|
||
| return new Promise((resolve) => { | ||
| plugin.status.once('green', resolve); |
There was a problem hiding this comment.
No, but not having it caused a race condition where code that was waiting for the Kibana index to exist would try to access the config document before the migrateConfig step was complete, and that step can take a bit now that it uses the savedObjectsClient. This addition just makes sure that waitUntilReady() only resolves elasticsearch is really ready.
| }).default(), | ||
|
|
||
| savedObjects: Joi.object({ | ||
| indexCheckTimeout: Joi.number().default(5000) |
There was a problem hiding this comment.
The issue description mentioned the wait defaulting to a second. Just want to ensure it wasn't an oversight.
There was a problem hiding this comment.
I waffled around a bit on the timeout, ended up with 5 seconds though, forgot to update the description
| index, | ||
| mappings, | ||
| callCluster, | ||
| onBeforeWrite = () => {}, |
There was a problem hiding this comment.
Should this be a promise?
There was a problem hiding this comment.
Not necessary with await, promise/non-promise values all resolve the same.
| } catch (error) { | ||
| if (error && error.body && error.body.status === 'red') { | ||
| server.log(['debug', 'savedObjects'], `Attempted to write to the Kibana index when it didn't exist.`); | ||
| throw new adminCluster.errors.NotFound(); |
There was a problem hiding this comment.
I find this error misleading - given the index is unavailable, we will fail API create requests with a 404, not found.
Maybe a 503 service unavailable with a message specific to the Kibana index being unavailable?
There was a problem hiding this comment.
I totally agree, 404 does not make sense for create requests. I'll update the create method to send a 503 instead of a 404.
note to the reader: We chatted about this over video and worked out that all of the other APIs can still respond with 404 when the "target data is does not exist", regardless if that is because of the index existing or not.
ebce438 to
058801d
Compare
e9f68c3 to
460fd5f
Compare
* [savedObjects] wait for Kibana index on every write * [es/waitUntilReady] make test failure less likely by waiting for green * [es/healthCheck] assert that plugin.status.once was called * [savedObjectsClient] avoid importing noop * [savedObjects] use milliseconds for indexCheckTimeout * [savedObjectsClient/onBeforeWrite] don't 404 when kibana index has unassigned shards * [savedObjectsClient/create] cast 404 caused by index missing to 503 (cherry picked from commit 07eca00)
|
6.x/6.1: 9510b41 |
* [savedObjects/delete+bulk_get] add failing tests * [savedObjects/delete+bulk_get] improve 404 handling * [savedObjects/client] fix mocha tests * [savedObjects/tests] remove extra test wrapper * [apiIntegration/kbnServer] basically disable es healthcheck * [savedObjects/create] add integration test * [savedObjects/find] add failing integration tests * [savedObjects/find] fix failing test * [savedObjects/client] explain reason for generic 404s * [savedObjects/get] add integration tests * [savedObjects/find] test request with unkown type * [savedObjects/find] add some more weird param tests * [savedObjects/find] test that weird params pass when no index * [savedObjects/update] use generic 404 * fix typos * [savedObjects/update] add integration tests * remove debugging uncomment * [savedObjects/tests] move backup kibana index delete out of tests * [savedObjects/tests/esArchives] remove logstash data * [savedObjects] update test * [uiSettings] remove detailed previously leaked from API * [functional/dashboard] wrap check that is only failing on Jenkins * [savedObjects/error] replace decorateNotFound with createGenericNotFound * fix typo * [savedObjectsClient/errors] fix decorateEsError() test * [savedObjectsClient] fix typos * [savedObjects/tests/functional] delete document that would normally exist * [savedObjectsClient/tests] use sinon assertions * [savedObjects/apiTests] create without index responds with 503 after #14202
* [savedObjects/delete+bulk_get] add failing tests * [savedObjects/delete+bulk_get] improve 404 handling * [savedObjects/client] fix mocha tests * [savedObjects/tests] remove extra test wrapper * [apiIntegration/kbnServer] basically disable es healthcheck * [savedObjects/create] add integration test * [savedObjects/find] add failing integration tests * [savedObjects/find] fix failing test * [savedObjects/client] explain reason for generic 404s * [savedObjects/get] add integration tests * [savedObjects/find] test request with unkown type * [savedObjects/find] add some more weird param tests * [savedObjects/find] test that weird params pass when no index * [savedObjects/update] use generic 404 * fix typos * [savedObjects/update] add integration tests * remove debugging uncomment * [savedObjects/tests] move backup kibana index delete out of tests * [savedObjects/tests/esArchives] remove logstash data * [savedObjects] update test * [uiSettings] remove detailed previously leaked from API * [functional/dashboard] wrap check that is only failing on Jenkins * [savedObjects/error] replace decorateNotFound with createGenericNotFound * fix typo * [savedObjectsClient/errors] fix decorateEsError() test * [savedObjectsClient] fix typos * [savedObjects/tests/functional] delete document that would normally exist * [savedObjectsClient/tests] use sinon assertions * [savedObjects/apiTests] create without index responds with 503 after #14202

Adds an
onBeforeWrite()option to theSavedObjectsClientwhich will be called before the SavedObjectsClient does any write and can delay or abort the write.This option is only available when instantiating the class directly. When using the
request.getSavedObjectsClient()orserver.savedObjectsClientFactory()methods it will always be set to a function that waits up to 5 seconds for the index to report a yellow status. This will prevent writes when the index does not exist and should be a light enough operation that writes while the index exists will only be slowed down a few milliseconds.When the option is not specified it is ignored and writes are sent as soon as requested.