Skip to content

[savedObjects] wait for Kibana index on every write#14202

Merged
spalger merged 8 commits intoelastic:masterfrom
spalger:implement/saved-objects-client/on-before-write
Sep 30, 2017
Merged

[savedObjects] wait for Kibana index on every write#14202
spalger merged 8 commits intoelastic:masterfrom
spalger:implement/saved-objects-client/on-before-write

Conversation

@spalger
Copy link
Copy Markdown
Contributor

@spalger spalger commented Sep 27, 2017

Adds an onBeforeWrite() option to the SavedObjectsClient which will be called before the SavedObjectsClient does any write and can delay or abort the write.

This option is only available when instantiating the class directly. When using the request.getSavedObjectsClient() or server.savedObjectsClientFactory() methods it will always be set to a function that waits up to 5 seconds for the index to report a yellow status. This will prevent writes when the index does not exist and should be a light enough operation that writes while the index exists will only be slowed down a few milliseconds.

When the option is not specified it is ignored and writes are sent as soon as requested.

@spalger spalger force-pushed the implement/saved-objects-client/on-before-write branch 3 times, most recently from a593f98 to 311a62d Compare September 27, 2017 23:29
@spalger spalger changed the title [SavedObjects] wait for Kibana index on every write [savedObjects] wait for Kibana index on every write Sep 27, 2017
@spalger spalger force-pushed the implement/saved-objects-client/on-before-write branch 2 times, most recently from c9d017b to 495ddc0 Compare September 28, 2017 06:01
@spalger spalger force-pushed the implement/saved-objects-client/on-before-write branch 2 times, most recently from b813cdf to 059b7ac Compare September 28, 2017 06:05
@spalger spalger force-pushed the implement/saved-objects-client/on-before-write branch from 059b7ac to d2166d0 Compare September 28, 2017 06:19
Copy link
Copy Markdown
Contributor

@epixa epixa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't run this yet, but I did go through the implementation and left some comments.

clusterHealth.onCall(2).returns(Promise.resolve({ status: 'green' }));

plugin.status.once = function (event, handler) {
expect(event).to.be('green');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with burying assertions like this inside a callback is that they don't fail if the callback isn't invoked at all. If I do a refactor on the status lifecycle or something that results in once not being called, or in this case simply changing the name of the function, this whole test would continue to pass even though the status may not actually be green.

}).default(),

savedObjects: Joi.object({
indexCheckTimeout: Joi.string().default('5s').regex(/^\d+(\.\d)?[sm]$/, 'number with either s (seconds) or m (minutes) suffix')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty well established at this point that our configurable timeouts are number of seconds, so let's just keep up that trend. This means slightly fewer moving parts that can break on us anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean milliseconds?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

index,
mappings,
callCluster,
onBeforeWrite = noop,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: How about a short function syntax here rather than importing noop from lodash?


try {
await adminCluster.callWithInternalUser('cluster.health', {
timeout: server.config().get('savedObjects.indexCheckTimeout'),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than polling for the health of the index, why not just do a simple 404 check on the index itself? The write request itself doesn't poll or anything, so it doesn't seem especially important to poll for availability before it.

Copy link
Copy Markdown
Contributor Author

@spalger spalger Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster.health call confirms not only that the index exists, but also that it is usable. When the index is first created it goes through a phase of being red before it turns yellow and is actually ready for use. This is why we have the same cluster.health check in the healthcheck, because indexing or searching for a config doc immediately after the index is created produces its own set of errors.

I also don't think it's especially important to wait for the index, but the idea is that if the healthCheck is left at 2.5 seconds then writes to the savedObjects should just always succeed.

That said, once we have an index template the check will just ensure that the index template exists and be instant.

const initialIndex = await getIndex(callCluster, indexName);

// delete the kibana index and run the test, we have about 2 seconds
// before the healthCheck runs again, that SHOULD be enough time
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh dear... will this hackery disappear when the health check is gone?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Copy Markdown
Contributor Author

@spalger spalger Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of these tests is to verify how the uiSettings behave between healthcheck runs, once it is gone these tests can go too (though we might still want to know how it behaves when there isn't an index template, which will probably require similar but less timing dependent hackery)

@spalger
Copy link
Copy Markdown
Contributor Author

spalger commented Sep 29, 2017

Alright @epixa, feedback addressed.

@epixa
Copy link
Copy Markdown
Contributor

epixa commented Sep 29, 2017

LGTM

Copy link
Copy Markdown
Member

@tylersmalley tylersmalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good - just a things I found

});
} catch (error) {
if (error && error.body && error.body.status === 'red') {
server.log(['debug', 'savedObjects'], `Attempted to write to the Kibana index when it didn't exist.`);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible the status is red for a reason other than the index not existing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I suppose it is. I'll update the message.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a little digging and confirmed that we can detect when a red status is because there are uninitialized or otherwise unavailable shards. In those cases we will no longer throw a 404, instead we will let the request fail on its own and provide its own descriptive error.

red status when all data nodes are gone but index does exist
image

red status when es is healthy but Kibana index doesn't exist
image

}

return new Promise((resolve) => {
plugin.status.once('green', resolve);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a bug?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but not having it caused a race condition where code that was waiting for the Kibana index to exist would try to access the config document before the migrateConfig step was complete, and that step can take a bit now that it uses the savedObjectsClient. This addition just makes sure that waitUntilReady() only resolves elasticsearch is really ready.

}).default(),

savedObjects: Joi.object({
indexCheckTimeout: Joi.number().default(5000)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue description mentioned the wait defaulting to a second. Just want to ensure it wasn't an oversight.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I waffled around a bit on the timeout, ended up with 5 seconds though, forgot to update the description

index,
mappings,
callCluster,
onBeforeWrite = () => {},
Copy link
Copy Markdown
Member

@tylersmalley tylersmalley Sep 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a promise?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary with await, promise/non-promise values all resolve the same.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

giphy

} catch (error) {
if (error && error.body && error.body.status === 'red') {
server.log(['debug', 'savedObjects'], `Attempted to write to the Kibana index when it didn't exist.`);
throw new adminCluster.errors.NotFound();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this error misleading - given the index is unavailable, we will fail API create requests with a 404, not found.

Maybe a 503 service unavailable with a message specific to the Kibana index being unavailable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree, 404 does not make sense for create requests. I'll update the create method to send a 503 instead of a 404.

note to the reader: We chatted about this over video and worked out that all of the other APIs can still respond with 404 when the "target data is does not exist", regardless if that is because of the index existing or not.

@spalger spalger force-pushed the implement/saved-objects-client/on-before-write branch from ebce438 to 058801d Compare September 30, 2017 00:57
@spalger spalger force-pushed the implement/saved-objects-client/on-before-write branch from e9f68c3 to 460fd5f Compare September 30, 2017 01:06
Copy link
Copy Markdown
Member

@tylersmalley tylersmalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@spalger spalger merged commit 07eca00 into elastic:master Sep 30, 2017
spalger added a commit that referenced this pull request Sep 30, 2017
* [savedObjects] wait for Kibana index on every write

* [es/waitUntilReady] make test failure less likely by waiting for green

* [es/healthCheck] assert that plugin.status.once was called

* [savedObjectsClient] avoid importing noop

* [savedObjects] use milliseconds for indexCheckTimeout

* [savedObjectsClient/onBeforeWrite] don't 404 when kibana index has unassigned shards

* [savedObjectsClient/create] cast 404 caused by index missing to 503

(cherry picked from commit 07eca00)
@spalger
Copy link
Copy Markdown
Contributor Author

spalger commented Sep 30, 2017

6.x/6.1: 9510b41

@spalger spalger deleted the implement/saved-objects-client/on-before-write branch September 30, 2017 04:11
spalger added a commit that referenced this pull request Oct 3, 2017
* [savedObjects/delete+bulk_get] add failing tests

* [savedObjects/delete+bulk_get] improve 404 handling

* [savedObjects/client] fix mocha tests

* [savedObjects/tests] remove extra test wrapper

* [apiIntegration/kbnServer] basically disable es healthcheck

* [savedObjects/create] add integration test

* [savedObjects/find] add failing integration tests

* [savedObjects/find] fix failing test

* [savedObjects/client] explain reason for generic 404s

* [savedObjects/get] add integration tests

* [savedObjects/find] test request with unkown type

* [savedObjects/find] add some more weird param tests

* [savedObjects/find] test that weird params pass when no index

* [savedObjects/update] use generic 404

* fix typos

* [savedObjects/update] add integration tests

* remove debugging uncomment

* [savedObjects/tests] move backup kibana index delete out of tests

* [savedObjects/tests/esArchives] remove logstash data

* [savedObjects] update test

* [uiSettings] remove detailed previously leaked from API

* [functional/dashboard] wrap check that is only failing on Jenkins

* [savedObjects/error] replace decorateNotFound with createGenericNotFound

* fix typo

* [savedObjectsClient/errors] fix decorateEsError() test

* [savedObjectsClient] fix typos

* [savedObjects/tests/functional] delete document that would normally exist

* [savedObjectsClient/tests] use sinon assertions

* [savedObjects/apiTests] create without index responds with 503 after #14202
spalger added a commit that referenced this pull request Oct 3, 2017
* [savedObjects/delete+bulk_get] add failing tests

* [savedObjects/delete+bulk_get] improve 404 handling

* [savedObjects/client] fix mocha tests

* [savedObjects/tests] remove extra test wrapper

* [apiIntegration/kbnServer] basically disable es healthcheck

* [savedObjects/create] add integration test

* [savedObjects/find] add failing integration tests

* [savedObjects/find] fix failing test

* [savedObjects/client] explain reason for generic 404s

* [savedObjects/get] add integration tests

* [savedObjects/find] test request with unkown type

* [savedObjects/find] add some more weird param tests

* [savedObjects/find] test that weird params pass when no index

* [savedObjects/update] use generic 404

* fix typos

* [savedObjects/update] add integration tests

* remove debugging uncomment

* [savedObjects/tests] move backup kibana index delete out of tests

* [savedObjects/tests/esArchives] remove logstash data

* [savedObjects] update test

* [uiSettings] remove detailed previously leaked from API

* [functional/dashboard] wrap check that is only failing on Jenkins

* [savedObjects/error] replace decorateNotFound with createGenericNotFound

* fix typo

* [savedObjectsClient/errors] fix decorateEsError() test

* [savedObjectsClient] fix typos

* [savedObjects/tests/functional] delete document that would normally exist

* [savedObjectsClient/tests] use sinon assertions

* [savedObjects/apiTests] create without index responds with 503 after #14202
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants