[savedObjects] wait for Kibana index on every write by spalger · Pull Request #14202 · elastic/kibana

spalger · 2017-09-27T20:09:34Z

Adds an onBeforeWrite() option to the SavedObjectsClient which will be called before the SavedObjectsClient does any write and can delay or abort the write.

This option is only available when instantiating the class directly. When using the request.getSavedObjectsClient() or server.savedObjectsClientFactory() methods it will always be set to a function that waits up to 5 seconds for the index to report a yellow status. This will prevent writes when the index does not exist and should be a light enough operation that writes while the index exists will only be slowed down a few milliseconds.

When the option is not specified it is ignored and writes are sent as soon as requested.

epixa

I haven't run this yet, but I did go through the implementation and left some comments.

epixa · 2017-09-28T20:32:51Z

src/core_plugins/elasticsearch/lib/__tests__/health_check.js

        clusterHealth.onCall(2).returns(Promise.resolve({ status: 'green' }));

+        plugin.status.once = function (event, handler) {
+          expect(event).to.be('green');


The problem with burying assertions like this inside a callback is that they don't fail if the callback isn't invoked at all. If I do a refactor on the status lifecycle or something that results in once not being called, or in this case simply changing the name of the function, this whole test would continue to pass even though the status may not actually be green.

epixa · 2017-09-28T20:36:46Z

src/server/config/schema.js

  }).default(),
+
+  savedObjects: Joi.object({
+    indexCheckTimeout: Joi.string().default('5s').regex(/^\d+(\.\d)?[sm]$/, 'number with either s (seconds) or m (minutes) suffix')


It's pretty well established at this point that our configurable timeouts are number of seconds, so let's just keep up that trend. This means slightly fewer moving parts that can break on us anyway.

I assume you mean milliseconds?

epixa · 2017-09-28T20:46:43Z

src/server/saved_objects/client/saved_objects_client.js

      index,
      mappings,
      callCluster,
+      onBeforeWrite = noop,


Nit: How about a short function syntax here rather than importing noop from lodash?

epixa · 2017-09-28T20:55:12Z

src/server/saved_objects/saved_objects_mixin.js

+
+    try {
+      await adminCluster.callWithInternalUser('cluster.health', {
+        timeout: server.config().get('savedObjects.indexCheckTimeout'),


Rather than polling for the health of the index, why not just do a simple 404 check on the index itself? The write request itself doesn't poll or anything, so it doesn't seem especially important to poll for availability before it.

The cluster.health call confirms not only that the index exists, but also that it is usable. When the index is first created it goes through a phase of being red before it turns yellow and is actually ready for use. This is why we have the same cluster.health check in the healthcheck, because indexing or searching for a config doc immediately after the index is created produces its own set of errors.

I also don't think it's especially important to wait for the index, but the idea is that if the healthCheck is left at 2.5 seconds then writes to the savedObjects should just always succeed.

That said, once we have an index template the check will just ensure that the index template exists and be instant.

epixa · 2017-09-28T20:57:30Z

src/ui/ui_settings/routes/__tests__/index_missing.js

-    const initialIndex = await getIndex(callCluster, indexName);

+    // delete the kibana index and run the test, we have about 2 seconds
+    // before the healthCheck runs again, that SHOULD be enough time


Oh dear... will this hackery disappear when the health check is gone?

The whole point of these tests is to verify how the uiSettings behave between healthcheck runs, once it is gone these tests can go too (though we might still want to know how it behaves when there isn't an index template, which will probably require similar but less timing dependent hackery)

…ed-objects-client/on-before-write

spalger · 2017-09-29T04:59:23Z

Alright @epixa, feedback addressed.

epixa · 2017-09-29T19:35:50Z

LGTM

tylersmalley

Overall, looks good - just a things I found

tylersmalley · 2017-09-29T23:25:06Z

src/server/saved_objects/saved_objects_mixin.js

+      });
+    } catch (error) {
+      if (error && error.body && error.body.status === 'red') {
+        server.log(['debug', 'savedObjects'], `Attempted to write to the Kibana index when it didn't exist.`);


Is it possible the status is red for a reason other than the index not existing?

Yeah, I suppose it is. I'll update the message.

I did a little digging and confirmed that we can detect when a red status is because there are uninitialized or otherwise unavailable shards. In those cases we will no longer throw a 404, instead we will let the request fail on its own and provide its own descriptive error.

red status when all data nodes are gone but index does exist

red status when es is healthy but Kibana index doesn't exist

tylersmalley · 2017-09-29T23:27:38Z

src/core_plugins/elasticsearch/lib/health_check.js

      }
+
+      return new Promise((resolve) => {
+        plugin.status.once('green', resolve);


Was this a bug?

No, but not having it caused a race condition where code that was waiting for the Kibana index to exist would try to access the config document before the migrateConfig step was complete, and that step can take a bit now that it uses the savedObjectsClient. This addition just makes sure that waitUntilReady() only resolves elasticsearch is really ready.

tylersmalley · 2017-09-29T23:28:51Z

src/server/config/schema.js

  }).default(),
+
+  savedObjects: Joi.object({
+    indexCheckTimeout: Joi.number().default(5000)


The issue description mentioned the wait defaulting to a second. Just want to ensure it wasn't an oversight.

I waffled around a bit on the timeout, ended up with 5 seconds though, forgot to update the description

tylersmalley · 2017-09-29T23:30:54Z

src/server/saved_objects/client/saved_objects_client.js

      index,
      mappings,
      callCluster,
+      onBeforeWrite = () => {},


Should this be a promise?

Not necessary with await, promise/non-promise values all resolve the same.

tylersmalley · 2017-09-29T23:42:35Z

src/server/saved_objects/saved_objects_mixin.js

+    } catch (error) {
+      if (error && error.body && error.body.status === 'red') {
+        server.log(['debug', 'savedObjects'], `Attempted to write to the Kibana index when it didn't exist.`);
+        throw new adminCluster.errors.NotFound();


I find this error misleading - given the index is unavailable, we will fail API create requests with a 404, not found.

Maybe a 503 service unavailable with a message specific to the Kibana index being unavailable?

I totally agree, 404 does not make sense for create requests. I'll update the create method to send a 503 instead of a 404.

note to the reader: We chatted about this over video and worked out that all of the other APIs can still respond with 404 when the "target data is does not exist", regardless if that is because of the index existing or not.

…assigned shards

tylersmalley

LGTM

* [savedObjects] wait for Kibana index on every write * [es/waitUntilReady] make test failure less likely by waiting for green * [es/healthCheck] assert that plugin.status.once was called * [savedObjectsClient] avoid importing noop * [savedObjects] use milliseconds for indexCheckTimeout * [savedObjectsClient/onBeforeWrite] don't 404 when kibana index has unassigned shards * [savedObjectsClient/create] cast 404 caused by index missing to 503 (cherry picked from commit 07eca00)

spalger · 2017-09-30T04:10:55Z

6.x/6.1: 9510b41

…lastic#14202

* [savedObjects/delete+bulk_get] add failing tests * [savedObjects/delete+bulk_get] improve 404 handling * [savedObjects/client] fix mocha tests * [savedObjects/tests] remove extra test wrapper * [apiIntegration/kbnServer] basically disable es healthcheck * [savedObjects/create] add integration test * [savedObjects/find] add failing integration tests * [savedObjects/find] fix failing test * [savedObjects/client] explain reason for generic 404s * [savedObjects/get] add integration tests * [savedObjects/find] test request with unkown type * [savedObjects/find] add some more weird param tests * [savedObjects/find] test that weird params pass when no index * [savedObjects/update] use generic 404 * fix typos * [savedObjects/update] add integration tests * remove debugging uncomment * [savedObjects/tests] move backup kibana index delete out of tests * [savedObjects/tests/esArchives] remove logstash data * [savedObjects] update test * [uiSettings] remove detailed previously leaked from API * [functional/dashboard] wrap check that is only failing on Jenkins * [savedObjects/error] replace decorateNotFound with createGenericNotFound * fix typo * [savedObjectsClient/errors] fix decorateEsError() test * [savedObjectsClient] fix typos * [savedObjects/tests/functional] delete document that would normally exist * [savedObjectsClient/tests] use sinon assertions * [savedObjects/apiTests] create without index responds with 503 after #14202

spalger added Team:Operations Kibana-Operations Team review v6.1.0 v7.0.0 labels Sep 27, 2017

spalger requested review from epixa and tylersmalley September 27, 2017 20:09

spalger mentioned this pull request Sep 27, 2017

[uiSettings] auto create/upgrade saved config #14164

Merged

spalger force-pushed the implement/saved-objects-client/on-before-write branch 3 times, most recently from a593f98 to 311a62d Compare September 27, 2017 23:29

spalger changed the title ~~[SavedObjects] wait for Kibana index on every write~~ [savedObjects] wait for Kibana index on every write Sep 27, 2017

spalger force-pushed the implement/saved-objects-client/on-before-write branch 2 times, most recently from c9d017b to 495ddc0 Compare September 28, 2017 06:01

[savedObjects] wait for Kibana index on every write

39396de

spalger force-pushed the implement/saved-objects-client/on-before-write branch 2 times, most recently from b813cdf to 059b7ac Compare September 28, 2017 06:05

[es/waitUntilReady] make test failure less likely by waiting for green

d2166d0

spalger force-pushed the implement/saved-objects-client/on-before-write branch from 059b7ac to d2166d0 Compare September 28, 2017 06:19

spalger mentioned this pull request Sep 28, 2017

[savedObjects] use index template rather than healthcheck #14142

Closed

epixa suggested changes Sep 28, 2017

View reviewed changes

spalger added 4 commits September 28, 2017 16:18

Merge branch 'master' of github.com:elastic/kibana into implement/sav…

a0c5eb2

…ed-objects-client/on-before-write

[es/healthCheck] assert that plugin.status.once was called

d5efe8e

[savedObjectsClient] avoid importing noop

ea445ec

[savedObjects] use milliseconds for indexCheckTimeout

4304482

epixa approved these changes Sep 29, 2017

View reviewed changes

tylersmalley requested changes Sep 29, 2017

View reviewed changes

[savedObjectsClient/onBeforeWrite] don't 404 when kibana index has un…

058801d

…assigned shards

spalger force-pushed the implement/saved-objects-client/on-before-write branch from ebce438 to 058801d Compare September 30, 2017 00:57

[savedObjectsClient/create] cast 404 caused by index missing to 503

460fd5f

spalger force-pushed the implement/saved-objects-client/on-before-write branch from e9f68c3 to 460fd5f Compare September 30, 2017 01:06

tylersmalley approved these changes Sep 30, 2017

View reviewed changes

spalger merged commit 07eca00 into elastic:master Sep 30, 2017

spalger deleted the implement/saved-objects-client/on-before-write branch September 30, 2017 04:11

spalger added a commit to spalger/kibana that referenced this pull request Oct 3, 2017

[savedObjects/apiTests] create without index responds with 503 after e…

9b2f127

…lastic#14202

tylersmalley added the release_note:enhancement label Nov 16, 2017

Conversation

spalger commented Sep 27, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

epixa left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

spalger Sep 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

spalger Sep 28, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

spalger commented Sep 29, 2017

Uh oh!

epixa commented Sep 29, 2017

Uh oh!

tylersmalley left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tylersmalley Sep 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tylersmalley left a comment

Choose a reason for hiding this comment

Uh oh!

spalger commented Sep 30, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

spalger commented Sep 27, 2017 •

edited

Loading

spalger Sep 28, 2017 •

edited

Loading

spalger Sep 28, 2017 •

edited

Loading

tylersmalley Sep 29, 2017 •

edited

Loading