Skip to content

Prevent watcher from failing to start if its templates are missing#82395

Merged
masseyke merged 2 commits intoelastic:masterfrom
masseyke:fix/prevent-watcher-fail-on-missing-templates
Jan 13, 2022
Merged

Prevent watcher from failing to start if its templates are missing#82395
masseyke merged 2 commits intoelastic:masterfrom
masseyke:fix/prevent-watcher-fail-on-missing-templates

Conversation

@masseyke
Copy link
Copy Markdown
Member

This commit prevents watcher from failing to start if its templates are unavailable. Previously the watcher service
would fail to start if (for example) the .watch-history-14 template did not exist. This could happen during a rolling
upgrade. In that case, the watcher service would fail to start even though it could reasonably keep writing without its
templates.

@masseyke
Copy link
Copy Markdown
Member Author

Notes on testing:
I am not sure the best way to test this in an automated way. So for now I've only manually tested it. And in order to manually test it, I modified the code a little locally (this gets ugly). First I created a watch so that I'd have something to work with. I just used the example at https://www.elastic.co/guide/en/elasticsearch/reference/current/watcher-getting-started.html#log-add-input. I couldn't just delete the template because it gets automatically recreated. So I modified org.elasticsearch.xpack.core.template.IndexTemplateRegistry::addComposableTemplatesIfMissing to not create missing templates. Then I deleted the watcher history data stream and template:

curl -XDELETE localhost:9200/_data_stream/.watcher-history-14
curl -XDELETE localhost:9200/_index_template/.watch-history-14

The data stream is immediately recreated but the template is not. If I restart the server, I can see that the template is still missing and the watcher service failed to start up (before the fix):

curl -XDELETE localhost:9200/_index_template/.watch-history-14
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"unable to remove composable templates [.watch-history-14] as they are in use by a data streams [.watcher-history-14]"}],"type":"illegal_argument_exception","reason":"unable to remove composable templates [.watch-history-14] as they are in use by a data streams [.watcher-history-14]"},"status":400}

curl -XGET localhost:9200/_watcher/stats | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   247  100   247    0     0  17642      0 --:--:-- --:--:-- --:--:-- 17642
{
    "_nodes": {
        "failed": 0,
        "successful": 1,
        "total": 1
    },
    "cluster_name": "elasticsearch",
    "manually_stopped": false,
    "stats": [
        {
            "execution_thread_pool": {
                "max_size": 0,
                "queue_size": 0
            },
            "node_id": "4gZEGMNaRN2-GAyE9S4kxA",
            "watch_count": 0,
            "watcher_state": "stopped"
        }
    ]
}

I applied the fix in this PR, and then confirmed that the watcher service came up, and that the .watcher-history-14 index was getting new writes with no template. I then restored the code that auto-creates the templates, restarted the server, and made sure that the .watcher-history-14 index was still getting new writes (just a sanity check).

@@ -139,8 +139,7 @@ public boolean validate(ClusterState state) {
// template check makes only sense for non existing indices, we could refine this
boolean hasValidWatcherTemplates = WatcherIndexTemplateRegistry.validate(state);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this check entirely from here since it also checked on reload.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not worth doing the check for the debug log? I thought it might be useful so I left it. But I've never done any troubleshooting on this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly..but this happens on the cluster state listener thread and less work done there the better.

@jakelandis
Copy link
Copy Markdown
Contributor

Also in the commit message, can you make reference to #69328 that greatly reduces the need to check for templates since 2/3 templates it used to check for are now managed as system indices. As well as related to #82109

@masseyke masseyke marked this pull request as ready for review January 13, 2022 19:05
@elasticmachine elasticmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Jan 13, 2022
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Copy Markdown
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@masseyke masseyke merged commit 5369bd6 into elastic:master Jan 13, 2022
@masseyke masseyke deleted the fix/prevent-watcher-fail-on-missing-templates branch January 13, 2022 19:20
masseyke added a commit to masseyke/elasticsearch that referenced this pull request Jan 13, 2022
…stic#82395)

This commit prevents watcher from failing to start if its templates are unavailable. Previously the watcher service
would fail to start if (for example) the .watch-history-14 template did not exist. This could happen during a rolling
upgrade. In that case, the watcher service would fail to start even though it could reasonably keep writing without its
templates.
Relates elastic#69328 elastic#82109
masseyke added a commit to masseyke/elasticsearch that referenced this pull request Jan 13, 2022
…stic#82395)

This commit prevents watcher from failing to start if its templates are unavailable. Previously the watcher service
would fail to start if (for example) the .watch-history-14 template did not exist. This could happen during a rolling
upgrade. In that case, the watcher service would fail to start even though it could reasonably keep writing without its
templates.
Relates elastic#69328 elastic#82109
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💚 Backport successful

Status Branch Result
8.0
7.17

elasticsearchmachine pushed a commit that referenced this pull request Jan 13, 2022
) (#82573)

This commit prevents watcher from failing to start if its templates are unavailable. Previously the watcher service
would fail to start if (for example) the .watch-history-14 template did not exist. This could happen during a rolling
upgrade. In that case, the watcher service would fail to start even though it could reasonably keep writing without its
templates.
Relates #69328 #82109
elasticsearchmachine pushed a commit that referenced this pull request Jan 13, 2022
) (#82572)

This commit prevents watcher from failing to start if its templates are unavailable. Previously the watcher service
would fail to start if (for example) the .watch-history-14 template did not exist. This could happen during a rolling
upgrade. In that case, the watcher service would fail to start even though it could reasonably keep writing without its
templates.
Relates #69328 #82109
@albertzaharovits albertzaharovits changed the title Preventing watcher from starting up if its templates are missing Prevent watcher from starting if its templates are missing Jan 25, 2022
@jakelandis jakelandis changed the title Prevent watcher from starting if its templates are missing Prevent watcher from failing to start if its templates are missing Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants