
Upgrade notes for new st2timersengine #749

Merged
Kami merged 9 commits into master from k8s/split_timers
Aug 8, 2018
Conversation

@lakshmi-kannan
Contributor

@lakshmi-kannan lakshmi-kannan commented Jun 20, 2018

DO NOT MERGE UNTIL StackStorm/st2#4180 is merged.

Related: StackStorm/st2-packages#564

Closes #766

@lakshmi-kannan
Contributor Author

Build break is fixed in #750. Orchestra runner apparently doesn't have runner_parameters.

* master:
  Not all runners (e.g. orchestra) have runner_parameters
Member

@cognifloyd cognifloyd left a comment


Here are some suggestions to make this flow a bit better. 👍

|st2| v2.9
----------

* |st2| timers used to be run as part of ``st2rulesengine`` process until versions older than ``v2.9``.
Member


s/until/in/
"until versions older than" feels awkward to me. "in versions older than" would flow a bit better.

Member


Maybe reword the first couple sentences (everything is the same after ``st2timersengine`` is the new...):

* |st2| timers moved from the ``st2rulesengine`` to the ``st2timersengine`` in ``v2.9``. Moving timers
  out of the rules engine allows scaling rules and timers independently. ``st2timersengine`` is the new
  process that schedules all the user timers. Please note that when upgrading from older versions, you
  will need to carefully accept changes to ``st2.conf`` file. Otherwise, you risk losing access to
  ``st2`` database in MongoDB.

local_timezone = America/Los_Angeles
logging = conf/logging.timersengine.conf
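
For context, the two options quoted above would live under the new section header in ``st2.conf``. A minimal sketch (section name per this PR; the values are the examples shown above):

```ini
# st2.conf — new timers engine section (sketch)
[timersengine]
local_timezone = America/Los_Angeles
logging = conf/logging.timersengine.conf
```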

Though ``timer`` section in config is supported for backward compatibility, it is recommended to
Member


Possible alternate wording for this section:

We recommend renaming the ``timer`` config section to ``timersengine``. Though deprecated, using the
``timer`` section is still supported for backwards compatibility. In a future release, support for
the ``timer`` section will be removed and ``timersengine`` will be the only way to configure timers.
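
For illustration, the rename amounts to changing only the section header in ``st2.conf`` (a sketch; the ``local_timezone`` value is just an example):

```ini
# Deprecated, still honored for backwards compatibility:
[timer]
local_timezone = America/Los_Angeles

# Recommended going forward:
[timersengine]
local_timezone = America/Los_Angeles
```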

@lakshmi-kannan
Contributor Author

@cognifloyd bdf4561. I dropped some "the"s because it sounded like there is a specific one, but in HA there are multiple rules engines, so the definite article seemed unnecessary. LMK what you think.

Lakshmi Kannan added 2 commits July 19, 2018 15:01
* master: (37 commits)
  Update roadmap with 2.8 release
  Fix typo.
  Fix invalid syntax.
  Generate winrm runner parameters tables.
  Use include instead of copy and paste.
  Update version to 2.9dev
  Add some docs on listing differently scoped datastore items.
  Update version info for release - 2.8.0
  Some rewording and clarification.
  Clarify remote_user and remote_addr need to come in as CGI environment values and not as headers.
  Also add a note on the Upgrades page.
  Fix syntax.
  Add info on verifying that service has been started.
  Add a link.
  Add upgrade notes section for v2.8 release.
  Add the tags fields for actions
  And more info about the timezone format for core.st2.CronTimer trigger
  Replaced 'st2' to the macro that replaces to the product name of this as with others
  Fixed a minor typo in the webhooks page
  Update mistral.rst
  ...
``st2timersengine`` is responsible for scheduling all user specified timers. See
:ref:`timers <ref-rule-timers>` for the specifics on setting up timers via rules.
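
As an illustration of a timer set up via a rule (the pack, rule name, and action below are hypothetical; ``core.st2.IntervalTimer`` is one of the built-in timer trigger types):

```yaml
# examples/rules/every_five_minutes.yaml (hypothetical rule)
---
name: "every_five_minutes"
pack: "examples"
description: "Run core.local every 5 minutes via the timers engine."
enabled: true

trigger:
  type: "core.st2.IntervalTimer"
  parameters:
    unit: "minutes"
    delta: 5

action:
  ref: "core.local"
  parameters:
    cmd: "date"
```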

You have to have one active ``st2timersengine`` process running to schedule all timers. This is trivial to set up in Kubernetes so that there is exactly one active container running the ``st2timersengine`` process. Failover is handled natively by Kubernetes. In non-Kubernetes deployments, external monitoring needs to be set up, and a new ``st2timersengine`` process needs to be spun up to handle failover.
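
A sketch of the Kubernetes setup described above (names, labels, and the image tag are illustrative, not taken from any official chart); ``replicas: 1`` gives the single-active-instance behaviour, with Kubernetes restarting the pod on failure:

```yaml
# Hypothetical Deployment for a single active timers engine
apiVersion: apps/v1
kind: Deployment
metadata:
  name: st2timersengine
spec:
  replicas: 1          # exactly one active st2timersengine
  selector:
    matchLabels:
      app: st2timersengine
  template:
    metadata:
      labels:
        app: st2timersengine
    spec:
      containers:
      - name: st2timersengine
        image: stackstorm/st2timersengine:2.9.0   # illustrative image/tag
```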
Member


In HA doc I don't think we should mention any Kubernetes specifics, but have general descriptions as we do for all other services.

Contributor Author


Yeah, I thought about it but then we need to say how to handle failover. Should we just leave out the k8s part?

Member


This sounds like we don't have any way to promise timers in HA. I mean, we can't run at least 2+ instances?

Failover in K8s running a single replica is the same as running one timersengine service under systemd, which will restart the process on failure.
While Kubernetes can "guarantee something", the container/process could obviously be killed for whatever reason, and there is no other timersengine that will keep running.
What happens if no timersengine is available at the moment (say it's restarting)? Will the missed events be rescheduled or lost?

I think it's worth mentioning what happens in the scenario when the timersengine is not available, if we can't guarantee HA for it. Additionally, what's needed for the timersengine to run properly (DB, MQ, anything else), as is described for other services. Are there any other services that rely on the timers functionality?

Member


Per my understanding, the only purpose is that by extracting timers into a separate singleton service we can run 2+ instances of st2rulesengine?

Contributor Author

@lakshmi-kannan lakshmi-kannan Jul 19, 2018


While Kubernetes can "guarantee something", the container/process could obviously be killed for whatever reason, and there is no other timersengine that will keep running.

Won't this be taken care of by Kubernetes though? Isn't that the whole point of using it?

This sounds like we don't have any way to promise timers in HA. I mean can't run at least 2+ instances.

Correct, this is what we decided to do with timers. If we decide to solve this, then we need to look at leader election which we intentionally decided to avoid. See https://github.com/StackStorm/discussions/issues/305

I think it's worth mentioning what happens in the scenario when the timersengine is not available, if we can't guarantee HA for it.

There is no A at that point, let alone HA :). It goes without saying IMO, but I'll make that explicit.

Additionally, what's needed for the timersengine to run properly (DB, MQ, anything else), as is described for other services. Are there any other services that rely on the timers functionality?

+1

Member


Pod/container failure will lead to reschedule/restart. What happens in between is a downtime.

Related to that, we'll need to add /status for each st2 service https://github.com/StackStorm/k8s-st2/issues/5 so K8s can control rescheduling and know whether the service is really alive, or just sitting there in some manner of deadlock, spinning cycles and being actually non-responsive. See StackStorm/st2#4020

With no big A in HA, we still can haz High 😃

@arm4b
Member

arm4b commented Jul 19, 2018

Didn't know we have a doc for this. Good to find it 👍


You have to have exactly one active ``st2timersengine`` process running to schedule all timers.
Having more than one active ``st2timersengine`` will result in duplicate timer events and therefore
duplicate rule evaluations leading to duplicate workflows or actions.
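
Outside Kubernetes, restart-on-failure for that single instance can be approximated with a systemd drop-in (a hypothetical sketch; the unit name assumes the packaged ``st2timersengine`` service, and paths may differ per distro):

```ini
# /etc/systemd/system/st2timersengine.service.d/override.conf
# Restart the single timers engine if it dies; never run a second copy.
[Service]
Restart=on-failure
RestartSec=5
```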
Member


Very important detail here was just documented 👍

@Kami Kami closed this Aug 7, 2018
@arm4b
Member

arm4b commented Aug 7, 2018

@Kami I guess you wanted to merge it, but closed instead?

@Kami Kami reopened this Aug 7, 2018
@Kami Kami merged commit f3823fa into master Aug 8, 2018
@Kami Kami deleted the k8s/split_timers branch August 8, 2018 09:59
