Conversation
Build break is fixed in #750. Orchestra runner apparently doesn't have runner_parameters.
* master: Not all runners (e.g. orchestra) have runner_parameters
cognifloyd
left a comment
Here are some suggestions to make this flow a bit better. 👍
docs/source/upgrade_notes.rst
Outdated
|st2| v2.9
----------

* |st2| timers used to be run as part of ``st2rulesengine`` process until versions older than ``v2.9``.
s/until/in/
"until versions older than" feels awkward to me. "in versions older than" would flow a bit better.
Maybe reword the first couple sentences (everything is the same after ``st2timersengine`` is the new...):
* |st2| timers moved from the ``st2rulesengine`` to the ``st2timersengine`` in ``v2.9``. Moving timers
out of the rules engine allows scaling rules and timers independently. ``st2timersengine`` is the new
process that schedules all the user timers. Please note that when upgrading from older versions, you
will need to carefully accept changes to ``st2.conf`` file. Otherwise, you risk losing access to
``st2`` database in MongoDB.
docs/source/upgrade_notes.rst
Outdated
    local_timezone = America/Los_Angeles
    logging = conf/logging.timersengine.conf

Though ``timer`` section in config is supported for backward compatibility, it is recommended to
Possible alternate wording for this section:
We recommend renaming the ``timer`` config section to ``timersengine``. Though deprecated, using the
``timer`` section is still supported for backwards compatibility. In a future release, support for
the ``timer`` section will be removed and ``timersengine`` will be the only way to configure timers.
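To make the suggested rename concrete, here is a sketch of the relevant ``st2.conf`` fragment. The section contents are taken from the snippet quoted above; exact keys and paths may differ in a given deployment:

```ini
# Deprecated: the [timer] section still works for backwards compatibility,
# but support for it will be removed in a future release.
# [timer]
# local_timezone = America/Los_Angeles
# logging = conf/logging.timersengine.conf

# Recommended: configure timers under [timersengine] instead.
[timersengine]
local_timezone = America/Los_Angeles
logging = conf/logging.timersengine.conf
```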
@cognifloyd bdf4561. I dropped some "the" because it sounded like there is a specific one, but in HA there are multiple rules engines. So the definite article seemed unnecessary. LMK what you think.
* master: (37 commits) Update roadmap with 2.8 release Fix typo. Fix invalid syntax. Generate winrm runner parameters tables. Use include instead of copy and paste. Update version to 2.9dev Add some docs on listing differently scoped datastore items. Update version info for release - 2.8.0 Some rewording and clarification. Clarify remote_user and remote_addr need to come in as CGI environment values and not as headers. Also add a note on the Upgrades page. Fix syntax. Add info on verifying that service has been started. Add a link. Add upgrade notes section for v2.8 release. Add the tags fields for actions And more info about the timezone format for core.st2.CronTimer trigger Replaced 'st2' to the macro that replaces to the product name of this as with others Fixed a minor typo in the webhooks page Update mistral.rst ...
docs/source/reference/ha.rst
Outdated
``st2timersengine`` is responsible for scheduling all user specified timers. See
:ref:`timers <ref-rule-timers>` for the specifics on setting up timers via rules.

You have to have one active ``st2timersengine`` process running to schedule all timers. This is trivial to setup in Kubernetes so there is exactly one active container running ``st2timersengine`` process. Failover is handled natively by Kubernetes. In non Kubernetes deployments, external monitoring needs to setup and a new ``st2timersengine`` process needs to be spun up to address failover.
In HA doc I don't think we should mention any Kubernetes specifics, but have general descriptions as we do for all other services.
Yeah, I thought about it but then we need to say how to handle failover. Should we just leave out the k8s part?
This sounds like we don't have any way to promise timers in HA. I mean, we can't run at least 2+ instances.
Failover in K8s when running a single node is the same as running one timersengine service under systemd, which will restart the process on failure.
While Kubernetes can "guarantee something", obviously the container/process could be killed for whatever reason, and there is no other timersengine that will keep running.
What happens if no timersengine is available at the moment (say it's restarting), will the missed events be rescheduled or lost?
I think it's worth mentioning what happens in the scenario where timersengine is not available, if we can't guarantee HA for it. Additionally, what's needed for timersengine to run properly (DB, MQ, anything else), as is described for other services. Are there any other services that rely on the timers functionality?
Per my understanding, the only purpose is that by extracting timers into a separate singleton service we can run 2+ instances of st2rulesengine?
While Kubernetes can "guarantee something", obviously the container/process could be killed for whatever reason, and there is no other timersengine that will keep running.
Won't this be taken care of by Kubernetes though? Isn't that the whole point of using it?
This sounds like we don't have any way to promise timers in HA. I mean, we can't run at least 2+ instances.
Correct, this is what we decided to do with timers. If we decide to solve this, then we need to look at leader election which we intentionally decided to avoid. See https://github.com/StackStorm/discussions/issues/305
I think it's worth mentioning what happens in the scenario where timersengine is not available, if we can't guarantee HA for it.
There is no A at that point, let alone HA :). It goes without saying IMO, but I'll make that explicit.
Additionally, what's needed for timersengine to run properly (DB, MQ, anything else), as it's described for other services. Are there any other services that rely on timers functionality?
+1
Pod/container failure will lead to a reschedule/restart. What happens in between is downtime.
Related to that, we'll need to add /status for each st2 service https://github.com/StackStorm/k8s-st2/issues/5 so K8s can control the rescheduling and know whether the service is really alive or just sitting there in some manner of deadlock, spinning cycles while actually non-responsive. See StackStorm/st2#4020
With no big A in HA, we still can haz High 😃
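The systemd comparison above can be sketched as a unit file. This is illustrative only: the real ``st2timersengine.service`` shipped by st2 packages may differ, and the paths below are assumptions:

```ini
# Hypothetical unit for illustration; paths and dependencies are assumed.
[Unit]
Description=StackStorm timers engine (singleton)
After=network.target mongodb.service rabbitmq-server.service

[Service]
ExecStart=/opt/stackstorm/st2/bin/st2timersengine --config-file /etc/st2/st2.conf
# Restart on failure gives the same "reschedule on death" behavior
# that Kubernetes provides for a single-replica pod.
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```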
Didn't know we have a doc for this. Good to find it 👍 Closes #766
You have to have exactly one active ``st2timersengine`` process running to schedule all timers.
Having more than one active ``st2timersengine`` will result in duplicate timer events and therefore
duplicate rule evaluations leading to duplicate workflows or actions.
Very important detail here was just documented 👍
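On the Kubernetes side, the "exactly one active process" requirement amounts to pinning the replica count. A hedged sketch of a Deployment (image name and labels are hypothetical, not from the official chart):

```yaml
# Illustrative only: image and labels are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: st2timersengine
spec:
  replicas: 1          # exactly one, or timers fire in duplicate
  strategy:
    type: Recreate     # avoid two live instances overlapping during a rollout
  selector:
    matchLabels:
      app: st2timersengine
  template:
    metadata:
      labels:
        app: st2timersengine
    spec:
      containers:
      - name: st2timersengine
        image: stackstorm/st2timersengine:latest  # hypothetical image
```

``Recreate`` rather than the default ``RollingUpdate`` matters here: a rolling update briefly runs old and new pods side by side, which would violate the single-instance constraint.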
@Kami I guess you wanted to merge it, but closed instead?
DO NOT MERGE UNTIL StackStorm/st2#4180 is merged.
Related: StackStorm/st2-packages#564
Closes #766