Low: sbd: inform the user to restart the sbd service #117

Merged
wenningerk merged 1 commit into ClusterLabs:master from aleksei-burlakov:sbd-create
Nov 5, 2020

Conversation

@aleksei-burlakov

No description provided.

@knet-ci-bot

Can one of the admins verify this patch?

@kgaillot

ok to test

@wenningerk

mhh ... why did that just trigger travis ...

@wenningerk

test this please

@wenningerk

Giving this hint is probably a good idea.
Having added/moved a few arch/default-config things to autoconf recently, this might be an opportunity to make the system-start flavor available in the code in an easy-to-consume way. Then we could give more dedicated tips - or, as a first step, make the systemd hint appear only when running on a systemd distribution.
Will look into it next week ...

@aleksei-burlakov
Author

@wenningerk, @kgaillot may we merge this PR?

@wenningerk

wenningerk commented Oct 5, 2020

Sorry ... that slipped a bit. Haven't brought in the system-start-flavor stuff yet. But of course we can merge it as is and modify it later.
Maybe we should mention that the service should be restarted on all nodes.
Best would actually be to assure somehow that sbd is down before initialization. But at the moment I don't really see how to achieve this. We could go through the claimed slots, issue a 'test' command (slot-ping) and verify that it isn't cleared within a certain timeout.
With a force option we could fall back to the current behavior.
Guess we don't have anything at the moment that would prevent a slot from being claimed multiple times if we are doing restarts in an unsynced manner. While one node is restarting and claiming a wiped slot, another node may still write to that slot.
So maybe we should rather advise to first stop corosync on all nodes, then do the wipe and start the nodes again.
If we don't find anything easy to do here we might say something like:
"Don't forget to start services again on all nodes. (e.g. systemctl start pacemaker - should have brought the service down first on all nodes with e.g. systemctl stop corosync)"

@aleksei-burlakov
Author

aleksei-burlakov commented Oct 7, 2020

> we can merge it as is and modify later.

I completely agree; something more complicated would be over-engineering to my mind.

> Maybe we should mention that the service should be restarted on all nodes.

I have amended the PR. It mentions restarting on all nodes.

> Don't forget to start services again on all nodes. (e.g. systemctl start pacemaker - should have brought the service down first on all nodes with e.g. systemctl stop corosync)

I think that would be too lengthy. I would rather make it a small notification than a manual on pacemaker+sbd.

@wenningerk

> I think that would be too lengthy. I would rather make it a small notification than a manual on pacemaker+sbd.

Agreed - on the other hand, if we give advice we foster the feeling of being on the safe side if it is followed.
And an unsynced restart of the nodes is potentially risky.
We could as well do something similar to the watchdog testing when on a tty and warn before actually writing to the disk:
"Did you assure that sbd is down on all nodes? [ABORT/Proceed]"
would definitely be shorter - plus probably a force option for the daring ones.

@aleksei-burlakov
Author

> I think that would be too lengthy. I would rather make it a small notification than a manual on pacemaker+sbd.
>
> Agreed - on the other hand, if we give advice we foster the feeling of being on the safe side if it is followed.
> And an unsynced restart of the nodes is potentially risky.
> We could as well do something similar to the watchdog testing when on a tty and warn before actually writing to the disk:
> "Did you assure that sbd is down on all nodes? [ABORT/Proceed]"
> would definitely be shorter - plus probably a force option for the daring ones.

I would rather stick to the brief description in the shortened message, which is still good enough. If we want to share more information, we could use the journalctl catalogs (those in /usr/lib/systemd/catalog/) and the sd_journal_send function instead of fprintf. (This would however bring more overhead, because someone would have to keep those catalogs up to date.)

Actually, we are discussing the possibility of using the catalogs in pacemaker and corosync as well. @kgaillot @wenningerk @jfriesse What do you think about it?

@jfriesse
Member

jfriesse commented Nov 4, 2020

> I think that would be too lengthy. I would rather make it a small notification than a manual on pacemaker+sbd.
> Agreed - on the other hand, if we give advice we foster the feeling of being on the safe side if it is followed.
> And an unsynced restart of the nodes is potentially risky.
> We could as well do something similar to the watchdog testing when on a tty and warn before actually writing to the disk:
> "Did you assure that sbd is down on all nodes? [ABORT/Proceed]"
> would definitely be shorter - plus probably a force option for the daring ones.
>
> I would rather stick to the brief description in the shortened message, which is still good enough. If we want to share more information, we could use the journalctl catalogs (those in /usr/lib/systemd/catalog/) and the sd_journal_send function instead of fprintf. (This would however bring more overhead, because someone would have to keep those catalogs up to date.)
>
> Actually, we are discussing the possibility of using the catalogs in pacemaker and corosync as well. @kgaillot @wenningerk @jfriesse What do you think about it?

It's something we may discuss. I would start with bringing support into libqb to solve the portability problems, and then decide which messages should get into the catalog.

@aleksei-burlakov
Author

aleksei-burlakov commented Nov 4, 2020

> It's something we may discuss. I would start with bringing support into libqb to solve the portability problems.

I will make a short demo in two weeks, and if they approve my example (https://github.com/aleksei-burlakov/corosync/commit/70551381b4e80c7ba2117d4821eece2b9a39a3e1) I will enable the MESSAGE_ID field in libqb.

> and then decide which messages should get into the catalog.

The same as in journalctl, but in more detail. The motivation is that there will be many SAP HANA users who migrate from Windows to Linux; they might already know what journalctl is, but they can't read it. The catalog, as you know, may have not only a longer description but also web references.

@jfriesse
Member

jfriesse commented Nov 4, 2020

> It's something we may discuss. I would start with bringing support into libqb to solve the portability problems.

> I will make a short demo in two weeks, and if they approve my example (aleksei-burlakov/corosync@7055138) I will enable the MESSAGE_ID field in libqb.

Ok. I will probably rather wait for the final demo, but right now it looks scary - too much totally non-portable code (keep in mind that at least corosync supports FreeBSD without any problems), and if every single log line would become +10 lines of code then it's just a NACK (that's why I've suggested adding libqb support first).

> and then decide which messages should get into the catalog.

> The same as in journalctl, but in more detail. The motivation is that there will be many SAP HANA users who migrate from
Eh, there are 450+ LOGSYS_LEVEL_ occurrences in the corosync code, so having all of them in the catalog looks like a huge amount of work.

> Windows to Linux; they might already know what journalctl is, but they can't read it. The catalog, as you know, may have not only a longer description but also web references.

Yup, that would be nice, but honestly I'm not sure it really makes sense to have every single log line with its own web page.

@aleksei-burlakov
Author

aleksei-burlakov commented Nov 4, 2020

> Ok. I will probably rather wait for the final demo, but right now it looks scary - too much totally non-portable code (keep in mind that at least corosync supports FreeBSD without any problems), and if every single log line would become +10 lines of code then it's just a NACK (that's why I've suggested adding libqb support first).

Sorry for the confusion - the example is just to show how it should work. What it should look like is that instead of

log_printf (log_level, "The message..."

we will write something like

log_printf_catalog ("52fb62f99e2c49d89cfbf9d6de5e3555", log_level, "Rest is the same.."

And the catalog is /usr/lib/systemd/catalog/corosync.catalog

-- 52fb62f99e2c49d89cfbf9d6de5e3555
Subject: The corosync 
Defined-By: corosync
Support: http://google.com

The long description here

> Eh, there are 450+ LOGSYS_LEVEL_ occurrences in the corosync code, so having all of them in the catalog looks like a huge amount of work.
> Yup, that would be nice, but honestly I'm not sure it really makes sense to have every single log line with its own web page.

This is an optional feature. We might leave the self-explanatory messages as they are and describe the rest in catalogs, and optionally refer to a web page. See more at https://xkcd.com/1024/

@wenningerk

wenningerk commented Nov 4, 2020

> Actually, we are discussing the possibility of using the catalogs in pacemaker and corosync as well. @kgaillot @wenningerk @jfriesse What do you think about it?

Probably something to have a look at.
I like the idea of picking a candidate to experiment with - the one where most issues are expected is probably best.
On the other hand, knet & libqb are special in that they are of course used by systemd services, but there is no systemd service for knet & libqb, while there is for corosync, pacemaker, sbd ...
Would be cool if we could agree on a strategy for the whole HA stack.
The web link might be a reference to a chapter of our upstream online documentation - right?
Possibly adapted when packaged for downstream ... quite some effort ... do we have examples to look at other than systemd?

@kgaillot

kgaillot commented Nov 4, 2020

> Actually, we are discussing the possibility of using the catalogs in pacemaker and corosync as well. @kgaillot @wenningerk @jfriesse What do you think about it?

My first impression is that it would open interesting possibilities, but they're unlikely to be worth the maintenance burden. I wouldn't mind seeing a demo though.

This is actually my first encounter with the systemd catalog, and while it's a neat feature, I don't see anything but systemd using it, and I wonder how many users are using "journalctl -x" to look at logs. Are you aware of some larger effort to promote awareness of the catalog?

The current libqb journal target does:

sd_journal_send("PRIORITY=%d", final_priority,
                "CODE_LINE=%d", cs->lineno,
                "CODE_FILE=%s", cs->filename,
                "CODE_FUNC=%s", cs->function,
                "SYSLOG_IDENTIFIER=%s", t->name,
                "MESSAGE=%s", output_buffer,
                NULL);

If I understand your suggestion, we could add new versions of the libqb logging functions that take one extra argument (the MESSAGE_ID), and the call above would pass along the MESSAGE_ID. Our projects would need to turn on QB_LOG_CONF_USE_JOURNAL (pacemaker doesn't currently), and we'd probably need a libqb function to set the catalog name.

Having to generate a MESSAGE_ID with something like "journalctl --new-id128", and having an intimidating, unreadable string obscuring an entire line of code, every time we want to code a log message, seems like something devs will avoid unless absolutely necessary, and casual contributors would likely be put off by having to understand it. But if we don't do it for most log messages, users are going to give up checking the catalog when 99 times out of 100 they don't find anything useful.

Every time we changed a log message, we'd have to check whether it has a corresponding catalog entry, and decide whether it needs to be updated too. If we're not strict about that, over time we're likely to get into situations where the catalog entry no longer makes sense.

On the positive side, being able to add detailed information that a one-line message can't convey could be very helpful to users, and it opens the door for native-language translations and easier overriding of messages by distros. But again the maintenance of those would be a lot of work.

@jfriesse
Member

jfriesse commented Nov 5, 2020

> Ok. I will probably rather wait for the final demo, but right now it looks scary - too much totally non-portable code (keep in mind that at least corosync supports FreeBSD without any problems), and if every single log line would become +10 lines of code then it's just a NACK (that's why I've suggested adding libqb support first).
>
> Sorry for the confusion - the example is just to show how it should work. What it should look like is that instead of
>
> log_printf (log_level, "The message..."
>
> we will write something like
>
> log_printf_catalog ("52fb62f99e2c49d89cfbf9d6de5e3555", log_level, "Rest is the same.."

Yup, that looks good, even though, as @kgaillot said, I'm afraid of generating this ID (especially if it should be done for every single log message). Though I think it would make sense to generate it only for really important messages where a longer explanation of what is happening (and/or how to prevent such a situation) is most helpful, and to leave the rest of the messages (for example the DEBUG ones) as they are.

> And the catalog is /usr/lib/systemd/catalog/corosync.catalog
>
> -- 52fb62f99e2c49d89cfbf9d6de5e3555
> Subject: The corosync
> Defined-By: corosync
> Support: http://google.com
>
> The long description here
>
> Eh, there are 450+ LOGSYS_LEVEL_ occurrences in the corosync code, so having all of them in the catalog looks like a huge amount of work.
> Yup, that would be nice, but honestly I'm not sure it really makes sense to have every single log line with its own web page.
>
> This is an optional feature. We might leave the self-explanatory messages as they are and describe the rest in catalogs, and optionally refer to a web page. See more at https://xkcd.com/1024/

:) Got it

@aleksei-burlakov
Author

> But again the maintenance of those would be a lot of work.

That's true. But consider MSDN - newcomers from Windows are used to such things. I will make a PR or raise the discussion on the mailing list in the next sprint, and we can agree on whether it pays off.

@aleksei-burlakov
Author

@wenningerk may we merge this PR as is?

@wenningerk

wenningerk commented Nov 5, 2020

> @wenningerk may we merge this PR as is?

We can go with a short message - at least for now. But my arguments against anything that makes a sequential restart after purging the device look like the preferred approach still stand.
Maybe: "Did you check sbd service down on all nodes before? If not do so now and restart afterwards."
More detailed advice on how to do that I would leave to high-level tooling, as it can give adequate tips.
Using the sbd command-line tool directly is anyway kind of reserved for people who know what they are doing - but they still might have forgotten to check something, which is why issuing a message here is a good idea in general.

@aleksei-burlakov
Author

> @wenningerk may we merge this PR as is?
>
> We can go with a short message - at least for now. But my arguments against anything that makes a sequential restart after purging the device look like the preferred approach still stand.
> Maybe: "Did you check sbd service down on all nodes before? If not do so now and restart afterwards."
> More detailed advice on how to do that I would leave to high-level tooling, as it can give adequate tips.
> Using the sbd command-line tool directly is anyway kind of reserved for people who know what they are doing - but they still might have forgotten to check something, which is why issuing a message here is a good idea in general.

I've amended the PR.

@wenningerk wenningerk merged commit 507bd5f into ClusterLabs:master Nov 5, 2020
@kgaillot

kgaillot commented Nov 5, 2020

> Though I think it would make sense to generate it only for really important messages where a longer explanation of what is happening (and/or how to prevent such a situation) is most helpful, and to leave the rest of the messages (for example the DEBUG ones) as they are.

That's true - info and lower severities aren't important for catalog purposes. That still leaves a lot for Pacemaker (actual numbers are a little higher than this, due to some lesser-used logging functions I didn't count):

crit     47
err     765
warn    312
notice  235

Of course some of those are self-explanatory, and others could share a message (e.g. out of memory). But I do suspect we'd need strong coverage of crit/err messages for users to think it's worth the effort to check the catalog.

@kgaillot

kgaillot commented Nov 5, 2020

> But consider MSDN - newcomers from Windows are used to such things. I will make a PR or raise the discussion on the mailing list in the next sprint, and we can agree on whether it pays off.

Good idea. I agree it would be a cool feature; I'm just concerned about code readability and maintainability. One idea that might help: pre-generate a large number of codes (~2,000) and stick them in a header with defines, like

#define PCMK__MSG1 "abcdef1..."
#define PCMK__MSG2 "abcdef2..."
...

so then the libqb calls could look something like

qb_logm(LOG_ERR, PCMK__MSG1, "format", args);

and the pacemaker wrapper macro call could look like

pcmk_log_err(1, "format", args);

The libqb function should accept a NULL message ID and behave like the usual (non-message-ID) function in that case. Then we could define PCMK__MSG0 as NULL and use pcmk_log_err(0, ...) for calls without catalog entries.

@kgaillot

kgaillot commented Nov 5, 2020

> and the pacemaker wrapper macro call could look like
>
> pcmk_log_err(1, "format", args);

Of course we'd need to check for the new libqb capability in configure.ac and define the macros accordingly.

Also, pre-generating the IDs means we'd have to track what the last used one is, maybe just with a comment at the top of the header. Subject to human error, but hopefully we'd get used to it.

@wenningerk

Not so sure any more that merging that message was such a good idea.
Doing a disk-init on an operational cluster should be quite safe, due to the UUID in the header changing.
Do you have a use case where it explicitly didn't work?

@gao-yan
Member

gao-yan commented Dec 8, 2020

It's not that it wouldn't work :-) A device might be re-created with different settings, though. One might think the sbd daemon would pick up a new value of the watchdog timeout automatically, which is not the case ...

@wenningerk

Aah ... that was the background.
Then we probably should work on that.
The inquisitor is getting the io-error exit code, so it could update.
Although this is of course delicate: we can't just tear down the configuration, including the watchdog, and set it up anew, as we have to assure that watchdog observation is uninterrupted.

@gao-yan
Member

gao-yan commented Dec 9, 2020

Yes, it'd be a nice enhancement for sure, but it needs to be carefully thought through.

I assume the timeout value of an opened watchdog device could be changed with ioctl(). There are other things I can think of so far that would need to be considered though:

  • It should be ensured that changes on the fly get reflected on every node at the same time, but the availability of a device on other nodes is unknown.

  • A change of the loop timeout would affect other servants if this were supported on the fly.

  • Things could become trickier if multiple devices were configured: there would be a period when settings among devices might be inconsistent, while, again, the availability of them on other nodes is unknown ...

@wenningerk

Yep, watchdog devices should be able to be reprogrammed. Btw. that might imply a watchdog trigger, which would be relevant if triggered in a situation where sbd had stopped triggering for some reason.
That was just an example of why we can't go the same route here as with the disk-watcher when it detects a UUID change (exit and restart).
If we have disks, we are not sure whether the other node really picked up the changes. In the case of diskless operation we don't even have that way of communication. That touches, btw., the issues that come up when we want to enable a scenario that supports watchdog-fencing on a cluster where some nodes don't have a proper watchdog and thus can't run sbd properly, where the watchdog isn't set to matching timeouts (the caveat with setting stonith-watchdog-timeout='-1'), ...
I'm aware of a lot of caveats in this area, and thinking about the issue turns up more and more of them.
Sorry if my comment above didn't sound cautious enough ;-)
