Low: sbd: inform the user to restart the sbd service #117
wenningerk merged 1 commit into ClusterLabs:master from aleksei-burlakov:sbd-create
Can one of the admins verify this patch?
ok to test
Hmm... why did that just trigger Travis?
test this please
Giving this hint is probably a good idea.
@wenningerk, @kgaillot, may we merge this PR?
Sorry, that slipped a bit. I haven't brought in the system-start-flavor stuff yet, but of course we can merge it as is and modify it later.
I completely agree; anything more complicated would be overengineering, in my opinion.
I have amended the PR. It mentions restarting on all nodes.
I think that would be too lengthy. I would rather give a short notice than a manual on pacemaker+sbd.
I would rather stick to the brief description in the shortened message; that is still good enough. If we want to share more information, we could use the systemd journal catalogs (those in /usr/lib/systemd/catalog/) and the
Actually, we are discussing the possibility of using the catalogs in pacemaker and corosync as well. @kgaillot @wenningerk @jfriesse, what do you think about it?
It's something we may discuss. I would start by adding support to libqb to solve the portability problems, and then decide which messages should go into the catalog.
I will make a short demo in two weeks, and if they approve my example (https://github.com/aleksei-burlakov/corosync/commit/70551381b4e80c7ba2117d4821eece2b9a39a3e1), I will enable the
The same as in journalctl, but in more detail. The motivation is that there will be many SAP HANA users migrating from Windows to Linux; they might already know what journalctl is, but they can't interpret its output. The catalog, as you know, can contain not only a longer description but also web references.
OK. I will probably wait for the final demo, but right now it looks scary: too much totally non-portable code (keep in mind that corosync, at least, supports FreeBSD without any problems), and if every single log line becomes 10+ lines of code, then it's just a NACK (that's why I suggested adding libqb support first).
Eh, there are like 450+
Yup, that would be nice, but honestly, I'm not sure it really makes sense for every single log line to have its own web page.
Sorry for the confusion; the example is just meant to show how it would work. What it should look like is that instead of
we will write something like
And the catalog is
This is an optional feature. We might leave the self-explanatory messages as they are, describe the rest in catalogs, and optionally refer to a web page. See more on https://xkcd.com/1024/
Probably something to have a look at.
My first impression is that it would open interesting possibilities, but they're unlikely to be worth the maintenance burden. I wouldn't mind seeing a demo, though. This is actually my first encounter with the systemd catalog, and while it's a neat feature, I don't see anything but systemd using it, and I wonder how many users use "journalctl -x" to look at logs. Are you aware of some larger effort to promote awareness of the catalog?
The current libqb journal target does:
If I understand your suggestion, we could add new versions of the libqb logging functions that take one extra argument (the MESSAGE_ID), and the call above would pass along the MESSAGE_ID. Our projects would need to turn on QB_LOG_CONF_USE_JOURNAL (pacemaker doesn't currently), and we'd probably need a libqb function to set the catalog name.
Having to generate a MESSAGE_ID with something like "journalctl --new-id128", and having an intimidating, unreadable string obscuring an entire line of code every time we want to code a log message, seems like something devs will avoid unless absolutely necessary, and casual contributors would likely be put off by having to understand it. But if we don't do it for most log messages, users are going to give up checking the catalog when 99 times out of 100 they don't find anything useful.
Every time we changed a log message, we'd have to check whether it has a corresponding catalog entry and decide whether that needs to be updated too. If we're not strict about that, over time we're likely to get into situations where the catalog entry no longer makes sense.
On the positive side, being able to add detailed information that a one-line message can't convey could be very helpful to users, and it opens the door for native-language translations and easier overriding of messages by distros. But again, maintaining those would be a lot of work.
Yup, that looks good, even though, as @kgaillot said, I'm afraid of generating this ID (especially if it has to be done for every single log message). Still, I think it would make sense to generate IDs only for really important messages, where a longer explanation of what is happening (and/or how to prevent such a situation) is most helpful, and to leave the rest of the messages (for example, DEBUG ones) as they are.
:) Got it
That's true. But consider MSDN: newcomers from Windows are used to such things. I will open a PR or raise the discussion on the mailing list in the next sprint, and we can decide whether it pays off.
@wenningerk, may we merge this PR as is?
We can go with a short message, at least for now. But my arguments against wording that makes a sequential restart after purging the device look like the preferred approach still stand.
I've amended the PR.
That's true; info and lower severities aren't important for catalog purposes. That's still a lot for Pacemaker (actual numbers are a little higher than this, due to some lesser-used logging functions I didn't count):
crit 47
Of course, some of those are self-explanatory and others could share a message (e.g. out of memory). But I do suspect we'd need strong coverage of crit/err messages for users to think it's worth the effort to check the catalog.
Good idea. I agree it would be a cool feature; I'm just concerned about code readability and maintainability. One idea that might help is to pre-generate a large number of codes (~2,000) and stick them in a header with defines, like so
then the libqb calls could look something like
and the pacemaker wrapper macro call could look like
The libqb function should accept a NULL message ID and behave like the usual (non-message-id) function in that case. Then we could define PCMK__MSG0 as NULL and use pcmk_log_err(0, ...) for calls without catalog entries.
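A minimal sketch of the pre-generated-ID idea, in C since that is what sbd, pacemaker and libqb are written in. Everything here is hypothetical: `demo_log_id()` stands in for a libqb logging function that takes an extra MESSAGE_ID argument (libqb has no such function as of this thread), the ID strings are placeholders rather than real catalog IDs, and instead of writing to the journal it only formats journal-style fields into a buffer and onto stderr. It illustrates the two points from the comment above: token pasting turns `pcmk_log_err(1, ...)` into a call using `PCMK__MSG1`, and `PCMK__MSG0` expands to NULL so a call without a catalog entry falls back to ordinary logging.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical pre-generated IDs; the values are placeholders. */
#define PCMK__MSG0 NULL                                 /* no catalog entry */
#define PCMK__MSG1 "0123456789abcdef0123456789abcdef"
#define PCMK__MSG2 "fedcba9876543210fedcba9876543210"
/* Last used: PCMK__MSG2 */

/* Most recent entry, kept so the behaviour is easy to inspect. */
static char demo_last_entry[512];

/*
 * Stand-in for a hypothetical libqb call with an extra MESSAGE_ID argument.
 * A NULL msg_id behaves like an ordinary log call (no MESSAGE_ID field).
 */
static void demo_log_id(const char *msg_id, int priority, const char *fmt, ...)
{
    char message[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(message, sizeof(message), fmt, ap);
    va_end(ap);

    if (msg_id != NULL) {
        snprintf(demo_last_entry, sizeof(demo_last_entry),
                 "PRIORITY=%d MESSAGE_ID=%s MESSAGE=%s", priority, msg_id, message);
    } else {
        snprintf(demo_last_entry, sizeof(demo_last_entry),
                 "PRIORITY=%d MESSAGE=%s", priority, message);
    }
    fprintf(stderr, "%s\n", demo_last_entry);
}

/* pcmk_log_err(1, ...) pastes to PCMK__MSG1; pcmk_log_err(0, ...) to NULL. */
#define pcmk_log_err(n, ...) demo_log_id(PCMK__MSG##n, LOG_ERR, __VA_ARGS__)
```

For example, `pcmk_log_err(1, "watchdog timeout is %d s", 5)` records an entry carrying the placeholder ID, while `pcmk_log_err(0, "out of memory")` records a plain entry. Keeping the format string inside `...` keeps the macro valid in standard C even when no format arguments follow it.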
Of course, we'd need to check for the new libqb capability in configure.ac and define the macros accordingly. Also, pre-generating the IDs means we'd have to track which one was used last, maybe just with a comment at the top of the header. That's subject to human error, but hopefully we'd get used to it.
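Because tracking the last used ID by hand is error-prone, a small check could run at build or test time. A sketch, assuming the IDs are also kept in a table like the hypothetical `demo_msg_ids` below (placeholder values) and written in the 32-hex-digit string form that `journalctl --new-id128` prints; the sketch additionally insists on lowercase to keep the header uniform. It reports the first malformed or duplicated entry.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table mirroring a pre-generated ID header (placeholder values). */
static const char *const demo_msg_ids[] = {
    "0123456789abcdef0123456789abcdef",
    "fedcba9876543210fedcba9876543210",
    "00112233445566778899aabbccddeeff",
};

/* True if id is exactly 32 lowercase hexadecimal digits (a 128-bit ID). */
static bool msg_id_well_formed(const char *id)
{
    if (id == NULL || strlen(id) != 32) {
        return false;
    }
    for (size_t i = 0; i < 32; i++) {
        unsigned char c = (unsigned char) id[i];
        if (!isxdigit(c) || isupper(c)) {
            return false;
        }
    }
    return true;
}

/* Index of the first malformed or duplicated ID, or -1 if the table is clean. */
static long first_bad_msg_id(const char *const *ids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!msg_id_well_formed(ids[i])) {
            return (long) i;
        }
        for (size_t j = 0; j < i; j++) {
            if (strcmp(ids[i], ids[j]) == 0) {
                return (long) i;   /* same ID used twice */
            }
        }
    }
    return -1;
}
```

The duplicate search is quadratic, which is still cheap for the roughly 2,000 entries suggested above.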
Not so sure anymore that merging that message was such a good idea.
It's not that it wouldn't work :-) A device might be re-created with different settings, though. One might think the sbd daemon would pick up a new watchdog timeout value automatically, which is not the case...
Aah... that was the background.
Yes, it'd be a nice enhancement for sure, but it needs to be carefully thought through. I assume the timeout value of an opened watchdog device could be changed with ioctl(). There are other things I can think of so far that would need to be considered, though:
Yep, watchdog devices should be reprogrammable. Btw, reprogramming might imply a watchdog trigger, which would be relevant in a situation where sbd had stopped triggering for some reason.
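A sketch of reprogramming the timeout of an already-open watchdog with the Linux watchdog ioctls from `<linux/watchdog.h>` (`WDIOC_GETSUPPORT`, `WDIOC_GETTIMEOUT`, `WDIOC_SETTIMEOUT`). The helper name `wd_reprogram_timeout()` and the idea of calling it after the SBD device is re-created are assumptions, not existing sbd code. Per the kernel watchdog API, drivers that can change the timeout on the fly advertise `WDIOF_SETTIMEOUT`, and they write the timeout actually applied back into the argument, because the hardware may not support the exact value. As far as we know, the Linux watchdog core also pings an active watchdog as part of `WDIOC_SETTIMEOUT`, which is the implicit trigger mentioned above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Hypothetical helper (not part of sbd): change the timeout of a watchdog
 * that is already open as wd_fd. Returns the timeout the driver reports
 * after the change, or -1 on error.
 */
static int wd_reprogram_timeout(int wd_fd, int wanted_secs)
{
    struct watchdog_info info;
    int current = 0;
    int applied = wanted_secs;

    /* Make sure the driver can change the timeout on the fly. */
    if (ioctl(wd_fd, WDIOC_GETSUPPORT, &info) < 0) {
        fprintf(stderr, "WDIOC_GETSUPPORT failed: %s\n", strerror(errno));
        return -1;
    }
    if (!(info.options & WDIOF_SETTIMEOUT)) {
        fprintf(stderr, "watchdog '%.32s' cannot change its timeout\n",
                (const char *) info.identity);
        return -1;
    }

    /* Nothing to do if the programmed value already matches. */
    if (ioctl(wd_fd, WDIOC_GETTIMEOUT, &current) < 0) {
        fprintf(stderr, "WDIOC_GETTIMEOUT failed: %s\n", strerror(errno));
        return -1;
    }
    if (current == wanted_secs) {
        return current;
    }

    /* The driver writes the timeout it actually used back into 'applied'. */
    if (ioctl(wd_fd, WDIOC_SETTIMEOUT, &applied) < 0) {
        fprintf(stderr, "WDIOC_SETTIMEOUT failed: %s\n", strerror(errno));
        return -1;
    }
    return applied;
}
```

In sbd this would presumably be called with the descriptor of the watchdog it already holds open and the timeout read from the re-created device header. Opening `/dev/watchdog` just to try the helper arms the watchdog, so experiments belong on a test machine.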