
(WIP) Make controller go through attribute manager to clear transient attributes. #1699

Closed
HideoYamauchi wants to merge 3 commits into ClusterLabs:2.0 from HideoYamauchi:disscuss_first

Conversation

@HideoYamauchi
Contributor

@HideoYamauchi HideoYamauchi commented Feb 21, 2019

Hi All,

This is for discussion only.
I have confirmed the rough behavior, but the source is not complete yet.
I think many fixes are still missing.

This PR uses the following PR from Ken as a base.

I made the following changes as a first implementation approach.

It is unknown whether this can cope with clusters that have many nodes or many attributes.
Since my modification handles the clearing of attributes in smaller units, further improvement may be necessary.


  • Mark when receiving the ATTRD_OP_PEER_CLEAR message.
    To cope with crmd restarting, the attribute is not deleted immediately.
    For remote node attributes, delete them immediately.
    The attribute is deleted once both the deletion-request mark from crmd and the node-leave event in attrd have been observed. (Note that the mark and the event may arrive in either order.)
    With this fix, attributes are always deleted via the ATTRD_OP_PEER_CLEAR message.
    Although this fix alone does not solve the whole problem, it should also help in the future when only attrd is restarted.

  • Add ATTRD_OP_ATTR_REMOVE as a new message.
    After the CIB update completes, the writer deletes the in-memory attribute in attrd.
    At that point, it sends ATTRD_OP_ATTR_REMOVE to the attrd members of the cluster.
    The other attrd instances that receive ATTRD_OP_ATTR_REMOVE delete their in-memory attributes as well.
    This mechanism was added to reliably synchronize the attributes across the cluster.

  • Compatibility with older CRM_ATTR_PROTOCOL versions is not included.

  • When an attrd that left the cluster joins again, the processing flag is cleared.
    This happens when attrd receives the CRM_ATTR_PROTOCOL message.
    Ideally, the processing flag should be cleared when crmd joins, but attrd cannot detect that at the moment.

  • The forced-write decision based on force_write has been moved from write_attributes() to write_attribute().
    This is because a write may also be triggered from write_attribute().

  • For now, it seems difficult to delete the attributes of an isolated attrd unless crmd restarts.

  • etc..
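
The mark-and-event deletion described in the first bullet can be modeled in isolation. Below is a minimal, self-contained C sketch (not the actual Pacemaker code; the struct, enum, and function names are invented for illustration): an attribute is deleted only once both the clear request from crmd and the peer-leave event in attrd have been observed, in either order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node state; real attrd would track this per attribute value. */
enum clear_flags {
    clear_requested = (1 << 0),  /* ATTRD_OP_PEER_CLEAR mark received from crmd */
    peer_left       = (1 << 1),  /* membership layer reported the node as gone  */
};

struct node_attr_state {
    unsigned int flags;
    bool deleted;
};

/* Record one condition; delete only when both have been observed, in either order. */
static void observe(struct node_attr_state *s, enum clear_flags f)
{
    s->flags |= f;
    if ((s->flags & clear_requested) && (s->flags & peer_left)) {
        s->deleted = true;   /* in real attrd: free the value and notify peers */
        s->flags = 0;        /* reset so a rejoined node starts clean          */
    }
}
```

The point of the two-flag accumulation is exactly the parenthetical note above: neither ordering of the crmd request and the membership event alone is enough, and either ordering completes the deletion.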


The following problems have already been found with this PR.

  • The flag is cleared incompletely: after crmd restarts, the attribute is deleted when attrd restarts. (A better approach is needed.)
  • ATTRD_OP_PEER_CLEAR is not sent from the crmd of the new DC node after a cluster without STONITH breaks up.
    To @kgaillot : A correction is needed so that the new DC node sends the ATTRD_OP_PEER_CLEAR message. Do you know where the corrections need to be made?

Problems with lingering attrd state that remain even after this fix.

  • If the election is delayed, problems remain (https://bugs.clusterlabs.org/show_bug.cgi?id=5358#c21).
    Currently, only setting "transition-delay" prevents the problem.

  • Separately from this problem, we must also consider that attributes are lost whenever only attrd is restarted.
    The current attrd deletes its attributes when it starts up.
    Even when sync_response messages are exchanged with other nodes, the values are reset to NULL.
    As a result, when attrd restarts, its attributes disappear.
    In particular, in a single-node cluster the attributes are lost completely.
    For example, it might be effective for pacemakerd to pass a parameter so that the attributes are not erased when attrd alone is restarted.
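
The last idea could be sketched as a startup check in attrd. Everything here is hypothetical (the environment variable and the function are invented; Pacemaker has no such option in this form): pacemakerd would set a flag in the child's environment when it respawns attrd by itself, and attrd would then skip the initial clearing of attributes.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical variable that pacemakerd would set when respawning attrd alone. */
#define KEEP_ATTRS_ENV "PCMK_attrd_keep_attributes"

/* Returns true when attrd should clear its attributes at startup
 * (the current behavior), false when the respawn flag is present. */
static bool should_clear_attributes_on_start(void)
{
    const char *keep = getenv(KEEP_ATTRS_ENV);

    return (keep == NULL) || (keep[0] == '\0');
}
```

This only illustrates the shape of the proposal; the real design question is how pacemakerd distinguishes an attrd-only respawn from a full node start.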


Best Regards,
Hideo Yamauchi.

@HideoYamauchi
Contributor Author

HideoYamauchi commented Feb 28, 2019

IMO, I have not confirmed a fix yet, but I think it would be better for the writer to erase the attributes in one stroke, in the same way erase_status_tag() did.
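
For context, erase_status_tag() in the controller removed a node's whole transient-attributes section from the CIB with a single request. The "one stroke" idea can be modeled with a toy table (the struct and function names are invented, not attrd's real data structures): drop every attribute belonging to one node in a single pass, instead of issuing one removal per attribute.

```c
#include <string.h>

/* Toy attribute table: each entry belongs to one node. */
struct attr_entry {
    const char *node;
    const char *name;
    int in_use;
};

/* Erase every attribute owned by `node` in one pass;
 * returns how many entries were removed. */
static int erase_node_attrs(struct attr_entry *tab, int len, const char *node)
{
    int removed = 0;

    for (int i = 0; i < len; i++) {
        if (tab[i].in_use && strcmp(tab[i].node, node) == 0) {
            tab[i].in_use = 0;
            removed++;
        }
    }
    return removed;
}
```

The design point is atomicity from the caller's perspective: one request covers the node's whole attribute set, so there is no window where some attributes are gone and others linger.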

@aleksei-burlakov

@HideoYamauchi , @kgaillot may we merge this PR, or at least the #1693 ?

@HideoYamauchi
Contributor Author

HideoYamauchi commented Feb 13, 2020

Hi aleksei,

This fix is fairly old and experimental and not completed.
Therefore, this fix cannot be merged.

I think we need to reconsider again.

Best Regards,
Hideo Yamauchi.

@aleksei-burlakov

> Hi aleksei,
>
> This fix is fairly old and experimental and not completed.
> Therefore, this fix cannot be merged.
>
> I think we need to reconsider again.
>
> Best Regards,
> Hideo Yamauchi.

@HideoYamauchi , @kgaillot what if we merge the #1693 ? It should not make any regression.

@kgaillot

> @HideoYamauchi , @kgaillot what if we merge the #1693 ? It should not make any regression.

Unfortunately it would reintroduce a crash at scale. For the time being it's a choice of lesser evils, normal operation of pacemaker at scale being more important than corner cases when daemons respawn.

This is a tricky problem. Moving the functionality to attrd makes sense and avoids issues with the controller respawning, and should have fewer corner cases, but we do have to watch out for the corner cases that do remain, and we have to handle backward compatibility in mixed-version clusters where some nodes have the new feature and some don't.

@HideoYamauchi
Contributor Author

Hi Ken,

If there is a lot of demand for fixing this problem, I think it would be good to find an opportunity to tackle it again at some point.

Best Regards,
Hideo Yamauchi.

@aleksei-burlakov

I would like to take over this PR in the #2020. @HideoYamauchi , I would appreciate your comments or any help with it=)

@HideoYamauchi
Contributor Author

Hi aleksei,

> I would like to take over this PR in the #2020. @HideoYamauchi , I would appreciate your comments or any help with it=)

Okay!
If you need my comments, please contact me.

Best Regards,
Hideo Yamauchi.

@kgaillot

I have a new attempt at this in progress and will submit a PR when it is ready.

kgaillot closed this Dec 13, 2023
@gao-yan
Member

gao-yan commented Sep 10, 2024

> I have a new attempt at this in progress and will submit a PR when it is ready.

Hi @kgaillot , @HideoYamauchi , is there a PR about this already, or am I missing any further progress? Thanks.

@kgaillot

> > I have a new attempt at this in progress and will submit a PR when it is ready.
>
> Hi @kgaillot , @HideoYamauchi , is there a PR about this already, or am I missing any further progress? Thanks.

I'm about to pick this up again; I had to divert to 3.0.0 compatibility breaks for a while. FYI, I'm planning to release 2.1.9 in October and 3.0.0 after that.

@HideoYamauchi
Contributor Author

@gao-yan

At present, I have left the consideration of this matter to Ken.
