proposal: dynamic configuration change#13660
Conversation
Codecov Report
@@ Coverage Diff @@
## master #13660 +/- ##
================================================
- Coverage 80.0635% 79.9129% -0.1507%
================================================
Files 473 473
Lines 116440 115801 -639
================================================
- Hits 93226 92540 -686
- Misses 15924 15973 +49
+ Partials 7290 7288 -2 |
Signed-off-by: Ryan Leung <rleungx@gmail.com>
9c70ce2 to
76a534d
Compare
|
There are also several scenarios to consider, it is recommended to add to the workflow:
In addition, it is best to describe how the workdflow is in the normal machines and kubernetes environment. |
| - New cluster. Both TiDB and TiKV use the default configuration and send it to | ||
| PD to complete the registration. The registration needs to establish the mapping | ||
| relationship between the component ID, version and local configuration. For | ||
| customized requirements, such as modifying the size of block cache. It needs |
There was a problem hiding this comment.
When does the customization happen? I think it should be happened before TiKV and TiDB to register themselves. Because some configurations won't take effect after restart.
| Global global = 2; | ||
| } | ||
| string name = 3; | ||
| string value = 4; |
| files of those components can be removed and don't need to learn how those | ||
| tools works. It reduces administrative costs. For configuration options that | ||
| cannot be modified dynamically, we still can change it using this unified way, | ||
| but we need to wait for the next restart after modification to take effect. |
There was a problem hiding this comment.
We need a mechanism to tell user if they need to restart the cluster to make the modification to take effect.
| The functions of each interface are as follows: | ||
|
|
||
| *Create* is used to register a configuration to PD when the components start. | ||
| *Get* is used to get the complete configuration of the component periodically |
There was a problem hiding this comment.
Would a Watch API be helpful? Client could watch the configuration version and only Get on version change.
There was a problem hiding this comment.
We can directly use the version to decide if we need to return the configuration?
|
|
||
| *Create* is used to register a configuration to PD when the components start. | ||
| *Get* is used to get the complete configuration of the component periodically | ||
| from PD and decide if the component configuration need to update. *Update* is |
There was a problem hiding this comment.
How does Update perform concurrency control?
| These two types are used to distinguish whether the configuration is shared by | ||
| components. For example, the label configuration of TiKV is individual for each | ||
| TiKV instance. So the type should be local. Each instance here is uniquely identified | ||
| by the *component_id*, which can be obtained by hashing *IP: port*. |
There was a problem hiding this comment.
I know little about the behavior convention of components, but the IP of a logical instance would change in many scenarios, wouldn't it?
There was a problem hiding this comment.
I think this is the registered address, in k8s the instance IP may change when upgraded but we can register a persistent address for each component.
| registers or queries the component ID. By comparing the version carried by the | ||
| request with the version stored in PD, it determines whether to return the | ||
| configuration of the component in the response. After receiving the reply, TiDB | ||
| or TiKV decides whether to update the configuration or not after comparing with |
There was a problem hiding this comment.
Suppose PD lost the configurations in a disaster, can operator make configuration changes after recovery? Seems like that the changes will be ignored by component because the version in PD is less than version in component.
There was a problem hiding this comment.
IIRC, the PD recover tool will recover the cluster-id and also pick a big enough alloc-id which needs bigger than the current allocated id. So I guess when storing configuration in PD, the corresponding config version needs to be handled the same way.
| or TiKV decides whether to update the configuration or not after comparing with | ||
| the version stored in the component. | ||
|
|
||
| - Delete the node. PD can directly delete the corresponding component ID, the |
There was a problem hiding this comment.
I think there are two ways to delete a node. One is adding a delete API. Another one is using the TTL mechanism.
|
|
||
| - Add a new component or restart the component. The initialization of | ||
| configuration calls the *Create* method. After receiving the request, PD first | ||
| registers or queries the component ID. By comparing the version carried by the |
There was a problem hiding this comment.
How does a new component or a restarted component determine which version to send in the request?
There was a problem hiding this comment.
I think the version of each component also needs to persist itself.
Signed-off-by: Ryan Leung <rleungx@gmail.com>
1 similar comment
|
/label test |
|
These labels are not found test. |
|
No release note, Please follow https://github.com/pingcap/community/blob/master/contributors/release-note-checker.md |
|
/label test,wip |
|
These labels are not found test,wip. |
1 similar comment
|
I'm going to close this PR since it's hasn't been updated for a long time, feel free to reopen it if you are planning to continue this PR in the future. Thanks for your contribution. |
Summary
This proposal proposes a unified way to manage the configuration options of TiDB, TiKV, and PD by storing them in PD and support dynamically change the configuration options by using the same way among the different components which can greatly improve usability.
Motivation
Here are some reasons why we need to do it:
pd-ctl,tikv-ctl, andSQL, resulting in poor usability. For better usability, provide a unified way to modify them dynamically.