Skip to content

Use watch instead of list pods in node controller#1129

Merged
centaurus-cloud-bot merged 4 commits intoCentaurusInfra:masterfrom
Sindica:taintmanager-migration
Aug 9, 2021
Merged

Use watch instead of list pods in node controller#1129
centaurus-cloud-bot merged 4 commits intoCentaurusInfra:masterfrom
Sindica:taintmanager-migration

Conversation

@Sindica
Copy link
Collaborator

@Sindica Sindica commented Jul 28, 2021

Reduced # of list pods requests to ETCD. 1 request per 3 second, 1 call per node. This change significantly reduced system load during large cluster density test.

@Sindica
Copy link
Collaborator Author

Sindica commented Aug 2, 2021

Can someone review this PR?

@yb01
Copy link
Collaborator

yb01 commented Aug 2, 2021

can we test this change with a 1tp 2 rp cluster with smaller number of nodes?

@yb01
Copy link
Collaborator

yb01 commented Aug 2, 2021

/lgtm.
please run a small cluster test with 1tp 2rp to validate the functionality of the change in scaleout cases.

Copy link
Collaborator

@yb01 yb01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@Sindica
Copy link
Collaborator Author

Sindica commented Aug 3, 2021

/lgtm.
please run a small cluster test with 1tp 2rp to validate the functionality of the change in scaleout cases.

I tested in 2tp2rp 2x100 environment, perf test succeeded. However, we got unexpected error in KCM:
E0802 20:59:05.761554 1 taint_manager.go:466] Index with name spec.nodeName does not exist

This happened in 1tp1rp test as well. I will investigate. Otherwise, perf test works fine.

@Sindica
Copy link
Collaborator Author

Sindica commented Aug 3, 2021

/lgtm.
please run a small cluster test with 1tp 2rp to validate the functionality of the change in scaleout cases.

I tested in 2tp2rp 2x100 environment, perf test succeeded. However, we got unexpected error in KCM:
E0802 20:59:05.761554 1 taint_manager.go:466] Index with name spec.nodeName does not exist

This happened in 1tp1rp test as well. I will investigate. Otherwise, perf test works fine.

The error seems happen only when cluster start up, not 100% occur, seems depends on the initialization time of the indexer and referral. And only once per node. Will add a log into POC branch to make sure it does not affect the actual index function.

@Sindica Sindica force-pushed the taintmanager-migration branch from 08d635c to b036dc5 Compare August 5, 2021 19:40
@Sindica Sindica force-pushed the taintmanager-migration branch from b036dc5 to aa73519 Compare August 5, 2021 19:42
@Sindica
Copy link
Collaborator Author

Sindica commented Aug 5, 2021

/lgtm.
please run a small cluster test with 1tp 2rp to validate the functionality of the change in scaleout cases.

I tested in 2tp2rp 2x100 environment, perf test succeeded. However, we got unexpected error in KCM:
E0802 20:59:05.761554 1 taint_manager.go:466] Index with name spec.nodeName does not exist
This happened in 1tp1rp test as well. I will investigate. Otherwise, perf test works fine.

The error seems happen only when cluster start up, not 100% occur, seems depends on the initialization time of the indexer and referral. And only once per node. Will add a log into POC branch to make sure it does not affect the actual index function.

Index was not added onto podinformer from tenant managers because informer was started before adding index. Rearranged informer start time, density test on 1x1x100 cluster. Index was properly added and expected debug log appeared.

@Sindica Sindica closed this Aug 5, 2021
@Sindica Sindica reopened this Aug 5, 2021
Copy link
Collaborator

@yb01 yb01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@yb01
Copy link
Collaborator

yb01 commented Aug 9, 2021

re-approve, the most recent scale test runs fine

@zmn223
Copy link
Collaborator

zmn223 commented Aug 9, 2021

/approve

@centaurus-cloud-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yb01, zmn223

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@centaurus-cloud-bot centaurus-cloud-bot merged commit e270ec9 into CentaurusInfra:master Aug 9, 2021
@Sindica Sindica modified the milestones: 930, v0.9 Sep 15, 2021
@Sindica Sindica deleted the taintmanager-migration branch December 2, 2021 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants