storage: clean up timer in cacheWatcher.add#23210
storage: clean up timer in cacheWatcher.add#23210k8s-github-robot merged 1 commit intokubernetes:masterfrom
Conversation
In the e2e benchmarks, this timer is a significant source of garbage and stale timers. Because the timer is not stopped after its use in the select, it stays in the timer heap until it eventually fires (5 seconds later). Under load, a lot of 5-second timers can pile up before any start going away. The timer heap being large makes timer operations take longer; the operations are O(log N) but N is still big. The way to fix this in current versions of Go is to stop the underlying timer explicitly, which this CL does for this one case. There are many other places in the code that use the same idiom, but those do not show up on profiles of the e2e server. I am investigating changes for Go 1.7's runtime that would make the old code behave like this new code transparently, so I don't think it's worth updating any uses of the idiom that are not in hot spots found with profiling. Measuring 'LIST nodes' latency in milliseconds during e2e test shows the benefit of this change. Using Go 1.4.2: BEFORE p50: 148±7 p90: 328±19 p99: 513±29 n: 10 AFTER p50: 151±8 p90: 339±19 p99: 479±20 n: 9 Using Go 1.6.0: BEFORE p50: 141±9 p90: 383±32 p99: 604±44 n: 11 AFTER p50: 140±14 p90: 360±31 p99: 483±39 n: 10
|
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry. Otherwise, if this message is too spammy, please complain to ixdy. |
2 similar comments
|
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry. Otherwise, if this message is too spammy, please complain to ixdy. |
|
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry. Otherwise, if this message is too spammy, please complain to ixdy. |
|
Labelling this PR as size/XS |
|
Nice find! |
|
LGTM, thanks! |
|
I went through fixing these at one point, but I think this usage was added more recently. |
|
@lavalamp here are the rest https://gist.github.com/ncdc/7917ee173625a746d5ae |
|
GCE e2e build/test passed for commit e4b369e. |
|
The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label? |
|
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
|
GCE e2e build/test passed for commit e4b369e. |
|
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
|
GCE e2e build/test passed for commit e4b369e. |
|
Automatic merge from submit-queue |
Auto commit by PR queue bot
|
Nice - thanks @rsc ! |
In the e2e benchmarks, this timer is a significant source of garbage
and stale timers. Because the timer is not stopped after its use
in the select, it stays in the timer heap until it eventually fires
(5 seconds later). Under load, a lot of 5-second timers can pile up
before any start going away. The timer heap being large makes timer
operations take longer; the operations are O(log N) but N is still big.
The way to fix this in current versions of Go is to stop the underlying
timer explicitly, which this CL does for this one case.
There are many other places in the code that use the same idiom,
but those do not show up on profiles of the e2e server.
I am investigating changes for Go 1.7's runtime that would make
the old code behave like this new code transparently, so I don't
think it's worth updating any uses of the idiom that are not in
hot spots found with profiling.
Measuring 'LIST nodes' latency in milliseconds during e2e test
shows the benefit of this change.
Using Go 1.4.2:
BEFORE p50: 148±7 p90: 328±19 p99: 513±29 n: 10
AFTER p50: 151±8 p90: 339±19 p99: 479±20 n: 9
Using Go 1.6.0:
BEFORE p50: 141±9 p90: 383±32 p99: 604±44 n: 11
AFTER p50: 140±14 p90: 360±31 p99: 483±39 n: 10