Logs performance benchmarks#115358
Conversation
|
@serathius wrote in #115277 (comment):
With the directories set up like this, running benchmarks against some log files from the CI only need a single command as setup (the gsutil mentioned in the updated README). I could keep the symlinks out of the commit, but creating them or moving files is complicated enough that I then would have to add a script instead that users need to invoke - I don't think that this will be simpler. Without any symlinks, the same file would have to be replicated in different places (once for each verbosity level). |
|
Hmm, I had forgotten about the additional mangling of the files. Perhaps a script is better after all. |
a2caa47 to
037aba8
Compare
There was a problem hiding this comment.
Why ln -s and not mv? What's the benefit of using symlinks here?
There was a problem hiding this comment.
The same file gets referenced in two different directories and then serves as input for different test cases (default log level and v 3). I could copy it, but that seems wasteful.
There was a problem hiding this comment.
@serathius: I've added some comments to the script about this and also completed the support for running the script multiple times (would have failed when symlinks already exist).
037aba8 to
6d69ee9
Compare
|
/retest |
|
Not super convinced on necessity of symlinks in repo. Would love some second opinion. cc @dims |
|
looks like this is a developer oriented script right? let's leave it simple (and have them duplicated) |
I already removed them? |
|
/triage accepted |
|
@dims, @serathius: what exactly do you want me to change? |
6d69ee9 to
1ac27c6
Compare
I replaced with plain copies. There was a merge conflict because @serathius: please take another look. |
There was a problem hiding this comment.
Is this still needed as we removed symlinks?
There was a problem hiding this comment.
Hmm, now I regret that we are not symlinking anymore. Each individual copy of a file gets printed again, which can lead to substantial redundant output.
Let me replace with hashing the file content.
There was a problem hiding this comment.
Done. While at it, I switched from "path" to "path/filepath" because it works better on Windows.
There was a problem hiding this comment.
One problem I expect for this script is log format changing and we forget how the original logs looked. Without knowing original log lines we will loose understanding what sed script did.
Can you provide an example what you expect this sed does? A example of line before, and example of line after. Note in commit message is ok, but git history might get obfuscated.
When trying again with recent log files from the CI job, it was found that some
JSON messages get split across multiple lines, both in container logs and in
the systemd journal:
2022-12-21T07:09:47.914739996Z stderr F {"ts":1671606587914.691,"caller":"rest/request.go:1169","msg":"Response ...
2022-12-21T07:09:47.914984628Z stderr F 70 72 6f 78 79 10 01 1a 13 53 ... \".|\n","v":8}
Note the different time stamp on the second line. That first line is
long (17384 bytes). This seems to happen because the data must pass through a
stream-oriented pipe and thus may get split up by the Linux kernel.
The implication is that lines must get merged whenever the JSON decoder
encounters an incomplete line. The benchmark loader now supports that. To
simplifies this, stripping the non-JSON line prefixes must be done before using
a log as test data.
The updated README explains how to do that when downloading a CI job
result. The amount of manual work gets reduced by committing symlinks under
data to the expected location under ci-kubernetes-kind-e2e-json-logging and
ignoring them when the data is not there.
Support for symlinks gets removed and path/filepath is used instead of path
because it has better Windows support.
The same effect can be achieved with `-bench=BenchmarkEncoding/none`.
For long strings the output of assert.Contains is not very readable.
The benchmarks and unit tests were written so that they used custom APIs for each log format. This made them less realistic because there were subtle differences between the benchmark and a real Kubernetes component. Now all logging configuration is done with the official k8s.io/component-base/logs/api/v1. To make the different test cases more comparable, "messages/s" is now reported instead of the generic "ns/op".
v1.Container is still changing a log which caused the test to fail each time a new field was added. To test loading, let's better use something that is unlikely to change. The runtimev1.VersionResponse gets logged by kubelet and seems to be stable.
1ac27c6 to
5ee679b
Compare
|
/lgtm |
|
LGTM label has been added. DetailsGit tree hash: b6baedab2557474afb9edeb68dc5cdc47d73e7c9 |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pohly, serathius The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Improves usability and realism of the performance tests for logging.
Does this PR introduce a user-facing change?
/assign @serathius