Skip to content

Add basic cgroup CPU metrics#21029

Merged
jasontedor merged 8 commits intoelastic:masterfrom
jasontedor:cgroups
Oct 24, 2016
Merged

Add basic cgroup CPU metrics#21029
jasontedor merged 8 commits intoelastic:masterfrom
jasontedor:cgroups

Conversation

@jasontedor
Copy link
Copy Markdown
Member

This commit adds basic cgroup CPU metrics to the node stats API.

This commit adds basic cgroup CPU metrics to the node stats API.
This commit adds some forbidden API suppressions to methods in OsProbe
that are reading from various cgroup virtual files.
This commit adds a file permission for /sys/fs/cgroup/cpu/- so that the
OS probe can read from here when reading CPU cgroup stats.
This commit fixes the name of a method in OsProbe. In particular, a
method that previously got the control group for the current process for
the cpuacct sub-system has since evolved to get all control groups for
the current process. This commit reflects this by renaming this method.
@jasontedor
Copy link
Copy Markdown
Member Author

Since a review hasn't started, I'm going to rebase this on master to resolve merge conflicts introduced by #21037.

This commit adds Javadocs for the code that obtains the basic cgroup
stats.
Copy link
Copy Markdown
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a bunch of comments look good though

// note that Matcher#matches must be invoked as
// matching is lazy; this can not happen in an assert
// as assertions might not be enabled
if (!matcher.matches()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do it like this:

final boolean matches = matcher.matches();
assert matches : line;

more readable?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 5e2b71f.

final List<String> lines = Files.readAllLines(path);
assert lines != null && lines.size() == 1;
return lines.get(0);
} catch (final IOException e) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need this leniency here everywhere? I mean can't we catch the expection on a higher level and skip the entire stats? I feel this kind of stuff will just hide problems from us and adds lots of unnecessary lenient code?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 8ff1a87.

out.writeString(cpuControlGroup);
out.writeLong(cpuCfsPeriodMicros);
out.writeLong(cpuCfsQuotaMicros);
if (cpuStat == null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use writeOptionalWriteable here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer necessary after 8ff1a87.

cpuControlGroup = in.readString();
cpuCfsPeriodMicros = in.readLong();
cpuCfsQuotaMicros = in.readLong();
if (!in.readBoolean()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in.readOptionalWriteable(CpuStat::new);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer necessary after 8ff1a87.

@Override
public XContentBuilder toXContent(final XContentBuilder builder, final Params params) throws IOException {
builder.startObject("cgroup");
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

* @return the lines from {@code /proc/self/cgroup} or an empty
* list.
*/
@SuppressForbidden(reason = "access /proc/self/cgroup")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please resolve this in Environment and pass it through via MonitorService -> OsService -> OsStats I think we should really try to not use PathUtils.get at all.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a follow-up? We have to push Environment all the way down from Bootstrap because we initialize the probes there before the security manager is installed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can totally be a followup but MonitorService already has it no?

This commit improves the readability of control group matching in
OsProbe.
This commit removes some leniency in the handling of I/O exceptions
during the reading of cgroup stats.
@jasontedor
Copy link
Copy Markdown
Member Author

Thanks for the review @s1monw; I've responded to your feedback.

This commit fixes an issue when cgroups are not available and thus the
cgroup field is null. This can happen on non-Linux systems, for example,
or Linux kernels where cgroups are not compiled in.
@s1monw
Copy link
Copy Markdown
Contributor

s1monw commented Oct 22, 2016

LGTM provided the current commits CI run passes

I really want to follow up on the PathUtils stuff mentioned, please open a followup for this, Progress over Perfection.

@jasontedor
Copy link
Copy Markdown
Member Author

test this please

1 similar comment
@jasontedor
Copy link
Copy Markdown
Member Author

test this please

@jasontedor
Copy link
Copy Markdown
Member Author

@s1monw CI passes; I'll merge soon and open a few follow-ups. Thanks for reviewing.

@jasontedor jasontedor merged commit 3d642ab into elastic:master Oct 24, 2016
jasontedor added a commit that referenced this pull request Oct 24, 2016
This commit adds basic cgroup CPU metrics to the node stats API.

Relates #21029
@jasontedor jasontedor deleted the cgroups branch October 24, 2016 12:33
jasontedor added a commit that referenced this pull request Dec 16, 2016
This commit fixes a silly doc bug where the field that represents the
total CPU time consumed by all tasks in the same cgroup was mistakenly
reported as "usage" instead of "usage_nanos".

Relates #21029
jasontedor added a commit that referenced this pull request Dec 16, 2016
This commit fixes a silly doc bug where the field that represents the
total CPU time consumed by all tasks in the same cgroup was mistakenly
reported as "usage" instead of "usage_nanos".

Relates #21029
jasontedor added a commit that referenced this pull request Dec 16, 2016
This commit fixes a silly doc bug where the field that represents the
total CPU time consumed by all tasks in the same cgroup was mistakenly
reported as "usage" instead of "usage_nanos".

Relates #21029
anandnalya added a commit to Automattic/elasticsearch-statsd-plugin that referenced this pull request Jan 19, 2017
* Add basic cgroup CPU metrics (elastic/elasticsearch#21029)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants