Skip to content

Conversation

@lifeSo
Copy link
Collaborator

@lifeSo lifeSo commented Dec 11, 2023

What changes were proposed in this pull request?

ShuffleServer add Direct Memory Metric for monitor

Why are the changes needed?

Fix: # (#1189)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

service.scheduleAtFixedRate(
() -> {
try {
long usedDirectMemory = PlatformDependent.usedDirectMemory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference with the metric of jvm_memory_bytes_used nonheap

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We found "jvm_memory_bytes_used nonheap" is not correct, and it is small than direct memory.
When OutOfDirectMemory is throw,"jvm_memory_bytes_used nonheap" is small and not correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

@codecov-commenter
Copy link

codecov-commenter commented Dec 12, 2023

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (ecbf2e7) 53.22% compared to head (7ca74cf) 54.15%.
Report is 4 commits behind head on master.

Files Patch % Lines
...pache/uniffle/server/NettyDirectMemoryTracker.java 48.00% 13 Missing ⚠️
.../java/org/apache/uniffle/server/ShuffleServer.java 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1363      +/-   ##
============================================
+ Coverage     53.22%   54.15%   +0.92%     
- Complexity     2715     2717       +2     
============================================
  Files           418      399      -19     
  Lines         23932    21600    -2332     
  Branches       2043     2043              
============================================
- Hits          12738    11697    -1041     
+ Misses        10408     9189    -1219     
+ Partials        786      714      -72     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

service.scheduleAtFixedRate(
() -> {
try {
long usedDirectMemory = PlatformDependent.usedDirectMemory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.


import org.apache.uniffle.common.util.ThreadUtils;

public class DirectMemoryUsageReporter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class name is not proper, how about NettyMemoryTracker ? And put this class into netty module ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the implement is get direct memory , not only netty.
Just only netty use direct memory.

If I change to NettyMemoryTracker, how about I use other component use direct memory

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I also think the nettyDirectMemoryTracker is better. And it needn't move file position

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I corrented and renamed it .

public static Gauge.Child gaugeInFlushBufferSize;
public static Gauge.Child gaugeUsedBufferSize;
public static Gauge.Child gaugeReadBufferUsedSize;
public static Gauge.Child gaugeUsedDirectMemorySize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this dedicated for netty?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Shuffle Server and monitor Direct memory, but only netty use direct memory 。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is only for netty after reviewing the implementation of PlatformDependent.usedDirectMemory()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuston Ok,I change the class name later , thanks

private static final String USED_BUFFER_SIZE = "used_buffer_size";
private static final String READ_USED_BUFFER_SIZE = "read_used_buffer_size";
private static final String USED_DIRECT_MEMORY_SIZE = "used_direct_memory_size";
private static final String NETTY_USED_DIRECT_MEMORY_SIZE = "netty_used_direct_memory_size";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This metrics also should be moved into netty metric class

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this should be initialized in the netty metrics class rather than in shuffleServer class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grpc use Netty, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed this. Let reserve this. Sorry @lifeSo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zuston so, I revert the code to early version.

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left the minor comments.


import org.apache.uniffle.common.util.ThreadUtils;

public class DirectMemoryUsageReporter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I also think the nettyDirectMemoryTracker is better. And it needn't move file position

.withDescription("rss direct memory usage report initial delay (ms)");

public static final ConfigOption<Long> SERVER_DIRECT_MEMORY_USAGE_REPORT_INTERVAL =
ConfigOptions.key("rss.server.direct.memory.usage.report.interval")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about rss.server.netty.direct-memory-tracker.metrics.update-interval-ms

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I corrented it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my comment incorrectness. Follows @jerqi suggestion here: #1370 (comment)

Let's use the camel style config key.

How about rss.server.netty.directMemoryTracker.memoryUsage.initialFetchDelayMs and rss.server.netty.directMemoryTracker.memoryUsage.updateMetricsIntervalMs ? cc @jerqi

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I corrected it .

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your effort. @lifeSo

@zuston zuston changed the title [#1189] improvement(server): ShuffleServer add Direct Memory Metric for monitor [#1189] feat(server): Add netty used direct memory size metric Dec 16, 2023
@zuston zuston merged commit 9bd5fd8 into apache:master Dec 16, 2023
lifeSo added a commit to lifeSo/incubator-uniffle that referenced this pull request Dec 22, 2023
* master: (23 commits)
  format
  use synchronized
  format
  modify to singleton
  format
  format
  format
  format
  format
  change Coordinator to singleton like ShuffleServerClientFactory.java
  [apache#1189] feat(server): Add netty used direct memory size metric (apache#1363)
  format
  format
  format
  rename
  file rename
  [apache#960] fix(dashboard): simplify dependency and correct the startup script (apache#1347)
  [apache#1348] improvement(metrics): Unify tags generation for shuffle-server metrics reporter (apache#1349)
  [MINOR] chore: fix kubernetes ci pipeline (apache#1368)
  [MINOR] fix(spark): Fix NPE for ShuffleWriteClientImpl.unregisterShuffle (apache#1367)
  ...
lifeSo added a commit to lifeSo/incubator-uniffle that referenced this pull request Dec 26, 2023
…re_1119_new_2023_12_23

* feature_1119_BUFFER_LIMIT_OF_HUGE_PARTITION:
  add file for commit
  [apache#1189] feat(server): Add netty used direct memory size metric (apache#1363)
  format
  format
  format
  rename
  file rename
  [apache#960] fix(dashboard): simplify dependency and correct the startup script (apache#1347)
  [apache#1348] improvement(metrics): Unify tags generation for shuffle-server metrics reporter (apache#1349)
  [MINOR] chore: fix kubernetes ci pipeline (apache#1368)
  [MINOR] fix(spark): Fix NPE for ShuffleWriteClientImpl.unregisterShuffle (apache#1367)
  format code
  format code
  添加堆外内存监控的代码
jerqi pushed a commit that referenced this pull request Jan 15, 2024
### What changes were proposed in this pull request?

Fix PR [#1363](#1363).
NettyDirectMemoryTracker will not be started currently.

### Why are the changes needed?

It's a followup PR for [#1363](#1363).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
zuston pushed a commit to zuston/incubator-uniffle that referenced this pull request Jan 18, 2024
…pache#1363)

### What changes were proposed in this pull request?

ShuffleServer add Direct Memory Metric for monitor

### Why are the changes needed?

Fix: # (apache#1189)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants