-
Notifications
You must be signed in to change notification settings - Fork 168
[#1189] feat(server): Add netty used direct memory size metric #1363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| service.scheduleAtFixedRate( | ||
| () -> { | ||
| try { | ||
| long usedDirectMemory = PlatformDependent.usedDirectMemory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference with the metric of jvm_memory_bytes_used nonheap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We found "jvm_memory_bytes_used nonheap" is not correct, and it is small than direct memory.
When OutOfDirectMemory is throw,"jvm_memory_bytes_used nonheap" is small and not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #1363 +/- ##
============================================
+ Coverage 53.22% 54.15% +0.92%
- Complexity 2715 2717 +2
============================================
Files 418 399 -19
Lines 23932 21600 -2332
Branches 2043 2043
============================================
- Hits 12738 11697 -1041
+ Misses 10408 9189 -1219
+ Partials 786 714 -72 ☔ View full report in Codecov by Sentry. |
| service.scheduleAtFixedRate( | ||
| () -> { | ||
| try { | ||
| long usedDirectMemory = PlatformDependent.usedDirectMemory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
|
|
||
| import org.apache.uniffle.common.util.ThreadUtils; | ||
|
|
||
| public class DirectMemoryUsageReporter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class name is not proper, how about NettyMemoryTracker ? And put this class into netty module ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the implement is get direct memory , not only netty.
Just only netty use direct memory.
If I change to NettyMemoryTracker, how about I use other component use direct memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I also think the nettyDirectMemoryTracker is better. And it needn't move file position
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I corrented and renamed it .
| public static Gauge.Child gaugeInFlushBufferSize; | ||
| public static Gauge.Child gaugeUsedBufferSize; | ||
| public static Gauge.Child gaugeReadBufferUsedSize; | ||
| public static Gauge.Child gaugeUsedDirectMemorySize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this dedicated for netty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Shuffle Server and monitor Direct memory, but only netty use direct memory 。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is only for netty after reviewing the implementation of PlatformDependent.usedDirectMemory()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuston Ok,I change the class name later , thanks
| private static final String USED_BUFFER_SIZE = "used_buffer_size"; | ||
| private static final String READ_USED_BUFFER_SIZE = "read_used_buffer_size"; | ||
| private static final String USED_DIRECT_MEMORY_SIZE = "used_direct_memory_size"; | ||
| private static final String NETTY_USED_DIRECT_MEMORY_SIZE = "netty_used_direct_memory_size"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This metrics also should be moved into netty metric class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this should be initialized in the netty metrics class rather than in shuffleServer class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grpc use Netty, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I missed this. Let reserve this. Sorry @lifeSo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zuston so, I revert the code to early version.
zuston
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left the minor comments.
|
|
||
| import org.apache.uniffle.common.util.ThreadUtils; | ||
|
|
||
| public class DirectMemoryUsageReporter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I also think the nettyDirectMemoryTracker is better. And it needn't move file position
server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java
Outdated
Show resolved
Hide resolved
| .withDescription("rss direct memory usage report initial delay (ms)"); | ||
|
|
||
| public static final ConfigOption<Long> SERVER_DIRECT_MEMORY_USAGE_REPORT_INTERVAL = | ||
| ConfigOptions.key("rss.server.direct.memory.usage.report.interval") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about rss.server.netty.direct-memory-tracker.metrics.update-interval-ms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I corrented it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my comment incorrectness. Follows @jerqi suggestion here: #1370 (comment)
Let's use the camel style config key.
How about rss.server.netty.directMemoryTracker.memoryUsage.initialFetchDelayMs and rss.server.netty.directMemoryTracker.memoryUsage.updateMetricsIntervalMs ? cc @jerqi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I corrected it .
zuston
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for your effort. @lifeSo
* master: (23 commits) format use synchronized format modify to singleton format format format format format change Coordinator to singleton like ShuffleServerClientFactory.java [apache#1189] feat(server): Add netty used direct memory size metric (apache#1363) format format format rename file rename [apache#960] fix(dashboard): simplify dependency and correct the startup script (apache#1347) [apache#1348] improvement(metrics): Unify tags generation for shuffle-server metrics reporter (apache#1349) [MINOR] chore: fix kubernetes ci pipeline (apache#1368) [MINOR] fix(spark): Fix NPE for ShuffleWriteClientImpl.unregisterShuffle (apache#1367) ...
…re_1119_new_2023_12_23 * feature_1119_BUFFER_LIMIT_OF_HUGE_PARTITION: add file for commit [apache#1189] feat(server): Add netty used direct memory size metric (apache#1363) format format format rename file rename [apache#960] fix(dashboard): simplify dependency and correct the startup script (apache#1347) [apache#1348] improvement(metrics): Unify tags generation for shuffle-server metrics reporter (apache#1349) [MINOR] chore: fix kubernetes ci pipeline (apache#1368) [MINOR] fix(spark): Fix NPE for ShuffleWriteClientImpl.unregisterShuffle (apache#1367) format code format code 添加堆外内存监控的代码
### What changes were proposed in this pull request? Fix PR [#1363](#1363). NettyDirectMemoryTracker will not be started currently. ### Why are the changes needed? It's a followup PR for [#1363](#1363). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing UTs.
…pache#1363) ### What changes were proposed in this pull request? ShuffleServer add Direct Memory Metric for monitor ### Why are the changes needed? Fix: # (apache#1189) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested?


What changes were proposed in this pull request?
ShuffleServer add Direct Memory Metric for monitor
Why are the changes needed?
Fix: # (#1189)
Does this PR introduce any user-facing change?
No.
How was this patch tested?