-
Notifications
You must be signed in to change notification settings - Fork 166
Add OSS statistics/monitoring plugin #2279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Sure, I'll review it but it will take me a bit longer than normal as I will be busy with an ATLAS meeting in Oslo. |
This needs a major release, I believe, so no need to rush. Cheers, |
abh3
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I am confused as to what this is. A real storage system or a wrapper to a storage system. My inclination is that this is a wrapper and, if so, i9t does not follow the conventions for wrapping an oss plugin. First, it should use the class defined in XrdOssWrapper.hh because doing so alleviates us from having to chase down yet another plugin API when we make future changes in the Oss API. Second, as a wrapper it should not have a GetFileSystem entry point as it can only be added using the "++" convention. Thirdly, a wrapper should not load or otherwise initialize the file system it is wrapping. That is done by the plugin system. So, that code is not appropriate here. So, could this be rewritten to follow the standard way Oss plugins wrappers should be written?
Are there any examples of using these classes? I looked at the wrapper and it appears simple enough but it'd be good to know if I'm the only user of the class.
Sadly, it's not possible to rely on the |
|
Hi Brian,
No, you are not the only user. We have an upcoming tape RSE plugin, and we will be converting existing wrapper plugins into using the official wrapper.
Andy
…________________________________
From: Brian P Bockelman ***@***.***>
Sent: Monday, June 17, 2024 9:44 AM
To: xrootd/xrootd ***@***.***>
Cc: Andrew Hanushevsky ***@***.***>; Mention ***@***.***>
Subject: Re: [xrootd/xrootd] Add OSS statistics/monitoring plugin (PR #2279)
First, it should use the class defined in XrdOssWrapper.hh because doing so alleviates us from having to chase down yet another plugin API when we make future changes in the Oss API
Are there any examples of using these classes? I looked at the wrapper and it appears simple enough but it'd be good to know if I'm the only user of the class.
Second, as a wrapper it should not have a GetFileSystem entry point as it can only be added using the "++" convention. Thirdly, a wrapper should not load or otherwise initialize the file system it is wrapping.
Sadly, it's not possible to rely on the ++ syntax as you suggest due to #1730<#1730>. Unfortunately, that issue means a wrapper writer needs to implement exactly what you say the wrapper should not do. If you are able tackle issue #1730<#1730>, then I'd happily drop the other entry points (on this and other plugins).
—
Reply to this email directly, view it on GitHub<#2279 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAUIW56VX26PN4XMLSXXBDDZH4G5TAVCNFSM6AAAAABISZAPQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZTHA3TIMRYGM>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
|
As for 1730 that is a cmsd problem and as outlined in the issue, the only thing the cmsd wants to use is stat() which itself is replaceable via the stat plugin. So practically there is no reason to stack the cmsd plugin. I suspect this plugin would not be used for the cmsd as well. So, I think you can ignore issue 1730. I agree that the cmsd should not be silent in this case.
…________________________________
From: Brian P Bockelman ***@***.***>
Sent: Monday, June 17, 2024 9:44 AM
To: xrootd/xrootd ***@***.***>
Cc: Andrew Hanushevsky ***@***.***>; Mention ***@***.***>
Subject: Re: [xrootd/xrootd] Add OSS statistics/monitoring plugin (PR #2279)
First, it should use the class defined in XrdOssWrapper.hh because doing so alleviates us from having to chase down yet another plugin API when we make future changes in the Oss API
Are there any examples of using these classes? I looked at the wrapper and it appears simple enough but it'd be good to know if I'm the only user of the class.
Second, as a wrapper it should not have a GetFileSystem entry point as it can only be added using the "++" convention. Thirdly, a wrapper should not load or otherwise initialize the file system it is wrapping.
Sadly, it's not possible to rely on the ++ syntax as you suggest due to #1730<#1730>. Unfortunately, that issue means a wrapper writer needs to implement exactly what you say the wrapper should not do. If you are able tackle issue #1730<#1730>, then I'd happily drop the other entry points (on this and other plugins).
—
Reply to this email directly, view it on GitHub<#2279 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAUIW56VX26PN4XMLSXXBDDZH4G5TAVCNFSM6AAAAABISZAPQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNZTHA3TIMRYGM>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
472b0ee to
e358ac0
Compare
|
@abh3 - updated to rebase on the Is the If the API is not frozen, it'd be better to change the wrapper to own the object to manage the lifetime. |
|
Yes, it is part of the public API. Yes, you cannot control the lifetime of Oss system object because you should be using the passed pointer whose lifetime you do not control. However, you are correct that the lifetime of the DF object is something you do control. We need to wait for R6 to change that. Unfortunately, keeping a separate pointer is the only way to handle the lifetime situation for now.
…________________________________
From: ***@***.*** ***@***.***> on behalf of Brian P Bockelman ***@***.***>
Sent: Wednesday, July 24, 2024 6:11 AM
To: xrootd/xrootd ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [xrootd/xrootd] Add OSS statistics/monitoring plugin (PR #2279)
@abh3<https://github.com/abh3> - updated to rebase on the XrdOssWrapper and removed the support for running as part of the cmsd.
Is the XrdOssWrapper part of the public API/ABI yet? The design is awkward to utilize because it does not own the wrapDF object. Hence, I have to keep two copies of the pointer, once in StatsFile (to own the lifetime) and once in the XrdOssWrapDF (a reference).
If the API is not frozen, it'd be better to change the wrapper to own the object to manage the lifetime.
—
Reply to this email directly, view it on GitHub<#2279 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AA7NRDUPNDMMMWXOKFCK7C3ZN6RZBAVCNFSM6AAAAABISZAPQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENBXHA4DONRSGA>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
________________________________
Use REPLY-ALL to reply to list
To unsubscribe from the XROOTD-DEV list, click the following link:
http://listserv.slac.stanford.edu/scripts/wa.exe?SUBED1=XROOTD-DEV&A=1
|
7e12e96 to
b89a1ba
Compare
|
@abh3 - just a bump on this PR (I see I didn't press the "re-review" button; I'll do that now). If it wasn't obvious from the history, I did the refactoring you requested back in July. My follow-up questions were simply around the API design. |
|
Will do, it's a bit hectic at the moment with the workshop less than 2 weeks away.
…________________________________
From: Brian P Bockelman ***@***.***>
Sent: Monday, August 26, 2024 10:54 AM
To: xrootd/xrootd ***@***.***>
Cc: Andrew Hanushevsky ***@***.***>; Review requested ***@***.***>
Subject: Re: [xrootd/xrootd] Add OSS statistics/monitoring plugin (PR #2279)
BEWARE: This email originated outside of our organization. DO NOT CLICK links or attachments unless you recognize the sender and know the content is safe.
@bbockelm<https://github.com/bbockelm> requested your review on: #2279<#2279> Add OSS statistics/monitoring plugin.
—
Reply to this email directly, view it on GitHub<#2279 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAUIW52DQ57AU7GPYDGUPC3ZTNTVRAVCNFSM6AAAAABISZAPQGVHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJUGAYTMOJVGA3TAMQ>.
You are receiving this because your review was requested.Message ID: ***@***.***>
|
abh3
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's resolve the comments.
48d417e to
9933e04
Compare
|
@abh3 - any news on this front? I left the one remaining discussion thread open because I wanted to make sure you understood how the visibility changes work. |
|
This needs to wait for a feature release (i.e. at least 5.8), while the next one will be a patch release (i.e. 5.7.3). Other than that, once @abh3 approves, we can merge to |
|
Yes, no problems waiting for a feature release! Just trying to tidy up pending PRs on a Monday morning. |
945ec35 to
dbf7e44
Compare
abh3
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bbockelm Almost done here. Only one question remains (no code changes needed). We don'y like putting things in the repo that a) are not explainable, b) look like they are not needed. So, could you address the question posed here.
| @@ -0,0 +1,7 @@ | |||
| { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why this is needed? By default, the way XrdOssAddStorageSystem2 has been declared automatically makes it visible. Any other symbols are visible when made extern, making everything visible is generally not good practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be reading this backward. This makes all symbols local to the library (not appearing in the symbol table) except for XrdOssAddStorageSystem2*. If I didn't have the exception listed in the global section, then it would hide XrdOssAddStorageSystem2* as well -- not what we want!
Any other symbols are visible when made extern, making everything visible is generally not good practice.
That's not how C++ works. The extern keyword specifies the storage class for the variable or the language linkage property for functions. It does not affect symbol visibility in the resulting module.
For example, consider the SciTokens library (which doesn't manage symbol visibility:
$ nm -D lib64/libXrdAccSciTokens-5.so | c++filt | grep -v ' U ' | wc -l
176
Quite a bit of "stuff" -- all not needed to be visible from the outside because, really, all that's needed is XrdAccAuthorizeObjAdd and friends. Loading it pollutes up the global symbol table.
This is not because extern is or is not used -- but because C++ exports all the namespace-level symbols globally by default when dynamic linking.
Compare this to libXrdMacaroons where we do manage the symbol visibility:
$ nm -D lib64/libXrdMacaroons-5.so | c++filt | grep -v ' U ' | wc -l
14
The list is short enough I can paste it into the comment -- and it looks reasonable:
$ nm -D build/release_dir/lib64/libXrdMacaroons-5.so | c++filt | grep -v ' U '
w __cxa_finalize
000000000000e678 t _fini
w __gmon_start__
0000000000004000 t _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
w __pthread_key_create
0000000000014618 B SciTokensHelper
0000000000005be0 T XrdAccAuthorizeObjAdd
0000000000014540 D XrdAccAuthorizeObjAdd_
0000000000005c50 T XrdAccAuthorizeObject
00000000000145a0 D XrdAccAuthorizeObject_
0000000000005f20 T XrdHttpGetExtHandler
00000000000144e0 D XrdHttpGetExtHandler_
You only have things that are desired to be exported -- and basically required for a shared library (_init, _fini).
Most of the Xrootd modules export way more than intended: based on your review comment, I wanted to tidy up this one.
abh3
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, I read it backwards. Thanks for the explanation.
| static_cast<float>(m_slow_times.m_open)/1e9, static_cast<float>(m_slow_times.m_read)/1e9, static_cast<float>(m_slow_times.m_readv)/1e9, | ||
| static_cast<float>(m_slow_times.m_pgread)/1e9, static_cast<float>(m_slow_times.m_write)/1e9, static_cast<float>(m_slow_times.m_pgwrite)/1e9, | ||
| static_cast<float>(m_slow_times.m_dirlist)/1e9, static_cast<float>(m_slow_times.m_stat)/1e9, static_cast<float>(m_slow_times.m_truncate)/1e9, | ||
| static_cast<float>(m_slow_times.m_unlink)/1e9, static_cast<float>(m_slow_times.m_rename)/1e9, static_cast<float>(m_slow_times.m_chmod)/1e9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do all these divisions when you could multiply with 1e-9f?
|
This looks like it's ready for merging. However, if you don't want me to squash commits, please merge fixes into the appropriate commits they are fixing (i.e. the commit with "Address review comments" should just be a fixup of the previous commit, most likely, same for using RAtomic, the code should be added like that in one go). |
e9dc92a to
c958645
Compare
With this, any OSS plugin may use the gstream monitoring.
The storage statistics OSS keep track of number of server-wide requests and performance, reporting this information through the g-stream. It includes separate counters for "slow" requests, allowing one to get a feel for how many requests go through the server over a configurable threshold (default: 1s).
After refactoring to use the common wrapper base clase, multiple methods in the file, directory, and filesystem for OSS stats were now identical to the default implementation from the wrapper. Remove them to reduce redundant code.
In particular: - Touchup class naming scheme to better match the rest of XRootD - Avoid throwing exceptions on failure; add check to see if the resulting object is valid and either fail or bypass it. - Hide symbol visibility outside the module.
Particularly, - Use `RAtomic` classes where possible. - Cleanup pointer on failure. - Move internal functions into the `XrdOssStats::detail` namespace
|
Updated this to the current |
|
DEB/RPM generation is not working after merging this into |
This PR adds a new OSS implementation to XRootD for gathering statistics about the underlying OSS performance.
It keeps counters about the number of operations performed and the overall time spent performing these operations then sends them out to the g-stream. It keeps a separate set of counters for "slow" operations (configurable), providing insight to administrators about the user experience beyond the averages. The intent is this information can easily be passed to time-series monitoring system such as Prometheus.
This adds two configuration options:
and
While the intent is the implementation is lightweight statistics gathering, everything has a cost; hence, this is kept as a loadable plugin instead of in the XRootD core.
The initial use case is to understand the performance of the storage subsystems in XCache. Currently our only performance monitoring is from looking at the throttle plugin. The throttle plugin includes both the storage performance for cache hits and the proxy performance for cache misses; I want to exclude the latter.