[data] New executor backend [2/n] --- Add allocation tracing util #31283
ericl merged 3 commits into ray-project:master from
Conversation
Signed-off-by: Eric Liang <ekhliang@gmail.com>
if ref not in self.allocated:
    meta = ray.experimental.get_object_locations([ref])
    size_bytes = meta.get("object_size", 0)
    if not size_bytes:
When would this happen, i.e. object_size is not available?
It seems concerning that we have to fetch all objects locally here, since that could cause memory issues.
It's not a memory issue since you'd only enable this for debugging unit tests, etc. Performance doesn't matter in this case.
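The fallback being discussed can be sketched as a standalone function. This is illustrative only, not the PR's code: `get_locations` and `fetch_object` are hypothetical stand-ins for `ray.experimental.get_object_locations` and `ray.get`.

```python
import pickle

def estimate_size_bytes(ref, get_locations, fetch_object) -> int:
    """Prefer the object store's size metadata; fall back to fetching
    the object and measuring its pickled length when metadata is missing."""
    meta = get_locations(ref) or {}
    size_bytes = meta.get("object_size", 0)
    if not size_bytes:
        # Metadata unavailable (the common case per the thread above):
        # fetch the object locally and use its serialized size.
        obj = fetch_object(ref)
        size_bytes = len(pickle.dumps(obj))
    return size_bytes
```

Since this only runs when tracing is enabled for debugging, the extra fetch is an accepted cost rather than a production concern.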
from ray import cloudpickle as pickle
...
try:
    obj = ray.get(ref, timeout=5.0)
Shall we add a warning log saying:
The object_size metadata is not available for {ref}. Fetching the object locally to get its size.
This is actually the common case... the warning is too verbose.
@ray.remote
class _MemActor:
Is this memory actor something we plan to enable by default in the long term, or will it be disabled by default and enabled only when debugging is needed?
I'm wondering whether the logic could live in the Dataset driver's DatasetStats itself, or whether we really need another actor for it. All the logic looks pretty lightweight except fetching the object when its size is missing.
Since we are preparing to kill the stats actor, I want to make sure we think twice before introducing another actor for Data.
I don't think we'd ever enable this by default, since the performance overhead is fairly substantial. Or if we did, it would require more optimizations.
If you take a look at the full PR, I have added lighter weight memory stats per-operator to the graph, which can also be used for debugging. Though, it is not as comprehensive as this one, which covers both the new code and the legacy code.
SG. Shall we add a comment in this file to capture the caveat:
NOTE: the performance overhead of tracing object allocations is fairly substantial. This is meant for use in unit tests for debugging. Please do not enable it in production without performance optimization.
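For context, the kind of bookkeeping such a tracing actor performs can be sketched as a plain Python class. In the PR this logic lives in a `@ray.remote` actor; the class and method names below are illustrative, not the PR's actual `_MemActor` API.

```python
class MemTracer:
    """Illustrative allocation tracker: records per-ref sizes and call
    sites, and maintains current and peak byte totals."""

    def __init__(self):
        self.allocated = {}   # ref -> (size_bytes, loc)
        self.cur_bytes = 0
        self.peak_bytes = 0

    def trace_alloc(self, ref, size_bytes, loc):
        # Record each ref at most once; update running and peak totals.
        if ref not in self.allocated:
            self.allocated[ref] = (size_bytes, loc)
            self.cur_bytes += size_bytes
            self.peak_bytes = max(self.peak_bytes, self.cur_bytes)

    def trace_dealloc(self, ref):
        # Freeing an untracked ref is a no-op.
        size_bytes, _ = self.allocated.pop(ref, (0, None))
        self.cur_bytes -= size_bytes
```

Keeping this state in a single actor gives a global view across all workers, which a per-driver DatasetStats object could not easily provide for the legacy code paths.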
Args:
    ref: The object created.
    loc: A human-readable string identifying the call site.
In reality, how do we want to pass the call site? Is it <class-name>:<method-name>:<variable-name>?
I've just passed stuff like "output iterator" (see the other PR).
btw this looks pretty useful to be included in
try:
    obj = ray.get(ref, timeout=5.0)
    size_bytes = len(pickle.dumps(obj))
This won't include the MessagePack wrapper that exists when the object's data goes through Ray's full serialization machinery, so this allocation size and derived stats (e.g. peak memory) can end up being a slight underestimate.
Any size is fine, it's just to distinguish between trivial objects and those that are significant in size.
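The point that a pickled length is a rough but sufficient estimate can be illustrated with plain `pickle` (a stand-in here for `ray.cloudpickle`):

```python
import pickle

# Compare pickled sizes of a trivial object vs. a large one.
small = 7
large = list(range(100_000))

small_size = len(pickle.dumps(small))
large_size = len(pickle.dumps(large))

# The pickled length omits Ray's serialization wrapper, so it slightly
# underestimates the true object-store footprint, but the order of
# magnitude is what matters for distinguishing trivial from significant.
assert small_size < 100
assert large_size > 100_000
```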
from ray.data.context import DatasetContext
...
def trace_allocation(ref: ray.ObjectRef, loc: str) -> None:
Could we create loc automatically for the caller with something akin to inspect.getframeinfo(inspect.stack()[1]) or inspect.getouterframes(inspect.currentframe())[2]? Or we could reuse something like our get_py_stack utility:
Lines 1375 to 1415 in 2bedf24
Probably. I'll add this as a TODO, as it isn't really a pain point.
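The inspect-based idea suggested above could look roughly like this. `trace_allocation_auto` is a hypothetical helper, not an existing Ray utility; in real code it would forward the derived string to `trace_allocation(ref, loc)`.

```python
import inspect

def trace_allocation_auto(ref) -> str:
    # Derive a call-site string from the caller's frame.
    # stack()[0] is this function's frame; stack()[1] is the caller's.
    caller = inspect.stack()[1]
    return f"{caller.filename}:{caller.function}:{caller.lineno}"

def output_iterator_step():
    # Example caller; the returned loc names this function.
    return trace_allocation_auto(object())
```

Walking the stack on every allocation adds overhead, which is one reason passing a short manual string like "output iterator" is reasonable for now.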
Add a utility class for tracing object allocation / freeing. This makes it a lot easier to debug memory allocation / freeing issues. This is split out from #30903
Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
Why are these changes needed?
Add a utility class for tracing object allocation / freeing. This makes it a lot easier to debug memory allocation / freeing issues.
This is split out from #30903
Sample output: