In dask.distributed we use dispatch on type to determine the memory overhead of intermediate results. Having a rough sense of the size of an intermediate result is useful for the scheduler, as it often correlates with the cost of serializing that result between workers.
The default is to fall back to sys.getsizeof, which calls the __sizeof__ method on the object. It would be useful if this (or an equivalent method) were implemented for scikit-learn estimators.
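For illustration, a type-based dispatch in the same spirit can be sketched with `functools.singledispatch` from the standard library (this is a simplified stand-in, not dask's actual `sizeof` implementation; the `list` registration is just an example):

```python
import sys
from functools import singledispatch

@singledispatch
def sizeof(obj):
    # Default: fall back to sys.getsizeof, which calls obj.__sizeof__()
    return sys.getsizeof(obj)

@sizeof.register(list)
def _sizeof_list(obj):
    # Container overhead plus (recursively estimated) sizes of the elements
    return sys.getsizeof(obj) + sum(sizeof(x) for x in obj)
```

New types can then opt in via `sizeof.register` without touching the default path.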
A naive generic implementation for estimators might be:
```python
from sys import getsizeof

def __sizeof__(self):
    return sum(x.nbytes if hasattr(x, 'nbytes') else getsizeof(x)
               for x in self.__dict__.values())
```

It'd probably even be fine to ignore (or approximate) the memory usage of parameters, and just focus on the memory usage of the results of fit. This may be straightforward for numpy arrays, but less clear for things like trees.
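A minimal, self-contained check that `sys.getsizeof` does pick up such a custom `__sizeof__` (here `FakeArray` stands in for a numpy array via its `nbytes` attribute, and `ToyEstimator` is a hypothetical estimator, not a real scikit-learn class):

```python
from sys import getsizeof

class FakeArray:
    # Stand-in for a numpy array: only the nbytes attribute matters here
    def __init__(self, nbytes):
        self.nbytes = nbytes

class ToyEstimator:
    def fit(self):
        # Simulate fitted attributes: one large array plus a scalar parameter
        self.coef_ = FakeArray(nbytes=8_000_000)
        self.n_iter_ = 10
        return self

    def __sizeof__(self):
        # Naive generic implementation from the issue text
        return sum(x.nbytes if hasattr(x, 'nbytes') else getsizeof(x)
                   for x in self.__dict__.values())

est = ToyEstimator().fit()
print(getsizeof(est))  # dominated by coef_, so at least 8_000_000
```

Note that `sys.getsizeof` may add a small GC-header overhead on top of what `__sizeof__` returns, so the two values need not be exactly equal.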