`Birch` doesn't perform in-place operations (at least not on the input array), so the `copy` parameter is useless and should be deprecated. It's even detrimental, because by default it makes an unnecessary copy of the input.
The only place where an in-place operation happens is in the `update` method of `_CFSubcluster`:
```python
def update(self, subcluster):
    self.n_samples_ += subcluster.n_samples_
    self.linear_sum_ += subcluster.linear_sum_
    self.squared_sum_ += subcluster.squared_sum_
    self.centroid_ = self.linear_sum_ / self.n_samples_
    self.sq_norm_ = np.dot(self.centroid_, self.centroid_)
```
However, `update` is called in only two places. The first is in the `_split_node` function, but there we first create two new `_CFSubcluster` objects, so `update` performs its in-place operations on newly created data and the input data is not modified. The second is in the `insert_cf_subcluster` method of `_CFNode`, but it is only triggered if the subcluster has a child, which can only come from split subclusters (i.e. after `_split_node`), so again we're not modifying the input data.
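To illustrate the aliasing argument, here is a minimal sketch using a toy class. `ToySubcluster` is a hypothetical stand-in, not the scikit-learn `_CFSubcluster`, and it assumes that a subcluster built from an input row keeps a view of that row. Under that assumption, `+=` only touches the input when the object being updated still holds such a view; accumulating into a freshly allocated subcluster leaves the input intact.

```python
import numpy as np


class ToySubcluster:
    """Hypothetical stand-in for _CFSubcluster (not the scikit-learn class)."""

    def __init__(self, linear_sum, n_samples=1):
        # Assumption: the array is stored as-is, so it may be a view of X.
        self.n_samples_ = n_samples
        self.linear_sum_ = linear_sum

    def update(self, other):
        # Same kind of in-place accumulation as in the quoted `update`.
        self.n_samples_ += other.n_samples_
        self.linear_sum_ += other.linear_sum_


X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_before = X.copy()

# Pattern described for _split_node: update a newly created subcluster.
fresh = ToySubcluster(np.zeros(X.shape[1]), n_samples=0)
fresh.update(ToySubcluster(X[0]))
fresh.update(ToySubcluster(X[1]))
safe_unchanged = np.array_equal(X, X_before)

# Counter-example: update a subcluster that still holds a view of X[0].
aliased = ToySubcluster(X[0])
aliased.update(ToySubcluster(X[1]))
unsafe_changed = not np.array_equal(X, X_before)

print(safe_unchanged, unsafe_changed)
```

The second case is the one a `copy` parameter would protect against. The reasoning above is that `Birch` never reaches it, because `update` is only ever called on subclusters created during a split.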
The `update` method of `_CFSubcluster` quoted earlier is at scikit-learn/sklearn/cluster/_birch.py, lines 315 to 320 in 11e8c21.