RFC Should pairwise_distances preserve float32?

Currently the dtype of the distance matrix returned by `pairwise_distances` is inconsistent: it depends on the metric and on the value of `n_jobs`.

For float64 input, everything is consistent: the returned matrix is always float64.
For mixed float64 X and float32 Y, the returned matrix is also always float64, which is what should be expected imo.

The trouble arises when both X and Y are float32:
- for sklearn metrics:
  - `euclidean` and `cosine`: result is always float32
  - `manhattan`: result is float64 if `n_jobs=1` and float32 otherwise
- for scipy metrics: result is float64 if `n_jobs=1` and float32 otherwise.
  Note that scipy `cdist`/`pdist` always return float64.
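
To check the behaviour listed above on a given installation, a small script along these lines could be used. It is only a sketch: `report_dtypes` is a hypothetical helper, not part of scikit-learn, `chebyshev` is picked here as one example of a scipy-backed metric, and the dtypes it prints depend on the installed scikit-learn version, so no expected output is shown.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(0)
X32 = rng.random_sample((20, 5)).astype(np.float32)
Y32 = rng.random_sample((10, 5)).astype(np.float32)


def report_dtypes(X, Y,
                  metrics=("euclidean", "cosine", "manhattan", "chebyshev"),
                  n_jobs_values=(1, 2)):
    """Hypothetical helper: map (metric, n_jobs) to the returned dtype name."""
    return {
        (metric, n_jobs): pairwise_distances(
            X, Y, metric=metric, n_jobs=n_jobs
        ).dtype.name
        for metric in metrics
        for n_jobs in n_jobs_values
    }


# float32 inputs: the case where the returned dtype is reported to vary.
for (metric, n_jobs), dtype_name in report_dtypes(X32, Y32).items():
    print(f"metric={metric!r:12} n_jobs={n_jobs} -> {dtype_name}")

# Calling scipy directly, for comparison with the note above.
print("scipy cdist ->", cdist(X32, Y32).dtype.name)
```

Running the same helper on `X32.astype(np.float64)` and `Y32.astype(np.float64)` gives the float64 baseline to compare against.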

Hence the question: should `pairwise_distances` preserve float32?

My opinion is that it should, since `pairwise_distances` can be used as an intermediate step during fit and since there's ongoing work towards preserving float32 in estimators (see https://github.com/scikit-learn/scikit-learn/issues/11000 for transformers, for instance).

An argument against this is that computing in float64 reduces numerical instability. A potential solution would be to use float64 accumulators for the intermediate computations only, while still returning a float32 distance matrix. Note that with https://github.com/scikit-learn/scikit-learn/pull/23958 we might no longer need the scipy metrics, in favor of the ones defined in `dist_metrics`, which would make float64 accumulators easier to implement generally.
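
To illustrate the float64-accumulator idea, here is a minimal NumPy sketch for the Manhattan distance. It is not how `dist_metrics` or the linked PR implement it: `manhattan_float64_accumulator` is a hypothetical name, and the function assumes 2D floating-point inputs. Only the per-pair sum is accumulated in float64; the result is stored as float32 when both inputs are float32, and as float64 otherwise.

```python
import numpy as np


def manhattan_float64_accumulator(X, Y):
    """Hypothetical sketch: accumulate in float64, return the input precision.

    Assumes X and Y are 2D floating-point arrays with the same number of
    columns.
    """
    X = np.asarray(X)
    Y = np.asarray(Y)
    # Preserve float32 only when both inputs are float32.
    out_dtype = np.float32 if X.dtype == Y.dtype == np.float32 else np.float64
    out = np.empty((X.shape[0], Y.shape[0]), dtype=out_dtype)
    for i in range(X.shape[0]):
        abs_diff = np.abs(Y - X[i])
        # The reduction uses a float64 accumulator; the assignment then
        # casts each row back to out_dtype.
        out[i] = abs_diff.sum(axis=1, dtype=np.float64)
    return out


rng = np.random.RandomState(0)
A32 = rng.random_sample((8, 100)).astype(np.float32)
B32 = rng.random_sample((6, 100)).astype(np.float32)

D32 = manhattan_float64_accumulator(A32, B32)
D_mixed = manhattan_float64_accumulator(A32.astype(np.float64), B32)
print(D32.dtype, D_mixed.dtype)
```

The row-by-row loop keeps the temporary array at the size of `Y` rather than materialising a full `(n_samples_X, n_samples_Y, n_features)` array of differences.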

Answering this question will help avoid going in the wrong direction in https://github.com/scikit-learn/scikit-learn/pull/23958

RFC Should pairwise_distances preserve float32 ? #24502