Rationalize metric argument in T-SNE

The T-SNE API has a `metric` argument that allows changing the metric used in the input space. The original paper was developed using only the Euclidean distance, and changing the metric might slightly change the math.

The main issue is to understand how the conditional probabilities are computed. The original paper states that if `d_ij` is the Euclidean distance between `x_i` and `x_j`, then the conditional probability is given by
``` 
p(x_j | x_i) = exp{ -d_ij^2 / sig_i} / Z
```
where `Z` is a normalization constant and `sig_i` is a parameter chosen so that the conditional distribution has the requested perplexity. _If `d_ij` is not the Euclidean distance, does it make sense to use this formula?_
For metrics other than Euclidean, the current implementation uses the formula without the square, so
``` 
p(x_j | x_i) = exp{ -d_ij / sig_i} / Z
```
If this is fine, it should be documented. Otherwise, should we change it to use the squared distance, or another distribution? Another option is to deprecate the `metric` argument.
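To make the difference between the two formulas concrete, here is a minimal pure-Python sketch. It is not the scikit-learn code: the helper names `conditional_probs`, `perplexity` and `fit_beta` are made up for illustration, `beta` stands for `1 / sig_i`, and the distances are toy values. It fits `beta` by bisection so that both variants reach the same perplexity, then prints the resulting distributions for one point.

```python
import math


def conditional_probs(distances, beta, squared):
    """p(x_j | x_i) for one point i, given its distances d_ij to the others.

    beta plays the role of 1 / sig_i (an assumption of this sketch).
    """
    energies = [d * d if squared else d for d in distances]
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)  # normalization constant Z
    return [w / z for w in weights]


def perplexity(probs):
    """Perplexity = 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def fit_beta(distances, target_perplexity, squared, n_steps=100):
    """Bisection on beta: perplexity decreases as beta grows."""
    lo, hi = 0.0, 1.0
    # Grow the upper bound until the perplexity drops below the target.
    while perplexity(conditional_probs(distances, hi, squared)) > target_perplexity:
        hi *= 2.0
    for _ in range(n_steps):
        mid = 0.5 * (lo + hi)
        if perplexity(conditional_probs(distances, mid, squared)) > target_perplexity:
            lo = mid  # distribution still too flat: increase beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy distances from x_i to four other points (illustrative values only).
distances = [1.0, 2.0, 3.0, 4.0]
for squared in (True, False):
    beta = fit_beta(distances, target_perplexity=2.0, squared=squared)
    probs = conditional_probs(distances, beta, squared)
    print("squared" if squared else "not squared", [round(p, 3) for p in probs])
```

Even at equal perplexity the two variants spread the probability mass differently over the neighbors, so the choice is not absorbed by the `sig_i` search.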

A second point is the use of `init='pca'`. It is a very natural initialization for the Euclidean metric, but it should be benchmarked for other metrics to see whether it gives a useful starting point.
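A rough starting point for such a benchmark could look like the sketch below. The dataset (a 150-sample subset of `load_digits`), the `cosine` metric and the use of the final `kl_divergence_` as the comparison score are all arbitrary choices made here for illustration. A real benchmark would need several seeds, larger datasets and a quality measure that is comparable across metrics.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Small subset to keep the run short (illustrative choice).
X = load_digits().data[:150]

results = {}
for metric in ("euclidean", "cosine"):
    for init in ("pca", "random"):
        tsne = TSNE(n_components=2, metric=metric, init=init, random_state=0)
        tsne.fit(X)
        # Final KL divergence reached by the optimization for this setting.
        results[(metric, init)] = tsne.kl_divergence_

for key, kl in sorted(results.items()):
    print(key, round(kl, 3))
```

KL values are only comparable between the two `init` settings of the same metric, since the input probabilities themselves change with the metric.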

This issue is a follow-up of #9623.

Rationalize metric argument in T-SNE #9695
