
Heuristics for algorithm='auto' in NearestNeighbors #8213

@rth

Description


When running NearestNeighbors on a dense array with n_samples ~ 5000, n_features ~ 150 and n_neighbors=1, the algorithm='auto' option selects the kd_tree algorithm, which is in this case the slowest of the available algorithms.

Steps/Code to Reproduce

The benchmark script nn_benchmark.py can be found below,

import os
from time import time

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import normalize

np.random.seed(999999)
n_features = 150
algorithm = os.environ['NN_ALGORITHM']

def _create_X(n_samples):
    X = np.random.randn(n_samples, n_features)
    normalize(X, norm='l2', copy=False)
    return X

for n_train, n_test in [(1000, 10000),
                        (4000, 4000),
                        (10000, 2000)]:
    X_train = _create_X(n_train)
    X_test = _create_X(n_test)
    mod = NearestNeighbors(n_neighbors=1,
                           algorithm=algorithm,
                           n_jobs=1)

    t0 = time()
    mod.fit(X_train)
    t1 = time()
    mod.kneighbors(X_test, return_distance=False)
    t2 = time()
    if algorithm == 'auto':
        print("Actual method:", mod._fit_method)
    print('Training set: n_train={}, n_features={}'.format(n_train, n_features))
    print('Test set: n_test={}, t_train= {:.2f} s, t_test = {:.2f}s'.format(n_test, t1-t0, t2-t1))

which can be run (on Linux) with,

NN_ALGORITHM=auto /usr/bin/time -v python nn_benchmark.py

where /usr/bin/time -v is used to track the peak RAM usage in the returned "Maximum resident set size (kbytes)" field.
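For reference, the same peak-RSS figure can also be read from within Python via the stdlib `resource` module (Unix only; a cross-check for the `/usr/bin/time -v` output, not part of the benchmark script above):

```python
import resource

# Peak resident set size of the current process. On Linux ru_maxrss is
# reported in kilobytes (on macOS it is in bytes).
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("Maximum resident set size: {} kB".format(peak_kb))
```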

Expected Results

One would expect the nearest neighbor algorithm selection heuristic to pick a reasonable choice for this fairly standard dataset. In addition, one might also have expected ball_tree to outperform brute force search.

Actual Results

The results of this benchmark are given below,

| Time (s)                   | auto | kd_tree | ball_tree | brute |
|----------------------------|------|---------|-----------|-------|
| n_train=1000, n_test=10000 | 3.9  | 3.9     | 2.69      | 0.25  |
| n_train=4000, n_test=4000  | 7.5  | 7.5     | 5.55      | 0.34  |
| n_train=10000, n_test=2000 | 9.6  | 9.5     | 7.48      | 0.41  |

the automatically chosen kd_tree algorithm is consistently 10-20 times slower than the brute force search.
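Until the heuristic is improved, the slowdown can be avoided by requesting brute force explicitly instead of relying on 'auto'. A minimal sketch (the `_fit_method` attribute is private API, used here only to confirm which algorithm was selected):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X_train = rng.randn(1000, 150)
X_test = rng.randn(100, 150)

# Bypass the 'auto' heuristic and force brute force search.
mod = NearestNeighbors(n_neighbors=1, algorithm='brute')
mod.fit(X_train)
ind = mod.kneighbors(X_test, return_distance=False)
print(mod._fit_method)
```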

Memory usage is another issue: brute force search uses up to 5x more RAM,

|               | auto | kd_tree | ball_tree | brute |
|---------------|------|---------|-----------|-------|
| Peak RAM (MB) | 82   | 82      | 81        | 398   |

but this could be addressed by chunking the pairwise_distances calculations (issue #7287), implemented in PR #7979.
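The chunking idea can be sketched as follows: compute pairwise distances one block of test samples at a time, so that only a (chunk_size, n_train) distance block is ever in memory. This is a hypothetical 1-NN helper for illustration, not the actual PR #7979 implementation:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def kneighbors_chunked(X_train, X_test, chunk_size=1000):
    """Brute-force 1-NN, computing distances chunk by chunk.

    Peak memory is O(chunk_size * n_train) instead of O(n_test * n_train).
    """
    indices = np.empty(X_test.shape[0], dtype=np.intp)
    for start in range(0, X_test.shape[0], chunk_size):
        stop = start + chunk_size
        # Distance block for this chunk of test points only.
        D = pairwise_distances(X_test[start:stop], X_train)
        indices[start:stop] = D.argmin(axis=1)
    return indices
```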

Am I missing something?

Versions

Linux-4.6.0-gentoo-x86_64-Intel-R-_Core-TM-_i5-6200U_CPU_@_2.30GHz-with-gentoo-2.3
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.11.1
SciPy 0.18.1
Scikit-Learn 0.18.1
