Closed
58 commits
d8614a3
DOC better docstring for TruncatedSVD (#7651)
ogrisel Oct 13, 2016
cd0d46c
MAINT make appveyor fail on old builds when PR is update (#6365)
ogrisel Oct 13, 2016
7141dd4
DOC Removing deprecated DPGMM that was also not rendering correctly f…
amueller Oct 13, 2016
ab18f68
DOC Fixed missing "Next" button (#7641)
nvavrova Oct 13, 2016
0e12b2f
DOC: fix the copybutton on the code blocks (#7634)
GaelVaroquaux Oct 13, 2016
329b7ee
dev-7369: Make common metric tests look nicer (#7620)
aashil Oct 13, 2016
5e4f524
[MRG] edit releases link from sf to pypi (#7637)
nelson-liu Oct 13, 2016
53e6381
[MRG+1] Throw an error with explicit message if n_estimators is not a…
Oct 14, 2016
78dbcb2
[MRG] fix #6101 GradientBoosting decision_function for sparse inputs …
olologin Oct 15, 2016
65a4de5
DOC: fix return value doc for mixture.base._e_step (#7682)
andrewcsmith Oct 17, 2016
fa59873
[MRG+1] FIX unstable cumsum (#7376)
yangarbiter Oct 17, 2016
0198e2c
warning for PCA with sparse input (#7649)
Oct 18, 2016
0653aac
[MRG+1] CircleCI timeout extended (#7693)
jstriebel Oct 18, 2016
647b7c1
DOC Changed Contributor's Guide to Development Guide #7690 (#7691)
FERRIA Oct 18, 2016
dad35f5
DOC mention JOBLIB_START_METHOD in crash/freeze FAQ
lesteve Oct 19, 2016
387f25c
DOC use combined unicode chars in author names (#7706)
jnothman Oct 19, 2016
829efa5
[MRG+1] Learning curve: Add an option to randomly choose indices for …
NarineK Oct 19, 2016
74e4c42
FIX #6420: Cloning decision tree estimators breaks criterion objects …
olologin Oct 19, 2016
2caa144
[MGR + 2] fix selectFdr bug (#7490)
Oct 20, 2016
cfc280d
switch to multinomial composition for mixture sampling
ljwolf Oct 18, 2016
55672f9
add shape assertions to test
ljwolf Oct 19, 2016
82a6740
Update joblib to 0.10.3 version (#7696)
lesteve Oct 20, 2016
30b936d
Remove temporary work-around for 0.15 release
lesteve Oct 20, 2016
707b6f9
[MRG+1] Added unit test for adding classes_ property to GridSearchCV,…
abatula Oct 20, 2016
716819b
explain learning_curve(shuffle=True) test.
amueller Oct 19, 2016
61a9a9b
add missing links for users to whatsnew
amueller Oct 20, 2016
568c002
[MRG + 1] Move n_iter and get_params invariance tests to common estim…
JungeAlexander Oct 20, 2016
4e1c101
Use n_components=3 to test actual regression
lesteve Oct 20, 2016
ee3e617
[MRG+2] adding multilabel support for score_func (#7676)
affanv14 Oct 20, 2016
cd714b1
[MRG] Added warning on keyboard interrupt during MLP fit (#7614)
kgilliam125 Oct 20, 2016
ad6f094
[MRG+2] switch to multinomial composition for mixture sampling (#7702)
lesteve Oct 20, 2016
edcb513
removed parameter that was documented as attribute (#7711)
amueller Oct 21, 2016
f122efa
Fix typo in OMP author name.
mblondel Oct 23, 2016
5c60f1f
[MRG+1] Reorder EllipticEnvelope docstring. (#7734)
tguillemot Oct 24, 2016
177ac84
[MRG + 1] Printing the total time in cross_validation (#7640)
srivatsan-ramesh Oct 24, 2016
74a9756
[MRG+2] Norm inconsistency between RFE and SelectFromModel (was _Lear…
antoinewdg Oct 24, 2016
7892edd
Address #7733 - MultiTaskElasticNet user guide links to MultiTaskLass…
chkoar Oct 24, 2016
8f4ebb5
r2_score - add more doctest examples (#7727)
bburns Oct 25, 2016
4ddb744
[MRG + 1] Clarified error msg in plot_partial_dependence (#7673)
kgilliam125 Oct 25, 2016
5adc832
BF: avoid importing from inside joblib (#7731)
GaelVaroquaux Oct 25, 2016
0dfc9a5
[MRG + 1] ElasticNetCV: raise ValueError if l1_ratio=0 (#7591)
erikcs Oct 25, 2016
3c18735
[MRG+2] logit -> logistic in plot_logistic.py and minor visual improv…
Deborah-Digges Oct 25, 2016
f260898
[MRG] Correcting length of explained_variance_ratio_, eigen solver (#…
JPFrancoia Oct 25, 2016
3f4524e
[MRG] DOC :issue: role to simplify what's news (#7657)
jnothman Oct 25, 2016
4da44c8
[MRG+1] replaced some assert_true(np.allclose(x, y)) with assert_almo…
amueller Oct 25, 2016
6b381ae
DOC use target_names over named categories in 20newsgroups example (#…
kmike Oct 25, 2016
ff5c36e
DOC Correct linking of TruncatedSVD (#7749)
raghavrv Oct 25, 2016
73d3f03
[MRG + 1] FIX raise an error message when n_groups > number of groups…
polmauri Oct 25, 2016
788a458
[MRG+2] LOF algorithm (Anomaly Detection) (#5279)
ngoix Oct 25, 2016
9d535ad
DOC: Provide link to LDA and NMF in the example tutorial closes #5876…
maniteja123 Oct 25, 2016
0ea8e8b
[MRG + 1] fix bug with negative values in cosine_distances (#7732)
asanakoy Oct 26, 2016
581a429
DOC framework for keeping API refs for deprecated classes/funcs
jnothman Oct 13, 2016
520e83e
DOC tagging deprecated for 0.20
waterponey Oct 19, 2016
eb918cf
suggestion for LDA/QDA deprecation
waterponey Oct 22, 2016
afc05ea
simplify deprecation message for GaussianProcess
waterponey Oct 22, 2016
f61644f
simplify deprecation messages
waterponey Oct 24, 2016
96d2a54
fixup avoid import QuadraticDiscriminantAnalysis in qda.QDA (and simi…
waterponey Oct 27, 2016
620dc2b
fixup test alias lda.LDA is instance of LinearDiscriminantAnalysis
waterponey Oct 27, 2016
3 changes: 0 additions & 3 deletions Makefile
@@ -26,9 +26,6 @@ clean: clean-ctags

in: inplace # just a shortcut
inplace:
	# to avoid errors in 0.15 upgrade
	rm -f sklearn/utils/sparsefuncs*.so
	rm -f sklearn/utils/random*.so
	$(PYTHON) setup.py build_ext -i

test-code: in
6 changes: 3 additions & 3 deletions README.rst
@@ -78,15 +78,15 @@ Development

We welcome new contributors of all experience levels. The scikit-learn
community goals are to be helpful, welcoming, and effective. The
`Contributor's Guide <http://scikit-learn.org/stable/developers/index.html>`_
`Development Guide <http://scikit-learn.org/stable/developers/index.html>`_
has detailed information about contributing code, documentation, tests, and
more. We've included some basic information in this README.

Important links
~~~~~~~~~~~~~~~

- Official source code repo: https://github.com/scikit-learn/scikit-learn
- Download releases: http://sourceforge.net/projects/scikit-learn/files/
- Download releases: https://pypi.python.org/pypi/scikit-learn
- Issue tracker: https://github.com/scikit-learn/scikit-learn/issues

Source code
@@ -158,4 +158,4 @@ Communication
- Mailing list: https://mail.python.org/mailman/listinfo/scikit-learn
- IRC channel: ``#scikit-learn`` at ``irc.freenode.net``
- Stack Overflow: http://stackoverflow.com/questions/tagged/scikit-learn
- Website: http://scikit-learn.org
- Website: http://scikit-learn.org
10 changes: 10 additions & 0 deletions appveyor.yml
@@ -36,6 +36,16 @@ environment:


install:
  # If there is a newer build queued for the same PR, cancel this one.
  # The AppVeyor 'rollout builds' option is supposed to serve the same
  # purpose but is problematic because it tends to cancel builds pushed
  # directly to master instead of just PR builds.
  # credits: JuliaLang developers.
  - ps: if ($env:APPVEYOR_PULL_REQUEST_NUMBER -and $env:APPVEYOR_BUILD_NUMBER -ne ((Invoke-RestMethod `
        https://ci.appveyor.com/api/projects/$env:APPVEYOR_ACCOUNT_NAME/$env:APPVEYOR_PROJECT_SLUG/history?recordsNumber=50).builds | `
        Where-Object pullRequestId -eq $env:APPVEYOR_PULL_REQUEST_NUMBER)[0].buildNumber) { `
          throw "There are newer queued builds for this pull request, failing early." }

  # Install Python (from the official .msi of http://python.org) and pip when
  # not already installed.
  - "powershell ./build_tools/appveyor/install.ps1"
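
The early-exit check in the `ps:` step above can be read as the following decision rule. This is only a sketch in Python, not the code AppVeyor runs: the function name `is_stale_pr_build` and the sample history are hypothetical, and it assumes the build history is listed newest first, which the `[0]` index in the PowerShell one-liner also relies on.

```python
def is_stale_pr_build(history, pr_number, build_number):
    """Return True when a newer build exists for the same pull request.

    history: list of dicts with "pullRequestId" and "buildNumber" keys,
    assumed to be ordered newest first (an assumption of this sketch).
    """
    if not pr_number:
        # Builds pushed directly to a branch carry no PR number and are kept.
        return False
    same_pr = [b for b in history
               if str(b.get("pullRequestId")) == str(pr_number)]
    if not same_pr:
        # Assumption: with no matching history entry, let the build run.
        return False
    # The newest build for this PR must be the current one.
    return same_pr[0]["buildNumber"] != build_number


# Hypothetical history: PR 7 was pushed twice, PR 9 once.
sample_history = [
    {"pullRequestId": "7", "buildNumber": 12},
    {"pullRequestId": "9", "buildNumber": 11},
    {"pullRequestId": "7", "buildNumber": 10},
]
stale = is_stale_pr_build(sample_history, "7", 10)
current = is_stale_pr_build(sample_history, "7", 12)
```

Filtering by pull request before comparing build numbers is what keeps direct pushes to master from being cancelled, which is the problem the comment attributes to the 'rollout builds' option.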
119 changes: 119 additions & 0 deletions benchmarks/bench_lof.py
@@ -0,0 +1,119 @@
"""
============================
LocalOutlierFactor benchmark
============================

A test of LocalOutlierFactor on classical anomaly detection datasets.

"""

from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_curve, auc
from sklearn.datasets import fetch_kddcup99, fetch_covtype, fetch_mldata
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils import shuffle as sh

print(__doc__)

np.random.seed(2)

# datasets available: ['http', 'smtp', 'SA', 'SF', 'shuttle', 'forestcover']
datasets = ['shuttle']

novelty_detection = True # if False, training set polluted by outliers

for dataset_name in datasets:
    # loading and vectorization
    print('loading data')
    if dataset_name in ['http', 'smtp', 'SA', 'SF']:
        dataset = fetch_kddcup99(subset=dataset_name, shuffle=True,
                                 percent10=False)
        X = dataset.data
        y = dataset.target

    if dataset_name == 'shuttle':
        dataset = fetch_mldata('shuttle')
        X = dataset.data
        y = dataset.target
        X, y = sh(X, y)
        # we remove data with label 4
        # normal data are then those of class 1
        s = (y != 4)
        X = X[s, :]
        y = y[s]
        y = (y != 1).astype(int)

    if dataset_name == 'forestcover':
        dataset = fetch_covtype(shuffle=True)
        X = dataset.data
        y = dataset.target
        # normal data are those with attribute 2
        # abnormal those with attribute 4
        s = (y == 2) + (y == 4)
        X = X[s, :]
        y = y[s]
        y = (y != 2).astype(int)

    print('vectorizing data')

    if dataset_name == 'SF':
        lb = LabelBinarizer()
        lb.fit(X[:, 1])
        x1 = lb.transform(X[:, 1])
        X = np.c_[X[:, :1], x1, X[:, 2:]]
        y = (y != 'normal.').astype(int)

    if dataset_name == 'SA':
        lb = LabelBinarizer()
        lb.fit(X[:, 1])
        x1 = lb.transform(X[:, 1])
        lb.fit(X[:, 2])
        x2 = lb.transform(X[:, 2])
        lb.fit(X[:, 3])
        x3 = lb.transform(X[:, 3])
        X = np.c_[X[:, :1], x1, x2, x3, X[:, 4:]]
        y = (y != 'normal.').astype(int)

    if dataset_name == 'http' or dataset_name == 'smtp':
        y = (y != 'normal.').astype(int)

    n_samples, n_features = np.shape(X)
    n_samples_train = n_samples // 2
    n_samples_test = n_samples - n_samples_train

    X = X.astype(float)
    X_train = X[:n_samples_train, :]
    X_test = X[n_samples_train:, :]
    y_train = y[:n_samples_train]
    y_test = y[n_samples_train:]

    if novelty_detection:
        X_train = X_train[y_train == 0]
        y_train = y_train[y_train == 0]

    print('LocalOutlierFactor processing...')
    model = LocalOutlierFactor(n_neighbors=20)
    tstart = time()
    model.fit(X_train)
    fit_time = time() - tstart
    tstart = time()

    scoring = -model.decision_function(X_test)  # the lower, the more normal
    predict_time = time() - tstart
    fpr, tpr, thresholds = roc_curve(y_test, scoring)
    AUC = auc(fpr, tpr)
    plt.plot(fpr, tpr, lw=1,
             label=('ROC for %s (area = %0.3f, train-time: %0.2fs,'
                    'test-time: %0.2fs)' % (dataset_name, AUC, fit_time,
                                            predict_time)))

plt.xlim([-0.05, 1.05])
plt.ylim([-0.05, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")
plt.show()
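
The benchmark above ranks test samples by an anomaly score and summarises the ranking with `roc_curve` and `auc`. As a rough illustration of what those two calls compute, here is a dependency-free sketch. The helper names `roc_curve_points` and `trapezoid_auc` are hypothetical stand-ins and are not scikit-learn's implementation. The sketch assumes binary labels (1 = anomaly), at least one sample of each class, and that a higher score means more abnormal, matching the negated decision function used in the benchmark.

```python
def roc_curve_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by lowering the threshold step by step."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
               for k in range(1, len(fpr)))


# Toy data: two normal samples (0) and two anomalies (1).
y_test = [0, 0, 1, 1]
scoring = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve_points(y_test, scoring)
roc_auc = trapezoid_auc(fpr, tpr)
```

An area of 1.0 would mean every anomaly scores above every normal sample, while 0.5 corresponds to a random ranking.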
3 changes: 2 additions & 1 deletion circle.yml
@@ -6,7 +6,8 @@ dependencies:
  # Check whether the doc build is required, install build dependencies and
  # run sphinx to build the doc.
  override:
    - ./build_tools/circle/build_doc.sh
    - ./build_tools/circle/build_doc.sh:
        timeout: 3600 # seconds
test:
  # Grep error on the documentation
  override:
8 changes: 8 additions & 0 deletions doc/conf.py
@@ -35,6 +35,7 @@
'numpy_ext.numpydoc',
'sphinx.ext.linkcode', 'sphinx.ext.doctest',
'sphinx_gallery.gen_gallery',
'sphinx_issues',
]

# pngmath / imgmath compatibility layer for different sphinx versions
@@ -269,6 +270,13 @@ def make_carousel_thumbs(app, exception):
sphinx_gallery.gen_rst.scale_image(image, c_thumb, max_width, 190)


# Config for sphinx_issues

issues_uri = 'https://github.com/scikit-learn/scikit-learn/issues/{issue}'
issues_github_path = 'scikit-learn/scikit-learn'
issues_user_uri = 'https://github.com/{user}'


def setup(app):
    # to hide/show the prompt in code examples:
    app.add_javascript('js/copybutton.js')
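
The two URI settings added above are plain templates with a single placeholder each. The sketch below only shows how such a template expands with `str.format`; how the `sphinx_issues` extension consumes these values internally is not shown in this diff, and the helper names `issue_link` and `user_link` are hypothetical.

```python
# Values copied from the doc/conf.py hunk above.
issues_uri = 'https://github.com/scikit-learn/scikit-learn/issues/{issue}'
issues_user_uri = 'https://github.com/{user}'


def issue_link(number):
    """Expand the issue template for one issue or pull request number."""
    return issues_uri.format(issue=number)


def user_link(name):
    """Expand the user template for one GitHub user name."""
    return issues_user_uri.format(user=name)


# 7657 is the pull request that introduced the :issue: role.
example_issue = issue_link(7657)
example_user = user_link('jnothman')
```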
2 changes: 1 addition & 1 deletion doc/documentation.rst
@@ -64,7 +64,7 @@ Documentation of scikit-learn 0.19.dev0
<!-- row -->
<div class="row-fluid">
<div class="span4 box">
<h2><a href="developers/index.html">Contributing</a></h2>
<h2><a href="developers/index.html">Development</a></h2>
<blockquote>Information on how to contribute. This also
contains useful information for advanced users, for example
how to build their own estimators.
11 changes: 7 additions & 4 deletions doc/faq.rst
@@ -248,10 +248,13 @@ Python processes for parallel computing. Unfortunately this is a violation of
the POSIX standard and therefore some software editors like Apple refuse to
consider the lack of fork-safety in Accelerate / vecLib as a bug.

In Python 3.4+ it is now possible to configure ``multiprocessing`` to use the
'forkserver' or 'spawn' start methods (instead of the default 'fork') to manage
the process pools. This makes it possible to not be subject to this issue
anymore.
In Python 3.4+ it is now possible to configure ``multiprocessing`` to
use the 'forkserver' or 'spawn' start methods (instead of the default
'fork') to manage the process pools. To work around this issue when
using scikit-learn, you can set the JOBLIB_START_METHOD environment
variable to 'forkserver'. However, the user should be aware that using
the 'forkserver' method prevents joblib.Parallel from calling functions
interactively defined in a shell session.
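
As an illustration of the idea in the paragraph above, the sketch below reads a start method from an environment variable and falls back to a platform default when the value is missing or unsupported. It is not joblib's implementation: the helper name `pick_start_method` and the fallback rule are assumptions. The only library call it relies on is the standard `multiprocessing.get_all_start_methods`.

```python
import multiprocessing
import os


def pick_start_method(env=None):
    """Choose a multiprocessing start method from JOBLIB_START_METHOD.

    Hypothetical helper: falls back to the platform default when the
    variable is unset or names a method this platform does not support.
    """
    env = os.environ if env is None else env
    available = multiprocessing.get_all_start_methods()
    requested = env.get("JOBLIB_START_METHOD", "").strip()
    if requested in available:
        return requested
    # The first entry of the list is the platform default.
    return available[0]


# 'spawn' is available on every platform, so this request is honoured.
chosen = pick_start_method({"JOBLIB_START_METHOD": "spawn"})
fallback = pick_start_method({})
```

In practice the variable has to be set before the worker processes are created, for example in the shell that launches Python.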

If you have custom code that uses ``multiprocessing`` directly instead of using
it via joblib you can enable the 'forkserver' mode globally for your
78 changes: 74 additions & 4 deletions doc/modules/classes.rst
@@ -186,6 +186,7 @@ Splitter Functions
:template: function.rst

model_selection.train_test_split
model_selection.check_cv

Hyper-parameter optimizers
--------------------------
@@ -201,6 +202,13 @@ Hyper-parameter optimizers
model_selection.ParameterGrid
model_selection.ParameterSampler


.. autosummary::
   :toctree: generated/
   :template: function.rst

   model_selection.fit_grid_point

Model validation
----------------

@@ -315,7 +323,6 @@ Samples generator
decomposition.PCA
decomposition.IncrementalPCA
decomposition.ProjectedGradientNMF
decomposition.RandomizedPCA
decomposition.KernelPCA
decomposition.FactorAnalysis
decomposition.FastICA
@@ -560,7 +567,6 @@ From text

gaussian_process.GaussianProcessRegressor
gaussian_process.GaussianProcessClassifier
gaussian_process.GaussianProcess

Kernels:

@@ -957,7 +963,6 @@ See the :ref:`metrics` section of the user guide for further details.

mixture.GaussianMixture
mixture.BayesianGaussianMixture
mixture.DPGMM


.. _multiclass_ref:
@@ -1051,7 +1056,8 @@ See the :ref:`metrics` section of the user guide for further details.
neighbors.LSHForest
neighbors.DistanceMetric
neighbors.KernelDensity

neighbors.LocalOutlierFactor

.. autosummary::
:toctree: generated/
:template: function.rst
@@ -1349,3 +1355,67 @@ Low-level methods
utils.estimator_checks.check_estimator
utils.resample
utils.shuffle


Recently deprecated
===================

To be removed in 0.19
---------------------

.. autosummary::
   :toctree: generated/
   :template: deprecated_class.rst

   lda.LDA
   qda.QDA

.. autosummary::
   :toctree: generated/
   :template: deprecated_function.rst

   datasets.load_lfw_pairs
   datasets.load_lfw_people


To be removed in 0.20
---------------------

.. autosummary::
   :toctree: generated/
   :template: deprecated_class.rst

   grid_search.ParameterGrid
   grid_search.ParameterSampler
   grid_search.GridSearchCV
   grid_search.RandomizedSearchCV
   cross_validation.LeaveOneOut
   cross_validation.LeavePOut
   cross_validation.KFold
   cross_validation.LabelKFold
   cross_validation.LeaveOneLabelOut
   cross_validation.LeavePLabelOut
   cross_validation.LabelShuffleSplit
   cross_validation.StratifiedKFold
   cross_validation.ShuffleSplit
   cross_validation.StratifiedShuffleSplit
   cross_validation.PredefinedSplit
   decomposition.RandomizedPCA
   gaussian_process.GaussianProcess
   mixture.GMM
   mixture.DPGMM
   mixture.VBGMM


.. autosummary::
   :toctree: generated/
   :template: deprecated_function.rst

   grid_search.fit_grid_point
   learning_curve.learning_curve
   learning_curve.validation_curve
   cross_validation.cross_val_predict
   cross_validation.cross_val_score
   cross_validation.check_cv
   cross_validation.permutation_test_score
   cross_validation.train_test_split
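
The deprecated entries listed above stay importable until the stated release, and one of the fixup commits tests that the alias `lda.LDA` is an instance of `LinearDiscriminantAnalysis`. A generic way to build such an alias is a thin subclass that warns on construction. The sketch below uses stand-in classes, not scikit-learn's actual code or its deprecation helper, and the warning text is invented for illustration.

```python
import warnings


class LinearDiscriminantAnalysis:
    """Stand-in for the new class (not the real scikit-learn estimator)."""

    def __init__(self, n_components=None):
        self.n_components = n_components


class LDA(LinearDiscriminantAnalysis):
    """Deprecated alias kept so that old imports continue to work."""

    def __init__(self, *args, **kwargs):
        # Hypothetical message; warn the caller, then defer to the new class.
        warnings.warn(
            "LDA is deprecated; use LinearDiscriminantAnalysis instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model = LDA(n_components=2)

# The alias still passes isinstance checks against the new class.
is_new_type = isinstance(model, LinearDiscriminantAnalysis)
```

Subclassing rather than re-exporting the new class under the old name keeps the old name distinct, so the warning fires only for code that still uses it.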