Label propagation emnb #4

Closed
clayw wants to merge 430 commits into larsmans:emnb from clayw:label-propagation-emnb

clayw commented Jan 31, 2012

I opened this as a pull request, but it is mainly for your information and mine.

I ran your "examples/semisupervised_document_classification.py" script with both my label propagation implementation and your semi-supervised EMNB. On this problem EMNB does much better: f1-scores of 0.859 and 0.856 for the two EMNB variants versus 0.745 for LabelSpreading (full output below).

Here is the program's output:

$ python examples/semisupervised_document_classification.py
['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
data loaded
2034 documents (training set)
1353 documents (testing set)
4 categories

Extracting features from the training dataset using a sparse vectorizer
done in 2.360747s
n_samples: 2034, n_features: 32395

Extracting features from the test dataset using the same vectorizer
done in 1.481269s
n_samples: 1353, n_features: 32395

Removing labels of 1831 random training documents

Baseline: fully supervised Naive Bayes


Training:
MultinomialNB(alpha=0.01, fit_prior=True)
train time: 0.006s
test time: 0.003s
f1-score: 0.824
dimensionality: 32395


Training:
BernoulliNB(alpha=0.01, binarize=0.0, fit_prior=True)
train time: 0.006s
test time: 0.015s
f1-score: 0.816
dimensionality: 32395

Naive Bayes trained with Expectation Maximization


Training:
SemisupervisedNB(estimator=MultinomialNB(alpha=0.01, fit_prior=True),
estimator__alpha=0.01, estimator__fit_prior=True, n_iter=10,
relabel_all=True, tol=1e-05, verbose=False)
train time: 0.171s
test time: 0.003s
f1-score: 0.859
dimensionality: 32395


Training:
SemisupervisedNB(estimator=BernoulliNB(alpha=0.01, binarize=0.0, fit_prior=True),
estimator__alpha=0.01, estimator__binarize=0.0,
estimator__fit_prior=True, n_iter=10, relabel_all=True, tol=1e-05,
verbose=False)
train time: 0.412s
test time: 0.015s
f1-score: 0.856
dimensionality: 32395


Training:
LabelSpreading(alpha=1, gamma=25, kernel=rbf, max_iters=30, n_neighbors=7,
tol=0.001)
train time: 1.118s
test time: 0.452s
f1-score: 0.745
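
For reference, the "Removing labels of 1831 random training documents" step in the output above can be sketched roughly as follows. This is only an illustration, not the code from the example script: the helper name `mask_labels`, the fixed seed, the toy label list, and the use of `-1` as the "unlabeled" marker are all assumptions made here.

```python
import random


def mask_labels(labels, n_unlabeled, seed=0, unlabeled=-1):
    """Return a copy of `labels` with `n_unlabeled` random entries hidden.

    Hypothetical helper: `unlabeled=-1` is an assumed marker value,
    not something confirmed by the output above.
    """
    rng = random.Random(seed)
    masked = list(labels)
    # Pick distinct positions to hide, then overwrite them with the marker.
    for i in rng.sample(range(len(masked)), n_unlabeled):
        masked[i] = unlabeled
    return masked


# Toy stand-in for the training labels: 2034 documents in 4 categories.
labels = [i % 4 for i in range(2034)]
masked = mask_labels(labels, n_unlabeled=1831)

# Count how many documents still carry their original label.
n_labeled = sum(1 for y in masked if y != -1)
print(n_labeled, "labeled documents remain out of", len(masked))
```

With the numbers from the run above, 2034 - 1831 = 203 documents keep their labels, so both semi-supervised models are trained with roughly 10% of the training set labeled.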

larsmans and others added 30 commits December 24, 2011 22:03
This reverts commit 617d731.

Breaks other tests; let's live with the negative tf-idf weights for now.
This release brings zipped storage
Conflicts:
	sklearn/ensemble/forest.py
	sklearn/tree/tree.py
Fabian Pedregosa and others added 29 commits January 9, 2012 14:58
as_float_array is not enough because cd_fast does not accept float32.
And added a remark on updating the Gram matrix. This was triggered by
tests inside dict_learning, probably the last commit made this
visible. Thus I'm not adding more tests.
Conflicts:
	doc/modules/classes.rst
	doc/modules/label_propagation.rst
	sklearn/label_propagation.py
	sklearn/tests/test_label_propagation.py
larsmans closed this Mar 8, 2016