Pickling of sklearn.tree.Tree #521

@pyodave

When I run the code below, the following line crashes. Note that I am pickling and unpickling directly to a byte string rather than through a file object; the same error occurs with a file object backed by an io buffer.

reLoadCLF = pickle.loads(pickle.dumps(clf))

The error is:

File "sklearn/tree/_tree.pyx", line 601, in sklearn.tree._tree.Tree.__cinit__
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'

I think it should work, since we are neither mixing 64-bit and 32-bit Python nor using different library versions. I have tried different pickle protocols and different scikit-learn classifiers, to no avail. Is there a way to do this with the existing codebase, or is this a bug?
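One check that might help narrow this down: the error compares `SIZE_t` with `long long`, and as far as I understand `SIZE_t` in the tree code is a pointer-sized signed integer (that is an assumption on my part; the traceback does not confirm it). If so, the two types only have the same width on a 64-bit interpreter. Below is a minimal stdlib-only sketch that reports the relevant widths for the running interpreter; `integer_widths` is a hypothetical helper name.

```python
import ctypes
import struct
import sys


def integer_widths():
    """Return the byte widths of the C integer types relevant to the error."""
    return {
        # Width of a pointer: 4 on a 32-bit build, 8 on a 64-bit build.
        "pointer": struct.calcsize("P"),
        # Width of Py_ssize_t, assumed here to be what SIZE_t maps to.
        "ssize_t": ctypes.sizeof(ctypes.c_ssize_t),
        # Width of C 'long long', the dtype named in the error message.
        "long_long": ctypes.sizeof(ctypes.c_longlong),
    }


if __name__ == "__main__":
    widths = integer_widths()
    print(widths)
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
    if widths["ssize_t"] != widths["long_long"]:
        print("pointer-sized ints and 'long long' differ on this interpreter")
```

If the last message is printed, the interpreter is effectively a 32-bit build, which would be one possible explanation for the mismatch rather than a confirmed diagnosis.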

#Random forest classifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
#import pandas as pd

#loading the dataset
iris = load_iris()
X = iris.data
print(X)
y = iris.target
print(y)

#Split the data
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=42,test_size=0.5)

#Build the model
clf = RandomForestClassifier(n_estimators=10)

#Train the classifier
clf.fit(X_train, y_train)

#Predictions
predicted = clf.predict(X_test)

#Check accuracy (accuracy_score expects y_true first, then y_pred)
print(accuracy_score(y_test, predicted))

#Round-trip the fitted classifier through pickle; this is the line that fails
import pickle
reLoadCLF = pickle.loads(pickle.dumps(clf))
print('reLoad & go')
predictedRL = reLoadCLF.predict(X_test)
print(accuracy_score(y_test, predictedRL))
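
For reference, the file-object variant mentioned in the description looks roughly like the sketch below. `roundtrip_via_buffer` is a hypothetical helper name, and the sample dict only stands in for the classifier so that the snippet runs without scikit-learn. Passing `clf` instead should, going by the description, fail in the same way as the byte-string version.

```python
import io
import pickle


def roundtrip_via_buffer(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj into an in-memory buffer and load it back."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer, protocol=protocol)
    buffer.seek(0)  # rewind so load() reads from the start
    return pickle.load(buffer)


if __name__ == "__main__":
    # Stand-in object; replace with the fitted clf to reproduce the issue.
    sample = {"n_estimators": 10, "classes": [0, 1, 2]}
    assert roundtrip_via_buffer(sample) == sample
    print("round trip ok")
```

The `protocol` argument makes it easy to loop over `range(pickle.HIGHEST_PROTOCOL + 1)` when checking whether the failure depends on the pickle protocol.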
