User guide: create your own scikit-learn estimator
Estimator
The central piece of a transformer, regressor, or classifier is
sklearn.base.BaseEstimator. All estimators in scikit-learn are derived
from this class. In more detail, this base class enables setting and getting
the parameters of the estimator. It can be imported as:
>>> from sklearn.base import BaseEstimator
Once imported, you can create a class which inherits from this base class:
>>> class MyOwnEstimator(BaseEstimator):
...     pass
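Parameters declared in the __init__ signature and stored under attributes of the same name are then handled by the get_params and set_params methods inherited from BaseEstimator. The class and its demo_param parameter below are only illustrative, not part of the guide's running example:
>>> class MyOwnEstimatorWithParam(BaseEstimator):
...     def __init__(self, demo_param=1):
...         self.demo_param = demo_param
>>> MyOwnEstimatorWithParam(demo_param=2).get_params()
{'demo_param': 2}
>>> MyOwnEstimatorWithParam().set_params(demo_param=3).demo_param
3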
Transformer
Transformers are scikit-learn estimators which implement a transform method.
The use case is the following:

- at fit, some parameters can be learned from X and y;
- at transform, X will be transformed, using the parameters learned during fit.
In addition, scikit-learn provides a mixin, i.e. sklearn.base.TransformerMixin,
which implements the combination of fit and transform called fit_transform.
One can import the mixin class as:
>>> from sklearn.base import TransformerMixin
Therefore, when creating a transformer, you need to create a class which
inherits from both sklearn.base.BaseEstimator and sklearn.base.TransformerMixin.
The scikit-learn API requires fit to return self. This allows chaining fit and
transform, as done in the fit_transform method provided by
sklearn.base.TransformerMixin. The fit method is expected to take X and y as
inputs. Note that transform takes only X as input and is expected to return the
transformed version of X:
>>> class MyOwnTransformer(BaseEstimator, TransformerMixin):
...     def fit(self, X, y=None):
...         return self
...     def transform(self, X):
...         return X
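Because both fit and transform are defined, the fit_transform method provided by the mixin is available without any extra code. A quick check on a small illustrative array (X_demo is not part of the original example):
>>> import numpy as np
>>> X_demo = np.array([[0., 1.], [2., 3.]])
>>> MyOwnTransformer().fit_transform(X_demo)
array([[0., 1.],
       [2., 3.]])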
We build a basic example to show that our MyOwnTransformer works within a
scikit-learn pipeline:
>>> from sklearn.datasets import load_iris
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = load_iris(return_X_y=True)
>>> pipe = make_pipeline(
...     MyOwnTransformer(),
...     LogisticRegression(random_state=10, solver='lbfgs'))
>>> pipe.fit(X, y)
Pipeline(...)
>>> pipe.predict(X)
array([...])
Predictor
Regressor
Similarly, regressors are scikit-learn estimators which implement a predict
method. The use case is the following:

- at fit, some parameters can be learned from X and y;
- at predict, predictions will be computed for X using the parameters learned during fit.
In addition, scikit-learn provides a mixin, i.e. sklearn.base.RegressorMixin,
which implements a score method computing the coefficient of determination
(R²) of the predictions.
One can import the mixin as:
>>> from sklearn.base import RegressorMixin
Therefore, we create a regressor, MyOwnRegressor, which inherits from both
sklearn.base.BaseEstimator and sklearn.base.RegressorMixin. The fit method
takes X and y as input and should return self. The regressor should also
implement a predict method which outputs the predictions:
>>> import numpy as np
>>> class MyOwnRegressor(BaseEstimator, RegressorMixin):
...     def fit(self, X, y):
...         return self
...     def predict(self, X):
...         return np.mean(X, axis=1)
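As a quick sanity check, the predictions are simply the mean of each row; X_toy below is an illustrative array, not part of the original guide:
>>> X_toy = np.array([[1., 3.], [2., 4.]])
>>> MyOwnRegressor().fit(X_toy, np.zeros(2)).predict(X_toy)
array([2., 3.])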
We illustrate that this regressor works within a scikit-learn pipeline:
>>> from sklearn.datasets import load_diabetes
>>> X, y = load_diabetes(return_X_y=True)
>>> pipe = make_pipeline(MyOwnTransformer(), MyOwnRegressor())
>>> pipe.fit(X, y)
Pipeline(...)
>>> pipe.predict(X)
array([...])
Since we inherit from sklearn.base.RegressorMixin, we can call the score
method, which returns the coefficient of determination (R²) of the predictions:
>>> pipe.score(X, y)
-3.9...
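For reference, the value returned by the inherited score method matches sklearn.metrics.r2_score computed on the same predictions; reg below is just a local variable for this sketch:
>>> from sklearn.metrics import r2_score
>>> reg = MyOwnRegressor().fit(X, y)
>>> reg.score(X, y) == r2_score(y, reg.predict(X))
True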
Classifier
Similarly to regressors, classifiers implement predict. In addition, they
output the probabilities of the prediction using the predict_proba method:

- at fit, some parameters can be learned from X and y;
- at predict, predictions will be computed for X using the parameters learned during fit. The output corresponds to the predicted class for each sample;
- predict_proba will give a 2D matrix where each column corresponds to a class and each entry is the probability of the associated class.
In addition, scikit-learn provides a mixin, i.e. sklearn.base.ClassifierMixin,
which implements a score method computing the accuracy of the predictions.
One can import this mixin as:
>>> from sklearn.base import ClassifierMixin
Therefore, we create a classifier, MyOwnClassifier, which inherits from both
sklearn.base.BaseEstimator and sklearn.base.ClassifierMixin. The fit method
takes X and y as input and should return self. The classifier should implement
a predict method returning the class inferred for each sample, while
predict_proba outputs the class probabilities instead:
>>> class MyOwnClassifier(BaseEstimator, ClassifierMixin):
...     def fit(self, X, y):
...         self.classes_ = np.unique(y)
...         return self
...     def predict(self, X):
...         return np.random.randint(0, self.classes_.size, size=X.shape[0])
...     def predict_proba(self, X):
...         pred = np.random.rand(X.shape[0], self.classes_.size)
...         return pred / np.sum(pred, axis=1)[:, np.newaxis]
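A quick check that fit stores the observed classes in classes_ and that predict_proba returns one column per class; X_toy and y_toy are illustrative data, not part of the original example:
>>> X_toy = np.array([[0.], [1.], [2.], [3.]])
>>> y_toy = np.array([0, 1, 0, 1])
>>> clf = MyOwnClassifier().fit(X_toy, y_toy)
>>> clf.classes_
array([0, 1])
>>> clf.predict_proba(X_toy).shape
(4, 2)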
We illustrate that this classifier works within a scikit-learn pipeline:
>>> X, y = load_iris(return_X_y=True)
>>> pipe = make_pipeline(MyOwnTransformer(), MyOwnClassifier())
>>> pipe.fit(X, y)
Pipeline(...)
Then, you can call predict and predict_proba:
>>> pipe.predict(X)
array([...])
>>> pipe.predict_proba(X)
array([...])
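Each row of the probability matrix sums to one because predict_proba normalizes the random draws; a minimal check (the bool cast only keeps the doctest output stable across NumPy versions):
>>> proba = pipe.predict_proba(X)
>>> bool(np.allclose(proba.sum(axis=1), 1.0))
True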
Since our classifier inherits from sklearn.base.ClassifierMixin, we can
compute the accuracy by calling the score method:
>>> pipe.score(X, y)
0...
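For reference, this accuracy matches sklearn.metrics.accuracy_score computed on the same predictions. Since our toy predict draws random labels, we reset the global seed before each call so both see identical predictions; this seeding is only for the sketch:
>>> from sklearn.metrics import accuracy_score
>>> np.random.seed(0)
>>> acc = pipe.score(X, y)
>>> np.random.seed(0)
>>> acc == accuracy_score(y, pipe.predict(X))
True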