wolpert.wrappers.cross_val module

Stacked ensemble wrapper using cross validation

class wolpert.wrappers.cross_val.CVStackableTransformer(estimator, method='auto', scoring=None, verbose=False, cv=3, n_cv_jobs=1)[source]

Transformer to turn estimators into meta-estimators for model stacking

This class uses the k-fold predictions to “blend” the estimator. This allows the subsequent layers to use all the data for training. The drawback is that, because the meta-estimator is re-trained on the whole training set, the train and test sets for subsequent layers won’t be generated from the same probability distribution. Even so, this method proves useful in practice.

Parameters:
estimator : predictor

The estimator to be blended.

method : string, optional (default=’auto’)

This method will be called on the estimator to produce the output of transform. If set to auto, the wrapper will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

scoring : string, callable, dict or None (default=None)

If not None, will save scores generated by the scoring object on the scores_ attribute each time blend is called.

Note: for performance reasons, the score reported here will differ slightly from the actual mean of each fold’s scores, since it is computed as a single score over the cross_val_predict output.

verbose : bool (default=False)

When true, prints scores to stdout. scoring must not be None.

cv : int, cross-validation generator or an iterable, optional (default=3)

Determines the cross-validation splitting strategy used to generate the features that train the next layer of the stacked ensemble; more specifically, it is used during blend.

Possible inputs for cv are:

  • None, to use the default 3-fold cross-validation.
  • An integer, to specify the number of folds.
  • An object to be used as a cross-validation generator.
  • An iterable yielding train/test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. In all other cases, sklearn.model_selection.KFold is used.
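
As a hedged sketch (the splitter settings below are illustrative, not a recommendation), a pre-built splitter object can also be passed directly as cv:

>>> from sklearn.model_selection import StratifiedKFold
>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import CVStackableTransformer
>>> # an explicit splitter object takes the place of the integer/None defaults
>>> splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
>>> tf = CVStackableTransformer(GaussianNB(priors=None), cv=splitter)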

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import CVStackableTransformer
>>> CVStackableTransformer(GaussianNB(priors=None), cv=5,
...                        method='predict_proba')
CVStackableTransformer(cv=5, estimator=GaussianNB(priors=None),
                       method='predict_proba', n_cv_jobs=1,
                       scoring=None, verbose=False)
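
Beyond construction, a typical call is blend on the training data to obtain the features for the next layer. The sketch below assumes a synthetic dataset from make_classification; the dataset and parameter values are illustrative only:

>>> from sklearn.datasets import make_classification
>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import CVStackableTransformer
>>> X, y = make_classification(n_samples=100, n_features=5, random_state=42)
>>> tf = CVStackableTransformer(GaussianNB(priors=None), cv=5,
...                             method='predict_proba')
>>> # out-of-fold predictions for every training sample, plus their row indexes
>>> Xt, indexes = tf.blend(X, y)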

Methods

blend(X, y, **fit_params) Transform dataset using cross validation.
fit(X[, y]) Fit the estimator.
fit_blend(X, y, **fit_params) Transform the dataset using cross validation and fit the estimator to the entire dataset.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(*args, **kwargs) Transform the whole dataset.
blend(X, y, **fit_params)[source]

Transform dataset using cross validation.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes contains the indices of the transformed samples in the input data.
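
As documented above, extra keyword arguments are forwarded to the base estimator during each cross validation fit. A hedged sketch using GaussianNB's sample_weight (the data and weights are arbitrary):

>>> import numpy as np
>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import CVStackableTransformer
>>> X = np.random.RandomState(0).rand(30, 4).astype(np.float32)
>>> y = np.arange(30) % 2
>>> tf = CVStackableTransformer(GaussianNB(priors=None), cv=3)
>>> # sample_weight is passed through to GaussianNB.fit on every fold
>>> Xt, indexes = tf.blend(X, y, sample_weight=np.ones(30))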

fit(X, y=None, **fit_params)[source]

Fit the estimator.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
self : object
fit_blend(X, y, **fit_params)[source]

Transform the dataset using cross validation and fit the estimator to the entire dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes contains the indices of the transformed samples in the input data.
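
In a stacked ensemble, fit_blend on the training split is typically followed by transform on held-out data, since the estimator has been refit on the entire training set. A minimal sketch, assuming a synthetic train/test split (names and sizes are illustrative):

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import CVStackableTransformer
>>> X, y = make_classification(n_samples=200, n_features=5, random_state=0)
>>> X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
>>> tf = CVStackableTransformer(GaussianNB(priors=None), cv=5)
>>> # out-of-fold features for the training split; estimator refit on all of it
>>> Xt_tr, indexes = tf.fit_blend(X_tr, y_tr)
>>> # features for the held-out split come from the refit estimator
>>> Xt_te = tf.transform(X_te)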

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(*args, **kwargs)[source]

Transform the whole dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data to be transformed. Use dtype=np.float32 for maximum efficiency. Sparse matrices are also supported; for maximum efficiency, use a sparse csr_matrix.

Returns:
X_transformed : sparse matrix, shape=(n_samples, n_out)

Transformed dataset.
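
For completeness, the non-blending path is fit followed by transform. A hedged sketch on synthetic data (shapes and names are illustrative):

>>> import numpy as np
>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import CVStackableTransformer
>>> rng = np.random.RandomState(0)
>>> X, y = rng.rand(40, 3).astype(np.float32), rng.randint(2, size=40)
>>> X_new = rng.rand(10, 3).astype(np.float32)
>>> tf = CVStackableTransformer(GaussianNB(priors=None), method='predict_proba')
>>> # fit trains the wrapped estimator on the full dataset ...
>>> _ = tf.fit(X, y)
>>> # ... and transform applies predict_proba to unseen data
>>> Xt_new = tf.transform(X_new)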

class wolpert.wrappers.cross_val.CVWrapper(default_method='auto', default_scoring=None, verbose=False, cv=3, n_cv_jobs=1)[source]

Helper class to wrap estimators with CVStackableTransformer

Parameters:
default_method : string, optional (default=’auto’)

This method will be called on the estimator to produce the output of transform. If set to auto, the wrapper will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

default_scoring : string, callable, dict or None (default=None)

If not None, this is used as the default scoring for wrapped transformers: scores generated by the scoring object are saved on the transformer’s scores_ attribute each time blend is called.

verbose : bool (default=False)

When true, prints scores to stdout. default_scoring must not be None.

cv : int, cross-validation generator or an iterable, optional (default=3)

Determines the cross-validation splitting strategy used to generate the features that train the next layer of the stacked ensemble; more specifically, it is used during blend.

Possible inputs for cv are:

  • None, to use the default 3-fold cross-validation.
  • An integer, to specify the number of folds.
  • An object to be used as a cross-validation generator.
  • An iterable yielding train/test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. In all other cases, sklearn.model_selection.KFold is used.

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.

Methods

wrap_estimator(estimator[, method]) Wraps an estimator and returns a transformer that is suitable for stacking.
wrap_estimator(estimator, method=None, **kwargs)[source]

Wraps an estimator and returns a transformer that is suitable for stacking.

Parameters:
estimator : predictor

The estimator to be blended.

method : string or None, optional (default=None)

If not None, this method will be called on the estimator instead of default_method to produce the output of transform. If set to auto, the wrapper will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

Returns:
t : CVStackableTransformer
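
A hedged usage sketch: one CVWrapper carries the shared settings and produces one CVStackableTransformer per estimator (the estimator choices below are illustrative):

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import CVWrapper
>>> wrapper = CVWrapper(default_method='predict_proba', cv=5)
>>> # both transformers share the wrapper's cv and default method ...
>>> t1 = wrapper.wrap_estimator(GaussianNB(priors=None))
>>> # ... unless method is overridden for a particular estimator
>>> t2 = wrapper.wrap_estimator(LogisticRegression(), method='decision_function')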