wolpert.wrappers package

Module contents

class wolpert.wrappers.CVStackableTransformer(estimator, method='auto', scoring=None, verbose=False, cv=3, n_cv_jobs=1)[source]

Transformer to turn estimators into meta-estimators for model stacking

This class uses k-fold out-of-fold predictions to “blend” the estimator, which lets subsequent layers use all of the data for training. The drawback is that, because the meta-estimator is re-trained on the whole training set, the train and test sets for subsequent layers are not generated from the same probability distribution. Even so, this method proves useful in practice.

Parameters:
estimator : predictor

The estimator to be blended.

method : string, optional (default=’auto’)

This method will be called on the estimator to produce the output of transform. If set to auto, it will try predict_proba, decision_function and predict, in that order.

scoring : string, callable, dict or None (default=None)

If not None, will save scores generated by the scoring object on the scores_ attribute each time blend is called.

Note: for performance reasons, the score saved here will differ slightly from the mean of the per-fold scores, since it is computed once over the full cross_val_predict output.

verbose : bool (default=False)

When true, prints scores to stdout. scoring must not be None.

cv : int, cross-validation generator or an iterable, optional (default=3)

Determines the cross-validation splitting strategy used to generate features for training the next layer of the stacked ensemble, i.e. during blend.

Possible inputs for cv are:

  • None, to use the default 3-fold cross-validation.
  • An integer, to specify the number of folds.
  • An object to be used as a cross-validation generator.
  • An iterable yielding train/test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. In all other cases, sklearn.model_selection.KFold is used.

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import CVStackableTransformer
>>> CVStackableTransformer(GaussianNB(priors=None), cv=5,
...                        method='predict_proba')
CVStackableTransformer(cv=5, estimator=GaussianNB(priors=None),
                       method='predict_proba', n_cv_jobs=1,
                       scoring=None, verbose=False)
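
A minimal sketch of calling blend to get out-of-fold predictions, assuming a small toy dataset and the return signature documented below (the exact values depend on the estimator and the fold assignment):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from wolpert.wrappers import CVStackableTransformer

X = np.random.RandomState(42).rand(20, 3)
y = np.array([0, 1] * 10)

t = CVStackableTransformer(GaussianNB(), cv=5, method='predict_proba')
Xt, indexes = t.blend(X, y)   # out-of-fold predictions for every row
# Xt has one column per class (predict_proba output); indexes maps each
# row of Xt back to a row of X.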

Methods

blend(X, y, **fit_params) Transform dataset using cross validation.
fit(X[, y]) Fit the estimator.
fit_blend(X, y, **fit_params) Transform the dataset using cross validation and fit the estimator to the entire dataset.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(*args, **kwargs) Transform the whole dataset.
blend(X, y, **fit_params)[source]

Transform dataset using cross validation.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset; indexes contains the indexes of the transformed rows in the original input.

fit(X, y=None, **fit_params)[source]

Fit the estimator.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
self : object
fit_blend(X, y, **fit_params)[source]

Transform the dataset using cross validation and fit the estimator to the entire dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset; indexes contains the indexes of the transformed rows in the original input.
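
A hedged sketch of the intended stacking flow, where fit_blend produces the training features for the next layer and transform produces the corresponding test features (toy data; variable names are illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression
from wolpert.wrappers import CVStackableTransformer

X_train = np.random.RandomState(0).rand(40, 4)
y_train = np.array([0, 1] * 20)
X_test = np.random.RandomState(1).rand(10, 4)

t = CVStackableTransformer(LogisticRegression(), cv=3)
# fit_blend: out-of-fold predictions for the training set, plus a final
# fit of the estimator on the entire training set.
Xt_train, train_idx = t.fit_blend(X_train, y_train)
# transform: predictions for unseen data from the fully fitted estimator.
Xt_test = t.transform(X_test)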

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(*args, **kwargs)[source]

Transform the whole dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data to be transformed. Use dtype=np.float32 for maximum efficiency. Sparse matrices are also supported, use sparse csr_matrix for maximum efficiency.

Returns:
X_transformed : sparse matrix, shape=(n_samples, n_out)

Transformed dataset.

class wolpert.wrappers.CVWrapper(default_method='auto', default_scoring=None, verbose=False, cv=3, n_cv_jobs=1)[source]

Helper class to wrap estimators with CVStackableTransformer

Parameters:
default_method : string, optional (default=’auto’)

This method will be called on each wrapped estimator to produce the output of transform. If set to auto, it will try predict_proba, decision_function and predict, in that order.

default_scoring : string, callable, dict or None (default=None)

If not None, will save scores generated by the scoring object on the scores_ attribute each time blend is called.

verbose : bool (default=False)

When true, prints scores to stdout. scoring must not be None.

cv : int, cross-validation generator or an iterable, optional (default=3)

Determines the cross-validation splitting strategy used to generate features for training the next layer of the stacked ensemble, i.e. during blend.

Possible inputs for cv are:

  • None, to use the default 3-fold cross-validation.
  • An integer, to specify the number of folds.
  • An object to be used as a cross-validation generator.
  • An iterable yielding train/test splits.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, sklearn.model_selection.StratifiedKFold is used. In all other cases, sklearn.model_selection.KFold is used.

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.

Methods

wrap_estimator(estimator[, method]) Wraps an estimator and returns a transformer that is suitable for stacking.
wrap_estimator(estimator, method=None, **kwargs)[source]

Wraps an estimator and returns a transformer that is suitable for stacking.

Parameters:
estimator : predictor

The estimator to be blended.

method : string or None, optional (default=None)

If not None, this method will be called on the estimator instead of default_method to produce the output of transform. If set to auto, it will try predict_proba, decision_function and predict, in that order.

Returns:
t : CVStackableTransformer
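
A short sketch of using CVWrapper to apply the same cross-validation settings to several estimators (illustrative only; both estimators come from scikit-learn):

from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from wolpert.wrappers import CVWrapper

wrapper = CVWrapper(default_method='predict_proba', cv=5, n_cv_jobs=1)

# Both transformers share cv=5; the first uses default_method.
t1 = wrapper.wrap_estimator(GaussianNB())
t2 = wrapper.wrap_estimator(LogisticRegression(), method='predict')
# t1 and t2 are CVStackableTransformer instances ready for blending.
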
class wolpert.wrappers.HoldoutStackableTransformer(estimator, method='auto', scoring=None, verbose=False, holdout_size=0.1, random_state=42, fit_to_all_data=False)[source]

Transformer to turn estimators into meta-estimators for model stacking

During blending, the estimator trains on one part of the dataset and generates predictions for the other part. This makes it more robust against leaks, but subsequent layers will have less data to train on.

Beware that the blend method will return a dataset that is smaller than the original one.

Parameters:
estimator : predictor

The estimator to be blended.

method : string, optional (default=’auto’)

This method will be called on the estimator to produce the output of transform. If set to auto, it will try predict_proba, decision_function and predict, in that order.

scoring : string, callable, dict or None (default=None)

If not None, will save scores generated by the scoring object on the scores_ attribute each time blend is called.

verbose : bool (default=False)

When true, prints scores to stdout. scoring must not be None.

holdout_size : float, optional (default=.1)

Fraction of the dataset to be held out from training; this fraction determines the size of the blended dataset.

random_state : int or RandomState instance, optional (default=42)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

fit_to_all_data : bool, optional (default=False)

When true, will fit the final estimator to the whole dataset. If not, fits only to the non-holdout set. This only affects the fit and fit_blend steps.

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import HoldoutStackableTransformer
>>> HoldoutStackableTransformer(GaussianNB(priors=None),
...                             holdout_size=.2,
...                             method='predict_proba')
HoldoutStackableTransformer(estimator=GaussianNB(priors=None),
                            fit_to_all_data=False,
                            holdout_size=0.2,
                            method='predict_proba',
                            random_state=42,
                            scoring=None,
                            verbose=False)
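
A hedged sketch showing that blend returns only the holdout portion of the data (toy example; which rows end up in the holdout set depends on random_state):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from wolpert.wrappers import HoldoutStackableTransformer

X = np.random.RandomState(0).rand(50, 3)
y = np.array([0, 1] * 25)

t = HoldoutStackableTransformer(GaussianNB(), holdout_size=.2,
                                method='predict_proba')
Xt, indexes = t.blend(X, y)
# Xt covers only the holdout rows (about 20% of X here); indexes tells
# which rows of X they correspond to.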

Methods

blend(X, y, **fit_params) Transform dataset using a train-test split.
fit(X, y, **fit_params) Fit the estimator to the training set.
fit_blend(X, y, **fit_params) Fit to and transform dataset.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(*args, **kwargs) Transform the whole dataset.
blend(X, y, **fit_params)[source]

Transform dataset using a train-test split.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset; indexes contains the indexes of the transformed rows in the original input.

fit(X, y, **fit_params)[source]

Fit the estimator to the training set.

If self.fit_to_all_data is true, the estimator is fit to the whole dataset; otherwise it is fit only to the part that was not in the holdout set during blending.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
self : object
fit_blend(X, y, **fit_params)[source]

Fit to and transform dataset.

If self.fit_to_all_data is true, the estimator is fit to the whole dataset; otherwise it is fit only to the part that was not in the holdout set during blending.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset; indexes contains the indexes of the transformed rows in the original input.
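
A sketch of how fit_to_all_data changes what fit_blend fits (illustrative toy data; behaviour as documented for the fit_to_all_data parameter above):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from wolpert.wrappers import HoldoutStackableTransformer

X = np.random.RandomState(0).rand(50, 3)
y = np.array([0, 1] * 25)

# Default: the final estimator is fit only on the non-holdout rows.
t_partial = HoldoutStackableTransformer(GaussianNB(), holdout_size=.2)
Xt_p, idx_p = t_partial.fit_blend(X, y)

# fit_to_all_data=True: after blending, the estimator is refit on the
# whole dataset instead of only the non-holdout rows.
t_full = HoldoutStackableTransformer(GaussianNB(), holdout_size=.2,
                                     fit_to_all_data=True)
Xt_f, idx_f = t_full.fit_blend(X, y)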

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(*args, **kwargs)[source]

Transform the whole dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data to be transformed. Use dtype=np.float32 for maximum efficiency. Sparse matrices are also supported, use sparse csr_matrix for maximum efficiency.

Returns:
X_transformed : sparse matrix, shape=(n_samples, n_out)

Transformed dataset.

class wolpert.wrappers.HoldoutWrapper(default_method='auto', default_scoring=None, verbose=False, holdout_size=0.1, random_state=42, fit_to_all_data=False)[source]

Helper class to wrap estimators with HoldoutStackableTransformer

Parameters:
default_method : string, optional (default=’auto’)

This method will be called on each wrapped estimator to produce the output of transform. If set to auto, it will try predict_proba, decision_function and predict, in that order.

default_scoring : string, callable, dict or None (default=None)

If not None, will save scores generated by the scoring object on the scores_ attribute each time blend is called.

verbose : bool (default=False)

When true, prints scores to stdout. scoring must not be None.

holdout_size : float, optional (default=.1)

Fraction of the dataset to be held out from training; this fraction determines the size of the blended dataset.

random_state : int or RandomState instance, optional (default=42)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

fit_to_all_data : bool, optional (default=False)

When true, will fit the final estimator to the whole dataset. If not, fits only to the non-holdout set. This only affects the fit and fit_blend steps.

Methods

wrap_estimator(estimator[, method]) Wraps an estimator and returns a transformer that is suitable for stacking.
wrap_estimator(estimator, method=None, **kwargs)[source]

Wraps an estimator and returns a transformer that is suitable for stacking.

Parameters:
estimator : predictor

The estimator to be blended.

method : string or None, optional (default=None)

If not None, this method will be called on the estimator instead of default_method to produce the output of transform. If set to auto, it will try predict_proba, decision_function and predict, in that order.

Returns:
t : HoldoutStackableTransformer
class wolpert.wrappers.TimeSeriesStackableTransformer(estimator, method='auto', scoring=None, verbose=False, offset=0, test_set_size=1, min_train_size=1, max_train_size=None, n_cv_jobs=1)[source]

Transformer to turn estimators into meta-estimators for model stacking

Each split is composed of a training set containing the first t rows of the dataset and a test set composed of rows t+k to t+k+n, where k and n are the offset and test_set_size parameters (see the example after the parameter list below).

Parameters:
estimator : predictor

The estimator to be blended.

method : string, optional (default=’auto’)

This method will be called on the estimator to produce the output of transform. If set to auto, it will try predict_proba, decision_function and predict, in that order.

scoring : string, callable, dict or None (default=None)

If not None, will save scores generated by the scoring object on the scores_ attribute each time blend is called.

verbose : bool (default=False)

When true, prints scores to stdout. scoring must not be None.

offset : integer, optional (default=0)

Number of rows to skip between the end of the training split and the start of the test split.

test_set_size : integer, optional (default=1)

Size of the test set. This is also the number of rows added to the training set at each iteration.

min_train_size : int, optional (default=1)

Minimum size for a single training set.

max_train_size : int, optional (default=None)

Maximum size for a single training set.

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.
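
Examples

A hedged sketch of how the split scheme described above plays out, assuming offset=0, test_set_size=2 and min_train_size=4 (since the method falls back to predict for a regressor, the blended output is one column of predictions):

import numpy as np
from sklearn.linear_model import LinearRegression
from wolpert.wrappers import TimeSeriesStackableTransformer

# 10 ordered observations of a noisy trend.
X = np.arange(10).reshape(-1, 1).astype(np.float32)
y = np.arange(10) + np.random.RandomState(0).randn(10)

t = TimeSeriesStackableTransformer(LinearRegression(),
                                   min_train_size=4,
                                   test_set_size=2,
                                   offset=0)
Xt, indexes = t.blend(X, y)
# With these settings the splits are roughly:
#   train rows [0..3] -> predict rows [4, 5]
#   train rows [0..5] -> predict rows [6, 7]
#   train rows [0..7] -> predict rows [8, 9]
# so only rows 4..9 appear in the blended output.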

Methods

blend(X, y, **fit_params) Transform dataset using time series split.
fit(X[, y]) Fit the estimator.
fit_blend(X, y, **fit_params) Transform the dataset using cross validation and fit the estimator to the entire dataset.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(*args, **kwargs) Transform the whole dataset.
blend(X, y, **fit_params)[source]

Transform dataset using time series split.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset; indexes contains the indexes of the transformed rows in the original input.

fit(X, y=None, **fit_params)[source]

Fit the estimator.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
self : object
fit_blend(X, y, **fit_params)[source]

Transform the dataset using cross validation and fit the estimator to the entire dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset; indexes contains the indexes of the transformed rows in the original input.

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(*args, **kwargs)[source]

Transform the whole dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data to be transformed. Use dtype=np.float32 for maximum efficiency. Sparse matrices are also supported, use sparse csr_matrix for maximum efficiency.

Returns:
X_transformed : sparse matrix, shape=(n_samples, n_out)

Transformed dataset.

class wolpert.wrappers.TimeSeriesWrapper(default_method='auto', default_scoring=None, verbose=False, offset=0, test_set_size=1, min_train_size=1, max_train_size=None, n_cv_jobs=1)[source]

Helper class to wrap estimators with TimeSeriesStackableTransformer

Parameters:
default_method : string, optional (default=’auto’)

This method will be called on each wrapped estimator to produce the output of transform. If set to auto, it will try predict_proba, decision_function and predict, in that order.

default_scoring : string, callable, dict or None (default=None)

If not None, will save scores generated by the scoring object on the scores_ attribute each time blend is called.

verbose : bool (default=False)

When true, prints scores to stdout. scoring must not be None.

offset : integer, optional (default=0)

Number of rows to skip between the end of the training split and the start of the test split.

test_set_size : integer, optional (default=1)

Size of the test set. This is also the number of rows added to the training set at each iteration.

min_train_size : int, optional (default=1)

Minimum size for a single training set.

max_train_size : int, optional (default=None)

Maximum size for a single training set.

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.

Methods

wrap_estimator(estimator[, method]) Wraps an estimator and returns a transformer that is suitable for stacking.
wrap_estimator(estimator, method=None, **kwargs)[source]

Wraps an estimator and returns a transformer that is suitable for stacking.

Parameters:
estimator : predictor

The estimator to be blended.

method : string or None, optional (default=None)

If not None, this method will be called on the estimator instead of default_method to produce the output of transform. If set to auto, it will try predict_proba, decision_function and predict, in that order.

Returns:
t : TimeSeriesStackableTransformer
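
A short sketch of overriding default_method for a single estimator when wrapping (illustrative only; parameter values are arbitrary):

from sklearn.linear_model import LinearRegression, LogisticRegression
from wolpert.wrappers import TimeSeriesWrapper

wrapper = TimeSeriesWrapper(default_method='predict',
                            min_train_size=50, test_set_size=10)

# Uses default_method ('predict').
t_reg = wrapper.wrap_estimator(LinearRegression())

# Overrides the default for this estimator only.
t_clf = wrapper.wrap_estimator(LogisticRegression(), method='predict_proba')
# Both are TimeSeriesStackableTransformer instances sharing the same
# time series split settings.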