wolpert.pipeline module

class wolpert.pipeline.StackingLayer(transformer_list, n_jobs=1, transformer_weights=None)[source]

Creates a single layer for the stacked ensemble.

This works similarly to scikit-learn's FeatureUnion class, the only difference being that it also exposes methods for blending all estimators when building stacked ensembles.

All transformers must implement blend; in other words, every transformer must be wrapped with a class that inherits from BaseStackableTransformer.

Some precautions must be taken for this to work properly: when calling the StackingLayer constructor directly, make sure all estimators are wrapped with the exact same wrapper.

Parameters of the transformers may be set using their names and the parameter name separated by a ‘__’. A transformer may be replaced entirely by setting the parameter with its name to another transformer, or removed by setting it to None.
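
For example (a minimal sketch, assuming the imports from the Examples section below; the transformer name and parameter path are illustrative):

>>> layer = StackingLayer([("svr", CVStackableTransformer(SVR()))])
>>> _ = layer.set_params(svr__estimator__C=10.0)  # reaches the wrapped SVR's C parameter
>>> _ = layer.set_params(svr=None)                # removes the 'svr' transformer entirely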

Parameters:
transformer_list : list of (string, transformer) tuples

List of transformer objects to be applied to the data. The first half of each tuple is the name of the transformer.

n_jobs : int, optional

Number of jobs to run in parallel (default 1).

transformer_weights : dict, optional

Multiplicative weights for features per transformer. Keys are transformer names, values the weights.

See also

wolpert.pipeline.make_stack_layer
convenience function for simplified layer construction.

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVR
>>> from wolpert.wrappers import CVStackableTransformer
>>>
>>> reg1 = CVStackableTransformer(GaussianNB(priors=None),
...                               method='predict')
>>> reg2 = CVStackableTransformer(SVR(), method='predict')
>>>
>>> StackingLayer([("gaussiannb", reg1),
...                ("svr", reg2)])
    StackingLayer(n_jobs=1,
    transformer_list=[('gaussiannb',
                       CVStackableTransformer(cv=3,
                                              estimator=GaussianNB(...),
                                              method='predict',
                                              n_cv_jobs=1,
                                              scoring=None,
                                              verbose=False)),
                      ('svr',
                       CVStackableTransformer(cv=3,
                                              estimator=SVR(...),
                                              method='predict',
                                              n_cv_jobs=1,
                                              scoring=None,
                                              verbose=False))],
    transformer_weights=None)

Methods

blend(X, y, **fit_params) Transform the dataset by calling blend on each transformer and concatenating the results.
fit(X[, y]) Fit all transformers using X.
fit_blend(X, y[, weight]) Fit and transform the dataset by calling fit_blend on each transformer and concatenating the results.
fit_transform(X[, y]) Fit all transformers, transform the data and concatenate results.
get_feature_names() Get feature names from all transformers.
get_params([deep]) Get parameters for this estimator.
set_params(**kwargs) Set the parameters of this estimator.
transform(X) Transform X separately by each transformer, concatenate results.

blend(X, y, **fit_params)[source]

Transform the dataset by calling blend on each transformer and concatenating the results.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit and blend the transformers.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimators.

Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds the indexes of the transformed rows in the input data.
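
A minimal usage sketch (illustrative data; the blended values depend on the cross-validation splits):

>>> import numpy as np
>>> from sklearn.svm import SVR
>>> from wolpert.wrappers import CVStackableTransformer
>>> X = np.random.RandomState(0).rand(10, 2)
>>> y = np.random.RandomState(1).rand(10)
>>> layer = StackingLayer([("svr", CVStackableTransformer(SVR(), method='predict'))])
>>> Xt, indexes = layer.blend(X, y)  # Xt holds the out-of-fold predictions
>>> # indexes aligns each row of Xt with a row of the input X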

fit(X, y=None)[source]

Fit all transformers using X.

Parameters:
X : iterable or array-like, depending on transformers

Input data, used to fit transformers.

y : array-like, shape (n_samples, …), optional

Targets for supervised learning.

Returns:
self : StackingLayer

This estimator.

fit_blend(X, y, weight=None, **fit_params)[source]

Fit and transform the dataset by calling fit_blend on each transformer and concatenating the results.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit and blend the transformers.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimators.

Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds the indexes of the transformed rows in the input data.
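
Besides returning the blended predictions, fit_blend leaves the transformers fitted on the whole dataset, so the layer can afterwards be used with transform (StackingPipeline relies on this during fit; see below). A short sketch, reusing the layer from the blend example above:

>>> Xt, indexes = layer.fit_blend(X, y)  # same blended output as blend
>>> Xt_new = layer.transform(X)          # valid now: transformers are fitted on all of X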

fit_transform(X, y=None, **fit_params)[source]

Fit all transformers, transform the data and concatenate results.

Parameters:
X : iterable or array-like, depending on transformers

Input data to be transformed.

y : array-like, shape (n_samples, …), optional

Targets for supervised learning.

Returns:
X_t : array-like or sparse matrix, shape (n_samples, sum_n_components)

hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.

get_feature_names()[source]

Get feature names from all transformers.

Returns:
feature_names : list of strings

Names of the features produced by transform.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**kwargs)[source]

Set the parameters of this estimator.

Valid parameter keys can be listed with get_params().

Returns:
self

transform(X)[source]

Transform X separately by each transformer, concatenate results.

Parameters:
X : iterable or array-like, depending on transformers

Input data to be transformed.

Returns:
X_t : array-like or sparse matrix, shape (n_samples, sum_n_components)

hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.

class wolpert.pipeline.StackingPipeline(steps, memory=None)[source]

A pipeline of StackingLayers with a final estimator.

During fit, the pipeline sequentially applies fit_blend to each StackingLayer and feeds the transformed data into the next layer. Finally, it fits the final estimator on the last transformed data.

When generating predictions, it calls transform on each layer sequentially before feeding the data to the final estimator.

Parameters:
steps : list of (string, estimator) tuples

List of (name, object) tuples that are chained, in the order in which they are chained, with the last object an estimator. All objects besides the last one must inherit from BaseStackableTransformer.

memory : None, str or object with the joblib.Memory interface, optional

Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.
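
For example, caching could be enabled by passing a directory path (a hedged sketch; the path is illustrative and the step names are taken from the example below):

>>> cached = StackingPipeline([("l0", layer0), ("final", final_estimator)],
...                           memory="/tmp/wolpert_cache")  # fitted transformers cached here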

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVR
>>> from sklearn.linear_model import LinearRegression
>>> layer0 = make_stack_layer(GaussianNB(priors=None), SVR())
>>> final_estimator = LinearRegression()
>>> StackingPipeline([("l0", layer0), ("final", final_estimator)])
StackingPipeline(memory=None,
         steps=[('l0', StackingLayer(...)),
                ('final', LinearRegression(...))])
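
Once built, the pipeline can be fit and used for prediction like any scikit-learn estimator. A minimal sketch with synthetic data (the names below are illustrative; exact outputs depend on the internal cross-validation splits):

>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> X, y = rng.rand(30, 3), rng.randint(0, 2, 30)
>>> pipeline = StackingPipeline([("l0", layer0), ("final", final_estimator)])
>>> _ = pipeline.fit(X, y)       # fit_blend through l0, then fit the final estimator
>>> preds = pipeline.predict(X)  # transform through l0, then predict with the final estimator
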
Attributes:
classes_
inverse_transform

Apply inverse transformations in reverse order

named_steps
transform

Apply transforms, and transform with the final estimator

Methods

blend(X[, y]) Apply blends, and blend with the final estimator
decision_function(X) Apply transforms, and decision_function of the final estimator
fit(X[, y]) Fit the model
fit_blend(X[, y]) Applies fit_blend of last step in pipeline after transforms.
fit_predict(X[, y]) Applies fit_predict of last step in pipeline after transforms.
fit_transform(X[, y]) Fit the model and transform with the final estimator
get_params([deep]) Get parameters for this estimator.
predict(X, **predict_params) Apply transforms to the data, and predict with the final estimator
predict_log_proba(X) Apply transforms, and predict_log_proba of the final estimator
predict_proba(X) Apply transforms, and predict_proba of the final estimator
score(X[, y]) Scores the model using scikit-learn's cross_validate
set_params(**kwargs) Set the parameters of this estimator.

blend(X, y=None, **fit_params)[source]

Apply blends, and blend with the final estimator

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds the indexes of the transformed rows in the input data.

decision_function(X)[source]

Apply transforms, and decision_function of the final estimator

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns:
y_score : array-like, shape = [n_samples, n_classes]

fit(X, y=None, **fit_params)[source]

Fit the model

Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
self : StackingPipeline

This estimator.

fit_blend(X, y=None, **fit_params)[source]

Applies fit_blend of last step in pipeline after transforms.

Applies the fit_blend of each intermediate step to the data, followed by the fit_blend method of the final estimator in the pipeline. Valid only if the final estimator implements fit_blend.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds the indexes of the transformed rows in the input data.

fit_predict(X, y=None, **fit_params)[source]

Applies fit_predict of last step in pipeline after transforms.

Helper function. Same result as calling fit followed by predict.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
y_pred : array-like

fit_transform(X, y=None, **fit_params)[source]

Fit the model and transform with the final estimator

Fits all the transforms one after the other and transforms the data, then uses fit_transform on transformed data with the final estimator.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
Xt : array-like, shape = [n_samples, n_transformed_features]

Transformed samples

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

inverse_transform

Apply inverse transformations in reverse order

All estimators in the pipeline must support inverse_transform.

Parameters:
Xt : array-like, shape = [n_samples, n_transformed_features]

Data samples, where n_samples is the number of samples and n_features is the number of features. Must fulfill input requirements of last step of pipeline’s inverse_transform method.

Returns:
Xt : array-like, shape = [n_samples, n_features]

predict(X, **predict_params)[source]

Apply transforms to the data, and predict with the final estimator

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

**predict_params : dict of string -> object

Parameters to the predict called at the end of all transformations in the pipeline. Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns:
y_pred : array-like

predict_log_proba(X)[source]

Apply transforms, and predict_log_proba of the final estimator

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns:
y_score : array-like, shape = [n_samples, n_classes]

predict_proba(X)[source]

Apply transforms, and predict_proba of the final estimator

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns:
y_proba : array-like, shape = [n_samples, n_classes]

score(X, y=None, **validation_args)[source]

Scores the model using scikit-learn's cross_validate

Blends the whole dataset and then uses the final estimator to produce a score.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**validation_args : dictionary

Arguments to be passed to cross_validate.

Returns:
scores : dict of float arrays of shape=(n_splits,)

Array of scores of the estimator for each run of the cross validation.

A dict of arrays containing the score/time arrays for each scorer is returned. The possible keys for this dict are:

test_score

The score array for test scores on each cv split.

train_score

The score array for train scores on each cv split. This is available only if the return_train_score parameter is True.

fit_time

The time for fitting the estimator on the train set for each cv split.

score_time

The time for scoring the estimator on the test set for each cv split. (Note that the time for scoring on the train set is not included, even if return_train_score is set to True.)

estimator

The estimator objects for each cv split. This is available only if the return_estimator parameter is set to True.
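
A short sketch (reusing pipeline, X and y from the usage example above; the keyword arguments are forwarded to cross_validate):

>>> scores = pipeline.score(X, y, scoring='r2', cv=3)  # kwargs go straight to cross_validate
>>> # scores['test_score'] holds one score per cross-validation split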

set_params(**kwargs)[source]

Set the parameters of this estimator.

Valid parameter keys can be listed with get_params().

Returns:
self

transform

Apply transforms, and transform with the final estimator

This also works when the final estimator is None: all prior transformations are applied.

Parameters:
X : iterable

Data to transform. Must fulfill input requirements of first step of the pipeline.

Returns:
Xt : array-like, shape = [n_samples, n_transformed_features]

wolpert.pipeline.make_stack_layer(*estimators, **kwargs)[source]

Creates a single stack layer to be used in a stacked ensemble.

Parameters:
*estimators : list

List of estimators to be wrapped and used in a layer.

restack : bool, optional (default=False)

Whether to repeat the layer input in the output.

n_jobs : int, optional (default=1)

Number of jobs to be passed to StackingLayer. Each job will be responsible for blending one of the estimators.

blending_wrapper : string or Wrapper object, optional (default=’cv’)

The strategy to be used when blending. Possible string values are ‘cv’ and ‘holdout’. If a wrapper object is passed, it will be used instead.

Returns:
l : StackingLayer

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVR
>>> make_stack_layer(GaussianNB(priors=None), SVR())
    StackingLayer(n_jobs=1,
    transformer_list=[('gaussiannb',
                       CVStackableTransformer(cv=3,
                                              estimator=GaussianNB(...),
                                              method='auto',
                                              n_cv_jobs=1,
                                              scoring=None,
                                              verbose=False)),
                      ('svr',
                       CVStackableTransformer(cv=3,
                                              estimator=SVR(...),
                                              method='auto',
                                              n_cv_jobs=1,
                                              scoring=None,
                                              verbose=False))],
    transformer_weights=None)
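
The keyword arguments select the blending strategy. A hedged sketch (construction only; the resulting wrapper's parameters are elided):

>>> layer = make_stack_layer(SVR(), blending_wrapper='holdout', restack=True)
>>> # restack=True appends the original input columns to the blended predictions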