wolpert.pipeline module

class wolpert.pipeline.StackingLayer(transformer_list, n_jobs=1, transformer_weights=None)[source]

Creates a single layer for the stacked ensemble.

This works similarly to scikit-learn's FeatureUnion class, the only difference being that it also exposes methods for blending all estimators when building stacked ensembles.

All transformers must implement blend; in other words, every transformer must be wrapped with a class that inherits from BaseStackableTransformer.

Some precautions must be taken for this to work properly: when calling the StackingLayer constructor directly, make sure all estimators are wrapped with the exact same wrapper.

Parameters of the transformers may be set using their names and the parameter name separated by a ‘__’. A transformer may be replaced entirely by setting the parameter with its name to another transformer, or removed by setting it to None.
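
For example (a minimal sketch, assuming the imports from the Examples section below; the transformer name and parameter path are illustrative):

>>> layer = StackingLayer([("svr", CVStackableTransformer(SVR()))])
>>> _ = layer.set_params(svr__estimator__C=10.0)  # reaches the wrapped SVR's C parameter
>>> _ = layer.set_params(svr=None)                # removes the 'svr' transformer entirely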

Parameters:
transformer_list : list of (string, transformer) tuples

List of transformer objects to be applied to the data. The first half of each tuple is the name of the transformer.

n_jobs : int, optional

Number of jobs to run in parallel (default 1).

transformer_weights : dict, optional

Multiplicative weights for features per transformer. Keys are transformer names, values the weights.

See also

wolpert.pipeline.make_stack_layer
convenience function for simplified layer construction.

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVR
>>> from wolpert.wrappers import CVStackableTransformer
>>>
>>> reg1 = CVStackableTransformer(GaussianNB(priors=None),
...                               method='predict')
>>> reg2 = CVStackableTransformer(SVR(), method='predict')
>>>
>>> StackingLayer([("gaussiannb", reg1),
...                ("svr", reg2)])
    StackingLayer(n_jobs=1,
    transformer_list=[('gaussiannb',
                       CVStackableTransformer(cv=3,
                                              estimator=GaussianNB(...),
                                              method='predict',
                                              n_cv_jobs=1,
                                              scoring=None,
                                              verbose=False)),
                      ('svr',
                       CVStackableTransformer(cv=3,
                                              estimator=SVR(...),
                                              method='predict',
                                              n_cv_jobs=1,
                                              scoring=None,
                                              verbose=False))],
    transformer_weights=None)

Methods

blend(X, y, **fit_params) Transform the dataset by calling blend on each transformer and concatenating the results.
fit(X[, y]) Fit all transformers using X.
fit_blend(X, y[, weight]) Fit and transform the dataset by calling fit_blend on each transformer and concatenating the results.
fit_transform(X[, y]) Fit all transformers, transform the data and concatenate results.
get_feature_names() Get feature names from all transformers.
get_params([deep]) Get parameters for this estimator.
set_params(**kwargs) Set the parameters of this estimator.
transform(X) Transform X separately by each transformer, concatenate results.

blend(X, y, **fit_params)[source]

Transform the dataset by calling blend on each transformer and concatenating the results.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit and blend the transformers.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimators.

Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds the indexes of the transformed rows in the input data.
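
A minimal usage sketch (illustrative data; the blended values depend on the cross-validation splits):

>>> import numpy as np
>>> from sklearn.svm import SVR
>>> from wolpert.wrappers import CVStackableTransformer
>>> X = np.random.RandomState(0).rand(10, 2)
>>> y = np.random.RandomState(1).rand(10)
>>> layer = StackingLayer([("svr", CVStackableTransformer(SVR(), method='predict'))])
>>> Xt, indexes = layer.blend(X, y)  # Xt holds the out-of-fold predictions
>>> # indexes aligns each row of Xt with a row of the input X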

fit(X, y=None)[source]

Fit all transformers using X.

Parameters:
X : iterable or array-like, depending on transformers

Input data, used to fit transformers.

y : array-like, shape (n_samples, …), optional

Targets for supervised learning.

Returns:
self : StackingLayer

This estimator.

fit_blend(X, y, weight=None, **fit_params)[source]

Fit and transform the dataset by calling fit_blend on each transformer and concatenating the results.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit and blend the transformers.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimators.

Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds the indexes of the transformed rows in the input data.
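
Besides returning the blended predictions, fit_blend leaves the transformers fitted on the whole dataset, so the layer can afterwards be used with transform (StackingPipeline relies on this during fit; see below). A short sketch, reusing the layer from the blend example above:

>>> Xt, indexes = layer.fit_blend(X, y)  # same blended output as blend
>>> Xt_new = layer.transform(X)          # valid now: transformers are fitted on all of X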

fit_transform(X, y=None, **fit_params)[source]

Fit all transformers, transform the data and concatenate results.

Parameters:
X : iterable or array-like, depending on transformers

Input data to be transformed.

y : array-like, shape (n_samples, …), optional

Targets for supervised learning.

Returns:
X_t : array-like or sparse matrix, shape (n_samples, sum_n_components)

hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.

get_feature_names()[source]

Get feature names from all transformers.

Returns:
feature_names : list of strings

Names of the features produced by transform.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**kwargs)[source]

Set the parameters of this estimator.

Valid parameter keys can be listed with get_params().

Returns:
self

transform(X)[source]

Transform X separately by each transformer, concatenate results.

Parameters:
X : iterable or array-like, depending on transformers

Input data to be transformed.

Returns:
X_t : array-like or sparse matrix, shape (n_samples, sum_n_components)

hstack of results of transformers. sum_n_components is the sum of n_components (output dimension) over transformers.

class wolpert.pipeline.StackingPipeline(steps, memory=None)[source]

A pipeline of StackingLayers with a final estimator.

During fit, the pipeline sequentially applies fit_blend to each StackingLayer and feeds the transformed data into the next layer. Finally, it fits the final estimator on the last transformed data.

When generating predictions, it calls transform on each layer sequentially before feeding the data to the final estimator.

Parameters:
steps : list of (string, estimator) tuples

List of (name, object) tuples that are chained, in the order in which they are chained, with the last object an estimator. All objects besides the last one must inherit from BaseStackableTransformer.

memory : None, str or object with the joblib.Memory interface, optional

Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.
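
For example, caching could be enabled by passing a directory path (a hedged sketch; the path is illustrative and the step names are taken from the example below):

>>> cached = StackingPipeline([("l0", layer0), ("final", final_estimator)],
...                           memory="/tmp/wolpert_cache")  # fitted transformers cached here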

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVR
>>> from sklearn.linear_model import LinearRegression
>>> layer0 = make_stack_layer(GaussianNB(priors=None), SVR())
>>> final_estimator = LinearRegression()
>>> StackingPipeline([("l0", layer0), ("final", final_estimator)])
StackingPipeline(memory=None,
         steps=[('l0', StackingLayer(...)),
                ('final', LinearRegression(...))])
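
Once built, the pipeline can be fit and used for prediction like any scikit-learn estimator. A minimal sketch with synthetic data (the names below are illustrative; exact outputs depend on the internal cross-validation splits):

>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> X, y = rng.rand(30, 3), rng.randint(0, 2, 30)
>>> pipeline = StackingPipeline([("l0", layer0), ("final", final_estimator)])
>>> _ = pipeline.fit(X, y)       # fit_blend through l0, then fit the final estimator
>>> preds = pipeline.predict(X)  # transform through l0, then predict with the final estimator
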
Attributes:
classes_
inverse_transform

Apply inverse transformations in reverse order

named_steps
transform

Apply transforms, and transform with the final estimator

Methods

blend(X[, y]) Apply blends, and blend with the final estimator
decision_function(X) Apply transforms, and decision_function of the final estimator
fit(X[, y]) Fit the model
fit_blend(X[, y]) Applies fit_blend of last step in pipeline after transforms.
fit_predict(X[, y]) Applies fit_predict of last step in pipeline after transforms.
fit_transform(X[, y]) Fit the model and transform with the final estimator
get_params([deep]) Get parameters for this estimator.
predict(X, **predict_params) Apply transforms to the data, and predict with the final estimator
predict_log_proba(X) Apply transforms, and predict_log_proba of the final estimator
predict_proba(X) Apply transforms, and predict_proba of the final estimator
score(X[, y]) Scores the model using scikit-learn's cross_validate
set_params(**kwargs) Set the parameters of this estimator.

blend(X, y=None, **fit_params)[source]

Apply blends, and blend with the final estimator

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds the indexes of the transformed rows in the input data.

decision_function(X)[source]

Apply transforms, and decision_function of the final estimator

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns:
y_score : array-like, shape = [n_samples, n_classes]

fit(X, y=None, **fit_params)[source]

Fit the model

Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
self : StackingPipeline

This estimator.

fit_blend(X, y=None, **fit_params)[source]

Applies fit_blend of last step in pipeline after transforms.

Applies the fit_blend of each intermediate step to the data, followed by the fit_blend method of the final estimator in the pipeline. Valid only if the final estimator implements fit_blend.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds the indexes of the transformed rows in the input data.

fit_predict(X, y=None, **fit_params)[source]

Applies fit_predict of last step in pipeline after transforms.

Helper function. Same result as calling fit followed by predict.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
y_pred : array-like

fit_transform(X, y=None, **fit_params)[source]

Fit the model and transform with the final estimator

Fits all the transforms one after the other and transforms the data, then uses fit_transform on transformed data with the final estimator.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

Returns:
Xt : array-like, shape = [n_samples, n_transformed_features]

Transformed samples

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

inverse_transform

Apply inverse transformations in reverse order

All estimators in the pipeline must support inverse_transform.

Parameters:
Xt : array-like, shape = [n_samples, n_transformed_features]

Data samples, where n_samples is the number of samples and n_features is the number of features. Must fulfill input requirements of last step of pipeline’s inverse_transform method.

Returns:
Xt : array-like, shape = [n_samples, n_features]

predict(X, **predict_params)[source]

Apply transforms to the data, and predict with the final estimator

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

**predict_params : dict of string -> object

Parameters to the predict called at the end of all transformations in the pipeline. Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.

Returns:
y_pred : array-like

predict_log_proba(X)[source]

Apply transforms, and predict_log_proba of the final estimator

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns:
y_score : array-like, shape = [n_samples, n_classes]

predict_proba(X)[source]

Apply transforms, and predict_proba of the final estimator

Parameters:
X : iterable

Data to predict on. Must fulfill input requirements of first step of the pipeline.

Returns:
y_proba : array-like, shape = [n_samples, n_classes]

score(X, y=None, **validation_args)[source]

Scores the model using scikit-learn's cross_validate

Blends the whole dataset and then uses the final estimator to produce a score.

Parameters:
X : iterable

Training data. Must fulfill input requirements of first step of the pipeline.

y : iterable, default=None

Training targets. Must fulfill label requirements for all steps of the pipeline.

**validation_args : dictionary

Arguments to be passed to cross_validate.

Returns:
scores : dict of float arrays of shape=(n_splits,)

Array of scores of the estimator for each run of the cross validation.

A dict of arrays containing the score/time arrays for each scorer is returned. The possible keys for this dict are:

test_score

The score array for test scores on each cv split.

train_score

The score array for train scores on each cv split. This is available only if the return_train_score parameter is True.

fit_time

The time for fitting the estimator on the train set for each cv split.

score_time

The time for scoring the estimator on the test set for each cv split. (Note that the time for scoring on the train set is not included, even if return_train_score is set to True.)

estimator

The estimator objects for each cv split. This is available only if the return_estimator parameter is set to True.
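
A short sketch (reusing pipeline, X and y from the usage example above; the keyword arguments are forwarded to cross_validate):

>>> scores = pipeline.score(X, y, scoring='r2', cv=3)  # kwargs go straight to cross_validate
>>> # scores['test_score'] holds one score per cross-validation split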

set_params(**kwargs)[source]

Set the parameters of this estimator.

Valid parameter keys can be listed with get_params().

Returns:
self

transform

Apply transforms, and transform with the final estimator

This also works when the final estimator is None: all prior transformations are applied.

Parameters:
X : iterable

Data to transform. Must fulfill input requirements of first step of the pipeline.

Returns:
Xt : array-like, shape = [n_samples, n_transformed_features]

wolpert.pipeline.make_stack_layer(*estimators, **kwargs)[source]

Creates a single stack layer to be used in a stacked ensemble.

Parameters:
*estimators : list

List of estimators to be wrapped and used in a layer.

restack : bool, optional (default=False)

Whether to repeat the layer input in the output.

n_jobs : int, optional (default=1)

Number of jobs to be passed to StackingLayer. Each job will be responsible for blending one of the estimators.

blending_wrapper : string or Wrapper object, optional (default=’cv’)

The strategy to be used when blending. Possible string values are ‘cv’ and ‘holdout’. If a wrapper object is passed, it will be used instead.

Returns:
l : StackingLayer

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVR
>>> make_stack_layer(GaussianNB(priors=None), SVR())
    StackingLayer(n_jobs=1,
    transformer_list=[('gaussiannb',
                       CVStackableTransformer(cv=3,
                                              estimator=GaussianNB(...),
                                              method='auto',
                                              n_cv_jobs=1,
                                              scoring=None,
                                              verbose=False)),
                      ('svr',
                       CVStackableTransformer(cv=3,
                                              estimator=SVR(...),
                                              method='auto',
                                              n_cv_jobs=1,
                                              scoring=None,
                                              verbose=False))],
    transformer_weights=None)
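
The keyword arguments select the blending strategy. A hedged sketch (construction only; the resulting wrapper's parameters are elided):

>>> layer = make_stack_layer(SVR(), blending_wrapper='holdout', restack=True)
>>> # restack=True appends the original input columns to the blended predictions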