wolpert.pipeline module

class wolpert.pipeline.StackingLayer(transformer_list, n_jobs=1, transformer_weights=None)
Creates a single layer for the stacked ensemble.
This works similarly to scikit-learn's FeatureUnion class. The only difference is that it also exposes methods for blending all estimators when building stacked ensembles. All transformers must implement blend. In other words, every transformer must be wrapped with a class that inherits from BaseStackableTransformer.
Some precautions must be taken for this to work properly: when calling the StackingLayer constructor directly, make sure all estimators are wrapped with the exact same wrapper.
Parameters of the transformers may be set using the transformer's name and the parameter name separated by '__'. A transformer may be replaced entirely by setting the parameter with its name to another transformer, or removed by setting it to None.
Parameters:
- transformer_list : list of (string, transformer) tuples
List of transformer objects to be applied to the data. The first half of each tuple is the name of the transformer.
- n_jobs : int, optional
Number of jobs to run in parallel (default 1).
- transformer_weights : dict, optional
Multiplicative weights for features per transformer. Keys are transformer names, values the weights.
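The '__' naming convention described above can be illustrated with a small, self-contained sketch. route_params is a hypothetical helper written for illustration only. It is not part of wolpert, and StackingLayer presumably inherits this behaviour from FeatureUnion. The helper splits flat keyword arguments on the first '__' into three groups: layer-level parameters, whole-transformer replacements, and parameters forwarded to a named transformer. With a real layer, the corresponding call would look roughly like layer.set_params(svr__estimator__C=10.0). This assumes the wrapper exposes the wrapped model as estimator, as the reprs in this page's examples suggest.

```python
# Hypothetical helper illustrating the "<name>__<param>" convention.
# It is NOT a wolpert API; it only mimics how such keys can be routed.
def route_params(transformer_names, **params):
    """Split flat keyword arguments into layer, replacement and nested params."""
    layer_params = {}   # e.g. n_jobs, transformer_weights
    replacements = {}   # "<name>": another transformer, or None to drop it
    nested = {}         # "<name>": {"<param>": value, ...}
    for key, value in params.items():
        name, sep, sub_key = key.partition("__")  # split on the first "__" only
        if not sep:
            if key in transformer_names:
                replacements[key] = value
            else:
                layer_params[key] = value
        elif name in transformer_names:
            nested.setdefault(name, {})[sub_key] = value
        else:
            raise ValueError(f"Invalid parameter {name!r} for this layer")
    return layer_params, replacements, nested


layer, repl, nested = route_params(
    {"gaussiannb", "svr"},
    n_jobs=2,                # parameter of the layer itself
    svr__estimator__C=10.0,  # forwarded to "svr" as estimator__C
    gaussiannb=None,         # removes the "gaussiannb" transformer
)
print(layer)   # {'n_jobs': 2}
print(repl)    # {'gaussiannb': None}
print(nested)  # {'svr': {'estimator__C': 10.0}}
```

Splitting only on the first '__' is what lets deeper keys such as estimator__C pass through unchanged to the wrapper, which can then route them further.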
See also
- wolpert.pipeline.make_stack_layer : convenience function for simplified layer construction.
Examples
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVR
>>> from wolpert.pipeline import StackingLayer
>>> from wolpert.wrappers import CVStackableTransformer
>>>
>>> reg1 = CVStackableTransformer(GaussianNB(priors=None),
...                               method='predict')
>>> reg2 = CVStackableTransformer(SVR(), method='predict')
>>>
>>> StackingLayer([("gaussiannb", reg1),
...                ("svr", reg2)])
StackingLayer(n_jobs=1,
              transformer_list=[('gaussiannb',
                                 CVStackableTransformer(cv=3,
                                                        estimator=GaussianNB(...),
                                                        method='predict',
                                                        n_cv_jobs=1,
                                                        scoring=None,
                                                        verbose=False)),
                                ('svr',
                                 CVStackableTransformer(cv=3,
                                                        estimator=SVR(...),
                                                        method='predict',
                                                        n_cv_jobs=1,
                                                        scoring=None,
                                                        verbose=False))],
              transformer_weights=None)
Methods
- blend(X, y, **fit_params): Transform dataset by calling blend on each transformer and concatenating the results.
- fit(X[, y]): Fit all transformers using X.
- fit_blend(X, y[, weight]): Fit to and transform dataset by calling fit_blend on each transformer and concatenating the results.
- fit_transform(X[, y]): Fit all transformers, transform the data and concatenate results.
- get_feature_names(): Get feature names from all transformers.
- get_params([deep]): Get parameters for this estimator.
- set_params(**kwargs): Set the parameters of this estimator.
- transform(X): Transform X separately by each transformer, concatenate results.

blend(X, y, **fit_params)
Transform the dataset by calling blend on each transformer and concatenating the results.
Parameters:
- X : array-like or sparse matrix, shape = [n_samples, n_features]
Input data to be blended by each transformer.
- y : array-like, shape = [n_samples]
Target values.
- **fit_params : parameters to be passed to the base estimator.
Returns:
- X_transformed, indexes : tuple of (sparse matrix, array-like)
X_transformed is the transformed dataset. indexes holds the indices of the input rows that correspond to the rows of X_transformed.
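StackingLayer.blend delegates to each wrapper's blend and concatenates the results. The plain-Python sketch below shows what a single cross-validated blend could look like and why an indexes array is returned alongside the data. MeanRegressor, cv_blend and holdout_blend are hypothetical names, not wolpert APIs. The fold assignment is illustrative, and n_splits=3 merely mirrors the cv=3 shown in the example repr. The holdout variant is an assumption, not something stated on this page: it supposes that the 'holdout' strategy only returns predictions for the held-out rows, so indexes would then cover a subset of the input.

```python
# Hypothetical sketch of blending; none of these names are wolpert APIs.
class MeanRegressor:
    """Toy base model that always predicts the mean of its training targets."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def cv_blend(make_model, X, y, n_splits=3):
    """Out-of-fold predictions: each row is predicted by a model that never saw it."""
    n = len(X)
    preds = [None] * n
    for k in range(n_splits):
        test_idx = list(range(k, n, n_splits))  # interleaved folds, for illustration
        test_set = set(test_idx)
        train_idx = [i for i in range(n) if i not in test_set]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(test_idx, model.predict([X[i] for i in test_idx])):
            preds[i] = p
    return [[p] for p in preds], list(range(n))  # every input row is covered


def holdout_blend(make_model, X, y, holdout_size=2):
    """Assumed holdout variant: fit on the leading rows, predict only the rest."""
    split = len(X) - holdout_size
    model = make_model().fit(X[:split], y[:split])
    return [[p] for p in model.predict(X[split:])], list(range(split, len(X)))


X = [[0], [1], [2], [3], [4], [5]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
cv_out, cv_idx = cv_blend(MeanRegressor, X, y)
ho_out, ho_idx = holdout_blend(MeanRegressor, X, y)
print(cv_out)          # [[3.0], [2.5], [2.0], [3.0], [2.5], [2.0]]
print(ho_out, ho_idx)  # [[1.5], [1.5]] [4, 5]
```

A layer holding several transformers would run one such blend per transformer and concatenate the resulting columns.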

fit(X, y=None)
Fit all transformers using X.
Parameters:
- X : iterable or array-like, depending on transformers
Input data, used to fit transformers.
- y : array-like, shape (n_samples, …), optional
Targets for supervised learning.
Returns:
- self : StackingLayer
This estimator.

fit_blend(X, y, weight=None, **fit_params)
Fit to and transform the dataset by calling fit_blend on each transformer and concatenating the results.
Parameters:
- X : array-like or sparse matrix, shape = [n_samples, n_features]
Input data used to fit and blend each transformer.
- y : array-like, shape = [n_samples]
Target values.
- **fit_params : parameters to be passed to the base estimator.
Returns:
- X_transformed, indexes : tuple of (sparse matrix, array-like)
X_transformed is the transformed dataset. indexes holds the indices of the input rows that correspond to the rows of X_transformed.

fit_transform(X, y=None, **fit_params)
Fit all transformers, transform the data and concatenate results.
Parameters:
- X : iterable or array-like, depending on transformers
Input data to be transformed.
- y : array-like, shape (n_samples, …), optional
Targets for supervised learning.
Returns:
- X_t : array-like or sparse matrix, shape (n_samples, sum_n_components)
Horizontally stacked (hstack) results of the transformers. sum_n_components is the sum of n_components (output dimension) over transformers.

get_feature_names()
Get feature names from all transformers.
Returns:
- feature_names : list of strings
Names of the features produced by transform.

get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.

set_params(**kwargs)
Set the parameters of this estimator.
Valid parameter keys can be listed with get_params().
Returns:
- self

transform(X)
Transform X separately by each transformer, concatenate results.
Parameters:
- X : iterable or array-like, depending on transformers
Input data to be transformed.
Returns:
- X_t : array-like or sparse matrix, shape (n_samples, sum_n_components)
Horizontally stacked (hstack) results of the transformers. sum_n_components is the sum of n_components (output dimension) over transformers.
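The concatenation step used by fit_transform and transform can be sketched as follows. The sketch assumes the FeatureUnion semantics that StackingLayer builds on. Each transformer's output block is multiplied by its entry in transformer_weights (if any). The blocks are then joined column-wise, so the number of output columns is sum_n_components. hstack_outputs is a hypothetical helper working on plain lists. The real class works on NumPy arrays or sparse matrices.

```python
# Hypothetical helper mimicking FeatureUnion-style concatenation on plain lists.
def hstack_outputs(outputs, transformer_weights=None):
    """outputs: list of (name, rows), each rows being a list of per-sample lists."""
    weights = transformer_weights or {}
    n_samples = len(outputs[0][1])
    stacked = [[] for _ in range(n_samples)]
    for name, rows in outputs:
        if len(rows) != n_samples:
            raise ValueError(f"{name!r} returned {len(rows)} rows, expected {n_samples}")
        w = weights.get(name, 1)  # transformers without a weight are left unscaled
        for out_row, row in zip(stacked, rows):
            out_row.extend(w * v for v in row)
    return stacked


outputs = [
    ("gaussiannb", [[0.2, 0.8], [0.6, 0.4]]),  # e.g. two class-probability columns
    ("svr", [[1.0], [3.0]]),                   # one prediction column
]
Xt = hstack_outputs(outputs, transformer_weights={"svr": 0.5})
print(Xt)  # [[0.2, 0.8, 0.5], [0.6, 0.4, 1.5]]
```

Here sum_n_components is 2 + 1 = 3, and only the svr block is rescaled.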

class wolpert.pipeline.StackingPipeline(steps, memory=None)
A pipeline of StackingLayer objects with a final estimator.
During fit, it sequentially applies fit_blend to each StackingLayer and feeds the transformed data into the next layer. Finally, it fits the final estimator to the last transformed data.
When generating predictions, it calls transform on each layer sequentially before feeding the data to the final estimator.
Parameters:
- steps : list of (string, estimator) tuples
List of (name, object) tuples that are chained, in the order in which they are chained, with the last object an estimator. All objects besides the last one must inherit from BaseStackableTransformer.
- memory : None, str or object with the joblib.Memory interface, optional
Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute named_steps or steps to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.
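The fit and predict flow described above can be sketched with toy stand-ins. ToyLayer, ToyFinal, pipeline_fit and pipeline_predict are hypothetical names that only mirror the control flow: fit_blend on each layer during training, and transform on each layer at prediction time. Re-indexing y with the returned indexes is an assumption about how targets stay aligned with blended rows. It is a no-op here because ToyLayer blends every row.

```python
# Illustrative control flow only; these classes are not wolpert classes.
class ToyLayer:
    """Pretend layer whose blended and transformed output is x * factor."""
    def __init__(self, factor):
        self.factor = factor

    def fit_blend(self, X, y):
        # a real layer would return out-of-fold predictions and their row indexes
        return [[v * self.factor for v in row] for row in X], list(range(len(X)))

    def transform(self, X):
        return [[v * self.factor for v in row] for row in X]


class ToyFinal:
    """Pretend final estimator: remembers its training data, predicts row sums."""
    def fit(self, X, y):
        self.seen_ = X
        return self

    def predict(self, X):
        return [sum(row) for row in X]


def pipeline_fit(layers, final, X, y):
    for layer in layers:
        X, indexes = layer.fit_blend(X, y)
        y = [y[i] for i in indexes]  # assumed: keep targets aligned with blended rows
    return final.fit(X, y)


def pipeline_predict(layers, final, X):
    for layer in layers:
        X = layer.transform(X)
    return final.predict(X)


layers = [ToyLayer(2), ToyLayer(10)]
final = pipeline_fit(layers, ToyFinal(), [[1, 2], [3, 4]], [0, 1])
print(final.seen_)                                # [[20, 40], [60, 80]]
print(pipeline_predict(layers, final, [[1, 1]]))  # [40]
```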
Examples
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVR
>>> from sklearn.linear_model import LinearRegression
>>> from wolpert.pipeline import make_stack_layer, StackingPipeline
>>> layer0 = make_stack_layer(GaussianNB(priors=None), SVR())
>>> final_estimator = LinearRegression()
>>> StackingPipeline([("l0", layer0), ("final", final_estimator)])
StackingPipeline(memory=None,
                 steps=[('l0', StackingLayer(...)),
                        ('final', LinearRegression(...))])
Attributes:
- classes_
- inverse_transform
Apply inverse transformations in reverse order.
- named_steps
- transform
Apply transforms, and transform with the final estimator.
Methods
- blend(X[, y]): Apply blends, and blend with the final estimator.
- decision_function(X): Apply transforms, and decision_function of the final estimator.
- fit(X[, y]): Fit the model.
- fit_blend(X[, y]): Applies fit_blend of last step in pipeline after transforms.
- fit_predict(X[, y]): Applies fit_predict of last step in pipeline after transforms.
- fit_transform(X[, y]): Fit the model and transform with the final estimator.
- get_params([deep]): Get parameters for this estimator.
- predict(X, **predict_params): Apply transforms to the data, and predict with the final estimator.
- predict_log_proba(X): Apply transforms, and predict_log_proba of the final estimator.
- predict_proba(X): Apply transforms, and predict_proba of the final estimator.
- score(X[, y]): Scores the model using scikit-learn's cross_validate.
- set_params(**kwargs): Set the parameters of this estimator.

blend(X, y=None, **fit_params)
Apply blends, and blend with the final estimator.
Parameters:
- X : iterable
Training data. Must fulfill input requirements of first step of the pipeline.
- y : iterable, default=None
Training targets. Must fulfill label requirements for all steps of the pipeline.
- **fit_params : dict of string -> object
Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
Returns:
- X_transformed, indexes : tuple of (sparse matrix, array-like)
X_transformed is the transformed dataset. indexes holds the indices of the input rows that correspond to the rows of X_transformed.

decision_function(X)
Apply transforms, and decision_function of the final estimator.
Parameters:
- X : iterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
Returns:
- y_score : array-like, shape = [n_samples, n_classes]

fit(X, y=None, **fit_params)
Fit the model.
Fit all the layers one after the other with fit_blend, feeding each layer's output into the next, then fit the final estimator on the last transformed data.
Parameters:
- X : iterable
Training data. Must fulfill input requirements of first step of the pipeline.
- y : iterable, default=None
Training targets. Must fulfill label requirements for all steps of the pipeline.
- **fit_params : dict of string -> object
Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
Returns:
- self : StackingPipeline
This estimator.

fit_blend(X, y=None, **fit_params)
Applies fit_blend of last step in pipeline after transforms.
Applies fit_blend of each pipeline step to the data, followed by the fit_blend method of the final estimator in the pipeline. Valid only if the final estimator implements fit_blend.
Parameters:
- X : iterable
Training data. Must fulfill input requirements of first step of the pipeline.
- y : iterable, default=None
Training targets. Must fulfill label requirements for all steps of the pipeline.
- **fit_params : dict of string -> object
Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
Returns:
- X_transformed, indexes : tuple of (sparse matrix, array-like)
X_transformed is the transformed dataset. indexes holds the indices of the input rows that correspond to the rows of X_transformed.

fit_predict(X, y=None, **fit_params)
Applies fit_predict of last step in pipeline after transforms.
Helper function. Same result as calling fit() followed by predict().
Parameters:
- X : iterable
Training data. Must fulfill input requirements of first step of the pipeline.
- y : iterable, default=None
Training targets. Must fulfill label requirements for all steps of the pipeline.
- **fit_params : dict of string -> object
Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
Returns:
- y_pred : array-like

fit_transform(X, y=None, **fit_params)
Fit the model and transform with the final estimator.
Fits all the transforms one after the other and transforms the data, then uses fit_transform on the transformed data with the final estimator.
Parameters:
- X : iterable
Training data. Must fulfill input requirements of first step of the pipeline.
- y : iterable, default=None
Training targets. Must fulfill label requirements for all steps of the pipeline.
- **fit_params : dict of string -> object
Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
Returns:
- Xt : array-like, shape = [n_samples, n_transformed_features]
Transformed samples.

get_params(deep=True)
Get parameters for this estimator.
Parameters:
- deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
- params : mapping of string to any
Parameter names mapped to their values.

inverse_transform
Apply inverse transformations in reverse order.
All estimators in the pipeline must support inverse_transform.
Parameters:
- Xt : array-like, shape = [n_samples, n_transformed_features]
Data samples, where n_samples is the number of samples and n_features is the number of features. Must fulfill input requirements of the inverse_transform method of the pipeline's last step.
Returns:
- Xt : array-like, shape = [n_samples, n_features]

predict(X, **predict_params)
Apply transforms to the data, and predict with the final estimator.
Parameters:
- X : iterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
- **predict_params : dict of string -> object
Parameters passed to the predict method called at the end of all transformations in the pipeline. Note that while this may be used to return uncertainties from some models with return_std or return_cov, uncertainties that are generated by the transformations in the pipeline are not propagated to the final estimator.
Returns:
- y_pred : array-like

predict_log_proba(X)
Apply transforms, and predict_log_proba of the final estimator.
Parameters:
- X : iterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
Returns:
- y_score : array-like, shape = [n_samples, n_classes]

predict_proba(X)
Apply transforms, and predict_proba of the final estimator.
Parameters:
- X : iterable
Data to predict on. Must fulfill input requirements of first step of the pipeline.
Returns:
- y_proba : array-like, shape = [n_samples, n_classes]

score(X, y=None, **validation_args)
Scores the model using scikit-learn's cross_validate.
Blends the whole dataset and then uses the final estimator to produce a score.
Parameters:
- X : iterable
Training data. Must fulfill input requirements of first step of the pipeline.
- y : iterable, default=None
Training targets. Must fulfill label requirements for all steps of the pipeline.
- **validation_args : dictionary
Arguments to be passed to cross_validate.
Returns:
- scores : dict of float arrays of shape (n_splits,)
Array of scores of the estimator for each run of the cross validation.
A dict of arrays containing the score/time arrays for each scorer is returned. The possible keys for this dict are:
- test_score
The score array for test scores on each cv split.
- train_score
The score array for train scores on each cv split. This is available only if the return_train_score parameter is True.
- fit_time
The time for fitting the estimator on the train set for each cv split.
- score_time
The time for scoring the estimator on the test set for each cv split. (Note that the time for scoring on the train set is not included even if return_train_score is set to True.)
- estimator
The estimator objects for each cv split. This is available only if the return_estimator parameter is set to True.
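The two stages of score (blend once, then cross-validate the final estimator on the blended features) can be sketched in plain Python. Everything here is hypothetical. MeanRegressor stands in for the final estimator, X_blend for the already-blended dataset, and the interleaved folds are illustrative. Negated mean absolute error is used for simplicity, whereas scikit-learn's cross_validate uses the estimator's own score method unless a scoring argument is given. With scikit-learn available, the second stage would correspond roughly to cross_validate(final_estimator, X_blend, y, **validation_args).

```python
# Hypothetical sketch of the two-stage scoring scheme; not wolpert's implementation.
class MeanRegressor:
    """Toy final estimator that predicts the mean of its training targets."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def kfold_indices(n, n_splits):
    """Yield (train, test) index lists using interleaved folds."""
    for k in range(n_splits):
        test = list(range(k, n, n_splits))
        test_set = set(test)
        yield [i for i in range(n) if i not in test_set], test


def cross_validate_final(make_final, X_blend, y, n_splits=3):
    """Cross-validate the final estimator on already-blended features."""
    test_scores = []
    for train, test in kfold_indices(len(X_blend), n_splits):
        model = make_final().fit([X_blend[i] for i in train], [y[i] for i in train])
        preds = model.predict([X_blend[i] for i in test])
        # negated mean absolute error, so that larger is better
        mae = sum(abs(p - y[i]) for p, i in zip(preds, test)) / len(test)
        test_scores.append(-mae)
    return {"test_score": test_scores}


# Stage 1 (assumed already done): blended features for the whole dataset.
X_blend = [[0.1], [0.9], [2.2], [2.8], [4.1], [5.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# Stage 2: cross-validate the final estimator on those features.
scores = cross_validate_final(MeanRegressor, X_blend, y)
print(scores)  # {'test_score': [-1.5, -1.5, -1.5]}
```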

set_params(**kwargs)
Set the parameters of this estimator.
Valid parameter keys can be listed with get_params().
Returns:
- self

transform
Apply transforms, and transform with the final estimator.
This also works where the final estimator is None: all prior transformations are applied.
Parameters:
- X : iterable
Data to transform. Must fulfill input requirements of first step of the pipeline.
Returns:
- Xt : array-like, shape = [n_samples, n_transformed_features]

wolpert.pipeline.make_stack_layer(*estimators, **kwargs)
Creates a single stack layer to be used in a stacked ensemble.
Parameters:
- *estimators : list
List of estimators to be wrapped and used in a layer.
- restack : bool, optional (default=False)
Whether to repeat the layer input in the output.
- n_jobs : int, optional (default=1)
Number of jobs to be passed to StackingLayer. Each job will be responsible for blending one of the estimators.
- blending_wrapper : string or Wrapper object, optional (default='cv')
The strategy to be used when blending. Possible string values are 'cv' and 'holdout'. If a wrapper object is passed, it will be used instead.
Returns:
- l : StackingLayer
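The restack option is described only as repeating the layer input in the output. The sketch below illustrates that idea under two assumptions. First, the original features are placed before the blended predictions; the real column order is not documented here. Second, indexes selects which input rows survive blending. restack_output is a hypothetical helper, not a wolpert function.

```python
# Hypothetical sketch of restack=True; the column order is an assumption.
def restack_output(X, blended, indexes):
    """Repeat the input features of each blended row in front of its predictions."""
    return [list(X[i]) + list(row) for i, row in zip(indexes, blended)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
blended = [[0.7], [0.1], [0.4]]  # e.g. one out-of-fold prediction per row
print(restack_output(X, blended, [0, 1, 2]))
# [[1.0, 2.0, 0.7], [3.0, 4.0, 0.1], [5.0, 6.0, 0.4]]
```

Restacking lets the next layer (or the final estimator) see both the raw features and the base models' predictions, at the cost of a wider input.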
Examples
>>> from sklearn.naive_bayes import GaussianNB
>>> from sklearn.svm import SVR
>>> from wolpert.pipeline import make_stack_layer
>>> make_stack_layer(GaussianNB(priors=None), SVR())
StackingLayer(n_jobs=1,
              transformer_list=[('gaussiannb',
                                 CVStackableTransformer(cv=3,
                                                        estimator=GaussianNB(...),
                                                        method='auto',
                                                        n_cv_jobs=1,
                                                        scoring=None,
                                                        verbose=False)),
                                ('svr',
                                 CVStackableTransformer(cv=3,
                                                        estimator=SVR(...),
                                                        method='auto',
                                                        n_cv_jobs=1,
                                                        scoring=None,
                                                        verbose=False))],
              transformer_weights=None)
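The transformer names in the output above ('gaussiannb', 'svr') look like lowercased class names. The sketch below shows that naming scheme, assuming make_stack_layer follows the convention of scikit-learn's make_union. That convention also appends '-1', '-2', ... when the same class appears more than once. How wolpert itself handles duplicates is not documented on this page. default_names is hypothetical, and the two empty classes stand in for the scikit-learn estimators so the snippet runs on its own.

```python
# Hypothetical naming sketch modelled on scikit-learn's make_union convention.
from collections import Counter


def default_names(estimators):
    """Lowercased class names, suffixed with -1, -2, ... when duplicated."""
    names = [type(est).__name__.lower() for est in estimators]
    counts = Counter(names)
    seen = Counter()
    result = []
    for name in names:
        if counts[name] > 1:
            seen[name] += 1
            result.append(f"{name}-{seen[name]}")
        else:
            result.append(name)
    return result


class GaussianNB:  # stand-in so the sketch runs without scikit-learn
    pass


class SVR:  # stand-in so the sketch runs without scikit-learn
    pass


print(default_names([GaussianNB(), SVR()]))         # ['gaussiannb', 'svr']
print(default_names([SVR(), GaussianNB(), SVR()]))  # ['svr-1', 'gaussiannb', 'svr-2']
```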