wolpert.wrappers.cross_val module¶
Stacked ensemble wrapper using cross validation
class wolpert.wrappers.cross_val.CVStackableTransformer(estimator, method='auto', scoring=None, verbose=False, cv=3, n_cv_jobs=1)[source]¶

Transformer to turn estimators into meta-estimators for model stacking.
This class uses the k-fold predictions to "blend" the estimator. This allows subsequent layers to use all the data for training. The drawback is that, since the meta-estimator is re-trained on the whole training set, the train and test sets for subsequent layers won't be generated from exactly the same probability distribution. Even so, this method proves useful in practice.
Parameters:

- estimator : predictor
  The estimator to be blended.

- method : string, optional (default='auto')
  This method will be called on the estimator to produce the output of
  transform. If the method is 'auto', will try to invoke, for each
  estimator, predict_proba, decision_function or predict, in that order.

- scoring : string, callable, dict or None (default=None)
  If not None, will save scores generated by the scoring object on the
  scores_ attribute each time blend is called. Note: for performance
  reasons, the score here will differ slightly from the actual mean of
  each fold's scores, since a single score is computed from the
  cross_val_predict output.

- verbose : bool (default=False)
  When True, prints scores to stdout. scoring must not be None.

- cv : int, cross-validation generator or an iterable, optional (default=3)
  Determines the cross-validation splitting strategy used to generate
  features for training the next layer of the stacked ensemble, i.e.
  during blend. Possible inputs for cv are:

  - None, to use the default 3-fold cross-validation;
  - an integer, to specify the number of folds;
  - an object to be used as a cross-validation generator;
  - an iterable yielding train/test splits.

  For integer/None inputs, if the estimator is a classifier and y is
  either binary or multiclass, sklearn.model_selection.StratifiedKFold is
  used. In all other cases, sklearn.model_selection.KFold is used.

- n_cv_jobs : int, optional (default=1)
  Number of jobs to be passed to cross_val_predict during blend.
Examples
>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import CVStackableTransformer
>>> CVStackableTransformer(GaussianNB(priors=None), cv=5,
...                        method='predict_proba')
CVStackableTransformer(cv=5, estimator=GaussianNB(priors=None),
        method='predict_proba', n_cv_jobs=1, scoring=None, verbose=False)
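The blending mechanism itself can be illustrated with plain scikit-learn: the out-of-fold predictions that blend produces are the kind returned by cross_val_predict. The sketch below shows the idea only; it is not wolpert's actual implementation.

```python
# Conceptual sketch of the "blend" step using only scikit-learn:
# out-of-fold predictions let the next layer train on every sample.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=100, n_classes=2, random_state=0)

# Each row is predicted by a model trained on the other folds, so no
# prediction leaks its own label into the next layer's training data.
blended = cross_val_predict(GaussianNB(), X, y, cv=5, method='predict_proba')
print(blended.shape)  # (100, 2)
```

The next layer of the stack can then be fit on `blended` in place of the raw features.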
Methods

- blend(X, y, **fit_params) : Transform dataset using cross validation.
- fit(X[, y]) : Fit the estimator.
- fit_blend(X, y, **fit_params) : Transform dataset using cross validation and fit the estimator to the entire dataset.
- fit_transform(X[, y]) : Fit to data, then transform it.
- get_params([deep]) : Get parameters for this estimator.
- set_params(**params) : Set the parameters of this estimator.
- transform(*args, **kwargs) : Transform the whole dataset.
blend(X, y, **fit_params)[source]¶

Transform dataset using cross validation.
Parameters:

- X : array-like or sparse matrix, shape=(n_samples, n_features)
  Input data. Use dtype=np.float32 for maximum efficiency.

- y : array-like, shape=[n_samples]
  Target values.

- **fit_params : parameters to be passed to the base estimator.

Returns:

- X_transformed, indexes : tuple of (sparse matrix, array-like)
  X_transformed is the transformed dataset; indexes are the indexes of
  the transformed data in the input.
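The return contract can be sketched with scikit-learn pieces, assuming rows come back in input order (as cross_val_predict guarantees). This is a hypothetical stand-in for illustration, not wolpert's source code.

```python
# Illustration of blend's documented return shape, (X_transformed, indexes),
# built from scikit-learn alone.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=60, random_state=1)
X_transformed = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=3, method='predict_proba')
indexes = np.arange(X.shape[0])  # row i of X_transformed maps to X[i]
print(X_transformed.shape, indexes[:3])  # (60, 2) [0 1 2]
```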
fit(X, y=None, **fit_params)[source]¶

Fit the estimator.
Parameters:

- X : {array-like, sparse matrix}, shape=[n_samples, n_features]
  Training vectors, where n_samples is the number of samples and
  n_features is the number of features.

- y : array-like, shape=[n_samples]
  Target values.

- **fit_params : parameters to be passed to the base estimator.

Returns:

- self : object
fit_blend(X, y, **fit_params)[source]¶

Transform dataset using cross validation and fit the estimator to the entire dataset.
Parameters:

- X : array-like or sparse matrix, shape=(n_samples, n_features)
  Input data. Use dtype=np.float32 for maximum efficiency.

- y : array-like, shape=[n_samples]
  Target values.

- **fit_params : parameters to be passed to the base estimator.

Returns:

- X_transformed, indexes : tuple of (sparse matrix, array-like)
  X_transformed is the transformed dataset; indexes are the indexes of
  the transformed data in the input.
fit_transform(X, y=None, **fit_params)[source]¶

Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:

- X : numpy array of shape [n_samples, n_features]
  Training set.

- y : numpy array of shape [n_samples]
  Target values.

Returns:

- X_new : numpy array of shape [n_samples, n_features_new]
  Transformed array.
get_params(deep=True)[source]¶

Get parameters for this estimator.
Parameters:

- deep : boolean, optional
  If True, will return the parameters for this estimator and contained
  subobjects that are estimators.

Returns:

- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)[source]¶

Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:

- self
transform(*args, **kwargs)[source]¶

Transform the whole dataset.
Parameters:

- X : array-like or sparse matrix, shape=(n_samples, n_features)
  Input data to be transformed. Use dtype=np.float32 for maximum
  efficiency. Sparse matrices are also supported; use sparse csr_matrix
  for maximum efficiency.

Returns:

- X_transformed : sparse matrix, shape=(n_samples, n_out)
  Transformed dataset.
class wolpert.wrappers.cross_val.CVWrapper(default_method='auto', default_scoring=None, verbose=False, cv=3, n_cv_jobs=1)[source]¶

Helper class to wrap estimators with CVStackableTransformer.
Parameters:

- default_method : string, optional (default='auto')
  This method will be called on the estimator to produce the output of
  transform. If the method is 'auto', will try to invoke, for each
  estimator, predict_proba, decision_function or predict, in that order.

- default_scoring : string, callable, dict or None (default=None)
  If not None, will save scores generated by the scoring object on the
  scores_ attribute each time blend is called.

- verbose : bool (default=False)
  When True, prints scores to stdout. scoring must not be None.

- cv : int, cross-validation generator or an iterable, optional (default=3)
  Determines the cross-validation splitting strategy used to generate
  features for training the next layer of the stacked ensemble, i.e.
  during blend. Possible inputs for cv are:

  - None, to use the default 3-fold cross-validation;
  - an integer, to specify the number of folds;
  - an object to be used as a cross-validation generator;
  - an iterable yielding train/test splits.

  For integer/None inputs, if the estimator is a classifier and y is
  either binary or multiclass, sklearn.model_selection.StratifiedKFold is
  used. In all other cases, sklearn.model_selection.KFold is used.

- n_cv_jobs : int, optional (default=1)
  Number of jobs to be passed to cross_val_predict during blend.
Methods

- wrap_estimator(estimator[, method]) : Wraps an estimator and returns a transformer that is suitable for stacking.
wrap_estimator(estimator, method=None, **kwargs)[source]¶

Wraps an estimator and returns a transformer that is suitable for stacking.
Parameters:

- estimator : predictor
  The estimator to be blended.

- method : string or None, optional (default=None)
  If not None, this method will be called on the estimator instead of
  default_method to produce the output of transform. If the method is
  'auto', will try to invoke, for each estimator, predict_proba,
  decision_function or predict, in that order.

Returns:

- t : CVStackableTransformer
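The default-versus-override resolution of method can be sketched without wolpert installed. The class below is a hypothetical stand-in that only mirrors the documented behavior; it is not wolpert's source code.

```python
# Hypothetical sketch of CVWrapper.wrap_estimator's method resolution:
# a per-call ``method`` overrides the wrapper-wide ``default_method``.
class WrapperSketch:
    def __init__(self, default_method='auto', cv=3):
        self.default_method = default_method
        self.cv = cv

    def wrap_estimator(self, estimator, method=None):
        chosen = method if method is not None else self.default_method
        # the real wrapper would return a CVStackableTransformer here
        return {'estimator': estimator, 'method': chosen, 'cv': self.cv}

w = WrapperSketch(default_method='predict_proba', cv=5)
print(w.wrap_estimator(object())['method'])                    # predict_proba
print(w.wrap_estimator(object(), method='predict')['method'])  # predict
```

This pattern lets one CVWrapper apply shared defaults (scoring, cv, n_cv_jobs) across every estimator in a stacking layer while still allowing per-estimator overrides.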