wolpert.wrappers.holdout module

Stacked ensemble wrapper using a holdout strategy

class wolpert.wrappers.holdout.HoldoutStackableTransformer(estimator, method='auto', scoring=None, verbose=False, holdout_size=0.1, random_state=42, fit_to_all_data=False)[source]

Transformer to turn estimators into meta-estimators for model stacking

During blending, the estimator trains on one part of the dataset and generates predictions for the other part. This guards against data leakage, but subsequent layers will have less data to train on.

Beware that the blend method will return a dataset that is smaller than the original one.

Parameters:
estimator : predictor

The estimator to be blended.

method : string, optional (default=’auto’)

This method will be called on the estimator to produce the output of transform. If the method is 'auto', the transformer will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

scoring : string, callable, dict or None (default=None)

If not None, will save scores generated by the scoring object on the scores_ attribute each time blend is called.

verbose : bool (default=False)

When true, prints scores to stdout. scoring must not be None.

holdout_size : float, optional (default=.1)

Fraction of the dataset to be held out from training. This fraction also determines the size of the blended dataset returned by blend.

random_state : int or RandomState instance, optional (default=42)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

fit_to_all_data : bool, optional (default=False)

When true, will fit the final estimator to the whole dataset. If not, fits only to the non-holdout set. This only affects the fit and fit_blend steps.

Examples

>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import HoldoutStackableTransformer
>>> HoldoutStackableTransformer(GaussianNB(priors=None),
...                             holdout_size=.2,
...                             method='predict_proba')
HoldoutStackableTransformer(estimator=GaussianNB(priors=None),
                            fit_to_all_data=False,
                            holdout_size=0.2,
                            method='predict_proba',
                            random_state=42,
                            scoring=None,
                            verbose=False)

Methods

blend(X, y, **fit_params) Transform dataset using a train-test split.
fit(X, y, **fit_params) Fit the estimator to the training set.
fit_blend(X, y, **fit_params) Fit to and transform dataset.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(*args, **kwargs) Transform the whole dataset.
blend(X, y, **fit_params)[source]

Transform dataset using a train-test split.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the base estimator and generate the blended predictions. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes contains the positions of the transformed rows in the input dataset.
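The train/holdout mechanics behind blend can be sketched with plain NumPy. This is an illustrative stand-in, not wolpert internals: the shuffled split and the trivial mean "model" are hypothetical, and the real transformer calls the estimator's configured method instead.

```python
import numpy as np

rng = np.random.RandomState(42)
X = rng.rand(10, 3)
y = rng.randint(0, 2, size=10)

holdout_size = 0.2
n_holdout = int(len(X) * holdout_size)  # 2 rows held out

# Shuffle the indices, then reserve the last n_holdout rows as the holdout set.
perm = rng.permutation(len(X))
train_idx, holdout_idx = perm[:-n_holdout], perm[-n_holdout:]

# "Fit" on the training part only (a trivial mean predictor here)...
mean_prediction = y[train_idx].mean()

# ...and generate predictions for the holdout part only, so no row is
# predicted by a model that saw it during training.
X_transformed = np.full((n_holdout, 1), mean_prediction)

# blend returns both pieces, so the caller can align the smaller
# transformed dataset with the rows of the original input.
result = (X_transformed, holdout_idx)
```

Note that X_transformed has only n_holdout rows, which is why the blended dataset is smaller than the original.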

fit(X, y, **fit_params)[source]

Fit the estimator to the training set.

If self.fit_to_all_data is true, fits to the whole dataset. Otherwise, fits only to the part that is not in the holdout set during blending.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
self : object
fit_blend(X, y, **fit_params)[source]

Fit to and transform dataset.

If self.fit_to_all_data is true, fits to the whole dataset. Otherwise, fits only to the part that is not in the holdout set during blending.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes contains the positions of the transformed rows in the input dataset.

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(*args, **kwargs)[source]

Transform the whole dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data to be transformed. Use dtype=np.float32 for maximum efficiency. Sparse matrices are also supported, use sparse csr_matrix for maximum efficiency.

Returns:
X_transformed : sparse matrix, shape=(n_samples, n_out)

Transformed dataset.

class wolpert.wrappers.holdout.HoldoutWrapper(default_method='auto', default_scoring=None, verbose=False, holdout_size=0.1, random_state=42, fit_to_all_data=False)[source]

Helper class to wrap estimators with HoldoutStackableTransformer

Parameters:
default_method : string, optional (default=’auto’)

This method will be called on the estimator to produce the output of transform. If the method is 'auto', the transformer will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

default_scoring : string, callable, dict or None (default=None)

If not None, will save scores generated by the scoring object on the scores_ attribute each time blend is called.

verbose : bool (default=False)

When true, prints scores to stdout. default_scoring must not be None.

holdout_size : float, optional (default=.1)

Fraction of the dataset to be held out from training. This fraction also determines the size of the blended dataset returned by blend.

random_state : int or RandomState instance, optional (default=42)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

fit_to_all_data : bool, optional (default=False)

When true, will fit the final estimator to the whole dataset. If not, fits only to the non-holdout set. This only affects the fit and fit_blend steps.

Methods

wrap_estimator(estimator[, method]) Wraps an estimator and returns a transformer that is suitable for stacking.
wrap_estimator(estimator, method=None, **kwargs)[source]

Wraps an estimator and returns a transformer that is suitable for stacking.

Parameters:
estimator : predictor

The estimator to be blended.

method : string or None, optional (default=None)

If not None, this method will be called on the estimator instead of default_method to produce the output of transform. If the method is 'auto', the transformer will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

Returns:
t : HoldoutStackableTransformer
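The helper-class pattern that wrap_estimator follows can be sketched in plain Python. MiniHoldoutWrapper below is a hypothetical stand-in, not the wolpert implementation: the wrapper stores shared defaults once and applies them to every estimator it wraps, with a per-call method overriding default_method.

```python
# Hypothetical sketch of the wrapper pattern: shared defaults stored once,
# then stamped onto every wrapped estimator. Not the actual wolpert code.
class MiniHoldoutWrapper:
    def __init__(self, default_method="auto", holdout_size=0.1):
        self.default_method = default_method
        self.holdout_size = holdout_size

    def wrap_estimator(self, estimator, method=None):
        # A per-call method overrides the wrapper-wide default_method.
        return {
            "estimator": estimator,
            "method": method if method is not None else self.default_method,
            "holdout_size": self.holdout_size,
        }

wrapper = MiniHoldoutWrapper(default_method="predict_proba", holdout_size=0.2)
wrapped = wrapper.wrap_estimator("some_estimator")               # uses the default
overridden = wrapper.wrap_estimator("other", method="predict")   # per-call override
```

This keeps a whole layer of estimators configured consistently while still allowing one-off overrides for estimators that, for example, lack predict_proba.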