wolpert.wrappers.holdout module
Stacked ensemble wrapper using holdout strategy
class wolpert.wrappers.holdout.HoldoutStackableTransformer(estimator, method='auto', scoring=None, verbose=False, holdout_size=0.1, random_state=42, fit_to_all_data=False)

Transformer to turn estimators into meta-estimators for model stacking.
During blending, trains on one part of the dataset and generates predictions for the other part. This makes it more robust against leaks, but the subsequent layers will have less data to train on.
Beware that the blend method will return a dataset that is smaller than the original one.

Parameters:

- estimator : predictor
  The estimator to be blended.
- method : string, optional (default='auto')
  This method will be called on the estimator to produce the output of transform. If the method is 'auto', will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.
- scoring : string, callable, dict or None (default=None)
  If not None, scores generated by the scoring object will be saved to the scores_ attribute each time blend is called.
- verbose : bool (default=False)
  When true, prints scores to stdout. scoring must not be None.
- holdout_size : float, optional (default=0.1)
  Fraction of the dataset to hold out of training. The holdout set is what gets blended, so the blended dataset will contain this fraction of the samples.
- random_state : int or RandomState instance, optional (default=42)
  If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- fit_to_all_data : bool, optional (default=False)
  When true, will fit the final estimator to the whole dataset. If not, fits only to the non-holdout set. This only affects the fit and fit_blend steps.
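The resolution order described for method='auto' can be sketched as follows. This is an illustration of the documented rule only, not wolpert's actual code; the helper name resolve_auto_method is hypothetical:

```python
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import LinearSVC

def resolve_auto_method(estimator):
    """Illustrative sketch of the method='auto' rule: prefer
    predict_proba, then decision_function, then predict."""
    for name in ("predict_proba", "decision_function", "predict"):
        if hasattr(estimator, name):
            return name
    raise TypeError("estimator exposes no prediction method")

# LogisticRegression exposes predict_proba; LinearSVC only
# decision_function; LinearRegression only predict.
print(resolve_auto_method(LogisticRegression()))  # predict_proba
print(resolve_auto_method(LinearSVC()))           # decision_function
print(resolve_auto_method(LinearRegression()))    # predict
```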
Examples
>>> from sklearn.naive_bayes import GaussianNB
>>> from wolpert.wrappers import HoldoutStackableTransformer
>>> HoldoutStackableTransformer(GaussianNB(priors=None),
...                             holdout_size=.2,
...                             method='predict_proba')
HoldoutStackableTransformer(estimator=GaussianNB(priors=None),
        fit_to_all_data=False, holdout_size=0.2,
        method='predict_proba', random_state=42, scoring=None,
        verbose=False)
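The effect of holdout_size on the blended dataset can be sketched with plain NumPy. This is a conceptual illustration, not wolpert's implementation; the shuffling and rounding details are assumptions:

```python
import numpy as np

# Illustrative sketch of the holdout split behind blend(). With
# holdout_size=0.2 and 10 samples, the base estimator trains on 8 rows
# and predicts on the 2 held-out rows, so the blended dataset is
# smaller than the input.
rng = np.random.RandomState(42)
n_samples, holdout_size = 10, 0.2
indices = rng.permutation(n_samples)
n_holdout = int(round(holdout_size * n_samples))  # rounding is an assumption
holdout_idx, train_idx = indices[:n_holdout], indices[n_holdout:]
print(len(train_idx), len(holdout_idx))  # 8 2
```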
Methods
blend(X, y, **fit_params)        Transform dataset using a train-test split.
fit(X, y, **fit_params)          Fit the estimator to the training set.
fit_blend(X, y, **fit_params)    Fit to and transform dataset.
fit_transform(X[, y])            Fit to data, then transform it.
get_params([deep])               Get parameters for this estimator.
set_params(**params)             Set the parameters of this estimator.
transform(*args, **kwargs)       Transform the whole dataset.
blend(X, y, **fit_params)

Transform dataset using a train-test split.
Parameters:

- X : array-like or sparse matrix, shape=(n_samples, n_features)
  Input data used to fit the base estimator and generate the blended predictions. Use dtype=np.float32 for maximum efficiency.
- y : array-like, shape = [n_samples]
  Target values.
- **fit_params : parameters to be passed to the base estimator.
Returns:

- X_transformed, indexes : tuple of (sparse matrix, array-like)
  X_transformed is the transformed dataset; indexes are the indices of the transformed rows in the input dataset.
fit(X, y, **fit_params)

Fit the estimator to the training set.
If self.fit_to_all_data is true, will fit to the whole dataset. If not, will fit only to the part that is not in the holdout set during blending.
Parameters:

- X : {array-like, sparse matrix}, shape = [n_samples, n_features]
  Training vectors, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape = [n_samples]
  Target values.
- **fit_params : parameters to be passed to the base estimator.

Returns:

- self : object
fit_blend(X, y, **fit_params)

Fit to and transform dataset.
If self.fit_to_all_data is true, will fit to the whole dataset. If not, will fit only to the part that is not in the holdout set during blending.
Parameters:

- X : {array-like, sparse matrix}, shape = [n_samples, n_features]
  Training vectors, where n_samples is the number of samples and n_features is the number of features.
- y : array-like, shape = [n_samples]
  Target values.
- **fit_params : parameters to be passed to the base estimator.

Returns:

- X_transformed, indexes : tuple of (sparse matrix, array-like)
  X_transformed is the transformed dataset; indexes are the indices of the transformed rows in the input dataset.
fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:

- X : numpy array of shape [n_samples, n_features]
  Training set.
- y : numpy array of shape [n_samples]
  Target values.

Returns:

- X_new : numpy array of shape [n_samples, n_features_new]
  Transformed array.
get_params(deep=True)

Get parameters for this estimator.
Parameters:

- deep : boolean, optional
  If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

- params : mapping of string to any
  Parameter names mapped to their values.
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:

- self
transform(*args, **kwargs)

Transform the whole dataset.

Parameters:

- X : array-like or sparse matrix, shape=(n_samples, n_features)
  Input data to be transformed. Use dtype=np.float32 for maximum efficiency. Sparse matrices are also supported; use sparse csr_matrix for maximum efficiency.

Returns:

- X_transformed : sparse matrix, shape=(n_samples, n_out)
  Transformed dataset.
class wolpert.wrappers.holdout.HoldoutWrapper(default_method='auto', default_scoring=None, verbose=False, holdout_size=0.1, random_state=42, fit_to_all_data=False)

Helper class to wrap estimators with HoldoutStackableTransformer.
Parameters:

- default_method : string, optional (default='auto')
  This method will be called on the estimator to produce the output of transform. If the method is 'auto', will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.
- default_scoring : string, callable, dict or None (default=None)
  If not None, scores generated by the scoring object will be saved to the scores_ attribute each time blend is called.
- verbose : bool (default=False)
  When true, prints scores to stdout. scoring must not be None.
- holdout_size : float, optional (default=0.1)
  Fraction of the dataset to hold out of training. The holdout set is what gets blended, so the blended dataset will contain this fraction of the samples.
- random_state : int or RandomState instance, optional (default=42)
  If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- fit_to_all_data : bool, optional (default=False)
  When true, will fit the final estimator to the whole dataset. If not, fits only to the non-holdout set. This only affects the fit and fit_blend steps.
Methods
wrap_estimator(estimator[, method])    Wraps an estimator and returns a transformer that is suitable for stacking.
wrap_estimator(estimator, method=None, **kwargs)

Wraps an estimator and returns a transformer that is suitable for stacking.

Parameters:

- estimator : predictor
  The estimator to be blended.
- method : string or None, optional (default=None)
  If not None, this method will be called on the estimator instead of default_method to produce the output of transform. If the method is 'auto', will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.
Returns:

- t : HoldoutStackableTransformer
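The default-versus-override behavior of wrap_estimator can be sketched with a simplified stand-in class. This is an assumption-level illustration of the documented semantics, not wolpert's code; SimpleHoldoutWrapper is a hypothetical name:

```python
# Simplified stand-in showing how a HoldoutWrapper-like helper applies
# its wrapper-wide defaults when wrapping an estimator.
class SimpleHoldoutWrapper:
    def __init__(self, default_method="auto", holdout_size=0.1):
        self.default_method = default_method
        self.holdout_size = holdout_size

    def wrap_estimator(self, estimator, method=None):
        # method=None falls back to the wrapper-wide default_method
        chosen = self.default_method if method is None else method
        return {"estimator": estimator, "method": chosen,
                "holdout_size": self.holdout_size}

wrapper = SimpleHoldoutWrapper(default_method="predict_proba", holdout_size=0.2)
t1 = wrapper.wrap_estimator("clf")                    # uses the default
t2 = wrapper.wrap_estimator("clf", method="predict")  # explicit override
print(t1["method"], t2["method"])  # predict_proba predict
```

Every transformer produced this way shares the wrapper's holdout_size and random_state, which keeps all estimators in a stacking layer blended on the same split.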