wolpert.wrappers.time_series module

class wolpert.wrappers.time_series.TimeSeriesSplit(offset=0, test_set_size=1, min_train_size=1, max_train_size=None)[source]

Time Series cross-validator

Provides train/test indices to split time series data samples that are observed at fixed time intervals. In each split, the test indices must be higher than the training indices, so shuffling is inappropriate for this cross-validator.

This cross-validation object is a variation of KFold. In the kth split, it returns the first k folds as the train set and the (k+1)th fold as the test set.

Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them.

Read more in the User Guide.

Parameters:
offset : int, optional (default=0)

Number of rows to skip between the end of the training set and the start of the test set.

test_set_size : int, optional (default=1)

Size of the test set. This is also the number of rows added to the training set at each iteration.

min_train_size : int, optional (default=1)

Minimum size for a single training set.

max_train_size : int, optional (default=None)

Maximum size for a single training set.
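
Examples

The following is a usage sketch rather than recorded library output: the indices shown follow from the parameter descriptions above and should be treated as illustrative.

>>> import numpy as np
>>> from wolpert.wrappers.time_series import TimeSeriesSplit
>>> X = np.arange(6).reshape(6, 1)
>>> tscv = TimeSeriesSplit(min_train_size=2, test_set_size=2)
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [0 1] TEST: [2 3]
TRAIN: [0 1 2 3] TEST: [4 5]

Each training set contains every row before the test set, and each test set holds the next test_set_size rows; with offset > 0, the rows between the two would be skipped.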

Methods

split(X[, y, groups]) Generate indices to split data into training and test set.
split(X, y=None, groups=None)[source]

Generate indices to split data into training and test set.

Parameters:
X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape (n_samples,)

Always ignored, exists for compatibility.

groups : array-like, with shape (n_samples,), optional

Always ignored, exists for compatibility.

Yields:
train : ndarray

The training set indices for that split.

test : ndarray

The testing set indices for that split.

class wolpert.wrappers.time_series.TimeSeriesStackableTransformer(estimator, method='auto', scoring=None, verbose=False, offset=0, test_set_size=1, min_train_size=1, max_train_size=None, n_cv_jobs=1)[source]

Transformer to turn estimators into meta-estimators for model stacking

Each split is composed of a training set containing the first t rows of the data set and a test set composed of rows t+k to t+k+n, where k and n are the offset and test_set_size parameters, respectively.

Parameters:
estimator : predictor

The estimator to be blended.

method : string, optional (default='auto')

This method will be called on the estimator to produce the output of transform. If the method is 'auto', the transformer will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

scoring : string, callable, dict or None (default=None)

If not None, scores generated by the scoring object will be saved to the scores_ attribute each time blend is called.

verbose : bool (default=False)

When True, prints scores to stdout. scoring must not be None.

offset : int, optional (default=0)

Number of rows to skip between the end of the training set and the start of the test set.

test_set_size : int, optional (default=1)

Size of the test set. This is also the number of rows added to the training set at each iteration.

min_train_size : int, optional (default=1)

Minimum size for a single training set.

max_train_size : int, optional (default=None)

Maximum size for a single training set.

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.
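
Examples

A minimal blending sketch, assuming only the constructor and blend signatures documented on this page; LinearRegression stands in for any scikit-learn compatible estimator.

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from wolpert.wrappers.time_series import TimeSeriesStackableTransformer
>>> X = np.random.RandomState(0).rand(20, 3)
>>> y = np.random.RandomState(1).rand(20)
>>> t = TimeSeriesStackableTransformer(LinearRegression(),
...                                    test_set_size=5, min_train_size=5)
>>> Xt, indexes = t.blend(X, y)

Under the split scheme described above, this produces three splits (training on rows [0, 5), [0, 10) and [0, 15)), so Xt stacks the out-of-sample predictions for rows 5 through 19 and indexes maps each of them back to a row of X.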

Methods

blend(X, y, **fit_params) Transform dataset using time series split.
fit(X[, y]) Fit the estimator.
fit_blend(X, y, **fit_params) Transform the dataset using cross validation and fit the estimator to the entire dataset.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(*args, **kwargs) Transform the whole dataset.
blend(X, y, **fit_params)[source]

Transform dataset using time series split.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the base estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds, for each row of X_transformed, the index of the corresponding row in the input data.

fit(X, y=None, **fit_params)[source]

Fit the estimator.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
self : object
fit_blend(X, y, **fit_params)[source]

Transform the dataset using cross validation and fit the estimator to the entire dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the base estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds, for each row of X_transformed, the index of the corresponding row in the input data.
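
Per the description above, fit_blend behaves like blend followed by a fit on the full data set; the sketch below illustrates that contract as an assumption about equivalent behavior, not code from the library source.

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from wolpert.wrappers.time_series import TimeSeriesStackableTransformer
>>> X = np.random.RandomState(0).rand(20, 3)
>>> y = np.random.RandomState(1).rand(20)
>>> t = TimeSeriesStackableTransformer(LinearRegression(),
...                                    test_set_size=5, min_train_size=5)
>>> Xt, indexes = t.fit_blend(X, y)  # same return value as blend(X, y)
>>> Xt_full = t.transform(X)         # estimator is now fit, so transform works

This is the typical call during the training stage of a stacked ensemble: the blended predictions feed the next layer, while the fitted estimator is kept for transforming unseen data.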

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(*args, **kwargs)[source]

Transform the whole dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data to be transformed. Use dtype=np.float32 for maximum efficiency. Sparse matrices are also supported; use sparse csr_matrix for maximum efficiency.

Returns:
X_transformed : sparse matrix, shape=(n_samples, n_out)

Transformed dataset.
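
A short usage sketch; per the shapes documented above, transform yields one output row per input row once the transformer has been fit.

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from wolpert.wrappers.time_series import TimeSeriesStackableTransformer
>>> X = np.random.RandomState(0).rand(20, 3)
>>> y = np.random.RandomState(1).rand(20)
>>> t = TimeSeriesStackableTransformer(LinearRegression()).fit(X, y)
>>> Xt = t.transform(X)  # shape (20, n_out)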

class wolpert.wrappers.time_series.TimeSeriesWrapper(default_method='auto', default_scoring=None, verbose=False, offset=0, test_set_size=1, min_train_size=1, max_train_size=None, n_cv_jobs=1)[source]

Helper class to wrap estimators with TimeSeriesStackableTransformer

Parameters:
default_method : string, optional (default='auto')

This method will be called on the estimator to produce the output of transform. If the method is 'auto', the transformer will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

default_scoring : string, callable, dict or None (default=None)

If not None, scores generated by the scoring object will be saved to the scores_ attribute each time blend is called.

verbose : bool (default=False)

When True, prints scores to stdout. default_scoring must not be None.

offset : int, optional (default=0)

Number of rows to skip between the end of the training set and the start of the test set.

test_set_size : int, optional (default=1)

Size of the test set. This is also the number of rows added to the training set at each iteration.

min_train_size : int, optional (default=1)

Minimum size for a single training set.

max_train_size : int, optional (default=None)

Maximum size for a single training set.

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.

Methods

wrap_estimator(estimator[, method]) Wraps an estimator and returns a transformer that is suitable for stacking.
wrap_estimator(estimator, method=None, **kwargs)[source]

Wraps an estimator and returns a transformer that is suitable for stacking.

Parameters:
estimator : predictor

The estimator to be blended.

method : string or None, optional (default=None)

If not None, this method will be called on the estimator instead of default_method to produce the output of transform. If the method is 'auto', the transformer will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

Returns:
t : TimeSeriesStackableTransformer
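
A wrapping sketch, assuming the signatures documented above; the wrapper simply supplies shared defaults for the split, scoring and method settings.

>>> from sklearn.linear_model import LinearRegression
>>> from wolpert.wrappers.time_series import TimeSeriesWrapper
>>> wrapper = TimeSeriesWrapper(test_set_size=5, min_train_size=5)
>>> t = wrapper.wrap_estimator(LinearRegression())

Here t is a TimeSeriesStackableTransformer built with the wrapper's defaults, which is convenient when several estimators in one stacking layer must share the same time series split settings.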