wolpert.wrappers.time_series module

class wolpert.wrappers.time_series.TimeSeriesSplit(offset=0, test_set_size=1, min_train_size=1, max_train_size=None)[source]

Time Series cross-validator

Provides train/test indices to split time series data samples that are observed at fixed time intervals. In each split, the test indices must be higher than the training indices, so shuffling is inappropriate for this cross-validator.

This cross-validation object is a variation of KFold. In the kth split, it returns the first k folds as the train set and the (k+1)th fold as the test set.

Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them.

Read more in the User Guide.

Parameters:
offset : int, optional (default=0)

Number of rows to skip between the end of the training set and the start of the test set.

test_set_size : int, optional (default=1)

Size of the test set. This is also the number of rows added to the training set at each iteration.

min_train_size : int, optional (default=1)

Minimum size for a single training set.

max_train_size : int, optional (default=None)

Maximum size for a single training set.
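
Examples

The following is a usage sketch rather than recorded library output: the indices shown follow from the parameter descriptions above and should be treated as illustrative.

>>> import numpy as np
>>> from wolpert.wrappers.time_series import TimeSeriesSplit
>>> X = np.arange(6).reshape(6, 1)
>>> tscv = TimeSeriesSplit(min_train_size=2, test_set_size=2)
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
TRAIN: [0 1] TEST: [2 3]
TRAIN: [0 1 2 3] TEST: [4 5]

Each training set contains every row before the test set, and each test set holds the next test_set_size rows; with offset > 0, the rows between the two would be skipped.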

Methods

split(X[, y, groups]) Generate indices to split data into training and test set.
split(X, y=None, groups=None)[source]

Generate indices to split data into training and test set.

Parameters:
X : array-like, shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape (n_samples,)

Always ignored, exists for compatibility.

groups : array-like, with shape (n_samples,), optional

Always ignored, exists for compatibility.

Yields:
train : ndarray

The training set indices for that split.

test : ndarray

The testing set indices for that split.

class wolpert.wrappers.time_series.TimeSeriesStackableTransformer(estimator, method='auto', scoring=None, verbose=False, offset=0, test_set_size=1, min_train_size=1, max_train_size=None, n_cv_jobs=1)[source]

Transformer to turn estimators into meta-estimators for model stacking

Each split is composed of a training set containing the first t rows of the data set and a test set composed of rows t+k to t+k+n, where k and n are the offset and test_set_size parameters, respectively.

Parameters:
estimator : predictor

The estimator to be blended.

method : string, optional (default='auto')

This method will be called on the estimator to produce the output of transform. If the method is 'auto', the transformer will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

scoring : string, callable, dict or None (default=None)

If not None, scores generated by the scoring object will be saved to the scores_ attribute each time blend is called.

verbose : bool (default=False)

When True, prints scores to stdout. scoring must not be None.

offset : int, optional (default=0)

Number of rows to skip between the end of the training set and the start of the test set.

test_set_size : int, optional (default=1)

Size of the test set. This is also the number of rows added to the training set at each iteration.

min_train_size : int, optional (default=1)

Minimum size for a single training set.

max_train_size : int, optional (default=None)

Maximum size for a single training set.

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.
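
Examples

A minimal blending sketch, assuming only the constructor and blend signatures documented on this page; LinearRegression stands in for any scikit-learn compatible estimator.

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from wolpert.wrappers.time_series import TimeSeriesStackableTransformer
>>> X = np.random.RandomState(0).rand(20, 3)
>>> y = np.random.RandomState(1).rand(20)
>>> t = TimeSeriesStackableTransformer(LinearRegression(),
...                                    test_set_size=5, min_train_size=5)
>>> Xt, indexes = t.blend(X, y)

Under the split scheme described above, this produces three splits (training on rows [0, 5), [0, 10) and [0, 15)), so Xt stacks the out-of-sample predictions for rows 5 through 19 and indexes maps each of them back to a row of X.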

Methods

blend(X, y, **fit_params) Transform dataset using time series split.
fit(X[, y]) Fit the estimator.
fit_blend(X, y, **fit_params) Transform the dataset using cross validation and fit the estimator to the entire dataset.
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
transform(*args, **kwargs) Transform the whole dataset.
blend(X, y, **fit_params)[source]

Transform dataset using time series split.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the base estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds, for each row of X_transformed, the index of the corresponding row in the input data.

fit(X, y=None, **fit_params)[source]

Fit the estimator.

Parameters:
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
self : object
fit_blend(X, y, **fit_params)[source]

Transform the dataset using cross validation and fit the estimator to the entire dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data used to fit the base estimator. Use dtype=np.float32 for maximum efficiency.

y : array-like, shape = [n_samples]

Target values.

**fit_params : parameters to be passed to the base estimator.
Returns:
X_transformed, indexes : tuple of (sparse matrix, array-like)

X_transformed is the transformed dataset. indexes holds, for each row of X_transformed, the index of the corresponding row in the input data.
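
Per the description above, fit_blend behaves like blend followed by a fit on the full data set; the sketch below illustrates that contract as an assumption about equivalent behavior, not code from the library source.

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from wolpert.wrappers.time_series import TimeSeriesStackableTransformer
>>> X = np.random.RandomState(0).rand(20, 3)
>>> y = np.random.RandomState(1).rand(20)
>>> t = TimeSeriesStackableTransformer(LinearRegression(),
...                                    test_set_size=5, min_train_size=5)
>>> Xt, indexes = t.fit_blend(X, y)  # same return value as blend(X, y)
>>> Xt_full = t.transform(X)         # estimator is now fit, so transform works

This is the typical call during the training stage of a stacked ensemble: the blended predictions feed the next layer, while the fitted estimator is kept for transforming unseen data.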

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(*args, **kwargs)[source]

Transform the whole dataset.

Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)

Input data to be transformed. Use dtype=np.float32 for maximum efficiency. Sparse matrices are also supported; use sparse csr_matrix for maximum efficiency.

Returns:
X_transformed : sparse matrix, shape=(n_samples, n_out)

Transformed dataset.
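
A short usage sketch; per the shapes documented above, transform yields one output row per input row once the transformer has been fit.

>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> from wolpert.wrappers.time_series import TimeSeriesStackableTransformer
>>> X = np.random.RandomState(0).rand(20, 3)
>>> y = np.random.RandomState(1).rand(20)
>>> t = TimeSeriesStackableTransformer(LinearRegression()).fit(X, y)
>>> Xt = t.transform(X)  # shape (20, n_out)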

class wolpert.wrappers.time_series.TimeSeriesWrapper(default_method='auto', default_scoring=None, verbose=False, offset=0, test_set_size=1, min_train_size=1, max_train_size=None, n_cv_jobs=1)[source]

Helper class to wrap estimators with TimeSeriesStackableTransformer

Parameters:
default_method : string, optional (default='auto')

This method will be called on the estimator to produce the output of transform. If the method is 'auto', the transformer will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

default_scoring : string, callable, dict or None (default=None)

If not None, scores generated by the scoring object will be saved to the scores_ attribute each time blend is called.

verbose : bool (default=False)

When True, prints scores to stdout. default_scoring must not be None.

offset : int, optional (default=0)

Number of rows to skip between the end of the training set and the start of the test set.

test_set_size : int, optional (default=1)

Size of the test set. This is also the number of rows added to the training set at each iteration.

min_train_size : int, optional (default=1)

Minimum size for a single training set.

max_train_size : int, optional (default=None)

Maximum size for a single training set.

n_cv_jobs : int, optional (default=1)

Number of jobs to be passed to cross_val_predict during blend.

Methods

wrap_estimator(estimator[, method]) Wraps an estimator and returns a transformer that is suitable for stacking.
wrap_estimator(estimator, method=None, **kwargs)[source]

Wraps an estimator and returns a transformer that is suitable for stacking.

Parameters:
estimator : predictor

The estimator to be blended.

method : string or None, optional (default=None)

If not None, this method will be called on the estimator instead of default_method to produce the output of transform. If the method is 'auto', the transformer will try to invoke, for each estimator, predict_proba, decision_function or predict, in that order.

Returns:
t : TimeSeriesStackableTransformer
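
A wrapping sketch, assuming the signatures documented above; the wrapper simply supplies shared defaults for the split, scoring and method settings.

>>> from sklearn.linear_model import LinearRegression
>>> from wolpert.wrappers.time_series import TimeSeriesWrapper
>>> wrapper = TimeSeriesWrapper(test_set_size=5, min_train_size=5)
>>> t = wrapper.wrap_estimator(LinearRegression())

Here t is a TimeSeriesStackableTransformer built with the wrapper's defaults, which is convenient when several estimators in one stacking layer must share the same time series split settings.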