canonical_sets.data.base.BaseData

class BaseData(features=None, groups=None, scaler=MinMaxScaler(feature_range=(-1, 1)), prefix_sep='+', val_prop=0.2, test_prop=0.2, preprocess=True, seed=1234)

Bases: object

Base class for data sets.

All data set classes inherit from this class.

train_data

The training data.

Type: pd.DataFrame

test_data

The testing data.

Type: pd.DataFrame

train_labels

The training labels.

Type: pd.DataFrame

test_labels

The testing labels.

Type: pd.DataFrame

val_data

The validation data.

Type: pd.DataFrame

val_labels

The validation labels.

Type: pd.DataFrame

numerical_cols

The numerical columns.

Type: List[str]

categorical_cols

The categorical columns.

Type: List[str]

Initialize the data.

Parameters

features (List[str], optional) – The features to use. The default is None.
groups (Dict[str, Dict[str, str]], optional) – The groups to use. The default is None.
scaler (sklearn.base.TransformerMixin) – Any scikit-learn preprocessing transformer to apply to the numerical features. The default is sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1)).
prefix_sep (str) – The separator placed between the categorical feature name and the category when one-hot encoding. For example, Color = [Red, Green] -> Color+Red and Color+Green. The default is '+'.
val_prop (float) – The proportion of the training data to use for validation. The default is 0.2.
test_prop (float) – The proportion of the training data to use for testing. The default is 0.2.
preprocess (bool) – Whether to preprocess the data. The default is True.
seed (int) – The seed for the random state. The default is 1234.
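
To make the prefix_sep and scaler parameters concrete, here is a minimal sketch of the two underlying library calls, pandas.get_dummies and sklearn.preprocessing.MinMaxScaler, applied to a made-up frame. It does not call BaseData itself. The column names and values are illustrative assumptions, and the class may combine these steps differently.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up data for illustration only; not taken from canonical_sets.
raw = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Color": ["Red", "Green", "Red"]})

# One-hot encode the categorical column, joining feature and category with '+'.
encoded = pd.get_dummies(raw, columns=["Color"], prefix_sep="+", dtype=float)
print(list(encoded.columns))  # ['Age', 'Color+Green', 'Color+Red']

# Scale the numerical column to [-1, 1], matching the default scaler in the signature.
scaler = MinMaxScaler(feature_range=(-1, 1))
encoded[["Age"]] = scaler.fit_transform(raw[["Age"]])
print(encoded["Age"].tolist())  # approximately [-1.0, 0.0, 1.0]
```

A separator such as '+' that is unlikely to occur in feature names makes it easier to split an encoded column name back into feature and category.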

Raises

ValueError – If a proportion (val_prop or test_prop) is outside the interval [0, 1).
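
The half-open interval [0, 1) means 0 is allowed and 1 is not. The check might look like the sketch below. The helper name check_proportion and the error message are hypothetical, and the actual validation in BaseData may differ.

```python
def check_proportion(name, value):
    """Raise ValueError unless 0 <= value < 1 (hypothetical helper, not library code)."""
    if not 0 <= value < 1:
        raise ValueError(f"{name} must be in [0, 1), got {value}.")
    return value


check_proportion("val_prop", 0.2)   # accepted
check_proportion("test_prop", 0.0)  # accepted: the lower bound is inclusive

try:
    check_proportion("test_prop", 1.0)  # rejected: the upper bound is exclusive
except ValueError as err:
    print(err)
```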

Methods

`inverse_preprocess`	Inverse preprocess the data.
`load`	Load the data.
`save`	Save the object.

Attributes

`train_data`
`val_data`
`test_data`
`train_labels`
`val_labels`
`test_labels`
`numerical_cols`
`categorical_cols`

inverse_preprocess(data)

Inverse preprocess the data.

Parameters: data (pd.DataFrame) – The data to inverse preprocess.
Returns: The inverse preprocessed data.
Return type: pd.DataFrame
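
How inverse_preprocess works internally is not documented here. As a rough sketch of what undoing the preprocessing can look like, the hypothetical helper inverse_sketch below assumes that numerical columns were scaled with a fitted MinMaxScaler and that categorical columns were one-hot encoded with prefix_sep='+'. The data and all names are illustrative and are not the library's implementation.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up data and column lists for illustration only.
raw = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Color": ["Red", "Green", "Red"]})
numerical_cols = ["Age"]
categorical_cols = ["Color"]
prefix_sep = "+"

# Forward step: one-hot encode, then scale the numerical columns to [-1, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))
processed = pd.get_dummies(raw, columns=categorical_cols, prefix_sep=prefix_sep, dtype=float)
processed[numerical_cols] = scaler.fit_transform(raw[numerical_cols])


def inverse_sketch(data):
    """Undo the scaling and one-hot encoding applied above (hypothetical helper)."""
    # Map the scaled values back to their original range.
    out = pd.DataFrame(
        scaler.inverse_transform(data[numerical_cols]),
        columns=numerical_cols,
        index=data.index,
    )
    for col in categorical_cols:
        # Columns that belong to this feature, e.g. 'Color+Green' and 'Color+Red'.
        dummies = [c for c in data.columns if c.startswith(col + prefix_sep)]
        # Take the column with the largest value and keep the part after the separator.
        winners = data[dummies].idxmax(axis=1)
        out[col] = winners.map(lambda name: name.split(prefix_sep, 1)[1])
    return out


restored = inverse_sketch(processed)
print(restored)
```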

classmethod load(path)

Load the data.

Parameters: path (str) – The path to load the data from (needs to end in .pkl).

save(path)

Save the object.

Parameters: path (str) – The path to save the object (needs to end in .pkl).
Return type: None
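
The .pkl requirement suggests pickle-based serialization. The sketch below shows one way a save/load pair with that constraint could be written, using only the standard library. TinyData is a hypothetical stand-in class, and both the ValueError on a wrong extension and the choice to pickle the instance state are assumptions. The behaviour of BaseData.save and BaseData.load may differ.

```python
import os
import pickle
import tempfile


class TinyData:
    """Hypothetical stand-in for a data set class; not part of canonical_sets."""

    def __init__(self, rows):
        self.rows = rows

    @staticmethod
    def _check_path(path):
        # Assumption: reject paths that do not end in .pkl.
        if not str(path).endswith(".pkl"):
            raise ValueError("path needs to end in .pkl")

    def save(self, path):
        """Pickle the instance state to path."""
        self._check_path(path)
        with open(path, "wb") as handle:
            pickle.dump(self.__dict__, handle)

    @classmethod
    def load(cls, path):
        """Rebuild an instance from a file written by save."""
        cls._check_path(path)
        with open(path, "rb") as handle:
            state = pickle.load(handle)
        obj = cls.__new__(cls)  # create the instance without calling __init__
        obj.__dict__.update(state)
        return obj


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    TinyData([1, 2, 3]).save(path)
    loaded = TinyData.load(path)
    print(loaded.rows)  # [1, 2, 3]
```

Pickle files can execute arbitrary code when loaded, so only load files from sources you trust.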