canonical_sets.data.adult.Adult

class Adult(train_path=None, test_path=None, download_train_path=None, download_test_path=None, features=None, groups=None, scaler=MinMaxScaler(feature_range=(-1, 1)), prefix_sep='+', val_prop=0.2, preprocess=True, seed=1234)

Bases: BaseData

Adult Data Set - UCI Machine Learning Repository.

This class downloads and preprocesses the Adult dataset as a pd.DataFrame.

train_data

The training data.

Type

pd.DataFrame

test_data

The testing data.

Type

pd.DataFrame

train_labels

The training labels.

Type

pd.DataFrame

test_labels

The testing labels.

Type

pd.DataFrame

val_data

The validation data.

Type

pd.DataFrame

val_labels

The validation labels.

Type

pd.DataFrame

numerical_cols

The numerical columns.

Type

List[str]

categorical_cols

The categorical columns.

Type

List[str]

Example

>>> adult = Adult()
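Once an instance exists, the split attributes listed above can be inspected directly. The sketch below is a hypothetical helper (describe_splits is not part of canonical_sets) that reports the row count of each split and the observed validation share. It uses a stand-in object so that it runs without downloading anything.

```python
from types import SimpleNamespace


def describe_splits(dataset):
    """Return the row count of each split and the observed validation share.

    `dataset` is any object exposing train_data, val_data and test_data,
    such as an Adult instance. This helper is hypothetical and not part
    of canonical_sets.
    """
    n_train = len(dataset.train_data)
    n_val = len(dataset.val_data)
    n_test = len(dataset.test_data)
    # val_prop is documented as a proportion of the training data, so the
    # share is measured against train + validation rows, not the test set.
    val_share = n_val / (n_train + n_val)
    return {"train": n_train, "val": n_val, "test": n_test, "val_share": val_share}


# Stand-in object so the sketch runs without the library or a download.
stub = SimpleNamespace(
    train_data=list(range(80)),
    val_data=list(range(20)),
    test_data=list(range(50)),
)
summary = describe_splits(stub)
print(summary)
```

With the library installed, describe_splits(Adult()) should work the same way, since len() on a pd.DataFrame returns its row count. The validation share would be expected to sit close to val_prop (0.2 by default).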

Initialize the data.

Parameters
  • train_path (str, optional) – The path to the training data if it is already downloaded. The default is None.

  • test_path (str, optional) – The path to the testing data if it is already downloaded. The default is None.

  • download_train_path (str, optional) – The path to save the training data to (needs to end in .csv). The default is None.

  • download_test_path (str, optional) – The path to save the testing data to (needs to end in .csv). The default is None.

  • features (List[str], optional) – The features to use. The default is None.

  • groups (Dict[str, Dict[str, str]], optional) – The groups to use. The default is None.

  • scaler (sklearn.base.TransformerMixin) – Any scikit-learn preprocessing transformer. The default is sklearn.preprocessing.MinMaxScaler with feature_range=(-1, 1).

  • prefix_sep (str) – The separator placed between a categorical feature name and its category when one-hot encoding. For example, Color = [Red, Green] becomes the columns Color+Red and Color+Green. The default is +.

  • val_prop (float) – The proportion of the training data to use for validation. The default is 0.2.

  • preprocess (bool) – Whether to preprocess the data. The default is True.

  • seed (int) – The seed for the random state. The default is 1234.
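The prefix_sep naming convention can be illustrated without the library. This is a minimal sketch in plain Python; both helper names are hypothetical, and how the class builds its one-hot columns internally is not documented here.

```python
def one_hot_column_names(feature, categories, prefix_sep="+"):
    """Build one-hot column names as <feature><prefix_sep><category>."""
    return [f"{feature}{prefix_sep}{category}" for category in categories]


def split_column_name(column, prefix_sep="+"):
    """Recover (feature, category) from a one-hot column name.

    Splits on the first separator only, so a category that itself
    contains the separator stays intact. This assumes the feature
    name does not contain the separator.
    """
    feature, category = column.split(prefix_sep, 1)
    return feature, category


columns = one_hot_column_names("Color", ["Red", "Green"])
print(columns)  # ['Color+Red', 'Color+Green']

pairs = [split_column_name(column) for column in columns]
print(pairs)  # [('Color', 'Red'), ('Color', 'Green')]
```

Choosing a separator that does not occur in any feature name keeps the mapping from column back to feature unambiguous.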

Methods

inverse_preprocess

Invert the preprocessing applied to the data.

load

Load the data.

save

Save the object.

Attributes

train_data

val_data

test_data

train_labels

val_labels

test_labels

numerical_cols

categorical_cols

inverse_preprocess(data)

Invert the preprocessing applied to the data.

Parameters

data (pd.DataFrame) – The preprocessed data to convert back.

Returns

The data with the preprocessing inverted.

Return type

pd.DataFrame
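The exact steps inverse_preprocess performs are not spelled out here. As an illustration of what inverting the default scaler involves, the sketch below applies min-max scaling to the range (-1, 1) and then undoes it, in plain Python. The helper names are hypothetical, and the real method may do more (for example, decoding one-hot columns).

```python
def min_max_scale(values, feature_range=(-1.0, 1.0)):
    """Scale values linearly so the minimum maps to low and the maximum to high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("cannot scale a constant column")
    scaled = [low + (v - v_min) / span * (high - low) for v in values]
    # The fitted minimum and maximum are needed later to invert the scaling.
    return scaled, (v_min, v_max)


def min_max_inverse(scaled, fitted, feature_range=(-1.0, 1.0)):
    """Map scaled values back to the original range."""
    low, high = feature_range
    v_min, v_max = fitted
    return [v_min + (s - low) / (high - low) * (v_max - v_min) for s in scaled]


ages = [17.0, 38.0, 90.0]
scaled, fitted = min_max_scale(ages)
restored = min_max_inverse(scaled, fitted)
print(scaled)
print(restored)
```

The inverse only works because the fitted minimum and maximum are kept, which is why the inversion is a method on the object that did the preprocessing rather than a free function.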

classmethod load(path)

Load the data.

Parameters

path (str) – The path to load the data from (needs to end in .pkl).

save(path)

Save the object.

Parameters

path (str) – The path to save the object (needs to end in .pkl).

Return type

None
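A save and load round trip might look like the sketch below. It assumes pickle-based serialization, which the .pkl extension suggests but which is not confirmed here. TinyDataset is a hypothetical stand-in, not the library's BaseData.

```python
import os
import pickle
import tempfile


class TinyDataset:
    """Hypothetical stand-in that mimics the documented save/load signatures."""

    def __init__(self, rows):
        self.rows = rows

    def save(self, path):
        if not path.endswith(".pkl"):
            raise ValueError("path needs to end in .pkl")
        # Assumption: the object's state is written with pickle.
        with open(path, "wb") as handle:
            pickle.dump({"rows": self.rows}, handle)

    @classmethod
    def load(cls, path):
        if not path.endswith(".pkl"):
            raise ValueError("path needs to end in .pkl")
        with open(path, "rb") as handle:
            state = pickle.load(handle)
        # Rebuild an instance from the stored state.
        return cls(**state)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny.pkl")
    TinyDataset([1, 2, 3]).save(path)
    restored = TinyDataset.load(path)

print(restored.rows)  # [1, 2, 3]
```

With the real class, the equivalent calls would be adult.save("adult.pkl") followed by Adult.load("adult.pkl"). Because load is a classmethod, it is called on the class rather than on an existing instance.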