canonical_sets.data.adult.Adult
- class Adult(train_path=None, test_path=None, download_train_path=None, download_test_path=None, features=None, groups=None, scaler=MinMaxScaler(feature_range=(-1, 1)), prefix_sep='+', val_prop=0.2, preprocess=True, seed=1234)[source]
Bases: BaseData
Adult Data Set - UCI Machine Learning Repository.
This class downloads and preprocesses the Adult dataset as a pd.DataFrame.
- train_data
The training data.
- Type
pd.DataFrame
- test_data
The testing data.
- Type
pd.DataFrame
- train_labels
The training labels.
- Type
pd.DataFrame
- test_labels
The testing labels.
- Type
pd.DataFrame
- val_data
The validation data.
- Type
pd.DataFrame
- val_labels
The validation labels.
- Type
pd.DataFrame
Example
>>> adult = Adult()
Initialize the data.
- Parameters
train_path (str, optional) – The path to the training data if it is already downloaded.
test_path (str, optional) – The path to the testing data if it is already downloaded.
download_train_path (str, optional) – The path to save the training data to (needs to end in .csv). The default is None.
download_test_path (str, optional) – The path to save the testing data to (needs to end in .csv). The default is None.
features (List[str], optional) – The features to use. The default is None.
groups (Dict[str, Dict[str, str]], optional) – The groups to use. The default is None.
scaler (sklearn.base.TransformerMixin) – Any sklearn.preprocessing transformer. The default is sklearn.preprocessing.MinMaxScaler with feature_range=(-1, 1).
prefix_sep (str) – The separator placed between the categorical feature name and the category when one-hot encoding. For example, Color = [Red, Green] -> Color+Red and Color+Green. The default is +.
val_prop (float) – The proportion of the training data to use for validation. The default is 0.2.
preprocess (bool) – Whether to preprocess the data. The default is True.
seed (int) – The seed for the random state. The default is 1234.
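The effect of prefix_sep, scaler, val_prop and seed can be illustrated on a toy frame. This is a minimal sketch and not the library's implementation: it assumes the preprocessing amounts to one-hot encoding with pandas, scaling with the given scaler, and a seeded random hold-out. The toy columns and values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy frame standing in for the raw data (hypothetical values).
raw = pd.DataFrame({
    "Age": [25, 40, 55, 30, 60],
    "Color": ["Red", "Green", "Red", "Green", "Red"],
})

# One-hot encode the categorical column; prefix_sep joins the feature
# name and the category, giving Color+Green and Color+Red.
encoded = pd.get_dummies(raw, columns=["Color"], prefix_sep="+", dtype=float)

# Scale every column to [-1, 1], mirroring the default scaler argument.
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = pd.DataFrame(
    scaler.fit_transform(encoded),
    columns=encoded.columns,
    index=encoded.index,
)

# Hold out val_prop of the rows for validation, using a fixed seed.
val_prop, seed = 0.2, 1234
val = scaled.sample(frac=val_prop, random_state=seed)
train = scaled.drop(val.index)

print(list(encoded.columns))
print(len(train), len(val))
```

How the class actually performs the split and in which order it applies the steps is not stated here, so treat the ordering above as an assumption.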
Methods
Inverse preprocess the data.
Load the data.
Save the object.
- inverse_preprocess(data)
Inverse preprocess the data.
- Parameters
data (pd.DataFrame) – The data to inverse preprocess.
- Returns
The inverse preprocessed data.
- Return type
pd.DataFrame
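Conceptually, inverting the preprocessing means undoing the scaling and collapsing the one-hot columns back into a single categorical column. The sketch below shows one way this could work. The helper inverse_sketch is hypothetical and is not the library's method; it assumes a fitted sklearn scaler and column names of the form feature+category.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def inverse_sketch(data, scaler, prefix_sep="+"):
    """Hypothetical helper: undo scaling, then collapse one-hot columns."""
    unscaled = pd.DataFrame(
        scaler.inverse_transform(data.to_numpy()),
        columns=data.columns,
        index=data.index,
    )
    # Group columns by the feature name in front of the separator.
    groups = {}
    for col in unscaled.columns:
        groups.setdefault(col.split(prefix_sep, 1)[0], []).append(col)

    out = pd.DataFrame(index=data.index)
    for feature, cols in groups.items():
        if len(cols) == 1 and prefix_sep not in cols[0]:
            # Numerical feature: keep the unscaled values.
            out[feature] = unscaled[cols[0]]
        else:
            # Categorical feature: pick the column with the largest
            # indicator value and strip the "feature+" prefix.
            winners = unscaled[cols].idxmax(axis=1)
            out[feature] = winners.map(lambda c: c.split(prefix_sep, 1)[1])
    return out


# Round trip on a toy frame (hypothetical values).
raw = pd.DataFrame({"Age": [25, 40, 55], "Color": ["Red", "Green", "Red"]})
encoded = pd.get_dummies(raw, columns=["Color"], prefix_sep="+", dtype=float)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = pd.DataFrame(scaler.fit_transform(encoded), columns=encoded.columns)

restored = inverse_sketch(scaled, scaler)
print(restored)
```

Taking the largest indicator per feature also works on data that is no longer exactly one-hot, for example model outputs, which is a design choice of this sketch rather than documented behaviour of the class.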