canonical_sets.lucid.LUCID

class LUCID(model, outputs, example_data, numb_of_samples=100, numb_of_epochs=200, lr=0.1, low=-1, high=1, seed=1234, index=True, extra_epoch=True, one_hot_pre=False, one_hot_post=True, log_every_n=0, prefix_sep='+')

Bases: object

Gradient-based inverse design to generate canonical sets.

This class generates a canonical set via inverse design and stores the resulting pd.DataFrame in the results attribute.
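To make "gradient-based inverse design" concrete, here is a minimal sketch of the idea in plain Python. It is not LUCID's implementation: the logistic-regression "model", the `predict` and `inverse_design` helpers, the record layout, and the 0-based sample numbering are all assumptions made for illustration. Only the parameter names (`numb_of_samples`, `numb_of_epochs`, `lr`, `low`, `high`, `seed`, `log_every_n`) are borrowed from the signature above. The model parameters stay frozen, and gradient descent updates the inputs until the prediction approaches the requested output.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Toy stand-in for a trained model: logistic regression with frozen parameters."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def inverse_design(weights, bias, target, numb_of_samples=3, numb_of_epochs=200,
                   lr=0.1, low=-1.0, high=1.0, seed=1234, log_every_n=0):
    """Hypothetical helper: optimise the inputs, not the model, towards `target`."""
    # Assumed to mirror the documented convention: 0 logs only start and end.
    log_every_n = log_every_n or numb_of_epochs
    rng = random.Random(seed)
    records = []
    for sample in range(numb_of_samples):
        # Each sample starts from a uniform random draw in [low, high].
        x = [rng.uniform(low, high) for _ in weights]
        for epoch in range(numb_of_epochs + 1):
            pred = predict(weights, bias, x)
            if epoch % log_every_n == 0:
                records.append({"sample": sample, "epoch": epoch,
                                "x": list(x), "pred": pred})
            if epoch < numb_of_epochs:
                # Gradient of the binary cross-entropy loss with respect to
                # the input is (pred - target) * w for this toy model.
                x = [xi - lr * (pred - target) * w for xi, w in zip(x, weights)]
    return records


weights, bias = [2.0, -1.0], 0.0
records = inverse_design(weights, bias, target=1.0)
final = [r for r in records if r["epoch"] == 200]
```

Each record pairs a sample number with an epoch number, which is roughly the shape of information the results dataframe is described as holding. With the default `log_every_n=0`, only the initial and final state of every sample is kept.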

results

A dataframe with the canonical inputs.

Type

pd.DataFrame

results_processed

A dataframe with the processed canonical inputs.

Type

pd.DataFrame

Examples

>>> model = tf.keras.Model()
>>> outputs = pd.DataFrame([[0, 1]], columns=["No", "Yes"])
>>> example_data = train_data
>>> lucid = LUCID(model, outputs, example_data)
>>> lucid.results

Initialize the inverse design.

Parameters
  • model (torch.nn.Module or tf.keras.Model) – The trained model to use for inverse design.

  • outputs (pd.DataFrame) – The outputs to use for inverse design. These are the targets/labels that have been used during training. For example, pd.DataFrame([[0, 1]], columns=["<=50K", ">50K"]) in the Adult data set.

  • example_data (pd.DataFrame) – The example data to infer columns, dtypes, … This is often (a part of) the training data itself, but can also be an artificial example.

  • numb_of_samples (int) – The number of samples to generate. The default is 100.

  • numb_of_epochs (int) – The number of epochs to run the inverse design for. The default is 200.

  • lr (float) – The learning rate for the optimizer. The default is 0.1.

  • low (float) – The lower bound for the random uniform distribution. The default is -1.

  • high (float) – The upper bound for the random uniform distribution. The default is 1.

  • seed (int) – The seed for the random number generator. The default is 1234.

  • index (bool) – If True the sample and epoch numbers are used as indices in the results pd.DataFrame. Otherwise they are just columns. The default is True.

  • extra_epoch (bool) – If True an additional forward pass is run after the categorical features have been one-hot encoded (post-processed). The results are saved for the last sample as the numb_of_epochs + 1 epoch. If there are no categorical features the argument is ignored. The default is True.

  • one_hot_pre (bool) – If True, the initial values for the categorical features are pre-processed to be one-hot, and the inverse design starts from this one-hot sample (hence "pre"). If False, the inverse design starts from the randomly drawn initial vectors. If there are no categorical features the argument is ignored. The default is False.

  • one_hot_post (bool) – If True, the values for the categorical features are post-processed to be one-hot. Note that the predictions during the inverse design are made with the original values of the categorical features and not with the post-processed values. To run an additional forward pass with the post-processed values check the extra_epoch argument. If there are no categorical features the argument is ignored. The default is True.

  • log_every_n (int) – The interval, in epochs, at which results are logged. If 0, this argument is set equal to the numb_of_epochs argument, which makes it a static analysis with only the start and end samples. The default is 0.

  • prefix_sep (str) – The separator between the prefix and the category in the column names. The one-hot encoded features are grouped via the prefix. To be safe, make sure that the separator appears only once per column name, between the prefix and the category (i.e., avoid Categorical-category-name, and opt for Categorical+category-name instead). The default is '+'.
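The interplay between prefix_sep and one_hot_post can be sketched as follows. This is an illustration under assumptions, not the library's code: the `one_hot_post` helper is hypothetical, a plain dict stands in for one dataframe row, and picking the largest value in each group as the winning category is an assumed rule. Columns that share the text before the separator are treated as one categorical feature.

```python
def one_hot_post(row, prefix_sep="+"):
    """Hypothetical helper: make each prefix group of `row` one-hot."""
    # Group the column names by the text before the first separator.
    groups = {}
    for name in row:
        if prefix_sep in name:
            prefix = name.split(prefix_sep, 1)[0]
            groups.setdefault(prefix, []).append(name)

    processed = dict(row)  # leave the original row untouched
    for names in groups.values():
        # Assumption: the category with the largest value wins.
        winner = max(names, key=lambda n: row[n])
        for n in names:
            processed[n] = 1 if n == winner else 0
    return processed


row = {
    "age": 0.3,                  # numerical column, no separator
    "workclass+Private": 0.7,
    "workclass+State-gov": 0.2,  # "-" inside the category is harmless with "+"
    "sex+Female": -0.1,
    "sex+Male": 0.4,
}
processed = one_hot_post(row)
```

The "workclass+State-gov" column shows why "+" is the safer separator here: with "-" as the separator, the hyphen inside the category name would make the grouping ambiguous.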

Raises
  • ValueError – If any column is neither integer (one-hot encoded) nor float (numerical).

  • ValueError – If the model is neither a torch.nn.Module nor a (tf.)keras.Model.

Methods

hist

Plot the results for a given feature.

plot

Plot the outputs.

process_results

Process the results by applying inverse scaler and one-hot encoding to categories.

Attributes

results

results_processed

hist(features)

Plot the results for a given feature.

Parameters

features (str or list of str) – The feature(s) to plot. The number of features must be 1, 2, 3, 4, 6 or 8.

Raises

ValueError – If the features are neither a string nor a list of strings of size 2, 3, 4, 6 or 8.

Note

If the results have not been processed yet, they are processed first with process_results.

Return type

None

plot(output)

Plot the outputs.

Parameters

output (str) – The name of the output to plot.

Return type

None

process_results(scaler=None)

Process the results by applying inverse scaler and one-hot encoding to categories.

Parameters

scaler (sklearn.base.TransformerMixin, optional) – Any of the sklearn preprocessing transformers. The default is None, which means that no transformation is applied to the numerical features.

Return type

None
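As an illustration of what "applying inverse scaler" means for the numerical features, the sketch below maps scaled canonical inputs back to the original units. It rests on assumptions: `MinMaxStandIn` is a hypothetical pure-Python stand-in that only mimics the fit / inverse_transform interface of a scikit-learn min-max scaler, and the data are made up. Whether process_results calls inverse_transform in exactly this way is not stated above.

```python
class MinMaxStandIn:
    """Hypothetical stand-in for a scaler exposing fit and inverse_transform."""

    def __init__(self, feature_range=(-1.0, 1.0)):
        self.lo, self.hi = feature_range

    def fit(self, rows):
        # Remember the per-column minimum and maximum of the training data.
        columns = list(zip(*rows))
        self.mins = [min(c) for c in columns]
        self.maxs = [max(c) for c in columns]
        return self

    def inverse_transform(self, rows):
        # Map values from [lo, hi] back to each column's original [min, max].
        span = self.hi - self.lo
        return [
            [(v - self.lo) / span * (mx - mn) + mn
             for v, mn, mx in zip(row, self.mins, self.maxs)]
            for row in rows
        ]


# Made-up training data with two numerical columns.
train = [[20.0, 1000.0], [60.0, 5000.0]]
scaler = MinMaxStandIn().fit(train)

# Made-up canonical inputs in the scaled space, i.e. within [-1, 1].
canonical_scaled = [[-1.0, 1.0], [0.0, 0.5]]
restored = scaler.inverse_transform(canonical_scaled)
```

With an actual scikit-learn scaler, the object fitted on the training data would be the one passed as the scaler argument.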