datagenerator.boolean

Attributes

data

Classes

BooleanDataset

Generic synthetic dataset based on a propositional formula.

BooleanAndDataset

Generic synthetic dataset based on a propositional formula.

BooleanOrDataset

Generic synthetic dataset based on a propositional formula.

Module Contents

class datagenerator.boolean.BooleanDataset(formula: sympy.core.function.FunctionClass, atoms: Iterable | None = None, seed: int = 0, n_samples: int = 10)

Bases: xaiunits.datagenerator.data_generation.BaseFeaturesDataset

Generic synthetic dataset based on a propositional formula.

The dataset corresponds to sampling rows from the truth table of the given propositional formula. If n_samples is no larger than the size of the truth table, then the generated dataset will always contain non-duplicate samples of the truth table. Otherwise, the dataset will still contain rows for the entire truth table but will also contain duplicates.

If the input for atoms is None, the corresponding attribute is by default assigned as the atoms that are extracted from the given formula.
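The sampling behaviour described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: sample_truth_table is a hypothetical helper, the formula is passed as an ordinary Python callable rather than a sympy expression, and truth values are encoded as 0/1.

```python
import itertools
import random


def sample_truth_table(formula, n_atoms, n_samples, seed=0):
    """Hypothetical helper: draw n_samples rows from a formula's truth table."""
    rng = random.Random(seed)
    # Every assignment of 0/1 to the atoms, i.e. the full truth table.
    table = list(itertools.product([0, 1], repeat=n_atoms))
    # Whole copies of the table first, then a duplicate-free remainder.
    full_copies, remainder = divmod(n_samples, len(table))
    rows = table * full_copies + rng.sample(table, remainder)
    rng.shuffle(rows)
    labels = [int(formula(*row)) for row in rows]
    return rows, labels


# 3 samples from a 4-row table: no duplicates.
few_rows, few_labels = sample_truth_table(lambda a, b: a and b, 2, 3)
# 10 samples from a 4-row table: every row appears, some more than once.
many_rows, many_labels = sample_truth_table(lambda a, b: a and b, 2, 10)
```

When n_samples does not exceed the table size, no whole copies are taken and the remainder is drawn without replacement, so no row repeats. Otherwise at least one whole copy guarantees that every row of the truth table is present.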

Inherits from:

BaseFeaturesDataset: The base class for creating continuous feature datasets.

formula

A propositional formula for which the dataset is generated.

Type:

sympy.core.function.FunctionClass

atoms

The ordered collection of propositional atoms that were used within the propositional formula.

Type:

tuple

seed

Seed for random number generators to ensure reproducibility.

Type:

int

n_samples

Number of samples in the dataset.

Type:

int

Initializes a BooleanDataset object.

Parameters:
  • formula (sympy.core.function.FunctionClass) – A propositional formula for dataset generation.

  • atoms (Iterable, optional) – Ordered collection of propositional atoms used in the formula. Defaults to None.

  • seed (int) – Seed for random number generation, ensuring reproducibility. Defaults to 0.

  • n_samples (int) – Number of samples to generate for the dataset. Defaults to 10.

atoms
formula
subset_data = ['samples']
subset_attribute = ['perturb_function', 'default_metric', 'generate_model', 'name']
cat_features
name = 'BooleanDataset'
_initialize_samples_labels(n_samples: int) → Tuple[torch.Tensor, torch.Tensor]

Initializes the samples and labels of the dataset.

Parameters:

n_samples (int) – Number of samples (and corresponding labels) contained in the dataset.

Returns:

Tuple containing the generated samples and corresponding labels of the dataset.

Return type:

tuple[Tensor, Tensor]

perturb_function(cat_resample_prob: float = 0.2, run_infidelity_decorator: bool = True, multipy_by_inputs: bool = False) → Callable

Generates a perturbation function to be used for XAI method evaluation. The function applies Gaussian noise to continuous features and resamples categorical features.

Parameters:
  • cat_resample_prob (float) – Probability of resampling a categorical feature. Defaults to 0.2.

  • run_infidelity_decorator (bool) – Set to True if the returned function is to be compatible with the infidelity metric; set to False for sensitivity. Defaults to True.

  • multiply_by_inputs (bool) – Argument passed to the infidelity decorator (spelled multipy_by_inputs in the signature above). Defaults to False.

Returns:

A perturbation function compatible with Captum.

Return type:

perturb_func (function)
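The perturbation scheme described above (Gaussian noise for continuous features, resampling for categorical ones) might look roughly like the following. This is a stdlib-only sketch: make_perturb_fn, noise_scale and the 0/1 category set are assumptions introduced for illustration, and the Captum decorator step is left out entirely.

```python
import random


def make_perturb_fn(cat_features, cat_resample_prob=0.2, noise_scale=0.1, seed=0):
    """Hypothetical factory returning a row-wise perturbation function."""
    rng = random.Random(seed)
    cat = set(cat_features)  # indices of the categorical (here: binary) features

    def perturb(row):
        out = []
        for i, value in enumerate(row):
            if i in cat:
                # Resample a categorical feature with probability cat_resample_prob.
                if rng.random() < cat_resample_prob:
                    value = rng.choice([0, 1])
            else:
                # Add zero-mean Gaussian noise to a continuous feature.
                value = value + rng.gauss(0.0, noise_scale)
            out.append(value)
        return out

    return perturb


# For a Boolean dataset every feature is categorical.
perturb = make_perturb_fn(cat_features=[0, 1], cat_resample_prob=0.2)
perturbed = perturb([1, 0])
```

With cat_resample_prob set to 0 the categorical features are returned unchanged, which is a convenient sanity check.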

generate_model() → torch.nn.Module

Generates a neural network model using the given propositional formula and atoms.

Returns:

A neural network model tailored to the dataset’s propositional formula.

Return type:

model.PropFormulaNN
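To give an intuition for how a propositional formula can be realized exactly by a small network, the sketch below encodes conjunction and disjunction with a single ReLU unit each. It assumes a 0/1 input encoding and does not reproduce PropFormulaNN, whose construction and encoding may differ; and_unit and or_unit are hypothetical names.

```python
import itertools


def relu(z):
    return max(0.0, z)


def and_unit(bits):
    # sum(bits) - (n - 1) equals 1 only when every input is 1, otherwise it is <= 0.
    return relu(sum(bits) - (len(bits) - 1))


def or_unit(bits):
    # 1 - sum(bits) is positive only when every input is 0.
    return 1.0 - relu(1.0 - sum(bits))


# Evaluate both units on the full truth table of three atoms.
truth_table = list(itertools.product([0, 1], repeat=3))
and_outputs = [and_unit(bits) for bits in truth_table]
or_outputs = [or_unit(bits) for bits in truth_table]
```

Because the outputs match the formula on every row of the truth table, a model built this way has a known, exact decision function, which is what makes such datasets useful for evaluating explanation methods.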

property default_metric: Callable

The default metric for evaluating the performance of explanation methods applied to this dataset.

For this dataset, the default metric is the infidelity metric with the default perturb function.

Returns:

A class that wraps around the default metric to be instantiated within the pipeline.

Return type:

type
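For readers unfamiliar with infidelity, the following sketch estimates it for a single input, assuming the usual definition: the mean squared difference between the change predicted by the attribution (perturbation dotted with attribution) and the actual change in model output when the perturbation is subtracted from the input. infidelity_estimate is a hypothetical helper operating on plain lists, not the metric class returned by this property.

```python
def infidelity_estimate(model, x, attribution, perturbations):
    """Hypothetical helper: Monte Carlo estimate of infidelity for one input."""
    total = 0.0
    for delta in perturbations:
        x_perturbed = [xi - di for xi, di in zip(x, delta)]
        # Change in output that the attribution predicts for this perturbation.
        predicted_change = sum(di * ai for di, ai in zip(delta, attribution))
        # Change in output that the model actually exhibits.
        actual_change = model(x) - model(x_perturbed)
        total += (predicted_change - actual_change) ** 2
    return total / len(perturbations)


def linear_model(x):
    return 2 * x[0] + 3 * x[1]


perturbations = [[1, 0], [0, 1], [1, 1]]
# The weights of a linear model are a perfectly faithful attribution.
exact = infidelity_estimate(linear_model, [1, 1], [2, 3], perturbations)
# An all-zero attribution explains none of the output change.
poor = infidelity_estimate(linear_model, [1, 1], [0, 0], perturbations)
```

Lower values indicate attributions that track the model's behaviour under perturbation more closely.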

__getitem__(idx: int, others: List[str] = []) → Tuple[Any, ...]

Retrieve a sample and its associated label by index.

Parameters:
  • idx (int) – Index of the sample to retrieve.

  • others (list) – Additional items to retrieve. Defaults to [].

Returns:

Tuple containing the sample and its label, followed by any additional items requested through others.

Return type:

tuple
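The role of the others argument can be illustrated with a toy class. How the library resolves the names in others is not documented here, so the attribute lookup below is an assumption, and ToyDataset is a hypothetical stand-in rather than part of the package.

```python
class ToyDataset:
    """Hypothetical stand-in showing how extra items could be appended."""

    def __init__(self, samples, labels, **extras):
        self.samples = samples
        self.labels = labels
        # Store each extra per-sample collection as an attribute, e.g. baseline.
        for name, values in extras.items():
            setattr(self, name, values)

    def __getitem__(self, idx, others=()):
        # Look up each requested name and pick the entry for this index.
        extra_items = tuple(getattr(self, name)[idx] for name in others)
        return (self.samples[idx], self.labels[idx], *extra_items)


ds = ToyDataset([[1, 0], [1, 1]], [0, 1], baseline=[[0, 0], [0, 0]])
plain = ds[1]                                       # sample and label only
with_extra = ds.__getitem__(1, others=["baseline"])  # sample, label, baseline
```

Square-bracket indexing can only pass idx, so requesting extra items requires calling __getitem__ explicitly or relying on the default value of others.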

class datagenerator.boolean.BooleanAndDataset(n_features: int = 2, n_samples: int = 10, seed: int = 0)

Bases: BooleanDataset

Generic synthetic dataset based on a propositional formula.

The dataset corresponds to sampling rows from the truth table of the given propositional formula. If n_samples is no larger than the size of the truth table, then the generated dataset will always contain non-duplicate samples of the truth table. Otherwise, the dataset will still contain rows for the entire truth table but will also contain duplicates.

If the input for atoms is None, the corresponding attribute is by default assigned as the atoms that are extracted from the given formula.

Inherits from:

BaseFeaturesDataset: The base class for creating continuous feature datasets.

formula

A propositional formula for which the dataset is generated.

Type:

sympy.core.function.FunctionClass

atoms

The ordered collection of propositional atoms that were used within the propositional formula.

Type:

tuple

seed

Seed for random number generators to ensure reproducibility.

Type:

int

n_samples

Number of samples in the dataset.

Type:

int

Initializes a BooleanAndDataset object.

Parameters:
  • n_features (int) – Number of features (propositional atoms) in the dataset. Defaults to 2.

  • n_samples (int) – Number of samples to generate for the dataset. Defaults to 10.

  • seed (int) – Seed for random number generation, ensuring reproducibility. Defaults to 0.

n_features = 2
ground_truth
ground_truth_attribute = 'ground_truth'
create_baselines() → None
__getitem__(idx: int, others: List[str] = ['baseline', 'ground_truth_attribute']) → Tuple[Any, ...]

Retrieve a sample and its associated label by index.

Parameters:
  • idx (int) – Index of the sample to retrieve.

  • others (list) – Additional items to retrieve. Defaults to ['baseline', 'ground_truth_attribute'].

Returns:

Tuple containing the sample and its label, followed by any additional items requested through others.

Return type:

tuple

generate_model() → torch.nn.Module

Generates a neural network model using the given propositional formula and atoms.

Returns:

A neural network model tailored to the dataset’s propositional formula.

Return type:

model.PropFormulaNN

create_ground_truth() → torch.Tensor
property default_metric: Callable

The default metric for evaluating the performance of explanation methods applied to this dataset.

For this dataset, the default metric is the infidelity metric with the default perturb function.

Returns:

A class that wraps around the default metric to be instantiated within the pipeline.

Return type:

type

class datagenerator.boolean.BooleanOrDataset(n_features: int = 2, n_samples: int = 10, seed: int = 0)

Bases: BooleanDataset

Generic synthetic dataset based on a propositional formula.

The dataset corresponds to sampling rows from the truth table of the given propositional formula. If n_samples is no larger than the size of the truth table, then the generated dataset will always contain non-duplicate samples of the truth table. Otherwise, the dataset will still contain rows for the entire truth table but will also contain duplicates.

If the input for atoms is None, the corresponding attribute is by default assigned as the atoms that are extracted from the given formula.

Inherits from:

BaseFeaturesDataset: The base class for creating continuous feature datasets.

formula

A propositional formula for which the dataset is generated.

Type:

sympy.core.function.FunctionClass

atoms

The ordered collection of propositional atoms that were used within the propositional formula.

Type:

tuple

seed

Seed for random number generators to ensure reproducibility.

Type:

int

n_samples

Number of samples in the dataset.

Type:

int

Initializes a BooleanOrDataset object.

Parameters:
  • n_features (int) – Number of features (propositional atoms) in the dataset. Defaults to 2.

  • n_samples (int) – Number of samples to generate for the dataset. Defaults to 10.

  • seed (int) – Seed for random number generation, ensuring reproducibility. Defaults to 0.

n_features = 2
ground_truth
ground_truth_attribute = 'ground_truth'
create_baselines() → None
__getitem__(idx: int, others: List[str] = ['baseline', 'ground_truth_attribute']) → Tuple[Any, ...]

Retrieve a sample and its associated label by index.

Parameters:
  • idx (int) – Index of the sample to retrieve.

  • others (list) – Additional items to retrieve. Defaults to ['baseline', 'ground_truth_attribute'].

Returns:

Tuple containing the sample and its label, followed by any additional items requested through others.

Return type:

tuple

generate_model() → torch.nn.Module

Generates a neural network model using the given propositional formula and atoms.

Returns:

A neural network model tailored to the dataset’s propositional formula.

Return type:

model.PropFormulaNN

create_ground_truth() → torch.Tensor
property default_metric: Callable

The default metric for evaluating the performance of explanation methods applied to this dataset.

For this dataset, the default metric is the infidelity metric with the default perturb function.

Returns:

A class that wraps around the default metric to be instantiated within the pipeline.

Return type:

type

datagenerator.boolean.data