datagenerator.boolean
Attributes
Classes
BooleanDataset: Generic synthetic dataset based on a propositional formula.
BooleanAndDataset: Generic synthetic dataset based on a propositional formula.
BooleanOrDataset: Generic synthetic dataset based on a propositional formula.
Module Contents
- class datagenerator.boolean.BooleanDataset(formula: sympy.core.function.FunctionClass, atoms: Iterable | None = None, seed: int = 0, n_samples: int = 10)
Bases:
xaiunits.datagenerator.data_generation.BaseFeaturesDataset
Generic synthetic dataset based on a propositional formula.
The dataset is built by sampling rows from the truth table of the given propositional formula. If n_samples does not exceed the number of rows in the truth table, the generated dataset contains no duplicate rows. Otherwise, the dataset contains every row of the truth table at least once, plus duplicates.
If atoms is None, the atoms attribute defaults to the atoms extracted from the given formula.
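The sampling rule described above can be sketched with the standard library alone. This is only an illustration of the stated behaviour, not the xaiunits implementation: the helper name sample_truth_table is hypothetical, the formula is passed as a plain Python callable rather than a sympy expression, and rows are encoded as Python booleans, which may differ from the encoding the library uses.

```python
import itertools
import random


def sample_truth_table(formula, n_atoms, n_samples, seed=0):
    """Sample rows of a truth table (hypothetical helper, not the xaiunits code)."""
    rng = random.Random(seed)
    # Every assignment of True/False to the atoms, in a fixed order.
    table = list(itertools.product([False, True], repeat=n_atoms))
    if n_samples <= len(table):
        # Few samples: draw without replacement, so no row repeats.
        rows = rng.sample(table, n_samples)
    else:
        # Many samples: keep the full table, then pad with random duplicates.
        extra = [rng.choice(table) for _ in range(n_samples - len(table))]
        rows = table + extra
        rng.shuffle(rows)
    # The label of each row is the value of the formula on that row.
    labels = [bool(formula(*row)) for row in rows]
    return rows, labels


# Example: the formula "a AND b" over two atoms (truth table has 4 rows).
small_rows, small_labels = sample_truth_table(lambda a, b: a and b, 2, 3)
large_rows, large_labels = sample_truth_table(lambda a, b: a and b, 2, 10)
```

With n_samples=3 the three rows are distinct; with n_samples=10 all four rows of the truth table appear and the remaining six entries are duplicates.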
- Inherits from:
BaseFeaturesDataset: The base class for creating continuous feature datasets.
- formula
A propositional formula for which the dataset is generated.
- Type:
sympy.core.function.FunctionClass
- atoms
The ordered collection of propositional atoms that were used within the propositional formula.
- Type:
tuple
- seed
Seed for random number generators to ensure reproducibility.
- Type:
int
- n_samples
Number of samples in the dataset.
- Type:
int
Initializes a BooleanDataset object.
- Parameters:
formula (sympy.core.function.FunctionClass) – A propositional formula for dataset generation.
atoms (Iterable, optional) – Ordered collection of propositional atoms used in the formula. Defaults to None.
seed (int) – Seed for random number generation, ensuring reproducibility. Defaults to 0.
n_samples (int) – Number of samples to generate for the dataset. Defaults to 10.
- atoms
- formula
- subset_data = ['samples']
- subset_attribute = ['perturb_function', 'default_metric', 'generate_model', 'name']
- cat_features
- name = 'BooleanDataset'
- _initialize_samples_labels(n_samples: int) -> Tuple[torch.Tensor, torch.Tensor]
Initializes the samples and labels of the dataset.
- Parameters:
n_samples (int) – number of samples/labels contained in the dataset.
- Returns:
Tuple containing the generated samples and the corresponding labels of the dataset.
- Return type:
tuple[Tensor, Tensor]
- perturb_function(cat_resample_prob: float = 0.2, run_infidelity_decorator: bool = True, multipy_by_inputs: bool = False) -> Callable
Generates the perturbation function used for XAI method evaluation. It applies Gaussian noise to continuous features and resamples categorical features.
- Parameters:
cat_resample_prob (float) – Probability of resampling a categorical feature. Defaults to 0.2.
run_infidelity_decorator (bool) – Set to True if the returned function must be compatible with the infidelity metric; set to False for sensitivity. Defaults to True.
multipy_by_inputs (bool) – Argument forwarded to the infidelity decorator (the name is spelled as in the signature above). Defaults to False.
- Returns:
A perturbation function compatible with Captum.
- Return type:
perturb_func (function)
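As a rough illustration of what resampling a categorical feature with probability cat_resample_prob could look like, here is a standard-library sketch. The helper resample_boolean_features is hypothetical, it assumes features are encoded as 0/1, and it does not reproduce the Captum-compatible return format of the real perturb function.

```python
import random


def resample_boolean_features(row, cat_resample_prob=0.2, rng=None):
    """Resample each 0/1 feature with the given probability (hypothetical helper)."""
    rng = rng or random.Random(0)
    perturbed = []
    for value in row:
        if rng.random() < cat_resample_prob:
            # Draw a fresh value; it may coincide with the old one.
            perturbed.append(rng.choice([0, 1]))
        else:
            # Leave the feature untouched.
            perturbed.append(value)
    return perturbed


original = [1, 0, 1, 1]
perturbed = resample_boolean_features(original, cat_resample_prob=0.2)
```

Because a resampled feature can land on its previous value, the fraction of features that actually change is on average lower than cat_resample_prob.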
- generate_model() -> torch.nn.Module
Generates a neural network model using the given propositional formula and atoms.
- Returns:
A neural network model tailored to the dataset’s propositional formula.
- Return type:
torch.nn.Module
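To give an intuition for how a propositional formula can be computed exactly by a small network, the sketch below builds AND, OR and NOT from a single ReLU each. This is one common construction under the assumption of 0/1 inputs; it is not necessarily what generate_model does, and the function names are hypothetical.

```python
import itertools


def relu(x):
    return max(0.0, x)


def and_unit(inputs):
    """AND over 0/1 inputs: fires only when every input is 1."""
    return relu(sum(inputs) - (len(inputs) - 1))


def or_unit(inputs):
    """OR over 0/1 inputs: fires when at least one input is 1."""
    return 1.0 - relu(1.0 - sum(inputs))


def not_unit(x):
    """NOT of a single 0/1 input."""
    return 1.0 - x


# Check the units against Python's all/any on every row of a 3-atom truth table.
for row in itertools.product([0, 1], repeat=3):
    assert and_unit(row) == float(all(row))
    assert or_unit(row) == float(any(row))
```

Composing such units layer by layer mirrors the structure of the formula, which is why a model built this way has a known, exact input-output behaviour.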
- property default_metric: Callable
The default metric for evaluating the performance of explanation methods applied to this dataset.
For this dataset, the default metric is the infidelity metric with the default perturb function.
- Returns:
A class that wraps around the default metric, to be instantiated within the pipeline.
- Return type:
type
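For reference, the infidelity measure (Yeh et al.), which Captum's infidelity metric follows, is usually written as below. Here f is the model, x the input, Φ(f, x) the attribution under evaluation, and I a random perturbation drawn from a distribution μ_I, which in this setting would be induced by the perturb function. How xaiunits wraps this metric is not specified here.

```latex
\mathrm{INFD}(\Phi, f, x) = \mathbb{E}_{I \sim \mu_I}\left[\left(I^{\top}\Phi(f, x) - \bigl(f(x) - f(x - I)\bigr)\right)^{2}\right]
```

A lower value means the attribution better predicts how the model output changes under the perturbations.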
- __getitem__(idx: int, others: List[str] = []) -> Tuple[Any, ...]
Retrieve a sample and its associated label by index.
- Parameters:
idx (int) – Index of the sample to retrieve.
others (list) – Additional items to retrieve. Defaults to [].
- Returns:
Tuple containing the sample and its label.
- Return type:
tuple
- class datagenerator.boolean.BooleanAndDataset(n_features: int = 2, n_samples: int = 10, seed: int = 0)
Bases:
BooleanDataset
Generic synthetic dataset based on a propositional formula.
The dataset is built by sampling rows from the truth table of the given propositional formula. If n_samples does not exceed the number of rows in the truth table, the generated dataset contains no duplicate rows. Otherwise, the dataset contains every row of the truth table at least once, plus duplicates.
If atoms is None, the atoms attribute defaults to the atoms extracted from the given formula.
- Inherits from:
BaseFeaturesDataset: The base class for creating continuous feature datasets.
- formula
A propositional formula for which the dataset is generated.
- Type:
sympy.core.function.FunctionClass
- atoms
The ordered collection of propositional atoms that were used within the propositional formula.
- Type:
tuple
- seed
Seed for random number generators to ensure reproducibility.
- Type:
int
- n_samples
Number of samples in the dataset.
- Type:
int
Initializes a BooleanAndDataset object.
- Parameters:
n_features (int) – Number of features in the dataset. Defaults to 2.
n_samples (int) – Number of samples to generate for the dataset. Defaults to 10.
seed (int) – Seed for random number generation, ensuring reproducibility. Defaults to 0.
- n_features = 2
- ground_truth
- ground_truth_attribute = 'ground_truth'
- create_baselines() -> None
- __getitem__(idx: int, others: List[str] = ['baseline', 'ground_truth_attribute']) -> Tuple[Any, ...]
Retrieve a sample and its associated label by index.
- Parameters:
idx (int) – Index of the sample to retrieve.
others (list) – Additional items to retrieve. Defaults to ['baseline', 'ground_truth_attribute'].
- Returns:
Tuple containing the sample and its label.
- Return type:
tuple
- generate_model() -> torch.nn.Module
Generates a neural network model using the given propositional formula and atoms.
- Returns:
A neural network model tailored to the dataset’s propositional formula.
- Return type:
torch.nn.Module
- create_ground_truth() -> torch.Tensor
- property default_metric: Callable
The default metric for evaluating the performance of explanation methods applied to this dataset.
For this dataset, the default metric is the infidelity metric with the default perturb function.
- Returns:
A class that wraps around the default metric, to be instantiated within the pipeline.
- Return type:
type
- class datagenerator.boolean.BooleanOrDataset(n_features: int = 2, n_samples: int = 10, seed: int = 0)
Bases:
BooleanDataset
Generic synthetic dataset based on a propositional formula.
The dataset is built by sampling rows from the truth table of the given propositional formula. If n_samples does not exceed the number of rows in the truth table, the generated dataset contains no duplicate rows. Otherwise, the dataset contains every row of the truth table at least once, plus duplicates.
If atoms is None, the atoms attribute defaults to the atoms extracted from the given formula.
- Inherits from:
BaseFeaturesDataset: The base class for creating continuous feature datasets.
- formula
A propositional formula for which the dataset is generated.
- Type:
sympy.core.function.FunctionClass
- atoms
The ordered collection of propositional atoms that were used within the propositional formula.
- Type:
tuple
- seed
Seed for random number generators to ensure reproducibility.
- Type:
int
- n_samples
Number of samples in the dataset.
- Type:
int
Initializes a BooleanOrDataset object.
- Parameters:
n_features (int) – Number of features in the dataset. Defaults to 2.
n_samples (int) – Number of samples to generate for the dataset. Defaults to 10.
seed (int) – Seed for random number generation, ensuring reproducibility. Defaults to 0.
- n_features = 2
- ground_truth
- ground_truth_attribute = 'ground_truth'
- create_baselines() -> None
- __getitem__(idx: int, others: List[str] = ['baseline', 'ground_truth_attribute']) -> Tuple[Any, ...]
Retrieve a sample and its associated label by index.
- Parameters:
idx (int) – Index of the sample to retrieve.
others (list) – Additional items to retrieve. Defaults to ['baseline', 'ground_truth_attribute'].
- Returns:
Tuple containing the sample and its label.
- Return type:
tuple
- generate_model() -> torch.nn.Module
Generates a neural network model using the given propositional formula and atoms.
- Returns:
A neural network model tailored to the dataset’s propositional formula.
- Return type:
torch.nn.Module
- create_ground_truth() -> torch.Tensor
- property default_metric: Callable
The default metric for evaluating the performance of explanation methods applied to this dataset.
For this dataset, the default metric is the infidelity metric with the default perturb function.
- Returns:
A class that wraps around the default metric, to be instantiated within the pipeline.
- Return type:
type
- datagenerator.boolean.data