pipeline

Submodules

Attributes

methods

Classes

BasePipeline

A base class for creating an end-to-end pipeline object that applies feature attribution methods and evaluates them.

Pipeline

A pipeline that evaluates the performance of the explanation methods on the neural network models with respect to evaluation metrics of interest.

ExperimentPipeline

A pipeline that evaluates the performance of the explanation methods on the neural network models with respect to evaluation metrics of interest.

Example

Stores a datapoint with its attributions and score, for ranking top n examples.

Results

Object that records and processes the results of experiments from a subclass of BasePipeline.

Experiment

A class representing an experimental setup for evaluating explanation methods on a specific dataset and neural network models.

Functions

to_device(→ Any)

Moves data to device if the data is a tensor or a dict of tensors.

Package Contents

class pipeline.BasePipeline(batch_size: int = 50, default_target: Any | None = None, results: xaiunits.pipeline.results.Results | None = None, n_examples: int | None = None)

A base class for creating an end-to-end pipeline object that applies feature attribution methods and evaluates them.

This base class is intended to support other pipeline classes that are built upon it.

Explanation methods to be run and evaluated must provide an .attribute method. The pipeline uses batching when applying the explanation methods and evaluating them over the evaluation metrics. The results of the explanation methods’ performance are processed and cleaned within the pipeline and are easily accessible to the user. Non-deterministic explanation methods are accommodated too: the pipeline supports multiple trial runs.

Alongside evaluation performance, runtime performance is also measured in the pipeline. For customisation of the explanation methods and evaluation metrics, applying a wrapper to them may be necessary prior to inputting them to the pipeline.

batch_size

Number of data samples to be processed by the models and for the explanation methods to be applied on within each batch.

Type:

int

default_target

Type or values of the target that is expected to be output by the models.

Type:

Any

n_examples

Number of worst and best performing data samples stored based on evaluation metric scores.

Type:

int | NoneType

results

Instance of Results for storing and processing the evaluation results.

Type:

pipeline.Results | NoneType

Initializes a BasePipeline object.

Parameters:
  • batch_size (int) – Number of data samples to be processed by the models and for the explanation methods to be applied on within each batch. Defaults to 50.

  • default_target (Any, optional) – Type or values of the target that is expected to be output by the models. Defaults to None.

  • n_examples (int | NoneType, optional) – Number of worst and best performing data samples stored based on evaluation metric scores. Defaults to None.

  • results (pipeline.Results | NoneType) – Instance of Results for storing and processing the evaluation results. Defaults to None.

batch_size = 50
default_target = None
n_examples = None
results
_init_attr(attr_input: collections.abc.Iterable) → List

Initializes an attribute by checking its type and converting it into a list.

Parameters:

attr_input (Iterable | NoneType | Any) – The object or collection of objects relevant to the attribute of interest.

Returns:

A list containing all the elements of the input.

Return type:

list

_single_explanation_attribute(data: Any, model: Any, method: Any, metrics: Any, method_seed: int | None = None, device: torch.device | None = None, trial_group_name: str | None = None) → None

Computes the attribution scores of a single explanation method on a neural network and evaluates the explanation method with respect to all evaluation metrics.

The explanation method is applied over the dataset in batches: a batch of size batch_size is generated from the dataset iteratively and the attribution scores for it are computed. Depending on the explanation method, the attribution scores for each sample within the same batch may or may not be deterministic.

The total runtime of applying the explanation method to each batch is measured.

Evaluation metrics are likewise calculated individually for the attribution scores of each batch.

If the n_examples attribute is not None, the batches that give the best and worst n_examples evaluation metric scores will be stored.

The evaluation and runtime measurements will be stored per batch as dictionaries inside the results attribute. The per-batch flow is sketched after the parameter list below.

Parameters:
  • data (torch.utils.data.Dataset) – Dataset of inputs.

  • model (torch.nn.Module) – Neural network for the explanation method to be applied on.

  • method (methods.methods_wrapper.Wrapper) – Explanation method to be applied to the model and evaluated.

  • metrics (list) – List of evaluation metrics to be applied.

  • method_seed (int | NoneType, optional) – The seed set for reproducibility of the explanation method. It will not be set if it is None. Defaults to None.

  • device (torch.device, optional) – The device which the objects will be stored on. Defaults to None.

  • trial_group_name (str, optional) – Name of the trial group.
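
The per-batch flow can be pictured with a minimal, self-contained sketch; the toy model, dataset, and L1 "score" below are illustrative stand-ins, not the library's internals:

    import time

    import torch
    from captum.attr import Saliency
    from torch.utils.data import DataLoader, TensorDataset

    model = torch.nn.Linear(4, 2)
    data = TensorDataset(torch.randn(20, 4), torch.randint(0, 2, (20,)))

    batch_results = []
    for feature_inputs, y_labels in DataLoader(data, batch_size=5):
        start = time.perf_counter()
        # A captum-style method exposes .attribute(); a fixed target here.
        attributions = Saliency(model).attribute(feature_inputs, target=0)
        attr_time = time.perf_counter() - start
        # Each evaluation metric is applied per batch; a toy L1 "score" here.
        scores = attributions.abs().sum(dim=1)
        batch_results.append({"attr_time": attr_time, "scores": scores})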

_apply_attribution_method(feature_inputs: torch.Tensor, method_instance: Any, batch_results: Dict) → Any

Computes the feature attributions for the given feature inputs using the explanation method of interest.

The time it takes to compute the feature attributions is appended to the results for the current batch of data.

Parameters:
  • feature_inputs (torch.Tensor) – The input features for the current batch.

  • method_instance (Any) – Explanation method used to compute the feature attributions.

  • batch_results (dict) – Recorded results for the current batch of data.

Returns:

The computed feature attributions for the given feature inputs.

Return type:

torch.Tensor

_store_top_n(n: int, key: Tuple, batch_scores: torch.Tensor, attribute: torch.Tensor, batch_info: Tuple, example_type: str) → None

Keeps a running heap of the top n examples with the highest scores, for each key.

Supports both ‘max’ and ‘min’ example_type, for the n highest or lowest scores respectively.

Notes

  • If batch_scores is a single value, it is repeated to match the batch size.

  • The heaps of examples are stored in results.examples. The bounded-heap pattern is sketched after the parameter list below.

Parameters:
  • n (int) – Number of examples to store.

  • key (tuple) – The key to identify which (method, model, metric) these examples are for.

  • batch_scores (torch.Tensor) – A tensor of scores for the current batch of examples.

  • attribute (torch.Tensor) – The attributes tensor for the current batch of examples.

  • batch_info (tuple) – A tuple containing feature inputs, labels, targets, and context for the batch.

  • example_type (str) – Specifies the type of scoring mechanism (‘max’ or ‘min’) to use for ranking.
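
The running top-n behaviour follows the standard bounded min-heap pattern; a minimal sketch of the idea (not the library's code):

    import heapq

    def keep_top_n(heap, n, score, payload):
        # heapq is a min-heap: the smallest retained score sits at heap[0],
        # so push/pushpop keeps the n largest scores seen so far.
        if len(heap) < n:
            heapq.heappush(heap, (score, payload))
        else:
            heapq.heappushpop(heap, (score, payload))

    heap = []
    for i, s in enumerate([0.2, 0.9, 0.1, 0.7, 0.5]):
        keep_top_n(heap, n=3, score=s, payload=f"example-{i}")
    print(sorted(heap, reverse=True))  # the 3 highest-scoring examples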

unpack_batch(batch: torch.Tensor | Tuple, device: torch.device | None = None) → Tuple

Unpacks a batch into (feature_inputs, labels, context) depending on the format. The accepted layouts are sketched below.

Parameters:
  • batch (torch.Tensor | Tuple) – A batch of data which can be a single tensor (feature_inputs), a tuple of two tensors (feature_inputs and labels), or a tuple of three elements (feature_inputs, labels, and context).

  • device (torch.device, optional) – The device which the objects will be stored on. Defaults to None.

Raises:

TypeError – If the batch is not in a supported format.
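
The three accepted batch layouts, illustrated; padding the missing parts with None is an assumption about the return convention:

    import torch

    def unpack(batch):
        # Single tensor -> features only; 2-tuple -> (features, labels);
        # 3-tuple -> (features, labels, context); anything else is rejected.
        if isinstance(batch, torch.Tensor):
            return batch, None, None
        if isinstance(batch, tuple) and len(batch) == 2:
            return batch[0], batch[1], None
        if isinstance(batch, tuple) and len(batch) == 3:
            return batch
        raise TypeError("Unsupported batch format")

    x = torch.randn(5, 3)
    print(unpack(x)[1])                    # None
    print(unpack((x, torch.zeros(5)))[1])  # the labels tensor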

set_default_target(model: Any, feature_inputs: torch.Tensor, y_labels: Any) → Any

Calculates the target based on self.default_target.

Two special keywords:
  • If self.default_target is ‘y_labels’, returns the y_labels.

  • If self.default_target is ‘predicted_class’, returns y = model(feature_inputs).

Three other standard options:
  • If self.default_target is None, returns None.

  • If self.default_target is an integer, returns that integer.

  • self.default_target may also be a tuple or tensor matching the batch size.

These are the standard options that will be used by the default evaluation methods. If an alternative option is needed, the user can override the evaluation metric function. The dispatch is sketched after the parameter list below.

Parameters:
  • model (torch.nn.Module) – The model to use for predictions if self.default_target is ‘predicted_class’.

  • feature_inputs (torch.Tensor) – The input features to the model.

  • y_labels (torch.Tensor) – The actual labels for the input features, if self.default_target is ‘y_labels’

Raises:

ValueError – If the value of default_target is invalid.
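
The dispatch over the documented options, sketched; reducing ‘predicted_class’ to an argmax is an assumption about how y = model(feature_inputs) is turned into a target:

    import torch

    model = torch.nn.Linear(3, 4)
    feature_inputs = torch.randn(8, 3)
    y_labels = torch.randint(0, 4, (8,))

    def resolve_target(default_target):
        if default_target is None:
            return None                      # no target
        if default_target == "y_labels":
            return y_labels                  # ground-truth labels
        if default_target == "predicted_class":
            # Assumed reduction of the model output to a class index.
            return model(feature_inputs).argmax(dim=1)
        if isinstance(default_target, (int, tuple, torch.Tensor)):
            return default_target            # fixed index, or per-sample targets
        raise ValueError("invalid default_target")

    print(resolve_target("predicted_class").shape)  # torch.Size([8])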

_all_models_explanation_attributes(data: List[Any], models: List[Any], methods: List[Any], metrics: List[Any], method_seeds: List[int], device: torch.device | None = None, trial_group_name: str | None = None) → None

Applies every explanation method to each of the neural network models and evaluates them against the evaluation metrics.

Parameters:
  • data (torch.utils.data.Dataset) – Dataset the neural network models operate on.

  • models (list) – List of neural network models for the explanation methods to be applied on.

  • methods (list) – List of explanation methods to apply and evaluate.

  • metrics (list) – List of evaluation metrics to apply.

  • method_seeds (list) – List of random seeds for applying the explanation method.

  • device (torch.device, optional) – The device which the objects will be stored on. Defaults to None.

  • trial_group_name (str, optional) – Name of the trial group.

class pipeline.Pipeline(models: Any, datas: Any, methods: Any, metrics: Any | None = None, method_seeds: int | None = None, batch_size: int = 50, default_target: str | None = None, results: xaiunits.pipeline.results.Results | None = None, n_examples: int | None = None, name: str | None = None)

Bases: BasePipeline

A pipeline that evaluates the performance of the explanation methods on the neural network models with respect to evaluation metrics of interest.

Single or multiple models, datasets, explanation methods, and evaluation metrics are supported. Explanation methods must provide an .attribute method. The pipeline uses batching when applying the explanation methods and evaluating them over the evaluation metrics. The results of the explanation methods’ performance are processed and cleaned within the pipeline and are easily accessible to the user. Non-deterministic explanation methods are accommodated too: the pipeline supports multiple trial runs.

If no evaluation metric is provided explicitly as an argument when instantiating a Pipeline object, a default metric will be taken from one of the provided datasets, provided one of them supplies such a metric.

Alongside evaluation performance, runtime performance is also measured in the pipeline. For customization of the explanation methods and evaluation metrics, applying a wrapper to them may be necessary prior to inputting them to the pipeline.

Inherits from:

BasePipeline: Base class for setting up a pipeline that applies feature attribution methods and evaluates them.

models

List of neural network models for the explanation methods to be applied on.

Type:

list

datas

List of dataset objects for generating data samples that are compatible with the neural networks provided.

Type:

list

methods

List of explanation methods to be evaluated for their performance on producing attribution scores.

Type:

list

metrics

List of evaluation metrics for the explanation methods to be evaluated against.

Type:

list

method_seed

List of seeds used for repeating and replicating the results over a single or multiple trials.

Type:

list

Initializes a Pipeline object.

All model, explanation method, evaluation metric, and method seed inputs will be coerced into lists containing all the relevant items. Labels for the evaluation metrics will be generated based on methods.methods_wrapper.MetricsWrapper.__name__ and must be unique. A usage sketch follows the parameter list below.

Parameters:
  • models (torch.nn.Module | Iterable) – Single or iterable collection of neural network models for explanation methods to be applied on.

  • datas (torch.utils.data.Dataset | Iterable) – Single or iterable collection of dataset objects for generating data samples that are compatible with the neural networks provided.

  • methods (methods.methods_wrapper.Wrapper | captum.attr._utils.attribution.Attribution | Iterable) – Single or iterable collection of explanation methods to be evaluated.

  • metrics (methods.methods_wrapper.MetricsWrapper | Iterable | None) – Single or iterable collection of evaluation metrics used for evaluating explanation methods. Defaults to None.

  • method_seeds (int | Iterable, optional) – Seeds for replicating explanation method results over multiple trials. Defaults to None, in which case a random seed is picked and replicability is not enforced.

  • batch_size (int) – Number of data samples to be processed by the models and for the explanation methods to be applied on within each batch. Defaults to 50.

  • default_target (Any, optional) – Type or values of the target that is expected to be outputted by the models. Defaults to None.

  • model_names (str | Iterable, optional) – Single or iterable collection of model names used for distinguishing between models. Defaults to None.

  • results (Results, optional) – Instance of Results for storing and processing the evaluation results. Defaults to None, and an empty Results instance will be used.

  • n_examples (int, optional) – Number of worst and best performing data samples stored based on evaluation metric scores. Defaults to None.

  • name (str, optional) – Name of the trial group. Defaults to None.

Raises:

Exception – If the evaluation metrics do not all have unique names.
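
A minimal end-to-end sketch. The dataset class name, its generate_model() pairing, and passing captum attribution classes directly are assumptions made for illustration; real usage should follow the library's own datasets and wrappers:

    from captum.attr import InputXGradient, Saliency

    from xaiunits.pipeline import Pipeline
    # The dataset class name below is assumed for illustration; any xaiunits
    # dataset that ships a default metric and a paired model would do.
    from xaiunits.datagenerator import WeightedFeaturesDataset

    data = WeightedFeaturesDataset()         # synthetic dataset (assumed API)
    model = data.generate_model()            # compatible model (assumed API)

    pipe = Pipeline(
        models=model,                        # single items are promoted to lists
        datas=data,
        methods=[Saliency, InputXGradient],  # anything exposing .attribute
        metrics=None,                        # falls back to the dataset's default metric
        batch_size=25,
    )
    results = pipe.run()
    results.print_stats()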

models
datas
methods
method_seed
trial_group_name = None
run(device: torch.device | None = None) → xaiunits.pipeline.results.Results

Alias for explanation_attribute method. Runs the pipeline and returns the results.

Parameters:

device (torch.device, optional) – The device which the objects will be stored on. Defaults to None.

Returns:

The evaluation results from running the pipeline.

Return type:

pipeline.Results

explanation_attribute(device: torch.device | None = None) → xaiunits.pipeline.results.Results

Applies every explanation method to each of the neural network models and datasets, and evaluates them against the evaluation metrics.

If an evaluation metric is not wrapped as a methods.methods_wrapper.Wrapper instance, manual wrapping will be performed.

The explanation methods will be repeated a variable number of times depending on the method_seed attribute.

All evaluation and runtime results will be stored within the results attribute in its unprocessed form.

Each of the neural network models will be forced to evaluation mode.

Parameters:

device (torch.device, optional) – The device which the objects will be stored on. Defaults to None.

Returns:

The evaluation results from running the pipeline.

Return type:

pipeline.Results

_init_none_metric() → Any

Returns the default metric from any of the datasets provided, if available.

It iterates over the provided datasets and checks whether any of them has a default metric attribute. If a default metric is found, it returns that metric; otherwise, it raises an exception. The lookup is sketched below.

Returns:

The class that wraps the default metric from one of the datasets provided.

Return type:

type

Raises:

Exception – If none of the provided datasets have a default metric.
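
The lookup amounts to a first-match scan over the datasets; a sketch, where the attribute name default_metric is an assumption:

    def init_none_metric(datas):
        # Return the first dataset-provided default metric, if any.
        for data in datas:
            metric = getattr(data, "default_metric", None)  # assumed attribute name
            if metric is not None:
                return metric
        raise Exception("None of the provided datasets have a default metric")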

class pipeline.ExperimentPipeline(experiments: xaiunits.pipeline.experiment.Experiment | List[xaiunits.pipeline.experiment.Experiment], batch_size: int = 50, default_target: Any | None = None, results: xaiunits.pipeline.results.Results | None = None, n_examples: int | None = None)

Bases: BasePipeline

A pipeline that evaluates the performance of the explanation methods on the neural network models with respect to evaluation metrics of interest.

This pipeline expects only a list of Experiment objects, in which the data, models, explanation methods, evaluation metrics, and seeds are specified.

Inherits from:

BasePipeline: Base class for setting up a pipeline that applies feature attribution methods and evaluates them.

experiments

List of experiments to be run.

Type:

list[pipeline.Experiment]

Initializes an ExperimentPipeline object.

Parameters:
  • experiments (List[Experiment]) – List of experiments to be run.

  • batch_size (int) – Number of data samples to be processed by the models and for the explanation methods to be applied on within each batch. Defaults to 50.

  • default_target (Any, optional) – Type or values of the target that is expected to be output by the models. Defaults to None.

  • results (Results, optional) – Instance of Results for storing and processing the evaluation results. Defaults to None, and an empty Results instance will be used.

  • n_examples (int, optional) – Number of worst and best performing data samples stored based on evaluation metric scores. Defaults to None.

experiments
run(device: torch.device | None = None) → xaiunits.pipeline.results.Results

Alias for explanation_attribute method. Runs the pipeline and returns the results.

Parameters:

device (torch.device, optional) – The device which the objects will be stored on. Defaults to None.

Returns:

The evaluation results from running the pipeline.

Return type:

pipeline.Results

explanation_attribute(device: torch.device | None = None) → xaiunits.pipeline.results.Results

Using the items stored in each Experiment object, applies every explanation method to each of the neural network models and evaluates them against the evaluation metrics.

Depending on the type of object given to represent the dataset, multiple datasets will be instantiated using the various seeds provided, or the single dataset object supplied will be used directly.

If an evaluation metric is not wrapped as a methods.methods_wrapper.Wrapper instance, manual wrapping will be performed.

The explanation methods will be repeated a variable number of times depending on the method_seed attribute.

All evaluation and runtime results will be stored within the results attribute in its unprocessed form.

Each of the neural network models will be forced to evaluation mode.

Parameters:

device (torch.device, optional) – The device which the objects will be stored on. Defaults to None.

Returns:

The evaluation results from running the pipeline.

Return type:

pipeline.Results

pipeline.to_device(data: Any, device: torch.device | None = None) → Any

Moves data to device if the data is a tensor or a dict of tensors. The behaviour is sketched after the parameter list below.

Parameters:
  • data (Any) – Data to be moved.

  • device (torch.device) – Device to move the data to.

Returns:

Data object that is stored in the specified device.

Return type:

Any
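
A sketch of the described behaviour; passing non-tensor values through unchanged is an assumption consistent with the summary:

    import torch

    def to_device(data, device=None):
        # Move tensors, and tensor values inside dicts, to the target device.
        if device is None:
            return data
        if isinstance(data, torch.Tensor):
            return data.to(device)
        if isinstance(data, dict):
            return {k: v.to(device) if isinstance(v, torch.Tensor) else v
                    for k, v in data.items()}
        return data  # everything else passes through unchanged

    batch = {"x": torch.randn(2, 3), "meta": "unchanged"}
    moved = to_device(batch, torch.device("cpu"))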

pipeline.methods
class pipeline.Example

Stores a datapoint with its attributions and score, for ranking top n examples.

score_for_ranking: float
score: float
attribute: torch.Tensor
feature_inputs: torch.Tensor
y_labels: torch.Tensor
target: int | None
context: dict | None
example_type: str
__post_init__() → None

Adjusts the score_for_ranking to be negative for “min” examples, so that heapq, a min heap, can be used as a max heap. The trick is sketched below.

Raises:

ValueError – If the value of example_type is not accepted.
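
The negation trick in isolation, sketched; combined with the bounded min-heap shown earlier, it retains the n lowest raw scores for ‘min’ examples:

    import heapq

    def keep(heap, n, raw_score, example_type):
        # 'min' examples negate the score: the bounded min-heap keeps the n
        # LARGEST ranking scores, i.e. the n smallest raw scores.
        ranking = -raw_score if example_type == "min" else raw_score
        if len(heap) < n:
            heapq.heappush(heap, (ranking, raw_score))
        else:
            heapq.heappushpop(heap, (ranking, raw_score))

    heap = []
    for s in [0.3, 0.1, 0.8, 0.5, 0.05]:
        keep(heap, n=2, raw_score=s, example_type="min")
    print([raw for _, raw in heap])  # the two smallest scores: 0.1 and 0.05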

class pipeline.Results

Object that records and processes the results of experiments from a subclass of BasePipeline.

data

List of all stored experiment results from a pipeline object.

Type:

list

metrics

List of the names of all evaluation metrics recorded in data.

Type:

list

examples

Dictionary containing the data samples that output the maximum or minimum scores with respect to the evaluation metrics of interest.

Type:

dict

Initializes a Results object.

raw_data = []
examples
append(incoming)

Appends the new result to the collection of results stored.

Parameters:

incoming (dict) – New result to be appended to the existing results.

property data: pandas.DataFrame

Processes the raw_data list into a DataFrame, flattening over the batch dimension.

For each data instance, the value under the ‘attr_time’ column is the time the batch it belongs to took to compute its attribution scores, divided by the size of the batch. A sketch of this flattening follows below.

Returns:

The processed results, one row per datapoint.

Return type:

pandas.DataFrame
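
The flattening and the per-datapoint attr_time can be pictured as follows; the raw-record keys are assumed for illustration, not the real schema:

    import pandas as pd

    # Two assumed raw batch records (keys illustrative).
    raw_data = [
        {"batch_id": 0, "attr_time": 0.08, "scores": [0.5, 0.7]},
        {"batch_id": 1, "attr_time": 0.06, "scores": [0.4, 0.9]},
    ]
    rows = [
        {"batch_id": r["batch_id"], "batch_row_id": i,
         "attr_time": r["attr_time"] / len(r["scores"]),  # batch time split evenly
         "score": s}
        for r in raw_data
        for i, s in enumerate(r["scores"])
    ]
    df = pd.DataFrame(rows)  # one row per datapoint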

process_data() → None

Convenience method for accessing the self.data property.

Returns:

The processed results, one row per datapoint.

Return type:

pandas.DataFrame

print_stats(metrics: List[Any] | None = None, stat_funcs: List[str] = ['mean', 'std'], index: List[str] = ['data', 'model', 'method'], initial_mean_agg: List[str] = ['batch_id', 'batch_row_id'], time_unit_measured: str = 'dataset', decimal_places: int = 3, column_index: List = []) → pandas.DataFrame

Prints the results in the form of a pivot table for each of the statistics required.

The indices of the printed table correspond to the model and explanation method, and the columns correspond to the evaluation metrics. The values of the printed table are the statistics calculated across the number of experiment trials. When both mean and standard deviation are needed, a single pivot table recording both will be printed. Example calls are sketched after the parameter list below.

Parameters:
  • metrics (list, optional) – A list of the names of all metrics to be printed. Defaults to None.

  • stat_funcs (list | str) – A list of aggregation functions that are required to be printed. It can be a str if only a single type of aggregation is required. Supports the same preset and custom aggregation functions as supported by pandas. Defaults to [‘mean’, ‘std’].

  • index (list) – A list of the names of the columns to be used as indices in the pivot table. Defaults to [‘data’, ‘model’, ‘method’]

  • initial_mean_agg (list) – A list of the columns to be used for the initial mean. For example, if we want to calculate the standard deviation between experiments, we take the mean over batch_id and batch_row_id first, then calculate the standard deviation. Defaults to [‘batch_id’, ‘batch_row_id’].

  • time_unit_measured (str) – The unit over which the time to perform the attribution method is calculated and aggregated. Defaults to ‘dataset’. Only 3 values are supported: ‘dataset’ (time needed to apply the explanation method to each dataset), ‘batch’ (time needed to apply the explanation method to each batch), and ‘instance’ (time needed to apply the explanation method to each data instance, an estimate derived from the batch time).

  • decimal_places (int) – The decimal places of the values displayed. Defaults to 3 decimal places.

  • column_index (list) – A list of column names to unpivot, in addition to Stats and Metrics. Defaults to [], so just Stats and Metrics are used as columns.
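
Typical calls against the documented defaults, continuing from the Pipeline sketch above:

    # Mean and standard deviation per (data, model, method), 3 decimal places:
    results.print_stats()

    # A single custom aggregation, timing reported per batch rather than per
    # dataset, indexed by method only:
    results.print_stats(
        stat_funcs="median",
        index=["method"],
        time_unit_measured="batch",
        decimal_places=2,
    )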

print_all_results() → None

Prints all the data as a wide table.

save(filepath)

Saves the data stored as a .pkl file.

Parameters:

filepath (str) – Path for the .pkl file to be saved in.

load(filepath: str, overwrite: bool = False) → None

Loads the data from a .pkl file.

The loaded data will be concatenated with the existing data stored in the object if and only if overwrite is False. A round-trip sketch follows the parameter list below.

Parameters:
  • filepath (str) – Path of the .pkl file to load the data from.

  • overwrite (bool) – True if and only if the data loaded overwrites any existing data stored. Defaults to False.
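
A round-trip sketch, continuing from the Pipeline sketch above; the import path mirrors the documented location of the class:

    from xaiunits.pipeline.results import Results

    results.save("run1.pkl")                # pickle the recorded results

    fresh = Results()
    fresh.load("run1.pkl")                  # concatenated with existing data
    fresh.load("run1.pkl", overwrite=True)  # replaces any existing data instead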

print_max_examples() → None

Prints in descending order the collection of examples that give the maximum evaluation metric scores.

print_min_examples() → None

Prints in ascending order the collection of examples that give the minimum evaluation metric scores.

_attr_time_summing(time_unit_measured: str) → pandas.DataFrame

Aggregates attribution time based on the specified time unit and formats the data correctly, including the aggregated attribution times among the metrics.

Parameters:

time_unit_measured (str) – The unit for which the time to perform the attribution method is calculated and aggregated. Only allows inputs “dataset”, “batch”, “instance”.

Returns:

pandas.DataFrame object with aggregated attribution time results formatted correctly.

Return type:

pandas.DataFrame

Raises:

Exception – If an invalid time unit is provided.

class pipeline.Experiment(data: Any, models: List[torch.nn.Module] | torch.nn.Module | None, methods: Any | List[Any], metrics: Any | List[Any] | None = None, seeds: int = 0, method_seeds: int = 0, data_params: Dict | None = None, name: str | None = None)

A class representing an experimental setup for evaluating explanation methods on a specific dataset and neural network models.

If no model is defined for the experiment at initialization, the class corresponding to the dataset must provide a generate_model() method so that a compatible model can be generated.

data

The class of the dataset used for experiments.

Type:

torch.utils.data.Dataset

models

List of neural network models to apply the explanation methods on.

Type:

list | NoneType

methods

List of explanation methods to apply and evaluate.

Type:

list

metrics

List of evaluation metrics to compute.

Type:

list

seeds

List of random seeds to use for the instantiation of the dataset.

Type:

list

method_seeds

List of random seeds to use for explanation methods.

Type:

list

data_params

Additional parameters to be passed to the instantiation of the dataset.

Type:

dict

Initializes an Experiment instance. A usage sketch follows the parameter list below.

Parameters:
  • data (Any) – The dataset or data class to be used for experiments. Can be passed in as an instantiated object or as a subclass of torch.utils.data.Dataset.

  • models (list | NoneType | torch.nn.Module) – List of neural network models to apply the explanation methods on.

  • methods (Any) – List of explanation methods to apply and evaluate.

  • metrics (Any | NoneType, optional) – List of evaluation metrics to compute. Defaults to None.

  • seeds (int | list) – List of random seeds to use for the instantiation of the dataset. Defaults to 0.

  • method_seeds (int | list) – List of random seeds to use for explanation methods. Defaults to 0.

  • data_params (dict | NoneType, optional) – Additional parameters to be passed during the instantiation of the dataset. Defaults to None.

  • name (str, optional) – Name of the experiment. Defaults to None.

Raises:

Exception – If input to any attribute initialization method is invalid.
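
A usage sketch combining Experiment and ExperimentPipeline; the dataset class and its n_features constructor argument are assumptions made for illustration:

    from captum.attr import Saliency

    from xaiunits.pipeline import Experiment, ExperimentPipeline
    from xaiunits.datagenerator import WeightedFeaturesDataset  # assumed name

    exp = Experiment(
        data=WeightedFeaturesDataset,   # a Dataset subclass: one instance per seed
        models=None,                    # generated via the dataset's generate_model()
        methods=[Saliency],
        seeds=[0, 1, 2],                # three dataset instantiations
        method_seeds=[0, 1],            # two trials of each method per dataset
        data_params={"n_features": 5},  # hypothetical constructor kwarg
        name="weighted_features",
    )
    results = ExperimentPipeline([exp]).run()
    results.print_stats()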

seeds
data
data_params = None
models
metrics = [None]
methods
method_seeds
exp_name = None
get_data(seed: int) → Any

Returns the dataset instance generated with the specified seed.

Parameters:

seed (int) – the seed for instantiating the dataset.

Returns:

The dataset instance with the specified seed.

Return type:

torch.utils.data.Dataset

get_models(data_instance: Any) → List[torch.nn.Module]

Returns the list of neural networks to apply the explanation methods on.

A default neural network compatible with the given dataset will be generated if the Experiment object has None as its models.

Parameters:

data_instance (torch.utils.data.Dataset) – The dataset instance.

Returns:

List of neural networks to apply the explanation methods on.

Return type:

list

get_methods(data_instance: Any) → List[Any]

Returns the list of explanation methods to apply and evaluate.

Parameters:

data_instance (torch.utils.data.Dataset) – The dataset instance; a placeholder to keep the input signature standardized.

Returns:

List of explanation methods to apply and evaluate.

Return type:

list

get_metrics(data_instance: Any) → List[Any]

Returns the list of evaluation metrics to compute.

Parameters:

data_instance (torch.utils.data.Dataset) – The dataset instance.

Returns:

List of evaluation metrics to compute.

Return type:

list

_init_seeds(seeds: int) → List[int]

Initializes the seeds attribute and transforms the input to the desired datatype.

Parameters:

seeds (int | list) – Random seeds to use for data.

Returns:

List of random seeds to use for the instantiation of the dataset.

Return type:

list

Raises:

Exception – If input to seeds initialization is not an integer or an Iterable of integers.

_init_data(data: Any) → Any

Initializes the data attribute.

Parameters:

data (type) – The instantiated dataset or data class.

Returns:

The dataset or data class.

Return type:

torch.utils.data.Dataset | type

Raises:

Exception – If the input to data initialization is neither a torch.utils.data.Dataset instance nor a subclass of torch.utils.data.Dataset.

_init_methods(methods: Any) → Any

Initializes the methods attribute.

Parameters:

methods (list | NoneType) – List of explanation methods.

Returns:

List of explanation methods.

Return type:

list

Raises:

Exception – If input to methods initialization is None.

_init_metrics(metrics: Any) → Any

Initializes the metrics attribute.

Parameters:

metrics (list | NoneType) – List of evaluation metrics.

Returns:

List of evaluation metrics.

Return type:

list

Raises:

Exception – If input to metrics initialization is None and the dataset does not provide a default metric.

_init_models(models: Any) → Any

Initializes the models attribute and transforms it to the desired datatype.

Parameters:

models (list | torch.nn.Module) – Neural network models.

Returns:

List of neural network models.

Return type:

list

Raises:

Exception – If the input to models initialization is not a torch.nn.Module object or an Iterable of torch.nn.Module objects.

_init_data_params(data_params: Dict) → Dict

Initializes the data_params attribute.

Parameters:

data_params (dict) – Additional parameters for the instantiation of the dataset.

Returns:

Dictionary of additional data parameters.

Return type:

dict

Raises:

Exception – If input to data parameters initialization is not a dictionary.

_init_method_seeds(method_seeds: int | collections.abc.Iterable[int]) → collections.abc.Iterable[int]

Initializes the method seeds attribute and transforms the input to the desired datatype.

Parameters:

method_seeds (int | list) – Random seeds to use for applying the explanation methods.

Returns:

List of random seeds for applying the explanation methods.

Return type:

list

Raises:

Exception – If the input to method seeds initialization is neither an integer nor an Iterable of integers.

_verify_metric(metrics: Any) → None

Verifies whether evaluation metrics have unique labels and each of them is in a valid datatype.

Parameters:

metrics (list) – A list of evaluation metrics.

Raises:

Exception – If evaluation metric is defined in an invalid datatype or if evaluation metrics do not have unique labels.