{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Customization Tutorial\n", "\n", "This notebook will serve as a tutorial on how users can use custom dataset, method or metric with our package. We will go through examples for each as well as other potential customizable parameters useful. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1 Custom Datasets\n", "\n", "Users are able to pass in their own custom datasets into xaiunits' the pipeline. \n", "\n", "Here we will show simple example of how a user can do so, and later we will show more complex variations. " ] }, { "cell_type": "markdown", "metadata": { "tags": [ "hide-input" ] }, "source": [ "### 1.1 Simple Example\n", "\n", "\n", "In this example we will \n", "1. Use sk_learn's function to download cali data (which omits categorical data) and create a custom torch dataset\n", "2. Train a model using our AutoTraining (this step is optional)\n", "3. Simple Selection of XAI Method and Metric\n", "4. Instantiate Pipeline Class and run attribute\n", "5. Print Results " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import fetch_california_housing\n", "import pickle\n", "\n", "import torch\n", "from torch.utils.data import Dataset, DataLoader\n", "from xaiunits.model import DynamicNN\n", "from xaiunits.trainer.trainer import AutoTrainer\n", "from xaiunits.metrics import perturb_standard_normal, wrap_metric\n", "from xaiunits.pipeline import Pipeline\n", "\n", "from lightning.pytorch.callbacks.early_stopping import EarlyStopping\n", "import lightning as L\n", "\n", "from captum.attr import *\n", "from captum.metrics import sensitivity_max, infidelity\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#1. Download and Create California Dataset\n", "\n", "class CaliDataset(Dataset):\n", " def __init__(self):\n", " sk_cali = fetch_california_housing(data_home=\"data/cali\")\n", " self.feature_input = torch.tensor(sk_cali.data, dtype=float)\n", " self.labels = torch.tensor(sk_cali.target, dtype=float)\n", "\n", " def __len__(self):\n", " return self.feature_input.shape[0]\n", "\n", " def __getitem__(self, idx):\n", " return self.feature_input[idx], self.labels[idx]\n", "\n", "data = CaliDataset()\n", "train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(\n", " data, [0.7, 0.2, 0.1]\n", ")\n", "train_data = DataLoader(train_dataset, batch_size=64)\n", "val_data = DataLoader(val_dataset, batch_size=64)\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# 2. Train Model\n", "\n", "hdim = 100\n", "linear_model_config = [\n", " {\n", " \"type\": \"Linear\",\n", " \"in_features\": data[:][0].shape[1],\n", " \"out_features\": hdim,\n", " \"dtype\": float,\n", " },\n", " {\"type\": \"ReLU\"},\n", " {\"type\": \"Linear\", \"in_features\": hdim, \"out_features\": hdim, \"dtype\": float},\n", " {\"type\": \"ReLU\"},\n", " {\"type\": \"Linear\", \"in_features\": hdim, \"out_features\": hdim, \"dtype\": float},\n", " {\"type\": \"ReLU\"},\n", " {\"type\": \"Linear\", \"in_features\": hdim, \"out_features\": 1, \"dtype\": float},\n", "]\n", "model = DynamicNN(linear_model_config)\n", "\n", "try:\n", " with open(\"data/model.pkl\", \"rb\") as file:\n", " state_dict = pickle.load(file)\n", " model.load_state_dict(state_dict)\n", "except:\n", " # define auto trainer\n", " loss = torch.nn.functional.mse_loss\n", " optim = torch.optim.Adam\n", " lightning_linear_model = AutoTrainer(model, loss, optim)\n", " trainer = L.Trainer(\n", " min_epochs=20,\n", " max_epochs=50,\n", " callbacks=[EarlyStopping(monitor=\"val_loss\", mode=\"min\", verbose=True)],\n", " )\n", "\n", " # test results before training\n", " trainer.test(lightning_linear_model, dataloaders=test_dataset)\n", "\n", " # train model\n", " trainer.fit(\n", " model=lightning_linear_model,\n", " train_dataloaders=train_data,\n", " val_dataloaders=val_data,\n", " )\n", " # test results after training\n", " trainer.test(lightning_linear_model, dataloaders=test_dataset)\n", "\n", " with open(\"./data/model.pkl\", \"wb\") as file:\n", " pickle.dump(lightning_linear_model.model.state_dict(), file)\n", " \n", " model = lightning_linear_model.model" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "#3. Select XAI Methods and Metrics\n", "xmethods = [\n", " InputXGradient,\n", " IntegratedGradients, \n", " DeepLift\n", " ]\n", "\n", "metrics = [\n", " wrap_metric(sensitivity_max),\n", " wrap_metric(infidelity, perturb_func=perturb_standard_normal),\n", "]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#4. Instantiate Pipeline and Run Pipeline\n", "pipeline = Pipeline(model, test_dataset, xmethods, metrics, method_seeds=[10])\n", "pipeline.run() # apply the explanation methods and evaluate them" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " mean \n", "metric infidelity sensitivity_max\n", "method \n", "DeepLift 2.423e-08 8.959e-04\n", "InputXGradient 2.557e-09 1.360e-03\n", "IntegratedGradients 2.060e-08 1.467e-03\n" ] } ], "source": [ "#5. Display Results\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 500)\n", "\n", "df = pipeline.results.print_stats([\"infidelity\", \"sensitivity_max\"], index=[\"method\"], stat_funcs=[\"mean\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2 Extension Example (+ Perturb Tutorial)\n", "\n", "In this example we will define a new dataset for experiments by creating a subclass of BaseFeatureDataset\n", "\n", "We will first create a new subclass of BaseFeaturedDataset (similar to WeightFeatureDataset but with Categorical Features). See comment for useful tips in the code. \n", "\n", "We then define an instance of our Experiment class, and later pass in the Experiment to our ExperimentPipeline class. Note in this example we only pass in one Experiment, but users can pass in multiple Experiments to be executed. \n", "\n", "In experiment class, makes testing for repetition much easier, further more given out Dataset is a subclass of BaseFeatureDataset we can also take advantage of these. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import torch\n", "\n", "from xaiunits.datagenerator import BaseFeaturesDataset\n", "from xaiunits.pipeline import Experiment, ExperimentPipeline, Pipeline\n", "from xaiunits.methods import wrap_method\n", "from xaiunits.metrics import wrap_metric, perturb_func_constructor\n", "\n", "from captum.attr import *\n", "from captum.metrics import infidelity, sensitivity_max\n", "\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Creat a new subclass of our BaseFeatureDataset, so that it is compatible with ExperimentPipeline Class\n", "\n", "class ConCatFeatureDataset(BaseFeaturesDataset):\n", " def __init__(self, n_features=6, n_samples=100, seed=0, **other):\n", " assert n_features > 3\n", " super().__init__(n_features=n_features, n_samples=n_samples, seed=0, **other )\n", "\n", " # make last 3 categorical, 1 or 0 \n", " self.samples[:,[-3, -2, -1]] = (self.samples[:,[-3, -2, -1]]>0.0).float()\n", " self.weights = torch.rand(n_features)\n", " self.weighted_samples = self.samples * self.weights \n", " self.labels = self.weighted_samples.sum(dim=1)\n", " self.features = \"samples\"\n", " self.ground_truth_attribute = \"weighted_samples\"\n", " self.subset_data = [\"samples\", \"weighted_samples\"]\n", " self.subset_attribute = [\"weights\" , \"cat_features\" ]\n", " self.cat_features = [ n_features-3, n_features-2, n_features-1 ] #Our package provides class method to generate a perturb function \n", " # this attribute is needed to determine which features are categorical. \n", " \n", " def generate_model(self):\n", " \"\"\"\n", " Generates a neural network model using the defined features and weights.\n", "\n", " Returns:\n", " ContinuousFeaturesNN: A neural network model tailored to the dataset's features and weights.\n", " \"\"\"\n", " from xaiunits.model.continuous import ContinuousFeaturesNN\n", "\n", " return ContinuousFeaturesNN(self.n_features, self.weights)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create an Experiment Class\n", "\n", "xmethods = [ \n", " Lime,\n", " DeepLift\n", " ]\n", "\n", "metrics = [\n", " wrap_metric(sensitivity_max),\n", " {\"metric_fns\": infidelity}, #Pipeline class will automatically add/override perturb function based on the dataset \n", " # This is an alternate way to specify eval_metric that only works for Experiment class\n", "]\n", "\n", "\n", "pert_neg_experiment = Experiment( ConCatFeatureDataset, \n", " None,\n", " xmethods,\n", " metrics, \n", " seeds=[3, 4], #Seeds to be used to generate dataset\n", " method_seeds=[0, 11], # Seeds for each run of the XAI method\n", " )\n", "\n", "# Create Experimenter Pipeline \n", "exp_pipe = ExperimentPipeline(pert_neg_experiment)\n", "exp_pipe.run() # apply the explanation methods and evaluate them\n", "\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " mean std \n", "metric infidelity sensitivity_max infidelity sensitivity_max\n", "method \n", "DeepLift 0.124 0.027 0.005 5.379e-05\n", "Lime 0.126 0.143 0.009 4.198e-03\n" ] } ], "source": [ "# Display Results\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 500)\n", "\n", "df = exp_pipe.results.print_stats([\"infidelity\", \"sensitivity_max\"], \n", " index=[\"method\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will pivot back to using standard Pipeline Class to show case how to use the specify perturb generator. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset = ConCatFeatureDataset()\n", "model = dataset.generate_model()\n", "xmethods = [ \n", " Lime,\n", " DeepLift\n", " ]\n", "\n", "# Most simplest way to use perturb constructor\n", "perturb_fns_1 = perturb_func_constructor(noise_scale=0.2, cat_resample_prob=0.2, cat_features= [3, 4, 5]) \n", "\n", "# This results in the same perturb function generated as above, but helps to illustrate how users\n", "#may specify the integer range of the replacement values for the categorical features. \n", "# If categorical feature i can take values from 1 to 9, then replacement = {i:[0,1,...9]}, default is [0,1]\n", "# we use uniform sampling to replace values fro categorical features. \n", "perturb_fns_2 = perturb_func_constructor(noise_scale=0.2, cat_resample_prob=0.2, cat_features= [3, 4, 5], \n", " replacements={3: [0,1] , 4: [0,1] , 5: [0,1]})\n", "\n", "# In case users want to sample the alternative from the same distribution as the data, users may also pass in\n", "# dataset as a tensor for\n", "perturb_fns_3 = perturb_func_constructor(noise_scale=0.2, cat_resample_prob=0.2, cat_features= [3, 4, 5], \n", " replacements=dataset[:][0])\n", "\n", "\n", "metrics = [\n", " # wrap_metric(sensitivity_max),\n", " wrap_metric(infidelity, perturb_func= perturb_fns_1, name= \"infidelity_1\"),\n", " wrap_metric(infidelity, perturb_func= perturb_fns_2, name= \"infidelity_2\"),\n", " wrap_metric(infidelity, perturb_func= perturb_fns_3, name= \"infidelity_3\"),\n", "]\n", "\n", "\n", "# Create Experimenter Pipeline \n", "pipeline = Pipeline(model, dataset, xmethods, metrics, method_seeds=[0])\n", "pipeline.run() # apply the explanation methods and evaluate them" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " mean \n", "metric infidelity_1 infidelity_2 infidelity_3\n", "data method \n", "ConCatFeatureDataset DeepLift 0.166 0.142 0.145\n", " Lime 0.151 0.149 0.174\n" ] } ], "source": [ "# Display Results\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 500)\n", "\n", "df = pipeline.results.print_stats([\"infidelity_1\" , \"infidelity_2\" , \"infidelity_3\"], \n", " index=[\"data\", \"method\"],\n", " stat_funcs=[\"mean\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 Custom Methods\n", "\n", "In this section we will create a custom dummy attribution method and will show case integrate this into the pipeline. \n", "\n", "As a reminder users can directly pass in Captum XAI methods, and our package will execute the methods' initialization and $attribute$ function with $model$ and $inputs$ as sole arguments respectively; default values are used for all other arguments. \n", "\n", "$model$ refers to the Neural Network XAI methods will be run on, and $inputs$ represents the input tensors of the Neural Network. For ease of discussion, we will call arguments $model$ and $inputs$, primary arguments. Other arguments used for method initialization and $attribute$ function, are referred to as secondary arguments and can be static or created at runtime. \n", "\n", "For users to pass in their custom XAI method, they first must ensure that the custom XAI Method adheres to the following:\n", "1. XAI method must be a class\n", "2. XAI method must have an initialization and $attribute$ function \n", "3. Respective primary arguments for initialization and $attribute$ function must be the first argument. \n", "4. Respective secondary arguments for initialization and $attribute$ function must have default values or specified via $input\\_fns\\_gen$ or $other\\_inputs$. See below for details" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 Simple Example\n", "\n", "Here we create a simple XAI method (DummyAttributionMethod), that returns DeepLift Attribution if flag is set to True, and random noise when flag is set to False. Because we are creating a custom XAI method we also need to create custom $input\\_fns\\_gen$. \n", "\n", "For this example we will treat $noise$ argument as non-static, and want $noise$ input to be different across the batch when we calculate attribution scores. Furthermore as there is no default values for $noise$ argument which is a secondary argument, this must be specified in either $input\\_fns\\_gen$ or $other\\_inputs$. We can treat flag as a static secondary argument. \n", "\n", "We can use $input\\_fns\\_gen$ to pass in a function to generate the non-static secondary argument, and $other\\_inputs$ to pass in static secondary arguments." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "\n", "import torch\n", "from xaiunits.datagenerator import WeightedFeaturesDataset\n", "from xaiunits.methods import wrap_method\n", "from xaiunits.metrics import wrap_metric, perturb_standard_normal\n", "from xaiunits.pipeline import Pipeline\n", "\n", "from captum.attr import DeepLift\n", "from captum.metrics import sensitivity_max, infidelity\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Defining Dummy XAI method and input_fns_gen\n", "\n", "class DummyAttributionMethod():\n", " def __init__(self, model):\n", " self.actual_attribution = DeepLift(model)\n", " self.forward_func = model # For now this is needed, \n", " def attribute(self, inputs, noise, flag=False):\n", " if flag == False:\n", " # In captum, inputs are often tuples. This is especially the case when calling infidelity or sensitivity. \n", " # Here we provide some sample code to handle tuple inputs that is compatible with infidelity\n", " if type(inputs) == tuple:\n", " output = []\n", " for x in inputs:\n", " if x.shape[0] == noise.shape[0]: # normal forward pass\n", " output.append(noise) \n", " else: # Perturbed forward pass\n", " output.append(torch.repeat_interleave(noise, x.shape[0]//noise.shape[0], dim=0))\n", "\n", " return tuple(output) \n", " else:\n", " return noise\n", " else:\n", " return self.actual_attribution.attribute(inputs)\n", "\n", "def dummy_input_gen(feature_inputs, y_labels, target, context, model):\n", " return {\n", " \"noise\": torch.rand_like(feature_inputs)\n", " }" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Common arguments\n", "\n", "dataset = WeightedFeaturesDataset()\n", "model = dataset.generate_model()\n", "metrics = [ \n", " wrap_metric(infidelity, perturb_func=perturb_standard_normal),\n", " # how to get root mean square\n", " wrap_metric(\n", " torch.nn.functional.mse_loss, \n", " out_processing=lambda x: torch.sqrt(torch.sum(x, dim=1)),\n", " )\n", "]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " " ] }, { "name": "stdout", "output_type": "stream", "text": [ " mean \\\n", "metric infidelity \n", "data model method \n", "WeightedFeaturesDataset ContinuousFeaturesNN DeepLift 1.223e-15 \n", " wrapper_DummyAttributionMethod 1.644e-15 \n", "\n", " \n", "metric mse_loss \n", "data model method \n", "WeightedFeaturesDataset ContinuousFeaturesNN DeepLift 0.0 \n", " wrapper_DummyAttributionMethod 0.0 \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r" ] } ], "source": [ "# In this example we use $other_inputs$ to override the default behavior of our DummyAttributeMethod class \n", "pipeline_true = [\n", " wrap_method(DummyAttributionMethod, dummy_input_gen, other_inputs={\"flag\": True}),\n", " DeepLift,\n", " ]\n", "\n", "pipeline_true = Pipeline(model, dataset, pipeline_true, metrics, method_seeds=[10])\n", "pipeline_true.run() # apply the explanation methods and evaluate them\n", "\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 500)\n", "df = pipeline_true.results.print_stats([ \"mse_loss\", \"infidelity\"], stat_funcs=[\"mean\"])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " " ] }, { "name": "stdout", "output_type": "stream", "text": [ " mean \\\n", "metric infidelity \n", "data model method \n", "WeightedFeaturesDataset ContinuousFeaturesNN DeepLift 1.223e-15 \n", " wrapper_DummyAttributionMethod 3.180e-02 \n", "\n", " \n", "metric mse_loss \n", "data model method \n", "WeightedFeaturesDataset ContinuousFeaturesNN DeepLift 0.000 \n", " wrapper_DummyAttributionMethod 0.895 \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\r" ] } ], "source": [ "# Here we remove $other_inputs$ override thus attribution should be random noise. \n", "# Expectation is that MSE and Infidelity metrics are worse compared to DeepLift\n", "xmethods_false = [\n", " wrap_method(DummyAttributionMethod, dummy_input_gen),\n", " DeepLift\n", " ]\n", "\n", "pipeline_false = Pipeline(model, dataset, xmethods_false, metrics, method_seeds=[10])\n", "pipeline_false.run() # apply the explanation methods and evaluate them\n", "\n", "pd.set_option('display.max_rows', 500)\n", "pd.set_option('display.max_columns', 500)\n", "\n", "df = pipeline_false.results.print_stats([\"mse_loss\", \"infidelity\"], stat_funcs=[\"mean\"])\n" ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 2 }