Fairness Verification (FairNNV)¶
Verify that neural network classifiers make fair decisions regardless of sensitive attributes like gender, race, or age. This tutorial covers counterfactual and individual fairness verification on financial benchmark datasets.
What You Will Learn¶
How to verify counterfactual fairness (predictions invariant under sensitive attribute flip)
How to verify individual fairness (similar individuals get similar predictions)
How to compute the Verified Fairness (VF) score
How adversarial debiasing affects formal fairness guarantees
See Fairness Verification (FairNNV) for the theoretical foundations.
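The Verified Fairness score used throughout is simply the percentage of test samples for which the fairness property is formally verified, matching the computation in the code in Steps 2 and 3:

$$\mathrm{VF} = \frac{\#\{\text{samples verified fair}\}}{\#\{\text{samples checked}\}} \times 100\%$$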
Prerequisites¶
Trained ONNX classifier (e.g., Adult Census income predictor)
NNV installed with FairNNV module
Step 1: Load Model and Data¶
% Load trained Adult Census classifier from ONNX
dlnet = importNetworkFromONNX('adult_census_model.onnx', InputDataFormats='BC');
net = matlab2nnv(dlnet);
% Load test data (features + labels)
load('adult_census_test.mat'); % X_test, y_test
% Identify sensitive attribute index (e.g., gender = column 9)
sensitive_idx = 9;
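Before running any verification, it can help to confirm that the converted network predicts sensibly. The sketch below (assuming the NN object returned by matlab2nnv exposes an evaluate method, as in recent NNV releases, and that y_test holds 1-based class indices) checks baseline accuracy on the first 100 samples:

```matlab
% Sanity check (sketch): baseline accuracy of the converted network.
% Assumes net.evaluate returns raw class scores and y_test is 1-based.
n_check = 100;
correct = 0;
for i = 1:n_check
    y = net.evaluate(X_test(i, :)');   % raw output scores
    [~, pred] = max(y);                % predicted class index
    correct = correct + (pred == y_test(i));
end
fprintf('Baseline accuracy on %d samples: %.1f%%\n', n_check, 100 * correct / n_check);
```

If accuracy is near chance, revisit the InputDataFormats option or the feature ordering before interpreting any fairness scores.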
Step 2: Counterfactual Fairness Verification¶
Flip the sensitive attribute while keeping all other features fixed:
n_samples = 100;
cf_results = zeros(n_samples, 1);
reachOptions = struct;
reachOptions.reachMethod = 'approx-star';
for i = 1:n_samples
    x = X_test(i, :)';
    target = y_test(i);
    % Create counterfactual: flip the sensitive attribute
    x_cf = x;
    x_cf(sensitive_idx) = 1 - x(sensitive_idx); % binary flip
    % Create Star set spanning both original and counterfactual
    lb = min(x, x_cf);
    ub = max(x, x_cf);
    input_set = Star(lb, ub);
    % Verify: does the classification change?
    result = net.verify_robustness(input_set, reachOptions, target);
    cf_results(i) = result; % 1=fair, 0=unfair, 2=unknown
end
cf_vf = sum(cf_results == 1) / n_samples * 100;
fprintf('Counterfactual VF Score: %.1f%%\n', cf_vf);
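For samples flagged unfair, the two endpoints of the Star set give a concrete candidate counterexample pair that you can inspect directly. A sketch (again assuming the net.evaluate method; concrete evaluation confirms whether a flagged pair is a true counterexample rather than an over-approximation artifact):

```matlab
% Inspect flagged samples (sketch): compare predictions at both endpoints.
unfair_idx = find(cf_results == 0);
for k = 1:numel(unfair_idx)
    i = unfair_idx(k);
    x = X_test(i, :)';
    x_cf = x;
    x_cf(sensitive_idx) = 1 - x(sensitive_idx);
    [~, p1] = max(net.evaluate(x));     % prediction for the original
    [~, p2] = max(net.evaluate(x_cf));  % prediction for the counterfactual
    fprintf('Sample %d: class %d -> %d after attribute flip\n', i, p1, p2);
end
```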
Step 3: Individual Fairness Verification¶
Flip the sensitive attribute AND perturb non-sensitive features by epsilon:
epsilon_values = [0.02, 0.03, 0.05, 0.07, 0.10];
for e = 1:length(epsilon_values)
    epsilon = epsilon_values(e);
    if_results = zeros(n_samples, 1);
    for i = 1:n_samples
        x = X_test(i, :)';
        target = y_test(i);
        % Perturb: flip sensitive + bounded perturbation on non-sensitive
        lb = x - epsilon;
        ub = x + epsilon;
        % Flip sensitive attribute
        lb(sensitive_idx) = min(x(sensitive_idx), 1 - x(sensitive_idx));
        ub(sensitive_idx) = max(x(sensitive_idx), 1 - x(sensitive_idx));
        input_set = Star(lb, ub);
        result = net.verify_robustness(input_set, reachOptions, target);
        if_results(i) = result;
    end
    if_vf = sum(if_results == 1) / n_samples * 100;
    fprintf('IF VF Score (eps=%.2f): %.1f%%\n', epsilon, if_vf);
end
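To visualize how verified fairness degrades with the perturbation radius, collect the per-epsilon scores and plot them. A minimal sketch (if_vf_scores is a name introduced here; populate it inside the epsilon loop above by adding if_vf_scores(e) = if_vf; after the score is computed):

```matlab
% Plot VF score vs. perturbation radius (sketch).
if_vf_scores = zeros(size(epsilon_values));  % fill inside the loop above
figure;
plot(epsilon_values, if_vf_scores, '-o');
xlabel('\epsilon (perturbation bound)');
ylabel('Verified Fairness (%)');
title('Individual fairness vs. perturbation radius');
```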
Experimental Results¶
FairNNV was evaluated on three fairness benchmark datasets:
| Dataset | Model | Sensitive Attr. | CF VF (%) | IF VF at eps=0.02 (%) |
|---|---|---|---|---|
| Adult Census | AC-1 (16-8) | Gender | 89 | ~85 |
| Adult Census | AC-3 (50) | Gender | 87 | ~65 |
| German Credit | GC-2 | Gender | 77 | ~80 |
| Bank Marketing | BM-1 | Age | 89 | ~90 |
Key observations:
Counterfactual fairness: High VF scores (74–89%) with verification times under 0.03s per sample
Individual fairness: VF degrades as epsilon increases; larger models show steeper decline and higher verification cost
Debiasing paradox: Empirically debiased models (via AIF360 adversarial debiasing) show lower VF scores than originals, suggesting formal verification captures unfairness that statistical metrics miss
Adapting to Your Own Classifier¶
Train your classifier on tabular data with identified sensitive attributes
Export to ONNX format
Identify the sensitive attribute column index
Choose epsilon values appropriate for your domain
Run counterfactual and individual fairness verification
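The steps above can be condensed into a single skeleton. All file names, the column index, and the epsilon value below are placeholders for your own setup:

```matlab
% Skeleton for verifying your own classifier (placeholder names throughout).
dlnet = importNetworkFromONNX('my_model.onnx', InputDataFormats='BC');
net = matlab2nnv(dlnet);
load('my_test_data.mat');              % X_test, y_test (placeholder file)
sensitive_idx = 5;                      % your sensitive attribute column
epsilon = 0.05;                         % domain-appropriate bound
reachOptions = struct('reachMethod', 'approx-star');
results = zeros(size(X_test, 1), 1);
for i = 1:size(X_test, 1)
    x = X_test(i, :)';
    lb = x - epsilon;
    ub = x + epsilon;
    % Flip the (binary) sensitive attribute across its full range
    lb(sensitive_idx) = min(x(sensitive_idx), 1 - x(sensitive_idx));
    ub(sensitive_idx) = max(x(sensitive_idx), 1 - x(sensitive_idx));
    results(i) = net.verify_robustness(Star(lb, ub), reachOptions, y_test(i));
end
fprintf('VF Score: %.1f%%\n', 100 * sum(results == 1) / numel(results));
```

Setting epsilon = 0 recovers the pure counterfactual check from Step 2; nonzero values give the individual fairness check from Step 3.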