Testing & CI/CD

How to run NNV’s test suite, manage baselines, check coverage, and work with the CI/CD pipeline.

Quick Start

cd("code/nnv");
install;
addpath(fullfile(pwd, 'tests', 'test_utils'));

% Run ALL tests with regression comparison
[test_results, regression_results] = run_tests_with_regression( ...
    'tests', 'compare', true, 'verbose', true);

This runs 833 tests (full suite) or 470 tests (quick mode) covering:

Soundness tests (layers, sets, solvers)
Regression tests (end-to-end verification workflows)
Figure-saving tests (99 figures)
Baseline comparisons (46 baselines)

Test Categories

Directory	Purpose	Count
`tests/soundness/`	Mathematical correctness verification	~300
`tests/regression/`	End-to-end verification workflows	~100
`tests/set/`	Star/Zono/ImageStar operations	~100
`tests/nn/`	Layer operations	~80
`tests/nncs/`	Control system verification	~50
`tests/tutorial/`	Tutorial examples	~20
`tests/utils/`	Utility functions	~30

Running Specific Categories

% Just soundness tests
results = runtests('tests/soundness', 'IncludeSubfolders', true);

% Just Star set tests
results = runtests('tests/set/star', 'IncludeSubfolders', true);

% Just NN layer tests
results = runtests('tests/nn', 'IncludeSubfolders', true);

% Just NNCS tests
results = runtests('tests/nncs', 'IncludeSubfolders', true);

Baseline Management

% Check baseline status
manage_baselines('status');

% List all baselines
manage_baselines('list');

% Compare current test data against baselines
manage_baselines('compare', 'verbose', true);

% Save new baselines (after intentional changes)
manage_baselines('save');

% Clean test data (keeps baselines)
manage_baselines('clean');

Configuration

Edit code/nnv/tests/test_utils/get_test_config.m or use environment variables:

setenv('NNV_TEST_COMPARE_BASELINES', '1');
setenv('NNV_TEST_SAVE_FIGURES', '1');
setenv('NNV_TEST_FAIL_ON_REGRESSION', '1');

Option	Default	Description
`save_figures`	true	Save figures on every run
`close_figures`	true	Close figures after saving
`save_regression_data`	true	Save .mat files for regression comparison
`compare_baselines`	false	Compare against saved baselines
`fail_on_regression`	true	Fail test if regression detected
`tolerance`	1e-6	Numerical comparison tolerance
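
As a rough illustration of how the environment variables could map onto these options, the sketch below layers overrides on top of the documented defaults. It is not the contents of get_test_config.m: the override table, and the rule that '1' enables an option while any other non-empty value disables it, are assumptions made for this example.

```matlab
% Hypothetical sketch, NOT the actual get_test_config.m.
% Defaults copied from the options table above.
config = struct( ...
    'save_figures',         true, ...
    'close_figures',        true, ...
    'save_regression_data', true, ...
    'compare_baselines',    false, ...
    'fail_on_regression',   true, ...
    'tolerance',            1e-6);

% Environment variable name -> config field it is assumed to control.
overrides = { ...
    'NNV_TEST_COMPARE_BASELINES',  'compare_baselines'; ...
    'NNV_TEST_SAVE_FIGURES',       'save_figures'; ...
    'NNV_TEST_FAIL_ON_REGRESSION', 'fail_on_regression'};

for k = 1:size(overrides, 1)
    value = getenv(overrides{k, 1});   % returns '' when the variable is unset
    if ~isempty(value)
        % Assumption: '1' turns the option on, anything else turns it off.
        config.(overrides{k, 2}) = strcmp(value, '1');
    end
end

disp(config);
```

Unset variables leave the defaults untouched, so in this sketch only an explicit setenv call changes behavior.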

Output Locations

Output	Location
Figures	`results/tests/figures/{nn,nncs,set}/`
Test data	`results/tests/data/{nn,nncs,set}/`
Baselines	`results/tests/baselines/{nn,nncs,set}/`

Figure Breakdown

Category	Count
nn/	18
nncs/	24
set/star/	24
set/zono/	12
tutorial/	21
Total	99

Test Categories Explained

Soundness tests (tests/soundness/): Verify that NNV operations are mathematically correct. Each test samples points from the input set, passes them through the operation, and checks that every resulting output point is contained in the computed output set.
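
The sample-and-check pattern described above can be sketched without any NNV classes. The example below is a stand-in under stated assumptions: it uses a plain box as the input set and interval arithmetic to bound an affine map followed by ReLU, whereas NNV's own soundness tests exercise its set and layer implementations. The weights, bounds, and sample count are arbitrary.

```matlab
% Self-contained soundness check using plain interval arithmetic
% (illustration only; it does not call any NNV API).
rng(0);                              % reproducible sampling

% Input set: an axis-aligned box lb <= x <= ub.
lb = [-1; -2];
ub = [ 1;  2];

% Operation under test: y = relu(W*x + b).
W = [1 -1; 0.5 2];
b = [0.1; -0.3];

% Computed output set: interval bounds on the affine map, then ReLU.
% Positive weights pair with the matching bound, negative weights with
% the opposite one; ReLU is monotone, so it can be applied to the bounds.
Wp = max(W, 0);
Wn = min(W, 0);
y_lb = max(Wp*lb + Wn*ub + b, 0);
y_ub = max(Wp*ub + Wn*lb + b, 0);

% Sample points from the input set and pass them through the operation.
N = 1000;
X = lb + (ub - lb) .* rand(numel(lb), N);
Y = max(W*X + b, 0);

% Every concrete output must lie inside the computed output set.
tol = 1e-9;
contained = all(Y >= y_lb - tol, 1) & all(Y <= y_ub + tol, 1);
assert(all(contained), 'Soundness violation: %d of %d samples escaped', ...
    sum(~contained), N);
fprintf('All %d samples are contained in the output bounds.\n', N);
```

A passing run does not prove soundness, since sampling can only find counterexamples, but a single escaped point is enough to show that the computed set is wrong.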

Regression tests (tests/regression/): Run end-to-end verification workflows (neural network verification, NNCS reachability, safety and robustness checking) and compare their outputs against saved baselines.
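
To make the baseline comparison concrete, here is a minimal sketch of a tolerance-based check. It is not NNV's comparison code: the struct layout, the field names lb and ub, and the use of an absolute (rather than relative) difference are assumptions; only the 1e-6 tolerance comes from the configuration defaults.

```matlab
% Hypothetical baseline comparison with an absolute tolerance.
tolerance = 1e-6;                    % default from the options table

% Stand-in data: a saved baseline and the output of the current run.
baseline = struct('lb', [0; -1],        'ub', [1; 2]);
current  = struct('lb', [0; -1] + 1e-9, 'ub', [1; 2.5]);

fields = fieldnames(baseline);
regressed = {};
for k = 1:numel(fields)
    name = fields{k};
    % Largest element-wise deviation from the baseline for this field.
    max_diff = max(abs(current.(name)(:) - baseline.(name)(:)));
    if max_diff > tolerance
        regressed{end+1} = name; %#ok<AGROW>
    end
end

if isempty(regressed)
    fprintf('No regressions detected.\n');
else
    fprintf('Regression in field(s): %s\n', strjoin(regressed, ', '));
end
```

Here lb differs by 1e-9 and is accepted, while ub differs by 0.5 and is flagged.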

Set tests (tests/set/): Test Star, Zonotope, ImageStar operations – construction, affine maps, Minkowski sums, containment, sampling, plotting, convex hulls, intersections.

NN tests (tests/nn/): Test neural network layer operations – feedforward, CNN layers (Conv2D, Pooling), activation functions (ReLU, Sigmoid).

NNCS tests (tests/nncs/): Test neural network control systems – LinearODE, DLinearODE, LinearNNCS, DLinearNNCS, reachability analysis.

Troubleshooting

LP Solver Errors (GLPK exitflag 111): Usually caused by random constraint generation. Use well-defined constraints instead of ExamplePoly.randHrep().

Script vs Function Issues: Test files must be MATLAB functions (not scripts) to work with the test framework. Wrap the script body in a function test_name() declaration and close it with an end statement.

Missing Baselines: Run manage_baselines('save') after a clean test pass.

Figure Not Saved: Ensure save_test_figure() is called before the test ends. Check that the test is a function (not a script).
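
For the script-versus-function issue above, a converted test file might look like the following. The file name, function name, and test body are hypothetical; the point is only the function ... end wrapper around what used to be top-level script statements.

```matlab
% File: test_affine_map_example.m (hypothetical name)
function test_affine_map_example()
    % These statements would previously have sat at the top level of a
    % script; wrapping them turns the file into a function.
    W = [1 2; 3 4];
    b = [0.5; -0.5];
    x = [1; 1];

    y = W * x + b;

    % Fail loudly if the result deviates from the hand-computed value.
    assert(max(abs(y - [3.5; 6.5])) < 1e-12, ...
        'Affine map produced an unexpected result');
end
```

Because variables are now local to the function, anything the old script relied on from the base workspace has to be created inside the function body.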

Environment Variables

setenv('NNV_TEST_COMPARE_BASELINES', '1');   % Enable baseline comparison
setenv('NNV_TEST_SAVE_FIGURES', '1');         % Enable figure saving
setenv('NNV_TEST_FAIL_ON_REGRESSION', '1');   % Fail on regression

Code Coverage

NNV 3.0 tracks code coverage via the track_coverage.m utility.

Current Coverage

Directory	Coverage
Overall	48.6%
engine/nn/	86.2%
engine/set/	88.9%
engine/utils/	23.1%
engine/nncs/	18.2%
engine/hyst/	0.0%

Running Coverage Analysis

addpath(fullfile(pwd, 'tests', 'test_utils'));
coverage = track_coverage('tests', 'quick_mode', true);

Coverage is highest in the core verification components, engine/nn/ (86.2%) and engine/set/ (88.9%), which are the most critical for correctness.

See TEST_COVERAGE.md for the detailed per-file coverage report.

CI/CD Pipeline

NNV uses GitHub Actions to run tests automatically on every push and pull request.

Workflows

Located in .github/workflows/:

ci.yml – Basic CI

Triggers: Push/PR to master
Runs: runtests('tests', 'IncludeSubfolders', true)
Purpose: Simple pass/fail test execution

regression-tests.yml – Regression Detection

Triggers: PR to master, manual dispatch
Runs: run_tests_with_regression('tests', 'compare', true, 'verbose', true)
Purpose: Compare test outputs against saved baselines, fail on mismatches
Manual option: Check “Update baselines after tests” to save new baselines

Complete PR Merge Checklist

%% Step 1: Setup
cd("code/nnv");
install;
addpath(fullfile(pwd, 'tests', 'test_utils'));

%% Step 2: Run full test suite with baseline comparison
[test_results, regression_results] = run_tests_with_regression( ...
    'tests', 'compare', true, 'verbose', true);

%% Step 3: Verify results
fprintf('\n=== FINAL RESULTS ===\n');
fprintf('Tests passed: %d\n', sum([test_results.Passed]));
fprintf('Tests failed: %d\n', sum([test_results.Failed]));
fprintf('Baselines matched: %d\n', length(regression_results.matches));
fprintf('Regressions: %d\n', length(regression_results.regressions));

%% Step 4: If baselines changed intentionally:
% manage_baselines('save');
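
The printed summary in Step 3 can also be turned into a single go/no-go decision. The sketch below uses stand-in values so that it runs on its own; the field names mirror those used in Step 3, but their exact types (logical flags, cell arrays) are assumptions rather than a description of what run_tests_with_regression returns.

```matlab
% Stand-in results so the snippet is runnable in isolation. In practice
% these two variables come from run_tests_with_regression (Step 2).
test_results = struct( ...
    'Passed', {true, true, false}, ...
    'Failed', {false, false, true});          % 1x3 struct array
regression_results = struct( ...
    'matches',     {{'baseline_a', 'baseline_b'}}, ...
    'regressions', {{}});                     % scalar struct

n_failed      = sum([test_results.Failed]);
n_regressions = length(regression_results.regressions);
ready_to_merge = (n_failed == 0) && (n_regressions == 0);

if ready_to_merge
    fprintf('Ready to merge: no failures, no regressions.\n');
else
    % In a CI script, calling error(...) here instead would make
    % `matlab -batch` exit with a non-zero status.
    fprintf('Not ready: %d failed test(s), %d regression(s).\n', ...
        n_failed, n_regressions);
end
```

With the stand-in data one test has failed, so the check reports that the branch is not ready.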