Malware Detection

Verify that adversarial evasion attacks cannot fool neural network malware classifiers. This tutorial demonstrates feature-aware perturbation and two-phase verification.


What You Will Learn

  • How to apply domain-specific, feature-aware perturbations (not all features are equally perturbable)

  • How to use two-phase verification: fast falsification before expensive reachability

  • How to normalize data and handle z-score scaling in verification

  • How to interpret robustness results across multiple perturbation budgets

Background

Malware authors can make small modifications to executables to evade detection – for example, appending benign bytes to a file or modifying non-functional metadata fields. NNV verifies that such modifications cannot change a classifier’s prediction from “malware” to “benign.”

The key insight is that not all features are equally perturbable: hash-like features cannot be changed without altering the binary itself, while metadata fields such as file size can be trivially modified. NNV’s feature-aware perturbation perturbs only the features an attacker can realistically modify.

BODMAS Dataset

The BODMAS dataset contains 57,293 malware samples and 77,142 benign samples (134,435 total), collected between August 29, 2019 and September 30, 2020. It covers 581 malware families across 14 categories (top 5: Trojan 29,972, Worm 16,697, Backdoor 7,331, Downloader 1,031, Ransomware 821). See the BODMAS paper for details.

Note

Because the full dataset is large, the tutorial provides a subset of 500 samples.

The tutorial includes four verification scenarios:

  1. Adversarial perturbations of continuous variables

  2. Adversarial perturbations of discrete variables

  3. Adversarial perturbations of continuous & discrete variables

  4. Adversarial perturbations of all variables
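The four scenarios differ only in which feature-type mask is used when building the perturbation bounds. A minimal Python sketch of the mask logic (the type labels here are illustrative placeholders, not the exact strings in the metadata file):

```python
# Build a perturbation mask for each scenario from per-feature type labels.
# Type labels are illustrative; the real ones come from the metadata CSV.
feature_types = ["continuous", "discrete", "hash", "continuous", "discrete"]

def mask(allowed):
    """Boolean mask: True where the feature's type is in `allowed`."""
    return [t in allowed for t in feature_types]

scenarios = {
    1: mask({"continuous"}),              # continuous variables only
    2: mask({"discrete"}),                # discrete variables only
    3: mask({"continuous", "discrete"}),  # both continuous and discrete
    4: [True] * len(feature_types),       # all variables
}
print(scenarios[3])  # [True, True, False, True, True]
```

Only the masked features receive nonzero perturbation bounds; everything else is pinned to its original value, exactly as in Step 4 below.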

See also: Robinette et al., “Case Study: Neural Network Malware Detection Verification”, FormaliSE 2024.

Prerequisites

  • BODMAS malware dataset: bodmas.mat (features X, labels y)

  • Feature metadata: bodmas-feature-analysis-NEW.csv

  • Pre-trained ONNX classifier: models/malware_bodmas_binary_scaled_none-2.onnx

  • All located in examples/Tutorial/NN/malware/

Step 1: Load Data and Normalize

% Load BODMAS dataset
load('bodmas.mat');   % X: feature matrix (N x F), y: labels (1=benign, 2=malware)

% Z-score normalize features (must match the normalization used during training;
% in practice, reuse the training set's per-feature mean and std)
X_scaled = zscore(X);

% Select samples to verify
rng(0);   % Fix the random seed for reproducibility
N = 5;    % Number of samples
indices = randperm(size(X_scaled, 1), N);
XData = X_scaled(indices, :);
YData = y(indices);
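MATLAB's `zscore` recomputes the mean and standard deviation from whatever data is loaded; for verification to match training, the same per-feature statistics fixed at training time should be applied to any new sample. A sketch of the underlying arithmetic in Python (column-wise mean, sample standard deviation as in `zscore`'s default):

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Statistics fixed at training time, then reused at verification time.
mu = X.mean(axis=0)            # per-feature mean
sigma = X.std(axis=0, ddof=1)  # per-feature sample std (zscore's default)

X_scaled = (X - mu) / sigma
print(X_scaled[:, 0])  # [-1.  0.  1.]
```

Saving `mu` and `sigma` alongside the model also lets you translate verified bounds back into raw feature units later.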

Step 2: Load the Feature Metadata

The feature metadata file classifies each feature by type, determining which features an attacker can realistically perturb:

% Load feature analysis (columns: feature name, type, scaled range, etc.)
info = readtable('bodmas-feature-analysis-NEW.csv');

% Feature types:
%   "Discrete with large range" --> perturbable (file size, section counts)
%   "Hash-like"                 --> NOT perturbable (MD5, SHA256)
%   "Categorical"               --> NOT perturbable (fixed categories)
%   "Boolean"                   --> possibly perturbable

% Only features of type "Discrete with large range" are perturbed
perturbable = strcmp(info.Type, 'Discrete with large range');
fprintf('%d of %d features are perturbable.\n', sum(perturbable), length(perturbable));
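Because the features are z-scored, an epsilon budget is measured in standard deviations, not raw units. To report the attack budget in original units (e.g., bytes of file size), multiply by each feature's training-time standard deviation. A sketch with made-up statistics:

```python
import numpy as np

epsilon = 0.05                    # budget in z-score units
sigma = np.array([4096.0, 8.0])   # per-feature training stds (illustrative):
                                  # file size in bytes, number of sections

# Perturbation budget per feature in raw units
raw_budget = epsilon * sigma
print(raw_budget)
```

Here a z-score epsilon of 0.05 corresponds to roughly 205 bytes of file-size slack, which helps judge whether a budget is realistic for an attacker.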

Step 3: Load and Convert the Network

% Load ONNX malware classifier
model_path = 'models/malware_bodmas_binary_scaled_none-2.onnx';
dlnet = importNetworkFromONNX(model_path, InputDataFormats='BC');
net = matlab2nnv(dlnet);

% Network: input features --> 2 outputs (benign score, malware score)

Step 4: Two-Phase Verification

For each sample, NNV runs verification in two phases:

Phase 1 – Falsification (fast): Sample random points from the perturbation region and check if any cause misclassification.

Phase 2 – Reachability (thorough): If no counterexample is found via sampling, run full Star-based reachability analysis.

epsilon_values = [0.01, 0.05, 0.1];   % Perturbation budgets to test

for e = 1:length(epsilon_values)
    epsilon = epsilon_values(e);
    fprintf('\n=== Epsilon = %.2f ===\n', epsilon);

    for i = 1:N
        x = XData(i, :)';
        target = YData(i);

        % Create feature-aware input set (only perturb realistic features)
        lb = x;
        ub = x;
        for f = 1:length(x)
            if perturbable(f)
                lb(f) = x(f) - epsilon;
                ub(f) = x(f) + epsilon;
            end
            % Non-perturbable features: lb(f) == ub(f) == x(f)
        end
        input_set = Star(lb, ub);

        % Phase 1: Random falsification (fast)
        found_counterexample = false;
        nSamples = 100;
        for s = 1:nSamples
            x_rand = lb + (ub - lb) .* rand(length(x), 1);
            y_rand = net.evaluate(x_rand);
            [~, pred] = max(y_rand);
            if pred ~= target
                fprintf('  Sample %d: NOT ROBUST (counterexample at sample %d)\n', i, s);
                found_counterexample = true;
                break;
            end
        end

        % Phase 2: Full reachability (only if falsification found nothing)
        if ~found_counterexample
            reachOptions = struct;
            reachOptions.reachMethod = 'approx-star';
            result = net.verify_robustness(input_set, reachOptions, target);

            switch result
                case 1, fprintf('  Sample %d: ROBUST (formally verified)\n', i);
                case 0, fprintf('  Sample %d: NOT ROBUST (reachability found violation)\n', i);
                case 2, fprintf('  Sample %d: UNKNOWN (inconclusive)\n', i);
            end
        end
    end
end

Understanding the Results

As epsilon increases, fewer samples can be verified as robust:

  • epsilon = 0.01: Most samples verified robust (small perturbation region)

  • epsilon = 0.05: Some samples become unknown or not robust

  • epsilon = 0.1: Many samples fail (large perturbation allows class change)

This degradation curve tells you the classifier’s maximum tolerable perturbation for each input – the largest epsilon at which robustness is still formally guaranteed.
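Rather than sweeping a fixed grid of epsilon values, the maximum tolerable perturbation can be located by bisection, calling the two-phase check at each midpoint. A sketch where `is_robust` stands in for the Phase 1 + Phase 2 procedure (a hypothetical placeholder, not an NNV function):

```python
def max_robust_epsilon(is_robust, lo=0.0, hi=0.1, tol=1e-3):
    """Largest epsilon in [lo, hi] for which is_robust(eps) holds,
    assuming robustness is monotone in eps."""
    if not is_robust(lo):
        return None               # not robust even with no perturbation
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_robust(mid):
            lo = mid              # still robust: try larger budgets
        else:
            hi = mid              # violated: try smaller budgets
    return lo

# Toy verifier: robust exactly when eps <= 0.037
eps_max = max_robust_epsilon(lambda e: e <= 0.037)
print(eps_max)  # just below 0.037
```

Note that reachability can also return UNKNOWN; treating UNKNOWN as "not robust" keeps the search sound (it only underestimates the tolerable epsilon).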

Why Feature-Aware Perturbation Matters

A naive approach perturbs ALL features equally. But a malware author cannot change a file’s SHA-256 hash without changing the entire binary. Feature-aware perturbation only modifies features within the attacker’s realistic capability, making the verification result practically meaningful.

Adapting to Your Own Dataset

  1. Train a binary classifier on your feature dataset

  2. Export to ONNX format

  3. Create a feature metadata file classifying each feature as perturbable or not

  4. Apply the same two-phase workflow with your domain-specific epsilon values
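The feature metadata file in step 3 can be generated programmatically. A minimal sketch using Python's csv module (the filename and feature names are assumptions about your own dataset; the type strings follow the tutorial's convention):

```python
import csv

# Classify each feature by whether an attacker can realistically modify it.
features = [
    ("file_size",    "Discrete with large range"),  # attacker can pad the file
    ("md5_hash",     "Hash-like"),                  # changes only by rebuilding
    ("machine_type", "Categorical"),                # fixed set of values
]

with open("my-feature-analysis.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Type"])
    writer.writerows(features)

# Reading it back yields the perturbable mask used in Step 2.
with open("my-feature-analysis.csv") as f:
    rows = list(csv.DictReader(f))
perturbable = [r["Type"] == "Discrete with large range" for r in rows]
print(perturbable)  # [True, False, False]
```

Keeping the classification in a separate file, rather than hard-coding it, makes the perturbation model easy to audit and revise as your threat model changes.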

Source Files