Feature partitioning for robust tree ensembles and their certification in adversarial scenarios
EURASIP Journal on Information Security volume 2021, Article number: 12 (2021)
Abstract
Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work, we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at inference time. The attacker aims at finding a perturbation of an instance that changes the model outcome. We propose a model-agnostic strategy that builds a robust ensemble by training its basic models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We apply the proposed strategy to decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently provides a lower bound of the accuracy of a forest in the presence of attacks on a given dataset, avoiding the costly computation of evasion attacks. Experimental evaluation on publicly available datasets shows that the proposed feature partitioning strategy provides a significant accuracy improvement with respect to competitor algorithms and that the proposed certification method allows one to accurately estimate the effectiveness of a classifier where the brute-force approach would be infeasible.
1 Introduction
Machine learning (ML) algorithms are currently used to train models that are then deployed to ensure system security and to control critical processes [1, 2]. Unfortunately, traditional ML algorithms have proved vulnerable to a wide range of attacks, and in particular to evasion attacks, where an attacker carefully crafts perturbations of an input to force prediction errors [3–6].
While there is a large body of research on evasion attacks in linear classifiers [7, 8] and, more recently, on deep neural networks [9, 10], there are just a few works dealing with tree-based models. Decision trees are interpretable models [11], yielding predictions which are human-understandable in terms of syntactic checks over domain features, which is particularly appealing in the security setting. Moreover, decision tree ensembles are nowadays one of the best methods for dealing with non-perceptual problems and are one of the most commonly used techniques in Kaggle competitions [12].
In this paper, we present a meta-learning algorithm, called Feature Partitioned Forest (FPF), that builds an ensemble of decision trees designed to be robust against evasion attacks. Focusing on binary classifiers, we show that the proposed algorithm builds models that are robust by construction. In fact, the algorithm benefits from the theoretical property that the majority of the learners in the ensemble are not fooled by an injected perturbation over a given instance, i.e., they still provide the same prediction as for the original instance. This means that the proposed algorithm limits by design the drop in accuracy an attacker may cause. Our method is based on sampling the features at training time. Specifically, we randomly equipartition the set of features and train each tree of the ensemble on a distinct subset of the partition. Such sampling limits the number of features considered by a single tree, and therefore it limits the number of trees affected by corrupting a given feature. Even though we focus our analysis on ensembles of decision trees, the proposed learning strategy generalizes to any ensemble method.
We also propose a certification method for tree ensembles that efficiently computes a lower bound of the accuracy under attack on a given dataset. Finding a successful attack requires finding a set of features of a given instance that, if corrupted, changes the prediction of the model. In general, this requires an exhaustive search over all possible feature subsets and their possible modifications, which quickly becomes infeasible for powerful attackers or large feature sets. We show that this problem can be reduced to a partial set cover problem, and we use efficient set cover algorithms to assess the nonexistence of harmful attacks. By certifying the nonexistence of attacks for some instances of a given dataset, we can provide an accuracy lower bound on the full dataset. We show that the proposed lower bounds are efficient to compute and quite accurate. Furthermore, we devise a cascading strategy where only those instances that are not certified as non-attackable by the proposed lower bounds are eventually evaluated with a slow exact method. The cascading strategy can reduce the total running time by one or two orders of magnitude.
In summary, the contributions of this work are as follows:

We propose a novel meta tree ensemble learning algorithm named Feature Partitioned Forest that is robust against adversarial attacks by construction

We experimentally validate FPF on three public datasets and show that FPF can improve accuracy in the presence of attacks by up to 16 points compared to competitor algorithms

We provide a novel certification method that allows one to quickly compute a lower bound of the accuracy of a forest on a given dataset

We devise a cascading strategy for exactly computing the accuracy in the presence of an attacker, exploiting the proposed lower bounds as a fast preprocessing filter
2 Background and related work
Most of the work in adversarial learning regards classifiers, in particular binary ones. The attacker starts from a positive instance that is classified correctly by the deployed ML model and is interested in introducing minimal perturbations on the instance to modify the prediction from positive to negative, thus “evading” the classifier [3, 10, 13–18]. To prevent these attacks, different techniques have been proposed for different models, including support vector machines [8, 19], deep neural networks [10, 20, 21], and decision tree ensembles [16, 22–24].
2.1 Background and notation
Let \({\mathcal {X}} \subseteq \mathbb {R}^{d}\) be a d-dimensional vector space of real-valued features. An instance \(\boldsymbol {x} \in {\mathcal {X}}\) is a d-dimensional feature vector \((x_{1},x_{2},\ldots,x_{d})\), and we denote by \({\mathcal {F}}\) the set of features. Each instance \(\boldsymbol {x} \in {\mathcal {X}}\) is assigned a label \(y \in {\mathcal {Y}}\) by some unknown target function \(g: {\mathcal {X}} \to {\mathcal {Y}}\). The goal of a supervised learning algorithm that induces a forest of decision trees is to find the forest \({\mathcal {T}}\) that best approximates the target g. At training time, learning algorithms exploit a training set \({\mathcal {D}}_{{train}} = \left \{(\boldsymbol {x}_{1}, y_{1}), \ldots, (\boldsymbol {x}_{n}, y_{n})\right \}\); model accuracy is then evaluated on a test set \({\mathcal {D}}_{{test}}\) whose instance labels are unknown to the trained model.
In this paper, we discuss binary classification, where \({\mathcal {Y}} = \{-1,+1\}\), and focus on binary decision trees. Each tree \(t \in {\mathcal {T}}\) can be inductively defined as follows: t is either a leaf \(\lambda ({\hat {y}})\) for some label \(\hat {y} \in {\mathcal {Y}}\), or an internal node \(\sigma(f,v,t_{l},t_{r})\) (a.k.a. split), where \(f \in [1,d]\) identifies a feature, \(v \in \mathbb {R}\) is the threshold for the feature f, and \(t_{l},t_{r}\) are the left/right decision trees. At test time, an instance x traverses each tree \(t \in {\mathcal {T}}\) until it reaches a leaf \(\lambda ({\hat {y}})\), which returns the prediction \(\hat {y}\), denoted by \(t(\boldsymbol {x}) = \hat {y}\). Specifically, at each internal node \(\sigma(f,v,t_{l},t_{r})\), x falls into the left tree \(t_{l}\) if \(x_{f} \le v\), and into the right tree \(t_{r}\) otherwise. Given a forest \({\mathcal {T}}\), the global prediction is defined as \({\mathcal {T}}(\boldsymbol {x})=+1\) if \(\sum _{t \in {\mathcal {T}}} t(\boldsymbol {x})>0\), and \({\mathcal {T}}(\boldsymbol {x})=-1\) otherwise.
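To make the notation concrete, the following minimal Python sketch encodes leaves \(\lambda(\hat{y})\) and splits \(\sigma(f,v,t_{l},t_{r})\) and implements the traversal and the sign-of-sum forest prediction. The names `Leaf`, `Split`, `predict_tree`, and `predict_forest` are illustrative choices of ours, and features are indexed from 0 rather than from 1.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Leaf:
    label: int  # predicted label y_hat in {-1, +1}


@dataclass(frozen=True)
class Split:
    feature: int      # feature f tested by the node (0-based in this sketch)
    threshold: float  # threshold v of the test x_f <= v
    left: "Tree"      # t_l, followed when x_f <= v
    right: "Tree"     # t_r, followed otherwise


Tree = Union[Leaf, Split]


def predict_tree(t: Tree, x: Sequence[float]) -> int:
    # Follow the traversal path of x until a leaf is reached.
    while isinstance(t, Split):
        t = t.left if x[t.feature] <= t.threshold else t.right
    return t.label


def predict_forest(forest: Sequence[Tree], x: Sequence[float]) -> int:
    # Global prediction: +1 iff the sum of the tree predictions is strictly positive.
    return 1 if sum(predict_tree(t, x) for t in forest) > 0 else -1


s0 = Split(0, 0.5, Leaf(-1), Leaf(1))
s1 = Split(1, 2.0, Leaf(1), Leaf(-1))
print(predict_forest([s0, s1], [0.7, 3.0]))  # -1
```

The two stumps disagree on this instance, so the vote sum is 0 and the forest returns −1, as prescribed by the tie-breaking rule above.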
Finally, given a test set \({\mathcal {D}}_{{test}}\), we denote by \(\overline {{\mathcal {D}}}_{{test}} \subseteq {\mathcal {D}}_{{test}}\) the set of test instances that are erroneously classified by \({\mathcal {T}}\), i.e., \(\overline {{\mathcal {D}}}_{{test}}= \{(\boldsymbol {x}, y) \in {\mathcal {D}}_{{test}} \mid y \cdot {\mathcal {T}}(\boldsymbol {x}) < 0\}\). We define accuracy as:
\[ ACC = 1 - \frac{|\overline{\mathcal{D}}_{test}|}{|\mathcal{D}_{test}|}. \]
2.2 Threat model
We focus on the evasion attack scenario, where an attacker aims at fooling an already trained classifier by maliciously modifying a given instance before submitting it to the classification model. As in Kantchelian et al. [25], we assume an attacker \(A_{b}\) that is capable of modifying a given instance x into a perturbed instance \(\boldsymbol{x}^{\prime}\) such that the \(L_{0}\)-norm of the perturbation is at most the attacker's budget b, i.e., \(\|\boldsymbol{x}-\boldsymbol{x}^{\prime}\|_{0} \le b\). Therefore, attacker \(A_{b}\) can perturb the instance x by modifying at most b features, without any constraint on how much a given feature can be altered. Other threat models have also been investigated: threat models based on \(L_{p}\)-norm constraints are common [9], and models based on rewriting rules have recently been proposed [23]. We restrict our attention to \(L_{0}\)-norm attacks because of their simplicity and effectiveness: a very small b is sufficient to achieve successful attacks. Su et al. [26] show that with a one-pixel attack, i.e., with b=1, it is possible to fool a complex deep neural network such as VGG16 [27] and decrease its accuracy to a poor 16%.
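Given an instance \(\boldsymbol {x}\in {\mathcal {X}}\), we denote by \(A_{b}(\boldsymbol{x})\) the set of all the perturbed instances the attacker may generate:
\[ A_{b}(\boldsymbol{x}) = \{\boldsymbol{x}^{\prime} \in \mathcal{X} \mid \|\boldsymbol{x}-\boldsymbol{x}^{\prime}\|_{0} \le b\}. \]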
Finally, we can formalize the accuracy under attack that an attacker \(A_{b}\) aims to minimize. Given the test set \({\mathcal {D}}_{{test}}\), let \(\widehat {{\mathcal {D}}}_{{test}}\) be the set of the test instances that can be successfully attacked by \(A_{b}\), i.e., \(\widehat {{\mathcal {D}}}_{{test}} = \{(\boldsymbol {x}, y) \in {\mathcal {D}}_{{test}} \setminus \overline {{\mathcal {D}}}_{{test}} \mid \exists \boldsymbol {x}^{\prime } \in A_{b}(\boldsymbol {x}), \ y \cdot {\mathcal {T}}(\boldsymbol {x}^{\prime }) < 0\}\). We thus define the accuracy under attack as:
\[ ACC_{A_{b}} = 1 - \frac{|\overline{\mathcal{D}}_{test}| + |\widehat{\mathcal{D}}_{test}|}{|\mathcal{D}_{test}|}. \tag{1} \]
While \(\overline {{\mathcal {D}}}_{{test}}\) includes the instances that are misclassified due to classifier errors, the set \(\widehat {{\mathcal {D}}}_{{test}}\) includes those instances that happen to be misclassified because of the attacker's perturbations. The goal of a robust training strategy is to generate models that minimize \(|\widehat {{\mathcal {D}}}_{{test}}|\) without hindering the accuracy on unattacked instances, i.e., without increasing \(|\overline {{\mathcal {D}}}_{{test}}|\). In the following, we write \({\mathcal {D}}\), omitting the subscripts of \({\mathcal {D}}_{{train}}\) and \({\mathcal {D}}_{{test}}\), when the dataset we are referring to is clear from the context.
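As a small worked example of the two accuracy definitions, the sketch below computes \(ACC\) and \(ACC_{A_{b}}\) from the sizes of \(\mathcal{D}_{test}\), \(\overline{\mathcal{D}}_{test}\), and \(\widehat{\mathcal{D}}_{test}\). The function names are hypothetical. Since \(\widehat{\mathcal{D}}_{test}\) only contains instances outside \(\overline{\mathcal{D}}_{test}\), the two counts can simply be added.

```python
def accuracy(n_test: int, n_wrong: int) -> float:
    """ACC = 1 - |D_bar| / |D|, where D_bar holds the misclassified instances."""
    return 1.0 - n_wrong / n_test


def accuracy_under_attack(n_test: int, n_wrong: int, n_attacked: int) -> float:
    """ACC_{A_b} = 1 - (|D_bar| + |D_hat|) / |D|.

    D_hat only contains instances that are correct without attack,
    so it is disjoint from D_bar and the two counts can be summed.
    """
    if n_wrong + n_attacked > n_test:
        raise ValueError("D_bar and D_hat are disjoint subsets of D")
    return 1.0 - (n_wrong + n_attacked) / n_test


# 100 test instances, 10 misclassified, 15 more flipped by the attacker.
print(accuracy(100, 10))                   # about 0.9
print(accuracy_under_attack(100, 10, 15))  # about 0.75
```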
Note that the attack and the evaluation processes need to solve the same problem: finding a successful attack in \(A_{b}(\boldsymbol{x})\). This is a difficult and computationally expensive task due to the possibly large size of \(A_{b}(\boldsymbol{x})\) and to the number of interactions among trees in a forest. Kantchelian et al. [25] prove that the problem of finding a successful attack is NP-complete, regardless of the \(L_{p}\)-norm adopted. The authors provide an exact but expensive solution based on Mixed Integer Linear Programming, and an approximate solution which may fail to discover successful attacks. Chen et al. [28] show that verifying the robustness of a forest \({\mathcal {T}}\) with at most l leaves per tree has cost \(\min \{O(l^{|{\mathcal {T}}|}), O((2|{\mathcal {T}}|l)^{|{\mathcal {F}}|})\}\) for the \(L_{\infty}\)-norm. In this case too, the authors propose an approximate strategy to deal with such exponential cost. Unfortunately, verifying the accuracy under attack is feasible only for weaker forests of decision stumps, i.e., trees with two leaves only [24], although recent work shows that the verification problem for \(L_{\infty}\)-norm attackers can be solved both effectively and efficiently using abstract interpretation [29].
The aforementioned approaches fall into the white-box scenario, where the attacker has complete knowledge of the attacked model. Conversely, in the black-box scenario, the attacker can only issue classification queries to the model. Even in this case, only approximate strategies can be devised to discover possible attacks [30].
We are particularly interested in the evaluation process, with the goal of determining the size of \(\widehat {{\mathcal {D}}}\). To this end, in Section 4, we introduce some efficient lower bounds of \(ACC_{A_{b}}\), obtained by upper bounding the size of \(\widehat {{\mathcal {D}}}\), and a cascading certification algorithm that resorts to a brute-force evaluation only when the lower bounds cannot guarantee the absence of successful attacks.
2.3 Robust training
In order to build robust models, two main research directions have been investigated: enriching the training dataset and modifying the optimized loss function.
Adversarial boosting by Kantchelian et al. [16] is the first adversarial learning technique for decision tree ensembles. It is an empirical data augmentation technique, borrowing from the adversarial training approach [9]. At each boosting round, a greedy algorithm is exploited to craft an adversarial counterpart of every instance in the training set. The next tree in the ensemble is trained on both original and perturbed instances. The need to craft new adversarial instances at each round makes this algorithm infeasible for large datasets. In addition, as for any adversarial training approach, adversarial instances produced at training time may not be representative of the adversarial instances occurring at test time, possibly leading to poor accuracy.
Another adversarial learning technique for decision tree ensembles was proposed by Chen et al., who introduced the first tree learning algorithm embedding the attacker directly in the optimization problem solved upon tree construction [22]. The key idea of their approach, called robust trees, is to redefine the splitting strategy of the training examples at a tree node so as to include the attacker's impact. In this case too, the attacker's behavior is only approximated. A similar solution is proposed in TREANT [23]: while building a single tree, the best split is chosen by minimizing the loss under attack without introducing any heuristic approximation. TREANT ensures that every node added to the tree does not increase the loss under attack of such tree; eventually, multiple trees are grouped in a forest. Another step forward was made by Andriushchenko et al. [24], who proposed an exact solution to optimize the loss under attack for an ensemble of decision stumps, i.e., trees having only a root and two leaves. While this approach is the first able to take the full ensemble into consideration, the use of decision stumps may limit the overall accuracy of the forest. Calzavara et al. [31] showed how to generalize adversarial training to gradient-boosted decision trees by exploiting the knowledge of tree thresholds, thus reducing the set of possible perturbations without loss of generality. Thanks to this, they take advantage of differentiable approximations and make their optimization problem tractable for their experimental setting. More recently, Vos et al. [32] presented GROOT, an efficient algorithm for training robust decision trees. GROOT analytically calculates the adversarial impurity, significantly reducing the training time and resulting in a training algorithm that is faster than the state of the art and achieves better adversarial accuracy on structured data. Finally, Ranzato et al. [33] proposed a genetic adversarial training algorithm called MetaSilvae to train decision trees in order to maximize both accuracy and robustness to adversarial perturbations. The algorithm relies on a complete formal verification based on abstract interpretation.
The algorithm most similar to our proposal is the Random Subspace method (RSM) [34], which was successfully exploited by Biggio et al. [35] to build robust ensembles. The approach trains each single learner on a projection of the original dataset holding a random subset of its features. The rationale is the reduced variance of such a randomized model. The FPF algorithm proposed in this work improves over RSM by exploiting projections of the dataset that provide robustness by construction for the majority of the trees in the ensemble, rather than random projections. In short, our approach divides the feature space so as to ensure by construction that the majority of the ensemble is never involved in an attack. For example, if the attacker can modify any 3 features at most, and a model has been trained with FPF to resist this type of attack, each combination of 3 features appears in less than half of the ensemble, thus leaving the majority of weak learners unaltered in their predictions. This is not guaranteed with RSM, where each weak learner is trained on a random subsample of the feature space with possible repetitions among weak learners, so that, in the worst case, an attack on a single feature can involve the entire ensemble. We highlight that FPF, similarly to RSM, is complementary to any other algorithm that can be used to train a weak learner of the final ensemble. Not only can FPF potentially be used to ensemble decision trees, SVMs, or other models, but it can also be exploited to create ensembles of robust learners trained with the approaches mentioned above, such as adversarial boosting or robust trees. In this work, we focus on ensembles of decision trees, and we leave the investigation of other types of weak learners to future work.
3 Robust forest training
We aim to design a machine learning algorithm able to train ensembles, namely forests of binary decision trees, that are resilient to evasion attacks, in which an adversary manipulates test data to force prediction errors. Specifically, the robust forest is resilient by design in strong adversarial environments, where attackers can perturb at most b features of a test instance to deceive the learnt model and force a prediction error. In the following, we propose a novel training strategy that produces a forest \({\mathcal {T}}\) where the majority of its trees are not affected by the attacker \(A_{b}\).
3.1 Feature partitioning
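Given a partition \(\mathcal {P}\) of the feature set \({\mathcal {F}}\) and an attacker \(A_{b}\), we call \(\mathcal {P}\) robust if the majority of its sets cannot be impacted by the attacker \(A_{b}\), i.e., if the following property holds^{Footnote 1}:
\[ \forall B \subseteq \mathcal{F},\ |B| \le b:\quad \left|\{P \in \mathcal{P} \mid P \cap B = \emptyset\}\right| > \frac{|\mathcal{P}|}{2}. \]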
In words, the majority of the sets in \(\mathcal {P}\) do not contain any feature in B, i.e., any of the features perturbed by \(A_{b}\), for any choice of B.
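When \(|B| \le b\), it is straightforward to show that this property is surely satisfied if \(\lvert \mathcal {P}\rvert \geq 2b+1\). Consider the worst case: since each feature belongs to exactly one set of \(\mathcal {P}\), at most b distinct sets of \(\mathcal {P}\) can overlap with B, leaving at least \(\lvert \mathcal {P}\rvert - b \ge b+1\) sets of \(\mathcal {P}\) unaffected, which is a strict majority because \(\lvert \mathcal {P}\rvert - b > \lvert \mathcal {P}\rvert/2\) whenever \(\lvert \mathcal {P}\rvert > 2b\). Hereinafter, we consider only robust feature partitions \(\mathcal {P}\) with \(\lvert \mathcal {P}\rvert = 2b+1\).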
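For small feature sets, the robustness property can be verified exhaustively. The Python sketch below (the function name is ours, and enumerating every B is only meant for illustration, since the number of candidate sets grows combinatorially with b) checks the bound above on toy partitions with 2b + 1 sets.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def is_robust_partition(partition: Sequence[FrozenSet[int]], b: int) -> bool:
    """Brute-force check of the robustness property of a feature partition.

    For every set B of at most b perturbed features, a strict majority of
    the sets in the partition must be disjoint from B.
    """
    features = sorted(set().union(*partition))
    for size in range(min(b, len(features)) + 1):
        for B in combinations(features, size):
            untouched = sum(1 for P in partition if P.isdisjoint(B))
            if 2 * untouched <= len(partition):  # not a strict majority
                return False
    return True


# Six features split into 2b + 1 = 3 sets: robust for b = 1, not for b = 2.
p3 = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]
print(is_robust_partition(p3, 1), is_robust_partition(p3, 2))  # True False
```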
3.2 Robust forest
Let us consider a forest \({\mathcal {T}}\) that, given an attacker A_{b}, is built by exploiting a robust feature partition \(\mathcal {P}\) as follows.
Let \({\mathcal {D}}\) be a training set and let \(\mathcal {P}\) be a robust partition of its feature space. Given \(P \in \mathcal {P}\), we call \(\pi _{P}({\mathcal {D}})\) the projection of \({\mathcal {D}}\) on the feature set P, i.e., the dataset obtained from \({\mathcal {D}}\) by discarding those features not included in P. Given a robust feature partition \({\mathcal {P}}\), it is thus possible to build a forest by independently training 2b+1 trees on the 2b+1 projections \(\pi _{P}({\mathcal {D}})\), one for each \(P\in {\mathcal {P}}\). We denote such a forest by \({\mathcal {T}}_{\mathcal {P}}\); it has \(|{\mathcal {T}}_{\mathcal {P}}| = |{\mathcal {P}}| = 2b+1\) trees.
The algorithm sketched above is able to build an ensemble of trees that we formally define as Robust Forest.
Definition 1
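(Robust Forest) Given an attacker \(A_{b}\), we say that a forest \({\mathcal {T}}\) is robust if the majority of its trees are not affected by \(A_{b}\) for any of its attacks:
\[ \forall \boldsymbol{x} \in \mathcal{X},\ \forall \boldsymbol{x}^{\prime} \in A_{b}(\boldsymbol{x}):\quad \left|\{t \in \mathcal{T} \mid t(\boldsymbol{x}^{\prime}) = t(\boldsymbol{x})\}\right| > \frac{|\mathcal{T}|}{2}. \]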
We highlight that the above definition is very general, as it is dataset independent: we define robustness for any given instance x in the feature space, and not for a specific training or test set. It is straightforward to show that in a forest \({{\mathcal {T}}_{\mathcal {P}}}\) at most b of its 2b+1 trees can be affected by the attacker: each tree only reads the features of its own set \(P \in \mathcal {P}\), and the at most b perturbed features fall into at most b distinct sets of the partition. Thus, the robustness property is guaranteed by design.
In the best-case scenario where each \(t \in {\mathcal {T}}_{\mathcal {P}}\) is perfectly accurate, the above robustness property ensures that, in the presence of attacks, only a minority of trees provide an incorrect prediction, and therefore the forest is perfectly accurate even under attack. Clearly, this scenario is unlikely, and therefore we discuss below how to strengthen the accuracy of \({\mathcal {T}}_{\mathcal {P}}\).
Note that the above definition and training strategy trivially generalize to any ensemble learning algorithm besides tree-based classifiers.
3.3 Increasing the accuracy of a robust forest
The above definition guarantees that the majority of trees in a robust forest are not affected by the attacker, but it does not provide an estimate of the accuracy under attack (see Eq. 1) of a robust forest \({\mathcal {T}}_{\mathcal {P}}\), which clearly depends on the accuracy of its single trees. Intuitively, the more accurate the trees \(t \in {\mathcal {T}}_{\mathcal {P}}\), the more likely the forest \({\mathcal {T}}_{\mathcal {P}}\) is accurate under attack.
The accuracy of single trees in a forest \({{\mathcal {T}}_{\mathcal {P}}}\) depends on the feature partition \(\mathcal {P}\). The larger \(|\mathcal {P}|\), the smaller the number of features each tree can be trained on. To increase the accuracy of a robust forest \({\mathcal {T}}_{\mathcal {P}}\), we equipartition \({\mathcal {F}}\) across \(\mathcal {P}\) so as to have \(|P| \geq \left \lfloor {\lvert {\mathcal {F}}\rvert }/{(2b+1)}\right \rfloor \) for all \(P \in \mathcal {P}\). Clearly, as the attacker's budget b increases, we need to partition \({\mathcal {F}}\) into a larger number of subsets, and for these to be effective, the dataset must have a larger number of high-quality features. Note that this is true for every learning algorithm: if the attacker can perturb up to b features at will, more than b high-quality features are necessary to train an accurate model.
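A random equipartition with the size guarantee above can be obtained by shuffling the features and dealing them round-robin into 2b + 1 subsets, as in the following sketch. The function name and the requirement \(|\mathcal{F}| \ge 2b+1\) (so that no subset is empty) are assumptions of this sketch, not details taken from the original algorithm.

```python
import random
from typing import List, Sequence


def random_equipartition(features: Sequence[int], b: int, rng: random.Random) -> List[List[int]]:
    """Randomly split the features into k = 2b + 1 disjoint subsets.

    Subset sizes differ by at most one, so every subset holds at least
    floor(|F| / (2b + 1)) features.
    """
    k = 2 * b + 1
    if len(features) < k:
        # Assumption of this sketch: every subset must contain at least one feature.
        raise ValueError("at least 2b + 1 features are needed")
    shuffled = list(features)
    rng.shuffle(shuffled)
    # Round-robin assignment: subset j receives the shuffled positions j, j + k, j + 2k, ...
    return [shuffled[j::k] for j in range(k)]


parts = random_equipartition(list(range(10)), b=1, rng=random.Random(0))
print([len(p) for p in parts])  # [4, 3, 3]
```

The subset sizes depend only on \(|\mathcal{F}|\) and b, while the seed of `rng` only decides which features land in which subset.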
In addition, a specific partition \(\mathcal {P}\) may be suboptimal, as there may be multiple ways of partitioning \({\mathcal {F}}\) into feature subsets with high predictive power. Along the lines of ensemble training, we use multiple feature partitions, train a distinct \({\mathcal {T}}_{\mathcal {P}}\) for each partition \(\mathcal {P}\), and join the resulting robust forests \({\mathcal {T}}_{\mathcal {P}}\) in a single decision tree ensemble.
We can finally discuss our algorithm to train a Feature Partitioned Forest \({\mathcal {T}}\) aimed at being robust against an attacker \(A_{b}\). The FPF algorithm, shown in Algorithm 1, runs for a user-defined number r of rounds. During each round i, the algorithm generates a random partition \(\mathcal {P}_{i}\) of the features \({\mathcal {F}}\) present in the given training dataset \({\mathcal {D}}\). At each round, \({\mathcal {F}}\) is randomly and evenly split into k=2b+1 disjoint subsets (\(|\mathcal {P}_{i}| = 2b+1\)), and a new decision tree is trained on each dataset projection \(\pi _{P_{j}}({\mathcal {D}})\), for every feature subset \(P_{j} \in {\mathcal {P}}_{i}\). The resulting 2b+1 trees form a robust forest \({\mathcal {T}}_{\mathcal {P}_{i}}\).
We use an accept condition to filter out those \({\mathcal {T}}_{\mathcal {P}_{i}}\) that would not strengthen the final ensemble. For instance, some subsets in \(\mathcal {P}_{i}\) may not contain sufficiently predictive features to train accurate trees. In this work, we use a simple acceptance criterion according to which \({\mathcal {T}}_{\mathcal {P}_{i}}\) is accepted if and only if its training accuracy is larger than that of naively predicting the dataset's majority class. In this case, \({\mathcal {T}}_{\mathcal {P}_{i}}\) is added to \({\mathcal {T}}\). Eventually, an ensemble \({\mathcal {T}}\) of r robust forests is returned, including a total of r(2b+1) trees.
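The training loop can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: to stay self-contained, it uses a hypothetical `train_stump` learner (a single split) in place of a full decision tree learner, and it emulates the projection \(\pi_{P}(\mathcal{D})\) by letting each model read only the features in P. Any learner with the same call signature could be passed through the `learner` parameter.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def train_stump(X: Sequence[Sequence[float]], y: Sequence[int], features: Sequence[int]) -> Model:
    """Toy stand-in for a tree learner: the best single test x_f <= v over the given features.

    Reading only the features in `features` is equivalent to training on pi_P(D).
    """
    best = None
    for f in features:
        for v in sorted({row[f] for row in X}):
            for left, right in ((-1, 1), (1, -1)):
                hits = sum((left if row[f] <= v else right) == label for row, label in zip(X, y))
                if best is None or hits > best[0]:
                    best = (hits, f, v, left, right)
    _, f, v, left, right = best
    return lambda x: left if x[f] <= v else right


def vote(models: Sequence[Model], x: Sequence[float]) -> int:
    return 1 if sum(m(x) for m in models) > 0 else -1


def fpf_train(X, y, b: int, rounds: int, learner=train_stump, seed: int = 0) -> List[List[Model]]:
    """Sketch of the FPF loop: one random equipartition and one subforest of 2b + 1 models per round."""
    rng = random.Random(seed)
    k = 2 * b + 1
    if len(X[0]) < k:
        raise ValueError("this sketch needs at least 2b + 1 features")
    majority_rate = max(y.count(1), y.count(-1)) / len(y)
    ensemble = []
    for _ in range(rounds):
        feats = list(range(len(X[0])))
        rng.shuffle(feats)
        partition = [feats[j::k] for j in range(k)]
        subforest = [learner(X, y, P) for P in partition]
        train_acc = sum(vote(subforest, row) == label for row, label in zip(X, y)) / len(y)
        # Accept condition: beat the trivial majority-class predictor on the training set.
        if train_acc > majority_rate:
            ensemble.append(subforest)
    return ensemble


X = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
y = [-1, 1, -1, 1]
ens = fpf_train(X, y, b=1, rounds=2)
print(len(ens), [len(sub) for sub in ens])  # 2 [3, 3]
```

On a single-class training set the majority-class rate is 1.0, so no subforest can beat it and the sketch returns an empty ensemble; this mirrors the strict inequality in the acceptance criterion.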
We propose two methods for exploiting the Feature Partitioned Forest \({\mathcal {T}}\) built in this way at prediction time. The hierarchical method hFPF (see Algorithm 2) exploits the hierarchical structure of \({\mathcal {T}}\): a given instance x is independently classified by each subforest \({\mathcal {T}}_{\mathcal {P}_{i}} \in {\mathcal {T}}\), and the resulting r predictions are then merged through a majority voting scheme. The flat method fFPF (see Algorithm 3) considers \({{\mathcal {T}}}\) as an ensemble of r(2b+1) independent trees, and the predictions of all single trees are aggregated via majority voting.
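The two aggregation schemes can disagree on the same ensemble. In the hypothetical sketch below, each model is any callable returning a label in {−1, +1}. With r = 3 subforests of three constant models, hFPF follows the two subforests that vote +1, while fFPF is swayed by the five individual −1 votes out of nine.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def majority_vote(votes: Sequence[int]) -> int:
    # Same rule as the forest prediction: +1 iff the sum of the votes is positive.
    return 1 if sum(votes) > 0 else -1


def predict_hfpf(ensemble: List[List[Model]], x: Sequence[float]) -> int:
    # Hierarchical: each robust subforest votes first, then the r outcomes are merged.
    return majority_vote([majority_vote([t(x) for t in sub]) for sub in ensemble])


def predict_ffpf(ensemble: List[List[Model]], x: Sequence[float]) -> int:
    # Flat: all r(2b + 1) models vote together.
    return majority_vote([t(x) for sub in ensemble for t in sub])


def const(label: int) -> Model:
    return lambda x: label


ensemble = [[const(1), const(1), const(-1)],
            [const(1), const(1), const(-1)],
            [const(-1), const(-1), const(-1)]]
print(predict_hfpf(ensemble, [0.0]), predict_ffpf(ensemble, [0.0]))  # 1 -1
```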
Proposition 1
(Robustness of FPF) The forest \({\mathcal {T}}\) built by the algorithm FPF is robust against an attacker \(A_{b}\), as the majority of its trees are not affected by \(A_{b}\) for any of its attacks.
The above proposition is trivially true when considering the hFPF method, as each subforest \({\mathcal {T}}_{\mathcal {P}_{i}}\) is robust by construction. When considering the fFPF method, we recall that during each round FPF trains a set of 2b+1 trees, of which at most b can be affected by \(A_{b}\). Applying the same reasoning to all r rounds, FPF builds a forest \({\mathcal {T}}\) of r(2b+1) trees of which at most rb can be affected by \(A_{b}\), leaving at least r(b+1) trees, a strict majority, unaffected.
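The majority claim for fFPF reduces to a one-line inequality, which holds for every number of rounds \(r \ge 1\):

```latex
r(b+1) > \frac{r(2b+1)}{2}
\iff 2rb + 2r > 2rb + r
\iff r > 0 .
```

For instance, with r = 3 and b = 2 the forest has 15 trees, at most 6 of which can be affected, leaving 9 unaffected trees.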
4 Evaluation and certification of tree-based models
Evaluating the accuracy of a model in the presence of an attacker is a difficult and computationally expensive task, due to the possibly large size of \(A_{b}(\boldsymbol{x})\) and to the number of interactions among trees in a forest. Below, we first discuss an expensive brute-force strategy for certifying the accuracy of an FPF tree ensemble under \(L_{0}\)-norm attacks on a given test dataset. Then, we show that the existence of an attack on a given dataset instance can be reduced to the existence of a solution for the partial set coverage problem [36]. We use this result to devise a strategy that reduces the cost of the brute-force strategy and provides an efficient lower bound certification of the accuracy under attack.
4.1 Brute-force evaluation
Given an instance \(\boldsymbol {x} \in {\mathcal {X}}\), the brute-force evaluation of a forest \({\mathcal {T}}\) consists of generating all the possible perturbations an attacker \(A_{b}\) can operate, to find whether there exists \(\boldsymbol{x}^{\prime} \in A_{b}(\boldsymbol{x})\) such that \(\mathcal {T}(\boldsymbol {x}) \neq \mathcal {T}(\boldsymbol {x^{\prime }})\). Specifically, for a given test dataset \({\mathcal {D}}\) and a given tree ensemble \({\mathcal {T}}\), the brute-force evaluation can exactly compute the accuracy under attack according to Eq. 1.
The set \(A_{b}(\boldsymbol{x})\) is infinite, but we can limit its enumeration to the set of attacks that are relevant for the given forest \({\mathcal {T}}\), i.e., those attacks that can invert the outcome of a test in some internal node of a tree in \({\mathcal {T}}\). This set of relevant attacks, denoted by \(\hat {A}_{b}(\boldsymbol {x} \mid {\mathcal {T}})\), can be computed as follows.
Recall that the internal nodes of a tree test conditions of the form \(x_{f} \le v\) for some threshold v. The thresholds used in the tree nodes induce a discretization of the input space \({\mathcal {X}}\) that we exploit as follows. For any given feature \(f\in {\mathcal {F}}\), we denote by \({\mathcal {V}}_{f}\) the set of relevant thresholds:
\[ \mathcal{V}_{f} = \{v \mid \sigma(f,v,t_{l},t_{r}) \text{ is a node of some } t \in \mathcal{T}\} \cup \{+\infty\}. \]
The set \({\mathcal {V}}_{f}\) includes all the thresholds that are associated with f in any node σ(f,v,t_{l},t_{r}) of any tree in \({\mathcal {T}}\), plus the infinity value that allows the algorithm to also include the attack that traverses the right branch of the node with the largest threshold.
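A minimal sketch of how \(\mathcal{V}_{f}\) can be collected with a single walk over the forest, using an illustrative `Leaf`/`Split` encoding of the nodes \(\lambda(\hat{y})\) and \(\sigma(f,v,t_{l},t_{r})\) (the names and 0-based feature indices are our assumptions). Setting \(x^{\prime}_{f} = v\) for \(v \in \mathcal{V}_{f}\) sends x left at exactly the nodes on f whose threshold is at least v, and \(+\infty\) sends it right everywhere, so these values represent every distinct combination of outcomes of the tests on f. Features never tested by any tree do not appear in the result.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Union


@dataclass(frozen=True)
class Leaf:
    label: int


@dataclass(frozen=True)
class Split:
    feature: int
    threshold: float
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Split]


def relevant_thresholds(forest: Sequence[Tree]) -> Dict[int, List[float]]:
    """V_f: every threshold used on feature f anywhere in the forest, plus +infinity."""
    values: Dict[int, Set[float]] = defaultdict(set)
    stack = list(forest)
    while stack:
        node = stack.pop()
        if isinstance(node, Split):
            values[node.feature].add(node.threshold)
            stack.extend((node.left, node.right))
    return {f: sorted(vs) + [math.inf] for f, vs in values.items()}


t1 = Split(0, 0.5, Leaf(-1), Split(1, 2.0, Leaf(1), Leaf(-1)))
t2 = Split(0, 1.5, Leaf(1), Leaf(-1))
print(relevant_thresholds([t1, t2]))  # {0: [0.5, 1.5, inf], 1: [2.0, inf]}
```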
An attacker \(A_{b}\) can perturb any subset of features \(F \subseteq {\mathcal {F}}\) such that \(|F| \le b\), and therefore the set of relevant perturbations the attacker may operate on F is described by the Cartesian product \({\mathcal {V}}_{f_{1}} \times \ldots \times {\mathcal {V}}_{f_{|F|}}\), with \(f_{i}\in F\). We denote by \(\hat {A}_{b}(\boldsymbol {x} \mid {\mathcal {T}},F)\) the set of relevant attacks on the given set of features F, i.e., each perturbed vector \(\boldsymbol {x^{\prime }} \in \hat {A}_{b}(\boldsymbol {x} \mid {\mathcal {T}},F)\) satisfies the following:
\[ x^{\prime}_{f} \in \mathcal{V}_{f} \ \ \forall f \in F, \qquad x^{\prime}_{f} = x_{f} \ \ \forall f \in \mathcal{F} \setminus F. \]
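In conclusion, the set of relevant attacks is:
\[ \hat{A}_{b}(\boldsymbol{x} \mid \mathcal{T}) = \bigcup_{F \subseteq \mathcal{F},\ |F| \le b} \hat{A}_{b}(\boldsymbol{x} \mid \mathcal{T}, F). \]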
An attacker \(A_{b}\) can successfully perturb an instance \((\boldsymbol {x}, y) \in {\mathcal {D}}\) against a forest \({\mathcal {T}}\) if there exists at least one \(\boldsymbol {x^{\prime }} \in \hat {A}_{b}(\boldsymbol {x} \mid {\mathcal {T}})\) that induces \({\mathcal {T}}\) to misclassify. We can thus exactly identify the portion of the test dataset that \(A_{b}\) can successfully perturb by using the discretized attacks in \(\hat {A}_{b}(\boldsymbol {x} \mid {\mathcal {T}})\), i.e., \(\widehat {{\mathcal {D}}} = \left \{(\boldsymbol {x}, y) \in {\mathcal {D}} \setminus \overline {{\mathcal {D}}} \mid \exists \boldsymbol {x}^{\prime } \in \hat {A}_{b}(\boldsymbol {x} \mid {\mathcal {T}}), \ y \cdot {\mathcal {T}}(\boldsymbol {x}^{\prime }) < 0\right \}\), where \(\overline {{\mathcal {D}}}\) includes the test instances misclassified by \({\mathcal {T}}\) in the absence of attacks. Finally, we can exactly compute the accuracy under attack according to Eq. 1. This brute-force approach is very expensive, due to three factors: (i) as b increases, the number of possible feature combinations \(F \subset {\mathcal {F}}\) with \(|F| = b\) increases; (ii) as the number of trees and nodes grows, the number of threshold values associated with each feature increases; and (iii) for each perturbed instance \(\boldsymbol{x}^{\prime}\), the prediction \(\mathcal {T}(\boldsymbol {x^{\prime }})\) must be computed by traversing the given forest.
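The brute-force procedure can be written as a short, deliberately naive Python sketch: it enumerates every feature subset F with \(1 \le |F| \le b\) and every combination of relevant thresholds, and counts an instance as robust only if it is correctly classified and no relevant attack flips the forest's prediction. All names and the `Leaf`/`Split` encoding are illustrative assumptions; the enumeration is exponential in b, which is the cost that motivates the set cover formulation.

```python
import math
from dataclasses import dataclass
from itertools import combinations, product
from typing import Dict, List, Sequence, Set, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    label: int


@dataclass(frozen=True)
class Split:
    feature: int
    threshold: float
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Split]


def predict_tree(t: Tree, x: Sequence[float]) -> int:
    while isinstance(t, Split):
        t = t.left if x[t.feature] <= t.threshold else t.right
    return t.label


def predict_forest(forest: Sequence[Tree], x: Sequence[float]) -> int:
    return 1 if sum(predict_tree(t, x) for t in forest) > 0 else -1


def relevant_thresholds(forest: Sequence[Tree]) -> Dict[int, List[float]]:
    values: Dict[int, Set[float]] = {}
    stack = list(forest)
    while stack:
        node = stack.pop()
        if isinstance(node, Split):
            values.setdefault(node.feature, set()).add(node.threshold)
            stack.extend((node.left, node.right))
    return {f: sorted(vs) + [math.inf] for f, vs in values.items()}


def is_attackable(forest: Sequence[Tree], x: Sequence[float], y: int, b: int) -> bool:
    """True iff some relevant x' with at most b modified features is misclassified."""
    V = relevant_thresholds(forest)
    used = sorted(V)  # features never tested by any tree cannot change a prediction
    for size in range(1, min(b, len(used)) + 1):
        for F in combinations(used, size):
            for values in product(*(V[f] for f in F)):
                x_adv = list(x)
                for f, v in zip(F, values):
                    x_adv[f] = v
                if y * predict_forest(forest, x_adv) < 0:
                    return True
    return False


def accuracy_under_attack(forest: Sequence[Tree], data: Sequence[Tuple[Sequence[float], int]], b: int) -> float:
    """Exact ACC_{A_b}: an instance counts only if it is correct and cannot be attacked."""
    robust = sum(1 for x, y in data
                 if y * predict_forest(forest, x) > 0 and not is_attackable(forest, x, y, b))
    return robust / len(data)


forest = [Split(f, 0.5, Leaf(-1), Leaf(1)) for f in range(3)]
data = [([1.0, 1.0, 1.0], 1), ([1.0, 1.0, 0.0], 1), ([0.0, 0.0, 0.0], -1)]
print(accuracy_under_attack(forest, data, b=1))  # 2 of the 3 instances survive
```

In this toy forest of three stumps, the second instance is already supported by only two of three trees, so perturbing a single feature is enough to flip it, while the other two instances withstand any single-feature attack.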
4.2 Attacking forest \({\mathcal {T}}\) as a partial set coverage problem
We now introduce some simplifying worst-case assumptions and then show that an effective attack can exist only if the corresponding partial set coverage problem admits a solution. First, we assume that if a tree in \({\mathcal {T}}\) provides a wrong prediction before the attack, then its prediction is also incorrect after the attack. Second, we assume that if a tree uses a feature f for its prediction over x, then attacking f causes the tree to generate a wrong prediction. Note that these assumptions are pessimistic from the point of view of a defender. Indeed, modifying a feature f does not necessarily flip the test performed on every node using that feature and, even if it did, tests over other features may suffice to avoid a wrong prediction.
Given an instance \((\boldsymbol {x}, y) \in {\mathcal {D}}\) and a forest \({\mathcal {T}}\), let C be the set of all correct trees \(t \in {\mathcal {T}}\) over x, i.e., \(C = \{t \in {\mathcal {T}} \mid t(\boldsymbol {x}) \cdot y > 0\}\). Let \(\overline {C} = {\mathcal {T}} \setminus C\) be the set of all trees providing a wrong prediction over x. The goal of the attacker is to force a sufficient number of trees to misclassify x, so that the majority of trees are incorrect. The minimum number of trees the attacker must fool is δ such that \(|\overline {C}|+\delta = \lceil |{\mathcal {T}}|/2 \rceil \). It turns out that in a robust forest of r(2b+1) trees trained with FPF, where the attacker can affect at most rb trees, it is impossible for the attacker to fool the forest if \(|\overline {C}|<\lceil r/2 \rceil \): since \(\lceil r(2b+1)/2 \rceil = rb + \lceil r/2 \rceil\), this condition is equivalent to \(\delta > rb\). This means that a forest can be robust even if some of its trees are not correct in the absence of an attacker.
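A quick numerical check of δ and of the FPF condition \(|\overline{C}| < \lceil r/2 \rceil\) follows. The helper names are hypothetical, and the integer ceiling is computed as `(n + 1) // 2` to avoid floating-point division.

```python
def min_trees_to_fool(n_trees: int, n_wrong: int) -> int:
    """delta such that |C_bar| + delta = ceil(|T| / 2)."""
    return (n_trees + 1) // 2 - n_wrong  # (n + 1) // 2 == ceil(n / 2) for n >= 0


def fpf_certified(r: int, b: int, n_wrong: int) -> bool:
    """FPF forest with r(2b + 1) trees: at most r * b trees can be affected,
    so the instance cannot be attacked when delta > r * b."""
    return min_trees_to_fool(r * (2 * b + 1), n_wrong) > r * b


# r = 3 rounds, b = 1: 9 trees, at most 3 affected, ceil(r / 2) = 2.
print(fpf_certified(3, 1, 1), fpf_certified(3, 1, 2))  # True False
```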
Let S_{f}⊆C be the set of all the correct trees that use feature f and let \(\Sigma = \{S_{f}\}_{f\in {\mathcal {F}}}\) be the collection of all S_{f}. In order for A_{b} to successfully attack \({\mathcal {T}}\) over x, there must exist a subcollection S^{∗}⊆Σ, with |S^{∗}|≤b since A_{b} can perturb at most b features, such that \(|\overline {C}| + \left|\bigcup _{S_{f} \in S^{*}} S_{f}\right| \ge \lceil |{\mathcal {T}}|/2 \rceil \), or, equivalently, such that \(\left|\bigcup _{S_{f} \in S^{*}} S_{f}\right| \ge \delta \) with \(\delta = \lceil |{\mathcal {T}}|/2 \rceil - |\overline {C}|\). This formulation of our problem is nothing else than an instance of the partial set coverage problem: given the set of trees C and the collection Σ⊆2^{C}, we have to select up to b sets in Σ that together cover at least δ trees.
Before solving the partial set coverage problem, we make a few improvements to provide a stricter definition of the sets S_{f} in relation to our scenario. First, we note that a tree may include a feature f in some of its nodes, but these nodes may never be traversed during the evaluation of an instance x. Therefore, we say that a correct tree t belongs to S_{f} for an instance x only if the traversal path of x in t includes a node with a test on feature f. This already reduces the size of each S_{f}.
Then, among the nodes along the traversal path of instance x before the attack, we can further distinguish between nodes where the test x_{f}≤v is true and nodes where the test is false. In the former case, the attacker must increase the value of x_{f} to affect the traversal path, while in the latter case the attacker must decrease x_{f}. Clearly, these two attacks cannot coexist on the same feature, since a perturbed value either increases or decreases.
Therefore, we define sets \(S_{f}^{+}\) and \(S_{f}^{-}\) as follows. Given a correct tree t∈C, we include t in \(S_{f}^{+}\) if the traversal path of x in t includes a node with a test x_{f}≤v on feature f and this test gives a true outcome. Otherwise, if the outcome of this test turns out to be false, we include t in \(S_{f}^{-}\). This allows us to model more accurately when an attack can actually affect the final prediction. It also reduces the size of the sets in Σ and decreases the risk of overestimating the effect of an attack. We can now state the relation with the partial set coverage problem as follows.
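A minimal sketch of how the sets \(S_{f}^{+}\) and \(S_{f}^{-}\) could be collected from the traversal paths of x, using a hypothetical toy encoding of trees (leaf +1/-1, internal node (feature, threshold, left, right), x[feature] <= threshold goes left). As an assumption of this sketch, a tree whose path contains both a true and a false test on the same feature is placed in both sets, which keeps the cover condition necessary; `cover_sets` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple, Union

# Hypothetical toy encoding: leaf = +1/-1, node = (feature, threshold, left, right),
# and x[feature] <= threshold sends x to the left child.
Tree = Union[int, tuple]


def cover_sets(
    forest: List[Tree], x: Sequence[float], y: int
) -> Tuple[List[int], int, Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Return (C, |C-bar|, S_plus, S_minus) for instance x with label y in {-1, +1}."""
    s_plus: Dict[int, Set[int]] = defaultdict(set)
    s_minus: Dict[int, Set[int]] = defaultdict(set)
    correct: List[int] = []
    n_wrong = 0
    for i, tree in enumerate(forest):
        true_feats, false_feats = set(), set()
        node = tree
        while isinstance(node, tuple):
            f, v, left, right = node
            if x[f] <= v:
                true_feats.add(f)   # test true: only increasing x_f can change the path
                node = left
            else:
                false_feats.add(f)  # test false: only decreasing x_f can change the path
                node = right
        if node * y <= 0:
            n_wrong += 1            # wrong tree: counted in C-bar, never in any S_f
            continue
        correct.append(i)
        for f in true_feats:
            s_plus[f].add(i)
        for f in false_feats:
            s_minus[f].add(i)
    return correct, n_wrong, dict(s_plus), dict(s_minus)


forest = [(0, 0.5, -1, (1, 0.5, 1, -1)), (1, 0.5, -1, 1), (0, 2.0, 1, -1)]
correct, n_wrong, s_plus, s_minus = cover_sets(forest, [1.0, 0.0], 1)
print(correct, n_wrong)  # [0, 2] 1
print(s_plus, s_minus)
```

Only traversed nodes contribute, so unused features and untraversed subtrees never inflate the sets.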
Proposition 2
(Partial set coverage as a necessary condition for successful attacks.) Given \((\boldsymbol {x}, y) \in {\mathcal {D}}\), where \({\mathcal {T}}(\boldsymbol {x}) \cdot y > 0\), a necessary condition for the existence of a successful attack x^{′}∈A_{b}(x) such that \({\mathcal {T}}(\boldsymbol {x}^{\prime }) \cdot y < 0\), is that there exists a solution for the partial set coverage problem, stated as follows:
Given the set system (C,Σ), where C is the finite set of correct trees for x and Σ⊆2^{C} with \(\Sigma = \{S_{f}^{+}\}_{f\in {\mathcal {F}}} \cup \{S_{f}^{-}\}_{f\in {\mathcal {F}}}\), and given an integer b and the constant \(\delta = \lceil |{\mathcal {T}}|/2 \rceil - |\overline {C}|\), the goal is to find a subcollection S^{∗}⊆Σ such that \(\left|\bigcup _{S \in S^{*}} S\right| \ge \delta \), subject to the constraints that |S^{∗}|≤b and, \(\forall f \in {\mathcal {F}}\), if \(S_{f}^{+} \in S^{*}\) (\(S_{f}^{-} \in S^{*}\)) then \(S_{f}^{-} \not \in S^{*}\) (\(S_{f}^{+} \not \in S^{*}\)).
Proof
We show that if there exists x^{′}∈A_{b}(x) such that \({\mathcal {T}}(\boldsymbol {x}^{\prime }) \cdot y < 0\), then there exists S^{∗}⊆Σ such that \(\left|\bigcup _{S \in S^{*}} S\right| \ge \delta \), \(|S^{*}| \le b\), and \(S_{f}^{+}\) and \(S_{f}^{-}\) are mutually exclusive in S^{∗}. Given x^{′}∈A_{b}(x), we build S^{∗} as follows: for any attacked feature f, we include the set \(S_{f}^{+}\) in S^{∗} if \(x^{\prime }_{f}-x_{f}>0\) (corrupted by increment), or the set \(S_{f}^{-}\) if \(x^{\prime }_{f}-x_{f}<0\) (corrupted by decrement). Clearly, it holds that |S^{∗}|≤b, and the sets \(S_{f}^{+}\) and \(S_{f}^{-}\) are mutually exclusive in S^{∗}. Let \(\overline {C^{\prime }}\) be the set of (formerly correct) trees corrupted by the successful attack x^{′}; then it holds that \(|\overline {C^{\prime }}| \geq \delta \). A formerly correct tree can change its prediction only if some test x_{f}≤v along its traversal path changes outcome, which requires \(x^{\prime }_{f}>x_{f}\) for a test that was true, or \(x^{\prime }_{f}<x_{f}\) for a test that was false. Therefore, by construction, any tree \(t\in \overline {C^{\prime }}\) belongs to either an \(S_{f}^{+}\) or an \(S_{f}^{-}\) included in S^{∗}, so that \(|\overline {C^{\prime }}| \leq \left|\bigcup _{S \in S^{*}} S\right|\), which implies \(\left|\bigcup _{S \in S^{*}} S\right| \geq \delta \). □
Note that Proposition 2 states that the existence of a solution S^{∗} for our partial set cover problem is only a necessary (not sufficient) condition for the attack. Thus, if S^{∗} exists, we cannot say that \({\mathcal {T}}\) can be fooled for sure, as the attacker might modify all the features identified by the cover without being able to affect the final forest prediction. However, we know that if a solution S^{∗} does not exist, then \({\mathcal {T}}\) is robust on the given instance x.
In the following, we use this result to compute an upper bound of the size of \(\widehat {{\mathcal {D}}}\), the set of all instances in the test dataset that can be attacked. On the one hand, this allows us to lower bound the accuracy under attack \(ACC_{A_{b}}\) (see Eq. 1); on the other hand, this result makes it possible to speed up the exact computation of \(ACC_{A_{b}}\) by employing the brute-force approach only for those instances for which a sufficiently large set cover exists.
4.3 Fast accuracy lower bound
The method discussed in this section computes an overestimate of the size of the partial set cover S^{∗}⊆Σ, for the problem stated in Proposition 2.
If we consider the b largest sets in Σ, it is clear that, since |S^{∗}|≤b, the cardinality of the union of the sets within S^{∗} is smaller than or equal to the sum of the cardinalities of such b largest sets (a consequence of the inclusion-exclusion principle). We improve this trivial upper bound by considering that the two sets \(S_{f}^{+}\) and \(S_{f}^{-}\) cannot be included together in a potential cover.
We thus define the fast lower bound set S_{FLB} to be the set of the b largest sets in Σ after enforcing the constraint that, for a given feature f, only the larger of \(S_{f}^{+}\) and \(S_{f}^{-}\) is considered.
We can conclude that if \(\sum _{S\in S_{{FLB}}} |S| < \delta \), then a suitable partial cover cannot exist, and therefore the forest \({\mathcal {T}}\) cannot be attacked on x.
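The fast lower bound test needs only the set sizes. In this sketch (hypothetical names; sets given as dictionaries mapping a feature index to a set of tree indices), the sign constraint is enforced by keeping the larger of the two sets for each feature before selecting the b largest.

```python
import heapq
from typing import Dict, Set

Sets = Dict[int, Set[int]]  # feature index -> indices of correct trees


def flb_may_be_attackable(s_plus: Sets, s_minus: Sets, b: int, delta: int) -> bool:
    """False means FLB certifies x: no admissible cover can reach delta trees."""
    feats = set(s_plus) | set(s_minus)
    # S_f^+ and S_f^- are mutually exclusive, so keep only the larger one per feature.
    sizes = [max(len(s_plus.get(f, ())), len(s_minus.get(f, ()))) for f in feats]
    # The sum of the b largest sizes upper-bounds the size of any admissible union.
    return sum(heapq.nlargest(b, sizes)) >= delta


s_plus = {0: {1, 2}, 1: {3}}
s_minus = {0: {4, 5, 6}, 2: {1}}
print(flb_may_be_attackable(s_plus, s_minus, b=1, delta=4))  # False: certified
print(flb_may_be_attackable(s_plus, s_minus, b=2, delta=4))  # True: 3 + 1 >= 4
```

Only a partial sort is required, which is what makes this first filter so cheap.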
Therefore, we define the set \(\widehat {{\mathcal {D}}}_{{FLB}}\) of attackable instances according to the fast lower bound method as follows. For each correctly classified instance \((\boldsymbol {x},y) \in {\mathcal {D}}\), we build the partial coverage problem according to Proposition 2, and we include the instance (x,y) in \(\widehat {{\mathcal {D}}}_{{FLB}}\) if and only if \(\sum _{S\in S_{{FLB}}} |S| \geq \delta \). Since it holds that \(|\widehat {{\mathcal {D}}}_{{FLB}}| \geq |\widehat {{\mathcal {D}}}|\), we define the fast lower bound accuracy as:
\(ACC_{A_{b}}^{FLB} = 1 - \frac {|\overline {{\mathcal {D}}}| + |\widehat {{\mathcal {D}}}_{{FLB}}|}{|{\mathcal {D}}|}\)
4.4 Exhaustive accuracy lower bound
In order to improve over the fast lower bound, we also consider a more expensive option, where all the possible covers are considered, still respecting the constraint that any \(S_{f}^{+}\) and \(S_{f}^{-}\) are mutually exclusive. We evaluate all the possible covers S^{†}⊆Σ with |S^{†}|≤b, and we call exhaustive lower bound cover, denoted with S_{ELB}, the first cover found such that \(\left|\bigcup _{A \in S_{{ELB}}} A\right| \ge \delta \).
By applying the same procedure to every correctly classified instance (x,y) in the dataset, we identify the set of instances \(\widehat {{\mathcal {D}}}_{{ELB}}\) for which there exists an exhaustive lower bound cover S_{ELB} that solves the problem in Proposition 2.
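The exhaustive search can be sketched with `itertools`: it enumerates up to b features and, for each of them, one sign, which enforces the mutual exclusion of \(S_{f}^{+}\) and \(S_{f}^{-}\). The function name and the dictionary-of-sets input format are assumptions of this sketch.

```python
from itertools import combinations, product
from typing import Dict, List, Optional, Set, Tuple

Sets = Dict[int, Set[int]]  # feature index -> indices of correct trees


def elb_cover(s_plus: Sets, s_minus: Sets, b: int, delta: int) -> Optional[List[Tuple[int, str]]]:
    """Return the first admissible cover as [(feature, sign)], or None if none exists."""
    if delta <= 0:
        return []  # nothing left to cover
    feats = sorted(set(s_plus) | set(s_minus))
    for k in range(1, b + 1):
        for chosen in combinations(feats, k):
            # One sign per chosen feature, so S_f^+ and S_f^- never appear together.
            for signs in product("+-", repeat=k):
                covered: Set[int] = set()
                for f, sign in zip(chosen, signs):
                    covered |= (s_plus if sign == "+" else s_minus).get(f, set())
                if len(covered) >= delta:
                    return list(zip(chosen, signs))
    return None  # no cover: x is certified non-attackable for budget b


s_plus = {0: {1, 2}, 1: {3}}
s_minus = {0: {4, 5, 6}, 2: {1}}
print(elb_cover(s_plus, s_minus, b=2, delta=4))  # [(0, '-'), (1, '+')]
# FLB would pass here (2 + 2 >= 3), but the two sets overlap, so no cover exists.
print(elb_cover({0: {1, 2}, 1: {1, 2}}, {}, b=2, delta=3))  # None
```

The second call shows why ELB is tighter than FLB: it measures the actual union rather than the sum of set sizes.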
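Note that \(|\widehat {{\mathcal {D}}}| \le |\widehat {{\mathcal {D}}}_{{ELB}}| \le |\widehat {{\mathcal {D}}}_{{FLB}}|\), and thus we use this method to compute another lower bound for the accuracy of \({\mathcal {T}}\) on the test dataset \({\mathcal {D}}\):
\(ACC_{A_{b}}^{ELB} = 1 - \frac {|\overline {{\mathcal {D}}}| + |\widehat {{\mathcal {D}}}_{{ELB}}|}{|{\mathcal {D}}|}\)
where the following relationship trivially holds:
\(ACC_{A_{b}}^{FLB} \le ACC_{A_{b}}^{ELB} \le ACC_{A_{b}}\)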
This exhaustive lower bound search incurs the exponential cost of enumerating the possible covers in 2^{Σ}, but it improves over the brute-force attack thanks to the cover-based formulation, which does not need to enumerate the relevant threshold values \({\mathcal {V}}_{f}\) to which each feature can be perturbed.
We recall that, while it is true that \(|\widehat {{\mathcal {D}}}| \le |\widehat {{\mathcal {D}}}_{{ELB}}| \le |\widehat {{\mathcal {D}}}_{{FLB}}|\), the instances in \(\widehat {{\mathcal {D}}}_{{ELB}}\) and \(\widehat {{\mathcal {D}}}_{{FLB}}\) are not necessarily attackable, since the existence of a cover is only a necessary condition for a successful attack. The above bounds can prove the non-existence of a successful attack, but they cannot be used to find a successful attack strategy.
4.5 Cascading evaluation
Above, we presented two algorithms, FLB and ELB, that efficiently find an over-approximation of the cover of attacked trees, which allows us to bound the effect of the most harmful attack. The two strategies have different costs: FLB only requires sorting the candidate sets of the cover, while ELB performs an exhaustive search over all the admissible subsets of Σ. Both methods are, however, much cheaper than brute-force evaluation.
When the lower bound information is not considered sufficient, in order to compute the actual accuracy under attack \(ACC_{A_{b}}\), we propose to exploit the following CASCADING strategy.
Given an instance x and an attacker A_{b}, we build the collection of sets of trees \(\Sigma = \{S_{f}^{+}\}_{f\in {\mathcal {F}}} \cup \{S_{f}^{-}\}_{f\in {\mathcal {F}}}\) and proceed as follows:

1. Compute S_{FLB}⊆Σ: if \(\sum _{S \in S_{{FLB}}} |S| < \delta \), then no sufficiently large set cover exists, and therefore the instance x cannot be attacked; otherwise
2. Search for a suitable cover S_{ELB}⊆Σ: if there is no S_{ELB} such that \(\left|\bigcup _{S \in S_{{ELB}}} S\right| \ge \delta \), then the instance x cannot be attacked; otherwise
3. Use the brute-force method to check the existence of a successful attack on x.
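The three steps above can be sketched as a driver that delegates to caller-supplied checks. The callables `flb`, `elb`, and `brute_force` are hypothetical placeholders (for instance, implementations of FLB, ELB, and of the exact attack); the function only illustrates the control flow and counts how many instances reach each stage.

```python
from typing import Callable, Dict, Iterable, Tuple, TypeVar

Instance = TypeVar("Instance")


def cascading_accuracy(
    correct_instances: Iterable[Instance],
    n_total: int,
    flb: Callable[[Instance], bool],
    elb: Callable[[Instance], bool],
    brute_force: Callable[[Instance], bool],
) -> Tuple[float, Dict[str, int]]:
    """Accuracy under attack with the FLB -> ELB -> BF cascade.

    correct_instances: instances correctly classified without attack (D minus D-bar).
    n_total: |D|, misclassified instances included.
    flb / elb: True when the bound cannot rule out an attack (hypothetical callables).
    brute_force: True when an actual successful attack exists (hypothetical callable).
    """
    reached = {"flb": 0, "elb": 0, "bf": 0}
    n_correct = n_attacked = 0
    for inst in correct_instances:
        n_correct += 1
        reached["flb"] += 1
        if not flb(inst):      # step 1: sum of the b largest sets < delta
            continue
        reached["elb"] += 1
        if not elb(inst):      # step 2: no admissible cover reaches delta
            continue
        reached["bf"] += 1     # step 3: only now pay for the exact attack
        if brute_force(inst):
            n_attacked += 1
    return (n_correct - n_attacked) / n_total, reached


acc, reached = cascading_accuracy(
    range(10), 12,
    flb=lambda i: i >= 5, elb=lambda i: i >= 7, brute_force=lambda i: i == 9,
)
print(acc, reached)  # 0.75 {'flb': 10, 'elb': 5, 'bf': 3}
```

Because the cheap filters run first, the expensive check is invoked only for the instances that survive both bounds.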
Experimental results show that the above cascading strategy is able to strongly reduce the number of instances in a given dataset \({\mathcal {D}}\) for which the bruteforce approach is required.
4.6 Non-binary classification
While we leave to future work the design and evaluation of an algorithm for non-binary classification, we highlight that the proposed methodology can be easily generalized to a multiclass scenario, and we sketch below a basic certification methodology. The algorithms proposed so far aim at certifying that the attacker cannot modify a number δ of correct trees such that \(\delta \ge \lceil |{\mathcal {T}}|/2 \rceil - |\overline {C}|\), where \(\overline {C}\) is the set of trees wrongly classifying the given instance. In a multiclass classification problem with classes \({\mathcal {Y}}\), \(|{\mathcal {Y}}|>2\), it is possible to verify robustness by running \(|{\mathcal {Y}}|-1\) certifications analogous to the binary case. Let us denote with C_{c} the set of trees classifying, before the attack, the given instance (x,y) as class c, \(c \in {\mathcal {Y}}\), so that C_{y} is the set of trees predicting the correct label y. For a given class \(c\in {\mathcal {Y}} \setminus \{y\}\), the attacker aims at attacking the trees in \(\bigcup _{i\neq c} C_{i}\) so as to make c the new majority class. The best-case scenario from the point of view of the attacker is given by modifying the predictions of the trees in C_{y}, since each such tree switched to c both decreases |C_{y}| and increases |C_{c}| by one; in this case it is sufficient to attack δ=⌈(|C_{y}|−|C_{c}|)/2⌉ trees. If the attacker is not able to alter at least δ trees of the forest, then no successful attack is possible. To this end, we can exploit any of the set-cover-based techniques proposed so far to verify the absence of any cover of size at least δ among the trees \(\bigcup _{i\neq c} C_{i}\). Such verification is to be repeated for each \(c \in {\mathcal {Y}} \setminus \{y\}\).
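The per-class thresholds above can be computed directly from the tree votes. This is a minimal sketch with hypothetical names; `max_flippable` stands for an assumed upper bound on the number of trees the attacker can change (for example, obtained from one of the cover-based bounds), not a quantity defined in the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def multiclass_deltas(
    tree_votes: Sequence[Hashable], y: Hashable, classes: Iterable[Hashable]
) -> Dict[Hashable, int]:
    """delta_c = ceil((|C_y| - |C_c|) / 2) for every rival class c != y."""
    counts = Counter(tree_votes)  # counts[c] == |C_c|, 0 for classes nobody votes
    # -(-n // 2) is integer ceil(n / 2); each flip from y to c closes the gap by 2
    return {c: -(-(counts[y] - counts[c]) // 2) for c in classes if c != y}


def certified(deltas: Dict[Hashable, int], max_flippable: int) -> bool:
    """Robust if the attacker cannot flip delta_c trees for any rival class c.

    max_flippable is a hypothetical upper bound on the trees the attacker can change."""
    return all(max_flippable < d for d in deltas.values())


votes = [0] * 5 + [1] * 2 + [2]       # 8 trees: 5 vote class 0, 2 vote 1, 1 votes 2
deltas = multiclass_deltas(votes, y=0, classes=range(4))
print(deltas)                          # {1: 2, 2: 2, 3: 3}
print(certified(deltas, 1), certified(deltas, 2))  # True False
```

Like the formula in the text, this treats a tie with the correct class as a success for the attacker, which is the conservative choice.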
5 Experiments
5.1 Experimental settings
In Table 1, we report the main characteristics of the datasets used in the experimental evaluation, including the number of features, the number of top relevant features (measured as those contributing to 90% of the feature importance in a Random Forest), and the relative size of the majority class. The datasets, ranging from small to mid-sized, are associated with a binary classification task, and they are commonly used in the adversarial learning literature^{Footnote 2}.
We compare our proposed algorithm FPF against the following tree-ensemble competitors:

Random Forest (RF) [37], which, although attacker-unaware, is known to have some degree of robustness thanks to the ensembling of several decision trees. As in the original algorithm, each tree is trained on a bootstrap sample of the dataset, with no constraints on the number of leaves, and with feature sampling of size \(\sqrt {|\mathcal {F}|}\) at each node.

Random Subspace method (RSM) [34], which was successfully exploited in the adversarial setting by Biggio et al. [35]. In this case, each tree is trained on a projection of the original dataset on a subset of its features. Validation experiments showed best results with 20% feature sampling.

Robust Trees (RT) [22] is an adversarial learning algorithm that targets robustness through optimization of the model performance under attack. We created an ensemble by training each individual tree with the algorithm proposed in the original paper.
Hereinafter, we use b to represent the expected attacker’s budget exploited as a training parameter of FPF and RT, and we use k to denote the attacker’s budget used at test time to generate attacks. We conducted a fair comparison by allowing each method to grow 300 trees, a configuration that we observed experimentally to be optimal. By leveraging the usual 60-20-20 train, validation, and test split, for all the considered algorithms we fine-tuned their hyperparameters on the validation set by optimizing the accuracy under attack \(ACC_{A_{k}}\). The number of leaves was fine-tuned in the set {4,8,16}. The RSM algorithm requires a sampling parameter p chosen in the set {0.1,0.2,0.4,0.6}. Finally, regarding FPF, we fine-tuned its b parameter in the range [1,5], while the number of rounds r was set on the basis of b so as to have no more than 300 trees (\(r = \lfloor \frac {300}{2b+1} \rfloor \)). All reported results were computed on the test set of each dataset.
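The rule \(r = \lfloor 300/(2b+1) \rfloor\) can be tabulated for the explored range of b with a short computation (`rounds_for` is an illustrative name).

```python
def rounds_for(b: int, max_trees: int = 300) -> int:
    """Number of FPF rounds r = floor(max_trees / (2b + 1))."""
    return max_trees // (2 * b + 1)


for b in range(1, 6):
    r = rounds_for(b)
    print(f"b={b}: r={r}, trees={r * (2 * b + 1)}")
```

For b = 1 and b = 2 the budget is used exactly (300 trees), while b = 3, 4, and 5 yield 294, 297, and 297 trees, respectively.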
The experimental evaluation aims at answering the following research questions:

Is FPF able to train robust models in terms of accuracy under attack?

How is FPF affected by the number of rounds r and the expected attacker’s budget b used at training time?

How accurate are the proposed lower bounds FLB and ELB?

How efficient is the computation of the proposed lower bounds FLB and ELB, and can we exploit them to analyze models on larger attacker’s budgets k?
5.2 Robustness analysis
In Tables 2, 3, and 4, we report the accuracy of FPF, RF, RSM, and RT against an attacker that can modify up to 3 features. To generate all the attacks, we used the exhaustive brute-force algorithm defined in Section 4.1, which allows us to calculate the exact robustness of the models. For the SPAM_BASE dataset, we limit the analysis to two attacked features, due to the cost of computing \(ACC_{A_{3}}\). We also report the value of \(ACC_{A_{0}}\), which captures the model accuracy in the absence of attacks. For FPF, we evaluate its robustness while varying the number of rounds r and the defense parameter b, still keeping the total number of trees about constant and comparable with the size of the forests generated by competitor algorithms. The rightmost column in these tables reports the difference w.r.t. the fFPF algorithm for the same value of k.
In all datasets, RF performs best or second best in the absence of attacks, but its accuracy significantly drops under attack. The perturbation of one single feature is sufficient to harm the model, with a loss of 5 to 15 points in accuracy, and when attacking two features, the accuracy drops well below 50% for the WINE and SPAM_BASE datasets. As previously discussed, the L_{0}-norm attack we are tackling in this work is indeed very powerful and sufficient to fool a very accurate and effective random forest model that is not adversarially trained.
The RSM model provides good performance in the absence of attacks, meaning that the dataset projection is not disadvantageous, and it is much more robust than RF in the presence of attacks. However, when attacking two features, RSM exhibits a drop of about 10 to 20 points in accuracy on all datasets. When attacking three features, the accuracy of the model is very low: below 45% on WINE and below 66% on BREAST_CANCER.
The results obtained with RT show that this strategy for training robust models does not perform well in the case of L_{0}-norm attacks. This can be explained by observing how the learning algorithm trains a robust model. During the training phase, the model has to choose the feature and threshold pair that maximizes the minimum accuracy under attack. However, an L_{0}-norm attacker can modify a feature as much as it wants and can therefore always cross the threshold. Consequently, it is difficult for the training algorithm to choose the best pair, because each pair generates the same information gain under attack. As a consequence, the algorithm cannot guarantee robustness against this type of attack. Furthermore, for the WINE dataset the algorithm always returns the same accuracy and robustness, which coincide exactly with the majority class percentage of the dataset. This happens because in the training phase the algorithm finds that always returning the majority class is the best solution for maximizing robustness.
The proposed FPF algorithm provides the best robustness in virtually all attack scenarios. We highlight that the best defensive b found is always larger than the attacker’s budget k, meaning that increasing the number of feature partitions provides even better robustness. The scenario where the attacker may attack k=3 features is especially relevant: the proposed FPF algorithm achieves a 15–20% better accuracy than the competitor RSM. Furthermore, the results of the experiments on WINE reported in Table 2 also highlight the ability of FPF to guarantee greater robustness even with few features in the dataset. As reported in Table 1, WINE has 13 features, of which only 7 are relevant. Attacks executed with budget 1, 2, and 3 compromise 7.7%, 15.4%, and 23.1% of the features, respectively, or 14.3%, 28.6%, and 42.9% if we consider only the relevant features. For each budget value, FPF showed a greater (or equal) robustness than the other models.
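The WINE percentages can be reproduced with a short computation, assuming the 13 total and 7 relevant features reported in Table 1 (`share` is an illustrative name).

```python
def share(k: int, n: int) -> float:
    """Percentage of n features touched by an attack on k of them, rounded to one decimal."""
    return round(100 * k / n, 1)


for k in (1, 2, 3):
    # WINE: 13 features in total, 7 relevant ones (Table 1)
    print(f"k={k}: {share(k, 13)}% of all features, {share(k, 7)}% of relevant ones")
```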
Experiments do not show any significant difference between fFPF and hFPF. They both provide the same accuracy with k=1 and k=3, with fFPF exhibiting slightly better figures for k=2. While hFPF has better theoretical guarantees in a black-box scenario (see Section 7.1), the two variants fFPF and hFPF do not differ much in practice.
We conclude that FPF is able to outperform state-of-the-art competitors, especially when considering stronger attackers.
5.3 Sensitivity analysis
In Table 5, we evaluate the sensitivity of FPF w.r.t. the number of rounds r on the BREAST_CANCER dataset for different values of b. Similar results were observed for the other datasets. With a few exceptions, the ensembling strategy improves the accuracy of the resulting model, and accuracy increases when increasing the number of rounds until a plateau is reached. We can conclude that using a large number of rounds r improves the robustness of the trained model.
The impact of b is evaluated in Table 6. Note that in the BREAST_CANCER dataset we identified only 15 informative features, which are difficult to partition into 2b+1 sets for large values of b. In this case, the results are mixed, and we can identify two different trends. For attacks A_{2} and A_{3}, increasing b brings some interesting benefits, and it is always worthwhile to use b≥4. When the attacker’s budget is limited to A_{1}, the best configuration also depends on the number of trees. When using a limited number of trees, accuracy under attack increases with b. But when the forest is sufficiently large, the benefit of increasing the number of trees is larger than the benefit of increasing b.
We conclude that, while it is beneficial to increase the number of rounds r, it is not always a good strategy to increase b when the total number of trees is small and the number of informative features is limited.
5.4 Lower bound analysis
In Tables 7 and 8, we compare the accuracy under attack \(ACC_{A_{k}}\) with the lower bounds \(ACC_{A_{k}}^{ELB}\) and \(ACC_{A_{k}}^{FLB}\) on both fFPF and hFPF forests, built with different values of b on the BREAST_CANCER dataset and attacked by different attackers A_{k}.
We first highlight that with k=1, the estimate provided by the lower bounds is exact in the large majority of settings. With k=2, we observe a small difference between the two estimates \(ACC_{A_{k}}^{ELB}\) and \(ACC_{A_{k}}^{FLB}\). Still, with b≥4, both bounds provide an exact estimate, while with b=1 and b=2, the exhaustive lower bound introduces an error always smaller than 0.019 points and the fast lower bound smaller than 0.026 points. Finally, with k=3, the gap between the two lower bounds increases. The error exhibited by the exhaustive technique is within 0 and 0.052 points, while the fast lower bound shows an error between 0 and 0.088. Clearly, when k>b, the lower bound estimate is useless, as there always exists a sufficiently large partial cover.
We conclude by observing that both lower bounds are very close to the actual accuracy under attack. For instance, the exhaustive lower bound always provides the correct accuracy of the fFPF model with b≥4 and for every value of k. This makes the provided lower bounds an efficient and accurate tool. In Figs. 1 and 2, we show the accuracy lower bound computed with FLB on different datasets, while varying b and the attacker power k. In this set of experiments, we also consider three binary classification datasets generated from MNIST by isolating instances of two digits. These datasets encompass a much larger number of features, which makes brute-force approaches to security certification infeasible. Indeed, the computational efficiency of the proposed bound allows us to compute the minimum accuracy for large values of b and large attacker’s budgets k. The figures show that larger values of b allow one to sustain a stronger attacker. Of course, when the attacker becomes too strong compared with the number of relevant features in the dataset, the accuracy of FPF drops. Regarding the MNIST datasets, the lower bounds allow us to state that a reasonable accuracy can be achieved even when more than 20 features are attacked. The weakest dataset is that of digits 5 vs. 6, where the attacker is clearly required to change fewer pixels to induce a misclassification.
5.5 Efficiency analysis
In Table 9, we report the per-instance average time required to run the brute-force certification method BF and the proposed CASCADING strategy. We observe that the computational cost required by BF increases exponentially with k. As expected, the BF approach quickly becomes infeasible.
The proposed CASCADING strategy provides a 10× speedup that increases to 100× when k=3. This huge gap is due to the efficiency and accuracy of the proposed lower bounds. On varying k, the fraction of instances for which FLB cannot certify non-attackability is 2%, 8%, and 20%, respectively. These instances are processed during the ELB step, which leaves 2%, 7%, and 18% of the dataset, respectively, to be analyzed by the last BF step. The BF certification of these last instances accounts for 85% to about 100% of the total running time, while the FLB and ELB steps are two or more orders of magnitude faster. We thus conclude that the proposed FLB and ELB are both sufficiently accurate, as discussed in the previous subsections, and that they can be used in a CASCADING-like strategy to provide significant speedups to any other exact certification method.
6 Conclusion
This paper proposes FPF, a new algorithm to generate forests of decision trees, based on random equipartitioning of the feature set, along with a projection of the dataset on these partitions before training each single decision tree. The method is proven to be resilient against evasion attacks, and, more importantly, we are able to certify in a very efficient way that, given a test dataset, some of the instances cannot be attacked at all, thus avoiding the costly computation of all the possible evasion attacks.
The experimental evaluation, carried out on publicly available datasets, shows promising results: FPF outperforms the main direct competitor, based on ensembles built on random sampling of the features. Moreover, we show that the proposed certification methods provide a very close approximation of the actual accuracy and that they can be used through a cascading approach to speed up the exact computation of the accuracy under attack.
We finally highlight that the proposed feature partitioning methodology easily generalizes to ensembles of other machine learning algorithms, whose investigation is left as future work.
7 Appendix
7.1 Theoretical analysis
In this section, we study analytically the behavior of FPF while varying the number of rounds r, the number of features d, and the attacker’s budget b, and we also take into consideration the probability of incorrect prediction by trees of the forest. The aim of this analysis is to investigate the impact of the above hyperparameters on the robustness of a forest built by FPF. To do so, we resort to the common black-box attack scenario, where the attacker has no access to the internal structure of the forest to choose which features to attack. We thus compute the accuracy of the forest under attack by estimating the probability that the attacker A_{b} may successfully fool the given forest by picking b features at random. Indeed, this probabilistic analysis is aimed at understanding the asymptotic behavior of the proposed algorithm.
We highlight that in the experimental section we rather adopt the more severe white-box attack scenario, where we consider an instance attacked if there is at least one successful attack.
Let d be the number of features in the input dataset, b the attacker’s budget, \(\mathcal {P}\) a robust equipartition of the features \({\mathcal {F}}\), and s=d/(2b+1) the number of features in each of the 2b+1 sets. For the sake of simplicity, to guarantee equipartitioning, we assume that d is a multiple of 2b+1.^{Footnote 3} Moreover, we let e be the probability that a tree \(t \in {\mathcal {T}}\) is erroneous, i.e., we do not assume that all trees are perfectly accurate. We further adopt the conservative and pessimistic assumption that if the attacker modifies a feature, then a tree using that feature will provide a wrong prediction.
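One way to obtain the (semi-)equipartition assumed here is to shuffle the feature indices and deal them round-robin into 2b+1 blocks. This is a sketch under that assumption, with hypothetical function names; it is not necessarily how FPF draws its partitions. It also shows the pigeonhole property behind the analysis: b attacked features can touch at most b blocks, leaving b+1 of them intact.

```python
import random
from typing import List, Optional, Sequence


def random_partition(n_features: int, b: int, seed: Optional[int] = None) -> List[List[int]]:
    """Shuffle feature indices and deal them into 2b+1 blocks whose sizes differ by at most one."""
    rng = random.Random(seed)
    feats = list(range(n_features))
    rng.shuffle(feats)
    k = 2 * b + 1
    return [sorted(feats[i::k]) for i in range(k)]


def blocks_hit(parts: List[List[int]], attacked: Sequence[int]) -> int:
    """Number of blocks (hence trees of one round) touched by an attack."""
    att = set(attacked)
    return sum(1 for p in parts if att.intersection(p))


parts = random_partition(99, 3, seed=0)
print([len(p) for p in parts])       # seven blocks of 14 or 15 features
print(blocks_hit(parts, [0, 1, 2]))  # at most 3 of the 7 blocks
```

When d is not a multiple of 2b+1, the round-robin deal produces blocks of size ⌊d/(2b+1)⌋ and ⌊d/(2b+1)⌋+1, matching the semi-equipartitioning of Note 3.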
We first compute the probability Pr(h) that b features, selected at random by A_{b}, overlap with exactly h partitions in \(\mathcal {P}\). To do so, we first restrict our attention to attacks that are entirely included in a given subset \(H\subset {\mathcal {P}}\), with |H|=h, and then we generalize to the full partition \({\mathcal {P}}\). When |H|=h, the set H includes sh features, and we denote by U_{H} the set of possible attacks over any b features in H, where \(|U_{H}| = \binom {sh}{b}\) is the number of such attacks. In particular, we are interested in computing \(\widehat {U}_{H} \subseteq U_{H}\), i.e., the set of attacks to the features in H that overlap all the h partitions in H (and not fewer).
Proposition 3
(Computing the number of attacks \(\widehat {U}_{H}\)) Let K_{i} be the set of all attacks in U_{H} that do not overlap with P_{i}∈H, and let \(\bar {K}_{i}\) be the complement of K_{i} in U_{H}. The following equation holds:
\(\widehat {U}_{H} = \bigcap _{i=1}^{h} \bar {K}_{i}\)
Proof
Let C be a possible attack, i.e., a tuple of b features chosen from the sh features of the h partitions in H. For every \(C\in \widehat {U}_{H}\), by definition, C contains at least one feature from each of the h partitions in H; therefore, C does not belong to any K_{i}, i.e., \(C \in \bar {K}_{i}\) for every i. This proves \(\widehat {U}_{H} \subseteq \cap _{i=1}^{h} \bar {K}_{i}\). To also prove that \(\cap _{i=1}^{h} \bar {K}_{i} \subseteq \widehat {U}_{H}\), let us suppose that \(C \in \cap _{i=1}^{h} \bar {K}_{i}\) and, by contradiction, that C does not overlap with all the h partitions, i.e., C does not contain any feature in some partition P_{j}∈H. This implies that C∈K_{j}, since K_{j} contains by construction all the tuples of b features that do not overlap with P_{j}. Since \(C \in K_{j} \Rightarrow C \notin \bar {K}_{j}\), it follows that \(C \notin \cap _{i=1}^{h} \bar {K}_{i}\). This contradicts our hypothesis and concludes the proof. □
We can finally write the probability Pr(h) as follows:
\(\text {Pr}(h) = \binom {2b+1}{h} \frac {|\widehat {U}_{H}|}{\binom {d}{b}}\)
where the factor \(\binom {2b+1}{h}\) counts the number of ways of selecting \(H \subset \mathcal {P}\), with |H|=h, while the denominator \(\binom {d}{b}\) is the number of all the possible attacks on the full feature space.
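To compute the cardinality of \(\widehat {U}_{H}\), we resort to the complementary formulation of the inclusion-exclusion principle:
\(|\widehat {U}_{H}| = \left|\bigcap _{i=1}^{h} \bar {K}_{i}\right| = |U_{H}| - \left|\bigcup _{i=1}^{h} K_{i}\right| = \sum _{J \subseteq \{1,\dots ,h\}} (-1)^{|J|} \left|\bigcap _{j \in J} K_{j}\right|\), where the empty intersection (J=∅) is U_{H}.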
We can rewrite the above formula as follows:
\(|\widehat {U}_{H}| = \sum _{k=0}^{h} (-1)^{k} \binom {h}{k} \binom {s(h-k)}{b}\)
where the cardinality of the intersection of k distinct K_{i} sets is computed as \(\binom {s(h-k)}{b}\), i.e., the number of attacks on b features limited to the remaining h−k partitions in \(H \subset \mathcal {P}\), each of size s.
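We can finally conclude by rewriting the formula in Eq. 2:
\(\text {Pr}(h) = \frac {\binom {2b+1}{h}}{\binom {d}{b}} \sum _{k=0}^{h} (-1)^{k} \binom {h}{k} \binom {s(h-k)}{b}\)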
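The closed form for Pr(h) can be checked against direct enumeration on small feature spaces. This sketch assumes d is a multiple of 2b+1 and, by symmetry, uses contiguous index ranges as partitions in the brute-force check; the function names are illustrative.

```python
from itertools import combinations
from math import comb


def pr_hit(h: int, d: int, b: int) -> float:
    """Pr(h): probability that b features drawn uniformly at random (out of d,
    split into 2b+1 blocks of size s) touch exactly h blocks."""
    n_blocks = 2 * b + 1
    if d % n_blocks != 0:
        raise ValueError("this sketch assumes d is a multiple of 2b+1")
    s = d // n_blocks
    # |U_H-hat| via inclusion-exclusion: sum_k (-1)^k C(h, k) C(s(h - k), b)
    exact = sum((-1) ** k * comb(h, k) * comb(s * (h - k), b) for k in range(h + 1))
    return comb(n_blocks, h) * exact / comb(d, b)


def pr_hit_brute(h: int, d: int, b: int) -> float:
    """Same probability by enumerating every b-subset; blocks are contiguous index ranges."""
    s = d // (2 * b + 1)
    hits = [len({f // s for f in attack}) for attack in combinations(range(d), b)]
    return hits.count(h) / len(hits)


for h in range(1, 4):
    print(h, round(pr_hit(h, 21, 3), 4), round(pr_hit_brute(h, 21, 3), 4))
```

Note that `math.comb(n, k)` returns 0 when k > n, which handles the terms where fewer than b features remain.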
We can now compute the probability \(\text {Pr}({\mathcal {T}}_{\mathcal {P}})\) that \({\mathcal {T}}_{\mathcal {P}}\) is accurate. The forest \({\mathcal {T}}_{\mathcal {P}}\) of 2b+1 trees is accurate if the number ε of erroneous trees is at most b. There are several cases that lead to this outcome: h trees may be affected by the attacker, while the remaining ε−h trees are wrong independently of the attacker, each with probability e. Moreover, those ε−h trees can be any of the \(|{\mathcal {P}}|-h\) trees not affected by the attacker. We define as Pr(ε∣h) the probability that a total of ε trees in \({\mathcal {T}}_{\mathcal {P}}\) provide a wrong prediction, given that h were already harmed by the attacker:
\(\text {Pr}(\varepsilon \mid h) = \binom {2b+1-h}{\varepsilon -h} e^{\varepsilon -h} (1-e)^{2b+1-\varepsilon }\)
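Given that the attacker may negatively affect at least one and at most b trees, the probability of \({\mathcal {T}}_{\mathcal {P}}\) being correct is:
\(\text {Pr}({\mathcal {T}}_{\mathcal {P}}) = \sum _{h=1}^{b} \text {Pr}(h) \sum _{\varepsilon =h}^{b} \text {Pr}(\varepsilon \mid h)\)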
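A minimal sketch of this black-box model, assuming that the h attacked trees are always wrong, that each of the other 2b+1−h trees errs independently with probability e, and that d is a multiple of 2b+1. The function names are illustrative, and the sketch does not reproduce the simulations behind the figures.

```python
from math import comb


def pr_hit(h: int, d: int, b: int) -> float:
    """Pr(h) for 2b+1 equal blocks (d assumed to be a multiple of 2b+1)."""
    n_blocks = 2 * b + 1
    s = d // n_blocks
    exact = sum((-1) ** k * comb(h, k) * comb(s * (h - k), b) for k in range(h + 1))
    return comb(n_blocks, h) * exact / comb(d, b)


def pr_wrong_given_hit(eps: int, h: int, b: int, e: float) -> float:
    """Pr(eps | h): the h attacked trees are wrong, eps - h of the other 2b+1-h err on their own."""
    n = 2 * b + 1
    if eps < h or eps > n:
        return 0.0
    return comb(n - h, eps - h) * e ** (eps - h) * (1 - e) ** (n - eps)


def pr_forest_correct(d: int, b: int, e: float) -> float:
    """Pr(T_P): at most b of the 2b+1 trees are wrong, averaging over h = 1..b hit blocks."""
    return sum(
        pr_hit(h, d, b) * sum(pr_wrong_given_hit(eps, h, b, e) for eps in range(h, b + 1))
        for h in range(1, b + 1)
    )


for e in (0.0, 0.05, 0.18):
    print(e, round(pr_forest_correct(63, 3, e), 4))
```

With e = 0 the forest is always correct, since b attacked features can break at most b of the 2b+1 trees; any positive e lets independent errors tip the majority.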
Figure 3a shows the probability Pr(h) for different values of b. The number of features in the dataset was chosen so as to have the multiple of 2b+1 closest to 100. We highlight that the probability of “hitting” only a few partitions is usually small: when b=5, there are 11 partitions, and there is less than 15% probability of hitting 3 partitions and about 85% probability of hitting at least 4 partitions; when b=10, there is nearly 0 probability of hitting fewer than 6 partitions out of 21, and there is about 80% probability of hitting at least 8 partitions. Interestingly, hitting b=10 partitions is not the most probable event.
Clearly, we are hypothesizing that our attacker is very powerful, as it can likely affect features used by a significant portion of the trees in a forest \({\mathcal {T}}_{\mathcal {P}}\), thereby breaking these trees. This is partially confirmed in Fig. 3b, where the error rate e of the trees is also considered. The expected accuracy \(\text {Pr}({\mathcal {T}}_{\mathcal {P}})\) is not large, but, interestingly, increasing the attacker’s budget does not have a significant impact. It is true that the attacker can harm a good number of partitions, yet the majority of them are expected to provide correct results, even considering e, ultimately providing sufficient accuracy figures.
Finally, Fig. 4 compares the expected accuracy of the flat (fFPF) and hierarchical (hFPF) ensemble prediction methods. Results were computed by simulation with, unless otherwise specified, d=99 features, an attacker’s budget b=3, r=101 rounds, and a rather high tree error rate e=0.18. Figure 4a shows that using multiple rounds quickly improves the ensemble accuracy. In the long run, the hierarchical and flat solutions may converge, but the hierarchical approach achieves better results with fewer rounds (up to 10% improvement). Increasing the attacker’s budget has a significant impact on the accuracy, yet the hierarchical approach is more robust, as shown in Fig. 4b. The harm caused by the attack also depends on the number of available features: in datasets with too few features to properly populate the partitions, the accuracy might decrease. Otherwise, the number of features has little influence, and the accuracy is stable once enough features are available, as reported in Fig. 4c.
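The flat and hierarchical strategies can be compared with a small Monte Carlo simulation in the spirit of Fig. 4. The Python sketch below rests on our own simplifying assumptions, which may differ in detail from the setup used for the figure: the attacker perturbs b distinct features chosen uniformly at random; each of the r rounds draws a fresh random semi-equipartition of the d features into 2b+1 partitions with one tree each; a tree is wrong if its partition contains an attacked feature, and otherwise with probability e; fFPF takes a single majority vote over all r(2b+1) trees, whereas hFPF first takes a majority vote within each round and then a majority vote over the r round outcomes. The function name simulate is hypothetical.

```python
import random


def simulate(d: int = 99, b: int = 3, r: int = 101, e: float = 0.18,
             trials: int = 200, seed: int = 0) -> tuple[float, float]:
    """Estimate (flat accuracy, hierarchical accuracy) on attacked instances.
    A tree errs if its partition holds an attacked feature, otherwise with
    probability e. r is assumed to be odd."""
    rng = random.Random(seed)
    k = 2 * b + 1  # partitions, hence trees, per round
    flat_ok = hier_ok = 0
    for _ in range(trials):
        attacked = set(rng.sample(range(d), b))  # features perturbed by the attacker
        wrong_total = 0  # wrong trees over all rounds (flat vote)
        good_rounds = 0  # rounds whose own majority is correct (hierarchical vote)
        for _ in range(r):
            features = list(range(d))
            rng.shuffle(features)  # fresh random partition for this round
            wrong = 0
            for j in range(k):
                part = features[j::k]  # semi-equipartition: sizes differ by at most one
                if attacked.intersection(part) or rng.random() < e:
                    wrong += 1
            wrong_total += wrong
            good_rounds += int(wrong <= b)
        flat_ok += int(wrong_total <= (r * k) // 2)  # majority of all r*k trees
        hier_ok += int(good_rounds > r // 2)  # majority of the r round outcomes
    return flat_ok / trials, hier_ok / trials


for rounds in (1, 11, 101):
    print(rounds, simulate(r=rounds))
```

Calling it for increasing r with the default d=99, b=3 and e=0.18 gives a rough analogue of Fig. 4a; raising `trials` reduces the Monte Carlo noise at the cost of run time.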
Availability of data and materials
The datasets generated and/or analyzed during the current study are available in the UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/spambase, https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic) and https://archive.ics.uci.edu/ml/datasets/wine, and in the MNIST Database repository, http://yann.lecun.com/exdb/mnist/.
Notes
\(\mathbbm {1}{[e]}\) equals 1 if expression e is true and 0 otherwise.
All datasets are available at the UCI Machine Learning Repository.
Otherwise, to guarantee semi-equipartitioning, some partitions will have size s=⌊d/(2b+1)⌋ and the others s+1.
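A random semi-equipartition as described in the note above can be obtained by shuffling the features and dealing them round-robin to the 2b+1 partitions, which yields sizes s and s+1 only. This shuffle-then-deal procedure is one simple choice assumed here for illustration, not necessarily the authors' implementation; the function name is hypothetical.

```python
import random
from typing import Optional


def semi_equipartition(d: int, b: int, seed: Optional[int] = None) -> list[list[int]]:
    """Randomly split feature indices 0..d-1 into 2b+1 partitions whose sizes
    are floor(d/(2b+1)) or floor(d/(2b+1)) + 1."""
    k = 2 * b + 1
    features = list(range(d))
    random.Random(seed).shuffle(features)
    # Deal the shuffled features round-robin: partition j receives
    # positions j, j+k, j+2k, ..., so sizes differ by at most one.
    return [features[j::k] for j in range(k)]


parts = semi_equipartition(100, 5, seed=1)
print(sorted(len(p) for p in parts))
```

With d=100 and b=5, for example, one partition receives 10 features and the other ten receive 9.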
Abbreviations
ML: Machine learning
FPF: Feature Partitioned Forest
RSM: Random Subspace method
RT: Robust trees
hFPF: Hierarchical Feature Partitioned Forest
fFPF: Flat Feature Partitioned Forest
SVM: Support vector machines
FLB: Fast accuracy lower bound
ELB: Exhaustive accuracy lower bound
ACC: Accuracy
RF: Random forest
BF: Brute force
References
L. Huang, A. D. Joseph, B. Nelson, B. I. P. Rubinstein, J. D. Tygar, in Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence. Adversarial machine learning, (2011), pp. 43–58.
B. Biggio, F. Roli, Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognit. 84, 317–331 (2018).
B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto, F. Roli, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Evasion attacks against machine learning at test time, (2013), pp. 387–402.
A. M. Nguyen, J. Yosinski, J. Clune, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Deep neural networks are easily fooled: high confidence predictions for unrecognizable images, (2015), pp. 427–436.
N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, A. Swami, in 2016 IEEE European Symposium on Security and Privacy (EuroS&P). The limitations of deep learning in adversarial settings, (2016), pp. 372–387.
S. Moosavi-Dezfooli, A. Fawzi, P. Frossard, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. DeepFool: a simple and accurate method to fool deep neural networks, (2016), pp. 2574–2582.
D. Lowd, C. Meek, in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. Adversarial learning, (2005), pp. 641–647.
B. Biggio, B. Nelson, P. Laskov, in Asian Conference on Machine Learning. Support vector machines under adversarial label noise, (2011), pp. 97–112.
C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, R. Fergus, in ICLR. Intriguing properties of neural networks, (2014).
I. J. Goodfellow, J. Shlens, C. Szegedy, in ICLR. Explaining and harnessing adversarial examples, (2015).
G. Tolomei, F. Silvestri, A. Haines, M. Lalmas, in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Interpretable predictions of tree-based ensembles via actionable feature tweaking, (2017), pp. 465–474.
F. Chollet, Deep learning with Python, 1st edn. (Manning Publications Co., USA, 2017).
B. Nelson, B. I. P. Rubinstein, L. Huang, A. D. Joseph, S. Lau, S. J. Lee, S. Rao, A. Tran, J. D. Tygar, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Near-optimal evasion of convex-inducing classifiers, (2010), pp. 549–556.
B. Biggio, G. Fumera, F. Roli, Security evaluation of pattern classifiers under attack. IEEE Trans. Knowl. Data Eng. 26(4), 984–996 (2014).
N. Srndic, P. Laskov, in 2014 IEEE Symposium on Security and Privacy. Practical evasion of a learning-based classifier: a case study, (2014), pp. 197–211.
A. Kantchelian, J. D. Tygar, A. D. Joseph, in International Conference on Machine Learning. Evasion and hardening of tree ensemble classifiers, (2016), pp. 2387–2396.
N. Carlini, D. A. Wagner, in 2017 IEEE Symposium on Security and Privacy (SP). Towards evaluating the robustness of neural networks, (2017), pp. 39–57.
H. Dang, Y. Huang, E. Chang, in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Evading classifiers by morphing in the dark, (2017), pp. 119–133.
H. Xiao, B. Biggio, B. Nelson, H. Xiao, C. Eckert, F. Roli, Support vector machines under adversarial label contamination. Neurocomputing 160, 53–62 (2015).
S. Gu, L. Rigazio, in ICLR, Workshop Track Proceedings. Towards deep neural network architectures robust to adversarial examples, (2015).
N. Papernot, P. D. McDaniel, X. Wu, S. Jha, A. Swami, in 2016 IEEE Symposium on Security and Privacy (SP). Distillation as a defense to adversarial perturbations against deep neural networks, (2016), pp. 582–597.
H. Chen, H. Zhang, D. S. Boning, C. Hsieh, in International Conference on Machine Learning. Robust decision trees against adversarial examples, (2019), pp. 1122–1131.
S. Calzavara, C. Lucchese, G. Tolomei, S. A. Abebe, S. Orlando, Treant: training evasion-aware decision trees. Data Min. Knowl. Discov. 34(5), 1390–1420 (2020).
M. Andriushchenko, M. Hein, in NeurIPS. Provably robust boosted decision stumps and trees against adversarial attacks, (2019), pp. 12997–13008.
A. Kantchelian, J. D. Tygar, A. Joseph, in International Conference on Machine Learning. Evasion and hardening of tree ensemble classifiers, (2016), pp. 2387–2396.
J. Su, D. V. Vargas, K. Sakurai, One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. 23(5), 828–841 (2019).
K. Simonyan, A. Zisserman, in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, ed. by Y. Bengio, Y. LeCun. Very Deep Convolutional Networks for Large-Scale Image Recognition, (2015). http://arxiv.org/abs/1409.1556.
H. Chen, H. Zhang, S. Si, Y. Li, D. Boning, C. J. Hsieh, in Advances in Neural Information Processing Systems. Robustness verification of tree-based models, (2019), pp. 12317–12328.
F. Ranzato, M. Zanella, in The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7–12, 2020. Abstract Interpretation of Decision Tree Ensemble Classifiers (AAAI Press, 2020), pp. 5478–5486. https://aaai.org/ojs/index.php/AAAI/article/view/5998.
M. Cheng, T. Le, P. Y. Chen, H. Zhang, J. Yi, C. J. Hsieh, in International Conference on Learning Representations (ICLR). Query-efficient hard-label black-box attack: an optimization-based approach, (2019).
S. Calzavara, C. Lucchese, G. Tolomei, S. A. Abebe, S. Orlando, Treant: training evasion-aware decision trees. Data Min. Knowl. Discov. 34, 1390–1420 (2020). https://doi.org/10.1007/s10618-020-00694-9.
D. Vos, S. Verwer, in Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18–24 July 2021, Virtual Event (Proceedings of Machine Learning Research), vol. 139, ed. by M. Meila, T. Zhang. Efficient Training of Robust Decision Trees Against Adversarial Examples (PMLR, 2021), pp. 10586–10595.
F. Ranzato, M. Zanella, in GECCO ’21: Genetic and Evolutionary Computation Conference, Lille, France, July 10–14, 2021, ed. by F. Chicano, K. Krawiec. Genetic adversarial training of decision trees (ACM, 2021), pp. 358–367. https://doi.org/10.1145/3449639.3459286.
T. K. Ho, The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998).
B. Biggio, G. Fumera, F. Roli, Multiple classifier systems for robust classifier design in adversarial environments. Int. J. Mach. Learn. Cybern. 1(1–4), 27–41 (2010).
R. Gandhi, S. Khuller, A. Srinivasan, Approximation algorithms for partial covering problems. J. Algorithms. 53(1), 55–84 (2004). https://doi.org/10.1016/j.jalgor.2004.04.002.
L. Breiman, Random forests. Mach. Learn. 45(1), 5–32 (2001).
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Contributions
All authors have contributed equally. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Calzavara, S., Lucchese, C., Marcuzzi, F. et al. Feature partitioning for robust tree ensembles and their certification in adversarial scenarios. EURASIP J. on Info. Security 2021, 12 (2021). https://doi.org/10.1186/s13635-021-00127-0
DOI: https://doi.org/10.1186/s13635-021-00127-0