Feature Partitioning for Robust Tree Ensembles and their Certification in Adversarial Scenarios

Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at test time. The attacker aims at finding a minimal perturbation of a test instance that changes the model outcome. We propose a model-agnostic strategy that builds a robust ensemble by training its base models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We apply the proposed strategy to decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently assesses the minimal accuracy of a forest on a given dataset, avoiding the costly computation of evasion attacks. Experimental evaluation on publicly available datasets shows that the proposed strategy outperforms state-of-the-art adversarial learning algorithms against evasion attacks.


Introduction
Machine Learning (ML) algorithms are currently used to train models that are then deployed to ensure system security and to control critical processes (Huang et al., 2011; Biggio & Roli, 2018). Unfortunately, traditional ML algorithms proved vulnerable to a wide range of attacks, and in particular to evasion attacks, where an attacker carefully crafts perturbations of an input to force prediction errors (Biggio et al., 2013; Nguyen et al., 2015; Papernot et al., 2016a; Moosavi-Dezfooli et al., 2016).
* Equal contribution. 1 Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University of Venice, Italy. Correspondence to: Claudio Lucchese <claudio.lucchese@unive.it>.
Technical Report. Copyright 2020 by the author(s).
While there is a large body of research on evasion attacks against linear classifiers (Lowd & Meek, 2005; Biggio et al., 2011) and, more recently, against deep neural networks (Szegedy et al., 2014; Goodfellow et al., 2015), there are only a few works dealing with tree-based models. Decision trees are interpretable models (Tolomei et al., 2017), yielding predictions that are human-understandable in terms of syntactic checks over domain features, which is particularly appealing in the security setting. Moreover, decision tree ensembles are nowadays one of the best methods for dealing with non-perceptual problems, and are one of the most commonly used techniques in Kaggle competitions (Chollet, 2017).
In this paper we present an algorithm, called Feature Partitioned Forest (FPF), that builds an ensemble of decision trees designed to be robust against evasion attacks. Indeed, the trained ensemble is a binary classifier that we show to be, in most cases, robust by construction: given a test instance that the attacker aims to corrupt, if most of the trees in the ensemble return accurate binary predictions for that instance, the attacker has no chance of attacking the whole ensemble. Our method is based on a particular sampling of the features, where we randomly equi-partition the set of features and train each tree on a distinct feature partition. Moreover, as usual in the context of adversarial learning, we rely on a threat model that limits the budget of the attacker, who can only manipulate a limited number of features, thus upper-bounding the distance between the original instance and the perturbed one. We also propose an approximate certification method for our tree ensembles that efficiently assesses the minimal accuracy of a forest on a given dataset, avoiding the costly computation of evasion attacks.

Related Work
Most of the work in adversarial learning regards classifiers, in particular binary ones. The attacker starts from a positive instance that is classified correctly by the deployed ML model and is interested in introducing minimal perturbations on the instance to modify the prediction from positive to negative, thus "evading" the classifier (Nelson et al., 2010; Biggio et al., 2013; Srndic & Laskov, 2014; Kantchelian et al., 2016b; Carlini & Wagner, 2017; Dang et al., 2017; Goodfellow et al., 2015). To prevent these attacks, different techniques have been proposed for different models, including support vector machines (Biggio et al., 2011; Xiao et al., 2015), deep neural networks (Gu & Rigazio, 2015; Goodfellow et al., 2015; Papernot et al., 2016b), and decision tree ensembles (Kantchelian et al., 2016b; Chen et al., 2019a). Unfortunately, the state of the art for decision tree ensembles is far from satisfactory.
The first adversarial learning technique for decision tree ensembles is due to Kantchelian et al. and is called adversarial boosting (Kantchelian et al., 2016b). It is an empirical data augmentation technique, borrowing from the adversarial training approach (Szegedy et al., 2014). Another adversarial learning technique for decision tree ensembles was proposed in a recent work by Chen et al., who introduced the first tree learning algorithm embedding the attacker directly in the optimization problem solved upon tree construction (Chen et al., 2019a). The key idea of their approach, called robust trees, is to redefine the splitting strategy of the training examples at a tree node. Finally, our algorithm FPF has some relations with the Random Subspace method (RSM) (Ho, 1998), which was successfully exploited by Biggio et al., 2010 to build ensembles where each single model is trained on a projection of the original dataset on a subset of its features.

Robust Forest Training
We aim to design a machine learning algorithm able to train forests of binary decision trees that are resilient to evasion attacks, which occur when an adversary adaptively manipulates test data to force prediction errors. Specifically, in this section we discuss our algorithm, which trains a robust forest that is resilient in a strong adversarial environment, where the attacker can perturb at most b features of a test instance to deceive the learnt model and force a prediction error. We first introduce some notation and the threat model, then we discuss our algorithm.

Background and Notation
Let X ⊆ R^d be a d-dimensional input space, where we denote with F the set of features. Each instance x ∈ X is assigned a label y ∈ Y by some unknown target function g : X → Y. The goal of a supervised learning algorithm that induces a forest of decision trees is to find the forest T that best approximates the target g.
In this paper we discuss binary classification, where Y = {−1, +1}, and focus on binary decision trees. Each tree t ∈ T can be inductively defined as follows: t is either a leaf λ(ŷ) for some label ŷ ∈ Y, or an internal test node σ(f, v, t_l, t_r), where f ∈ [1, d] identifies a feature, v ∈ R is the threshold for feature f, and t_l, t_r are the left/right decision trees.
At test time, an instance x traverses each tree t ∈ T until it reaches a leaf λ(ŷ), which returns the prediction ŷ, denoted by t(x) = ŷ. Specifically, for each internal test node σ(f, v, t_l, t_r), x falls into the left tree t_l if x_f ≤ v, and into the right tree t_r otherwise. Given a forest T, the global prediction is defined as T(x) = +1 if Σ_{t∈T} t(x) > 0, and T(x) = −1 otherwise.
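The prediction rules above can be sketched in a few lines of Python; the nested-tuple tree representation and the function names are illustrative, not taken from the paper's implementation:

```python
# A minimal sketch of the tree and forest prediction rules described above.
# Trees are nested tuples: ('leaf', y_hat) or ('node', f, v, t_left, t_right).

def tree_predict(t, x):
    """Traverse tree t with instance x (a list of feature values)."""
    if t[0] == 'leaf':
        return t[1]
    _, f, v, t_l, t_r = t
    # Left branch if x_f <= v, right branch otherwise.
    return tree_predict(t_l, x) if x[f] <= v else tree_predict(t_r, x)

def forest_predict(T, x):
    """Majority vote: +1 if the sum of tree predictions is positive."""
    return 1 if sum(tree_predict(t, x) for t in T) > 0 else -1
```
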
Finally, given a test set D_test, let E = {(x, y) ∈ D_test | y · T(x) < 0} be the set of test instances that are not classified correctly by T. We can finally define the accuracy as: a(T, D_test) = 1 − |E| / |D_test|.

Threat Model
We focus on the evasion attack scenario, where an attacker aims at fooling an already trained classifier by maliciously modifying a given instance before submitting it to the classification model. The perturbation caused by the attacker is not unconstrained as the attack should be "invisible" to the classification system.
As in Kantchelian et al. (Kantchelian et al., 2016a), we assume an attacker A_b that is capable of modifying a given instance x into a perturbed instance x′ such that the L_0-norm of the perturbation is smaller than the attacker's budget b, i.e., ||x′ − x||_0 ≤ b. Therefore, attacker A_b can perturb the instance x by modifying at most b features, without any constraint on how much a given feature can be altered. Indeed, a very small b is sufficient to achieve successful attacks: Su et al. (Su et al., 2019) show that with a one-pixel attack, i.e., with b = 1, it is possible to fool a complex deep neural network such as VGG16 (Simonyan & Zisserman, 2014) and decrease its accuracy to a poor 16%.
Given an instance x ∈ X, we denote by A_b(x) the set of all the perturbed instances the attacker may generate: A_b(x) = {x′ ∈ X : ||x′ − x||_0 ≤ b}. Finally, we can define the accuracy under attack that an attacker A_b aims to minimize. Given the test set D_test, let E_A = {(x, y) ∈ D_test | ∃ x′ ∈ A_b(x) : y · T(x′) < 0} be the set of instances for which a successful perturbation exists. We can finally define the accuracy under attack as: a_A(T, D_test) = 1 − |E_A| / |D_test|.
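As a minimal illustration of this threat model, the L_0 constraint defining membership in A_b(x) can be checked as follows (function names are ours, chosen for clarity):

```python
def l0_distance(x, x_prime):
    """Number of features that differ: the L0 norm of the perturbation."""
    return sum(1 for a, c in zip(x, x_prime) if a != c)

def in_attack_set(x, x_prime, b):
    """True iff x_prime is a feasible perturbation for attacker A_b."""
    return l0_distance(x, x_prime) <= b
```
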

Our algorithm
In the following we propose a novel training strategy that produces a forest T where the majority of its trees are not affected by the attacker A b .
Feature Partitioning. Given a set partition P of the feature set F and an attacker A_b that decided to corrupt the set of features B ⊆ F, with |B| ≤ b, we can easily compute the number of sets in P overlapping with B as: Σ_{P∈P} 1[P ∩ B ≠ ∅]. 1 We call P robust if the majority of its sets cannot be impacted by the attacker A_b, i.e., if the following property holds: Σ_{P∈P} 1[P ∩ B ≠ ∅] < |P|/2. Since |B| ≤ b, it is straightforward to show that this property is surely satisfied if |P| ≥ 2b + 1. Consider the worst case: at most b distinct subsets of P can have an overlap with B, leaving the other b + 1 or more subsets of P unaffected. Hereinafter, we consider only robust feature partitions P where |P| = 2b + 1.
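The partitioning argument can be sketched as follows; `random_partition` and `n_overlapping` are illustrative helpers of ours, not part of the paper's code:

```python
import random

def random_partition(features, k):
    """Randomly equi-partition a feature set into k disjoint subsets."""
    feats = list(features)
    random.shuffle(feats)
    return [set(feats[i::k]) for i in range(k)]

def n_overlapping(partition, attacked):
    """Number of subsets in the partition touched by the attacked set B."""
    return sum(1 for P in partition if P & attacked)
```

With k = 2b + 1 subsets and |B| ≤ b, at most b subsets can overlap B, so at least b + 1 subsets are always unaffected.
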
Robust forest. Let's consider a forest T that, given an attacker A b , is built by exploiting a robust feature partition P as follows.
Let D be a set of training instances x ∈ X , and P be a robust partition of its feature space F. Given P ∈ P, we call π P (D) the projection of D on the feature set P , i.e., the dataset obtained from D by discarding those features not included in P . Given a robust feature partitioning P, it is thus possible to build a robust forest by training 2b + 1 trees independently on the 2b + 1 projections π P (D), with P ∈ P.
The algorithm sketched above achieves what we formally define as robustness.
Definition 1 (Robust Forest) Given an attacker A_b, we say that a forest T is robust if the majority of its trees is not affected by A_b for any of its attacks, i.e., for every x ∈ X: Σ_{t∈T} 1[∃ x′ ∈ A_b(x) : t(x′) ≠ t(x)] < |T|/2. It is straightforward to show that if a forest is built on the basis of a robust feature partitioning P as described above, then at most b of its 2b + 1 trees can be affected by the attacker.
1 1[e] equals 1 if expression e is true and 0 otherwise.

Algorithm 1 FPF
In the best-case scenario where each t ∈ T is perfectly accurate, the above robustness property guarantees that, in presence of attacks, only a minority of trees provides an incorrect prediction. Clearly, this scenario is unlikely, and therefore we discuss below how to strengthen the accuracy of T.
Note that the above definition and training strategy trivially generalize to any ensemble learning algorithm.
Increasing the accuracy of a robust forest. The above definition does not provide any guarantee on the accuracy of the full forest T , which clearly depends on the accuracy of its single trees. Yet, the more accurate the trees t ∈ T , the more likely the forest T is accurate under attack.
The accuracy of single trees depends on the feature partitioning P. The larger |P|, the smaller the number of features each tree can be trained on. To increase the accuracy of a robust forest T, we equi-partition F across P so as to have |P| ≥ ⌊|F|/(2b + 1)⌋ for all P ∈ P. Clearly, as the attacker's power b increases, we need to partition F into a larger number of subsets, and for these to be effective we need the dataset to have a larger number of high-quality features. Note that this is true for every learning algorithm: if the attacker can perturb at will up to b features, it is necessary to have more than b high-quality features to train an accurate model.
In addition, a specific partitioning P may be suboptimal, as there may be multiple ways of partitioning F so as to achieve feature subsets with high predictive power. Along the lines of ensemble training, we use multiple feature partitionings and join the resulting robust forests into a single decision tree ensemble.
Finally, we can sketch our algorithm FPF to train a forest T aimed to be robust against an attacker A_b that can perturb at most b features. The algorithm, shown in Alg. 1, iterates for a number r of user-defined rounds. During each round i, the algorithm generates a random feature partitioning P_i of the features F present in the given training dataset D. The feature set F is randomly and evenly split into k = 2b + 1 disjoint subsets, and a new decision tree is trained on each of the dataset projections π_P(D) for every feature subset P ∈ P_i. The resulting 2b + 1 trees form a tree ensemble T_i. We use an accept condition to filter out those T_i that would not strengthen the final forest. For instance, it might be the case that some partitions in P_i do not contain sufficiently predictive features to train accurate trees. In this work, we use a simple acceptance criterion according to which T_i is accepted if its accuracy is larger than that of naïvely predicting the dataset's majority class. In this case, the trees of T_i are added to the forest T. Eventually, the returned forest contains a total of r(2b + 1) trees.
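The FPF training loop just described can be sketched as follows. This is a schematic rendition, not the paper's implementation: `train_tree` and `accept` are caller-supplied stand-ins for the actual tree learner and the acceptance criterion.

```python
import random

def fpf_train(X, y, b, rounds, train_tree, accept):
    """Sketch of FPF: each round equi-partitions the features into 2b + 1
    disjoint subsets, trains one tree per dataset projection, and keeps the
    round's trees only if the ensemble passes the acceptance check."""
    n_features = len(X[0])
    k = 2 * b + 1
    forest = []
    for _ in range(rounds):
        feats = list(range(n_features))
        random.shuffle(feats)
        partition = [feats[i::k] for i in range(k)]  # random equi-partition
        round_trees = []
        for P in partition:
            X_proj = [[row[f] for f in P] for row in X]  # projection pi_P(D)
            round_trees.append(train_tree(X_proj, y, P))
        if accept(round_trees):
            forest.extend(round_trees)
    return forest  # up to rounds * (2b + 1) trees
```
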
Proposition 1 (Robustness of FPF) The forest T built by the algorithm FPF is robust against an attacker A_b, as the majority of its trees are not affected by A_b for any of its attacks.
At each round, FPF trains a set of 2b + 1 trees which, as discussed above, is robust because at most b of its trees can be affected by A_b. Applying the same reasoning to r rounds, FPF builds a forest T of r(2b + 1) trees of which at most rb can be affected by A_b, leaving a majority of r(b + 1) trees unaltered.

Evaluation and certification of tree-based models
Evaluating the accuracy of a model in presence of an attacker is a difficult and computationally expensive task. This is due to the possibly large size of A_b(x) for some x ∈ X and to the number of interactions among trees in a forest. Chen et al., 2019b show that verifying the robustness of a forest T with at most l leaves per tree has cost min{O(l^|T|), O((2|T|l)^|F|)} assuming an L_∞-norm attacker. Kantchelian et al., 2016a prove that in case of an L_0-norm attacker, as in this work, the problem of finding a successful attack is NP-complete.
Below we first provide an expensive brute-force strategy for evaluating the accuracy under L 0 -norm attack, then we show that the evaluation problem can be reduced to a maximum coverage problem and we propose a few very efficient heuristic strategies that can be used both to reduce the cost of the brute-force strategy and to provide a lower-bound certification for a tree-ensemble model on a given dataset.

Brute-force evaluation
Given an instance x ∈ X, the brute-force evaluation of a forest T consists in generating all the possible perturbations the attacker A_b is capable of, in order to find whether any of them is successful. The size of A_b(x) is infinite, but we can limit its enumeration to the set of attacks that are relevant for the given forest T, i.e., those attacks that can invert the outcome of a test in some internal node of the trees in T. Recall that nodes in a tree are in the form x_f ≤ v for some threshold v. Indeed, the thresholds used in the tree nodes induce a discretization of the input space X that we exploit as follows.
For any given feature f ∈ F, we define the set of relevant thresholds as: V_f = {v | σ(f, v, t_l, t_r) is a node of some t ∈ T} ∪ {∞}. The set V_f includes all the thresholds that are associated with f in any node σ(f, v, t_l, t_r) of any tree in T, plus the ∞ value that allows to traverse the right branch of the node with the largest threshold.
Given an attacker A_b, the set of relevant perturbations is thus given by the Cartesian product of the sets V_f for b different features. Let F_b be the set of all subsets F ⊆ F having size at most b; we denote with Â_b(x) the set of such perturbations, formally defined as: Â_b(x) = {x′ | ∃ F ∈ F_b : x′_f ∈ V_f for all f ∈ F, and x′_f = x_f for all f ∉ F}. We conclude that an attacker A_b can successfully perturb an instance x against a forest T if there exists at least one x′ ∈ Â_b(x) such that y · T(x′) < 0. This brute-force approach is very expensive, due to three factors: i) as b increases, the number of feature combinations in F_b increases; ii) as the number of trees and nodes grows, the number of threshold values associated with each feature increases; iii) for each perturbed instance x′, the prediction T(x′) must be retrieved by traversing the given forest.
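The enumeration of Â_b(x) can be sketched as follows, with `thresholds[f]` playing the role of V_f \ {∞}; the function name and representation are illustrative assumptions of ours:

```python
from itertools import combinations, product

def relevant_attacks(x, thresholds, b):
    """Yield the relevant perturbations of x against a forest whose split
    thresholds on feature f are given by thresholds[f] (the set V_f).
    For every subset of at most b features, every relevant threshold value
    (plus infinity) is tried on each chosen feature."""
    for size in range(1, b + 1):
        for F in combinations(range(len(x)), size):
            value_sets = [sorted(thresholds[f]) + [float('inf')] for f in F]
            for values in product(*value_sets):
                x_prime = list(x)
                for f, v in zip(F, values):
                    x_prime[f] = v
                yield x_prime
```

The generator makes the three cost factors above concrete: the outer loops grow with b and |F|, the inner product grows with the number of thresholds per feature, and each yielded instance still has to be classified by the forest.
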

Attacking forest T as a Maximum Coverage Problem
Given a forest T and an input instance x, the attacker A_b aims at finding those perturbations that lead the majority of the trees to a wrong prediction. Indeed, as some trees of T might give incorrect predictions before the attack, it could be sufficient to harm fewer than |T|/2 trees.
We now introduce some simplifying assumptions and then show that finding an attack can be reduced to the maximum coverage problem. First, we assume that if a tree in T provides a wrong prediction before the attack, then its prediction will be incorrect also after the attack. Second, we assume that if a tree uses a feature f for its prediction over x, then attacking f causes the tree to generate a wrong prediction.
Note that these assumptions are very conservative. An incorrect tree may, by chance, provide a good prediction after the attack. More importantly, modifying a feature f does not necessarily flip the test performed in every node using that feature and lead to a wrong prediction. These assumptions allow a clear formulation of the problem, and our experiments show that the error introduced is remarkably small.
Under the above assumptions, the aim of the attacker A_b is to find a set of b features that are used by the largest number of distinct trees. Let us denote with S_f the set of trees in T using feature f, and let S = {S_f | f ∈ F}. Then the most successful attack is given by the subset S′ ⊆ S with |S′| ≤ b such that |∪_{S_i∈S′} S_i| is maximized. The thoughtful reader has surely recognized that this formulation of our problem is nothing else than an instance of the maximum coverage problem.
Note that the algorithm FPF limits the use of a single feature to at most r trees (the number of rounds) out of a total of r(2b + 1) trees, to some extent making it more difficult for the attacker to find a cover.
Before attacking the maximum coverage problem we make a few improvements to provide a more accurate definition of sets S f .
First, we do not consider trees in T with an incorrect prediction before the attack.
Second, we note that a tree may include a feature f in some of its nodes, but these nodes may never be traversed during the evaluation of an instance x. Therefore we say that a tree t belongs to S f for an instance x only if the traversal path of x in t includes a node with a test on feature f .
Last, among the nodes along the traversal path of instance x before the attack, we distinguish between nodes where the test x_f ≤ v is true, and nodes where the test is false. In the former case, the attacker must increase the value of x_f to affect the traversal path, while in the latter case x_f should be decreased. Clearly, these two attacks cannot coexist. Therefore, we define sets S_f^+ and S_f^-, where we include a tree t in S_f^+ if feature f is in the traversal path of x with a true outcome of the test x_f ≤ v, and in S_f^- otherwise. We thus achieve a more accurate modeling of when an attack can actually affect the final prediction. This also reduces the size of the sets S_f^+, S_f^-, decreasing the risk of overestimating the effect of an attack.
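The per-instance construction of the (feature, direction) pairs that populate S_f^+ and S_f^- can be sketched as follows, again using an illustrative nested-tuple tree representation, ('leaf', y_hat) or ('node', f, v, t_left, t_right), that is our assumption rather than the paper's code:

```python
def traversal_sets(t, x):
    """Collect (feature, direction) pairs along the path of x in tree t:
    '+' if the test x_f <= v held (the attacker must increase x_f to flip it),
    '-' otherwise (the attacker must decrease x_f)."""
    pairs = set()
    while t[0] != 'leaf':
        _, f, v, t_l, t_r = t
        if x[f] <= v:
            pairs.add((f, '+'))
            t = t_l
        else:
            pairs.add((f, '-'))
            t = t_r
    return pairs
```

A tree t is then placed in S_f^+ (resp. S_f^-) exactly when (f, '+') (resp. (f, '-')) appears in its traversal pairs for x.
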
We can finally summarize the maximum coverage problem as follows. Given the set S = {S_f^+, S_f^- | f ∈ F}, the most successful attack is given by the subset S′ ⊆ S with |S′| ≤ b, under the constraint that S_f^+ and S_f^- cannot both be included in S′, that maximizes the cover of the (correct) trees in the forest T.
We say that there is no possible attack if the number of trees in the largest cover, plus the number of trees providing a wrong prediction before the attack, does not reach a majority of T.
Note, however, that this is a conservative estimate, as the attacker may modify all the features identified by the maximum cover without being able to affect the final forest prediction. In the following, we use this conservative set cover formulation to define heuristic strategies that can be used to provide a lower bound to the accuracy of a forest on a given dataset, or to speed up the brute-force approach by discarding those instances for which a sufficiently large cover does not exist.

Fast Accuracy Lower Bound
Given an attacker A_b, a forest T and an instance x, we denote with ω the number of trees providing a wrong prediction before the attack, and with S the elements of the set cover formulation defined above.
It is easy to provide an upper bound to the size of the largest cover as follows. First, sort the sets in S according to their size. Then, select the b largest sets while enforcing the constraint that S_f^+ and S_f^- cannot be considered together. Let S_FLB be the covering sets selected as above; the size of the largest cover cannot be larger than ŝ_FLB = Σ_{S_i ∈ S_FLB} |S_i|. Therefore, we can pessimistically estimate the number of incorrect trees under attack as ω + ŝ_FLB, which leads to an incorrect prediction over x if this quantity reaches the majority of |T|. By applying the same algorithm to every correctly classified instance x ∈ D, we denote with E_FLB the set of instances for which ω + ŝ_FLB ≥ |T|/2, obtaining a lower bound on the accuracy of the forest T on dataset D: a_FLB(T, D) = 1 − (|E| + |E_FLB|)/|D|.
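The fast bound can be sketched as follows; the dictionary-based representation, mapping each (feature, direction) pair to the indices of the trees it covers, is an assumption of ours:

```python
def fast_cover_bound(sets, b):
    """Upper bound on the largest cover: take the b largest sets among the
    S_f^+/S_f^- candidates, never taking both directions of one feature.
    `sets` maps (feature, direction) -> set of tree indices."""
    ranked = sorted(sets.items(), key=lambda kv: len(kv[1]), reverse=True)
    chosen_feats, total = set(), 0
    for (f, _), trees in ranked:
        if f in chosen_feats:
            continue  # S_f^+ and S_f^- cannot be selected together
        chosen_feats.add(f)
        total += len(trees)
        if len(chosen_feats) == b:
            break
    return total  # pessimistic count of trees the attacker could flip
```

An instance is then certified as non-attackable whenever ω plus this bound stays below |T|/2.
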

Exhaustive Accuracy Lower Bound
In order to improve over the fast lower bound, we also consider a more expensive option where all the possible covers are enumerated and the maximal one is eventually found.
Given an attacker A_b, let S be the elements of the set cover formulation as previously introduced. The exhaustive search consists in enumerating all the possible subsets S′ ⊆ S of size at most b, and for each of them computing the corresponding cover |∪_{S_i∈S′} S_i|. Let ŝ_ELB be the maximum of such covers; we can define E_ELB as the set of instances in D for which ω + ŝ_ELB ≥ |T|/2, and introduce the following accuracy lower bound: a_ELB(T, D) = 1 − (|E| + |E_ELB|)/|D|. When the lower-bound information is not considered sufficient, we propose to exploit the above strategies in the following way. Given an instance x and an attacker A_b, we proceed as follows:
1. first compute ŝ_FLB: if the cover is not sufficiently large, then the instance cannot be attacked; otherwise
2. compute ŝ_ELB: if the cover is not sufficiently large, then the instance cannot be attacked; otherwise
3. use the brute-force method to check the existence of a successful attack.
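The exhaustive cover search can be sketched as follows, under the same illustrative set representation used for the fast bound:

```python
from itertools import combinations

def exhaustive_max_cover(sets, b):
    """Exact maximum cover over all choices of at most b features,
    respecting the S_f^+/S_f^- mutual-exclusion constraint.
    `sets` maps (feature, direction) -> set of tree indices."""
    best = 0
    items = list(sets.items())
    for size in range(1, b + 1):
        for combo in combinations(items, size):
            feats = [f for (f, _), _ in combo]
            if len(set(feats)) < size:
                continue  # both directions of the same feature: infeasible
            cover = set().union(*(trees for _, trees in combo))
            best = max(best, len(cover))
    return best
```

Unlike the fast bound, which sums set sizes and may count a tree twice, this search takes the union of the selected sets, so it is never larger than the fast bound and is tight for the set cover formulation.
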
Experimental results show that the above cascading strategy is able to strongly reduce the number of instances in a given dataset D for which the brute-force approach is required.

Experimental settings
In Table 1 we report the main characteristics of the datasets used in the experimental evaluation, including the number of features, the number of top relevant features (measured as those contributing to 90% of the feature importance in a Random Forest), and the relative size of the majority class. The datasets, ranging from small to mid-sized, are associated with a binary classification task and are commonly used in the adversarial learning literature.
We compared our proposed algorithm FPF against the following tree-ensemble competitors:
• Random Forest (RF) (Breiman, 2001), which is known to have some good level of robustness thanks to the ensembling of several decision trees. As in the original algorithm, each tree is trained on a bootstrap sample of the dataset, with no constraints on the number of leaves, and with feature sampling of size √|F| at each node.
• Random Subspace method (RSM) (Ho, 1998), which was successfully exploited by Biggio et al., 2010. In this case, each tree is trained on a projection of the original dataset on a subset of its features. Validation experiments showed best results with 20% feature sampling.
Hyper-parameter tuning on a validation set showed that both FPF and RSM perform best when limiting the number of leaves to 8. Similarly, we limited the number of trees to 100 for the datasets BREAST CANCER and SPAM BASE, and to 300 for the dataset WINE. We observed that WINE requires a larger forest due to its limited number of features. All results were computed on a randomly selected test set sized 1/3 of the original dataset. Hereinafter, we use b to denote the training parameter of FPF, and we use k for the attack strength of attacker A_k.
The following experimental evaluation aims at answering the following research questions: • is FPF able to train more robust models?
• how is FPF affected by the number of rounds r and the expected attacker power b?
• how accurate are the proposed bounds, and can we exploit them to efficiently analyse models under larger attacker budgets k?

Robustness Analysis
In Tables 2, 3 and 4 we report the accuracy of FPF, RF and RSM against an attacker that can modify 0, 1 or 2 features. Indeed, we report the case of no attacks for a more complete evaluation, but in an adversarial scenario the attacker has no reason for not conducting an attack. For FPF we evaluate its robustness when varying the number of rounds r and the training parameter b. In all datasets, RF performs best or second best in absence of attacks, but its accuracy drops significantly under attack. The perturbation of one single feature is sufficient to harm the model, with a loss of about 40 points in accuracy on SPAM BASE, and when attacking two features the accuracy drops under 20% for the SPAM BASE and WINE datasets. As previously discussed, the L_0-norm attack we are tackling in this work is indeed very powerful and sufficient to fool a very accurate and effective random forest model.
The RSM model provides good performance in absence of attacks, meaning that the dataset projection is not disadvantageous, and it is much more robust than RF in presence of attacks. However, when attacking two features, RSM exhibits a drop of 10 to 20 points in accuracy.
The proposed FPF algorithm provides the best robustness in all attack scenarios. When increasing the defensive parameter b, the accuracy slightly decreases in absence of attacks, but always increases under attack, suggesting that a large b can be useful even against weaker attacks. The performance of FPF is similar to that of RSM when only one feature is attacked, but when two features are attacked FPF shows significantly better performance than RSM, with a 10% relative improvement on both the SPAM BASE and WINE datasets.
We conclude that FPF is able to outperform state-of-the-art competitors especially with a stronger attacker.

Sensitivity analysis
In Table 5 we evaluate the sensitivity of FPF w.r.t. the number of rounds r on the BREAST CANCER dataset. Similar results were observed for the other datasets. As expected, the ensembling strategy improves the accuracy of the resulting model, and accuracy increases when increasing the number of rounds until a plateau is reached. We can conclude that using a large number of rounds r improves the robustness of the trained model.
The impact of b is evaluated in Table 6. In this case a trade-off is apparent. Even if increasing b is expected to increase robustness against stronger attacks, at the same time it reduces the number of features that can be exploited when training a single tree. This harms the performance of the whole ensemble, especially with larger values of b or when the dataset contains a limited number of informative features. The results in Table 6 show that when using a limited number of trees, accuracy increases with b. But when the forest is sufficiently large to exploit the ensembling benefits, the limited accuracy of single trees plays an important role, making the use of larger values of b not rewarding. For instance, this is the case of the BREAST CANCER dataset, where we identified only 15 informative features, which are difficult to partition into 2b + 1 sets for b = 5.
We conclude that, while it is beneficial to increase the number of rounds r, it is not always a good strategy to increase b, unless the dataset we have at hand has a sufficiently large set of informative features compared to the attacker strength.

Lower Bound Analysis
We compare the proposed lower bounds against the accuracy under attack computed with the brute-force method. FLB provides a close estimate of the accuracy under attack, and with attacks on 2 features ELB performs better with small b. In general, both bounds predict an accuracy very close to the real calculated one. This means that the proposed lower bounds can certify the non-attackability of a large portion of instances without the cost of the brute-force exploration.
In Figure 1 we show the accuracy lower bound computed on different datasets, varying b and the attacker strength k.
The computational efficiency of the proposed bounds allows us to compute the minimum accuracy for large values of b and large attacker budgets. The figure shows how larger values of b allow the model to sustain a larger attacker strength. Of course, when the attacker becomes too strong compared with the number of relevant features in the dataset, the accuracy of FPF drops. We include in our analysis three datasets generated from MNIST by isolating instances of two digits. The lower bounds allow us to state that a reasonable accuracy can be achieved even when attacking more than 20 features. The weakest dataset is that of digits 5 vs. 6, where clearly the attacker needs to change fewer pixels to generate a misclassification.

Conclusion
This paper proposes FPF, a new algorithm to train forests of decision trees, based on a random equi-partitioning of the feature set, along with projections of the dataset on these partitions before training each single decision tree. The method is proven to be resilient against evasion attacks and, more importantly, we are able to certify in a very efficient way that, given a test dataset, some of the instances cannot be attacked at all, thus avoiding the costly computation of all the possible evasion attacks.
The experimental evaluation, carried out on publicly available datasets, is promising: FPF outperforms the main direct competitor, based on ensembles built on random sampling of the features. Moreover, we also show that our certified lower bounds on the accuracy under attack are a very close approximation of the actual accuracy.