Evaluating the accuracy of a model in the presence of an attacker is a difficult and computationally expensive task. This is due to the possibly large size of Ab(x) and to the interactions among the trees in a forest. Below, we first discuss an expensive brute-force strategy for certifying the accuracy under L0-norm attacks of a tree ensemble trained with FPF on a given test dataset. Then, we show that the existence of an attack against a given dataset instance implies the existence of a solution for the partial set coverage problem [36]. We use this result to devise a strategy that reduces the cost of the brute-force approach and provides an efficient lower-bound certification of the accuracy under attack.
4.1 Brute-force evaluation
Given an instance \(\boldsymbol {x} \in {\mathcal {X}}\), the brute-force evaluation of a forest \({\mathcal {T}}\) consists of generating all the possible perturbations the attacker Ab can operate, so as to determine whether there exists x′∈Ab(x) such that \(\mathcal {T}(\boldsymbol {x}) \neq \mathcal {T}(\boldsymbol {x^{\prime }})\). Specifically, for a given test dataset \({\mathcal {D}}\) and a given tree ensemble \({\mathcal {T}}\), the brute-force evaluation exactly computes the accuracy under attack according to Eq. 1.
The set Ab(x) is infinite, but we can limit the enumeration to the attacks that are relevant for the given forest \({\mathcal {T}}\), i.e., those that can invert the outcome of a test in some internal node of the trees in \({\mathcal {T}}\). This set of relevant attacks, denoted with \(\hat {A}_{b}(\boldsymbol {x}|{\mathcal {T}})\), can be computed as follows.
Recall that the test in an internal node of a tree has the form xf≤v for some threshold v. The thresholds used in the tree nodes thus induce a discretization of the input space \({\mathcal {X}}\), which we exploit as follows. For any given feature \(f\in {\mathcal {F}}\), we define the set \({\mathcal {V}}_{f}\) of relevant thresholds as:
$$\begin{aligned} {\mathcal{V}}_{f} = \{ v ~|~ \exists \sigma({f,v,t_{l},t_{r}}) \in t, t \in {\mathcal{T}}\} \cup \{\infty\} \end{aligned} $$
The set \({\mathcal {V}}_{f}\) includes all the thresholds that are associated with f in any node σ(f,v,tl,tr) of any tree in \({\mathcal {T}}\), plus the infinity value that allows the algorithm to also include the attack that traverses the right branch of the node with the largest threshold.
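To make this construction concrete, the following Python sketch collects \({\mathcal {V}}_{f}\) for every feature of a forest. It is only an illustration: the Node record (one test \(x_{f} \le v\) per internal node, a prediction in {−1,+1} on each leaf) is a hypothetical stand-in for whatever tree representation is actually used.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Internal node: test "x[feature] <= threshold"; leaves carry a prediction.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[int] = None  # set only on leaves, e.g., -1 or +1

def relevant_thresholds(trees):
    """Compute V_f: every threshold used on feature f in any tree, plus infinity."""
    V = defaultdict(set)

    def visit(node):
        if node.prediction is None:  # internal node
            V[node.feature].add(node.threshold)
            visit(node.left)
            visit(node.right)

    for t in trees:
        visit(t)
    for f in V:
        # Infinity lets the attack traverse the right branch of the node
        # with the largest threshold on f.
        V[f].add(math.inf)
    return V
```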
An attacker Ab can perturb any subset of features \(F \subseteq {\mathcal {F}}\) such that |F|≤b, and therefore, the set of relevant perturbations the attacker may operate is described by the Cartesian product \({\mathcal {V}}_{f_{1}} \times \ldots \times {\mathcal {V}}_{f_{b}}\), with fi∈F. We denote by \(\hat {A}_{b}(\boldsymbol {x}|{\mathcal {T}},F)\) the set of relevant attacks on the given set of features F, i.e., each perturbed vector \(\boldsymbol {x^{\prime }} \in \hat {A}_{b}(\boldsymbol {x}|{\mathcal {T}},F)\) satisfies the following:
$$\begin{aligned} x^{\prime}_{f} = \left\{\begin{array}{ll} x_{f} & \text{if}\ f \not\in F,\\ v \in \mathcal{V}_{f} & \text{if}\ f \in F. \end{array}\right. \end{aligned} $$
In conclusion, the set of relevant attacks is:
$$\begin{aligned} \hat{A}_{b}(\boldsymbol{x}|{\mathcal{T}}) = \bigcup\limits_{\substack{F \subseteq {\mathcal{F}}\\ |F| = b}} \hat{A}_{b}(\boldsymbol{x}|{\mathcal{T}},F) \end{aligned} $$
An attacker Ab can successfully perturb an instance \((\boldsymbol {x}, y) \in {\mathcal {D}}\) against a forest \({\mathcal {T}}\) if there exists at least one \(\boldsymbol {x^{\prime }} \in \hat {A}_{b}(\boldsymbol {x}|{\mathcal {T}})\) that induces \({\mathcal {T}}\) to misclassify. We can thus exactly identify the portion of the test dataset that Ab can successfully perturb by using the discretized attacks in \(\hat {A}_{b}(\boldsymbol {x}|{\mathcal {T}})\), i.e., \(\widehat {{\mathcal {D}}} = \left \{(\boldsymbol {x}, y) \in {\mathcal {D}} \setminus \overline {{\mathcal {D}}} \; | \; \exists \boldsymbol {x}^{\prime } \in \hat {A}_{b}(\boldsymbol {x}|{\mathcal {T}}), \ y \cdot {\mathcal {T}}(\boldsymbol {x}^{\prime }) < 0\right \}\), where \(\overline {{\mathcal {D}}}\) includes the test instances misclassified by \({\mathcal {T}}\) in the absence of attack. Finally, we can exactly compute the accuracy under attack according to Eq. 1. This brute-force approach is very expensive, due to three factors: (i) as b increases, the number of possible feature combinations \(F \subseteq {\mathcal {F}}\) with |F|=b increases; (ii) as the number of trees and nodes grows, the number of threshold values associated with each feature increases; and (iii) for each perturbed instance x′, the prediction \(\mathcal {T}(\boldsymbol {x^{\prime }})\) must be computed by traversing the given forest.
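A brute-force certifier following this recipe may be sketched as below, reusing the hypothetical Node representation and relevant_thresholds helper from the previous snippet; labels are assumed in {−1,+1}, so a misclassification satisfies y·T(x′)<0. The three cost factors (i)-(iii) are visible directly as the three nested steps of the code.

```python
from itertools import combinations, product

def predict_tree(node, x):
    """Route x to a leaf of the hypothetical Node tree and return its prediction."""
    while node.prediction is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction

def predict_forest(trees, x):
    """Majority vote of a binary forest with per-tree outputs in {-1, +1}."""
    return 1 if sum(predict_tree(t, x) for t in trees) > 0 else -1

def is_attackable(trees, x, y, b, V):
    """Brute-force check: does some relevant attack on b features fool the forest?"""
    features = list(V.keys())
    k = min(b, len(features))
    for F in combinations(features, k):               # factor (i): feature subsets
        for values in product(*(V[f] for f in F)):    # factor (ii): threshold choices
            x_adv = list(x)
            for f, v in zip(F, values):
                x_adv[f] = v
            if y * predict_forest(trees, x_adv) < 0:  # factor (iii): forest traversal
                return True
    return False
```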
4.2 Attacking forest \({\mathcal {T}}\) as a partial set coverage problem
We now introduce some simplifying worst-case assumptions and then show that the existence of an effective attack implies the existence of a solution for the partial set coverage problem. First, we assume that if a tree in \({\mathcal {T}}\) provides a wrong prediction before the attack, then its prediction will be incorrect also after the attack. Second, we assume that if a tree uses a feature f for its prediction over x, then attacking f causes the tree to generate a wrong prediction. Note that these assumptions are pessimistic from the point of view of a defender: modifying a feature f does not necessarily flip the test performed on every node using that feature and, even if it did, tests over other features may suffice to avoid a wrong prediction.
Given an instance \((\boldsymbol {x}, y) \in {\mathcal {D}}\) and a forest \({\mathcal {T}}\), let C be the set of all correct trees \(t \in {\mathcal {T}}\) over x, i.e., \(C = \{t \in {\mathcal {T}} ~|~ t(\boldsymbol {x}) \cdot y > 0\}\). Let \(\overline {C} = {\mathcal {T}} \setminus C\) be the set of all trees providing a wrong prediction over x. The goal of the attacker is to force a sufficient number of trees to misclassify x such that the majority of trees are incorrect. The minimum number of trees the attacker must fool is δ such that \(|\overline {C}|+\delta = \lceil |{\mathcal {T}}|/2 \rceil \). It turns out that in a robust forest of r(2b+1) trees, trained with FPF, where the attacker can affect at most rb trees, it is impossible for the attacker to fool the forest if \(|\overline {C}|<\lceil r/2 \rceil \). For instance, with r=3 and b=1, the forest has 9 trees and the attacker can corrupt at most rb=3 of them; if at most one tree errs before the attack (\(|\overline {C}| < \lceil 3/2 \rceil = 2\)), then at most 1+3=4 trees can be wrong after the attack, short of the \(\lceil 9/2 \rceil = 5\) needed. This means that a forest can be robust even if some of its trees are not correct in the absence of an attacker.
Let Sf⊆C be the set of all the correct trees that use feature f, and let \(\Sigma = \{S_{f}\}_{f\in {\mathcal {F}}}\) be the collection of all Sf. In order for Ab to successfully attack \({\mathcal {T}}\) over x, there must exist a subset S∗⊆Σ, with |S∗|≤b since Ab can perturb at most b features, such that \(|\overline {C}| + |\bigcup _{S_{f} \in S^{*}} S_{f}| \ge \lceil |{\mathcal {T}}|/2 \rceil \), or, equivalently, such that \(|\bigcup _{S_{f} \in S^{*}} S_{f}| \ge \delta \) with \(\delta = \lceil |{\mathcal {T}}|/2 \rceil - |\overline {C}|\). The thoughtful reader has surely recognized that this formulation of our problem is nothing else than an instance of the partial set coverage problem, where, given the set of trees C and the collection Σ⊆2C, we have to select up to b sets in Σ that cover at least δ trees.
Before tackling the partial set coverage problem, we make a few improvements that provide a stricter definition of the sets Sf in our scenario. First, we note that a tree may include a feature f in some of its nodes, yet these nodes may never be traversed during the evaluation of an instance x. Therefore, we say that a correct tree t belongs to Sf for an instance x only if the traversal path of x in t includes a node with a test on feature f. This already reduces the size of each Sf.
Then, among the nodes along the traversal path of instance x before the attack, we can further distinguish between nodes where the test xf≤v is true, and nodes where the test is false. In the former case, the attacker must increase the value of xf to affect the traversal path, while in the latter case the attacker must decrease xf. Clearly, these two attacks cannot coexist.
Therefore, we define sets \(S_{f}^{+}\) and \(S_{f}^{-}\) as follows. Given a correct tree t∈C, we include t in \(S_{f}^{+}\) if the traversal path of x in t includes a node with a test xf≤v on feature f and this test gives a true outcome. Otherwise, if the outcome of this test is false, we include t in \(S_{f}^{-}\). This method allows us to achieve a more accurate modeling of when an attack can actually affect the final prediction. This also reduces the size of sets in Σ and decreases the risk of overestimating the effect of an attack. We can finally state the relation with the partial set cover problem as follows.
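Under the same hypothetical representation used above, the sets \(S_{f}^{+}\) and \(S_{f}^{-}\) can be built with a single traversal of each correct tree, for example as follows (this reuses the imports and the predict_tree helper of the earlier sketches; trees are identified by their index in the forest):

```python
def coverage_sets(trees, x, y):
    """Build S_f^+ / S_f^- (as sets of tree indices) along the traversal path of x."""
    S_plus, S_minus = defaultdict(set), defaultdict(set)
    for i, t in enumerate(trees):
        if predict_tree(t, x) * y <= 0:
            continue                          # only correct trees enter Sigma
        node = t
        while node.prediction is None:
            if x[node.feature] <= node.threshold:
                S_plus[node.feature].add(i)   # flipping this test requires increasing x_f
                node = node.left
            else:
                S_minus[node.feature].add(i)  # flipping this test requires decreasing x_f
                node = node.right
    return S_plus, S_minus
```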
Proposition 2
(Partial set coverage as a necessary condition for successful attacks.) Given \((\boldsymbol {x}, y) \in {\mathcal {D}}\), where \({\mathcal {T}}(\boldsymbol {x}) \cdot y > 0\), a necessary condition for the existence of a successful attack x′∈Ab(x) such that \({\mathcal {T}}(\boldsymbol {x}^{\prime }) \cdot y < 0\), is that there exists a solution for the partial set coverage problem, stated as follows:
Given the set system (C,Σ), where C is the finite set of correct trees for x and Σ⊆2C with \(\Sigma = \{S_{f}^{+}\}_{f\in {\mathcal {F}}} \cup \{S_{f}^{-}\}_{f\in {\mathcal {F}}}\), and given an integer b and a constant \(\delta = \lceil |{\mathcal {T}}|/2 \rceil - |\overline {C}|\), the goal is to find a sub-collection S∗⊆Σ, where \(|\bigcup _{S \in S^{*}} S| \ge \delta \), with the constraints that |S∗|≤b and, \(\forall f \in {\mathcal {F}}\), if \(S_{f}^{+} \in S^{*}\) (\(S_{f}^{-} \in S^{*}\)) then \(S_{f}^{-} \not \in S^{*}\) (\(S_{f}^{+} \not \in S^{*}\)).
Proof
We show that if there exists x′∈Ab(x) such that \({\mathcal {T}}(\boldsymbol {x}^{\prime }) \cdot y < 0\), then there exists S∗⊆Σ, where \(|\bigcup _{S \in S^{*}} S| \ge \delta, |S^{*}| \le b\), and \(S_{f}^{+}\) and \(S_{f}^{-}\) are mutually exclusive in S∗. Given x′∈Ab(x), for any attacked feature f we include in S∗ the set \(S_{f}^{+}\) if \(x^{\prime }_{f}-x_{f}>0\) (corrupted by increment) or the set \(S_{f}^{-}\) if \(x^{\prime }_{f}-x_{f}<0\) (corrupted by decrement). Clearly, it holds that |S∗|≤b, and the sets \(S_{f}^{+}\) and \(S_{f}^{-}\) are mutually exclusive in S∗. Let \(\overline {C^{\prime }}\) be the set of (formerly correct) trees corrupted by the successful attack x′; then it holds that \(|\overline {C^{\prime }}| \geq \delta \). By construction, any tree \(t\in \overline {C^{\prime }}\) belongs to some set \(S_{f}^{+}\) or \(S_{f}^{-}\) included in S∗. Therefore, it holds that \(|\overline {C^{\prime }}| \leq |\bigcup _{S \in S^{*}} S|\), which implies \(|\bigcup _{S \in S^{*}} S| \geq \delta \). □
Note that Proposition 2 states that the existence of a solution S∗ for our partial set cover problem is only a necessary (not sufficient) condition for the attack. Thus, if S∗ exists, we cannot say that \({\mathcal {T}}\) can be fooled for sure, as the attacker might modify all the features identified by the cover without being able to affect the final forest prediction. However, we know that if a solution S∗ does not exist, then \({\mathcal {T}}\) is robust on the given instance x.
In the following, we use this result to compute an upper bound of the size of \(\widehat {{\mathcal {D}}}\), the set of all instances in the test dataset that can be attacked. On the one hand, this allows us to lower bound the accuracy under attack \(ACC_{A_{b}}\) (see Eq. 1); on the other hand, this result makes it possible to speed up the exact computation of \(ACC_{A_{b}}\) by employing the brute-force approach only for those instances for which a sufficiently large set cover exists.
4.3 Fast accuracy lower bound
The method discussed in this section computes a cheap overestimate of the number of correct trees that any partial set cover S∗⊆Σ, for the problem stated in Proposition 2, could corrupt.
If we consider the b largest sets in Σ, the cardinality of the union of the sets within S∗ is clearly smaller than or equal to the sum of the cardinalities of those b largest sets (a direct consequence of the inclusion-exclusion principle). We improve this trivial upper bound by exploiting the fact that the two sets \(S_{f}^{+}\) and \(S_{f}^{-}\) cannot be included together in a potential cover.
We thus define the fast lower bound set SFLB as the set of the b largest sets in Σ after enforcing the constraint that, for a given feature f, only the larger of \(S_{f}^{+}\) and \(S_{f}^{-}\) is considered.
We can conclude that if \(\sum _{S\in S_{{FLB}}} |S| < \delta \), then a suitable partial cover cannot exist and therefore the forest \({\mathcal {T}}\) cannot be attacked on x.
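This per-instance test translates directly into a short sketch; the helper names and the tree-index representation are ours, following the earlier snippets, and we assume the instance is correctly classified, so that δ≥1.

```python
def flb_is_robust(S_plus, S_minus, n_trees, n_wrong, b):
    """Fast lower bound: True means the instance is certifiably robust."""
    delta = (n_trees + 1) // 2 - n_wrong   # ceil(|T|/2) - |C_bar|
    features = set(S_plus) | set(S_minus)
    # For each feature keep only the larger of S_f^+ and S_f^-, then take the
    # b largest cardinalities: their sum bounds the union of any feasible cover.
    best = sorted((max(len(S_plus.get(f, ())), len(S_minus.get(f, ())))
                   for f in features), reverse=True)
    return sum(best[:b]) < delta
```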
Therefore, we define the set \(\widehat {{\mathcal {D}}}_{{FLB}}\) of attackable instances according to the fast lower bound method as follows. For each correctly classified instance \((\boldsymbol {x},y) \in {\mathcal {D}}\), we build the partial coverage problem according to Proposition 2, and if \(\sum _{S\in S_{{FLB}}} |S| \geq \delta \), then we include the instance (x,y) in \(\widehat {{\mathcal {D}}}_{{FLB}}\). Since it holds that \(|\widehat {{\mathcal {D}}}_{{FLB}}|\geq |\widehat {{\mathcal {D}}}|\), we define the fast lower bound accuracy as:
$$\begin{aligned} ACC_{A_{b}}^{FLB} = 1- \frac{|\overline{{\mathcal{D}}}| + |\widehat{{\mathcal{D}}}_{{FLB}}| }{|{\mathcal{D}}|} \leq ACC_{A_{b}}. \end{aligned} $$
4.4 Exhaustive accuracy lower bound
In order to improve over the fast lower bound, we also consider a more expensive option, where all the possible covers are considered, still respecting the constraint that any \(S_{f}^{+}\) and \(S_{f}^{-}\) are mutually exclusive. We evaluate all the possible covers \(S^{\dagger } \subseteq \Sigma \) with |S†|≤b, and we call exhaustive lower bound cover, denoted with SELB, the first cover found such that \(|\bigcup _{A \in S_{{ELB}}} A| \ge \delta \).
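The per-instance search may be sketched as follows (same assumptions and helpers as before). Note that, unlike FLB, both directions \(S_{f}^{+}\) and \(S_{f}^{-}\) of each feature must be tried: for the cardinality of a union, the smaller set may still be preferable if it overlaps less with the other selected sets.

```python
from itertools import combinations

def elb_cover_exists(S_plus, S_minus, n_trees, n_wrong, b):
    """Exhaustive search for a cover of at most b sets corrupting delta trees."""
    delta = (n_trees + 1) // 2 - n_wrong
    candidates = ([(f, s) for f, s in S_plus.items()] +
                  [(f, s) for f, s in S_minus.items()])
    for k in range(1, min(b, len(candidates)) + 1):
        for combo in combinations(candidates, k):
            feats = [f for f, _ in combo]
            if len(set(feats)) < k:
                continue                  # S_f^+ and S_f^- are mutually exclusive
            if len(set().union(*(s for _, s in combo))) >= delta:
                return True               # a cover exists: the attack *may* succeed
    return False                          # no cover: x is certifiably robust
```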
By applying the same procedure to every correctly classified instance (x,y) in the dataset, we identify the set of instances \(\widehat {{\mathcal {D}}}_{{ELB}}\) for which there exists an exhaustive lower bound cover SELB that solves the problem in Proposition 2.
Note that \(|\widehat {{\mathcal {D}}}| \le |\widehat {{\mathcal {D}}}_{{ELB}}| \le |\widehat {{\mathcal {D}}}_{{FLB}}|\), and thus, we use this method to compute another lower bound for the accuracy of \({\mathcal {T}}\) on the test dataset \({\mathcal {D}}\):
$$\begin{aligned} ACC_{A_{b}}^{ELB} = 1 - \frac{|\overline{{\mathcal{D}}}| + |\widehat{{\mathcal{D}}}_{{ELB}}|}{|{\mathcal{D}}|} \leq ACC_{A_{b}}, \end{aligned} $$
where the following relationship trivially holds:
$$\begin{aligned} ACC_{A_{b}}^{FLB} \leq ACC_{A_{b}}^{ELB} \leq ACC_{A_{b}}. \end{aligned} $$
This exhaustive lower bound search incurs the exponential cost of enumerating the possible covers in 2Σ, but it still improves over the brute-force attack: thanks to the cover-based formulation, it does not need to enumerate the relevant threshold values \({\mathcal {V}}_{f}\) to which each feature can be attacked.
We recall that while it is true that \(|\widehat {{\mathcal {D}}}| \le |\widehat {{\mathcal {D}}}_{{ELB}}| \le |\widehat {{\mathcal {D}}}_{{FLB}}|\), we cannot claim that \(\widehat {{\mathcal {D}}} \subseteq \widehat {{\mathcal {D}}}_{{ELB}} \subseteq \widehat {{\mathcal {D}}}_{{FLB}}\). The above bounds can prove the non-existence of an actual cover, but they may not be used to find a successful attack strategy.
4.5 Cascading evaluation
Above, we presented two algorithms, FLB and ELB, that efficiently compute an overapproximation of the cover of trees an attacker can corrupt, which allows us to upper bound the impact of the most harmful attack. The two strategies have different costs: FLB only requires sorting the candidate sets of the cover, while ELB performs an exhaustive search over all the possible subsets of Σ. Both methods are however much cheaper than the brute-force evaluation.
When the lower bound information is not considered sufficient, we propose the following CASCADING strategy to compute the actual accuracy under attack \(ACC_{A_{b}}\).
Given an instance x and an attacker Ab, we build the collection of sets of trees \(\Sigma = \{S_{f}^{+}\}_{f\in {\mathcal {F}}} \cup \{S_{f}^{-}\}_{f\in {\mathcal {F}}}\) and proceed as follows (a sketch of the full cascade is given after the list):
1. Compute SFLB⊆Σ: if \(\sum _{S \in S_{{FLB}}} |S| < \delta \), then no sufficiently large set cover exists, and therefore, the instance x cannot be attacked; otherwise
2. Search for a suitable cover SELB⊆Σ: if there is no SELB such that \(|\bigcup _{S \in S_{{ELB}}} S| \ge \delta \), then the instance x cannot be attacked; otherwise
3. Use the brute-force method to check the existence of a successful attack on x.
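Tying together the previous sketches, the whole cascade for a single instance might read as follows; all helper names are from our illustrative snippets, not from the original implementation.

```python
def certify_instance(trees, x, y, b, V):
    """Cascading evaluation: FLB, then ELB, then brute force only when needed.
    Returns True if a successful attack on x exists, False if x is robust."""
    S_plus, S_minus = coverage_sets(trees, x, y)
    n_trees = len(trees)
    n_wrong = sum(1 for t in trees if predict_tree(t, x) * y <= 0)
    if flb_is_robust(S_plus, S_minus, n_trees, n_wrong, b):
        return False                                  # step 1: cheap certificate
    if not elb_cover_exists(S_plus, S_minus, n_trees, n_wrong, b):
        return False                                  # step 2: no suitable cover
    return is_attackable(trees, x, y, b, V)           # step 3: brute force
```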
Experimental results show that the above cascading strategy greatly reduces the number of instances in a given dataset \({\mathcal {D}}\) for which the brute-force approach is required.
4.6 Non-binary classification
While we leave to future work the design and evaluation of an algorithm for non-binary classification, we highlight that the proposed methodology can be easily generalized to a multi-class scenario, and we sketch below a basic certification methodology. The algorithms proposed so far aim at certifying that the attacker cannot corrupt a number of correct trees δ such that \(\delta \ge \lceil |{\mathcal {T}}|/2 \rceil - |\overline {C}|\), where \(\overline {C}\) is the set of trees wrongly classifying the given instance. For a multi-class classification problem with classes \({\mathcal {Y}}\), \(|{\mathcal {Y}}|>2\), it is possible to verify robustness by running \(|{\mathcal {Y}}|-1\) certifications analogous to the binary case. Let us denote with Cc the set of trees classifying, before the attack, the given instance (x,y) as class c, \(c \in {\mathcal {Y}}\), so that Cy is the set of trees predicting the correct label y. For a given class \(c\in {\mathcal {Y}}\), the attacker aims at corrupting the trees in \(\bigcup _{i\neq c} C_{i}\) so as to make c the new majority class. The best-case scenario from the point of view of the attacker consists in modifying the predictions of the trees in Cy, as in this case it is sufficient to attack δ=⌈(|Cy|−|Cc|)/2⌉ trees. If the attacker is not able to alter at least δ trees of the forest, then no successful attack towards class c is possible. To this end, we can exploit any of the set-cover-based techniques proposed so far to verify the absence of any cover corrupting at least δ trees among \(\bigcup _{i\neq c} C_{i}\). Such verification is to be repeated for each class \(c \in {\mathcal {Y}} \setminus \{y\}\).
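As a minimal illustration of the budget computation only (the full multi-class certification is left to future work above), one might compute, for every candidate class c, the number δ of trees the attacker must flip in the best case; the function name and input format are ours, assuming the per-tree predicted classes are available as a list of labels.

```python
import math
from collections import Counter

def multiclass_deltas(per_tree_classes, y):
    """delta_c = ceil((|C_y| - |C_c|) / 2) for every candidate class c != y,
    assuming the attacker flips trees that currently vote for the true class y."""
    counts = Counter(per_tree_classes)  # per-tree predicted classes before the attack
    # Classes receiving no votes are omitted here; for them |C_c| = 0.
    return {c: math.ceil((counts[y] - counts[c]) / 2)
            for c in counts if c != y}
```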