In this section, we present the overall fuzzing algorithm. Our approach uses stochastic processes (i.e., Lévy flights as introduced in Section 3) in the input space to generate test cases. To steer the diffusivity of test case generation, we provide feedback regarding the quality of test cases (as defined in Section 4) to the test generation process in order to yield self-adaptive fuzzing.
We begin with an example of the interplay between input space coverage and execution path coverage to motivate our fuzzing algorithm. Consider a program which processes inputs from an input space \(\mathcal {I}\). Our aim is to generate a subset \(\mathcal {I}' \subset \mathcal {I}\) of test cases (in a finite amount of time) that yields the maximal possible execution path coverage when processed by the target program. Further assume that the program reveals deep execution paths (covering long sequences of basic blocks) only for 3% of the inputs in \(\mathcal {I}\), i.e., 97% of inputs are inappropriate test cases for fuzzing. Since we initially cannot predict which test cases are of high quality (determined by, e.g., the execution path length or the number of different executed basic blocks), one strategy to reach good code coverage would be black-box fuzzing, i.e., randomly generating test cases within \(\mathcal {I}\) in the hope that we eventually hit some of the 3% high-quality inputs. We could realize such a search through input space with highly diffusive stochastic processes, i.e., Lévy flights as presented in Section 3.
As mentioned above, the Lévy flight hypothesis predicts an effective search through input space owing to the diffusivity properties of Lévy flights. On the one hand, this diffusivity guarantees that we reach the 3% with very high probability. On the other hand, once we have reached input regions within the 3% of high-quality test cases, the same diffusivity also guarantees that we will leave them very quickly. This is why we need to adapt the diffusivity of the stochastic process according to the quality of the currently generated test cases. If the currently generated test cases reveal high path coverage, the Lévy flight should be localized in the sense that it reduces its diffusivity to explore nearby inputs. In turn, if the currently generated test cases reveal only little coverage, diffusivity should increase in order to widen the search for more suitable input regions. By instrumenting the binary under test and applying the quality evaluation of test cases introduced in Section 4, we are able to feed back coverage information of currently explored input regions to the test case generation algorithm. In the following, we construct a self-adaptive fuzzing strategy that automatically expands its search when reaching low-quality input regions and focuses exploration when the feedback indicates good code coverage.
Initial seed
We start with an initial non-empty set of input seeds \(X_{0} \subset \mathcal {I}\). As described in Section 3, we assume the elements \(x \in X_{0}\) to be bit strings of length N and divide each of them into n segments of size \(m=\frac {N}{n}\) (assuming without loss of generality that N is a multiple of n). Practically, the input seeds \(X_{0}\) can be arbitrary files provided manually by the tester; they may not even be valid with regard to the input format of the program under test. We further set two initial diffusive parameters \(0<\alpha_{1},\alpha_{2}<2\) and an initial offset \(q_{0} \in \{1,\ldots,n\}\).
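The segmentation of a seed can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `split_segments` and `join_segments` are hypothetical, seeds are assumed to be byte strings (so N is a multiple of 8), and segment values are stored as integers in the range 0 to \(2^{m}-1\) rather than 1 to \(2^{m}\).

```python
def split_segments(x: bytes, n: int) -> list[int]:
    """Split the bit string x (N = 8 * len(x) bits) into n segments of m = N / n bits."""
    N = 8 * len(x)
    if n <= 0 or N % n != 0:
        raise ValueError("N must be a positive multiple of n")
    m = N // n
    value = int.from_bytes(x, "big")
    mask = (1 << m) - 1
    # Offsets q = 1, ..., n are counted from the most significant segment.
    return [(value >> (N - q * m)) & mask for q in range(1, n + 1)]


def join_segments(segments: list[int], m: int) -> bytes:
    """Inverse of split_segments: concatenate m-bit segment values into bytes."""
    value = 0
    for s in segments:
        value = (value << m) | s
    return value.to_bytes(len(segments) * m // 8, "big")
```

For example, a two-byte seed split into n = 4 segments gives m = 4, so every segment is one hexadecimal digit of the seed.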
Test case generation
The test case generation step takes as input a test case \(x_{0}\), diffusion parameters \(\alpha_{1}\) and \(\alpha_{2}\), an offset \(q_{0} \in \{1,\ldots,n\}\), and the number \(k_{\text {gen}}\in \mathbb {N}\) of test cases to be generated. It outputs a set \(X_{\text {gen}} \subset \mathcal {I}\) of \(k_{\text {gen}}\) new test cases.
As introduced in Section 3, we refer to the offset space as \(\mathcal {O}=\{1,\ldots,n\}\) and to the segment space as \(\mathcal {S}=\{1,\ldots,2^{m}\}\). We denote with \(x_{0}(q_{0})\) the segment value of input \(x_{0}\) at offset \(q_{0}\). For the Lévy flights
$$ \begin{aligned} {L^{1}_{t}}:\ \Omega_{1} \rightarrow \ \mathcal{O} \end{aligned} $$
(15)
in the offsets \(\mathcal {O}\) and
$$ \begin{aligned} {L^{2}_{t}}:\ \Omega_{2} \rightarrow \ \mathcal{S} \end{aligned} $$
(16)
in \(\mathcal {S}\) with flight lengths l distributed according to the power law
$$ \begin{aligned} p_{j}(l) \sim |l|^{-1-\alpha_{j}},\ j=1,2, \end{aligned} $$
(17)
we set the initial conditions
$$\begin{array}{*{20}l} {L^{1}_{0}}&=q_{0} \quad \text{and} \end{array} $$
(18)
$$\begin{array}{*{20}l} {L^{2}_{0}}&=x_{0}(q_{0}), \end{array} $$
(19)
respectively. Let \(R(x_{0},q_{0},s_{0})\) denote the bit string generated by replacing the value \(x_{0}(q_{0})\) of bit string \(x_{0}\) at offset \(q_{0}\) by a new value \(s_{0}\). Both stochastic processes \(({L^{1}_{t}})_{t \in \mathbb {N}}\) and \(({L^{2}_{t}})_{t \in \mathbb {N}}\) are then simulated for \(k_{\text {gen}}\) steps to generate the \(k_{\text {gen}}\) new test cases
$$ \begin{aligned} x_{1}:=R\left(x_{0}, {L^{1}_{0}}, {L^{2}_{1}} \right) \end{aligned} $$
(20)
$$ \begin{aligned} &x_{2}:=R\left(x_{1}, {L^{1}_{1}}, {L^{2}_{2}} \right)\\ &\ \ \ \ \ \ \ \ldots \\ \end{aligned} $$
(21)
$$ \begin{aligned} &x_{t+1}:=R\left(x_{t}, {L^{1}_{t}}, L^{2}_{t+1} \right) \\ &\ \ \ \ \ \ \ \ldots \\ \end{aligned} $$
(22)
$$ \begin{aligned} &x_{k_{\text{gen}}}:=R\left(x_{k_{\text{gen}}-1}, L^{1}_{k_{\text{gen}}-1}, L^{2}_{k_{\text{gen}}} \right). \end{aligned} $$
(23)
For simplicity of notation in this definition, we identify the values \({L^{j}_{t}}\) with their respective binary representations (as bit strings). In words, we start with the initial test case \(x_{0}\) and replace its segment content at offset \({L^{1}_{0}}=q_{0}\) with the new value \({L^{2}_{1}}\), which is the value in segment space \(\mathcal {S}=\{1,\ldots,2^{m}\}\) that we obtain by taking a first random step with the Lévy flight \(({L^{2}_{t}})_{t \in \mathbb {N}}\). This yields \(x_{1}\). We obtain the next test case \(x_{2}\) by taking the just generated \(x_{1}\), setting the offset according to \(({L^{1}_{t}})_{t \in \mathbb {N}}\), and then replacing the content of the segment indicated by this offset by a new segment value chosen by \(({L^{2}_{t}})_{t \in \mathbb {N}}\). We proceed with this algorithm until the set
$$\begin{array}{*{20}l} X_{\text{gen}}:=\{x_{1},\ldots,x_{k_{\text{gen}}}\} \end{array} $$
(24)
of \(k_{\text {gen}}\) new test cases is generated.
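The generation step of Eqs. (20) to (23) can be sketched as follows. This is a hedged illustration under several assumptions that are not taken from the text: the names `levy_step` and `generate` are hypothetical, flight lengths are drawn by inverse-transform sampling of a Pareto-type tail (one of several ways to obtain the power law (17)), positions wrap around modulo the size of the respective space, offsets are 0-based, and segment values lie in the range 0 to \(2^{m}-1\).

```python
import math
import random


def levy_step(rng: random.Random, alpha: float, max_len: int) -> int:
    """Signed integer flight length l with P(|l| >= r) ~ r^(-alpha), capped at max_len."""
    u = 1.0 - rng.random()                 # uniform in (0, 1]
    log_len = -math.log(u) / alpha         # log of the Pareto-type length u^(-1/alpha)
    if log_len >= math.log(max_len):       # cap to avoid overflow for small alpha
        length = max_len
    else:
        length = max(1, int(math.exp(log_len)))
    return length if rng.random() < 0.5 else -length


def generate(x0, q0, alpha1, alpha2, k_gen, m, rng):
    """Return (X_gen, last offset) for segment list x0 and 0-based start offset q0."""
    n, size = len(x0), 1 << m
    x, q = list(x0), q0
    s = x[q]                                             # L^2_0 = x0(q0)
    x_gen = []
    for _ in range(k_gen):
        s = (s + levy_step(rng, alpha2, size)) % size    # L^2_{t+1}
        x = list(x)
        x[q] = s                                         # x_{t+1} = R(x_t, L^1_t, L^2_{t+1})
        x_gen.append(x)
        q = (q + levy_step(rng, alpha1, n)) % n          # L^1_{t+1}
    return x_gen, q
```

Each generated test case differs from its predecessor in at most one segment, and small values of `alpha1` or `alpha2` make long jumps in offset or segment space more frequent.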
Quality evaluation
The quality evaluation step takes as input two sets of test cases \(X_{\text {gen}}, \mathcal {I}' \subset \mathcal {I}\) and outputs a quality rating \(\tilde {E}(X_{\text {gen}}, \mathcal {I}')\) of \(X_{\text {gen}}\) with respect to \(\mathcal {I}'\). We already defined the number \(E(x_{0}, \mathcal {I}')\) of newly discovered basic blocks for a single test case \(x_{0}\) with respect to a given subset \(\mathcal {I}' \subset \mathcal {I}\) in Eq. (14). To generalize this definition to a quality rating \(\tilde {E}(X_{\text {gen}}, \mathcal {I}')\) of a set of test cases \(X_{\text {gen}}\) (with respect to \(\mathcal {I}'\)), we define the mean
$$\begin{array}{*{20}l} \tilde{E}(X_{\text{gen}}, \mathcal{I}') := |X_{\text{gen}}|^{-1} \sum_{x \in X_{\text{gen}}} E(x, \mathcal{I}'). \end{array} $$
(25)
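The mean in Eq. (25) can be sketched as follows. The sketch assumes that the coverage of each test case is available as a set of basic block identifiers and that \(E(x, \mathcal {I}')\) counts the blocks of x that no test case in \(\mathcal {I}'\) has executed; the function name `quality` and this set-based representation are illustrative assumptions.

```python
def quality(coverage_gen: list[set[int]], coverage_known: list[set[int]]) -> float:
    """Mean number of newly discovered basic blocks per generated test case.

    coverage_gen:   one set of basic block ids per test case in X_gen
    coverage_known: one set of basic block ids per test case in I'
    """
    if not coverage_gen:
        raise ValueError("X_gen must not be empty")
    known = set().union(*coverage_known)            # blocks already seen by I'
    return sum(len(blocks - known) for blocks in coverage_gen) / len(coverage_gen)
```

For instance, if one generated test case discovers three new blocks and a second one discovers none, the rating of the batch is 1.5.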
Adaptation of diffusivity
The diffusivity adaptation step takes as input a quality rating \(\tilde {E}(X_{\text {gen}}, \mathcal {I}') \in \mathbb {R}_{\geq 0}\) and two parameters \(b_{1},b_{2} \in \mathbb {R}^{+}\) (controlling the switching behavior from sub-diffusion to super-diffusion), and outputs two adapted parameters \(0<\alpha_{1},\alpha_{2}<2\), which according to the power law (17) regulate the diffusivity of the Lévy flights \(({L^{1}_{t}})_{t \in \mathbb {N}}\) and \(({L^{2}_{t}})_{t \in \mathbb {N}}\).
Our aim (as motivated at the beginning of this section) is to adapt the diffusion parameters in such a way that the algorithm automatically focuses its search (by decreasing diffusivity of the generating Lévy flights) when generating high-quality (i.e., high coverage) test cases and in turn automatically widens its search (by increasing diffusivity) in the case of low-quality (i.e., low coverage) test cases. As discussed in Section 3, we can control diffusivity by setting suitable values of \(\alpha_{1}\) and \(\alpha_{2}\). Smaller diffusivity parameters result in frequent long flights and super-diffusion, whereas larger parameters yield frequent small steps and sub-diffusion. To achieve this, we select a monotonically increasing function \(f: \mathbb {R} \rightarrow (0,2)\) with f(0)≤ε (for ε>0 sufficiently small) and \({\lim }_{t \to \infty } f(t)= 2\). Any such function will provide self-adaptation of the diffusivity of the Lévy flights, and we simply choose two functions
$$\begin{array}{*{20}l} f_{i}(t):=\frac{2}{1+e^{b_{i}-t}},\ i=1,2, \end{array} $$
(26)
where \(b_{i} \in \mathbb {R}^{+}\) are fixed parameters that determine at which point within the quality rating spectrum (i.e., at which mean number of newly discovered basic blocks) the search behavior of \(({L^{1}_{t}})_{t \in \mathbb {N}}\) and \(({L^{2}_{t}})_{t \in \mathbb {N}}\) switches from sub-diffusion to super-diffusion. With these functions, we adapt diffusivity to
$$\begin{array}{*{20}l} \alpha_{i} = f_{i}\left(\tilde{E}(X_{\text{gen}}, \mathcal{I}')\right),\ i=1,2. \end{array} $$
(27)
The next iteration of test case generation is then executed with adapted Lévy flights.
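The sigmoid of Eq. (26) is straightforward to evaluate; the following sketch uses the hypothetical name `adapt_alpha`. The clamping to \([\varepsilon, 2-\varepsilon]\) is an addition for floating-point arithmetic and not part of the definition: for large quality ratings the exponential term underflows and the unclamped result would round to exactly 2, which lies outside the admissible interval. The sketch also assumes moderate values of b, since `math.exp` overflows for arguments above roughly 709.

```python
import math


def adapt_alpha(quality: float, b: float, eps: float = 1e-6) -> float:
    """Evaluate f_i(t) = 2 / (1 + exp(b_i - t)) at t = quality, kept inside (0, 2)."""
    alpha = 2.0 / (1.0 + math.exp(b - quality))
    # Clamp so that rounding never produces alpha = 0 or alpha = 2.
    return min(max(alpha, eps), 2.0 - eps)
```

At a quality rating equal to b, the result is exactly 1, the midpoint of the interval (0, 2). Ratings well below b push the parameter towards 0 (long flights), and ratings well above b push it towards 2 (short steps).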
Test case update
This step takes as input two sets of test cases \(X_{\text {old}}, X_{\text {gen}} \subset \mathcal {I}\) and outputs an updated set of test cases \(X_{\text {new}}\). During the fuzzing process, we generate a steady stream of new test cases which we directly evaluate with respect to the set of previously generated inputs (as discussed in the quality evaluation step). However, if we archived every single test case and, for each generation step, evaluated the \(k_{\text {gen}}\) currently generated new test cases against the whole history of previously generated test cases, fuzzing speed would decrease steadily with increasing duration of the fuzzing campaign. Therefore, we define an upper bound \(k_{\text {max}} \in \mathbb {N}\) on the total number of test cases that we keep for quality evaluation of new test cases. Small values of \(k_{\text {max}}\) may cause the Lévy flights \(({L^{1}_{t}})_{t \in \mathbb {N}}\) and \(({L^{2}_{t}})_{t \in \mathbb {N}}\) to revisit already explored input regions without being adapted (by decreasing the parameters \(\alpha_{i}\)) to perform super-diffusion and widen their search behavior. However, this causes no problem due to the Lévy flight hypothesis (discussed in Section 1).
The update of \(X_{\text {old}}\) with \(X_{\text {gen}}\) simply follows a first in, first out strategy. Initially, if \(|X_{\text {old}}|+|X_{\text {gen}}|<k_{\text {max}}\), we append all newly generated test cases so that \(X_{\text {new}}=X_{\text {old}} \cup X_{\text {gen}}\). Otherwise, we first delete the oldest \(k_{\text {old}}\) entries in \(X_{\text {old}}\), where
$$\begin{array}{*{20}l} k_{\text{old}}=|X_{\text{old}}|+|X_{\text{gen}}|-k_{\text{max}}, \end{array} $$
(28)
and then take the union.
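The first in, first out update can be sketched as follows, reading the size of the freshly generated batch as \(|X_{\text {gen}}|\). The function name `update` and the use of Python lists ordered from oldest to newest are assumptions for illustration; the sketch further assumes \(k_{\text {gen}} \leq k_{\text {max}}\), since otherwise a single batch would already exceed the bound.

```python
def update(x_old: list, x_gen: list, k_max: int) -> list:
    """Return X_new: drop the oldest entries of X_old if needed, then append X_gen."""
    k_old = len(x_old) + len(x_gen) - k_max     # number of entries to discard, Eq. (28)
    if k_old > 0:
        x_old = x_old[k_old:]                   # the oldest entries sit at the front
    return x_old + x_gen
```

With \(k_{\text {max}}=4\), three stored test cases, and two new ones, exactly one old entry is discarded and the result again holds four test cases.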
Joining the pieces
Now that we have presented all individual parts, we can combine them. The overall fuzzing algorithm is depicted in Fig. 1.
The initial seed generation step outputs a non-empty set of test cases \(X_{0} \subset \mathcal {I}\), two diffusivity parameters \(\alpha_{1}\) and \(\alpha_{2}\), and an initial offset \(q_{0}\). The inputs \(X_{0}\) are added to the list of test cases \(X_{\text {all}}\). Then, the fuzzer enters the loop of test case generation, quality evaluation, adaptation of diffusivity, and test case update. The first step within the loop (referred to as Last(\(X_{\text {all}}\))) sets \(q_{0}\) to the last reached offset position of \(({L^{1}_{t}})_{t \in \mathbb {N}}\). In the first invocation of Last(\(X_{\text {all}}\)), this is simply the already given seed offset; in all subsequent invocations, \(q_{0}\) is updated to the last state of \(({L^{1}_{t}})_{t \in \mathbb {N}}\). The Last() function also selects the most recently added test case \(x_{0}\) in \(X_{\text {all}}\), which gives the initial condition for \(({L^{2}_{t}})_{t \in \mathbb {N}}\) in the generation step. In our implementation, we realize the Last() function by retaining the reached states of both processes \(({L^{1}_{t}})_{t \in \mathbb {N}}\) and \(({L^{2}_{t}})_{t \in \mathbb {N}}\) between simulations.
Starting at \({L^{1}_{0}}=q_{0}\) and \({L^{2}_{0}}=x_{0}(q_{0})\), the Lévy flights \(({L^{1}_{t}})_{t \in \mathbb {N}}\) and \(({L^{2}_{t}})_{t \in \mathbb {N}}\) generate the set of new inputs \(X_{\text {gen}}\) by diffusing through input space with diffusivity \(\alpha_{1}\) and \(\alpha_{2}\), respectively. The quality of \(X_{\text {gen}}\) is then evaluated against the previous test cases in \(X_{\text {all}}\). Depending on the quality rating outcome, the diffusivity of \(({L^{1}_{t}})_{t \in \mathbb {N}}\) and \(({L^{2}_{t}})_{t \in \mathbb {N}}\) is then adapted correspondingly by updating \(\alpha_{1}\) and \(\alpha_{2}\) according to the sigmoid functions \(f_{i}\) in Eq. (26). Then the current list of test cases \(X_{\text {all}}\) is updated with the just generated set \(X_{\text {gen}}\) and the fuzzer continues to loop.
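The loop described above can be condensed into the following self-contained sketch. It is an illustration under stated assumptions rather than the authors' implementation: all names (`fuzz`, `levy_step`, `toy_target`) are hypothetical, the instrumented binary is replaced by a toy function that returns a set of basic block ids, a single seed is used, offsets are 0-based, positions wrap around modulo the space size, and the states of both flights are retained between iterations as described for Last().

```python
import math
import random


def levy_step(rng, alpha, max_len):
    """Signed flight length with power-law tail exponent alpha, capped at max_len."""
    log_len = -math.log(1.0 - rng.random()) / alpha
    length = max_len if log_len >= math.log(max_len) else max(1, int(math.exp(log_len)))
    return length if rng.random() < 0.5 else -length


def toy_target(x):
    """Stand-in for the instrumented program: returns the executed basic block ids."""
    blocks = {0}
    if x[0] > 127:
        blocks.add(1)
        if x[1] % 2 == 1:
            blocks.add(2)
    return blocks


def fuzz(seed, run_target, n_iter, k_gen=8, k_max=64, m=8, b=(2.0, 2.0), rng=None):
    if rng is None:
        rng = random.Random()
    n, size = len(seed), 1 << m
    x_all = [(list(seed), run_target(seed))]     # (test case, coverage), oldest first
    alpha = [1.0, 1.0]                           # initial alpha_1, alpha_2
    q, s = 0, seed[0]                            # retained states of L^1 and L^2
    history = []
    for _ in range(n_iter):
        x = list(x_all[-1][0])                   # Last(X_all): most recent test case
        gen = []
        for _ in range(k_gen):                   # test case generation
            s = (s + levy_step(rng, alpha[1], size)) % size
            x = list(x)
            x[q] = s
            gen.append(x)
            q = (q + levy_step(rng, alpha[0], n)) % n
        known = set().union(*(blocks for _, blocks in x_all))
        cov = [run_target(g) for g in gen]       # quality evaluation
        e = sum(len(c - known) for c in cov) / len(gen)
        alpha = [min(max(2.0 / (1.0 + math.exp(bi - e)), 1e-6), 2.0 - 1e-6) for bi in b]
        x_all.extend(zip(gen, cov))              # test case update (first in, first out)
        del x_all[:max(0, len(x_all) - k_max)]
        history.append((e, tuple(alpha)))
    return x_all, history
```

A call such as `fuzz([0, 0, 0, 0], toy_target, n_iter=10)` returns the retained test cases together with the per-iteration quality ratings and adapted diffusion parameters, which makes it easy to observe how the parameters react when new blocks are discovered.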
Regarding the complexity of the fuzzing algorithm, we note that all of the individual parts are processed efficiently in the sense that their time complexity is bounded by a constant. In particular, the evaluation step Eval() is designed to scale: in the first iterations of the loop, the cost of evaluating \(X_{\text {gen}}\) against \(X_{\text {all}}\) is bounded by \(\mathcal {O}(|X_{\text {all}}|^{2})\). To counter this growth, we defined an upper bound \(k_{\text {max}} \in \mathbb {N}\) for \(|X_{\text {all}}|\) in the test case update step above.