Steganalysis of JSteg algorithm using hypothesis testing theory

Qiao, Tong; Retraint, Florent; Cogranne, Rémi; Zitzmann, Cathel

doi:10.1186/s13635-015-0019-7

Research
Open access
Published: 13 March 2015


EURASIP Journal on Information Security volume 2015, Article number: 2 (2015)


Abstract

This paper investigates the statistical detection of JSteg steganography. The approach is based on a statistical model of discrete cosine transform (DCT) coefficients that challenges the usual assumption that, within a subband, all the coefficients are independent and identically distributed (i.i.d.). The hidden information-detection problem is cast in the framework of hypothesis testing theory. In an ideal context where all model parameters are perfectly known, the likelihood ratio test (LRT) is presented, and its performance is theoretically established. The statistical performance of the LRT serves as an upper bound for the detection power. For practical use, where the distribution parameters are unknown, a detector based on the estimation of those parameters is designed and further improved by exploring a DCT channel selection. The loss of power of the proposed detector compared with the optimal LRT is small, which shows the relevance of the proposed approach.

1 Introduction

Steganography and steganalysis have received increasing attention over the past two decades because research in this field concerns law enforcement and national strategic defence. Steganography is the art and science of hiding secret messages in cover media. Conversely, steganalysis aims to detect secret information hidden in a medium; a medium that carries a hidden message is called stego media. If a steganalysis algorithm identifies the inspected media as stego, even without any further knowledge of the secret message, the steganographic approach fails.

1.1 State of the art

In today’s digital world, many steganographic tools are available on the Internet. Because some of them are readily available and very simple to use, reliable steganalysis methods are needed to counter steganography. In general, due to its simplicity, most steganographic schemes insert the secret message into the least significant bit (LSB) plane of the cover media. Two kinds of such steganography exist: LSB replacement and LSB matching. The former replaces the LSB plane of the cover media, in the spatial or frequency domain, by the bits (0 or 1) of the message. The latter, also known as ±1 embedding (see [1-3]), randomly increments or decrements a pixel or discrete cosine transform (DCT) coefficient value, when necessary, so that its LSB matches the secret bit to be embedded. Since LSB replacement is easier to implement, it remains more popular: as of December 2011, WetStone declared that about 70% of the available steganographic software is based on the LSB-replacement algorithm [4]. Therefore, research on LSB-replacement steganalysis remains an active topic.

Although LSB-replacement steganalysis (see [5-10]) has been studied for many years, most prior-art detectors are designed to detect data hidden in the spatial domain. In addition, the statistical properties have been studied and established for only a few detectors, referred to as optimal detectors. As detailed in [11], a wide range of problems, theoretical as well as practical, remain uncovered and some prevent the moving of ‘steganography and steganalysis from the laboratory into the real world’. This is especially the case in the field of optimal detection, see ([11], sec. 3.1), in which this paper lies. Roughly speaking, the goal of optimal detection in steganalysis is to exploit an accurate statistical model of the cover source, usually digital images, to design a statistical test whose properties can be established, typically in order to guarantee a false alarm rate (FAR) and to calculate the optimal detection performance one can expect from the most powerful detector.

In 2004, the weighted stego-image (WS) method [12] and the test proposed in [13] for LSB-replacement steganalysis changed the situation, opening the way to optimal detectors. Driven by these pioneering works, the enhanced WS algorithm proposed in [14] improved the detection rate by enhancing the pixel predictor, adjusting the weighting factor and introducing the concept of bias correction. Nevertheless, the drawback of the original WS method is that it can only be applied in the spatial domain. Due to the prevalence of images compressed in the Joint Photographic Experts Group (JPEG) format, dealing with this kind of image has become mandatory. Inspired by the prior studies [12,14], the WS steganalyser for JPEG covers was proposed in [15]. However, the WS steganalyser does not achieve a high detection performance at a low FAR, see [16], and its statistical properties remain unknown, which prevents the guarantee of a prescribed FAR. In practical forensic cases, since a large database of images needs to be processed, achieving a very low FAR is crucial.

1.2 Contributions of the paper

For the detection of data hidden within the DCT coefficients of JPEG images, applying hypothesis testing theory to design optimal detectors that are efficient in practice requires an accurate statistical model of the DCT coefficients. Several such models have been proposed in the literature. Among them, the Laplacian distribution is probably the most widely used due to its simplicity and its fairly good accuracy [17]. More accurate models such as the generalized Gaussian [18] and, more recently, the generalized gamma model [19] have been shown to provide much more accuracy at the cost of higher complexity. Some of those models have been exploited in the field of steganalysis, see [20,21] for instance. In the framework of optimal detection, a first attempt has been made to design a statistical test modelling the DCT coefficients with the quantized Laplacian distribution, see [22].

It should be noted that other approaches have been proposed for the detection of data hidden within DCT coefficients of JPEG images, for example, the structural detection [23], the category attack [24], the WS detector [15] and the universal or blind detectors [25,26]. However, establishing the statistical properties of those detectors remains a difficult problem that has not been studied yet. In addition, most accurate detectors based on statistical learning are sensitive to the so-called cover-source mismatch [27]: the training phase must be performed with caution.

In this context, the detector proposed in [22] is an interesting alternative; however, it is based on the assumption that DCT coefficients are independent and identically distributed (i.i.d.) within a subband and have a zero expectation, which might be inaccurate and hence make the detection performance poor in practice. Indeed, the validity of this model depends on the image content: it performs well in the case of a high-texture image (see Figure 1a) but hardly holds true in the case of a low-texture image (see Figure 1b). In contrast, this paper proposes a statistical model assuming that each DCT coefficient has a different expectation and variance. The use of this model, together with hypothesis testing theory, allows us to design the most powerful likelihood ratio test (LRT) when the distribution parameters (expectation and variance) are known. Then, in the practical case where those parameters are unknown, estimates have to be used instead; this leads to the design of the proposed detector with estimated parameters. By treating those distribution parameters as nuisance parameters and using an accurate estimation, it is shown that the loss of power compared with the optimal detector is small.

Therefore, the contributions of this paper are as follows:

1. First, a novel model of DCT coefficients is proposed; its major originality is that this model does not assume that all the coefficients of the same subband are i.i.d.
2. Second, assuming that all the parameters are known, this statistical model of DCT coefficients is used to design the optimal test to detect data hidden within JPEG images with the JSteg algorithm. This statistical test takes into account the distribution parameters of each DCT coefficient as nuisance parameters.
3. Further, assuming that all the parameters are unknown, a simple approach is proposed to estimate the expectation (or location) parameter of each coefficient by using linear properties of the DCT as well as an estimation of pixel expectation in the spatial domain; the variance (or scale) parameter is also estimated locally.
4. The designed detector is improved by exploring a DCT channel selection, a technique proposed very recently [28,29] that selects only a subset of pixels or DCT coefficients in which embedding is most likely and hence detection easier.
5. Numerical results show the sharpness of the theoretically established results and the good performance of the proposed statistical test. A comparison with the statistical test based on the Laplacian distribution and on the assumption of i.i.d. coefficients, see [22], shows the relevance of the proposed methodology. In addition, compared with the prior-art WS detector [15], experimental results show the efficiency of the proposed detector.

1.3 Organisation of the paper

This paper is organised as follows. Section 2 formalises the statistical problem of detection of information hidden within DCT coefficients of JPEG images. Then, Section 3 presents the optimal LRT for detecting the JSteg algorithm based on the Laplacian distribution model. Section 4 presents the proposed approach for estimating the nuisance parameters in practice and compares our proposed detector with the WS detector [15] theoretically. Finally, Section 5 presents numerical results of the proposed steganalyser on simulated and real images, and Section 6 concludes this paper. This paper is an extended version of [30] that also includes the findings of [31] on channel selection [28,29].

2 Problem statement

In this paper, a grayscale digital image is represented, in the spatial domain, by a single matrix $\mathbf{Z}=\{z_{i,j}\}, i\in\{1,\ldots,I\}, j\in\{1,\ldots,J\}$. The present work can be extended to a colour image by analysing each colour channel separately. Most digital images are stored using the JPEG compression standard. This standard exploits the linear DCT, over blocks of 8×8 pixels, to represent an image in the so-called DCT domain. In the present paper, we avoid the description of the imaging pipeline of a digital still camera; the reader can refer to [32] for a description of the whole imaging pipeline and to [33] for a detailed description of the JPEG compression standard.

Let us denote DCT coefficients by the matrix $\mathbf{V}=\{v_{i,j}\}$. An alternative representation of those coefficients is usually adopted by gathering the DCT coefficients that correspond to the same frequency subband. In this paper, this alternative representation is denoted by the matrix $\mathbf{U}=\{u_{k,l}\}, k\in\{1,\ldots,K\}, l\in\{1,\ldots,64\}$ with $K\approx I\times J/64$^a.

The coefficients from the first subband $u_{k,1}$, often referred to as direct current (DC) coefficients, represent the mean pixel value over the k-th block of 8×8 pixels. Modifying those coefficients may create obvious artifacts that can be detected easily; hence, they are usually not used for data hiding. Similarly, the JSteg algorithm does not use the coefficients from the other subbands, referred to as alternating current (AC) coefficients, if they equal 0 or 1. In fact, it is known that using the coefficients equal to 0 or 1 significantly modifies the statistical properties of AC coefficients; this creates a flaw that can be detected.

The JSteg algorithm embeds data within the DCT coefficients of JPEG images using the well-known LSB-replacement method, see details in [34]. In brief, this method consists of substituting the LSB of each DCT coefficient by a bit of the message to be hidden. The number of bits hidden per coefficient, usually referred to as the payload, is denoted R∈(0,1]. Since the JSteg algorithm does not use every DCT coefficient, the payload is measured in this paper as the number of bits hidden per usable coefficient (that is, the number of bits divided by the number of AC coefficients that differ from 0 and 1).
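As an illustration of the embedding rule described above, the following minimal Python sketch replaces the LSB of usable AC coefficients and computes the payload as defined in this paper. The function names and the sequential visiting order are assumptions made for the example; the exact scan order of the reference JSteg implementation is described in [34] and is not reproduced here.

```python
def usable(u: int) -> bool:
    """A coefficient is usable by JSteg when it differs from 0 and 1."""
    return u not in (0, 1)


def jsteg_embed(ac_coeffs, bits):
    """Return a copy of ac_coeffs whose usable entries carry the message bits.

    Assumption: coefficients are visited in list order until the message ends.
    """
    stego = list(ac_coeffs)
    remaining = iter(bits)
    for idx, u in enumerate(stego):
        if not usable(u):
            continue  # coefficients equal to 0 or 1 are left untouched
        try:
            bit = next(remaining)
        except StopIteration:
            break  # whole message embedded
        # LSB replacement; Python integers behave as two's complement here,
        # so the pairs are (2,3), (4,5), (-2,-1), (-4,-3), ...
        stego[idx] = (u & ~1) | bit
    return stego


def payload(ac_coeffs, n_bits):
    """Payload R: hidden bits divided by the number of usable coefficients."""
    n_usable = sum(1 for u in ac_coeffs if usable(u))
    return n_bits / n_usable


cover = [5, 0, -3, 1, 2, 7, -4]
stego = jsteg_embed(cover, [0, 1, 1])
print(stego, payload(cover, 3))
```

Note that LSB replacement never turns a usable coefficient into 0 or 1, since these two values form their own LSB pair.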

Let us assume that the DCT coefficients are independent and that they all follow the same family of probability distributions, denoted $\mathcal {P}_{\theta }$, whose parameter θ may change from one coefficient to another. Since the DCT coefficients can only take values in a discrete set, the distribution $\mathcal {P}_{\theta }$ may be represented by its probability mass function (pmf) denoted $P_{\theta }=\{p_{\theta }[u]\}$; for simplicity^b, it is assumed in this paper that $u \in \mathbb {Z}$. Let us denote $\mathcal {Q}_{\theta }^{R}$ the probability distribution of usable DCT coefficients from the stego-image after embedding a message with payload R. A short calculation shows that, see [8,12,13], the stego-image distribution may be represented by the following pmf $Q_{\theta }^{R} = \left \{q_{\theta }^{R}[u]\right \}_{u\in \mathbb {Z}}$ where:

$$ q_{\theta}^{R}[\!u] = (1-{R}/{2}) p_{\theta}[\!u] + {R}/{2} p_{\theta}[\bar{\!u}], $$

(1)

and $\bar {u} = u+(-1)^{u}$ represents the integer u with flipped LSB. For the sake of clarity, let us denote $\theta_{k,l}$ the distribution parameter of the k-th DCT coefficient from the l-th subband and let $\boldsymbol{\theta}=\{\theta_{k,l}\}, k\in\{1,\ldots,K\}, l\in\{2,\ldots,64\}$ represent the distribution parameters of all the AC coefficients.
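Relation (1) can be checked numerically with a short Python sketch. The toy pmf below is purely illustrative (it is not taken from any image), and the helper names `flip_lsb` and `stego_pmf` are introduced only for this example. The support is assumed to be closed under LSB flipping, so that the stego pmf still sums to one.

```python
def flip_lsb(u: int) -> int:
    """Return u with its LSB flipped, i.e. u + (-1)**u for any integer u."""
    return u + (1 if u % 2 == 0 else -1)


def stego_pmf(p, R):
    """Apply relation (1): q[u] = (1 - R/2) p[u] + (R/2) p[flip_lsb(u)].

    p is a dict mapping integer coefficient values to probabilities.
    """
    return {u: (1 - R / 2) * p[u] + (R / 2) * p.get(flip_lsb(u), 0.0) for u in p}


# Hypothetical cover pmf on the LSB pairs (2,3) and (4,5).
p = {2: 0.4, 3: 0.3, 4: 0.2, 5: 0.1}
q = stego_pmf(p, R=0.5)
print(q, sum(q.values()))
```

Embedding moves probability mass only between the two members of each LSB pair, which is why the total mass of each pair is preserved.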

When inspecting a given JPEG image, more precisely its DCT coefficients matrix U, in order to detect data hidden with the JSteg algorithm, the problem consists in choosing between the two following hypotheses: $\mathcal {H}_{0}$: ‘the coefficients $u_{k,l}$ follow the distribution $\mathcal {P}_{\theta _{k,l}}$’ and $\mathcal {H}_{1}$: ‘the coefficients $u_{k,l}$ follow the distribution $\mathcal {Q}_{\theta _{k,l}}^{R}$’ which can be written formally as:

$${} \left\{ \begin{aligned} \displaystyle \mathcal{H}_{0} &: \left\{ u_{k,l} \sim \mathcal{P}_{\theta_{k,l}},\forall k\in\{1,\ldots,K\}, \forall l\in\{2,\ldots,64\} \right\},\\ \displaystyle \mathcal{H}_{1} &: \left\{ u_{k,l} \sim \mathcal{Q}^{\tiny R}_{\theta_{k,l}},\forall k\in\{1,\ldots,K\}, \forall l\in\{2,\ldots,64\}\right\}. \end{aligned}\right. $$

(2)

A statistical test is a mapping $\delta :\mathbb {Z}^{I\cdot J}\mapsto \{\mathcal {H}_{0},\mathcal {H}_{1}\}$ such that hypothesis $\mathcal {H}_{i}$ is accepted if $\delta (\mathbf {U})=\mathcal {H}_{i}$ (see [35] for details on hypothesis testing). As previously explained, this paper focuses on the Neyman-Pearson bi-criteria approach: maximising the correct detection probability for a given false alarm probability $\alpha_{0}$. Let:

$$ \mathcal{K}_{\alpha_{0}}=\left\{ \delta : \sup_{\boldsymbol{\theta}}{\mathbb{P}}_{\mathcal{H}_{0}}\left[\delta(\mathbf{U})=\mathcal{H}_{1}\right] \leq \alpha_{0}\right\}, $$

(3)

be the class of tests with a false alarm probability upper bounded by $\alpha_{0}$. Here, ${\mathbb {P}}_{\mathcal {H}_{i}}(A)$ stands for the probability of event A under hypothesis $\mathcal {H}_{i}, i\in\{0,1\}$, and the supremum over θ has to be understood as whatever the distribution parameters might be, in order to ensure that the false alarm probability $\alpha_{0}$ cannot be exceeded. Among all the tests in $\mathcal {K}_{\alpha _{0}}$, the aim is to find a test δ which maximises the power function, defined by the correct detection probability:

$$ \beta_{\delta}={\mathbb{P}}_{\mathcal{H}_{1}}\left[\delta(\mathbf{U})=\mathcal{H}_{1}\right], $$

(4)

which is equivalent to minimising the missed detection probability $\alpha _{1}(\delta) = {\mathbb {P}}_{\mathcal {H}_{1}}\left [\delta (\mathbf {U})=\mathcal {H}_{0}\right ] = 1-\beta _{\delta }$.

In order to design a practical optimal detector, as discussed in [11], for steganalysis in the spatial domain, the main difficulty is to estimate the distribution parameters (expectation and variance of each pixel). In contrast, in the case of DCT coefficients, the application of hypothesis testing theory to design an optimal detector has previously been attempted under the assumption that the distribution parameter remains the same for all the coefficients of a given subband. With this assumption, the estimation of the distribution parameters is not an issue because thousands of DCT coefficients are available. However, which distribution model to choose remains an open problem.

Hypothesis testing theory has been applied to the steganalysis of the JSteg algorithm in [22] using a Laplacian distribution model and the assumption that the DCT coefficients of each subband are i.i.d. However, this pioneering work does not lead to an efficient test, because a very significant loss of performance has been observed when comparing results on real images with the theoretically established ones. Such a result can be explained by the two following reasons: 1) the Laplacian model might not be accurate enough to detect steganography and 2) the assumption that the DCT coefficients of each frequency subband are i.i.d. may be wrong. Recently, it has been shown that the use of the generalised gamma model or an even more accurate model [36,37] allows the design of a test with very good detection performance. In contrast, this paper proposes to challenge the assumption that all the DCT coefficients of a subband are i.i.d.

A typical example is given by Figures 2 and 3. Figure 2a and Figure 2b, respectively, represent the DCT coefficients of the subband (1,2) and subband (4,4) extracted from the image lena. Observing those two graphs, the assumption that all those coefficients are i.i.d. is clearly doubtful. However, if it is assumed that each coefficient has a different expectation, one can estimate this expected value and compute the ‘residual noise’, that is, the difference between the observation and the computed expectation. Such results are shown in Figure 3, with three different models for estimating the expectation of DCT coefficients of the same two subbands from lena. Moreover, Figure 4 illustrates the distribution of the residual noises plotted in Figure 3. The residual noises look much closer to i.i.d. than the original DCT coefficients.

In the following section, we detail the statistical test that takes into account both the expectation and the variance as nuisance parameters, and we study the optimal detection when those parameters are known. A discussion on nuisance parameters is also provided in Section 4.

3 LRT for two simple hypotheses

3.1 Optimal detection framework

When the payload R and the distribution parameters $\boldsymbol{\theta}=\{\theta_{k,l}\}, k\in\{1,\ldots,K\}, l\in\{2,\ldots,64\}$ are known, problem (2) is reduced to a statistical test between two simple hypotheses. In such a case, the Neyman-Pearson Lemma ([35], theorem 3.2.1) states that the most powerful test in the class $\mathcal {K}_{\alpha _{0}}$ (3) is the LRT defined, on the assumption that DCT coefficients are independent, as:

$$ \delta^{\text{lr}}(\mathbf{U})=\left\{ \begin{aligned} &\displaystyle \mathcal{H}_{0} \displaystyle \ \text{if}\ \Lambda^{\text{lr}}(\mathbf{U}) = \sum_{k=1}^{K} \sum_{l=2}^{64} \Lambda^{\text{lr}}(u_{k,l}) < \tau^{\text{lr}}, \\ &\displaystyle \mathcal{H}_{1} \displaystyle\ \text{if}\ \Lambda^{\text{lr}}(\mathbf{U}) = \sum_{k=1}^{K} \sum_{l=2}^{64} \Lambda^{\text{lr}}(u_{k,l}) \geq \tau^{\text{lr}}, \end{aligned}\right. $$

(5)

where the decision threshold $\tau^{\text{lr}}$ is the solution of the equation $\mathbb {P}_{\mathcal {H}_{0}}\left [ \Lambda ^{\text {lr}}(\mathbf {U}) \geq \tau ^{\text {lr}} \right ] = \alpha _{0}$, to ensure that the false alarm probability of the LRT equals $\alpha_{0}$, and the log likelihood ratio (LR) for one observation is given, by definition, by:

$$ \Lambda^{\text{lr}}(u_{k,l}) = \log\left(\frac{q^{R}_{\theta_{k,l}}[\!u_{k,l}] }{ p_{\theta_{k,l}}[\!u_{k,l}]} \right). $$

(6)

In practice, when the rate R is not known, one can try to design a test which is locally optimal around a given payload rate, named Locally Asymptotically Uniformly Most Powerful (LAUMP) test, as proposed in [6,8], but this lies outside the scope of this paper.

From the definition of $p_{\theta _{k,l}}[\!u_{k,l}]$ and $q^{R}_{\theta _{k,l}}[\!u_{k,l}]$ (1), it is easy to write the LR (6) as:

$$ \Lambda^{\text{lr}}(u_{k,l}) = \log\left(1-\frac{R}{2} + \frac{R}{2} \frac{p_{\theta_{k,l}}[\!\bar{u}_{k,l}]}{p_{\theta_{k,l}}[\!u_{k,l}]} \right), $$

(7)

where, as previously defined, $\bar {u}_{k,l} = u_{k,l} + (-1)^{{u}_{k,l}}$ represents the DCT coefficient $u_{k,l}$ with flipped LSB.
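To make relations (5) and (7) concrete, the following Python sketch evaluates the log-LR of each coefficient from a known cover pmf and sums the contributions. The per-coefficient pmfs are passed as dictionaries, which is a simplification chosen for this example; the toy pmf is hypothetical.

```python
import math


def flip_lsb(u: int) -> int:
    """Return u with its LSB flipped, i.e. u + (-1)**u."""
    return u + (1 if u % 2 == 0 else -1)


def log_lr(u, p, R):
    """Log-LR (7) of one coefficient u whose cover pmf is the dict p."""
    return math.log(1 - R / 2 + (R / 2) * p[flip_lsb(u)] / p[u])


def lr_statistic(coeffs, pmfs, R):
    """Sum of the log-LRs as in (5); pmfs[n] is the cover pmf of coeffs[n]."""
    return sum(log_lr(u, p, R) for u, p in zip(coeffs, pmfs))


# Hypothetical pmf shared by two observed coefficients.
p = {2: 0.4, 3: 0.3, 4: 0.2, 5: 0.1}
print(lr_statistic([2, 3], [p, p], R=0.5))
```

A coefficient that is more probable than its LSB-flipped neighbour contributes a negative term, whereas a coefficient that is less probable than its neighbour contributes a positive term.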

3.2 Statistical performance of LRT

Accepting, for a moment, that one is in this most favourable scenario, in which all the parameters are perfectly known, we can deduce some interesting results. Because the observations are considered to be independent, the LR $\Lambda^{\text{lr}}(\mathbf{U})$ is a sum of independent random variables, and asymptotic theorems allow its distribution to be established when the number of coefficients becomes ‘sufficiently large’. This asymptotic regime is usually relevant for digital images due to their very large number of pixels or DCT coefficients.

Let us denote $E_{\mathcal {H}_{i}}(\theta _{k,l})$ and $V_{\mathcal {H}_{i}}(\theta _{k,l})$ the expectation and the variance, respectively, of the LR $\Lambda^{\text{lr}}(u_{k,l})$ under hypothesis $\mathcal {H}_{i}, i\in\{0,1\}$. Those quantities obviously depend on the parameterized distribution $\mathcal {P}_{\theta _{k,l}}$. Lindeberg’s central limit theorem (CLT) ([35], theorem 11.2.5) states that, as K tends to infinity, it holds true that^c:

$$ \frac{\displaystyle\sum_{k=1}^{K} \sum_{l=2}^{64} \Lambda^{\text{lr}}(u_{k,l}) - E_{\mathcal{H}_{i}}(\theta_{k,l})}{ \displaystyle\left(\sum_{k=1}^{K} \sum_{l=2}^{64} V_{\mathcal{H}_{i}}(\theta_{k,l}) \right)^{1/2}} \xrightarrow{\;d\;} \mathcal{N}(0,1) \;, \, i=\{0,1\}\,, $$

(8)

where $\xrightarrow {\;d\;}$ represents the convergence in distribution, and $\mathcal {N}(0,1)$ is the standard normal distribution, i.e. with zero mean and unit variance.
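The moments $E_{\mathcal{H}_{i}}$ and $V_{\mathcal{H}_{i}}$ that enter (8) can be computed numerically for any known pmf. The sketch below does so for a single coefficient by direct summation over a finite support; the toy pmf and the function names are assumptions made for illustration only.

```python
import math


def flip_lsb(u: int) -> int:
    """Return u with its LSB flipped."""
    return u + (1 if u % 2 == 0 else -1)


def log_lr(u, p, R):
    """Log-LR of one coefficient, see relation (7)."""
    return math.log(1 - R / 2 + (R / 2) * p[flip_lsb(u)] / p[u])


def lr_moments(p, R):
    """Return {'H0': (mean, var), 'H1': (mean, var)} of the log-LR.

    Under H0 the coefficient follows p; under H1 it follows the stego pmf q.
    """
    q = {u: (1 - R / 2) * p[u] + (R / 2) * p[flip_lsb(u)] for u in p}
    moments = {}
    for name, dist in (("H0", p), ("H1", q)):
        mean = sum(dist[u] * log_lr(u, p, R) for u in dist)
        var = sum(dist[u] * (log_lr(u, p, R) - mean) ** 2 for u in dist)
        moments[name] = (mean, var)
    return moments


p = {2: 0.4, 3: 0.3, 4: 0.2, 5: 0.1}  # hypothetical cover pmf
print(lr_moments(p, R=0.5))
```

The expectation under $\mathcal{H}_{0}$ is non-positive and the expectation under $\mathcal{H}_{1}$ is non-negative, the latter being the Kullback-Leibler divergence between the stego and cover pmfs; this separation is what the test exploits.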

This theorem is of crucial interest to establish the statistical properties of the proposed test [7,22,37,38]. In fact, once the moments have been calculated under both $\mathcal {H}_{i}, i\in\{0,1\}$, one can normalise the LR $\Lambda^{\text{lr}}(\mathbf{U})$ under hypothesis $\mathcal {H}_{0}$ as follows:

$$ \begin{aligned} \overline{\Lambda}^{\text{lr}}(\mathbf{U}) &= \frac{ \Lambda^{\text{lr}}(\mathbf{U}) - \sum_{k=1}^{K} \sum_{l=2}^{64} E_{\mathcal{H}_{0}}(\theta_{k,l})}{ \left(\sum_{k=1}^{K} \sum_{l=2}^{64} V_{\mathcal{H}_{0}}(\theta_{k,l}) \right)^{1/2}},\\ &= \frac{\sum_{k=1}^{K} \sum_{l=2}^{64} \Lambda^{\text{lr}}(u_{k,l}) - E_{\mathcal{H}_{0}}(\theta_{k,l})}{ \left(\sum_{k=1}^{K} \sum_{l=2}^{64} V_{\mathcal{H}_{0}}(\theta_{k,l}) \right)^{1/2} }. \end{aligned} $$

(9)

Since this essentially consists of adding a deterministic value and scaling the LR, this operation of normalisation preserves the optimality of the LRT. It is thus straightforward to define the normalised LRT with $\overline {\Lambda }^{\text {lr}}(\mathbf {U})$ by:

$$\begin{array}{*{20}l} & \overline{\delta}^{\text{lr}}(\mathbf{U})=\left\{ \begin{aligned} &\mathcal H_{0} \,\text{if }\, \overline{\Lambda}^{\text{lr}}(\mathbf{U}) < \overline{\tau}^{\text{lr}}\\ &\mathcal H_{1} \,\text{if }\, \overline{\Lambda}^{\text{lr}}(\mathbf{U}) \geq \overline{\tau}^{\text{lr}}. \end{aligned}\right. \end{array} $$

(10)

It immediately follows from Lindeberg’s CLT (8) that $\overline {\Lambda }^{\text {lr}}(\mathbf {U})$ asymptotically follows, as K tends to infinity, the normal distribution $\mathcal {N}(0,1)$. Hence, it is straightforward to set the decision threshold that guarantees the prescribed false alarm probability:

$$ \overline{\tau}^{\text{lr}} = \Phi^{-1}\left(1-\alpha_{0} \right), $$

(11)

where Φ and $\Phi^{-1}$, respectively, represent the cumulative distribution function (cdf) of the standard normal distribution and its inverse. Similarly, denoting:

$$ m_{i} \,=\, \sum_{k=1}^{K} \sum_{l=2}^{64}\! E_{\mathcal{H}_{i}}(\theta_{k,l});{\sigma^{2}_{i}} \,=\, \sum_{k=1}^{K} \sum_{l=2}^{64}\! V_{\mathcal{H}_{i}}(\theta_{k,l}) \,, i \!= \!\!\{0,1\}, $$

it is also straightforward to establish the detection function of the LRT given by:

$$ \beta_{\overline{\delta}^{\text{lr}}} = 1-\Phi\left(\frac{\sigma_{0}}{\sigma_{1}} \Phi^{-1}\left(1-\alpha_{0} \right) + \frac{m_{0} - m_{1}}{\sigma_{1}} \right). $$

(12)

Equations (11) and (12) emphasise the main advantages of normalising the LR as described in relation (9). First, it allows setting a threshold that guarantees a prescribed false alarm probability independently of any distribution parameters; this is particularly crucial because digital images are heterogeneous and their properties vary from one image to another. Second, the normalisation makes it easy to establish the detection power, which again is established for any distribution parameters and hence for any inspected image.
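Relations (11) and (12) translate directly into a few lines of Python using the standard-library `statistics.NormalDist` class. The numerical values passed below for the moments are arbitrary placeholders, not results from the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def lrt_threshold(alpha0):
    """Decision threshold (11) of the normalised LRT."""
    return _STD_NORMAL.inv_cdf(1 - alpha0)


def lrt_power(alpha0, m0, sigma0, m1, sigma1):
    """Detection power (12) given the summed moments under H0 and H1."""
    tau = lrt_threshold(alpha0)
    return 1 - _STD_NORMAL.cdf(sigma0 / sigma1 * tau + (m0 - m1) / sigma1)


# Placeholder moments: the power grows as m1 moves away from m0.
print(lrt_threshold(0.05))
print(lrt_power(0.05, m0=0.0, sigma0=1.0, m1=2.0, sigma1=1.0))
```

When both hypotheses yield the same moments, the power reduces to the false alarm probability, as expected from a test that cannot distinguish the two distributions.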

3.3 Application with Laplacian distribution

In the case of the Laplacian distribution, the framework of hypothesis testing theory has been applied to the steganalysis of JSteg in [22], in which the moments of the LR are calculated under the two following assumptions: 1) all the DCT coefficients from the same subband are i.i.d. and 2) the expectation of each DCT coefficient is zero.

The continuous Laplacian distribution has the following probability density function (pdf):

$$ f_{\mu,b}(x) = \frac{1}{2b} \exp\left(-\frac{| x - \mu |}{b} \right) $$

(13)

where $\mu \in \mathbb {R}$, sometimes referred to as the location parameter, corresponds to the expectation, and b>0 is the so-called scale parameter. During the compression of JPEG images, the DCT coefficients are quantized. Hence, let us define the discrete Laplacian distribution by the following pmf, see details in Appendix A:

$$\begin{array}{*{20}l} f_{\mu,b}[k] &\stackrel{\text{def.}}{=} \mathbb{P}\left[ x \in \left[\Delta(k-{1}/{2}), \Delta(k+{1}/{2})\right[ \right] \\ &= \left\{ \begin{array}{ll} \exp\left(-\frac{|\Delta k -\mu|}{b} \right) \sinh\left(\frac{\Delta}{2b} \right) & \text{if } \frac{\mu}{\Delta} \notin \left[ k-{1}/{2} ; k+{1}/{2} \right[, \\ 1-\exp\left(-\frac{\Delta}{2b} \right)\cosh\left(\frac{\Delta k-\mu }{b} \right) & \text{otherwise,} \end{array} \right. \end{array} $$

(14)

where Δ is the quantization step.
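The discrete Laplacian pmf can be evaluated without any case distinction by differencing the cdf of the continuous Laplacian (13) over each quantization bin. The sketch below takes this route, which is an implementation choice made for the example; for bins that do not contain μ, it agrees with the closed-form product of an exponential and a hyperbolic sine given above.

```python
import math


def laplace_cdf(x, mu, b):
    """Cdf of the continuous Laplacian distribution with pdf (13)."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def discrete_laplace_pmf(k, mu, b, delta):
    """Probability that x falls in [delta*(k - 1/2), delta*(k + 1/2))."""
    lower = delta * (k - 0.5)
    upper = delta * (k + 0.5)
    return laplace_cdf(upper, mu, b) - laplace_cdf(lower, mu, b)


# Illustrative parameters (not estimated from any image).
mu, b, delta = 0.3, 2.0, 1.0
print([round(discrete_laplace_pmf(k, mu, b, delta), 4) for k in range(-3, 4)])
```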

From the expression of the discrete Laplacian distribution (14) and from the expression of LR (7), one can express the LR for the detection of JSteg under the assumption that DCT coefficients follow a Laplacian distribution, as follows (see Appendix B):

$$ \Lambda^{\text{lr}}_{\mu,b}[k] \,=\, \log\left(\!\!1\,-\,\frac{R}{2} \,+\, \frac{R}{2} \exp\!\left[\! \frac{\Delta}{b} \text{sign}(\Delta k -\mu) (k-\bar{k}) \!\right] \right), $$

(15)

where the observed DCT coefficient, referred to as $u_{k,l}$ in Equation (7), is denoted as k. It can be noted that this expression (15) of the LR is almost the same as the one obtained in [22]: when all DCT coefficients are assumed to have a zero mean, the only difference is that the sign term sign(Δk−μ) becomes sign(k). It should also be noted that the log-LR equals 0 for every DCT coefficient whose value is 0 or 1 because the JSteg algorithm does not embed hidden data in those coefficients. In the present paper, the moments of the LR (15) are not analytically established; the interested reader can refer to [22].
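Expression (15) is simple enough to be evaluated directly. The following Python sketch implements it, returning 0 for the unusable values 0 and 1 as stated above; the function and argument names are chosen for this example, and the parameter values in the usage lines are arbitrary.

```python
import math


def flip_lsb(k: int) -> int:
    """Return k with its LSB flipped."""
    return k + (1 if k % 2 == 0 else -1)


def sign(x) -> int:
    """Sign of x as -1, 0 or +1."""
    return (x > 0) - (x < 0)


def laplacian_log_lr(k, mu, b, delta, R):
    """Log-LR (15) of a quantized coefficient k under the Laplacian model."""
    if k in (0, 1):
        return 0.0  # JSteg does not embed in these coefficients
    k_bar = flip_lsb(k)
    exponent = (delta / b) * sign(delta * k - mu) * (k - k_bar)
    return math.log(1 - R / 2 + (R / 2) * math.exp(exponent))


# Arbitrary parameters: zero expectation, unit quantization step.
for k in (-3, -2, 0, 1, 2, 3):
    print(k, laplacian_log_lr(k, mu=0.0, b=2.0, delta=1.0, R=1.0))
```

With a zero expectation, the value of an LSB pair that is closer to zero (for instance 2 in the pair (2,3)) is the more probable one under the cover model and therefore yields a negative log-LR, whereas its neighbour yields a positive one.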

4 Proposed approach for estimating the nuisance parameters in practice

4.1 Estimation of expectation of each DCT coefficient

As already explained, most statistical models of DCT coefficients assume that within a subband the coefficients are i.i.d. However, as illustrated in Figures 1 and 2, this assumption is doubtful in practice. Another way to explain why the DCT coefficients may not be i.i.d. is to consider a block of 8×8 pixels in the spatial domain, say the first, $\mathbf{z}=\{z_{i,j}\}, i\in\{1,\ldots,8\}, j\in\{1,\ldots,8\}$. The value of those pixels can be decomposed as:

$$ z_{i,j} = x_{i,j} + n_{i,j}, $$

where $x_{i,j}$ is a deterministic value that represents the expectation of the pixel at location (i,j) and $n_{i,j}$ is the realisation of a random variable representing all noises corrupting the inspected image. Clearly, this decomposition can be done for the whole block, $\mathbf{z}=\mathbf{x}+\mathbf{n}$, where $\mathbf{x}=\{x_{i,j}\}$ and $\mathbf{n}=\{n_{i,j}\}$. Since the DCT is linear, the DCT coefficients of any block may be expressed as:

$$\begin{array}{*{20}l} \text{DCT}(\mathbf{z}) &= \mathbf{D}^{T} \mathbf{z} \mathbf{D} = \mathbf{D}^{T} (\mathbf{x} + \mathbf{n}) \mathbf{D} \\ &= \mathbf{D}^{T} \mathbf{x} \mathbf{D} + \mathbf{D}^{T} \mathbf{n} \mathbf{D} = \text{DCT}(\mathbf{x}) + \text{DCT}(\mathbf{n}), \end{array} $$

(16)

where DCT represents the DCT operator and D is the change of basis matrix from the spatial to the DCT basis, often referred to as the DCT matrix.

It makes sense to assume that the noise component n has a zero mean in the spatial and in the DCT domain. In contrast, it is difficult to justify that the DCT of the pixels’ expectation x should necessarily be around zero. Actually, for the AC coefficients, this assumption holds true if and only if the expectation is the same for all the pixels of a block: $\forall i\in\{1,\ldots,8\}, \forall j\in\{1,\ldots,8\}, x_{i,j}=x$; see [36,37,39] for details.
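Both properties, the linearity (16) and the vanishing of the AC coefficients for a constant block, can be verified numerically. The sketch below builds an orthonormal 8×8 DCT-II matrix in pure Python; the orthonormal scaling and the deterministic pseudo-noise pattern are assumptions made for this example (the JPEG standard uses an equivalent scaling, see [33]).

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C (row f is the f-th cosine basis vector).

    With D = C^T, the product D^T z D of relation (16) equals C z C^T.
    """
    rows = []
    for f in range(n):
        scale = math.sqrt(1.0 / n) if f == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * f / (2 * n))
                     for x in range(n)])
    return rows


def dct2(block):
    """2-D DCT of a square block given as a list of lists."""
    n = len(block)
    c = dct_matrix(n)
    tmp = [[sum(c[f][x] * block[x][y] for x in range(n)) for y in range(n)]
           for f in range(n)]                      # C * block
    return [[sum(tmp[f][y] * c[g][y] for y in range(n)) for g in range(n)]
            for f in range(n)]                     # (C * block) * C^T


# Constant expectation x plus a deterministic stand-in for the noise n.
x = [[100.0] * 8 for _ in range(8)]
noise = [[((3 * i + 5 * j) % 7) - 3.0 for j in range(8)] for i in range(8)]
z = [[x[i][j] + noise[i][j] for j in range(8)] for i in range(8)]

dz, dx, dn = dct2(z), dct2(x), dct2(noise)
max_gap = max(abs(dz[i][j] - dx[i][j] - dn[i][j])
              for i in range(8) for j in range(8))
max_ac = max(abs(dx[i][j]) for i in range(8) for j in range(8) if (i, j) != (0, 0))
print(max_gap, dx[0][0], max_ac)
```

For the constant block, only the DC coefficient is non-zero; any spatial variation of the expectation within the block would leak into the AC coefficients and shift their expectation away from zero.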

In contrast, this paper mainly aims at estimating the expectation of each DCT coefficient. To this end, it is proposed to decompress a JPEG image V into the spatial domain to obtain Z, then to estimate the expectation of each pixel in the spatial domain, $\widehat {\mathbf {Z}}$, by using a denoising filter. Then, this denoised image $\widehat {\mathbf {Z}}$ is transformed back into the DCT domain to finally obtain the estimated value of all DCT coefficients, denoted $\widehat {\mathbf {V}} = \{ \hat {v}_{i,j} \} \,, i\in \{1,\ldots,I\} \,, j\in \{1,\ldots,J\}$. Several methods have been tested to estimate the expectation of pixels in the spatial domain $\widehat {\mathbf {Z}}$, namely, the BM3D collaborative filtering [40], K-SVD sparse dictionary learning [41], the non-local (NL) means weighted averaging method [42] and the wavelet denoising filter [43]. The codes used for the methods [40-42] have been downloaded from the Image Processing On-Line website^d. The codes used for the method [43] have been downloaded from DDE^e.
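The estimation chain described above (spatial-domain denoising followed by a block-wise DCT) can be sketched as follows. This is only an illustration under strong simplifications: a 3×3 box filter stands in for the denoisers [40-43] actually tested in the paper, the input is assumed to be an already decompressed image whose dimensions are multiples of 8, and the division by the quantization table is omitted.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (list of lists)."""
    n = len(block)
    c = [[(math.sqrt(1.0 / n) if f == 0 else math.sqrt(2.0 / n))
          * math.cos(math.pi * (2 * x + 1) * f / (2 * n)) for x in range(n)]
         for f in range(n)]
    tmp = [[sum(c[f][x] * block[x][y] for x in range(n)) for y in range(n)]
           for f in range(n)]
    return [[sum(tmp[f][y] * c[g][y] for y in range(n)) for g in range(n)]
            for f in range(n)]


def box_denoise(img):
    """Stand-in denoiser: 3x3 mean filter with replicated borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += img[ii][jj]
            out[i][j] = acc / 9.0
    return out


def expected_dct(img):
    """Estimate the expectation of every DCT coefficient of img.

    Assumes image height and width are multiples of 8.
    """
    denoised = box_denoise(img)
    h, w = len(img), len(img[0])
    v_hat = [[0.0] * w for _ in range(h)]
    for bi in range(0, h, 8):
        for bj in range(0, w, 8):
            block = [row[bj:bj + 8] for row in denoised[bi:bi + 8]]
            coeffs = dct2(block)
            for i in range(8):
                for j in range(8):
                    v_hat[bi + i][bj + j] = coeffs[i][j]
    return v_hat


flat = [[50.0] * 16 for _ in range(16)]
v_hat = expected_dct(flat)
print(v_hat[0][0], v_hat[0][1])
```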

4.2 A local estimation of b

In addition, the proposed model also assumes that the scale parameter $b_{k,l}$ is different for each DCT coefficient. The estimation of this parameter, for each DCT coefficient, follows the approach used in the WS Jpeg method to locally estimate the variance; that is, for a coefficient $v_{i,j}$, the latter simply consists of the sample variance of the DCT coefficients of the same subband from neighbouring blocks:

$$ \widehat{\sigma}^{2}_{i,j} = \frac{1}{7} {\underset{(s,t)\neq(0,0)}{\sum_{s=-1}^{1}\sum_{t=-1}^{1}}} \left(v_{i+8s,j+8t} - \bar{v}_{i,j}\right)^{2}, $$

(17)

where $\bar {v}_{i,j}$ is the sample mean: $\frac {1}{8} {\underset {(s,t)\neq (0,0)}{\sum _{s=-1}^{1}\sum _{t=-1}^{1}}} v_{i\,+\,8s,j\,+\,8t}$. Let us recall that the maximum likelihood estimate (MLE) of the scale parameter of the Laplacian distribution from realisations $x_{1},\ldots,x_{N}$ is given by $\hat {b} = N^{-1} \sum _{n=1}^{N} | x_{n} - \mu |$. The local estimate of the scale parameter proposed in this paper is given by:

$$ \hat{b}_{i,j} = \frac{1}{8} {\underset{(s,t)\neq(0,0)}{\sum_{s=-1}^{1}\sum_{t=-1}^{1}}} \left| v_{i+8s,j+8t} - \hat{v}_{i+8s,j+8t} \right|, $$

((18))

where $\hat {v}_{i+8s,j+8t}$ is the estimate of the expectation of each DCT coefficient obtained with the denoising filter previously defined. As in the WS Jpeg algorithm, this approach raises the problem of scale parameter estimation for blocks located on the sides of the image. In the present paper, as in the WS Jpeg method, it is proposed not to use those blocks in the test.
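The two local estimators (17) and (18) can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the experiments: the function names and the representation of the DCT plane as a list of lists are hypothetical, and positions whose neighbouring blocks fall outside the image return None, mirroring the choice not to use side blocks.

```python
def neighbour_offsets():
    """Offsets (8s, 8t) of the eight neighbouring blocks, (0, 0) excluded."""
    return [(8 * s, 8 * t)
            for s in (-1, 0, 1) for t in (-1, 0, 1) if (s, t) != (0, 0)]


def is_interior(v, i, j):
    """True when all eight neighbouring blocks of position (i, j) exist."""
    return 8 <= i < len(v) - 8 and 8 <= j < len(v[0]) - 8


def local_variance(v, i, j):
    """Sample variance (17) of same-subband coefficients in neighbouring blocks."""
    if not is_interior(v, i, j):
        return None  # side block: not used in the test
    samples = [v[i + di][j + dj] for di, dj in neighbour_offsets()]
    mean = sum(samples) / 8.0
    return sum((x - mean) ** 2 for x in samples) / 7.0


def local_scale(v, v_hat, i, j):
    """Local scale (18): mean absolute deviation from the estimated expectation."""
    if not is_interior(v, i, j):
        return None  # side block: not used in the test
    return sum(abs(v[i + di][j + dj] - v_hat[i + di][j + dj])
               for di, dj in neighbour_offsets()) / 8.0
```

Here `v` holds the DCT coefficients and `v_hat` their estimated expectations, both indexed by pixel-grid position so that an offset of 8 reaches the same subband in the adjacent block.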

4.3 A channel selection to improve the method

Inspired by the channel selection algorithms (see [28,29]), it is proposed to improve our detector with a weighting factor (WF). In practice, WF is generated from the quantized and rounded ‘residual noise’, which is calculated by the following steps:

1. By decompressing the JPEG image, we obtain its intensity values in the spatial domain.
2. By using a denoising filter, we extract the raw ‘residual noise’ in the spatial domain.
3. By using the DCT, we transform the raw ‘residual noise’ from the spatial to the frequency domain.
4. By using the quantization table, we obtain the quantized ‘residual noise’.
5. By rounding the quantized ‘residual noise’ in the frequency domain, the quantized and rounded ‘residual noise’ is obtained.
6. If a quantized and rounded ‘residual noise’ value is zero, WF equals 0; otherwise, WF equals 1.
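The steps above can be sketched for a single decompressed 8×8 block. This is a minimal illustration under loud assumptions: a 3×3 mean filter stands in for the denoising filters cited in Section 4.1, the filter is applied to one block rather than to the whole image, and the function names and the flat quantization tables used in the example are hypothetical.

```python
import math


def mean_filter(img):
    """Stand-in denoiser (assumption): 3x3 mean with edge replication."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += img[yy][xx]
            out[y][x] = acc / 9.0
    return out


def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of one 8x8 block."""
    def c(u):
        return math.sqrt(0.5) if u == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * acc
    return out


def weighting_factor(pixels, qtable):
    """Steps 2 to 6 for one decompressed 8x8 block; returns an 8x8 map of 0/1."""
    denoised = mean_filter(pixels)
    residual = [[pixels[y][x] - denoised[y][x] for x in range(8)]
                for y in range(8)]
    coeffs = dct2_8x8(residual)  # residual noise in the frequency domain
    return [[1 if round(coeffs[u][v] / qtable[u][v]) != 0 else 0
             for v in range(8)] for u in range(8)]
```

A perfectly flat block yields a zero residual and therefore WF = 0 everywhere, whereas a block with a sharp local deviation keeps some positions with WF = 1, provided the quantization steps are small enough for the residual to survive rounding.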

Thus, based on our proposed WF, the ‘residual noise’ set is categorised into two subsets: the ‘non-zero’ subset and the ‘zero’ subset. To verify the effectiveness of our improved algorithm, ten exemplary images are randomly chosen, compressed to JPEG format with quality factor 70 and embedded with rate R=0.05. Also, all the images of the BossBase database [27] are used for computing the average value. Table 1 gives the resulting ratios, with the following annotations:

Cover channel selection ratio: denotes the ratio of the ‘non-zero’ subset to the ‘residual noise’ set of a cover image.
Stego channel selection ratio: denotes the ratio of the ‘non-zero’ subset to the ‘residual noise’ set of a stego image.
Cover DCT coefs. std: denotes the standard deviation of the ‘residual noise’ set from a cover image.
Stego DCT coefs. std: denotes the standard deviation of the ‘residual noise’ set from a stego image.
Cover JSteg selection ratio: denotes the ratio of the DCT coefficients used by JSteg in the ‘non-zero’ subset to the DCT coefficients used by JSteg in the ‘residual noise’ set from a cover image.
Stego JSteg selection ratio: denotes the ratio of the DCT coefficients used by JSteg in the ‘non-zero’ subset to the DCT coefficients used by JSteg in the ‘residual noise’ set from a stego image.
Cover and stego selection similarity: denotes the ratio of the same position in the ‘non-zero’ subset before and after embedding.

Table 1 Ratio (%) comparison before and after embedding

In our proposed statistical test, the number of coefficients selected for detection should remain nearly the same before and after embedding. As Table 1 illustrates, the cover channel selection ratio and the stego channel selection ratio are essentially the same, which shows that the proportion of coefficients used for the test is nearly unchanged by embedding. Similarly, comparing the cover DCT coefs. std with the stego DCT coefs. std allows us to verify our assumption that the embedding does not change much the statistical properties of the ‘residual noise’. In addition, those numbers also show that, after rejection of the content, the ‘residual noise’ standard deviation is very small compared to that of the original DCT coefficients (see also Figures 2 and 3), which thus permits a better detection of modifications due to JSteg embedding. The high value of the cover and stego selection similarity signifies that most of the selected ‘residual noise’ coefficients are at the same positions before and after embedding. The only notable difference lies between the cover JSteg selection ratio and the stego JSteg selection ratio. It should be noted that if all DCT coefficients used by JSteg were included in the ‘non-zero’ subset, the ratio would equal 100%. It is observed that only a few of the DCT coefficients used by the JSteg algorithm are included in the ‘non-zero’ subset. Nevertheless, after embedding, the stego JSteg selection ratio is much higher than the cover JSteg selection ratio. It can be assumed that by using a WF, more ‘residual noise’ from the embedding positions is counted. Besides, since the embedding positions cannot be known prior to embedding, the very low cover JSteg selection ratio is reasonable.

By investigating the ‘non-zero’ and ‘zero’ subsets, although we cannot capture all the embedding positions in the DCT domain, the selected coefficients are sufficient to detect JSteg steganography. Besides, none of the coefficients in the ‘zero’ subset is counted in our proposed test. On average, for a cover image of size 512×512, 0.63% of the coefficients are kept to compute the test; for a stego image, 0.64% of the coefficients are used. As the embedding rate is R=0.05, most of the DCT coefficients remain the same before and after embedding. Thus, it is not necessary to compute the LR values of these coefficients. Furthermore, the LR values of DCT coefficients carrying no embedded information would probably mask or disturb the LR values of DCT coefficients modified by JSteg embedding.

4.4 Design of proposed test

In Section 3, the framework of hypothesis testing theory has been presented assuming that distribution parameters are known for each DCT coefficient. To design a practical test, a usual solution consists of replacing the unknown parameter by its ML estimation. This leads to the construction of a generalised LRT. A similar construction is adopted in this paper, using the ad hoc estimators presented at the beginning of Section 4, instead of using the ML method to estimate the distribution parameters of each DCT coefficient. The proposed test is thus defined as:

$$ \widehat{\delta}(\mathbf{U})=\left\{ \begin{aligned} &\displaystyle \mathcal{H}_{0} \displaystyle\ \text{if}\ \widehat{\Lambda}(\mathbf{U}) = \sum_{k=1}^{K} \sum_{l=2}^{64} \widehat{\Lambda}_{cs}(u_{k,l}) < \widehat{\tau}, \\ &\displaystyle \mathcal{H}_{1} \displaystyle\ \text{if}\ \widehat{\Lambda}(\mathbf{U}) = \sum_{k=1}^{K} \sum_{l=2}^{64} \widehat{\Lambda}_{cs}(u_{k,l}) \geq \widehat{\tau}, \end{aligned}\right. $$

((19))

where the channel-selection decision statistic $\widehat {\Lambda }_{\textit {cs}}(u_{k,l}) = \widehat {\Lambda }(u_{k,l})\cdot w_{k,l}$ for a single DCT coefficient is given, and a weighting factor $w_{k,l}$ selects the DCT channel. Next, let us study $\widehat {\Lambda }(u_{k,l})$ to verify the effectiveness of our proposed test.

To verify our improvement over the Laplacian test (see [22]), it is proposed to consider the weighting factor $w_{k,l}$ as a constant equal to 1. The scale parameter $\widehat {b}$ is estimated by using MLE and the location parameter is ignored (see details in [22]). The LR is given by:

$$ \widehat{\Lambda}(u_{k,l}) = \log\left(1-\frac{R}{2} + \frac{R}{2} \exp\left[ \frac{\Delta}{\widehat{b}} \text{sign}(\Delta k) (k-\bar{k}) \right] \right). $$

((20))

The first improvement over the previous LR is the consideration of the location parameter $\widehat {\mu }_{k,l}$ (see Section 4.1). The new LR is given by:

$$ \widehat{\Lambda_{1}}(u_{k,l}) = \log\left(1-\frac{R}{2} + \frac{R}{2} \exp\left[ \frac{\Delta}{\widehat{b}} \text{sign}(\Delta k - \widehat{\mu}_{k,l}) (k-\bar{k}) \right] \right). $$

((21))

The second improvement is the local estimation of the scale parameter $\widehat {b}_{k,l}$ (see Section 4.2), while the location parameter is ignored. The LR is given by:

$$ \widehat{\Lambda_{2}}(u_{k,l}) = \log\left(1-\frac{R}{2} + \frac{R}{2} \exp\left[ \frac{\Delta}{\widehat{b}_{k,l}} \text{sign}(\Delta k) (k-\bar{k}) \right] \right). $$

((22))

The third improvement is to give up the assumption that DCT coefficients are i. i. d. The scale parameter $\widehat {b}_{k,l}$ and the location parameter $\widehat {\mu }_{k,l}$ of the distribution are estimated separately by using our proposed algorithms in Sections 4.1 and 4.2. The LR is given by:

$$ \widehat{\Lambda_{3}}(u_{k,l}) = \log\left(1-\frac{R}{2} + \frac{R}{2} \exp\left[ \frac{\Delta}{\widehat{b}_{k,l}} \text{sign}(\Delta k - \widehat{\mu}_{k,l}) (k-\bar{k}) \right] \right). $$

((23))

Moreover, it is proposed to explore the effectiveness of introducing a weighting factor $w_{k,l}$, which is defined as:

$$ w_{k,l}=\left\{ \begin{aligned} &\displaystyle 0 \;\;\;\;\text{if}\;\;\;\; \Delta k \,-\,\widehat{\mu}_{k,l} \in (-0.5, 0.5) \\ &\displaystyle 1 \;\;\;\;\text{otherwise}. \end{aligned}\right. $$

((24))

The last LR is obtained by multiplying (23) by $w_{k,l}$:

$$ \widehat{\Lambda}_{cs}(u_{k,l}) = \widehat{\Lambda_{3}}(u_{k,l}) w_{k,l}. $$

((25))
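The detectors (20) to (25) differ only in which estimates are plugged into one formula, which the following sketch makes explicit. It is an illustration under assumptions rather than the authors' implementation: the function names are hypothetical, $\bar{k}$ is taken to be k with its least significant bit flipped, and the constant term is written as $1-\frac{R}{2}$, the form that appears in the appendix derivation of (15).

```python
import math


def sign(x):
    """Sign of x as -1, 0 or 1."""
    return (x > 0) - (x < 0)


def lsb_flip(k):
    """k-bar: the integer k with its least significant bit flipped (assumed)."""
    return k ^ 1


def log_lr(k, delta, rate, b, mu=0.0):
    """Log-LR of one quantized coefficient k under the Laplacian model.

    (20): global b, mu = 0        (21): global b, local mu
    (22): local b,  mu = 0        (23): local b,  local mu
    """
    k_bar = lsb_flip(k)
    exponent = (delta / b) * sign(delta * k - mu) * (k - k_bar)
    return math.log(1.0 - rate / 2.0 + (rate / 2.0) * math.exp(exponent))


def weight(k, delta, mu_hat):
    """Weighting factor (24): 0 when delta*k - mu_hat lies in (-0.5, 0.5)."""
    return 0 if -0.5 < delta * k - mu_hat < 0.5 else 1


def log_lr_cs(k, delta, rate, b_hat, mu_hat):
    """Channel-selection statistic (25): (23) multiplied by (24)."""
    return log_lr(k, delta, rate, b_hat, mu_hat) * weight(k, delta, mu_hat)
```

For instance, `log_lr_cs(2, 1.0, 0.05, 1.0, 1.8)` is zero because the coefficient lies within 0.5 of its estimated expectation and is therefore discarded by the weighting factor.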

It should be noted that (20) is the algorithm from [22]. In Section 5, the specific comparison of the detectors is presented. In order to have a normalised decision statistic for the whole image, $\widehat {\Lambda }(\mathbf {U})$ is defined as:

$$ \begin{aligned} \widehat{\Lambda}(\mathbf{U}) &= \frac{1}{S_{L}} \sum_{k=1}^{K} \sum_{l=2}^{64} \widehat{\Lambda}_{cs}(u_{k,l}) - E_{\mathcal{H}_{0}}\left(\widehat{\mu}_{k,l}, \widehat{b}_{k,l}\right) \\ \text{with}\ {S_{L}^{2}} &= \sum_{k=1}^{K} \sum_{l=2}^{64} V_{\mathcal{H}_{0}}\left(\widehat{\mu}_{k,l}, \widehat{b}_{k,l}\right). \end{aligned} $$

((26))

4.5 Comparison with prior art

The WS Jpeg, as well as the WS for spatial domain, is based on the underlying assumption that the observations follow a Gaussian distribution. As recently shown [6,8], the WS implicitly assumes that the quantization step is negligible. Let us rewrite the LR test for JSteg detection based on a Gaussian distribution model of DCT coefficients. Let X be a random variable following a quantized Gaussian distribution. Exploiting the assumption that the quantization step is negligible compared to the noise standard deviation allows us to write:

$$\begin{array}{*{20}l} \mathbb{P}[X=k] &= \int_{\Delta(k-1/2)}^{\Delta(k+1/2)} \frac{1}{\sigma\sqrt{2\pi}} \!\exp\!\left(\! - \frac{(x-\mu)^{2}}{2\sigma^{2}} \!\right) dx \\ &\approx \frac{\Delta}{\sigma\sqrt{2\pi}} \exp\left(- \frac{(\Delta k-\mu)^{2}}{2\sigma^{2}} \right). \end{array} $$

((27))

Putting this expression of the pmf under hypothesis $\mathcal {H}_{0}$ into the LR (2), and assuming that the quantization step is negligible compared to the noise standard deviation, $\Delta \ll \sigma$, it is immediate to obtain the following expression of the LR under the assumption of Gaussian distribution of DCT coefficients:

$$ \begin{aligned} &\log\left(1-\frac{R}{2} + \frac{R}{2} \frac{\exp\left(-\frac{(\Delta \bar{k}-\mu)^{2}}{2\sigma^{2}} \right) }{ \exp\left(-\frac{(\Delta k-\mu)^{2}}{2\sigma^{2}} \right)} \right)\\ &\approx \underbrace{\frac{R\Delta}{2\sigma^{2}}}_{w_{\sigma}}\; \underbrace{(k - \bar{k})}_{\pm 1}\; (\Delta k - \mu) \end{aligned} $$

((28))

see details in Appendix C. This expression highlights the well-known fact that the WS consists in fact of three terms: 1) the term $w_{\sigma}$, which is a weight such that pixels or DCT coefficients with the highest variance have the smallest importance, 2) the term $(k - \bar {k}) = \pm 1$ according to the LSB of k and 3) the term $(\Delta k - \mu)$.

In comparison, the expression of the LR for a Laplacian distribution model (15), as well as the expression of the proposed test with estimates (21) can be approximated by (see details in Appendix B):

$$ \underbrace{\frac{R\Delta}{2b}}_{w_{b}}\; \underbrace{(k-\bar{k})}_{\pm 1}\; \text{sign}(\Delta k -\mu) $$

((29))

which is also made of three terms; the first two are roughly similar to the first two terms of the WS: 1) the term $w_{b}$ is a weight such that DCT coefficients with the highest ‘scale’ b have the smallest importance; note that the variance is proportional to $b^{2}$; 2) the term $(k - \bar {k}) = \pm 1$ according to the LSB of k. However, in the expression of the LR based on the Laplacian model, the term $(\Delta k - \mu)$ of the WS is replaced with its sign. This shows that the statistical tests based on the Laplacian model and based on the Gaussian model are essentially similar.
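The structural difference between (28) and (29) can be seen numerically. The sketch below evaluates both approximated statistics for one coefficient; the function names are hypothetical and $\bar{k}$ is assumed to be k with its least significant bit flipped.

```python
def sign(x):
    """Sign of x as -1, 0 or 1."""
    return (x > 0) - (x < 0)


def ws_statistic(k, delta, rate, mu, sigma):
    """Gaussian-model approximation (28): w_sigma * (+/-1) * (delta*k - mu)."""
    k_bar = k ^ 1  # assumed definition of k-bar: LSB of k flipped
    weight = rate * delta / (2.0 * sigma ** 2)
    return weight * (k - k_bar) * (delta * k - mu)


def laplacian_statistic(k, delta, rate, mu, b):
    """Laplacian-model approximation (29): w_b * (+/-1) * sign(delta*k - mu)."""
    k_bar = k ^ 1
    weight = rate * delta / (2.0 * b)
    return weight * (k - k_bar) * sign(delta * k - mu)
```

With μ=0, moving from k=5 to k=21 multiplies the WS statistic by 21/5, whereas the Laplacian statistic is unchanged; both always share the same sign.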

5 Numerical simulations

5.1 Results on simulated images

One of the main contributions of this paper is to show that the hypothesis testing theory can be applied in practice to design a statistical test with known statistical properties for JSteg steganalysis.

To verify the sharpness of the theoretically established results, we generate 1,000 sets of 4,000 random variables (a Monte Carlo simulation) following the Laplacian distribution, where R=0.05, μ=0 and b ranges from 1 to 10 with a step of 0.5. Then, the expectation and variance values are calculated empirically and theoretically. As shown in Figure 5, the empirically calculated moments are almost equal to the analytically established ones.
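A reduced version of such a check can be sketched with the standard library alone. The following is not the authors' simulation: it draws a single set of samples for one value of b, quantizes by rounding, applies the Laplacian log-LR to every coefficient (including 0 and 1), and obtains the reference moments by direct summation over the quantized pmf rather than from the closed-form expressions of Section 3. All names are hypothetical.

```python
import math
import random


def laplace_cdf(x, mu, b):
    """Laplacian cdf, see (31)."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def log_lr(k, delta, rate, mu, b):
    """Laplacian log-LR of one quantized value k (k-bar assumed to be k ^ 1)."""
    s = (delta * k > mu) - (delta * k < mu)
    exponent = delta / b * s * (k - (k ^ 1))
    return math.log(1.0 - rate / 2.0 + rate / 2.0 * math.exp(exponent))


def theoretical_moments(delta, rate, mu, b, k_max=400):
    """Mean and variance of the log-LR under H0, summed over the quantized pmf."""
    mean = second = 0.0
    for k in range(-k_max, k_max + 1):
        p = (laplace_cdf(delta * (k + 0.5), mu, b)
             - laplace_cdf(delta * (k - 0.5), mu, b))
        lam = log_lr(k, delta, rate, mu, b)
        mean += p * lam
        second += p * lam * lam
    return mean, second - mean ** 2


def empirical_moments(n, delta, rate, mu, b, seed=0):
    """Monte Carlo mean and variance of the log-LR for cover (H0) samples."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        # The difference of two exponential variables is Laplace(mu, b).
        x = mu + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
        k = math.floor(x / delta + 0.5)  # rounding quantizer
        values.append(log_lr(k, delta, rate, mu, b))
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var
```

Calling `theoretical_moments(1.0, 0.05, 0.0, 3.0)` and `empirical_moments(100000, 1.0, 0.05, 0.0, 3.0)` gives two pairs of moments that can be compared in the same way as the curves of Figure 5.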

Subsequently, to verify the effectiveness of the established LRT $\overline {\delta }^{\text {lr}}(\mathbf {U})$, again, a Monte Carlo simulation is performed by repeating 10,000 times using a vector 64×4,096 following the Laplacian distribution, in which the scale parameter is selected arbitrarily as 3 and the location parameter 0. Under the hypothesis $\mathcal H_{0}$ and $\mathcal H_{1}$, respectively, Figure 6 presents the comparison between empirical and theoretical distribution of $\overline {\Lambda }^{\text {lr}}(\mathbf {U})$. The results highlight the validity of the proposed test (10).

Figure 7 gives the comparison between the empirical and theoretical FAR $\alpha_{0}$ of the test (10). This particularly demonstrates that the two curves are very close. Figure 8 offers the receiver operating characteristic (ROC) comparison, that is the detection power $\beta _{\overline {\delta }^{\text {lr}}}$ as a function of the FAR $\alpha_{0}$, of both the empirical and the theoretically established results in (11) and (12).

5.2 Results on real images

Another contribution of this paper is to design the optimal test with estimated parameters to break JSteg algorithm in a practical case.

First, let us investigate our proposed detectors (21) to (23). A numerical simulation is performed over 1,000 images from BossBase [27] which have been compressed in JPEG with quality factor 70. The payload, or embedding rate, R is set at 0.05 for the JSteg algorithm. For a fair comparison with the detector from [22], we first show the improvement provided by the proposed model with $w_{k,l}=1$. As Figure 9a illustrates, all the proposed detectors outperform $\widehat {\Lambda }(u_{k,l})$ (20) proposed by [22]. Moreover, in the following investigation, it is proposed to use $\widehat {\Lambda }_{\textit {cs}}(u_{k,l})$ (25). The performance of this detector is then given on 1,000 simulated images in which a DCT subband is generated by strictly following the Laplacian distribution (see Figure 9b). A comparison with simulations of the LR test shows the loss of power due to the estimation of the expectation and scale parameters. It should be noted that among all the detectors proposed in this paper, $\widehat {\Lambda_{3}}(u_{k,l})$ (23) with $w_{k,l}$ (24) performs best. Thus, it is proposed to use it as our optimal steganalyser for competing with the state-of-the-art JSteg detectors. It should be emphasised that in Figure 9, the wavelet denoising filter [43] is used for estimating the location parameter $\widehat {\mu }_{k,l}$ (see Section 4.1).

To verify the relevance of the proposed methodology, it is proposed to compare the proposed statistical test with two other detectors. The first chosen competitor is the statistical test proposed in [22] as it is also based on a Laplacian model but does not take into account the distribution parameters as nuisance parameters; it considers that DCT coefficients are i. i. d., following a Laplacian distribution with zero mean. The comparison with this test is meaningful as it allows us to measure how much the detection performance is improved by removing the assumption that the DCT coefficients of each subband are i. i. d. The second chosen competitor is the WS [15] due to its similarity with the proposed statistical test, see details in Section 4.5.

For a large-scale verification, it is proposed to use the ‘break our steganographic system’ (BOSS) database, made of 10,000 grayscale images of size 512×512 pixels, used with payload R=0.05. Prior to our experiments, the images have been compressed in JPEG using the Linux command convert, which uses the standard quantization table. Note also that all the JSteg steganography was performed using a Matlab source code we developed based on Phil Sallee’s Jpeg Toolbox^f. Four denoising methods have been tested to estimate the expectation of each DCT coefficient, namely the K-SVD, the BM3D, the NL means and the wavelet denoising algorithms.

Figure 10 shows the detection performances obtained over the BOSS database compressed with quality factor 70. The detection performances are shown as ROC curves, that is, the detection power is plotted as a function of the false alarm probability. Figure 10a particularly emphasises that the statistical test based on the Laplacian model does not perform well, while the proposed methodology, which treats the Laplacian distribution parameters as nuisance parameters, largely improves the performance. Similarly, the WS detector achieves overall good detection performance. However, Figure 10b, which presents the same results using a logarithmic scale, shows that for low false alarm probabilities the performance of the WS significantly decreases. In contrast, the proposed statistical test still performs well.
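An empirical ROC curve of this kind can be computed from the decision statistics of cover and stego images as sketched below. The function name and the input lists are hypothetical; the decision rule (accept $\mathcal{H}_{1}$ when the statistic is at least the threshold) follows (19).

```python
def roc_points(stats_h0, stats_h1):
    """Empirical ROC: (false alarm probability, detection power) per threshold.

    stats_h0: decision statistics computed on cover images.
    stats_h1: decision statistics computed on stego images.
    """
    thresholds = sorted(set(stats_h0) | set(stats_h1), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every observed statistic
    for tau in thresholds:
        far = sum(s >= tau for s in stats_h0) / len(stats_h0)
        power = sum(s >= tau for s in stats_h1) / len(stats_h1)
        points.append((far, power))
    return points
```

Plotting the first coordinate on a logarithmic axis gives the view used to inspect the low false alarm regime.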

Among the four denoising algorithms that have been tested, the BM3D achieves the best performance, but it can be observed in Figure 10 that the performances obtained using the K-SVD and using the wavelet denoising methods are also very good. The performance of NL means method is comparable with the WS detector [15].

To extend the results previously presented, a similar test has been performed over the BOSS database using the quality factor 85. The detection performance obtained by the proposed test and by the competitors are presented in Figure 11. Again, this figure shows that based on the Laplacian model, the statistical test assuming that DCT coefficients of a subband are i. i. d. has an unsatisfactory performance. It can also be noted that even though the WS performs slightly better for low false alarm probability, compared to the results obtained with quality factor 70, it performs much worse than the proposed statistical test.

6 Conclusions

This paper aims at improving the optimal detection of data hidden within the DCT coefficients of JPEG images. Its main originality is that the usual Laplacian model is used as a statistical model of DCT coefficients, but as opposed to what is usually proposed, it is not assumed that all DCT coefficients from a subband are i. i. d. This leads us to consider the Laplacian distribution parameters, namely the expectation μ and the scale parameter b, as nuisance parameters: they have no interest for the detection of hidden data, but they must be carefully taken into account to design an efficient statistical test. Numerical results show that by estimating those nuisance parameters, the Laplacian model allows the design of an accurate statistical test which outperforms the WS detector. The comparison with the optimal detector based on the Laplacian model and on the assumption that all DCT coefficients of a subband are i. i. d. shows the relevance of the proposed approach.

A possible future work would be to apply this approach with a state-of-the-art statistical model of DCT coefficients, such as the generalized Gaussian or the generalized gamma model. This could provide improvements in the detection performance at the cost of a higher complexity.

7 Endnotes

^a In this paper, we assume, without loss of generality, that both width and height of the inspected image are multiples of 8.

^b In practice, DCT coefficients belong to set [−1024,…,1023], see [22].

^c Note that we refer to Lindeberg’s CLT, whose conditions are easily verified in our case, because the random variables are independent but are not i. i. d.

^d Image Processing On-Line journal is available at: http://www.ipol.im

^e Source codes are available at: http://dde.binghamton.edu

^f Phil Sallee’s Jpeg Toolbox is available at: http://dde.binghamton.edu/download/jpeg_toolbox.zip

8 Appendix

9 A Quantized Laplacian pmf

Let X be a Laplacian random variable with expectation μ and scale parameter b. Its pdf is thus (see (13)):

$$ f_{\mu,b}(x) = \frac{1}{2b} \exp\left(-\frac{| x - \mu |}{b} \right), $$

and a straightforward calculation shows that its cdf is given by:

$$\begin{array}{*{20}l} F_{\mu,b}(x) &= \frac{1}{2} + \frac{1}{2}\text{sign}(x-\mu) \left(1 - \exp\left(-\frac{| x - \mu |}{b}\right)\right), \end{array} $$

((30))

$$\begin{array}{*{20}l} &=\left\{ \begin{array}{ll} \frac{1}{2} \exp\left(\frac{x - \mu}{b} \right)& \text{if\;}\; x<\mu,\\ 1-\frac{1}{2} \exp\left(- \frac{x - \mu}{b} \right) & \text{if\;}\; x\geq\mu. \end{array} \right. \end{array} $$

((31))

Now consider the result of the quantization of this random variable, Y=⌊X/Δ⌋; it is immediate to establish the pmf of this random variable. Let us first consider the case Δ(k+1/2)<μ (due to the symmetry of the Laplacian pdf, the case Δ(k−1/2)>μ is treated similarly).

The pmf of Y is given by:

$$\begin{array}{*{20}l} \mathbb{P}[Y=k] &= \mathbb{P}\left[ \Delta(k-{1}/{2})\leq X <\Delta(k+{1}/{2}) \right],\\ &= \frac{1}{2} \exp\left(\frac{\Delta(k+{1}/{2}) - \mu}{b} \right)\\ &\quad- \frac{1}{2} \exp\left(\frac{\Delta(k-{1}/{2}) - \mu}{b} \right),\\ &= \frac{1}{2} \exp\left(\frac{\Delta k- \mu}{b} \right) \exp\left(\frac{\Delta}{2b} \right) \\ &\quad- \frac{1}{2} \exp\left(\frac{\Delta k- \mu}{b} \right) \exp\left(\frac{-\Delta}{2b} \right),\\ &= \exp\left(\frac{\Delta k- \mu}{b} \right) \sinh\left(\frac{\Delta}{2b} \right), \end{array} $$

Applying similar calculations for case Δ(k−1/2)>μ, one gets:

$$ \mathbb{P}[\!Y=k] = \exp\left(- \frac{|\Delta k- \mu|}{b} \right) \sinh\left(\frac{\Delta}{2b} \right), $$

((32))

which corresponds to the pmf given in Equation (14). The case Δ(k−1/2)<μ<Δ(k+1/2) is treated similarly.
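The closed form (32) can be checked numerically against the difference of cdf values (31) over each quantization bin. The sketch below uses hypothetical function names and arbitrary example parameters; the single bin that contains μ is left out of the comparison, since the derivation above is written for bins lying entirely on one side of μ.

```python
import math


def laplace_cdf(x, mu, b):
    """Laplacian cdf (31)."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def pmf_by_cdf(k, delta, mu, b):
    """P[Y = k] as the probability of the bin [delta(k-1/2), delta(k+1/2))."""
    return (laplace_cdf(delta * (k + 0.5), mu, b)
            - laplace_cdf(delta * (k - 0.5), mu, b))


def pmf_closed_form(k, delta, mu, b):
    """Closed form (32): exp(-|delta*k - mu| / b) * sinh(delta / (2b))."""
    return math.exp(-abs(delta * k - mu) / b) * math.sinh(delta / (2.0 * b))


def bin_contains_mu(k, delta, mu):
    """True when mu falls inside the k-th quantization bin."""
    return delta * (k - 0.5) <= mu < delta * (k + 0.5)
```

For example, with Δ=1, μ=0.3 and b=3, the two functions agree to numerical precision for every k except k=0, whose bin contains μ.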

10 B Log-likelihood ratio calculation

By putting the expression of quantized Laplacian pmf (32) into the expression of the LR (7), it is immediate to write:

$$ \Lambda^{\text{lr}}(u_{k,l}) \,=\, \log\!\left(\!1\,-\,\frac{R}{2} \,+\, \frac{R}{2} \frac{\exp\left(- \frac{|\Delta \bar{k}- \mu|}{b} \right) \sinh\left(\frac{\Delta}{2b} \right) }{ \exp\left(- \frac{|\Delta k- \mu|}{b} \right) \sinh\left(\frac{\Delta}{2b} \right)} \right). $$

Let us study the term:

$$ \begin{aligned} \frac{\exp\left(- \frac{|\Delta \bar{k}- \mu|}{b} \right) \sinh\left(\frac{\Delta}{2b} \right) }{ \exp\left(- \frac{|\Delta k- \mu|}{b} \right) \sinh\left(\frac{\Delta}{2b} \right)} &= \frac{\exp\left(- \frac{|\Delta \bar{k}- \mu|}{b} \right) }{ \exp\left(- \frac{|\Delta k- \mu|}{b} \right) },\\ &= \frac{\exp\left(- \frac{|\Delta k + \Delta (\bar{k}-k)- \mu|}{b} \right) }{ \exp\left(- \frac{|\Delta k- \mu|}{b} \right) },\\ &= \frac{\exp\left(- \frac{|\Delta k - \mu|}{b} \right) \exp\left(\frac{\Delta\text{sign}(\Delta k - \mu) (k-\bar{k}) }{b} \right) }{ \exp\left(- \frac{|\Delta k- \mu|}{b} \right)},\\ &= \exp\left(\frac{\Delta\text{sign}(\Delta k - \mu) (k-\bar{k}) }{b} \right). \end{aligned} $$

((33))

From this Equation (33), it is immediate to establish the expression (15):

$$ \log\left(1-\frac{R}{2} + \frac{R}{2} \exp\left(\frac{\Delta\text{sign}(\Delta k - \mu) (k-\bar{k}) }{b} \right) \right). $$

By using a Taylor expansion, $\Lambda^{\text{lr}}(u_{k,l})$ can be approximated by:

$$\begin{array}{*{20}l} \log&\left(\! 1\,-\,\frac{R}{2} \,+\, \frac{R}{2} \left(1 \,+\, \frac{\Delta\text{sign}(\Delta k - \mu) (k-\bar{k}) }{b} \right) \!\right) \\ &\approx \log\left(1+ \left(\frac{R\Delta\text{sign}(\Delta k - \mu) (k-\bar{k}) }{2b} \right) \right),\\ &\approx \frac{R\Delta}{2b}(k-\bar{k})\text{sign}(\Delta k - \mu). \end{array} $$

11 C LR based on the Gaussian model (WS)

Let X be a Gaussian random variable with expectation μ and variance σ ². Define the quantized Gaussian random variable as follows: Y=⌊X/Δ⌋; its pmf is given by $P_{\mu,\sigma } = \{p_{\mu,\sigma }[\!k]\}_{k=-\infty }^{\infty }$ with:

$$ \begin{aligned} p_{\mu,\sigma}[\!k] &= \mathbb{P}[\!Y=k] \\ &= \int_{\Delta(k-1/2)}^{\Delta(k+1/2)} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}} \right) dx. \end{aligned} $$

Assuming that the quantization step Δ is ‘small enough’ compared to the standard deviation, $\Delta \ll \sigma$, it holds true that [6,44]:

$$ p_{\mu,\sigma}[\!k] \approx \frac{\Delta}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(\Delta k-\mu)^{2}}{2\sigma^{2}} \right), $$

((34))

and

$$ p_{\mu,\sigma}[k] + p_{\mu,\sigma}[\bar{k}] \approx \frac{2\Delta}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\left(\Delta\frac{\left(k+\bar{k}\right)}{2}-\mu\right)^{2}}{2\sigma^{2}} \right). $$

((35))

Let us rewrite the LR for the detection of JSteg (7) as follows

$$\begin{array}{*{20}l} \Lambda^{\text{lr}}(u_{k,l}) &= \log\left(1-\frac{R}{2} + \frac{R}{2} \frac{p_{\mu,\sigma}[\bar{k}] }{ p_{\mu,\sigma}[k]} \right),\\ &= \log\left(1-R + \frac{R}{2} \frac{p_{\mu,\sigma}[\bar{k}] + p_{\mu,\sigma}[k] }{ p_{\mu,\sigma}[k]} \right). \end{array} $$

((36))

Using the expressions (34) and (35) let us study the following ratio:

$$ \begin{aligned} \frac{p_{\mu,\sigma}[\bar{k}] + p_{\mu,\sigma}[k] }{ p_{\mu,\sigma}[k]} &= 2\frac{\exp\left(-\frac{\left(\Delta\frac{(k+\bar{k})}{2}-\mu\right)^{2}}{2\sigma^{2}} \right) }{ \exp\left(-\frac{(\Delta k-\mu)^{2}}{2\sigma^{2}} \right)},\\ &= 2\frac{\exp\left(-\frac{\left(\Delta k -\mu + \Delta/2 (\bar{k}-k)\right)^{2}}{2\sigma^{2}} \right) }{ \exp\left(-\frac{(\Delta k-\mu)^{2}}{2\sigma^{2}} \right)}, \\ &= 2\frac{\exp\left(-\frac{(\Delta k -\mu)^{2}}{2\sigma^{2}} \right) \exp\left(\frac{\Delta (\Delta k -\mu)(k-\bar{k})}{2\sigma^{2}} \right) \exp\left(-\frac{\Delta^{2}}{8\sigma^{2}} \right) }{ \exp\left(-\frac{(\Delta k-\mu)^{2}}{2\sigma^{2}} \right)},\\ &= 2 \exp\left(\frac{\Delta (\Delta k -\mu)(k-\bar{k})}{2\sigma^{2}} \right) \exp\left(-\frac{\Delta^{2}}{8\sigma^{2}} \right). \end{aligned} $$

((37))

Putting the expression (37) into the expression of the log-LR (36) immediately gives:

$$ \Lambda^{\text{lr}}(u_{k,l}) = \log\left(1 + R\left(\exp\left(\frac{\Delta (\Delta k -\mu)(k-\bar{k})}{2\sigma^{2}} \right) \exp\left(-\frac{\Delta^{2}}{8\sigma^{2}} \right) - 1\right)\right) $$

((38))

from which a Taylor expansion around Δ/σ=0, justified by the assumption that $\Delta \ll \sigma$, finally gives the well-known expression of the WS:

$$ \Lambda^{\text{lr}}(u_{k,l}) \approx \frac{R\Delta}{2\sigma^{2}} \left(k - \bar{k}\right) (\Delta k - \mu) $$

((39))
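The quality of this approximation can be examined numerically by comparing the WS expression (39) with the log-LR computed from the integrated Gaussian pmf. The sketch below is an illustration with hypothetical function names and arbitrarily chosen parameters; $\bar{k}$ is assumed to be k with its least significant bit flipped.

```python
import math


def gauss_cdf(x, mu, sigma):
    """Gaussian cdf expressed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pmf(k, delta, mu, sigma):
    """Quantized Gaussian pmf: integral of the pdf over the k-th bin."""
    return (gauss_cdf(delta * (k + 0.5), mu, sigma)
            - gauss_cdf(delta * (k - 0.5), mu, sigma))


def exact_log_lr(k, delta, rate, mu, sigma):
    """Log-LR (36) evaluated with the integrated pmf, without approximation."""
    k_bar = k ^ 1  # assumed definition of k-bar: LSB of k flipped
    ratio = pmf(k_bar, delta, mu, sigma) / pmf(k, delta, mu, sigma)
    return math.log(1.0 - rate / 2.0 + rate / 2.0 * ratio)


def ws_approximation(k, delta, rate, mu, sigma):
    """WS expression (39)."""
    k_bar = k ^ 1
    return rate * delta / (2.0 * sigma ** 2) * (k - k_bar) * (delta * k - mu)
```

With Δ=1, σ=50, μ=0.3 and R=0.05, the two values for k=40 differ by well under 2% in relative terms, which is consistent with the assumption $\Delta \ll \sigma$; the gap widens as σ decreases towards Δ.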

References

R Böhme, Advanced Statistical Steganalysis (Springer, New York, 2010).
Book MATH Google Scholar
J Fridrich, Steganography in Digital Media: Principles, Algorithms, and Applications (Cambridge University Press, Cambridge, 2009).
Book Google Scholar
I Cox, M Miller, J Bloom, J Fridrich, T Kalker, Digital Watermarking and Steganography (Morgan Kaufmann, Burlington, 2007).
Google Scholar
J Fridrich, J Kodovskỳ, in Information Hiding. Steganalysis of LSB replacement using parity-aware features (Springer,New York2013), pp. 31–45.
Chapter Google Scholar
T Zhang, X Ping, in Proceedings of the 2003 ACM Symposium on Applied Computing. A fast and effective steganalytic technique against jsteg-like algorithms (ACM,New York2003), pp. 307–311.
Chapter Google Scholar
R Cogranne, C Zitzmann, L Fillatre, F Retraint, I Nikiforov, P Cornu, in Information Theory Proceedings (ISIT), 2011 IEEE International Symposium On. Statistical decision by using quantized observations (IEEE,New York2011), pp. 1210–1214.
Chapter Google Scholar
R Cogranne, C Zitzmann, L Fillatre, F Retraint, I Nikiforov, P Cornu, in Information Hiding. A cover image model for reliable steganalysis (Springer,New York2011), pp. 178–192.
Chapter Google Scholar
C Zitzmann, R Cogranne, F Retraint, I Nikiforov, L Fillatre, P Cornu, in Information Hiding. Statistical decision methods in hidden information detection (Springer,New York2011), pp. 163–177.
Chapter Google Scholar
J Fridrich, M Goljan, R Du, in Proceedings of the 2001 Workshop on Multimedia and Security: New Challenges. Reliable detection of LSB steganography in color and grayscale images (ACM,New York2001), pp. 27–30.
Chapter Google Scholar
S Dumitrescu, X Wu, Z Wang, Detection of LSB steganography via sample pair analysis. Signal Process. IEEE Trans. 51(7), 1995–2007 (2003).
Article Google Scholar
AD Ker, P Bas, RBöhme, R Cogranne, S Craver, T Filler, J Fridrich, T Pevnỳ, in Proceedings of the First ACM Workshop on Information Hiding and Multimedia Security. Moving steganography and steganalysis from the laboratory into the real world (ACM,New York2013), pp. 45–58.
Chapter Google Scholar
J Fridrich, M Goljan, in Electronic Imaging 2004. On estimation of secret message length in LSB steganography in spatial domain (International Society for Optics and PhotonicsWashington, 2004), pp. 23–34.
Google Scholar
O Dabeer, K Sullivan, U Madhow, S Chandrasekaran, B Manjunath, Detection of hiding in the least significant bit. Signal Process. IEEE Trans. 52(10), 3046–3058 (2004).
Article MathSciNet Google Scholar
AD Ker, R Böhme, in Electronic Imaging 2008. Revisiting weighted stego-image steganalysis (International Society for Optics and PhotonicsWashington, 2008), pp. 681905–681905.
Google Scholar
R Böhme, in Information Hiding. Weighted stego-image steganalysis for JPEG covers (Springer,New York2008), pp. 178–194.
Chapter Google Scholar
R Cogranne, C Zitzmann, F Retraint, IV Nikiforov, P Cornu, L Fillatre, A local adaptive model of natural images for almost optimal detection of hidden data. Signal Process. 100, 169–185 (2014).
Article Google Scholar
EY Lam, JW Goodman, A mathematical analysis of the DCT coefficient distributions for images. Image Process. IEEE Trans. 9(10), 1661–1666 (2000).
Article MATH Google Scholar
F Muller, Distribution shape of two-dimensional DCT coefficients of natural images. Electron. Lett. 29(22), 1935–1936 (1993).
Article Google Scholar
J-H Chang, JW Shin, NS Kim, SK Mitra, Image probability distribution based on generalized gamma function. Signal Process. Lett. IEEE. 12(4), 325–328 (2005).
Article Google Scholar
P Sallee, Model-based methods for steganography and steganalysis. Int. J. Image Graph. 5(01), 167–189 (2005).
R Böhme, A Westfeld, Breaking Cauchy model-based JPEG steganography with first order statistics. Computer Security–ESORICS, 125–140 (2004).
C Zitzmann, R Cogranne, L Fillatre, I Nikiforov, F Retraint, P Cornu, in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference On. Hidden information detection based on quantized Laplacian distribution (IEEE, New York, 2012), pp. 1793–1796.
J Kodovsky, J Fridrich, Quantitative structural steganalysis of Jsteg. Inform. Forensics Secur. IEEE Trans. 5(4), 681–693 (2010).
K Lee, A Westfeld, S Lee, in Digital Watermarking. Category attack for LSB steganalysis of JPEG images (Springer, New York, 2006), pp. 35–48.
S Lyu, H Farid, Steganalysis using higher-order image statistics. Inform. Forensics Secur. IEEE Trans. 1(1), 111–119 (2006).
T Pevny, J Fridrich, Multiclass detector of current steganographic methods for JPEG format. Inform. Forensics Secur. IEEE Trans. 3(4), 635–650 (2008).
P Bas, T Filler, T Pevný, in Information Hiding, 13th International Workshop, ed. by Filler T. Break our steganographic system: the ins and outs of organizing BOSS (IEEE, New York, 2011).
T Denemark, V Sedighi, V Holub, R Cogranne, J Fridrich, in IEEE Workshop on Information Forensic and Security, Atlanta, GA. Selection-channel-aware rich model for steganalysis of digital images (IEEE, New York, 2014).
W Tang, H Li, W Luo, J Huang, in Proceedings of the 2nd ACM Workshop on Information Hiding and Multimedia Security. Adaptive steganalysis against WOW embedding algorithm (ACM, New York, 2014), pp. 91–96.
T Qiao, C Zitzmann, R Cogranne, F Retraint, in Proceedings of the 2nd ACM Workshop on Information Hiding and Multimedia Security. Detection of Jsteg algorithm using hypothesis testing theory and a statistical model with nuisance parameters (ACM, New York, 2014), pp. 3–13.
T Qiao, C Zitzmann, R Cogranne, F Retraint, in IEEE International Conference on Image Processing (ICIP). Statistical detection of Jsteg steganography using hypothesis testing theory (IEEE, New York, 2014), pp. 5517–5521.
J Nakamura, Image Sensors and Signal Processing for Digital Still Cameras (CRC Press, Boca Raton, 2005).
WB Pennebaker, JL Mitchell, JPEG: Still Image Data Compression Standard (Springer, Germany, 1993).
D Upham, Jsteg steganographic algorithm (1999). Available on the internet: http://www.filewatcher.com/m/jpeg-jsteg-v4.diff.gz.8878-0.html.
EL Lehmann, JP Romano, Testing Statistical Hypotheses (Springer, Germany, 2006).
TH Thai, R Cogranne, F Retraint, in ICIP. Steganalysis of Jsteg algorithm based on a novel statistical model of quantized DCT coefficients (IEEE, New York, 2013), pp. 4427–4431.
T Thai, R Cogranne, F Retraint, Statistical model of quantized DCT coefficients: Application in the steganalysis of Jsteg algorithm. IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 23(5), 1980–1993 (2014).
R Cogranne, F Retraint, An asymptotically uniformly most powerful test for LSB matching detection. IEEE Trans. Information Forensics and Security. Publ. IEEE Signal Process. Soc. 8(3), 464–476 (2013).
TH Thai, F Retraint, R Cogranne, in Image Processing (ICIP) 2012 19th IEEE International Conference On. Statistical model of natural images (IEEE, New York, 2012), pp. 2525–2528.
M Lebrun, An analysis and implementation of the BM3D image denoising method. Image Processing On Line. 2, 175–213 (2012).
M Lebrun, A Leclaire, An implementation and detailed analysis of the K-SVD image denoising algorithm. Image Processing On Line. 2, 96–133 (2012).
A Buades, B Coll, J-M Morel, in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference On, 2. A non-local algorithm for image denoising (IEEE, New York, 2005), pp. 60–65.
J Lukas, J Fridrich, M Goljan, Digital camera identification from sensor pattern noise. Inform. Forensics Secur. IEEE Trans. 1(2), 205–214 (2006).
R Cogranne, F Retraint, C Zitzmann, I Nikiforov, L Fillatre, P Cornu, Hidden information detection using decision theory and quantized samples: Methodology, difficulties and results. Digital Signal Process. 24, 144–161 (2014).

Acknowledgements

The Matlab codes will be published upon paper acceptance. The work of FR, RC and CZ is funded by Troyes University of Technology (UTT) strategic program COLUMBO. The PhD thesis of TQ is funded by the China Scholarship Council (CSC) program.

Author information

Authors and Affiliations

ICD - LM2S - Université de Technologie de Troyes (UTT) - UMR STMR CNRS, 12 rue Marie Curie, Troyes, 10004, France
Tong Qiao, Florent Retraint, Rémi Cogranne & Cathel Zitzmann

Corresponding author

Correspondence to Tong Qiao.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

TQ carried out the main research of this work and drafted the manuscript. FR, RC and CZ helped revise the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Qiao, T., Retraint, F., Cogranne, R. et al. Steganalysis of JSteg algorithm using hypothesis testing theory. EURASIP J. on Info. Security 2015, 2 (2015). https://doi.org/10.1186/s13635-015-0019-7

Received: 10 November 2014
Accepted: 10 February 2015
Published: 13 March 2015
DOI: https://doi.org/10.1186/s13635-015-0019-7
