 Research Article
 Open Access
Reliable Steganalysis Using a Minimum Set of Samples and Features
 Yoan Miche^{1, 2},
 Patrick Bas^{2},
 Amaury Lendasse^{1},
 Christian Jutten^{2} and
 Olli Simula^{1}
https://doi.org/10.1155/2009/901381
© Yoan Miche et al. 2009
 Received: 1 August 2008
 Accepted: 13 March 2009
 Published: 27 April 2009
Abstract
This paper proposes to determine a sufficient number of images for reliable classification and to use feature selection to retain the most relevant features for reliable steganalysis. First, dimensionality issues in the context of classification are outlined, and the impact of the different parameters of a steganalysis scheme (the number of samples, the number of features, the steganography method, and the embedding rate) is studied. On one hand, Bootstrap simulations show that the standard deviation of the classification results can be very large if the training sets are too small; moreover, a minimum of 5000 images is needed in order to perform reliable steganalysis. On the other hand, we show how the feature selection process using the OP-ELM classifier makes it possible both to reduce the dimensionality of the data and to highlight weaknesses and advantages of the six most popular steganographic algorithms.
Keywords
 Feature Selection
 Extreme Learning Machine
 Feature Selection Technique
 Forward Algorithm
 Feature Selection Process
1. Introduction
Steganography has been known and used for a very long time, as a way to exchange information in an unnoticeable manner between parties, by embedding it in another, apparently innocuous, document.
Nowadays, steganographic techniques are mostly used on digital content. The online newspaper Wired News reported in one of its articles [1] on steganography that steganographic content had been found on web sites with very large image databases such as eBay. Provos and Honeyman [2] have somewhat refuted these claims by analyzing and classifying two million images from eBay and one million from the USENet network without finding any steganographic content embedded in these images. This could be due to many reasons, such as very low payloads, which make the steganographic images less detectable to steganalysis and hence more secure.
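The process is called secure if the distributions of the cover and stego media are identical, that is, if the Kullback-Leibler divergence between them is zero, $D_{\mathrm{KL}}(P_C \,\|\, P_S) = 0$; in this case the steganography is perfect, creating no statistical differences by the embedding of the message. Steganalysis would then be impossible.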
Fortunately, such high performance for a steganographic algorithm is hardly achievable when the payload (the embedded information) is of nonnegligible size; also, several schemes have weaknesses.
One way of measuring the payload is the embedding rate, defined as follows.
Let $\mathcal{A}$ be a steganographic algorithm, and let $\mathcal{C}$ be a cover medium. $\mathcal{A}$, by its design, claims that it can embed at most $T_{\max}$ information bits within $\mathcal{C}$; $T_{\max}$ is called the capacity of the medium and highly depends on the steganographic (stego) algorithm as well as the cover medium itself. The embedding rate $T$ is then defined as the part of $T_{\max}$ used by the information to embed.
For $T_i$ bits to embed in the cover medium, the embedding rate is then $T = T_i / T_{\max}$, usually expressed as a percentage. There are other ways to measure the payload and the relationship between the amount of information embedded and the cover medium, such as the number of bits per nonzero coefficient. However, the embedding rate has the advantage of taking into account the stego algorithm properties and is not directly based on the cover medium properties, since it uses the stego algorithm's estimation of the maximum capacity. Hence the embedding rate has been chosen for this analysis of stego schemes.
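As a small worked example of this definition, the following Python sketch computes an embedding rate from a message size and a claimed capacity. The function name `embedding_rate`, its argument names, and the numbers are illustrative assumptions; they do not come from any of the stego tools discussed here.

```python
def embedding_rate(message_bits: int, capacity_bits: int) -> float:
    """Return the embedding rate as a percentage of the claimed capacity."""
    if capacity_bits <= 0:
        raise ValueError("capacity must be positive")
    if not 0 <= message_bits <= capacity_bits:
        raise ValueError("message must fit within the capacity")
    return 100.0 * message_bits / capacity_bits


# Hypothetical numbers: an algorithm claiming 40,000 bits of capacity
# for a given cover image, used to carry a 4,000-bit message.
rate = embedding_rate(4_000, 40_000)
print(f"{rate:.1f}%")  # 10.0%
```

Because the capacity is the one claimed by the stego algorithm itself, the same message in the same cover image can correspond to different embedding rates for different algorithms.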
The emphasis in this paper is more specifically on the issues related to the increasing number of features, which are linked to universal steganalyzers. Indeed, the very first examples of LSB-based steganalysis made use of fewer than ten features, with an adapted and specific methodology for each stego algorithm. The idea of "universal steganalyzers" then became popular. Westfeld proposed a $\chi^2$-based method on the LSB of DCT coefficients [4]. Five years later, Fridrich [5] used a set of 23 features obtained by normalizations of a much larger set, whilst Lyu and Farid had already proposed their own feature set [6]. Some feature sets [7] also have a variable size depending on the DCT block sizes. Since then, an increasing number of research works use supervised learning-based classifiers in very high-dimensional spaces. The recent work of Shi et al. [8] is an example of an efficient result obtained by using features based on JPEG block differences modeled by Markov processes.
These new feature sets usually achieve better and better performance in terms of detection rate and make it possible to detect most stego algorithms for most embedding rates. However, there are some side effects to this growing number of features. It has been shown, for example, in [9] that the dimensionality of the feature space in which the considered classifier is trained can have a significant impact on its performance: too small a number of images relative to the dimensionality (the number of features) might lead to an improper training of the classifier and thus to results with a possibly high statistical variance.
This paper addresses a practical way of comparing steganalysis schemes in terms of performance reliability. Ker [10] proposed such a comparison by focusing on the pdf of one output of the classifier. Here, multiple parameters that can influence this performance are studied:
(1) the number of images used during the training of the classifier: how to determine a sufficient number of images for an efficient and reliable classification (meaning that final results have acceptable variance)?
(2) the number of features used: what are the sufficient and most relevant features for the actual classification problem?
(3) the steganographic method: is there an important influence of the stego algorithm on the general methodology?
(4) the embedding rate used: does the embedding rate used for the steganography modify the variance of the results and the retained best features (by feature selection)?
It can also be noted that larger images would lead to a smaller secure steganographic embedding rate (following a square-root law), but this phenomenon has already been studied by Filler et al. [11].
The next section details some of the problems related to the number of features used (dimensionality issues) and commonly encountered in steganalysis: (1) the empty space and the distance concentration phenomena, (2) the large variance of the results obtained by the classifier whenever the number of images used for training is not sufficient relative to the number of features, and finally, (3) the lack of interpretability of the results because of the high number of features. In order to address these issues, the methodology sketched in Figure 1 is used and detailed more thoroughly: a sufficient number of images relative to the number of features is first established so that the classifier's training is "reliable" in terms of variance of its results; then, using feature selection, the interpretability of the results is improved.
The methodology is finally tested in Section 4 with six different stego algorithms, each using four different embedding rates. Results are finally interpreted thanks to the most relevant selected features for each stego algorithm. A quantitative study of selected features combinations is then provided.
2. Dimensionality Issues and Methodology
The common term "curse of dimensionality" [12] refers to a wide range of problems related to a high number of features. Some of these dimensionality problems are considered in the following, in relation with the number of images and features.
2.1. Issues Related to the Number of Images
2.1.1. The Need for Data Samples
In order to illustrate this problem in a low-dimensional case, one can consider four samples in a two-dimensional space (corresponding to four images out of which two features have been extracted); the underlying structure leading to the distribution of these four samples seems impossible to infer, and so is the creation of a model for it. Any model claiming it can properly explain the distribution of these samples will behave erratically (because it will extrapolate) when a new sample is introduced. On the contrary, with hundreds to thousands of samples it becomes possible to see clusters and relationships between dimensions.
More generally, in order for any tool to be able to analyze and find a structure within the data, the number of needed samples grows exponentially with the dimensionality. Indeed, consider a $d$-dimensional unit-side hypercube; the number of samples needed to fill the Cartesian grid of step $\varepsilon$ inside of it grows as $O((1/\varepsilon)^d)$. Thus, using a common grid of step $1/10$ in dimension 10, it requires $10^{10}$ samples to fill the grid.
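The exponential growth can be made concrete with a few lines of Python. This is only an illustrative sketch: the helper name `grid_points` and the step of 1/10 are assumptions chosen for the example.

```python
def grid_points(step: float, dimension: int) -> int:
    """Number of cells of a Cartesian grid of the given step in [0, 1]^d."""
    cells_per_axis = round(1.0 / step)
    return cells_per_axis ** dimension


# One sample per grid cell: the requirement explodes with the dimension.
for d in (1, 2, 5, 10, 20):
    print(f"dimension {d:2d}: {grid_points(0.1, d):,} samples")
```

At dimension 10 the grid already has ten billion cells, far more than any image database used for steganalysis can provide.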
Fortunately, for a model to be built over some high-dimensional data, that data does not have to fill the whole space in the sense of the Cartesian grid. The required space to fill highly depends on the density to be estimated.
In practice, most data sets in steganalysis use at least tens of dimensions, implying a "needed" number of samples impossible to achieve: storing and processing such a number of images is currently impossible. As a consequence, the feature space is not filled with enough data samples to estimate the density with reliable accuracy. Classifiers built on such data have to extrapolate for the missing samples, which can give wrong or high-variance models: the results obtained can have a rather wide confidence interval and hence be statistically irrelevant. A claimed performance improvement for a specific classifier, steganalyzer, or steganographic scheme is rather meaningless when it is no larger than the variance of the results.
2.1.2. The Increasing Variance of the Results
The construction of a proper and reliable model for steganalysis is also related to the variance of the results it obtains. Only experimental results are provided to support this claim: with a low number of images relative to the number of features (e.g., a few hundred images for the feature set used here), the variance of the classifier's results (i.e., the variance of the detection probability) can be very large.
When the number of images increases, this variance decreases toward values low enough for feature-based steganalysis and performance comparisons. These claims are verified in the next section with the experiments.
2.1.3. Proposed Solution to the Lack of Images
Overall, these two problems lead to the same conclusion: the number of images has to be large relative to the dimensionality. Theory states that this number grows exponentially with the number of features, which is impossible to reach for feature-based steganalysis. Hence, the first step of the proposed methodology is to find a "sufficient" number of images for the number of features used, according to a criterion on the variance of the results.
A Bootstrap [13] is proposed for that task: the number of images used for the training of the classifier is increased, and for each different number of images, the variance of the results of the classifier is assessed. Once the variance of the classifier is below a certain threshold, a sufficient number of images have been found (regarding the classifier and the feature set used).
2.2. Issues Related to the Number of Features
2.2.1. The Empty Space Phenomenon
Consider samples drawn from a normal distribution $\mathcal{N}(0, 1)$ in dimension $d$. The probability density of finding a sample at distance $r$ from the mean of the distribution is $f(r, d) = \frac{r^{d-1} e^{-r^2/2}}{2^{d/2-1}\,\Gamma(d/2)}$, having its maximum at $r = \sqrt{d-1}$. Thus, when the dimension increases, samples get farther from the mean of the distribution. A direct consequence of this is that, for the previously mentioned hypercube in dimension $d$, the "center" will tend to be empty, since samples concentrate in the borders and corners of the cube.
Therefore, whatever model is used in such a feature space will be trained on scattered samples which do not fill the feature space at all. The model will then not be suitable for any sample falling in an area of the space about which the classifier had no information during training. It will have to extrapolate its behavior for these empty areas and will have unstable performance.
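The empty space phenomenon can be observed numerically. The sketch below assumes samples drawn from a standard normal distribution, for which the density of the distance to the mean peaks at $\sqrt{d-1}$; the function name `mean_distance_to_origin` and the sample sizes are illustrative choices only.

```python
import math
import random


def mean_distance_to_origin(dimension: int, n_samples: int, seed: int = 0) -> float:
    """Average Euclidean norm of standard normal samples in the given dimension."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        squared_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dimension))
        total += math.sqrt(squared_norm)
    return total / n_samples


for d in (2, 10, 50, 200):
    observed = mean_distance_to_origin(d, n_samples=2000)
    mode = math.sqrt(d - 1)  # location of the maximum of the radial density
    print(f"d={d:4d}  mean distance={observed:6.2f}  sqrt(d-1)={mode:6.2f}")
```

The average distance to the mean grows with the dimension and stays close to $\sqrt{d-1}$, so the region around the mean, where the density is highest, contains almost no samples.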
2.2.2. Lack of Interpretability for Possible "Reverse Engineering"
Interpretability (and its applications) is an important motivation for feature selection and dimensionality reduction: high performance can indeed be reached using the whole feature set used in this paper for classification. However, if we are looking for the weaknesses and the reasons why these features react vividly to a specific algorithm, this seems rather impossible on such a large set.
Reducing the required number of features to a small amount through feature selection makes it easier to understand why a steganographic model is weak on the particular details highlighted by the selected features. Such analysis is performed in Section 4.3 for all six steganographic algorithms.
2.2.3. Proposed Solution to the High Number of Features
These two issues motivate the feature selection process: if one can reduce the number of features (and hence the dimensionality), the empty space phenomenon will have a reduced impact on the classifier used. Also, the set of features obtained by the feature selection process will give insights on the stego scheme and its possible weaknesses.
For this matter, a classical feature selection technique has been used as the second step of the proposed methodology.
The following methodology is different from the one presented previously in [15, 16]. Indeed, in this article, the goal is set toward statistically reliable results. Also, feature selection has the advantage of reducing the dimensionality of the data (the number of features), making the classifier's training much easier. The interpretation of the selected features is also an important advantage (compared to having only the classifier's performance) in that it gives insights on the weaknesses of the stego algorithm.
3. Methodology for Benchmarking of Steganographic Schemes
Addressed Problems
The number of data points to be used for building a model and for classification is clearly an issue: in practice, how many points are needed in order to obtain accurate results, meaning results with a small standard deviation?
Reduction of complexity is the other main concern addressed in this framework. Once the number of points to be used for classification has been selected, and given the initial dimensionality of the feature set, two main steps remain.
(i) Choosing the feature selection technique. Since analysis and computation can hardly be done on the whole set of features, the technique used to reduce the dimensionality has to be selected.
(ii) Building a classifier. This implies choosing it, selecting its parameters, training, and validating the chosen model.
3.1. Presentation of the Classifier Used: OP-ELM
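The OP-ELM builds on the Extreme Learning Machine (ELM), a single-hidden-layer feedforward neural network whose output for the $j$th input is $\sum_{i=1}^{N} \beta_i f(w_i \cdot x_j + b_i)$, $1 \le j \le M$, where $M$ is the number of inputs (number of images in our case), and $N$ is the number of neurons in the hidden layer; $w_i$ and $b_i$ are the input weights and bias of the $i$th neuron, $\beta_i$ its output weight, and $f$ the activation function. In the case of steganalysis as performed in this article, $x_j$ denotes the feature vector corresponding to image $j$, while $y_j$ is the corresponding class of the image (i.e., stego or cover).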
As said, the novelty introduced by the ELM is to initialize the weights and biases randomly. OP-ELM, in comparison to ELM, brings a greater robustness to data with possibly dependent/correlated features. Also, the use of other activation functions in the neural network makes it possible to use OP-ELM in cases where, for example, linear components have an important contribution in the classifier's model.
The validation step of this classifier is performed using classical Leave-One-Out cross-validation, much more precise than a $k$-fold cross-validation and hence not requiring any test step [13]. It has been shown in many experiments [17, 18] that the OP-ELM classifier has results very close to those of a Support Vector Machine (SVM) while requiring much smaller computational times.
3.2. Determination of a Sufficient Number of Images
A proper number of images, regarding the number of features, has to be determined. Since theoretical values for that number are not reachable, a sufficient number regarding a low enough value of the variance of the results is taken instead (standard deviation will be used instead of variance, in the following).
The OP-ELM classifier is hence used along with a Bootstrap algorithm [13]: a subset of the complete data set is randomly drawn (with possible repetitions), and the classifier is trained with that specific subset. This process is repeated over many random drawings of the subset to obtain a statistically reliable estimation of the standard deviation of the results. The size of the subset drawn from the complete data set is then increased, and the iterations are repeated for this new subset size.
The criterion to stop this process is a threshold on the value of the standard deviation of the results. Once the standard deviation of the results gets lower than this threshold, it is decided that the current subset size is sufficient. This size is then used for the rest of the experiments as a sufficient number of images regarding the number of features in the feature set.
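The procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function `sufficient_subset_size`, the candidate sizes, the number of repetitions, and the threshold are all hypothetical, and the "classifier" is replaced by a stand-in that simply averages per-image correctness flags.

```python
import random
import statistics
from typing import Callable, Optional, Sequence


def sufficient_subset_size(
    data: Sequence,
    evaluate: Callable[[Sequence], float],
    sizes: Sequence[int],
    repetitions: int,
    threshold: float,
    seed: int = 0,
) -> Optional[int]:
    """Return the first subset size whose bootstrap std falls below threshold."""
    rng = random.Random(seed)
    for size in sizes:
        scores = []
        for _ in range(repetitions):
            subset = rng.choices(data, k=size)  # drawn with replacement
            scores.append(evaluate(subset))
        if statistics.stdev(scores) < threshold:
            return size
    return None  # no tested size was stable enough


# Stand-in for "train the classifier and measure its detection rate":
# each item is 1 if a hypothetical classifier gets that image right, else 0.
outcomes = [1] * 8000 + [0] * 2000


def detection_rate(subset: Sequence[int]) -> float:
    return sum(subset) / len(subset)


size = sufficient_subset_size(
    outcomes, detection_rate, sizes=[100, 400, 1600, 6400],
    repetitions=100, threshold=0.014,
)
print(size)
```

In a real experiment, `evaluate` would train and validate the classifier on the drawn subset, which is where almost all of the computational cost lies; the stopping rule itself stays the same.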
3.3. Dimensionality Reduction: Feature Selection by Forward with OP-ELM
Given the sufficient number of images for reliable training of the classifier, feature selection can be performed. The second step of the methodology, a Forward algorithm with OP-ELM (Figure 3), is used for that purpose.
3.3.1. The Forward Algorithm
The forward selection algorithm is a greedy algorithm [20]; it selects the dimensions one by one, trying to find the one that combines best with the already selected ones. The algorithm is detailed in Algorithm 1 (with $x^i$ denoting the $i$th dimension of the data set).
Algorithm 1: Forward.
$R = \{x^i,\ i \in [1, d]\}$
$S = \emptyset$
while $R \neq \emptyset$ do
    for $x^j \in R$ do
        Evaluate performance with $S \cup x^j$
    end for
    Set $S = S \cup \{x^k\}$, $R = R - x^k$, with $x^k$ the dimension
    giving the best result in the loop
end while
Algorithm 1 requires $d(d+1)/2$ performance evaluations to terminate for $d$ dimensions (to be compared to the $2^d - 1$ evaluations of an exhaustive search), which might reach the computational limits, depending on the number of dimensions and the time needed to evaluate the performance with one set. With the OP-ELM as a classifier, computational times for the Forward selection are not much of an issue.
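A compact Python sketch of the Forward ranking is given below. The scoring function is a toy stand-in (an assumption made for illustration, not a trained classifier), and the names `forward_selection`, `toy_score`, and `usefulness` are hypothetical.

```python
from typing import Callable, FrozenSet, List


def forward_selection(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
) -> List[int]:
    """Greedy forward ranking: add, at each step, the feature that scores best."""
    remaining = set(range(n_features))
    selected: List[int] = []
    while remaining:
        best = max(
            sorted(remaining),
            key=lambda j: score(frozenset(selected + [j])),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy scoring function (an assumption for illustration, not a classifier):
# each feature has a fixed usefulness, and features 0 and 1 are redundant.
evaluations: List[FrozenSet[int]] = []
usefulness = {0: 0.30, 1: 0.29, 2: 0.20, 3: 0.05}


def toy_score(subset: FrozenSet[int]) -> float:
    evaluations.append(subset)  # keep track of every evaluated subset
    total = sum(usefulness[j] for j in subset)
    if 0 in subset and 1 in subset:
        total -= 0.25  # redundancy penalty
    return total


ranking = forward_selection(4, toy_score)
print(ranking, len(evaluations))  # [0, 2, 3, 1] 10
```

Feature 1 is the second best feature on its own but is ranked last, because it adds little once feature 0 has been selected. The ten evaluations match the count of 4 + 3 + 2 + 1 subsets tested for four features.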
Even if its capacity to isolate efficient features is clear, the Forward technique has some drawbacks. First, if two features give good results when they are put together but poor results when only one of them is selected, Forward might not take them into account in the selection process.
Second, it does not allow "going back" in the process: if performance decreases along the selection process and the addition of another feature makes it increase again, alternative combinations of the previously selected features with this last one cannot be explored anymore.
The Forward selection is probably not the best possible feature selection technique, and recent contributions to these techniques, such as Sequential Floating Forward Selection (SFFS) [21] and improvements of it [22], have shown that the number of computations required for feature selection can be reduced drastically. Nevertheless, feature selection using Forward has shown very good results and seems to perform well on the feature set used in this paper. It is not used here with the goal of obtaining the best possible combination of features but rather to reduce the dimensionality and obtain some meaning out of the selected features. Improvements of this methodology could make use of such more efficient feature selection techniques.
3.4. General Methodology
To summarize, the general methodology in Figure 3 first uses a Bootstrap on varying subset sizes to obtain a sufficient subset size and statistically reliable classifier results regarding the number of features used. With this number of images, feature selection is performed using a Forward selection algorithm; this makes it possible to highlight possible weaknesses of the stego algorithm.
This methodology has been applied to six popular stego algorithms for testing. Experiments and results as well as a discussion on the analysis of the selected features are given in the next section.
4. Experiments and Results
4.1. Experiments Setup
4.1.1. Steganographic Algorithms Used
Six different steganographic algorithms have been used: F5 [23], Model-Based (MBSteg) [24], MMx [25] (in these experiments, MM3 has been used), JP Hide and Seek [26], OutGuess [27], and StegHide [28]; all of them with four different embedding rates: 5%, 10%, 15%, and 20%.
4.1.2. Generation of Image Database
The image base consisted of images from the BOWS2 Challenge [29] database (hosted by Westfeld [30]). These images are PGM greyscale (also available in color).
The steganographic algorithms and the proposed methodology for dimensionality reduction and steganalysis are only performed on these images. It can also be noted that depending on image complexity, as studied in [31], local discrepancies might be observed (a classically trained steganalyzer might have troubles for such images), but on a large enough base of images, this behavior will not be visible.
4.1.3. Extraction of the Features
In the end, the whole set of images is separated into two equal parts: one is kept as untouched cover while the other one is embedded with each of the six steganographic algorithms at four different embedding rates: 5%, 10%, 15%, and 20%. Fridrich's DCT features [32] have been used for the steganalysis.
4.2. Results
Results are presented following the methodology steps. A discussion of the selected features and their possible interpretation is developed afterward. In the following, the term "detection rate" stands for the performance of the classifier on a scale from 0 to 100% of classification rate. It is a measure of the performance instead of a measure of error.
4.2.1. Determination of Sufficient Number of Samples
It can be seen in Figure 5 that the standard deviation behaves as expected when increasing the number of images for the cases of JPHS, MBSteg, MMx, OutGuess, and StegHide: its value decreases and tends to fall below 1% of the best performance when the number of images is 5000 (even if for MBSteg at one embedding rate it remains slightly above that value). This number of samples is kept as the reference and sufficient number. Another important point is that with a very low number of images, the standard deviation is a large fraction of the average classifier's performance, meaning that results computed with a small number of images come with a wide confidence interval. While the plots decrease very quickly when increasing the number of images, values of the standard deviation remain high over a large part of the range; these results have to take into account the embedding rate, which tends to make the standard deviation higher as it decreases.
The final "sufficient" number of samples retained for the second step of the methodology (the feature selection) is 5000, for two reasons: first, the computational times are acceptable for the following computations (feature selection step with training of the classifier for each step); second, the standard deviation is small enough to consider the final classification results reliable (the worst case being MBSteg).
4.2.2. Forward Feature Selection
Features have first been ranked, using the Forward feature selection algorithm, and detection rates are plotted with increasing number of features (using the ranking provided by the Forward selection) on Figure 6.
The six analyzed stego algorithms give rather different results.
(i) F5 reaches the maximum performance very quickly for all embedding rates: only a few features contribute to the overall detection rate.
(ii) JPHS reaches a plateau in performance (within the standard deviation) for all embedding rates with a reduced number of features and remains around that performance.
(iii) OutGuess shows the same kind of plateau, and performance no longer increases above that number of features (still within the standard deviation of the results).
(iv) StegHide can be considered to have reached the maximum result (within the standard deviation interval) with a reduced number of features.
(v) In the MM3 case, performances for embedding rates 10%, 15%, and 20% are very similar, as are the selected features, and they stabilize once enough features are used. The difference for the 5% case is most likely due to matrix embedding, which makes detection harder when the payload is small.
(vi) For MBSteg, performances stabilize at a number of features that depends on the embedding rate, while for one embedding rate the classifier's performance keeps increasing with the addition of features.
Interestingly, the features retained by the Forward selection for each embedding rate differ slightly within one steganographic algorithm. Details about the features ranked as first by the Forward algorithm are discussed afterward.
4.3. Discussion
First, the global performances, when using the reduced and sufficient feature sets mentioned in the results section above, are assessed. Note that feature selection for performing reverse engineering of a steganographic algorithm is theoretically efficient only if the features are carrying different information (if two features represent the same information, the feature selection will select only one of them).
4.3.1. Reduced Features Sets
Performances (detection rate, in %) of OP-ELM with Leave-One-Out validation for the best feature set, along with the size of the reduced feature set (Number), for each embedding rate. Performances using the reduced set are within the 1% range of standard deviation of the best results. The size of the set has been determined to be the smallest possible one giving this performance.
Algorithm   5%     Number   10%    Number   15%    Number   20%    Number
F5          73.3   46       83.9   38       90.5   33       96.3   15
JPHS        90.7   41       92.1   21       93.7   41       97.3   25
MBSteg      63.3   57       70.9   93       83.5   73       88.5   69
MM3         78.0   81       86.2   49       86.6   57       86.6   73
OutGuess    81.2   65       93.2   49       98.8   33       100.0  29
StegHide    82.3   149      91.2   89       96.4   73       99     73
It should be noted that since the aim of the feature selection is to reduce the feature set as much as possible while keeping the same overall performance, the performance with the lowest possible number of features is expected to be slightly below the "maximum" one, within the standard deviation interval.
It remains possible, for the studied algorithms, as Figure 6 shows, to find a higher number of features for which the performance is closer or equal to the maximum one, even though this is very disputable considering the maximal standard deviation interval when using 5000 images. But this is not the goal of the feature selection step of the methodology.
4.4. Feature Sets Analysis for Reverse Engineering
Common feature sets have been selected according to the following rule: take the first ten features (in the order ranked by the Forward algorithm) that are common to the feature sets obtained for every embedding rate (within one algorithm). It is hoped that through this selection the obtained features will be generic regarding the embedding rate.
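One plausible reading of this rule is sketched below in Python: keep the features present in the ranking of every embedding rate, order them by their average rank, and take the first ten. Averaging the ranks is an assumption made to match the "average rank" reported with the common feature sets; the function name `common_features` and the feature names are placeholders, not results from the experiments.

```python
from typing import Dict, List, Tuple


def common_features(
    rankings: Dict[str, List[str]], k: int = 10
) -> List[Tuple[str, float]]:
    """Features present in every ranking, ordered by average rank (1-based)."""
    lists = list(rankings.values())
    shared = set(lists[0]).intersection(*lists[1:])
    average_rank = {
        name: sum(r.index(name) + 1 for r in lists) / len(lists)
        for name in shared
    }
    ordered = sorted(average_rank.items(), key=lambda item: (item[1], item[0]))
    return ordered[:k]


# Hypothetical Forward rankings for one algorithm at three embedding rates;
# the feature names are placeholders, not results from the paper.
rankings = {
    "5%": ["f3", "f1", "f7", "f2", "f9"],
    "10%": ["f1", "f3", "f2", "f8", "f7"],
    "15%": ["f1", "f2", "f3", "f7", "f4"],
}
for name, rank in common_features(rankings, k=3):
    print(f"{name}: average rank {rank:.2f}")
```

Features that appear at only some embedding rates (here `f4`, `f8`, and `f9`) are discarded, which is what makes the retained set generic with respect to the embedding rate.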
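The 23 features previously detailed.
Functional/Feature                        Functional F
Global histogram                          $H / \|H\|$
Individual histogram for 5 DCT modes      $h^{21}/\|h^{21}\|$, $h^{12}/\|h^{12}\|$, $\ldots$
Dual histogram for 11 DCT values          $g^{-5}/\|g^{-5}\|$, $\ldots$, $g^{5}/\|g^{5}\|$
Variation                                 $V$
$L_1$ and $L_2$ blockiness                $B_1$, $B_2$
Co-occurrence                             $N_{00}$, $N_{01}$, $N_{11}$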
This set of features is expanded up to a set of 193, by removing the norm used previously and keeping all the values of the matrices and vectors. This results in the following feature set.
(i) A global histogram of 11 dimensions.
(ii) 5 low-frequency DCT histograms, each of 11 dimensions.
(iii) 11 dual histograms, each of 9 dimensions.
(iv) Variation between blocks, of dimension 1.
(v) 2 blockinesses, each of dimension 1.
(vi) Co-occurrence matrix of dimensions $5 \times 5$.
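The size of the expanded set can be checked by summing the dimensions of each family in the list above. In the short Python check below, the co-occurrence matrix is assumed to be 5 x 5, as in the extended DCT feature set; the dictionary keys are descriptive labels only.

```python
# (number of blocks, dimensions per block) for each feature family.
feature_families = {
    "global histogram": (1, 11),
    "low-frequency DCT histograms": (5, 11),
    "dual histograms": (11, 9),
    "variation": (1, 1),
    "blockiness": (2, 1),
    "co-occurrence matrix": (1, 5 * 5),  # assumed 5 x 5
}

total = sum(count * size for count, size in feature_families.values())
print(total)  # 193
```

The dual histograms alone account for 99 of these dimensions, which gives an idea of how quickly the dimensionality grows once the norms are no longer applied.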
The following is a discussion on the selected features for each steganographic algorithm.
Table 3: Common feature set for F5, with the average rank of each feature (in parentheses).
(4)  (8)  (12)  (13)  (19)  (21)  (22)  (26)  (31)  (55)
Table 4: Common feature set for MM3, with the average rank of each feature (in parentheses).
(1)  (3)  (7)  (22)  (22)  (35)  (40)  (41)  (42)  (49)
Table 5: Common feature set for JPHS, with the average rank of each feature (in parentheses).
(1)  (25)  (26)  (30)  (30)  (34)  (52)  (61)  (61)  (65)
Table 6: Common feature set for MBSteg, with the average rank of each feature (in parentheses).
(4)  (6)  (10)  (10)  (24)  (27)  (31)  (32)  (36)  (50)
Table 7: Common feature set for OutGuess, with the average rank of each feature (in parentheses).
(3)  (3)  (7)  (8)  (12)  (14)  (23)  (31)  (41)  (45)
Table 8: Common feature set for StegHide, with the average rank of each feature (in parentheses).
(6)  (22)  (22)  (25)  (27)  (28)  (45)  (46)  (47)  (54)
4.4.1. F5
F5 (Table 3) is rather sensitive to both blockiness detections and, interestingly, is the only one of the six tested algorithms to be sensitive to the variation $V$. As for other algorithms, co-occurrence coefficients are triggered.
4.4.2. MM3
MM3 (Table 4) tends to be sensitive to global histogram features as well as DCT histograms, which are not preserved.
4.4.3. JPHS
JPHS (Table 5) seems not to preserve the DCT coefficient histograms. Also, the dual histograms react vividly for center values and extreme ones.
4.4.4. MBSteg
The features used (Table 6) include several global histogram values, which happens only because of the calibration in the feature extraction process. MBSteg preserves the coefficients' histograms but does not take into account a possible calibration. Hence, the unpreserved histograms are due to the calibration process in the feature extraction: information leaks through the calibration process. Also, co-occurrence values are used, which is a sign that MBSteg does not preserve low and high frequencies.
4.4.5. OutGuess
Co-occurrence values are mostly used in the feature set for OutGuess (Table 7) and are a clear weak point. The calibration process has also been of importance, since the global histograms of extreme values have been taken into account.
4.4.6. StegHide
For StegHide (Table 8), blockiness and co-occurrence values are mostly used, again for low and high frequencies.
From a general point of view, it can be seen that most of the analyzed algorithms are sensitive to statistics of low-pass-calibrated DCT coefficients. This is not surprising since these coefficients contain a large part of the information of a natural image; their associated densities are likely to be modified by the embedding process.
5. Conclusions
This paper has presented a methodology for the estimation of a sufficient number of images for a specific feature set using the standard deviation of the detection rate obtained by the classifier as a criterion (a Bootstrap technique is used for that purpose); the general methodology presented can nonetheless be extended and applied to other feature sets. The second step of the methodology aims at reducing the dimensionality of the data set, by selecting the most relevant features, according to a Forward selection algorithm; along with the positive effects of a lower dimensionality, analysis of the selected features is possible and gives insights on the steganographic algorithm studied.
Three conclusions can be drawn from the methodology and experiments in this paper.
(i) Results on standard deviation for almost all studied steganographic algorithms have shown that feature-based steganalysis is reliable and accurate only if a sufficient number of images is used for the actual training of the classifier. Indeed, from most of the results obtained concerning standard deviation values (and therefore statistical stability of the results), it is rather irrelevant to claim an increase in detection performance that is no larger than the standard deviation of these same results.
(ii) Through the second step of the methodology, the required number of features for steganalysis can be decreased. This has three main advantages: (a) performances remain the same if the reduced feature set is properly constructed; (b) the selected features from the reduced set are relevant and meaningful (the selected set can possibly vary, according to the feature selection technique used) and make reverse engineering possible; (c) the weaknesses of the stego algorithm also appear from the selection; this can lead, for example, to improvements of the stego algorithm.
(iii) The analysis of the reduced common feature sets obtained between embedding rates of the same stego algorithm shows that the algorithms are sensitive to roughly the same features, as a basis. However, when embedding rates get very low, or for very efficient algorithms, some very specific features appear.
Hence, the first step of the methodology is a requirement not only for any new stego algorithm but also for any new feature set or steganalyzer whose performance is to be reported: a sufficient number of images for the stego algorithm and the steganalyzer used to test it has to be assessed in order to obtain stable results (i.e., with a standard deviation small enough to make the comparison with current state-of-the-art techniques meaningful).
Also, the second step of the methodology yields the most relevant features, which makes possible a further analysis of the stego algorithm considered, in addition to the detection rate obtained by the steganalyzer.
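The second step can be sketched in the same spirit. The code below is a generic greedy Forward selection wrapped around a nearest-centroid classifier, on synthetic data where only two features (indices 2 and 5, chosen arbitrarily) carry class information. The classifier, the data, and the function names are assumptions made for illustration, not the feature set or the classifier used in the experiments.

```python
import random


def make_data(rng, n, informative, n_features, shift=2.0):
    """Synthetic two-class data; only the 'informative' features differ between classes."""
    rows, labels = [], []
    for i in range(n):
        y = i % 2  # alternate cover (0) and stego (1) samples
        rows.append([rng.gauss(shift * y if j in informative else 0.0, 1.0)
                     for j in range(n_features)])
        labels.append(y)
    return rows, labels


def accuracy(train, valid, feats):
    """Validation accuracy of a nearest-centroid classifier restricted to 'feats'."""
    (x_tr, y_tr), (x_va, y_va) = train, valid
    cents = {}
    for c in (0, 1):
        pts = [[x[j] for j in feats] for x, y in zip(x_tr, y_tr) if y == c]
        cents[c] = [sum(col) / len(pts) for col in zip(*pts)]

    def predict(x):
        p = [x[j] for j in feats]
        dists = {c: sum((a - b) ** 2 for a, b in zip(p, cents[c])) for c in cents}
        return min(dists, key=dists.get)

    return sum(predict(x) == y for x, y in zip(x_va, y_va)) / len(y_va)


def forward_selection(train, valid, n_features, n_select):
    """Greedily add the feature that most improves validation accuracy."""
    selected, history = [], []
    remaining = set(range(n_features))
    while remaining and len(selected) < n_select:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining),
                   key=lambda j: accuracy(train, valid, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        history.append(accuracy(train, valid, selected))
    return selected, history


rng = random.Random(1)
train = make_data(rng, 400, informative={2, 5}, n_features=8)
valid = make_data(rng, 400, informative={2, 5}, n_features=8)

selected, history = forward_selection(train, valid, n_features=8, n_select=4)
for rank, (feat, acc) in enumerate(zip(selected, history), start=1):
    print(f"rank {rank}: feature {feat}  accuracy with top-{rank} = {acc:.3f}")
```

The ranking returned by such a procedure is what makes the selected features interpretable: the informative features appear first, and the accuracy curve flattens once they have all been added.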
Appendix
Features Ranked by the Forward Algorithm
The first 40 features ranked by the Forward algorithm for the F5 algorithm at 5% embedding rate.
The first 40 features ranked by the Forward algorithm for the JPHS algorithm at 5% embedding rate.
The first 40 features ranked by the Forward algorithm for the MBSteg algorithm at 5% embedding rate.
The first 40 features ranked by the Forward algorithm for the MM3 algorithm at 5% embedding rate.
The first 40 features ranked by the Forward algorithm for the Outguess algorithm at 5% embedding rate.
The first 40 features ranked by the Forward algorithm for the Steghide algorithm at 5% embedding rate.
Declarations
Acknowledgments
The authors would like to thank Jan Kodovsky and Jessica Fridrich for their implementation of the DCT Feature Extraction software. Also many thanks to Tomás Pevný for his helpful comments and suggestions on this article. The work in this paper was supported (in part) by the French national funding under the project RIAM Estivale (ANR05RIAMO1903), ANR projects Nebbiano (ANR06SETI009), and TSAR French Project (ANR SSIA 20062008).
References
 McCullagh D: Secret Messages Come in .Wavs. Wired News. February 2001, http://www.wired.com/news/politics/0,1283,41861,00.html
 Provos N, Honeyman P: Detecting steganographic content on the internet. Proceedings of the Network and Distributed System Security Symposium (NDSS '02), February 2002, San Diego, Calif, USA.
 Cachin C: An information-theoretic model for steganography. Proceedings of the 2nd International Workshop on Information Hiding (IH '98), April 1998, Portland, Ore, USA, Lecture Notes in Computer Science 1525: 306–318.
 Westfeld A, Pfitzmann A: Attacks on steganographic systems. In Proceedings of the 3rd International Workshop on Information Hiding (IH '99), September-October 2000, Dresden, Germany, Lecture Notes in Computer Science. Volume 1768. Springer; 61–76.
 Fridrich J: Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes. Proceedings of the 6th International Workshop on Information Hiding (IH '04), May 2004, Toronto, Canada, Lecture Notes in Computer Science 3200: 67–81.
 Lyu S, Farid H: Detecting hidden messages using higher-order statistics and support vector machines. Proceedings of the 5th International Workshop on Information Hiding (IH '02), October 2003, Noordwijkerhout, The Netherlands, Lecture Notes in Computer Science 2578: 340–354.
 Agaian SS, Cai H: New multilevel dct, feature vectors, and universal blind steganalysis. Security, Steganography, and Watermarking of Multimedia Contents VII, January 2005, San Jose, Calif, USA, Proceedings of SPIE 5681: 653–663.
 Shi YQ, Chen C, Chen W: A Markov process based approach to effective attacking JPEG steganography. Proceedings of the 8th International Workshop on Information Hiding (IH '06), July 2007, Alexandria, Va, USA, Lecture Notes in Computer Science 4437: 249–264.
 François D: High-dimensional data analysis: optimal metrics and feature selection, Ph.D. thesis. Université Catholique de Louvain, Louvain, Belgium; September 2006.
 Ker AD: The ultimate steganalysis benchmark? Proceedings of the 9th Multimedia and Security Workshop (MM/Sec '07), September 2007, Dallas, Tex, USA 141–148.
 Filler T, Ker AD, Fridrich J: The square root law of steganographic capacity for Markov covers. In Media Forensics and Security, January 2009, San Jose, Calif, USA, Proceedings of SPIE. Edited by: Delp EJ III, Dittmann J, Memon ND, Wong PW. 7254: 1–11.
 Bellman R: Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, NJ, USA; 1961.
 Efron B, Tibshirani RJ: An Introduction to the Bootstrap. Chapman & Hall/CRC, Londres, Argentina; 1994.
 Scott DW, Thompson JR: Probability density estimation in higher dimensions. In Computer Science and Statistics: Proceedings of the 15th Symposium on the Interface, March 1983, Houston, Tex, USA. Edited by: Douglas SR. North-Holland; 173–179.
 Miche Y, Bas P, Lendasse A, Jutten C, Simula O: Extracting relevant features of steganographic schemes by feature selection techniques. Proceedings of the 3rd Wavilla Challenge (Wacha '07), June 2007, St. Malo, France 1–15.
 Miche Y, Roue B, Lendasse A, Bas P: A feature selection methodology for steganalysis. Proceedings of the International Workshop on Multimedia Content Representation, Classification and Security (MRCS '06), September 2006, Istanbul, Turkey, Lecture Notes in Computer Science 4105: 49–56.
 Miche Y, Bas P, Jutten C, Simula O, Lendasse A: A methodology for building regression models using extreme learning machine: OP-ELM. Proceedings of the 16th European Symposium on Artificial Neural Networks (ESANN '08), April 2008, Bruges, Belgium 1–6.
 Sorjamaa A, Miche Y, Weiss R, Lendasse A: Long-term prediction of time series using NNE-based projection and OP-ELM. Proceedings of the International Joint Conference on Neural Networks (IJCNN '08), June 2008, Hong Kong 2674–2680.
 Huang GB, Zhu QY, Siew CK: Extreme learning machine: theory and applications. Neurocomputing 2006, 70(1–3):489–501.
 Rossi F, Lendasse A, François D, Wertz V, Verleysen M: Mutual information for the selection of relevant variables in spectrometric nonlinear modelling. Chemometrics and Intelligent Laboratory Systems 2006, 80(2):215–226. doi:10.1016/j.chemolab.2005.06.010
 Ververidis D, Kotropoulos C: Fast and accurate sequential floating forward feature selection with the Bayes classifier applied to speech emotion recognition. Signal Processing 2008, 88(12):2956–2970. doi:10.1016/j.sigpro.2008.07.001
 Ververidis D, Kotropoulos C: Fast sequential floating forward selection applied to emotional speech features estimated on des and susas data collections. In Proceeding of the 14th European Signal Processing Conference (EUSIPCO '06), September 2006, Florence, Italy. Edited by: EURASIP. 1–5.
 Westfeld A: F5—a steganographic algorithm. Proceedings of the 4th International Workshop on Information Hiding (IH '01), April 2001, Pittsburgh, Pa, USA, Lecture Notes in Computer Science 2137: 289–302.
 Sallee P: Model-based steganography. Proceedings of the 2nd International Workshop Digital Watermarking (IWDW '03), October 2004, Seoul, Korea, Lecture Notes in Computer Science 2939: 254–260.
 Kim Y, Duric Z, Richards D: Modified matrix encoding technique for minimal distortion steganography. Proceedings of the 8th International Workshop on Information Hiding (IH '06), July 2007, Alexandria, Va, USA, Lecture Notes in Computer Science 4437: 314–327.
 Latham A: Jphide & seek. August 1999, http://linux01.gwdg.de/~alatham/stego.html
 Provos N: Defending against statistical steganalysis. Proceedings of the 10th USENIX Security Symposium, August 2001, Washington, DC, USA 24.
 Hetzl S, Mutzel P: A graph-theoretic approach to steganography. In Proceedings of the 9th IFIP TC6 TC11 International Conference on Communications and Multimedia Security (CMS '05), September 2005, Salzburg, Austria, Lecture Notes in Computer Science. Volume 3677. Springer; 119–128.
 Watermarking Virtual Laboratory (Wavila) of the European Network of Excellence ECRYPT: The 2nd bows contest (break our watermarking system), 2007.
 Westfeld A: Reproducible signal processing (bows2 challenge image database, public).
 Liu Q, Sung AH, Ribeiro B, Wei M, Chen Z, Xu J: Image complexity and feature mining for steganalysis of least significant bit matching steganography. Information Sciences 2008, 178(1):21–36. doi:10.1016/j.ins.2007.08.007
 Pevny T, Fridrich J: Merging Markov and DCT features for multiclass JPEG steganalysis. Security, Steganography, and Watermarking of Multimedia Contents IX, January 2007, San Jose, Calif, USA, Proceedings of SPIE 6505: 1–13.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.