Estimating Previous Quantization Factors on Multiple JPEG Compressed Images

The JPEG compression algorithm has proven efficient at saving storage while preserving image quality, and has therefore become extremely popular. On the other hand, the overall process leaves traces in the encoded signal that are typically exploited for forensic purposes: for instance, the compression parameters of the acquisition device (or editing software) can be inferred. To this aim, this paper proposes a novel technique to estimate “previous” JPEG quantization factors on images compressed multiple times, in the aligned case, by analyzing statistical traces hidden in Discrete Cosine Transform (DCT) histograms. Experimental results on double, triple and quadruple compressed images demonstrate the effectiveness of the proposed technique while unveiling further interesting insights.


Introduction
The life-cycle of a digital image is extremely complicated nowadays: images are acquired by smartphones or digital cameras, edited, shared through Instant Messaging platforms [1], etc. In each step, the image may undergo a modification that changes its data without altering (in almost all cases) the semantic content. This makes it really difficult for forensic analysis to reconstruct the history of an image from the first acquisition device through each subsequent processing step ([2,3]). Even detecting whether an investigated image has been compressed only two times, namely Double Compression Detection, is a challenging task ([4][5][6]). The problem is further complicated by the possibility of cropping and/or resizing images (e.g., aligned and non-aligned scenarios [7,8]). State-of-the-art image forensics techniques usually rely on different underlying assumptions specifically tailored to the task ([7][8][9][10]). This becomes particularly relevant when dealing with multiple compressions [11]. The robust inference of how many times an image has been compressed is a problem investigated with techniques working mainly for the aligned scenario ([12][13][14][15]). In particular, [15] pushes the detection up to triple compression by defining a three-class classification problem, demonstrated to work only for multiple compressed images with the same Quality Factor.
Correspondence: battiato@dmi.unict.it. Sebastiano Battiato, Oliver Giudice, Francesco Guarnera and Giovanni Puglisi contributed equally to this work. University of Catania, Catania, Italy. Full list of author information is available at the end of the article.
Once an image has been detected to be multiply compressed, the reconstruction of the history of the image itself becomes challenging. First Quantization Estimation (FQE) has been widely investigated for both the aligned and non-aligned cases w.r.t. different datasets in the double compressed scenario.
A first technique for FQE was proposed by Bianchi et al. ([16][17][18]). They proposed a method based on the Expectation Maximization algorithm to predict the most probable quantization factors of the primary compression over a set of candidates. Other techniques based on statistical considerations on Discrete Cosine Transform (DCT) histograms were proposed by Galvan et al. [4]. Their technique works effectively under specific conditions on double compressed images, exploiting the a-priori knowledge of the monotonicity of the DCT coefficients through iterative histogram refinement. Strategies related to histogram analysis and filtering, similar to Galvan et al. [4], have continued to be introduced up to the present day ([19][20][21]). Still, they lack robustness and are likely to work only in the double compressed scenario and under specific conditions, showing many limits. Recently, Machine Learning has been employed for the prediction task, yielding many black-box models trained on statistical data w.r.t. specific datasets. For instance, Lukáš and Fridrich in [22] introduced a first attempt exploiting neural networks, further improved in [23] with considerations on errors similar to [4]. More recently, Convolutional Neural Networks (CNN) were also introduced in some works ([24][25][26]). CNNs have proven incredibly powerful in finding hidden correlations among data, specifically in images, but they are also very prone to overfitting, making all these techniques extremely dependent on the dataset used for training ([27]). This drawback is to some extent mitigated by employing as much training data as possible in wild conditions: in this way, Niu et al. [28] achieved top-rated results for both aligned and non-aligned double compressions.
All the techniques reported above try to estimate the first quantization matrix in a double compression scenario, although estimating just the previous quantization matrix for multiple compressed images could be of extreme importance for investigations, in order to understand intermediate processing. When it comes to multiple compressions, the number of compression parameters involved in each step for every single image becomes huge. Machine Learning techniques need to see and consider almost all combinations during the training phase, and are therefore not easily viable for this specific task. In this paper, an FQE technique based on simulations of multiple compression processes is proposed in order to detect the most similar DCT histogram computed in the previous compression step. The method is based only on information coming from a single image, thus it does not need a training phase.
The proposed technique starts from the information of the (known) last quantization matrix (easily readable from the image file itself) in conjunction with simulations of compressions applied to the image itself with proper matrices. Experiments on 2, 3 and 4 times compressed images show the robustness of the technique, providing useful insights for investigators at specific combinations of compression parameters. The remainder of this paper is organized as follows: Sections 2 and 3 describe the proposed approach and the datasets, Section 4 reports experimental results in different scenarios, and Section 5 concludes the paper.

Proposed Approach
Given a JPEG $m$-compressed (compressed $m$ times) image $I$, the main objective of this work is the estimation of a reliable number $k$ of quantization factors (in zig-zag order) of the 8 × 8 quantization matrix $Q_{m-1}$ (i.e., the quantization table of the $(m-1)$-th compression), which can be defined as $q_{m-1} = \{q_1, q_2, \ldots, q_k\}$. The only information available about $I$ is the last quantization matrix $q_m$, which can be one of the standard JPEG quantization tables or a custom one ([29,30]) and is available by accessing the JPEG file, together with the extracted (e.g., with the LibJpeg C library) DCT coefficients of each 8 × 8 block ($D_{ref}$). No inverse-DCT operation is done at this step, thus no further rounding and truncation errors can be introduced. The set of the obtained DCT blocks and the respective coefficients (multiplied by $q_m$) are collected to compute a histogram for each of the first $k$ coefficients in classic zig-zag order, denoted by $h_{ref,k}(D_{ref})$ with $k \in \{1, 2, \ldots, 64\}$. A square patch $C_I$ of size $d \times d$ is cropped from the previously decompressed image $I$ (e.g., with the Python Pillow library), leaving out 4 pixels in each direction in order to break the JPEG block structure [22]. $C_I$ is then used as input to simulate JPEG compressions, carried out with a certain number $n > 0$ of constant 8 × 8 matrices $M_i$ with $i \in \{1, 2, \ldots, n\}$. The parameter $n$ is simply set to the greatest value that the quantization factors employed in the previous quantization step can assume in the worst scenario (i.e., lowest Quality Factor). Once the parameter $n$ is defined, the simulation of the compression of $C_I$ is arranged as follows: for $i = 1, 2, \ldots, n$, an 8 × 8 quantization matrix $M_i$ with each element equal to $i$ is defined and used to compress $C_I$, generating the compressed images $C_{I,i}$. The current (second) compression is then simulated by applying the known $q_m$ to each of the $n$ images $C_{I,i}$, thus generating the new compressed images $C'_{I,i}$.
Each $C'_{I,i}$ represents a simulation of compression with known previous and last quantization parameters.
As done with $I$, the DCT coefficients ($D_i$) are extracted from $C'_{I,i}$ and the distributions $h_{i,k}(D_i)$ are computed, with $i \in \{1, \ldots, n\}$; they represent a set of $n$ distributions for the $k$-th coefficient, where $k \in \{1, \ldots, 64\}$. The distributions $h_{i,k}(D_i)$ are then compared, one by one, with the real one $h_{ref,k}(D_{ref})$ through the $\chi^2$ distance, defined as follows:
$$\chi^2(x, y) = \sum_{b} \frac{(x_b - y_b)^2}{x_b + y_b} \qquad (1)$$
where $x$ and $y$ represent the distributions to be compared and $b$ runs over the histogram bins. Finally, the estimation of $q_{m-1} = \{q_1, q_2, \ldots, q_k\}$ can be done for every quantization factor $q_k$ as follows:
$$q_k = \arg\min_{i \in \{1, \ldots, n\}} \chi^2\big(h_{i,k}(D_i),\, h_{ref,k}(D_{ref})\big) \qquad (2)$$
For the sake of clarity, the pseudo-code of the process is reported in Algorithm 1.
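The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it works on a single DCT frequency, replaces the full JPEG compress/decompress cycle with plain quantize-dequantize operations (so pixel-domain rounding and truncation are ignored), and uses one common form of the χ² distance. The function names (`requantize`, `histogram`, `chi2_distance`, `estimate_previous_factor`) and the synthetic Gaussian coefficients are assumptions made only for illustration.

```python
import random
from collections import Counter


def requantize(value, step):
    """Quantize a DCT coefficient with the given step and dequantize it again."""
    return round(value / step) * step


def histogram(values):
    """Normalised histogram stored as a dict: value -> relative frequency."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def chi2_distance(x, y):
    """Chi-square distance between two histograms (assumed form: sum (x-y)^2 / (x+y))."""
    dist = 0.0
    for b in set(x) | set(y):
        xb, yb = x.get(b, 0.0), y.get(b, 0.0)
        dist += (xb - yb) ** 2 / (xb + yb)  # xb + yb > 0 for every bin in the union
    return dist


def estimate_previous_factor(observed, patch_coeffs, q_last, n):
    """Return the candidate i in 1..n whose simulated histogram is closest to the observed one.

    observed     -- dequantized coefficients read from the investigated image
    patch_coeffs -- coefficients of the cropped patch used as simulation input
    q_last       -- known last quantization factor for this frequency
    """
    h_ref = histogram(observed)
    best_i, best_d = None, float("inf")
    for i in range(1, n + 1):
        # Simulate "previous" quantization with constant step i, then the known last one.
        simulated = [requantize(requantize(c, i), q_last) for c in patch_coeffs]
        d = chi2_distance(histogram(simulated), h_ref)
        if d < best_d:
            best_i, best_d = i, d
    return best_i


if __name__ == "__main__":
    rng = random.Random(0)
    coeffs = [rng.gauss(0.0, 30.0) for _ in range(5000)]  # synthetic stand-in for real DCT data
    observed = [requantize(requantize(c, 7), 3) for c in coeffs]
    print(estimate_previous_factor(observed, coeffs, q_last=3, n=20))
```

In a real pipeline the two input lists would come from the JPEG file and from the cropped patch respectively, and the loop would be repeated for each of the first k zig-zag frequencies.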

Datasets
The effectiveness of the proposed approach was demonstrated through experiments performed on four datasets (BOSSBase [31], RAISE [32], Dresden [33] and UCID [34]) for the first quantization estimation in the double compression scenario: patches of different dimensions were obtained by extracting a proper region from the central part of the original images. A new set of doubly compressed images was then created starting from the cropped images with a certain number of combinations of parameters in terms of crop size and compression quality factors (employing only standard quantization tables [29]).
Other experimental datasets were similarly created from RAISE using custom quantization tables employed in Photoshop and from the collection shared by Park et al. [35]. The first dataset is obtained from all RAISE images cropped into 64 × 64 patches, employing the 8 highest Photoshop custom quantization tables (out of 12 in total) for the first compression (where higher values correspond to better quality factors) and $QF_2 \in \{80, 90\}$. The second dataset is built from 500 randomly picked full-size RAISE images by considering, for the first and second compression, a collection of 1070 custom tables with substantial differences from the standard ones, split into 3 quality clusters (LOW, MID, HIGH) calculated from the mean of the first 15 DCT coefficients and selected randomly from the clusters in the compression phase.
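As a reminder of how a quality factor maps to a standard quantization table, the following sketch scales the standard luminance table using the widely used IJG/libjpeg convention. That the datasets were built with exactly this convention is an assumption, since the text only states that standard tables were employed; the names `BASE_LUMA` and `scaled_table` are illustrative.

```python
# Standard JPEG luminance quantization table (JPEG standard, Annex K), row-major order.
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def scaled_table(quality):
    """Scale the base table for a quality factor in 1..100 (IJG/libjpeg convention, assumed)."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be in 1..100")
    # Lower quality -> larger scale -> coarser quantization steps.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Integer rounding, then clamp to the baseline range 1..255.
    return [min(max((q * scale + 50) // 100, 1), 255) for q in BASE_LUMA]
```

Under this convention, the largest entry of the table at the lowest quality factor considered gives an upper bound for the quantization factors that may have to be tested.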
Finally, a dataset for the multiple compression scenario was created starting from UCID [34]

Experimental Results
To properly assess the performance of the proposed solution, a series of tests was conducted in multiple compression scenarios, considering the datasets described in the previous Section. Four approaches were considered for comparison: Bianchi et al. [17], which is a milestone among analytical methods and has great similarity with the proposed approach; Galvan et al. [4] and Dalmia et al. [19], which achieve state-of-the-art results when $QF_1 < QF_2$; and Niu et al. [28], which represents the state of the art among CNN-based methods, with the best results to date. It is worth noting that Niu et al. [28] uses a different trained neural model for each $QF_2$ (80 and 90), while the proposed solution works for any $QF_2$ with the same technique. Although [28] has been designed to work on a more general scenario and the related CNN has been trained considering also the non-aligned double compression, it achieves the best results among CNN-based approaches also in the aligned scenario.
As regards the implementations used for testing the above-mentioned techniques: the publicly available Matlab implementation was employed for Bianchi et al. [17]; code from the ICVGIP-2016.RAR archive available on Dr. Manish Okade's website was employed for Dalmia et al. [19]; models and implementation available on Github were employed for Niu et al. [28]; and finally an implementation from scratch was employed for Galvan et al. [4]. Experiments were carried out for standard tables and custom ones, all employing 64 × 64 patches extracted from the RAISE dataset [32]. As reported in Table 1 and Figs. 1, 2, 3 and 4, the proposed method almost always outperforms the state of the art when the first quantization is computed with standard tables, while the results obtained on images employing Photoshop custom tables show a much greater gap in accuracy values (see Table 2 and Figs. 5, 6). Results on custom tables show better generalization capabilities w.r.t. [28] which, being CNN-based, seems to be dependent on the tables used for training.
Further tests have been performed to demonstrate the robustness of the proposed solution w.r.t. image contents and acquisition conditions (e.g., different devices). Specifically, three datasets have been considered: Dresden [33], UCID [34] and BOSSBase [31]. Results reported in Tables 3-5 confirm the effectiveness of the proposed solution. The impact of the resolution/crop pair is evident when observing the results of a single dataset (Table 4): a patch of size $d \times d$ extracted from a high resolution image contains less information than one extracted from a smaller image, delivering a flatter histogram that is difficult to discriminate. A final test regarding double compressed images has been performed in a much more challenging scenario: a dataset of 500 full-size RAISE images was employed for first and second compression by using 1070 custom tables collected by Park et al. [35] (as described in Section 3). For this test, the parameter of the proposed approach was set to $n = 136$, which is the maximum value of the first 15 coefficients among the 1070 quantization tables in this context. Results obtained, in terms of accuracy, are reported in Table 6 and demonstrate the robustness of the technique even in a wild scenario of non-standard tables.
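The choice of n from a collection of tables can be sketched as follows: read the first k entries of each 8 × 8 table in zig-zag order and take the overall maximum. This is only an illustration of the rule stated above; the helper names (`zigzag_indices`, `first_k`, `max_candidate`) are hypothetical, and tables are assumed to be given as lists of 8 rows.

```python
def zigzag_indices(size=8):
    """Return the (row, col) pairs of a size x size block in classic zig-zag order."""
    order = []
    for s in range(2 * size - 1):
        diagonal = [(r, s - r) for r in range(size) if 0 <= s - r < size]
        # Even anti-diagonals are traversed from bottom-left to top-right.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def first_k(table, k=15):
    """First k entries of an 8x8 table (list of rows) in zig-zag order."""
    return [table[r][c] for r, c in zigzag_indices()[:k]]


def max_candidate(tables, k=15):
    """Largest value among the first k zig-zag entries of all tables: a candidate for n."""
    return max(max(first_k(t, k)) for t in tables)
```

Restricting the maximum to the first k entries keeps n small, since the high-frequency entries of a table (which are not estimated) are usually the largest ones.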

Experiments with Multiple Compressions
The hypothesis that only one compression was performed before the last one could be a strong limit. Thus, a method able to extract information about previous quantization matrices in a multiple compression scenario may be a considerable contribution. For this reason, the proposed approach was tested in a triple JPEG compression scenario, where the new goal was the estimation of the previous quantization factors, i.e., those of $Q_{m-1}$. As shown in Fig. 7, the method in general achieves satisfactory results. Some limits are visible when the first compression is strong (low QF) and the second one has been performed with a high quality factor ($QF_2 \in \{90, 95, 100\}$). By analyzing the results in these particular cases, it is worth noting that the method estimates $QF_{m-2}$ instead of $QF_{m-1}$. Figure 8 shows the accuracies obtained in these last cases ($QF_2 \in \{90, 95, 100\}$) considering as correct estimations the quantization factors related to $Q_{m-1}$ (a), $Q_{m-2}$ (b) and both (c). Results shown in (c) demonstrate how the method is able to return information about quantization factors (not only those of the $(m-1)$-th compression) even in this challenging scenario. Starting from this phenomenon, in order to discriminate whether a predicted factor $q_k$ belongs to $Q_{m-2}$ or $Q_{m-1}$, a simple test has been carried out on 100 triple compressed images with $QF_1 = 65$, $QF_2 = 95$ and $QF_3 = 90$. Starting from the cropped image $C_I$ (see Section 2), we simulated, similarly to the case of double compressions in the proposed approach, all the possible triple compressions taking into account only two hypotheses (i.e., $q_k$ belongs to $Q_1$ or $Q_2$) and considering a constant matrix built from $q_k$ as $Q_1$ or $Q_2$ respectively. The obtained simulated distributions are then compared with the real one through the $\chi^2$ distance (1). In this scenario, the proposed solution correctly estimated $Q_1$ quantization factors with an accuracy of 95.5%. Moreover, as a side effect of the triple compression, $Q_2$ is also predicted with 76.6% accuracy.
The insights found in the triple compression experiments were confirmed on 4 times JPEG compressed images (Fig. 9). Even in this scenario, if high QFs are employed in the third compression (e.g., 90, 95, 100), $Q_2$ factors are actually predicted, similarly to what was described before. Besides, if both $QF_3$ and $QF_2$ are high, $Q_1$ elements could be estimated, confirming how the method in each case obtains information about previous compressions.
The proposed method estimates the strongest previous compression, which is basically the behavior of most First Quantization Estimation (FQE) methods. For this reason, Figure 10 reports the accuracy in the $QF_3 = 90$ scenario, showing how our method (left graph) maintains good results even in triple compression, while [28] has a significant performance drop compared to double compression.
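The three ways of counting a correct estimation (against $Q_{m-1}$, against $Q_{m-2}$, or against both) can be made concrete with a short sketch. It reflects one reading of the evaluation, namely that under the "both" criterion a predicted factor counts as correct when it matches either matrix; the function name `accuracy` and the criterion labels are hypothetical.

```python
def accuracy(predicted, q_prev, q_prev2, criterion="prev"):
    """Fraction of predicted factors counted as correct.

    predicted -- estimated factors for the first k zig-zag positions
    q_prev    -- true factors of Q_{m-1} at the same positions
    q_prev2   -- true factors of Q_{m-2} at the same positions
    criterion -- "prev" (match Q_{m-1}), "prev2" (match Q_{m-2}) or "either"
    """
    if not (len(predicted) == len(q_prev) == len(q_prev2)) or not predicted:
        raise ValueError("factor lists must be non-empty and of equal length")
    hits = 0
    for p, a, b in zip(predicted, q_prev, q_prev2):
        if criterion == "prev":
            hits += (p == a)
        elif criterion == "prev2":
            hits += (p == b)
        elif criterion == "either":
            hits += (p == a or p == b)
        else:
            raise ValueError("unknown criterion: " + criterion)
    return hits / len(predicted)
```

Comparing the three values for the same set of predictions shows how many apparent errors w.r.t. $Q_{m-1}$ are in fact correct estimations of an earlier compression.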

Cross JPEG Validation
Recent works in literature demonstrate how different JPEG implementations could employ various Discrete Cosine Transform and mathematical operators to perform floating-point to integer conversion of DCT coefficients [36].

Conclusions
In this paper, a novel method for previous quantization factor estimation was proposed. The technique outperforms the state of the art in the aligned double compressed JPEG scenario, specifically in the challenging cases where custom JPEG quantization tables are involved. The good results obtained even in the multiple compression scenarios (up to 4 compressions) highlight that previous compressions leave traces detectable in the distributions of quantization factors. Furthermore, the use of these distributions for previous quantization estimation makes the proposed technique simple, with a relatively low computational effort, avoiding extremely computationally hungry techniques while maintaining the same accuracy results. The strengths of the proposed method compared to machine learning approaches are its simplicity and the fact that it does not need training sets.