Use of SHDM in commutative watermarking encryption

SHDM stands for Sphere-Hardening Dither Modulation and is a watermarking algorithm based on quantizing the norm of a vector extracted from the cover work. We show how SHDM can be integrated into a fully commutative watermarking-encryption scheme and investigate implementations in the spatial, DCT, and DWT domain with respect to their fidelity, robustness, capacity, and security of encryption. The watermarking scheme, when applied in the DCT or DWT domain, proves to be very robust against JPEG/JPEG2000 compression. On the other hand, the spatial domain-based approach offers a large capacity. The increased robustness of the watermarking schemes, however, comes at the cost of rather weak encryption primitives, making the proposed CWE scheme suited for low to medium security applications with high robustness requirements.

Another motivation to consider watermarking in the encrypted domain are the increasing needs generated by cloud computing platforms and various privacy preserving applications.
In [1], four requirements on watermarking in the encrypted domain are formulated: • Property 1. The marking function M can be performed on an encrypted image. • Property 2. The verification function V is able to reconstruct a mark in the encrypted domain when it has been embedded in the encrypted domain. • Property 3. The verification function V is able to reconstruct a mark in the encrypted domain when it has been embedded in the clear domain. • Property 4. The decryption function does not affect the integrity of the watermark.
As is pointed out in [1], properties 2 and 3 are equivalent, if the encryption function E and the marking function M commute, that is, where E is the encryption function, K is the encryption key, I is the plaintext media data, and m is the mark to be embedded. In recent years, a number of commutative watermarking-encryption (CWE) schemes have been formulated. The present paper describes a novel CWE scheme, that couples sign-bit encryption of selected pixel grey values or transform coefficients as encryption part with Sphere-Hardening Dither Modulation (SHDM) [2,3] as the watermarking part. The encryption part can optionally be enhanced by permuting the pixels or transform coefficients, respectively. While the idea of encrypting coefficient sign-bits within a CWE scheme is not new, the use of SHDM as watermarking part is, leading to a more robust scheme than previous approaches.
The rest of the paper is organized as follows: previous CWE proposals are summarized in Section 2. SHDM is briefly reviewed in Section 3. In Section 4, we describe implementations of the proposed CWE scheme in the spatial, DCT, and DWT domain, respectively, and discuss their security. Section 5 provides experimental results on the robustness of the watermarking part in the three implementation domains, and Section 6 discusses the security aspects of the CWE schemes in terms of cryptographic and watermarking security. Section 7 concludes the paper.

Related work
There are currently three basic approaches to commutative watermarking encryption for raw image data. The first approach, called partial encryption, divides the image data into two parts and encrypts one of them (typically the perceptually more important part) and watermarks the other part. Thus, encryption part and watermarking part are completely independent and do not interfere with each other. First, important examples in this vein are provided by [4] and [5]. In [5], the basic idea is to encrypt DCT sign bits and to watermark their absolute values by means of dithered modulation. Similarly, in [4], the data are partitioned into two parts after a four-level discrete wavelet transformation. The low-level coefficients are fully encrypted, while in the medium-and high-level coefficients only the signs are encrypted and their absolute values are watermarked. In [6], encryption and watermarking happens within a secret transform domain, the Tree-Structured-Haar (TSH) Transform, which involves a secret parameter. Both the watermark embedder and encryptor need to share knowledge about the secret parameter generating the transform domain. The transform coefficients are first quantized to get B bitplanes. The N most significant bitplanes are encrypted, and B − N − 1 of the remaining bitplanes are watermarked. The least significant bitplane is replaced by the signs of the plaintext coefficients. In this approach, using a secret transform domain increases the security of the scheme, albeit at the cost that encryption and watermarking are not completely independent, but need to share a common secret.
Another approach to commutative watermarking is provided by deploying homomorphic encryption techniques so that some basic algebraic operations such as addition and multiplication on the plaintexts can be transferred onto the corresponding ciphertexts, i.e., they are transparent to encryption [1,Sec. 2.1]. Especially, if both the encryption and the watermarking process consist of the same homomorphic operation, one gets a commutative watermarking encryption scheme. Examples of homomorphic operations are exponentiation modulo n, multiplication modulo n, and addition modulo n (including the bitwise XOR operation). One major drawback of this approach is the influence of encryption on robustness of the watermarking algorithm: after strong encryption, there is no visual information available for the watermark embedder to adapt itself to in order to increase robustness while at the same time minimizing visual quality degradation [7,Sec. 9.4]. Another drawback is that the homomorphic watermarking operation can seriously affect the fidelity. In [8], for example, addition modulo n is used, where n is the number of grey values. However, the modular addition operation may cause overflow/underflow pixels that are handled in a preprocessing step during the encryption operation, making the system only "quasi-commutative. " Third, in invariant encryption schemes, the data are fully encrypted, but the encryption operation leaves a certain subspace of the data invariant, which may be used for watermarking. In [9], a permutation cipher is applied to the image, leaving the histogram of grey values invariant. The watermark is embedded by manipulating the histogram. Depending on viewpoint, schemes based on encrypting sign bits of transform coefficients may also be seen as invariant encryption schemes, as the absolute values of the coefficients form an invariant subspace.
In a another line of work, researchers concentrate their efforts on watermarking and encrypting the bitstream after encoding the data according to a certain standard. In [10], a commutative watermarking encryption scheme based on encrypting the intra-prediction modes and the sign bits of the DCT coefficients within the H.264 bitstream is presented. Here, watermarking of residual DCT coefficients is based on Quantized Index Modulation (QIM) [11]. In [12], in order to achieve the commutative property, one set of syntax elements within the HEVC bitstream is utilized for data hiding, while another set is exploited for encryption. In [13] and [14], the JPEG-LS bitstream is jointly watermarked and encrypted using the AES algorithm in Cipher Block Chaining (CBC)-mode, making this scheme suitable for scenarios with high security requirements, like in medical imaging.

SHDM
Sphere-Hardening Dither Modulation, or SHDM for short, was proposed by Balado in [2] and [3] as an alternative to STDM (Spread-Transform Dither Modulation), which was proposed in [11]. Both SHDM and STDM have in common that in order to embed a single bit b, a multidimensional host vector x is extracted from the cover work C 0 and modified using some dithered quantization function Q b , where different message bits lead to different dither values. While in STDM the projection x t · u of the host vector x onto some random vector u is quantized, in SHDM, the norm x is quantized. More specifically, the embedding rule in SHDM is given by where for b ∈ {0, 1}, is the quantizing function. Extraction of the embedded bit is done via where˜ y is the disturbed signal vector at the detector site.
Note that the direction of the signal vector x is not changed by embedding, which is advantageous from a perceptual point of view, as opposed to related methods like STDM. As is shown in [2], SHDM offers the same level of robustness against additive white noise as STDM.

Using SHDM in a CWE scheme
In SHDM, the signal vector x = (x 1 , . . . , x N ) may be extracted from the host in an arbitrary fashion. In this section, we implement SHDM in the spatial (pixel) domain, the DCT domain, and the DWT domain and combine it with matching encryption schemes. As in SHDM the vector norm is quantized, we have the following options for encryption: • Encrypt the sign bits of the x i by means of a stream cipher. • Permute the x i . • Apply some other norm-preserving operation on x, e.
g. a random rotation.
Of course, the options may be combined. In the rest of the paper, we will only explore the first two options.

Implementation in the spatial domain
In the spatial domain, we work directly with pixel grey values ranging initially between 0 and 255. In order to create sign bits, we subtract 128 from each grey value, so that the new range is −128 ≤ 0 ≤ 127.

Watermarking part
The watermarking part uses the following parameters: • The watermarking key W K .
• The dimension of the host vectors x i , i.e., the number of coefficients N into which one bit b i is embedded. • The quantizing step .
In the spatial domain, we assume that the watermarking key consists of two parts: After choosing a step size and N, the embedding process consists of the following steps: • For each bit b i to be embedded, randomly select N pixels. The selection is controlled by W • Quantize the norm of x i according to b i : • Embed the mark into x i by changing its norm to x i q : and replace the grey values of pixels corresponding to components of x i by the corresponding entries in y i .
For extraction of bit b i , the disturbed signal vector˜ y i is formed from the marked image C W in the same way as x i was generated from the host image C 0 . The norm of˜ y i is quantized and b i is computed according to Figure 1 shows two embedding examples with different resolutions (512 × 512 and 800 × 1600, respectively), where a random 64-bit watermark was embedded using a quantization step size of = 75.

Encryption part
Images in the spatial domain with grey values ranging between 0 and 255 can be represented by eight so-called bitplanes, where the most significant bitplane (MSB) indicates whether the grey value of a certain pixel is greater than 127 or not. Thus, after subtracting 128 from every grey value, the MSB indicates the sign of the grey values. Sign bit encryption is therefore equivalent to encrypting the MSB by means of a stream cipher.
The security of encrypting the MSB of an image has been investigated in [15]. Not only is the amount of image quality degradation insufficient for most applications, it is also possible to estimate the encrypt sign bits based on the assumption that neighboring grey values in a natural image do not change abruptly. Both problems can be remedied if a permutation cipher is applied on the pixels in addition. However, the permutation must not mix the N pixels used for embedding b i with the N pixels used for embedding a different bit b j . Therefore, the permutation cipher and the watermark embedder must share knowledge of W (1) K , which is the part of W K governing pixel selection. If this condition is met, watermarking and (permutation-based) ciphering commute. However, as is well-known, permutation ciphers are vulnerable to known plaintext attacks (see [16] for a quantitative analysis).
Moreover, in order to have as many permutation as possible, N should be chosen as large as possible, that means in the spatial domain, where H and W are the height and width of the host image C 0 , and n is the length of the embedded string. In order to have a minimum level of security against brute-force attacks, we need N ≥ 32, leading to a maximum capacity of bits in the spatial domain, meaning, e.g., 2 13 bits for a 512 × 512 image (note that the term capacity is used throughout this paper according to the definition given in [17]: the watermarking capacity of digital image is the number of bits that can be embedded in a given host image.). Figure 2 shows encrypted versions of the marked Lena image in Fig. 1a. Thanks to the commutativity of watermarking and encryption, the mark can be extracted from both without errors.

Implementation in the DCT domain
In the DCT domain, we assume that the watermarking key consists of three parts: We begin by performing a block-based two-dimensional DCT on the host image C 0 , i.e., we divide C 0 into non-overlapping 8 × 8 pixel blocks and perform a twodimensional DCT on each block.

Watermarking part
• For each bit b i to be embedded, randomly select N blocks. The selection is controlled by W (1) K . Each block can only be selected once. The selected blocks for bit b i form subset T i of the set of all blocks.
• For each T i , randomly select a horizontal and a vertical frequency index from the medium frequencies. The selection is controlled by W Note that because all DCT coefficients in x i correspond to the same horizontal and vertical frequency pair, it is possible to choose individual quantizing step sizes i according to their perceptual importance (see below). • Embed the mark into x i by changing its norm to x i q : x i x i (12) and replace the selected DCT coefficients in T i with the corresponding entries in y i .
In choosing the i step sizes, we were led by the JPEG quantization matrix, which assigns a perceptual relevance to each DCT coefficient in an (8 × 8) block. The essential step in JPEG compression consists in quantizing DCTcoefficients according to fixed quantization tables corresponding to a certain quality factor. On the other hand, it is well known that QIM-based watermarking schemes are sensitive to re-quantization.
In order to counter the adverse effects of requantization, we therefore chose quantization step sizes i for the individual DCT coefficients selected for embedding bit b i that were oriented at the actual quantization step sizes in the JPEG standard. More specifically, we used the quantization matrix taken from the JPEG standard ( [18]), which gives the JPEG quantization steps for the DCT coefficients within a 8 × 8 block referring to a 50% quality factor, and multiplied it with a constant c > 1. If the DCT coefficients for embedding b i correspond to frequencies (k, ), we have the rationale behind this approach being the well-known fact that for a quantizing function Q, we have if > δ (see [19], Theorem 1). This means that quantizing some value y with step size δ can be reversed by another quantization with a larger step size . Figure 3 shows two embedding examples, where a random 64-bit message was embedded. For the Lena image, we set N = 64, c = 3.5, and for the higher resolution Norba image, we set N = 312, c = 3.5 (cf. Section 4.2.2 for details on how N was chosen).
In order to extract message bit b i , the disturbed marked signal vector˜ y i is extracted from the marked image C W in the same way as the unmarked vector x i was built from C 0 with the help of W K . The message bit is then decoded according to

Encryption part
As in the spatial domain, we investigate the two options of encrypting DCT-coefficient sign bits and of permuting them. The idea of encrypting the sign bits of DCT coefficients goes back to [20] and [21], where it is proposed to encrypt sign bits of DCT coefficients and motion vectors in MPEG video. The security of this approach for still images is classified as low in [15], p. 51. In order to create a larger visual distortion, instead of permuting the DCT coefficients alone, we permuted the complete (8 × 8)−blocks containing the coefficients (see [20] and [21]). If only those blocks containing the selected coefficients for watermarking are permuted, however, the corresponding subset T becomes visible to an attacker, who can in turn concentrate her efforts to remove the mark on T. We therefore need to permute all image blocks. Moreover, as in the spatial domain case, in order to make sure that the selected blocks T i for a single bit b i do not get mixed up with blocks for a different bit or nonselected blocks, the permutation algorithm needs to know part W (1) K of the watermarking key. More specifically, each subset T i needs to form an invariant subset of the set of all blocks under the permutation. As in the spatial domain, these subsets need to be as large as possible. We therefore have in the DCT domain. The requirement N ≥ 32 gives a maximum capacity of meaning 128 bits for a 512 × 512 image. Figure 4 shows encrypted versions of the marked Lena image in Fig. 4. Again, the mark can be extracted from both without errors.

Implementation in the DWT domain 4.3.1 Watermarking part
In the DWT domain, we performed a three-level DWT and embedded the mark into the level 3 approximation coefficients. This way, the number N of coefficients used to embed one bit is the same as in Section 4.2, namely In the DWT-case, however, there are no blocks of coefficients to choose a frequency from, thus W K consists of only two parts: K governs the selection of N coefficients for each message bit b i , and W (2) K controls the dither d i for each bit. Likewise, a single quantization step size is used for all message bits. As an example, Fig. 5 shows the results of embedding 64 bits into the Lena and Norba image, setting = 100.

Encryption part
As in the DCT case, we have the options to either encrypt the sign bits of DWT coefficients, as already proposed in [21], and/or to permute the DWT coefficients, as originally proposed in [22]. Note that in the DWT domain, permuting coefficients is not as vulnerable to knownplaintext attacks as in other domains, because the location of coefficients is image-dependent [15]. However, if only the level 3 approximation coefficients are encrypted or permuted, the image content is not rendered completely unintelligible, but fine structures are still visible (see Figs. 6a, b). As in the DCT-case, we have the additional option of not only permuting the level 3 DWT-coefficients themselves but the complete (8 × 8) blocks leading to the level 3 approximation for a more complete obfuscation of the image content (see Fig. 6c), without sacrificing the commutativity with watermarking. The maximum capacity is the same as in the DCT-based implementation.

Experimental results
In our experiments, we used 50 standard images of format 512 × 512, most of them downloaded from http://decsai.ugr.es/cvg/CG/base.htm. We embedded 64 random bits and fixed all other parameters in such a way that a PSNR of about 50dB resulted for the watermarked images. In the spatial domain and the DWT domain, this meant a quantizing step size of = 175 (see Section 5.1 for details).
In the DCT domain, the c-Parameter (see Section 4.2) was set to 8.0. The similarity of the extracted watermarks to the originally embedded watermarks was measured using the normalized correlation of the two vectors.

Fidelity
We first investigated how the resp. the c-parameter affects the PSNR of the watermarked image compared to the host image. Perhaps not surprisingly, the effect of on the PSNR is practically the same for the spatial domain and the DWT domain (see Fig. 7).
As the c Parameter is not directly comparable to theparameter for the other two domains, the corresponding graph is shown here in a separate diagram (Fig. 8).
Both In another fidelity experiment, we investigated the influence of the message size on the PSNR (see Fig. 9).
Again, the spatial domain and DWT-based implementations show almost equal behavior, except that the spatial domain scheme has much a larger capacity.

JPEG compression
Both the DCT -and the DWT-based implementations prove to be very robust against JPEG compression (see Fig. 10). Both schemes also outperform the scheme proposed in [6] with respect to JPEG compression, which offers a normalized correlation of 0.22 at a JPEG quality factor of 50%.

JPEG2000 compression
The results of our experiments with JPEG2000 compression basically follow the same pattern as the JPEG experiments. The spatial domain implementation is the most fragile one, but is still surprisingly robust, especially at low compression rates. The DCT-based implementation and the DWT-based implementation perform almost equally well. Only for higher compression rates, the DWT-based implementation has a slight advantage. (see Fig. 11). Again, both transform domain based schemes outperform the scheme presented in [6] and have roughly the same robustness against JPEG2000 compression as the scheme in [4], which works in the LL 4 -subband.

Adding noise
All three implementation domains perform equally well in the presence of low-or medium-density additive white noise. For higher noise densities, the DWT-based implementation is the most robust (see Fig. 12).

Security of encryption
In this subsection, we summarize and enhance the security assessments made in Section 4 for the three implementation domains.
As sign bit encryption in the spatial domain can be attacked directly [15] to reveal part of the image contents, this approach seems to be weakest of all options. Combining it with a permutation cipher makes for a cryptographically and visually stronger cipher, although the permutation cipher is in turn vulnerable to known plaintext attacks. This means, however, to share part of the watermarking key between content owner and seller.
Sign bit encryption in the DCT domain has been attacked by Wu and Kuo [23], who could recover some visual information from the encrypted image by setting the DC coefficient to 128 and giving all AC coefficients a positive sign. Again, a combination with a block-based permutation will strengthen the security of the cipher (note that we do not recommend to permute DCT coefficients directly, as the DC coefficient will normally stick out as the one with the largest absolute value). For the DWT domain, there are, to the best of our knowledge, no analogous attacks on sign-bit encryption in the literature. Still, it is recommended to encrypt not only the watermarked DWT coefficients of the LL 3 subband, but all subbands, and combine the sign bit encryption with permutation, if a secret sharing between content owner and seller is possible. If this is not the case, the content owner can resort to permute all subbands excluding the LL 3 subband.

Watermarking security
According to [24], watermarking security means the occurrence of an adversary trying to break the system, as opposed to random modifications of a marked image due to benign image processing. In the following discussion, we assume that an attacker has access to the unencrypted, marked image C W , but not to the original host image C 0 or the watermarking key W K . In this context, breaking the system means that the attacker is either able to insert a mark of her own, or to detect a mark, or to remove the mark from C W without rendering the image unusable.
In order to successfully detect a watermark or embed a watermark of her own without knowledge of the watermarking key W K , an attacker would have to guess how the signal vectors are formed as a first step. If the mark is embedded in the DCT-or DWT domain, there are As is shown in Section 5, it is rather hard for an attacker to remove the watermark from a marked image by adding white noise or compression, especially if the mark was embedded in the DCT or DWT domain. Without knowledge of the correct watermarking key, depending on the implementation domain, an attacker would have to modify the value of the pixel grey values or transform coefficients in such way that the norm of each possible signal vector is changed by an amount of at least /2.

Conclusion
We have presented a novel commutative watermarking encryption (CWE) scheme, which is very robust to common attacks like JPEG/JPEG2000 compression and noise addition, especially when implemented in some transform domain (Discrete-Cosine or Discrete-Wavelet). On the other hand, the spatial domain implementation has the advantage of a much higher capacity. However, the robustness comes at the cost of relatively weak encryption primitives, especially if the scheme is applied in the spatial or DCT domain. The implementation in the DWT domain offers the best tradeoff between robustness and security of the cipher. Nevertheless, because of the leakage of visual contents if sign bit encryption is used exclusively, and because of the inherent weaknesses of permutation ciphers, the proposed scheme is recommended for scenarios with low to medium security requirements with regard to the image contents, where robustness of the watermark has the highest priority. For many commercial application scenarios, this seems to be a good fit. In future research, we will explore ways to further enhance the security of the encryption primitives by using norm-preserving operations.