Use of SHDM in commutative watermarking encryption


SHDM stands for Sphere-Hardening Dither Modulation and is a watermarking algorithm based on quantizing the norm of a vector extracted from the cover work. We show how SHDM can be integrated into a fully commutative watermarking-encryption scheme and investigate implementations in the spatial, DCT, and DWT domain with respect to their fidelity, robustness, capacity, and security of encryption. The watermarking scheme, when applied in the DCT or DWT domain, proves to be very robust against JPEG/JPEG2000 compression. On the other hand, the spatial domain-based approach offers a large capacity. The increased robustness of the watermarking schemes, however, comes at the cost of rather weak encryption primitives, making the proposed CWE scheme suited for low to medium security applications with high robustness requirements.


Encryption and watermarking are both important tools in protecting digital contents, e.g., in digital rights management (DRM) systems. While encryption is used to protect the contents from unauthorized access, watermarking can be deployed for various purposes, ranging from ensuring authenticity of content to embedding metadata, e.g., copyright or authorship information, into the contents. In the DRM context, for example, clients need to have the ability to decrypt the contents and may thus eventually misuse the decrypted contents. The protection provided by digital watermarks, however, remains within the contents, and can serve to identify misbehaving clients.

In a buyer-seller scenario where the content owner does not trust the seller to sell copies of her own, the content owner can supply the seller with an encrypted version of the content, which is in turn individually watermarked for each buyer by the seller. In such a situation, it is important that a watermark can be embedded in the encrypted domain and detected in the cleartext domain.

Another motivation to consider watermarking in the encrypted domain is the increasing need generated by cloud computing platforms and various privacy-preserving applications.

In [1], four requirements on watermarking in the encrypted domain are formulated:

  • Property 1. The marking function \(\mathcal {M}\) can be performed on an encrypted image.

  • Property 2. The verification function \(\mathcal {V}\) is able to reconstruct a mark in the encrypted domain when it has been embedded in the encrypted domain.

  • Property 3. The verification function \(\mathcal {V}\) is able to reconstruct a mark in the encrypted domain when it has been embedded in the clear domain.

  • Property 4. The decryption function does not affect the integrity of the watermark.

As is pointed out in [1], properties 2 and 3 are equivalent, if the encryption function \(\mathcal E\) and the marking function \(\mathcal M\) commute, that is,

$$ \mathcal M({\mathcal E}_{K}(I),m) = {\mathcal E}_{K}(\mathcal M(I,m)) $$

where \({\mathcal E}\) is the encryption function, K is the encryption key, I is the plaintext media data, and m is the mark to be embedded.

In recent years, a number of commutative watermarking-encryption (CWE) schemes have been formulated. The present paper describes a novel CWE scheme that couples sign-bit encryption of selected pixel grey values or transform coefficients as the encryption part with Sphere-Hardening Dither Modulation (SHDM) [2, 3] as the watermarking part. The encryption part can optionally be enhanced by permuting the pixels or transform coefficients, respectively. While the idea of encrypting coefficient sign bits within a CWE scheme is not new, the use of SHDM as the watermarking part is, leading to a more robust scheme than previous approaches.

The rest of the paper is organized as follows: previous CWE proposals are summarized in Section 2. SHDM is briefly reviewed in Section 3. In Section 4, we describe implementations of the proposed CWE scheme in the spatial, DCT, and DWT domain, respectively, and discuss their security. Section 5 provides experimental results on the robustness of the watermarking part in the three implementation domains, and Section 6 discusses the security aspects of the CWE schemes in terms of cryptographic and watermarking security. Section 7 concludes the paper.

Related work

There are currently three basic approaches to commutative watermarking encryption for raw image data. The first approach, called partial encryption, divides the image data into two parts and encrypts one of them (typically the perceptually more important part) and watermarks the other part. Thus, the encryption part and the watermarking part are completely independent and do not interfere with each other. The first important examples in this vein are provided by [4] and [5]. In [5], the basic idea is to encrypt DCT sign bits and to watermark their absolute values by means of dither modulation. Similarly, in [4], the data are partitioned into two parts after a four-level discrete wavelet transformation. The low-level coefficients are fully encrypted, while in the medium- and high-level coefficients only the signs are encrypted and their absolute values are watermarked. In [6], encryption and watermarking happen within a secret transform domain, the Tree-Structured-Haar (TSH) Transform, which involves a secret parameter. Both the watermark embedder and the encryptor need to share knowledge about the secret parameter generating the transform domain. The transform coefficients are first quantized to get B bitplanes. The N most significant bitplanes are encrypted, and the B−N−1 remaining bitplanes above the least significant one are watermarked. The least significant bitplane is replaced by the signs of the plaintext coefficients. In this approach, using a secret transform domain increases the security of the scheme, albeit at the cost that encryption and watermarking are not completely independent, but need to share a common secret.

Another approach to commutative watermarking is provided by deploying homomorphic encryption techniques so that some basic algebraic operations such as addition and multiplication on the plaintexts can be transferred onto the corresponding ciphertexts, i.e., they are transparent to encryption [1, Sec. 2.1]. Especially, if both the encryption and the watermarking process consist of the same homomorphic operation, one gets a commutative watermarking encryption scheme. Examples of homomorphic operations are exponentiation modulo n, multiplication modulo n, and addition modulo n (including the bitwise XOR operation). One major drawback of this approach is the influence of encryption on robustness of the watermarking algorithm: after strong encryption, there is no visual information available for the watermark embedder to adapt itself to in order to increase robustness while at the same time minimizing visual quality degradation [7, Sec. 9.4]. Another drawback is that the homomorphic watermarking operation can seriously affect the fidelity. In [8], for example, addition modulo n is used, where n is the number of grey values. However, the modular addition operation may cause overflow/underflow pixels that are handled in a preprocessing step during the encryption operation, making the system only “quasi-commutative.”

Third, in invariant encryption schemes, the data are fully encrypted, but the encryption operation leaves a certain subspace of the data invariant, which may be used for watermarking. In [9], a permutation cipher is applied to the image, leaving the histogram of grey values invariant. The watermark is embedded by manipulating the histogram. Depending on viewpoint, schemes based on encrypting sign bits of transform coefficients may also be seen as invariant encryption schemes, as the absolute values of the coefficients form an invariant subspace.

In another line of work, researchers concentrate their efforts on watermarking and encrypting the bitstream after encoding the data according to a certain standard. In [10], a commutative watermarking encryption scheme based on encrypting the intra-prediction modes and the sign bits of the DCT coefficients within the H.264 bitstream is presented. Here, watermarking of residual DCT coefficients is based on Quantization Index Modulation (QIM) [11]. In [12], in order to achieve the commutative property, one set of syntax elements within the HEVC bitstream is utilized for data hiding, while another set is exploited for encryption. In [13] and [14], the JPEG-LS bitstream is jointly watermarked and encrypted using the AES algorithm in Cipher Block Chaining (CBC) mode, making this scheme suitable for scenarios with high security requirements, such as medical imaging.


Sphere-Hardening Dither Modulation, or SHDM for short, was proposed by Balado in [2] and [3] as an alternative to STDM (Spread-Transform Dither Modulation), which was proposed in [11]. Both SHDM and STDM have in common that in order to embed a single bit b, a multidimensional host vector \(\vec x\) is extracted from the cover work \(C_{0}\) and modified using some dithered quantization function \(Q_{b}\), where different message bits lead to different dither values. While in STDM the projection \({\vec x^{t} }\cdot \vec u\) of the host vector \(\vec x\) onto some random vector \(\vec u\) is quantized, in SHDM, the norm \(\Vert \vec x \Vert \) is quantized.

More specifically, the embedding rule in SHDM is given by

$$ \vec y = Q_{b}(\Vert x \Vert,\Delta,d) \cdot {\vec x \over \Vert x \Vert}, $$

where for \(b \in \{0,1\}\),

$$ Q_{b}(\Vert x \Vert,\Delta,d) = \Delta \cdot \lfloor{ {\Vert x\Vert - d - b\Delta/2 \over \Delta} \rfloor} + d + b \cdot \Delta/2 $$

is the quantizing function. Extraction of the embedded bit is done via

$$ b = \arg \min_{b \in \{0,1\}} \vert \Vert \tilde{\vec y} \Vert - Q_{b}\left(\Vert \tilde{\vec y} \Vert,\Delta,d\right) \vert, $$

where \(\tilde {\vec y}\) is the disturbed signal vector at the detector site.

Note that the direction of the signal vector \(\vec x\) is not changed by embedding, which is advantageous from a perceptual point of view, as opposed to related methods like STDM. As is shown in [2], SHDM offers the same level of robustness against additive white noise as STDM.
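The embedding and extraction rules above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the quantizer rounds to the nearest lattice point instead of using the floor written in the formula, an assumption made here so that the minimum-distance decoder tolerates small perturbations (including floating-point error) in both directions. The sketch also assumes that the norm of the host vector is well above Δ.

```python
import math

def quantize(norm, delta, d, b):
    """Nearest point of the dithered lattice {k*delta + d + b*delta/2}.

    Assumption: round-to-nearest is used here instead of the floor in the text.
    """
    offset = d + b * delta / 2
    return delta * round((norm - offset) / delta) + offset

def embed_bit(x, b, delta, d):
    """Rescale x so that its Euclidean norm lies on lattice b; direction is kept."""
    norm = math.sqrt(sum(v * v for v in x))
    scale = quantize(norm, delta, d, b) / norm
    return [v * scale for v in x]

def extract_bit(y, delta, d):
    """Return the bit whose lattice lies closest to the norm of y."""
    norm = math.sqrt(sum(v * v for v in y))
    return min((0, 1), key=lambda b: abs(norm - quantize(norm, delta, d, b)))
```

Because only the length of the vector changes, every component keeps its sign, which is what later allows sign-bit encryption to coexist with the mark.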

Using SHDM in a CWE scheme

In SHDM, the signal vector \(\vec x = (x_{1},\dots,x_{N})\) may be extracted from the host in an arbitrary fashion. In this section, we implement SHDM in the spatial (pixel) domain, the DCT domain, and the DWT domain and combine it with matching encryption schemes. As in SHDM the vector norm

$$ \Vert \vec x \Vert = \sqrt{x_{1}^{2} + x_{2}^{2} + \dots + x_{N}^{2}}, $$

is quantized, we have the following options for encryption:

  • Encrypt the sign bits of the \(x_{i}\) by means of a stream cipher.

  • Permute the \(x_{i}\).

  • Apply some other norm-preserving operation on \(\vec x\), e.g., a random rotation.

Of course, the options may be combined. In the rest of the paper, we will only explore the first two options.
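Why the first two options commute with norm quantization can be checked numerically. The following sketch is only an illustration under stated assumptions: `random.Random` stands in for a real stream cipher and keyed permutation, the quantizer rounds to the nearest lattice point, and all names and parameter values are hypothetical.

```python
import math
import random

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def embed(x, b, delta, d):
    """SHDM-style embedding: move the norm of x to the nearest point of lattice b."""
    offset = d + b * delta / 2
    target = delta * round((norm(x) - offset) / delta) + offset
    return [v * target / norm(x) for v in x]

def encrypt(x, key):
    """Toy norm-preserving cipher: keyed sign flips, then a keyed permutation.

    random.Random is only a stand-in for a proper stream cipher.
    """
    rng = random.Random(key)
    flipped = [-v if rng.getrandbits(1) else v for v in x]
    order = list(range(len(x)))
    rng.shuffle(order)
    return [flipped[i] for i in order]

x = [12.0, -7.0, 30.0, 5.0, -18.0]
mark_then_encrypt = encrypt(embed(x, 1, 10.0, 3.0), key=2024)
encrypt_then_mark = embed(encrypt(x, key=2024), 1, 10.0, 3.0)
same = all(math.isclose(u, v) for u, v in zip(mark_then_encrypt, encrypt_then_mark))
```

Both orders give the same vector up to floating-point error, because sign flips and permutations leave the norm unchanged and the embedding only rescales the vector.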

Implementation in the spatial domain

In the spatial domain, we work directly with pixel grey values ranging initially between 0 and 255. In order to create sign bits, we subtract 128 from each grey value, so that each shifted value x lies in the new range −128≤x≤127.

Watermarking part

The watermarking part uses the following parameters:

  • The watermarking key \(W_{K}\).

  • The watermark \(W = (b_{1},\dots,b_{n})\) to be embedded.

  • The dimension of the host vectors \(\vec x_{i}\), i.e., the number of coefficients N into which one bit \(b_{i}\) is embedded.

  • The quantizing step Δ.

In the spatial domain, we assume that the watermarking key consists of two parts: \(W_{K} = \left (W_{K}^{(1)},W_{K}^{(2)}\right)\). After choosing a step size Δ and N, the embedding process consists of the following steps:

  • For each bit \(b_{i}\) to be embedded, randomly select N pixels. The selection is controlled by \(W_{K}^{(1)}\). The corresponding grey values form the signal vector \(\vec x_{i}\).

  • Randomly generate a dither value \(d_{i}\), controlled by \(W_{K}^{(2)}\).

  • Quantize the norm of \(\vec x_{i}\) according to \(b_{i}\):

    $$ \Vert \vec x_{i} \Vert_{q} = Q_{b_{i}}\left(\Vert \vec x_{i} \Vert, \Delta, d_{i}\right) $$
  • Embed the mark into \(\vec x_{i}\) by changing its norm to \(\Vert \vec x_{i} \Vert _{q}\):

    $$ \vec y_{i} = \Vert \vec x_{i} \Vert_{q} \cdot {\vec x_{i} \over \Vert \vec x_{i} \Vert} $$

    and replace the grey values of pixels corresponding to components of \(\vec x_{i}\) by the corresponding entries in \(\vec y_{i}\).

For extraction of bit \(b_{i}\), the disturbed signal vector \(\tilde {\vec y}_{i}\) is formed from the marked image \(C_{W}\) in the same way as \(\vec x_{i}\) was generated from the host image \(C_{0}\). The norm of \(\tilde {\vec y}_{i}\) is quantized and \(b_{i}\) is computed according to

$$ b_{i} = \arg \min_{b_{i} \in \{0,1\}} \vert \Vert \tilde{\vec y}_{i} \Vert - Q_{b_{i}}\left(\Vert \tilde{\vec y}_{i} \Vert,\Delta,d_{i}\right) \vert. $$
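The key-controlled parts of the spatial-domain procedure (pixel selection and dither generation) might be sketched as follows. This is an illustration only: seeding `random.Random` with the two key parts, drawing disjoint pixel groups, and drawing the dither uniformly from [0, Δ] are assumptions made for the example, not details given in the text.

```python
import random

def select_groups(num_pixels, n_bits, N, key1):
    """Pick n_bits disjoint groups of N pixel indices, driven by key part 1."""
    rng = random.Random(key1)
    chosen = rng.sample(range(num_pixels), n_bits * N)
    return [chosen[i * N:(i + 1) * N] for i in range(n_bits)]

def dithers(n_bits, delta, key2):
    """One dither value per message bit, driven by key part 2."""
    rng = random.Random(key2)
    return [rng.uniform(0.0, delta) for _ in range(n_bits)]

def host_vectors(pixels, groups):
    """Form the signal vectors from grey values shifted to the range -128..127."""
    return [[pixels[i] - 128 for i in group] for group in groups]
```

The detector regenerates exactly the same groups and dither values from the same key, which is why it can rebuild the signal vectors without access to the host image.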

Figure 1 shows two embedding examples with different resolutions (512×512 and 800×1600, respectively), where a random 64-bit watermark was embedded using a quantization step size of Δ=75.

Fig. 1

Embedding 64 bits in the spatial domain. a PSNR = 57.41 dB. b PSNR = 79.16 dB

Encryption part

Images in the spatial domain with grey values ranging between 0 and 255 can be represented by eight so-called bitplanes, where the most significant bitplane (MSB) indicates whether the grey value of a certain pixel is greater than 127 or not. Thus, after subtracting 128 from every grey value, the MSB indicates the sign of the grey values. Sign bit encryption is therefore equivalent to encrypting the MSB by means of a stream cipher.

The security of encrypting the MSB of an image has been investigated in [15]. Not only is the amount of image quality degradation insufficient for most applications, it is also possible to estimate the encrypted sign bits based on the assumption that neighboring grey values in a natural image do not change abruptly. Both problems can be remedied if a permutation cipher is additionally applied to the pixels. However, the permutation must not mix the N pixels used for embedding \(b_{i}\) with the N pixels used for embedding a different bit \(b_{j}\). Therefore, the permutation cipher and the watermark embedder must share knowledge of \(W_{K}^{(1)}\), which is the part of \(W_{K}\) governing pixel selection. If this condition is met, watermarking and (permutation-based) ciphering commute. However, as is well known, permutation ciphers are vulnerable to known plaintext attacks (see [16] for a quantitative analysis).

Moreover, in order to have as many permutations as possible, N should be chosen as large as possible, that is,

$$ N = \lfloor {H\cdot W \over n} \rfloor $$

in the spatial domain, where H and W are the height and width of the host image \(C_{0}\), and n is the length of the embedded string. In order to have a minimum level of security against brute-force attacks, we need N≥32, leading to a maximum capacity of

$$ n_{max} = \lfloor {H\cdot W \over 32} \rfloor $$

bits in the spatial domain, meaning, e.g., \(2^{13} = 8192\) bits for a 512×512 image (note that the term capacity is used throughout this paper according to the definition given in [17]: the watermarking capacity of a digital image is the number of bits that can be embedded in a given host image).

Figure 2 shows encrypted versions of the marked Lena image in Fig. 1a. Thanks to the commutativity of watermarking and encryption, the mark can be extracted from both without errors.

Fig. 2

Encrypting the marked Lena image in the spatial domain. a Sign bit encryption. b Permutation cipher

Implementation in the DCT domain

In the DCT domain, we assume that the watermarking key consists of three parts: \(W_{K} = \left (W_{K}^{(1)},W_{K}^{(2)},W_{K}^{(3)}\right)\). We begin by performing a block-based two-dimensional DCT on the host image \(C_{0}\), i.e., we divide \(C_{0}\) into non-overlapping 8×8 pixel blocks and perform a two-dimensional DCT on each block.

Watermarking part

  • For each bit \(b_{i}\) to be embedded, randomly select N blocks. The selection is controlled by \(W_{K}^{(1)}\). Each block can only be selected once. The selected blocks for bit \(b_{i}\) form subset \(T_{i}\) of the set of all blocks.

  • For each \(T_{i}\), randomly select a horizontal and a vertical frequency index from the medium frequencies. The selection is controlled by \(W_{K}^{(2)}\). The corresponding DCT coefficients from the selected blocks form an N-dimensional vector \(\vec x_{i}\).

  • For each \(T_{i}\), randomly generate a dither value \(d_{i}\) under control of \(W_{K}^{(3)}\).

  • Quantize the norm of \(\vec x_{i}\) according to \(b_{i}\):

    $$ \Vert \vec x_{i} \Vert_{q} = Q_{b_{i}}\left(\Vert \vec x_{i} \Vert, \Delta_{i}, d_{i}\right) $$

    Note that because all DCT coefficients in \(\vec x_{i}\) correspond to the same horizontal and vertical frequency pair, it is possible to choose individual quantizing step sizes \(\Delta_{i}\) according to their perceptual importance (see below).

  • Embed the mark into \(\vec x_{i}\) by changing its norm to \(\Vert \vec x_{i} \Vert _{q}\):

    $$ \vec y_{i} = \Vert \vec x_{i} \Vert_{q} \cdot {\vec x_{i} \over \Vert \vec x_{i} \Vert} $$

    and replace the selected DCT coefficients in \(T_{i}\) with the corresponding entries in \(\vec y_{i}\).

In choosing the step sizes \(\Delta_{i}\), we were guided by the JPEG quantization matrix, which assigns a perceptual relevance to each DCT coefficient in an 8×8 block. The essential step in JPEG compression consists in quantizing DCT coefficients according to fixed quantization tables corresponding to a certain quality factor. On the other hand, it is well known that QIM-based watermarking schemes are sensitive to re-quantization.

In order to counter the adverse effects of re-quantization, we therefore chose quantization step sizes \(\Delta_{i}\) for the individual DCT coefficients selected for embedding bit \(b_{i}\) that were based on the actual quantization step sizes in the JPEG standard. More specifically, we used the quantization matrix

$$J = \left(\begin{array}{llllllll} 16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\ 12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\ 14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\ 14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\ 18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\ 24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\ 49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\ 72 & 92 & 95 & 98 & 112 & 100 & 103 & 99 \end{array}\right) $$

taken from the JPEG standard [18], which gives the JPEG quantization steps for the DCT coefficients within an 8×8 block at a quality factor of 50%, and multiplied it by a constant c>1. If the DCT coefficients for embedding \(b_{i}\) correspond to the frequency pair \((k,\ell)\), we have

$$ \Delta_{i} = c\cdot J_{k \ell}, $$

the rationale behind this approach being the well-known fact that for a quantizing function Q, we have

$$ Q(Q(Q(x,\Delta),\delta),\Delta) = Q(x,\Delta), $$

if Δ>δ (see [19], Theorem 1). This means that once a value has been quantized with step size Δ, a subsequent quantization with a smaller step size δ can be reversed by quantizing again with Δ.
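The re-quantization property can be checked numerically with a small Python sketch. It assumes a plain round-to-nearest uniform quantizer without dither; the value c=3.5 is the one used in the embedding example, and 24 is one entry of the matrix J picked for illustration.

```python
def q(x, step):
    """Uniform quantizer: nearest multiple of step (assumption: no dither)."""
    return step * round(x / step)

c = 3.5
j_kl = 24.0             # one entry of the JPEG matrix J, chosen for illustration
big = c * j_kl          # watermark step size, here 84.0
small = j_kl            # JPEG step size at 50 % quality

samples = (-300.0, -41.3, 0.0, 77.7, 123.4, 512.0)
survives = all(q(q(q(x, big), small), big) == q(x, big) for x in samples)
```

In each case the coarse quantization level survives the intermediate fine quantization, because the fine quantizer moves a value by at most half its step size, which is less than half the coarse step size.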

Figure 3 shows two embedding examples, where a random 64-bit message was embedded. For the Lena image, we set N=64,c=3.5, and for the higher resolution Norba image, we set N=312,c=3.5 (cf. Section 4.2.2 for details on how N was chosen).

Fig. 3

Embedding 64 bits in the DCT domain. a N=64, PSNR = 56.10 dB. b N=312, PSNR = 63.74 dB

In order to extract message bit \(b_{i}\), the disturbed marked signal vector \(\tilde {\vec y}_{i}\) is extracted from the marked image \(C_{W}\) in the same way as the unmarked vector \(\vec x_{i}\) was built from \(C_{0}\) with the help of \(W_{K}\). The message bit is then decoded according to

$$ b_{i} = \arg \min_{b_{i} \in \{0,1\}} \vert \Vert \tilde{\vec y}_{i} \Vert - Q_{b_{i}}\left(\Vert \tilde{\vec y}_{i} \Vert,\Delta_{i},d_{i}\right) \vert. $$

Encryption part

As in the spatial domain, we investigate the two options of encrypting DCT-coefficient sign bits and of permuting them. The idea of encrypting the sign bits of DCT coefficients goes back to [20] and [21], where it is proposed to encrypt sign bits of DCT coefficients and motion vectors in MPEG video. The security of this approach for still images is classified as low in [15], p. 51.

In order to create a larger visual distortion, instead of permuting the DCT coefficients alone, we permuted the complete 8×8 blocks containing the coefficients (see [20] and [21]). If only those blocks containing the selected coefficients for watermarking are permuted, however, the corresponding subset T becomes visible to an attacker, who can in turn concentrate her efforts to remove the mark on T. We therefore need to permute all image blocks. Moreover, as in the spatial domain case, in order to make sure that the selected blocks \(T_{i}\) for a single bit \(b_{i}\) do not get mixed up with blocks for a different bit or non-selected blocks, the permutation algorithm needs to know part \(W_{K}^{(1)}\) of the watermarking key. More specifically, each subset \(T_{i}\) needs to form an invariant subset of the set of all blocks under the permutation. As in the spatial domain, these subsets need to be as large as possible. We therefore have

$$ N = \vert T_{i} \vert = \lfloor {{(H/8)\cdot (W/8)}\over n} \rfloor $$

in the DCT domain. The requirement N≥32 gives a maximum capacity of

$$ n_{max} = \lfloor {H\cdot W \over {64\cdot 32} }\rfloor, $$

meaning 128 bits for a 512×512 image.
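The capacity and group-size formulas for the spatial and DCT domains reduce to integer divisions, which the following sketch works through. The function names and the `coeffs_per_unit` parameter are hypothetical helpers introduced for the example.

```python
def max_capacity(height, width, coeffs_per_unit=1, min_group=32):
    """Largest message length n that still leaves min_group host values per bit.

    coeffs_per_unit is 1 for single pixels and 64 for 8x8 blocks.
    """
    return (height * width) // (coeffs_per_unit * min_group)

def group_size(height, width, n_bits, coeffs_per_unit=1):
    """N = floor(number of available units / n_bits)."""
    return (height * width // coeffs_per_unit) // n_bits

spatial_capacity = max_capacity(512, 512)                 # pixels as units
dct_capacity = max_capacity(512, 512, coeffs_per_unit=64) # 8x8 blocks as units
lena_N = group_size(512, 512, 64, coeffs_per_unit=64)
norba_N = group_size(800, 1600, 64, coeffs_per_unit=64)
```

For a 64-bit message the block-based group sizes come out as N=64 for a 512×512 image and N=312 for an 800×1600 image, the values used in the DCT embedding examples.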

Figure 4 shows encrypted versions of the marked Lena image in Fig. 3a. Again, the mark can be extracted from both without errors.

Fig. 4

Encrypting the marked Lena image in the DCT domain. a Coefficient sign bit encryption. b Permutation of 8×8 blocks

Implementation in the DWT domain

Watermarking part

In the DWT domain, we performed a three-level DWT and embedded the mark into the level 3 approximation coefficients. This way, the number N of coefficients used to embed one bit is the same as in Section 4.2, namely

$$ N = \vert T_{i} \vert = \lfloor {{(H/8)\cdot (W/8)}\over n} \rfloor. $$

In the DWT case, however, there are no blocks of coefficients to choose a frequency from, thus \(W_{K}\) consists of only two parts: \(W_{K} = \left (W_{K}^{(1)},W_{K}^{(2)}\right)\), where \(W_{K}^{(1)}\) governs the selection of N coefficients for each message bit \(b_{i}\), and \(W_{K}^{(2)}\) controls the dither \(d_{i}\) for each bit. Likewise, a single quantization step size Δ is used for all message bits. As an example, Fig. 5 shows the results of embedding 64 bits into the Lena and Norba images, setting Δ=100.

Fig. 5

Embedding 64 bits in the DWT domain. a N=64, PSNR = 54.84 dB. b N=312, PSNR = 62.94 dB

Encryption part

As in the DCT case, we have the options to either encrypt the sign bits of DWT coefficients, as already proposed in [21], and/or to permute the DWT coefficients, as originally proposed in [22]. Note that in the DWT domain, permuting coefficients is not as vulnerable to known-plaintext attacks as in other domains, because the location of coefficients is image-dependent [15]. However, if only the level 3 approximation coefficients are encrypted or permuted, the image content is not rendered completely unintelligible, but fine structures are still visible (see Figs. 6a, b). As in the DCT-case, we have the additional option of not only permuting the level 3 DWT-coefficients themselves but the complete (8×8) blocks leading to the level 3 approximation for a more complete obfuscation of the image content (see Fig. 6c), without sacrificing the commutativity with watermarking. The maximum capacity is the same as in the DCT-based implementation.

Fig. 6

Encrypting the marked Lena image in the DWT domain. a LL3 coefficient sign bit encryption. b Permutation of LL3 coefficients. c Permutation of 8×8 blocks

Experimental results

In our experiments, we used 50 standard images of format 512×512, most of them downloaded from We embedded 64 random bits and fixed all other parameters in such a way that a PSNR of about 50dB resulted for the watermarked images. In the spatial domain and the DWT domain, this meant a quantizing step size of Δ=175 (see Section 5.1 for details).

In the DCT domain, the parameter c (see Section 4.2) was set to 8.0. The similarity of the extracted watermarks to the originally embedded watermarks was measured using the normalized correlation of the two vectors.
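The text does not spell out the normalized correlation formula. One common variant, shown here purely as an assumption, maps the bits 0/1 to −1/+1 and divides the inner product of the two vectors by the product of their Euclidean norms.

```python
import math

def normalized_correlation(embedded, extracted):
    """Normalized correlation of two bit strings after mapping 0/1 to -1/+1.

    Assumption: this bipolar variant is one common definition, not
    necessarily the one used for the reported results.
    """
    a = [2 * b - 1 for b in embedded]
    e = [2 * b - 1 for b in extracted]
    dot = sum(u * v for u, v in zip(a, e))
    return dot / math.sqrt(sum(u * u for u in a) * sum(v * v for v in e))
```

Under this definition a perfectly recovered mark scores 1, and a mark of n bits with k bit errors scores 1 − 2k/n, so a value near 0 corresponds to about half of the bits being wrong.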


We first investigated how Δ and the parameter c, respectively, affect the PSNR of the watermarked image compared to the host image. Perhaps not surprisingly, the effect of Δ on the PSNR is practically the same for the spatial domain and the DWT domain (see Fig. 7).

Fig. 7

PSNR versus Δ in the spatial and the DWT domain (averaged over 50 images)

As the parameter c is not directly comparable to the parameter Δ used in the other two domains, the corresponding graph is shown in a separate diagram (Fig. 8).

Fig. 8

PSNR versus the parameter c (averaged over 50 images)

Figs. 7 and 8 reveal that parameter choices of Δ=175 for the spatial and DWT domains and of c=8.0 for the DCT domain give rise to a PSNR of about 50 dB if 64 bits are embedded. This provides the basic setting for our further experiments.
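A back-of-the-envelope estimate makes the relation between Δ and PSNR plausible. The sketch below is not the authors' evaluation code. It assumes that each embedded bit moves one host vector along its own direction by a quantization error uniformly distributed in [−Δ/2, Δ/2] (mean square Δ²/12, as for a round-to-nearest quantizer), that the embedding domain is the pixel domain or an orthonormal transform of it, and that rounding and clipping of pixel values can be ignored.

```python
import math

def psnr(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    return 10.0 * math.log10(peak * peak / mse)

def estimated_psnr(delta, n_bits, height, width):
    """Rough PSNR estimate for norm quantization with step size delta.

    Each bit contributes an expected squared error of delta**2 / 12,
    spread over the height * width pixels of the image.
    """
    mse = n_bits * delta ** 2 / 12.0 / (height * width)
    return psnr(mse)

estimate = estimated_psnr(175.0, 64, 512, 512)
```

Under these assumptions the estimate for Δ=175, 64 bits, and a 512×512 image lands close to the 50 dB reported above, and it also shows why the PSNR depends only on Δ, the message length, and the image size.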

In another fidelity experiment, we investigated the influence of the message size on the PSNR (see Fig. 9). Again, the spatial domain and DWT-based implementations show almost equal behavior, except that the spatial domain scheme has a much larger capacity.

Fig. 9

PSNR vs message size (averaged over 50 images)

JPEG compression

Both the DCT- and the DWT-based implementations prove to be very robust against JPEG compression (see Fig. 10). Both schemes also outperform the scheme proposed in [6] with respect to JPEG compression, which offers a normalized correlation of 0.22 at a JPEG quality factor of 50%.

Fig. 10

Correlation value versus JPEG quality factor in three investigated domains (averaged over 50 images)

JPEG2000 compression

The results of our experiments with JPEG2000 compression basically follow the same pattern as the JPEG experiments. The spatial domain implementation is the most fragile one, but is still surprisingly robust, especially at low compression rates.

The DCT-based implementation and the DWT-based implementation perform almost equally well. Only for higher compression rates does the DWT-based implementation have a slight advantage (see Fig. 11). Again, both transform-domain schemes outperform the scheme presented in [6] and have roughly the same robustness against JPEG2000 compression as the scheme in [4], which works in the LL4 subband.

Fig. 11

Correlation value versus JPEG2000 compression ratio in three investigated domains (averaged over 50 images)

Adding noise

All three implementation domains perform equally well in the presence of low- or medium-density additive white noise. For higher noise densities, the DWT-based implementation is the most robust (see Fig. 12).

Fig. 12

Correlation value versus noise density in the three investigated domains (averaged over 50 images)

Security considerations

Security of encryption

In this subsection, we summarize and enhance the security assessments made in Section 4 for the three implementation domains.

As sign bit encryption in the spatial domain can be attacked directly [15] to reveal part of the image contents, this approach seems to be the weakest of all options. Combining it with a permutation cipher makes for a cryptographically and visually stronger cipher, although the permutation cipher is in turn vulnerable to known plaintext attacks. This combination, however, requires sharing part of the watermarking key between content owner and seller.

Sign bit encryption in the DCT domain has been attacked by Wu and Kuo [23], who could recover some visual information from the encrypted image by setting the DC coefficient to 128 and giving all AC coefficients a positive sign. Again, a combination with a block-based permutation will strengthen the security of the cipher (note that we do not recommend permuting DCT coefficients directly, as the DC coefficient will normally stick out as the one with the largest absolute value).

For the DWT domain, there are, to the best of our knowledge, no analogous attacks on sign-bit encryption in the literature. Still, it is recommended to encrypt not only the watermarked DWT coefficients of the LL3 subband, but all subbands, and to combine the sign bit encryption with permutation, if a secret sharing between content owner and seller is possible. If this is not the case, the content owner can resort to permuting all subbands excluding the LL3 subband.

Watermarking security

According to [24], watermarking security refers to the presence of an adversary trying to break the system, as opposed to random modifications of a marked image due to benign image processing. In the following discussion, we assume that an attacker has access to the unencrypted, marked image \(C_{W}\), but not to the original host image \(C_{0}\) or the watermarking key \(W_{K}\). In this context, breaking the system means that the attacker is either able to insert a mark of her own, or to detect a mark, or to remove the mark from \(C_{W}\) without rendering the image unusable.

In order to successfully detect a watermark or embed a watermark of her own without knowledge of the watermarking key \(W_{K}\), an attacker would have to guess how the signal vectors are formed as a first step. If the mark is embedded in the DCT or DWT domain, there are \(\binom{(H/8)\cdot (W/8)}{N}\) possibilities for the first bit, where W and H are the dimensions of the image and N is the dimension of the signal vector. For typical values (H=W=512, N=32), this means about \(10^{80}\) possibilities.
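The size of this search space can be verified with exact integer arithmetic. The short sketch below uses Python's `math.comb` (available from Python 3.8); the function name is a hypothetical helper.

```python
import math

def selection_count(height, width, N):
    """Number of ways to choose N of the (H/8)*(W/8) blocks or LL3 coefficients."""
    return math.comb((height // 8) * (width // 8), N)

# Order of magnitude for H = W = 512 and N = 32.
digits = math.log10(selection_count(512, 512, 32))
```

The count covers only the choice of the first subset; the subsets for the remaining bits and the dither values add further unknowns for the attacker.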

As is shown in Section 5, it is rather hard for an attacker to remove the watermark from a marked image by adding white noise or by compression, especially if the mark was embedded in the DCT or DWT domain. Without knowledge of the correct watermarking key, depending on the implementation domain, an attacker would have to modify the pixel grey values or transform coefficients in such a way that the norm of each possible signal vector is changed by an amount of at least Δ/2.


We have presented a novel commutative watermarking encryption (CWE) scheme, which is very robust to common attacks like JPEG/JPEG2000 compression and noise addition, especially when implemented in a transform domain (discrete cosine or discrete wavelet). On the other hand, the spatial domain implementation has the advantage of a much higher capacity. However, the robustness comes at the cost of relatively weak encryption primitives, especially if the scheme is applied in the spatial or DCT domain. The implementation in the DWT domain offers the best tradeoff between robustness and security of the cipher. Nevertheless, because of the leakage of visual contents if sign bit encryption is used exclusively, and because of the inherent weaknesses of permutation ciphers, the proposed scheme is recommended for scenarios with low to medium security requirements with regard to the image contents, where robustness of the watermark has the highest priority. For many commercial application scenarios, this seems to be a good fit. In future research, we will explore ways to further enhance the security of the encryption primitives by using norm-preserving operations.

Availability of data and materials

The experimental results of this study are based on the image data set available at The corresponding code is available from the author on request.



Abbreviations

CBC: Cipher block chaining

CWE: Commutative watermarking encryption

DCT: Discrete cosine transform

DRM: Digital rights management

DWT: Discrete wavelet transform

JPEG: Joint Photographic Experts Group

PSNR: Peak signal-to-noise ratio

QIM: Quantization index modulation

SHDM: Sphere-hardening dither modulation

STDM: Spread-transform dither modulation


  1. 1

    J. Herrera-Joancomartí, S. Katzenbeisser, D. Megías, J. Minguillón, A. Pommer, M. Steinebach, A. Uhl, Ecrypt European network of excellence in cryptology, first summary report on hybrid systems (2005).

  2. 2

    F. Balado, in International Workshop on Digital Watermarking. New geometric analysis of spread-spectrum data hiding with repetition coding, with implications for side-informed schemes (Springer, 2005), pp. 336–350.

  3. 3

    F. Balado, N. Hurley, G. Silvestre, in Security, Steganography, and Watermarking of Multimedia Contents VIII, 6072. Sphere-hardening dither modulationInternational Society for Optics and Photonics, (2006), p. 60720.

  4. 4

    S. Lian, Z. Liu, R. Zhen, H. Wang, Commutative watermarking and encryption for media data. Opt. Eng.45(8), 080510 (2006).

    Article  Google Scholar 

  5. 5

    S. Lian, Z. Liu, Z. Ren, H. Wang, Commutative encryption and watermarking in video compression. IEEE Trans. Circ. Syst. Video Technol.17(6), 774–778 (2007).

    Article  Google Scholar 

  6. 6

    M. Cancellaro, F. Battisti, M. Carli, G. Boato, F. G. De Natale, A. Neri, A commutative digital image watermarking and encryption method in the tree structured Haar transform domain. Signal Process. Image Commun.26(1), 1–12 (2011).

    Article  Google Scholar 

  7. 7

    S. Lian, Multimedia content encryption (CRC Press, 2009).

  8. 8

    S. Lian, Quasi-commutative watermarking and encryption for secure media content distribution. Multimedia Tools Appl.43(1), 91–107 (2009).

    MathSciNet  Article  Google Scholar 

  9. 9

    R. Schmitz, S. Li, C. Grecos, X. Zhang, in IFIP International Conference on Communications and Multimedia Security, Lecture Notes in Computer Science, 7394, ed. by B. De Decker, D. Chadwick. A new approach to commutative watermarking encryptionSpringer, (2012), pp. 117–130.

  10. 10

    A. Boho, G. Van Wallendael, A. Dooms, J. De Cock, G. Braeckman, P. Schelkens, B. Preneel, R. Van de Walle, End-to-end security for video distribution: the combination of encryption, watermarking, and video adaptation. IEEE Signal Proc. Mag.30(2), 97–107 (2013).

    Article  Google Scholar 

  11. 11

    B. Chen, G. W. Wornell, Quantization index modulation: a class of provably good methods for digital watermarking and information embedding. IEEE Trans. Inf. Theory. 47(4), 1423–1443 (2001).

    MathSciNet  Article  Google Scholar 

  12. 12

    B. Guan, D. Xu, Q. Li, An efficient commutative encryption and data hiding scheme for HEVC video. IEEE Access. 8:, 60232–60245 (2020).

    Article  Google Scholar 

  13. 13

    S. Haddad, G. Coatrieux, M. Cozic, in 2018 25th IEEE International Conference on Image Processing (ICIP). A new joint watermarking-encryption-JPEG-LS compression method for a priori & a posteriori image protection, pp. 1688–1692.

  14. 14

    S. Haddad, G. Coatrieux, A. Moreau-Gaudry, M. Cozic, Joint watermarking-encryption-JPEG-LS for medical image reliability control in encrypted and compressed domains. IEEE Trans. Inf. Forensics Secur.15:, 2556–2569 (2020).

    Article  Google Scholar 

  15. 15

    A. Uhl, A. Pommer, Image and video encryption: from digital rights management to secured personal communication, vol. 15 (Springer, 2004).

  16. 16

    S. Li, C. Li, G. Chen, N. G. Bourbakis, K. -T. Lo, A general quantitative cryptanalysis of permutation-only multimedia ciphers against plaintext attacks. Signal Process. Image Commun.23(3), 212–223 (2008).

    Article  Google Scholar 

  17. 17

    F. Zhang, in Handbook of Research on Secure Multimedia Distribution. Digital watermarking capacity and detection error rateIGI Global, (2009), pp. 257–276.

  18. 18

    J. -D. Huang, The JPEG standard. Graduate Institute of Communication Engineering National Taiwan University (2006).

  19. 19

    C. -Y. Lin, S. -F. Chang, in Security and Watermarking of Multimedia Contents II, 3971. Semifragile watermarking for authenticating JPEG visual contentInternational Society for Optics and Photonics, (2000), pp. 140–151.

  20. 20

    W. Zeng, S. Lei, in Proceedings of the Seventh ACM International Conference on Multimedia (Part 1). Efficient frequency domain video scrambling for content access control, (1999), pp. 285–294.

  21. 21

    W. Zeng, S. Lei, Efficient frequency domain selective scrambling of digital video. IEEE Trans. Multimedia. 5(1), 118–129 (2003).

    MathSciNet  Article  Google Scholar 

  22. 22

    T. Uehara, R. Safavi-Naini, P. Ogunbona, in First IEEE Pacific-Rim Conference on Multimedia. Securing wavelet compression with random permutationsIEEE, (2000), pp. 332–335.

  23. 23

    C. -P. Wu, C. -C. Kuo, Design of integrated multimedia compression and encryption systems. IEEE Trans. Multimedia. 7(5), 828–839 (2005).

    Article  Google Scholar 

  24. 24

    P. Bas, T. Furon, F. Cayre, G. Doërr, B. Mathon, Watermarking security: fundamentals, secure designs and attacks (Springer, 2016).

Download references


The research described in this article was done during a sabbatical semester granted by the Stuttgart Media University. The author gratefully acknowledges having been given this opportunity. He also thanks the anonymous reviewers for their helpful comments.


The author did not receive any funding for this research.

Author information




The entire manuscript is a sole contribution of the author. The author read and approved the final manuscript.

Corresponding author

Correspondence to Roland Schmitz.

Ethics declarations

Competing interests

The author declares that he has no competing interests.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Schmitz, R. Use of SHDM in commutative watermarking encryption. EURASIP J. on Info. Security 2021, 1 (2021).



  • Watermarking
  • Encryption
  • DRM