In SHDM, the signal vector \(\vec x = (x_{1},\dots,x_{N})\) may be extracted from the host in an arbitrary fashion. In this section, we implement SHDM in the spatial (pixel) domain, the DCT domain, and the DWT domain and combine it with matching encryption schemes. As in SHDM the vector norm
$$ \Vert \vec x \Vert = \sqrt{x_{1}^{2} + x_{2}^{2} + \dots + x_{N}^{2}}, $$
(5)
is quantized, we have the following options for encryption:
-
Encrypt the sign bits of the \(x_{i}\) by means of a stream cipher.
-
Permute the \(x_{i}\).
-
Apply some other norm-preserving operation to \(\vec x\), e.g., a random rotation.
Of course, the options may be combined. In the rest of the paper, we will only explore the first two options.
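The three options can be checked numerically. The following is a minimal sketch, assuming NumPy, a toy 16-dimensional vector, and a pseudo-random bit sequence as a stand-in for a real stream cipher; the random orthogonal matrix is taken from a QR decomposition and may be a rotation or a reflection, both of which preserve the norm.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
x = rng.integers(-128, 128, size=16).astype(float)  # toy signal vector

# Option 1: flip signs according to a keystream (stand-in for a stream cipher).
keystream = rng.integers(0, 2, size=x.size)
x_signs = np.where(keystream == 1, -x, x)

# Option 2: permute the components.
x_perm = x[rng.permutation(x.size)]

# Option 3: random orthogonal transform (orthogonal factor of a QR decomposition).
q, _ = np.linalg.qr(rng.standard_normal((x.size, x.size)))
x_rot = q @ x

norm = np.linalg.norm(x)
norms = [np.linalg.norm(v) for v in (x_signs, x_perm, x_rot)]
print(norm, norms)  # all four values agree up to floating-point error
```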
Implementation in the spatial domain
In the spatial domain, we work directly with pixel grey values, which initially range from 0 to 255. In order to create sign bits, we subtract 128 from each grey value, so that the new range is −128 to 127.
Watermarking part
The watermarking part uses the following parameters:
-
The watermarking key \(W_{K}\).
-
The watermark \(W = (b_{1},\dots,b_{n})\) to be embedded.
-
The dimension of the host vectors \(\vec x_{i}\), i.e., the number of coefficients N into which one bit \(b_{i}\) is embedded.
-
The quantizing step Δ.
In the spatial domain, we assume that the watermarking key consists of two parts: \(W_{K} = \left (W_{K}^{(1)},W_{K}^{(2)}\right)\). After choosing a step size Δ and N, the embedding process consists of the following steps:
-
For each bit \(b_{i}\) to be embedded, randomly select N pixels. The selection is controlled by \(W_{K}^{(1)}\). The corresponding grey values form the signal vector \(\vec x_{i}\).
-
Randomly generate a dither value \(d_{i}\), controlled by \(W_{K}^{(2)}\).
-
Quantize the norm of \(\vec x_{i}\) according to \(b_{i}\):
$$ \Vert \vec x_{i} \Vert_{q} = Q_{b_{i}}\left(\Vert \vec x_{i} \Vert, \Delta, d_{i}\right) $$
(6)
-
Embed the mark into \(\vec x_{i}\) by changing its norm to \(\Vert \vec x_{i} \Vert _{q}\):
$$ \vec y_{i} = \Vert \vec x_{i} \Vert_{q} \cdot {\vec x_{i} \over \Vert \vec x_{i} \Vert} $$
(7)
and replace the grey values of pixels corresponding to components of \(\vec x_{i}\) by the corresponding entries in \(\vec y_{i}\).
For extraction of bit \(b_{i}\), the disturbed signal vector \(\tilde {\vec y}_{i}\) is formed from the marked image \(C_{W}\) in the same way as \(\vec x_{i}\) was generated from the host image \(C_{0}\). The norm of \(\tilde {\vec y}_{i}\) is quantized and \(b_{i}\) is computed according to
$$ b_{i} = \arg \min_{b_{i} \in \{0,1\}} \vert \Vert \tilde{\vec y}_{i} \Vert - Q_{b_{i}}\left(\Vert \tilde{\vec y}_{i} \Vert,\Delta,d_{i}\right) \vert. $$
(8)
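The embedding and extraction steps of Eqs. (6) to (8) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the quantizer \(Q_{b}\) is assumed to be a standard dithered QIM quantizer on the lattice \(\Delta \mathbb{Z}\) shifted by \(d + b\Delta/2\), the key-driven pixel selection and dither generation are replaced by a seeded NumPy generator, and rounding and clipping of the marked values to 8-bit grey values are omitted.

```python
import numpy as np

def quantize(v, delta, d, b):
    """Assumed dithered QIM quantizer Q_b: lattice delta*Z shifted by d + b*delta/2."""
    shift = d + b * delta / 2.0
    return delta * np.round((v - shift) / delta) + shift

def embed_bit(x, b, delta, d):
    """Rescale x so that its norm equals the quantized norm (Eqs. 6 and 7)."""
    norm = np.linalg.norm(x)
    return quantize(norm, delta, d, b) * x / norm

def extract_bit(y, delta, d):
    """Pick the bit whose quantizer output is closest to the observed norm (Eq. 8)."""
    norm = np.linalg.norm(y)
    dist = [abs(norm - quantize(norm, delta, d, b)) for b in (0, 1)]
    return int(np.argmin(dist))

rng = np.random.default_rng(1)
delta = 75.0
bits = [1, 0, 1, 1]

# Toy 64x64 host with grey values shifted to the range -128..127.
flat = rng.integers(0, 256, size=64 * 64).astype(float) - 128.0

# Disjoint pixel groups (stand-in for W_K^(1)) and dithers (stand-in for W_K^(2)).
groups = rng.permutation(flat.size).reshape(len(bits), -1)
dithers = rng.uniform(0.0, delta, size=len(bits))

marked = flat.copy()
for b, g, d in zip(bits, groups, dithers):
    marked[g] = embed_bit(flat[g], b, delta, d)

recovered = [extract_bit(marked[g], delta, d) for g, d in zip(groups, dithers)]
print(recovered)  # [1, 0, 1, 1]
```

In the noise-free case the norm of each marked group lies exactly on the lattice of the embedded bit, so the distance to the other lattice is Δ/2 and decoding is unambiguous.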
Figure 1 shows two embedding examples with different resolutions (512×512 and 800×1600, respectively), where a random 64-bit watermark was embedded using a quantization step size of Δ=75.
Encryption part
Images in the spatial domain with grey values ranging between 0 and 255 can be represented by eight so-called bitplanes, where the most significant bitplane (MSB) indicates whether the grey value of a certain pixel is greater than 127 or not. Thus, after subtracting 128 from every grey value, the MSB indicates the sign of the grey values. Sign bit encryption is therefore equivalent to encrypting the MSB by means of a stream cipher.
The security of encrypting the MSB of an image has been investigated in [15]. Not only is the amount of image quality degradation insufficient for most applications, it is also possible to estimate the encrypted sign bits based on the assumption that neighboring grey values in a natural image do not change abruptly. Both problems can be remedied if a permutation cipher is additionally applied to the pixels. However, the permutation must not mix the N pixels used for embedding \(b_{i}\) with the N pixels used for embedding a different bit \(b_{j}\). Therefore, the permutation cipher and the watermark embedder must share knowledge of \(W_{K}^{(1)}\), which is the part of \(W_{K}\) governing pixel selection. If this condition is met, watermarking and (permutation-based) ciphering commute. However, as is well known, permutation ciphers are vulnerable to known-plaintext attacks (see [16] for a quantitative analysis).
Moreover, in order to allow as many permutations as possible, N should be chosen as large as possible, that is,
$$ N = \lfloor {H\cdot W \over n} \rfloor $$
(9)
in the spatial domain, where H and W are the height and width of the host image \(C_{0}\), and n is the length of the embedded string. In order to have a minimum level of security against brute-force attacks, we need N≥32, leading to a maximum capacity of
$$ n_{max} = \lfloor {H\cdot W \over 32} \rfloor $$
(10)
bits in the spatial domain, meaning, e.g., \(2^{13} = 8192\) bits for a 512×512 image (note that the term capacity is used throughout this paper according to the definition given in [17]: the watermarking capacity of a digital image is the number of bits that can be embedded in a given host image).
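As a quick arithmetic check of Eqs. (9) and (10), the following sketch evaluates both formulas for a 512×512 image; the function names are hypothetical and chosen for illustration only.

```python
def spatial_group_size(height, width, n_bits):
    """Eq. (9): number of pixels N available per embedded bit."""
    return (height * width) // n_bits

def spatial_max_capacity(height, width, n_min=32):
    """Eq. (10): maximum number of bits under the constraint N >= n_min."""
    return (height * width) // n_min

print(spatial_group_size(512, 512, 64))  # 4096 pixels per bit for a 64-bit mark
print(spatial_max_capacity(512, 512))    # 8192 bits
```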
Figure 2 shows encrypted versions of the marked Lena image in Fig. 1a. Thanks to the commutativity of watermarking and encryption, the mark can be extracted from both without errors.
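The commutativity can be illustrated for a single pixel group. The sketch below assumes the same dithered QIM quantizer as a stand-in for \(Q_{b}\), models sign-bit encryption as multiplication by a pseudo-random ±1 sequence, and restricts the permutation to the N pixels of one group; all names and parameter values are illustrative.

```python
import numpy as np

def quantize(v, delta, d, b):
    """Assumed dithered QIM quantizer (stand-in for Q_b)."""
    shift = d + b * delta / 2.0
    return delta * np.round((v - shift) / delta) + shift

def embed(x, b, delta, d):
    """Change the norm of x to its quantized value, keeping the direction."""
    norm = np.linalg.norm(x)
    return quantize(norm, delta, d, b) * x / norm

rng = np.random.default_rng(2)
delta, d, bit = 75.0, 20.0, 1
x = rng.integers(-128, 128, size=32).astype(float)  # pixels of one group

signs = rng.choice([-1.0, 1.0], size=x.size)  # keystream stand-in
perm = rng.permutation(x.size)                # permutation inside the group

def encrypt(v):
    return (signs * v)[perm]

def decrypt(v):
    out = np.empty_like(v)
    out[perm] = v          # undo the permutation
    return signs * out     # undo the sign flips

mark_then_encrypt = encrypt(embed(x, bit, delta, d))
encrypt_then_mark = embed(encrypt(x), bit, delta, d)
print(np.allclose(mark_then_encrypt, encrypt_then_mark))  # True
```

Both orders give the same result because the encryption leaves the norm unchanged and the embedding only rescales the vector.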
Implementation in the DCT domain
In the DCT domain, we assume that the watermarking key consists of three parts: \(W_{K} = \left (W_{K}^{(1)},W_{K}^{(2)},W_{K}^{(3)}\right)\). We begin by performing a block-based two-dimensional DCT on the host image \(C_{0}\), i.e., we divide \(C_{0}\) into non-overlapping 8×8 pixel blocks and perform a two-dimensional DCT on each block.
Watermarking part
-
For each bit \(b_{i}\) to be embedded, randomly select N blocks. The selection is controlled by \(W_{K}^{(1)}\). Each block can only be selected once. The selected blocks for bit \(b_{i}\) form the subset \(T_{i}\) of the set of all blocks.
-
For each \(T_{i}\), randomly select a horizontal and a vertical frequency index from the medium frequencies. The selection is controlled by \(W_{K}^{(2)}\). The corresponding DCT coefficients from the selected blocks form an N-dimensional vector \(\vec x_{i}\).
-
For each \(T_{i}\), randomly generate a dither value \(d_{i}\) under control of \(W_{K}^{(3)}\).
-
Quantize the norm of \(\vec x_{i}\) according to \(b_{i}\):
$$ \Vert \vec x_{i} \Vert_{q} = Q_{b_{i}}\left(\Vert \vec x_{i} \Vert, \Delta_{i}, d_{i}\right) $$
(11)
Note that because all DCT coefficients in \(\vec x_{i}\) correspond to the same horizontal and vertical frequency pair, it is possible to choose individual quantizing step sizes \(\Delta_{i}\) according to their perceptual importance (see below).
-
Embed the mark into \(\vec x_{i}\) by changing its norm to \(\Vert \vec x_{i} \Vert _{q}\):
$$ \vec y_{i} = \Vert \vec x_{i} \Vert_{q} \cdot {\vec x_{i} \over \Vert \vec x_{i} \Vert} $$
(12)
and replace the selected DCT coefficients in \(T_{i}\) with the corresponding entries in \(\vec y_{i}\).
In choosing the step sizes \(\Delta_{i}\), we were guided by the JPEG quantization matrix, which assigns a perceptual relevance to each DCT coefficient in an 8×8 block. The essential step in JPEG compression consists in quantizing DCT coefficients according to fixed quantization tables corresponding to a certain quality factor. On the other hand, it is well known that QIM-based watermarking schemes are sensitive to re-quantization.
In order to counter the adverse effects of re-quantization, we therefore based the quantization step sizes \(\Delta_{i}\) for the DCT coefficients selected for embedding bit \(b_{i}\) on the actual quantization step sizes in the JPEG standard. More specifically, we used the quantization matrix
$$J = \left(\begin{array}{rrrrrrrr} 16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\ 12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\ 14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\ 14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\ 18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\ 24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\ 49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\ 72 & 92 & 95 & 98 & 112 & 100 & 103 & 99 \end{array}\right) $$
taken from the JPEG standard ([18]), which gives the JPEG quantization steps for the DCT coefficients within an 8×8 block at a 50% quality factor, and multiplied it by a constant c>1. If the DCT coefficients for embedding \(b_{i}\) correspond to frequencies (k,ℓ), we have
$$ \Delta_{i} = c\cdot J_{k \ell}, $$
(13)
the rationale behind this approach being the well-known fact that for a quantizing function Q, we have
$$ Q(Q(Q(x,\Delta),\delta),\Delta) = Q(x,\Delta), $$
(14)
if Δ>δ (see [19], Theorem 1). This means that re-quantizing a value that has already been quantized with step size Δ, using a smaller step size δ, can be reversed by another quantization with the larger step size Δ.
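Property (14) can be checked numerically. The sketch below assumes a plain uniform rounding quantizer \(Q(x,\Delta) = \Delta \cdot \mathrm{round}(x/\Delta)\) without dither and uses illustrative step sizes; it also shows that the property can fail when the intermediate step size is the larger one.

```python
import random

def q(x, step):
    """Uniform quantizer without dither (assumed form of Q in Eq. 14)."""
    return step * round(x / step)

random.seed(0)
big, small = 35.0, 12.0  # Delta > delta, illustrative values

holds = True
for _ in range(1000):
    x = random.uniform(-500.0, 500.0)
    once = q(x, big)
    # Re-quantize with the smaller step, then again with the larger one.
    holds = holds and q(q(once, small), big) == once
print(holds)  # True

# With the roles swapped (intermediate step larger), the value is not recovered.
print(q(q(q(12.0, 12.0), 35.0), 12.0))  # 0.0 instead of 12.0
```

The intermediate quantization moves the value by at most δ/2, which is less than Δ/2, so the final rounding falls back to the original multiple of Δ.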
Figure 3 shows two embedding examples, where a random 64-bit message was embedded. For the Lena image, we set N=64, c=3.5, and for the higher-resolution Norba image, we set N=312, c=3.5 (cf. Section 4.2.2 for details on how N was chosen).
In order to extract message bit \(b_{i}\), the disturbed marked signal vector \(\tilde {\vec y}_{i}\) is extracted from the marked image \(C_{W}\) in the same way as the unmarked vector \(\vec x_{i}\) was built from \(C_{0}\) with the help of \(W_{K}\). The message bit is then decoded according to
$$ b_{i} = \arg \min_{b_{i} \in \{0,1\}} \vert \Vert \tilde{\vec y}_{i} \Vert - Q_{b_{i}}\left(\Vert \tilde{\vec y}_{i} \Vert,\Delta_{i},d_{i}\right) \vert. $$
(15)
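The DCT-domain procedure of Eqs. (11) to (15) can be sketched as follows. The sketch rests on several assumptions that are not taken from the text: the quantizer is a dithered QIM quantizer, the 8×8 DCT is the orthonormal DCT-II implemented as a matrix product, the frequency pairs (2,1) and (1,3) with zero-based indexing are arbitrary examples of medium frequencies, the key-driven choices are replaced by a seeded generator, and rounding and clipping of the marked pixels are omitted.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C; C @ B @ C.T is the 2-D DCT of a block B."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def quantize(v, delta, d, b):
    """Assumed dithered QIM quantizer (stand-in for Q_b)."""
    shift = d + b * delta / 2.0
    return delta * np.round((v - shift) / delta) + shift

C = dct_matrix()
rng = np.random.default_rng(3)
image = rng.integers(0, 256, size=(64, 64)).astype(float)  # toy host, 64 blocks

# Split into 8x8 blocks, shape (64, 8, 8), and transform every block.
blocks = image.reshape(8, 8, 8, 8).swapaxes(1, 2).reshape(-1, 8, 8)
coeffs = C @ blocks @ C.T

c_scale = 3.5
bits = [1, 0]
subsets = rng.permutation(len(blocks)).reshape(len(bits), -1)  # stand-in for W_K^(1)
freqs = [(2, 1), (1, 3)]                                       # stand-in for W_K^(2)
j_entry = {(2, 1): 13.0, (1, 3): 19.0}    # entries J[k, l] of the matrix, zero-based
deltas = [c_scale * j_entry[f] for f in freqs]                 # Eq. (13)
dithers = [rng.uniform(0.0, dl) for dl in deltas]              # stand-in for W_K^(3)

marked = coeffs.copy()
for b, t, (k, l), dl, d in zip(bits, subsets, freqs, deltas, dithers):
    x = coeffs[t, k, l]
    norm = np.linalg.norm(x)
    marked[t, k, l] = quantize(norm, dl, d, b) * x / norm      # Eqs. (11), (12)

pixels = C.T @ marked @ C      # inverse DCT: marked blocks in the pixel domain
received = C @ pixels @ C.T    # receiver side: forward DCT again

recovered = []
for t, (k, l), dl, d in zip(subsets, freqs, deltas, dithers):
    norm = np.linalg.norm(received[t, k, l])
    dist = [abs(norm - quantize(norm, dl, d, b)) for b in (0, 1)]  # Eq. (15)
    recovered.append(int(np.argmin(dist)))
print(recovered)  # [1, 0]
```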
Encryption part
As in the spatial domain, we investigate the two options of encrypting DCT-coefficient sign bits and of permuting them. The idea of encrypting the sign bits of DCT coefficients goes back to [20] and [21], where it is proposed to encrypt sign bits of DCT coefficients and motion vectors in MPEG video. The security of this approach for still images is classified as low in [15], p. 51.
In order to create a larger visual distortion, instead of permuting the DCT coefficients alone, we permuted the complete 8×8 blocks containing the coefficients (see [20] and [21]). If only those blocks containing the selected coefficients for watermarking are permuted, however, the corresponding subset T becomes visible to an attacker, who can in turn concentrate her efforts to remove the mark on T. We therefore need to permute all image blocks. Moreover, as in the spatial domain case, in order to make sure that the selected blocks \(T_{i}\) for a single bit \(b_{i}\) do not get mixed up with blocks for a different bit or non-selected blocks, the permutation algorithm needs to know part \(W_{K}^{(1)}\) of the watermarking key. More specifically, each subset \(T_{i}\) needs to form an invariant subset of the set of all blocks under the permutation. As in the spatial domain, these subsets need to be as large as possible. We therefore have
$$ N = \vert T_{i} \vert = \lfloor {{(H/8)\cdot (W/8)}\over n} \rfloor $$
(16)
in the DCT domain. The requirement N≥32 gives a maximum capacity of
$$ n_{max} = \lfloor {H\cdot W \over {64\cdot 32} }\rfloor, $$
(17)
meaning 128 bits for a 512×512 image.
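Eqs. (16) and (17) can be evaluated directly. The following sketch uses hypothetical function names and assumes image dimensions that are multiples of 8; it reproduces the capacity stated above and the values of N used for the 64-bit embedding examples.

```python
def dct_group_size(height, width, n_bits):
    """Eq. (16): number of 8x8 blocks N available per embedded bit."""
    return ((height // 8) * (width // 8)) // n_bits

def dct_max_capacity(height, width, n_min=32):
    """Eq. (17): maximum number of bits under the constraint N >= n_min."""
    return (height * width) // (64 * n_min)

print(dct_max_capacity(512, 512))      # 128
print(dct_group_size(512, 512, 64))    # 64
print(dct_group_size(800, 1600, 64))   # 312
```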
Figure 4 shows encrypted versions of the marked Lena image in Fig. 3a. Again, the mark can be extracted from both without errors.
Implementation in the DWT domain
Watermarking part
In the DWT domain, we performed a three-level DWT and embedded the mark into the level 3 approximation coefficients. This way, the number N of coefficients used to embed one bit is the same as in Section 4.2, namely
$$ N = \vert T_{i} \vert = \lfloor {{(H/8)\cdot (W/8)}\over n} \rfloor. $$
(18)
In the DWT case, however, there are no blocks of coefficients to choose a frequency from; thus \(W_{K}\) consists of only two parts: \(W_{K} = \left (W_{K}^{(1)},W_{K}^{(2)}\right)\), where \(W_{K}^{(1)}\) governs the selection of N coefficients for each message bit \(b_{i}\), and \(W_{K}^{(2)}\) controls the dither \(d_{i}\) for each bit. Likewise, a single quantization step size Δ is used for all message bits. As an example, Fig. 5 shows the results of embedding 64 bits into the Lena and Norba images, setting Δ=100.
Encryption part
As in the DCT case, we can encrypt the sign bits of the DWT coefficients, as already proposed in [21], permute the DWT coefficients, as originally proposed in [22], or do both. Note that in the DWT domain, permuting coefficients is not as vulnerable to known-plaintext attacks as in other domains, because the location of coefficients is image-dependent [15]. However, if only the level 3 approximation coefficients are encrypted or permuted, the image content is not rendered completely unintelligible, but fine structures are still visible (see Figs. 6a, b). As in the DCT case, we have the additional option of permuting not only the level 3 DWT coefficients themselves but the complete 8×8 blocks leading to the level 3 approximation, for a more complete obfuscation of the image content (see Fig. 6c), without sacrificing the commutativity with watermarking. The maximum capacity is the same as in the DCT-based implementation.
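The link between 8×8 pixel blocks and level 3 approximation coefficients can be made concrete for one particular wavelet. The sketch below assumes an orthonormal Haar wavelet, which the text does not specify; for Haar, each level 3 approximation coefficient is the sum of one 8×8 pixel block divided by 8, so permuting the blocks permutes the approximation coefficients in the same way and leaves the norm of any coefficient group unchanged. For longer wavelet filters, this exact correspondence does not hold in general.

```python
import numpy as np

def haar_approx(a):
    """One level of the orthonormal 2-D Haar transform, approximation band only."""
    return (a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2]) / 2.0

def level3_approx(image):
    out = image
    for _ in range(3):
        out = haar_approx(out)
    return out

rng = np.random.default_rng(4)
image = rng.integers(0, 256, size=(64, 64)).astype(float)  # toy host

# Permute the 8x8 pixel blocks and reassemble the scrambled image.
blocks = image.reshape(8, 8, 8, 8).swapaxes(1, 2).reshape(-1, 8, 8)
perm = rng.permutation(len(blocks))
scrambled = blocks[perm].reshape(8, 8, 8, 8).swapaxes(1, 2).reshape(64, 64)

approx = level3_approx(image).ravel()
approx_scrambled = level3_approx(scrambled).ravel()

print(np.allclose(approx, blocks.sum(axis=(1, 2)) / 8.0))  # True
print(np.allclose(approx_scrambled, approx[perm]))         # True
```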