In SHDM, the signal vector \(\vec x = (x_{1},\dots,x_{N})\) may be extracted from the host in an arbitrary fashion. In this section, we implement SHDM in the spatial (pixel) domain, the DCT domain, and the DWT domain and combine it with matching encryption schemes. As in SHDM the vector norm
$$ \Vert \vec x \Vert = \sqrt{x_{1}^{2} + x_{2}^{2} + \dots + x_{N}^{2}}, $$
(5)
is quantized, we have the following options for encryption:

Encrypt the sign bits of the x_{i} by means of a stream cipher.

Permute the x_{i}.

Apply some other norm-preserving operation on \(\vec x\), e.g., a random rotation.
Of course, the options may be combined. In the rest of the paper, we will only explore the first two options.
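The first two options can be illustrated with a minimal Python sketch. It is not part of the original implementation: the helper names `flip_signs` and `permute` are hypothetical, and `random.Random` merely stands in for a cryptographically secure stream or permutation cipher. The point is only that both operations leave the norm of Eq. (5) unchanged.

```python
import math
import random

def norm(x):
    """Euclidean norm of a sequence of numbers, as in Eq. (5)."""
    return math.sqrt(sum(v * v for v in x))

def flip_signs(x, key):
    """Flip the sign of each component according to one keystream bit.

    random.Random is a placeholder for a real stream cipher and is NOT secure.
    """
    rng = random.Random(key)
    return [-v if rng.getrandbits(1) else v for v in x]

def permute(x, key):
    """Return a key-controlled permutation of the components of x."""
    rng = random.Random(key)
    y = list(x)
    rng.shuffle(y)
    return y

x = [12.0, -7.0, 3.0, -30.0, 5.0]
encrypted = permute(flip_signs(x, key=1), key=2)

# Both operations keep the multiset of magnitudes, hence the norm.
print(norm(x), norm(encrypted))
```

Because only signs and positions change, any scheme that quantizes the norm alone sees the same value before and after encryption.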
Implementation in the spatial domain
In the spatial domain, we work directly with pixel grey values ranging initially between 0 and 255. In order to create sign bits, we subtract 128 from each grey value, so that the shifted values range from −128 to 127.
Watermarking part
The watermarking part uses the following parameters:

The watermarking key W_{K}.

The watermark \(W = (b_{1},\dots,b_{n})\) to be embedded.

The dimension of the host vectors \(\vec x_{i}\), i.e., the number of coefficients N into which one bit b_{i} is embedded.

The quantizing step Δ.
In the spatial domain, we assume that the watermarking key consists of two parts: \(W_{K} = \left (W_{K}^{(1)},W_{K}^{(2)}\right)\). After choosing a step size Δ and N, the embedding process consists of the following steps:

For each bit b_{i} to be embedded, randomly select N pixels. The selection is controlled by \(W_{K}^{(1)}\). The corresponding grey values form the signal vector \(\vec x_{i}\).

Randomly generate a dither value d_{i}, controlled by \(W_{K}^{(2)}\).

Quantize the norm of \(\vec x_{i}\) according to b_{i}:
$$ \Vert \vec x_{i} \Vert_{q} = Q_{b_{i}}\left(\Vert \vec x_{i} \Vert, \Delta, d_{i}\right) $$
(6)

Embed the mark into \(\vec x_{i}\) by changing its norm to \(\Vert \vec x_{i} \Vert _{q}\):
$$ \vec y_{i} = \Vert \vec x_{i} \Vert_{q} \cdot {\vec x_{i} \over \Vert \vec x_{i} \Vert} $$
(7)
and replace the grey values of pixels corresponding to components of \(\vec x_{i}\) by the corresponding entries in \(\vec y_{i}\).
For extraction of bit b_{i}, the disturbed signal vector \(\tilde {\vec y}_{i}\) is formed from the marked image C_{W} in the same way as \(\vec x_{i}\) was generated from the host image C_{0}. The norm of \(\tilde {\vec y}_{i}\) is quantized and b_{i} is computed according to
$$ b_{i} = \arg \min_{b_{i} \in \{0,1\}} \left\vert \Vert \tilde{\vec y}_{i} \Vert - Q_{b_{i}}\left(\Vert \tilde{\vec y}_{i} \Vert,\Delta,d_{i}\right) \right\vert. $$
(8)
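The embedding steps and the extraction rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments. The quantizer \(Q_{b}\) is assumed to be a dithered uniform quantizer whose two lattices are offset by Δ/2; the function names are hypothetical; `random.Random` stands in for the key-controlled generators; rounding and clipping of the marked grey values is ignored.

```python
import math
import random

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def quantize(value, bit, delta, dither):
    """Assumed form of Q_b: nearest point of delta*Z + dither + bit*delta/2."""
    offset = dither + bit * delta / 2.0
    return delta * round((value - offset) / delta) + offset

def embed_bit(x, bit, delta, dither):
    """Eqs. (6) and (7): rescale x so that its norm equals the quantized norm.

    Assumes the norm of x is nonzero and large compared to delta.
    """
    old_norm = norm(x)
    new_norm = quantize(old_norm, bit, delta, dither)
    return [v * new_norm / old_norm for v in x]

def extract_bit(y, delta, dither):
    """Eq. (8): choose the bit whose quantizer output is closest to the norm."""
    n = norm(y)
    return min((0, 1), key=lambda b: abs(n - quantize(n, b, delta, dither)))

rng = random.Random(42)                          # placeholder for W_K
x = [rng.randint(-128, 127) for _ in range(64)]  # N = 64 shifted grey values
delta = 75.0
dither = rng.uniform(0.0, delta)

for bit in (0, 1):
    y = embed_bit(x, bit, delta, dither)
    noisy = [v + rng.uniform(-1.0, 1.0) for v in y]  # mild disturbance
    print(bit, extract_bit(noisy, delta, dither))
```

With this quantizer, a disturbance that changes the norm by less than Δ/4 cannot move it closer to the lattice of the other bit, which is why the mildly disturbed vectors above still decode correctly.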
Figure 1 shows two embedding examples with different resolutions (512×512 and 800×1600, respectively), where a random 64-bit watermark was embedded using a quantization step size of Δ=75.
Encryption part
Images in the spatial domain with grey values ranging between 0 and 255 can be represented by eight so-called bitplanes, where the most significant bitplane (MSB) indicates whether the grey value of a certain pixel is greater than 127 or not. Thus, after subtracting 128 from every grey value, the MSB indicates the sign of the grey values. Sign bit encryption is therefore equivalent to encrypting the MSB by means of a stream cipher.
The security of encrypting the MSB of an image has been investigated in [15]. Not only is the amount of image quality degradation insufficient for most applications, it is also possible to estimate the encrypted sign bits based on the assumption that neighboring grey values in a natural image do not change abruptly. Both problems can be remedied if a permutation cipher is additionally applied to the pixels. However, the permutation must not mix the N pixels used for embedding b_{i} with the N pixels used for embedding a different bit b_{j}. Therefore, the permutation cipher and the watermark embedder must share knowledge of \(W_{K}^{(1)}\), which is the part of W_{K} governing pixel selection. If this condition is met, watermarking and (permutation-based) ciphering commute. However, as is well known, permutation ciphers are vulnerable to known-plaintext attacks (see [16] for a quantitative analysis).
Moreover, in order to have as many permutations as possible, N should be chosen as large as possible, that is,
$$ N = \lfloor {H\cdot W \over n} \rfloor $$
(9)
in the spatial domain, where H and W are the height and width of the host image C_{0}, and n is the length of the embedded string. In order to have a minimum level of security against brute-force attacks, we need N≥32, leading to a maximum capacity of
$$ n_{max} = \lfloor {H\cdot W \over 32} \rfloor $$
(10)
bits in the spatial domain, meaning, e.g., 2^{13} bits for a 512×512 image (note that the term capacity is used throughout this paper according to the definition given in [17]: the watermarking capacity of a digital image is the number of bits that can be embedded in a given host image).
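As a worked check of Eq. (10) for the two image resolutions used in the examples:
$$ \lfloor {512\cdot 512 \over 32} \rfloor = \lfloor {262144 \over 32} \rfloor = 8192 = 2^{13}, \qquad \lfloor {800\cdot 1600 \over 32} \rfloor = \lfloor {1280000 \over 32} \rfloor = 40000. $$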
Figure 2 shows encrypted versions of the marked Lena image in Fig. 1a. Thanks to the commutativity of watermarking and encryption, the mark can be extracted from both without errors.
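The commutativity condition discussed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, `random.Random` is a non-secure placeholder for the key-controlled generators, and the pixel groups are assumed to be obtained by a keyed shuffle of all pixel indices. The permutation moves values only inside the group selected for one bit, so every group norm, and hence every embedded bit, is unaffected.

```python
import math
import random

def select_groups(num_pixels, n_bits, key1):
    """Split the pixel indices into n_bits disjoint groups of equal size N.

    key1 plays the role of W_K^(1).
    """
    indices = list(range(num_pixels))
    random.Random(key1).shuffle(indices)
    size = num_pixels // n_bits
    return [indices[i * size:(i + 1) * size] for i in range(n_bits)]

def permute_within_groups(pixels, groups, cipher_key):
    """Permute pixel values without moving any value out of its group."""
    rng = random.Random(cipher_key)
    out = list(pixels)
    for group in groups:
        targets = list(group)
        rng.shuffle(targets)
        for src, dst in zip(group, targets):
            out[dst] = pixels[src]
    return out

def group_norms(pixels, groups):
    """Norm of the signal vector formed by each group."""
    return [math.sqrt(sum(pixels[i] ** 2 for i in g)) for g in groups]

rng = random.Random(0)
pixels = [rng.randint(-128, 127) for _ in range(16 * 16)]  # toy 16x16 image
groups = select_groups(len(pixels), n_bits=8, key1=1234)   # N = 32
cipher = permute_within_groups(pixels, groups, cipher_key=99)

print(group_norms(pixels, groups) == group_norms(cipher, groups))
```

Since the cipher needs `groups`, it must be given the same `key1` as the embedder, which mirrors the requirement that both share \(W_{K}^{(1)}\).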
Implementation in the DCT domain
In the DCT domain, we assume that the watermarking key consists of three parts: \(W_{K} = \left (W_{K}^{(1)},W_{K}^{(2)},W_{K}^{(3)}\right)\). We begin by performing a block-based two-dimensional DCT on the host image C_{0}, i.e., we divide C_{0} into non-overlapping 8×8 pixel blocks and perform a two-dimensional DCT on each block.
Watermarking part

For each bit b_{i} to be embedded, randomly select N blocks. The selection is controlled by \(W_{K}^{(1)}\). Each block can only be selected once. The selected blocks for bit b_{i} form subset T_{i} of the set of all blocks.

For each T_{i}, randomly select a horizontal and a vertical frequency index from the medium frequencies. The selection is controlled by \(W_{K}^{(2)}\). The corresponding DCT coefficients from the selected blocks form an N-dimensional vector \(\vec x_{i}\).

For each T_{i}, randomly generate a dither value d_{i} under control of \(W_{K}^{(3)}\).

Quantize the norm of \(\vec x_{i}\) according to b_{i}:
$$ \Vert \vec x_{i} \Vert_{q} = Q_{b_{i}}\left(\Vert \vec x_{i} \Vert, \Delta_{i}, d_{i}\right) $$
(11)
Note that because all DCT coefficients in \(\vec x_{i}\) correspond to the same horizontal and vertical frequency pair, it is possible to choose individual quantizing step sizes Δ_{i} according to their perceptual importance (see below).

Embed the mark into \(\vec x_{i}\) by changing its norm to \(\Vert \vec x_{i} \Vert _{q}\):
$$ \vec y_{i} = \Vert \vec x_{i} \Vert_{q} \cdot {\vec x_{i} \over \Vert \vec x_{i} \Vert} $$
(12)
and replace the selected DCT coefficients in T_{i} with the corresponding entries in \(\vec y_{i}\).
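The first two steps of the list above (block selection and frequency selection) can be sketched as follows. This is a hedged illustration, not the original code: the function name is hypothetical, `random.Random` stands in for the key-controlled generators, the DCT coefficients are assumed to be already computed, and the set of "medium frequencies" is assumed to be the index pairs with 3 ≤ k+ℓ ≤ 6, a choice the text does not specify.

```python
import random

def select_dct_vectors(coeffs, n_bits, n_blocks, key1, key2):
    """Form one N-dimensional coefficient vector per message bit.

    coeffs maps a block index to its 8x8 DCT coefficients (a list of rows).
    key1 and key2 play the roles of W_K^(1) and W_K^(2).
    """
    if n_bits * n_blocks > len(coeffs):
        raise ValueError("not enough blocks for the requested payload")
    order = sorted(coeffs)
    random.Random(key1).shuffle(order)   # each block is used at most once
    # Assumption: medium frequencies are the pairs with 3 <= k + l <= 6.
    medium = [(k, l) for k in range(8) for l in range(8) if 3 <= k + l <= 6]
    freq_rng = random.Random(key2)
    subsets, freqs, vectors = [], [], []
    for i in range(n_bits):
        t_i = order[i * n_blocks:(i + 1) * n_blocks]   # subset T_i
        k, l = freq_rng.choice(medium)                 # one pair per T_i
        subsets.append(t_i)
        freqs.append((k, l))
        vectors.append([coeffs[b][k][l] for b in t_i])
    return subsets, freqs, vectors

# Toy input: 64 blocks filled with random placeholder "coefficients".
rng = random.Random(7)
coeffs = {b: [[rng.uniform(-50, 50) for _ in range(8)] for _ in range(8)]
          for b in range(64)}
subsets, freqs, vectors = select_dct_vectors(coeffs, n_bits=4, n_blocks=16,
                                             key1=11, key2=22)
print(freqs)
```

Each returned vector can then be quantized and rescaled exactly as in Eqs. (11) and (12), with the step size chosen per frequency pair.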
In choosing the step sizes Δ_{i}, we were guided by the JPEG quantization matrix, which assigns a perceptual relevance to each DCT coefficient in an 8×8 block. The essential step in JPEG compression consists in quantizing DCT coefficients according to fixed quantization tables corresponding to a certain quality factor. On the other hand, it is well known that QIM-based watermarking schemes are sensitive to requantization.
In order to counter the adverse effects of requantization, we therefore chose quantization step sizes Δ_{i} for the individual DCT coefficients selected for embedding bit b_{i} that were aligned with the actual quantization step sizes in the JPEG standard. More specifically, we used the quantization matrix
$$J = \left(\begin{array}{lllllllll} 16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\ 12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\ 14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\ 14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\ 18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\ 24 & 36 & 55 & 64 & 81 & 104 & 113 & 92 \\ 49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\ 72 & 92 & 95 & 98 & 112 & 100 & 103 & 99 \\ \end{array}\right) $$
taken from the JPEG standard ([18]), which gives the JPEG quantization steps for the DCT coefficients within an 8×8 block at a 50% quality factor, and multiplied it by a constant c>1. If the DCT coefficients for embedding b_{i} correspond to frequencies (k,ℓ), we have
$$ \Delta_{i} = c\cdot J_{k \ell}, $$
(13)
the rationale behind this approach being the well-known fact that for a quantizing function Q, we have
$$ Q(Q(Q(x,\Delta),\delta),\Delta) = Q(x,\Delta), $$
(14)
if Δ>δ (see [19], Theorem 1). This means that requantizing a value already quantized with step size Δ using a smaller step size δ can be reversed by another quantization with the larger step size Δ.
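Equation (14) can be checked numerically with a small Python sketch. The sketch assumes that Q is a plain uniform quantizer rounding to the nearest multiple of the step size; the values c=3.5 and J_{kℓ}=24 are picked only for illustration.

```python
def quantize(x, step):
    """Uniform quantizer Q(x, step): nearest multiple of step (assumed form)."""
    return step * round(x / step)

c, j_kl = 3.5, 24        # illustrative constant c and table entry J_kl
big = c * j_kl           # embedding step size from Eq. (13), here 84.0
small = float(j_kl)      # JPEG step size at the same frequency

results = []
for x in (5.0, 90.0, 260.0, -311.2):
    marked = quantize(x, big)                 # Q(x, Delta)
    recompressed = quantize(marked, small)    # Q(Q(x, Delta), delta)
    restored = quantize(recompressed, big)    # Q(Q(Q(x, Delta), delta), Delta)
    results.append((marked, recompressed, restored))
    print(x, marked, recompressed, restored)
```

The requantization with δ moves the marked value by at most δ/2, which is less than Δ/2, so rounding to the Δ-lattice returns the original lattice point.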
Figure 3 shows two embedding examples, where a random 64-bit message was embedded. For the Lena image, we set N=64, c=3.5, and for the higher resolution Norba image, we set N=312, c=3.5 (cf. Section 4.2.2 for details on how N was chosen).
In order to extract message bit b_{i}, the disturbed marked signal vector \(\tilde {\vec y}_{i}\) is extracted from the marked image C_{W} in the same way as the unmarked vector \(\vec x_{i}\) was built from C_{0} with the help of W_{K}. The message bit is then decoded according to
$$ b_{i} = \arg \min_{b_{i} \in \{0,1\}} \left\vert \Vert \tilde{\vec y}_{i} \Vert - Q_{b_{i}}\left(\Vert \tilde{\vec y}_{i} \Vert,\Delta_{i},d_{i}\right) \right\vert. $$
(15)
Encryption part
As in the spatial domain, we investigate the two options of encrypting the sign bits of the DCT coefficients and of permuting the coefficients. The idea of encrypting the sign bits of DCT coefficients goes back to [20] and [21], where it is proposed to encrypt sign bits of DCT coefficients and motion vectors in MPEG video. The security of this approach for still images is classified as low in [15], p. 51.
In order to create a larger visual distortion, instead of permuting the DCT coefficients alone, we permuted the complete 8×8 blocks containing the coefficients (see [20] and [21]). If only those blocks containing the selected coefficients for watermarking are permuted, however, the corresponding subset T becomes visible to an attacker, who can in turn concentrate her efforts to remove the mark on T. We therefore need to permute all image blocks. Moreover, as in the spatial domain case, in order to make sure that the selected blocks T_{i} for a single bit b_{i} do not get mixed up with blocks for a different bit or with non-selected blocks, the permutation algorithm needs to know part \(W_{K}^{(1)} \) of the watermarking key. More specifically, each subset T_{i} needs to form an invariant subset of the set of all blocks under the permutation. As in the spatial domain, these subsets need to be as large as possible. We therefore have
$$ N = \vert T_{i} \vert = \lfloor {{(H/8)\cdot (W/8)}\over n} \rfloor $$
(16)
in the DCT domain. The requirement N≥32 gives a maximum capacity of
$$ n_{max} = \lfloor {H\cdot W \over {64\cdot 32} }\rfloor, $$
(17)
meaning 128 bits for a 512×512 image.
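Equations (16) and (17) can be evaluated with a short Python sketch. The function names are hypothetical, and integer division is used on the assumption that only complete 8×8 blocks are counted; setting `unit=1` reproduces the spatial-domain formulas (9) and (10).

```python
def vector_dimension(height, width, n_bits, unit=8):
    """Eq. (16): number N of blocks (or pixels if unit=1) per message bit."""
    return ((height // unit) * (width // unit)) // n_bits

def max_capacity(height, width, unit=8, n_min=32):
    """Eq. (17): largest payload n for which N >= n_min still holds."""
    return ((height // unit) * (width // unit)) // n_min

# Values quoted in the text for a 64-bit message.
print(vector_dimension(512, 512, 64))     # Lena, 512x512
print(vector_dimension(800, 1600, 64))    # Norba, 800x1600
print(max_capacity(512, 512))             # DCT domain
print(max_capacity(512, 512, unit=1))     # spatial domain
```

For the two test images, these calls reproduce the settings N=64 and N=312 used for Fig. 3 as well as the capacities of 128 bits (DCT domain) and 2^{13} bits (spatial domain).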
Figure 4 shows encrypted versions of the marked Lena image in Fig. 3. Again, the mark can be extracted from both without errors.
Implementation in the DWT domain
Watermarking part
In the DWT domain, we performed a three-level DWT and embedded the mark into the level-3 approximation coefficients. This way, the number N of coefficients used to embed one bit is the same as in Section 4.2, namely
$$ N = \vert T_{i} \vert = \lfloor {{(H/8)\cdot (W/8)}\over n} \rfloor. $$
(18)
In the DWT case, however, there are no blocks of coefficients to choose a frequency from, thus W_{K} consists of only two parts: \(W_{K} = \left (W_{K}^{(1)},W_{K}^{(2)}\right)\), where \(W_{K}^{(1)}\) governs the selection of N coefficients for each message bit b_{i}, and \(W_{K}^{(2)}\) controls the dither d_{i} for each bit. Likewise, a single quantization step size Δ is used for all message bits. As an example, Fig. 5 shows the results of embedding 64 bits into the Lena and Norba images, setting Δ=100.
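To make the coefficient count of Eq. (18) concrete, the following Python sketch computes the level-3 approximation coefficients of a small image. The Haar wavelet is an assumption made for brevity, since the text does not name the wavelet used, and the function name is hypothetical. For the orthonormal Haar transform, each approximation coefficient of the next level is half the sum of a 2×2 group of the current level.

```python
def haar_approximation(image, levels=3):
    """Approximation coefficients of an orthonormal 2-D Haar DWT.

    image is a list of rows whose height and width are multiples of
    2**levels. Haar is an assumption; other wavelets give other values.
    """
    approx = [[float(v) for v in row] for row in image]
    for _ in range(levels):
        approx = [
            [(approx[r][c] + approx[r][c + 1]
              + approx[r + 1][c] + approx[r + 1][c + 1]) / 2.0
             for c in range(0, len(approx[0]), 2)]
            for r in range(0, len(approx), 2)
        ]
    return approx

# Toy 16x24 image: three levels leave (16/8) x (24/8) = 2 x 3 coefficients.
image = [[(r * 24 + c) % 256 - 128 for c in range(24)] for r in range(16)]
ll3 = haar_approximation(image)
print(len(ll3), len(ll3[0]))
```

Each level halves both dimensions, so three levels leave (H/8)·(W/8) coefficients, which are then divided among the n message bits as in Eq. (18).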
Encryption part
As in the DCT case, we have the options to either encrypt the sign bits of DWT coefficients, as already proposed in [21], and/or to permute the DWT coefficients, as originally proposed in [22]. Note that in the DWT domain, permuting coefficients is not as vulnerable to known-plaintext attacks as in other domains, because the location of coefficients is image-dependent [15]. However, if only the level-3 approximation coefficients are encrypted or permuted, the image content is not rendered completely unintelligible, but fine structures are still visible (see Figs. 6a, b). As in the DCT case, we have the additional option of permuting not only the level-3 DWT coefficients themselves but the complete 8×8 blocks leading to the level-3 approximation for a more complete obfuscation of the image content (see Fig. 6c), without sacrificing the commutativity with watermarking. The maximum capacity is the same as in the DCT-based implementation.