Compression Independent Reversible Encryption for Privacy in Video Surveillance
© Paula Carrillo et al. 2009
Received: 16 April 2009
Accepted: 13 December 2009
Published: 8 February 2010
One of the main concerns about the widespread use of video surveillance is the loss of individual privacy. Individuals who are not suspects need not be identified in camera recordings. Mechanisms that protect identity while meeting legitimate security needs are therefore necessary. Selectively encrypting the regions that reveal identity (e.g., faces or vehicle tags) preserves individuals' right to privacy while recognizing the legitimate needs of video surveillance. Video used in surveillance applications usually needs to be transcoded or recoded for distribution and archival, and a traditionally encrypted video cannot be transcoded without first being decrypted. This paper presents a compression-algorithm-independent solution that provides privacy in video surveillance applications. The proposed approach uses permutation-based encryption in the pixel domain to hide identity-revealing features. The permutation-based encryption tolerates lossy compression and transcoding and allows the transcoded video to be decrypted at a later time. The use of permutation-based encryption makes the proposed solution independent of the compression algorithm used and robust to transcoding. The cost of providing this privacy is an increase in bitrate that depends on the percentage of blocks encrypted.
With video surveillance becoming an integral part of our security infrastructure, privacy rights are gaining importance. The key concern is that private citizens who are not suspects are being recorded by video surveillance systems, and the recordings are archived. Such a record-everything-and-process-later approach has serious privacy implications. The same privacy issues arise when surveillance cameras routinely record highway traffic, because vehicle tags are recorded as well. Removing identities by blurring or blackening portions of the video is not acceptable to security personnel, who may have a legitimate need to review the videos. On the other hand, leaving the identities of people and vehicles exposed in the videos is a breach of privacy.
A solution to this problem is selective encryption of the portions of surveillance video that reveal identity (e.g., faces and vehicle tags). Regions of a video can be encrypted to ensure privacy while still allowing decryption for legitimate security needs at any time in the future. The goals of video surveillance are still met, because selective encryption allows activities to be monitored without revealing the identities of those being monitored. When a suspicious activity needs to be investigated, the identities can be uncovered with proper authorization. The few existing solutions are specific to the video and image compression algorithms used and require modifications to the video encoders [1, 2]. These approaches limit the flexibility of surveillance systems.
This paper presents an innovative solution that meets both the privacy needs of individuals and legitimate security needs. Preliminary results from this work were reported in [3]. The proposed solution is independent of the image and video compression algorithms used. This allows the use of standard video encoders and decoders and also enables smart cameras that output encrypted video. The proposed solution also survives video transcoding and recoding, allowing a normal video distribution chain with multiple encoding and decoding operations. Another innovation of the proposed approach is the use of permutation-based encryption, which can survive lossy compression. A further feature is the system's ability to detect encrypted regions automatically and decrypt them without sending the decoding terminals any additional information to identify the encrypted blocks. The proposed system supports selective encryption because it can encrypt only the regions of a video that reveal identity. However, detecting those identity-revealing regions is outside the scope of this work and is not addressed by the proposed system.
The rest of the paper is organized as follows. Section 2 presents background work and summarizes the characteristics of privacy-preserving surveillance systems. Section 3 presents the proposed solution. Section 4 presents experimental results and the performance evaluation, and conclusions are drawn in Section 5.
The main techniques used in video privacy systems are summarized below.
The system presented in [1] describes a privacy-preserving video console that renders face images in the pixel domain so that identification software cannot recognize them. Using computer vision techniques, the video console determines the components of interest in a video and then obscures that information, or its components, so that face recognition software cannot recognize the faces. This method maintains privacy, but the surveillance and security needs are not met because the obfuscation is irreversible. In  a medical application for automatic patient detection, tracking, labeling, and obscuring in real time has been developed; the obscuring option is used when a patient does not want to be involved in the research. In this particular case, reversibility is neither required nor desired. Martin and Plataniotis present an interesting solution for shape and texture encryption using Secure Shape and Texture Set Partitioning in Hierarchical Trees (SecST-SPIHT) [5]. This method encrypts about 5% of the bitstream using a secret key to protect the video. Since the encryption is in the bitstream domain, any transcoding requires decoding the frames first. A desirable attribute of this method is that it obfuscates the object shape, providing an additional layer of security.
Transform-Domain Coefficient Scrambling
This technique, presented in [2], is applied in the transform domain for motion JPEG or MPEG video. The region of interest is detected, and the signs of selected transform coefficients are then scrambled. More specifically, the Discrete Wavelet Transform (DWT) coefficients for JPEG2000 and the Discrete Cosine Transform (DCT) coefficients for MPEG that correspond to the regions of interest (ROIs) are scrambled by pseudo-randomly inverting their signs. Consequently, the scene remains understandable, but the ROIs are unidentifiable. The decoded video shows blocky regions unless the proper key is used for descrambling. This process is reversible. However, it is specific to the video compression used and cannot survive operations such as transcoding and recoding that may be necessary to distribute video.
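A minimal sketch of keyed sign scrambling, assuming the ROI's transform coefficients are already available as a flat list. It uses Python's seeded `random.Random` purely as a stand-in for a keyed generator. This is not the generator of the cited scheme, and `random` is not cryptographically secure. The helper names are hypothetical.

```python
import random

def scramble_signs(coeffs, key):
    """Pseudo-randomly invert the signs of a list of transform coefficients.

    `key` seeds the generator. Python's `random` module is NOT
    cryptographically secure; it only stands in for a keyed generator here.
    """
    rng = random.Random(key)
    # One keyed coin flip per coefficient: -1 inverts the sign, +1 keeps it.
    flips = [rng.choice((-1, 1)) for _ in coeffs]
    return [c * f for c, f in zip(coeffs, flips)]

# Inverting a sign twice restores it, so descrambling is the same
# operation run again with the same key (hence the same flip sequence).
descramble_signs = scramble_signs

roi_coeffs = [312.0, -41.5, 17.0, 0.0, -3.25, 8.0]  # made-up ROI coefficients
scrambled = scramble_signs(roi_coeffs, key=2024)
assert descramble_signs(scrambled, key=2024) == roi_coeffs
```

Only the signs change, so the coefficient magnitudes are untouched. However, the operation lives inside the codec's own transform domain. A transcoder that decodes and re-transforms the video therefore produces new coefficients whose signs no longer match the key sequence, which is the compression dependence noted above.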
Invertible Cryptographic Obfuscation
Another technique, proposed in [6], provides privacy through cryptographic obfuscation. It uses the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) to encrypt regions of JPEG images during the compression stage, after Huffman encoding, that is, in the bitstream domain. This approach is similar to transform-coefficient sign scrambling and suffers from the same drawbacks: it is specific to the compression algorithm and cannot survive transcoding.
Skin Tone Detection and Replacement
In  the approach to privacy protection is based on detecting skin tones in images and replacing them with other colors, making it impossible to determine the race of the individual. This process works in the pixel domain (see Figure 1). Camera systems based on this method have been developed. The idea is to detect the face and then overlay it with a dark patch, a mosaic, or any other obfuscation before the video is recorded. As a result, no copies of the original faces exist. This method is independent of compression and transcoding. However, color replacement alone does not hide the identity completely. Also, because the cameras perform the replacement before recording the video, the method is not reversible. Another issue is that skin replacement applies only to human identity and cannot be used where the identity of nonhuman objects has to be protected, for example, a car's license tag.
Low Quality ROI Coding
The privacy system proposed in  decreases the ROI quality in JPEG2000 by placing the ROI information in the lowest quality layer of the codestream. This ensures poor visual quality under lossy compression, up to invisibility if required. The approach operates in the bitstream domain, so it is specific to the compression standard used, and it is not reversible. Therefore, when a suspicious activity needs to be investigated, the identities cannot be uncovered to meet security needs.
An ideal surveillance system should ensure individual privacy while meeting law enforcement/security needs. Key characteristics of such surveillance systems are briefly discussed below.
(1) Provides Complete Privacy. A surveillance system should provide complete privacy by hiding the portions of the video that reveal the identity of individuals. These features include faces, license tags on cars, and textual information or markings. Assuming the identifying features in a video can be detected, a surveillance system should then hide them. Such features can be hidden, for example, by (i) removing or replacing the corresponding pixels in the frame or (ii) encrypting the corresponding pixels. Completely encrypting the video streams does not serve the purpose of surveillance, because the monitors cannot understand the context without first decrypting the video.
(2) Balance Security Needs. A surveillance system must also balance privacy against legitimate security needs. Meeting the needs of law enforcement personnel implies that a means of revealing the hidden identities must be provided. If identities are hidden by removing the pixels corresponding to the identity-revealing features, those features can never be recovered. On the other hand, if the identity-revealing features are hidden by encryption, the hidden areas can be uncovered by decrypting the relevant regions. Rights and privileges to access hidden identities can be managed through well-designed corporate security policies. For example, hidden regions in a video cannot be decrypted unless explicitly authorized by the chief of security and/or a court order.
(3) Compression Independent. A privacy solution should be independent of the compression algorithms used in the surveillance system. If it depends on the compression algorithm, the system has to be redesigned for each compression algorithm used; a system designed for MPEG-2 video will not work when MPEG-4 video compression is used. Privacy solutions that use coefficient scrambling [2] are specific to the compression algorithm, and decryption and decoding have to be integrated. Furthermore, such systems cannot easily evolve to use new compression algorithms. Another drawback of this solution is that it cannot survive re-encoding or transcoding.
(4) Survive Recoding and Transcoding. Video surveillance can span large areas, and the captured videos are typically distributed over networks. Network distribution may require a different video format or a change in video bitrate to meet network and receiver constraints. Surveillance videos may also have to be compressed for archival purposes. More importantly, as surveillance systems evolve, there are bound to be receivers, players, or systems that require conversion to a specific format. Robustness to recoding and transcoding is therefore a key requirement. Privacy solutions that use coefficient scrambling [2], or that are otherwise compression dependent, cannot survive any recoding, transcoding, or even a bitrate change. A secure system for transcoding video without decryption was proposed in [9]. This approach uses scalable video and truncates enhancement layers to reduce the bitrate or resolution. Since the encryption is done in the bitstream layer, the video has to be decrypted before transcoding in two cases: when the bitrate or resolution must be reduced below that of the base layer, or when the coding format must be changed.
3. Compression Independent Reversible Encryption
The system can be configured to automatically decrypt and display the live video while keeping the identifying features encrypted in all recorded videos. When recorded video is played, all the identifying features are obscured through encryption. Since the proposed system is compression independent, this encrypted video can be played back on any standard video player such as a standard Media Player. However, when there is a legitimate need to decrypt and reveal the identifying features, for example to aid a criminal investigation, the video has to be played in a special player/security console that has the ability to decrypt the regions. This solution provides additional security when access to surveillance consoles with decrypting ability is further restricted.
3.1. Compression Independent Encryption
Encrypting before encoding makes this system compression independent. The regions detected as containing identifying features are encrypted before the encoding stage. With this approach, a standard encoder can be used for encoding and a standard decoder for decoding; neither is aware of the encryption. Since video compression is lossy, the decoded video is not identical to the video input to the encoder. This means that the input to the decryption stage is not identical to the output of the encryption stage. This "corruption" of the encrypted data by lossy video encoding rules out traditional encryption algorithms such as AES. Such ciphers are designed so that any change to a ciphertext block yields a garbled, unrelated plaintext block on decryption.
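The effect of coding distortion on the two kinds of cipher can be seen in a small experiment. It compares a pixel permutation with a toy bitwise XOR cipher, used only as a simple stand-in for a conventional cipher (it is not AES). Every ciphertext value is shifted by +1 and clipped to 8 bits, which is a crude, assumed model of lossy coding. The pixel values, permutation, and keystream are hypothetical.

```python
def permute(block, perm):
    """Encrypt: output position i takes block[perm[i]]."""
    return [block[p] for p in perm]

def unpermute(block, perm):
    """Decrypt: return every value to the position it came from."""
    out = [0] * len(block)
    for i, p in enumerate(perm):
        out[p] = block[i]
    return out

def xor_cipher(block, keystream):
    """Toy bitwise cipher, a stand-in for a conventional cipher (not AES)."""
    return [b ^ k for b, k in zip(block, keystream)]

def lossy(block, shift=1):
    """Crude stand-in for coding distortion: shift every value, clip to 8 bits."""
    return [max(0, min(255, v + shift)) for v in block]

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

block = [255, 10, 200, 37]            # four hypothetical pixel values
perm = [2, 0, 3, 1]                   # hypothetical key-derived permutation
keystream = [0x80, 0x3C, 0xA5, 0x5A]  # hypothetical keystream bytes

perm_err = max_error(block, unpermute(lossy(permute(block, perm)), perm))
xor_err = max_error(block, xor_cipher(lossy(xor_cipher(block, keystream)), keystream))
print(perm_err, xor_err)  # 1 255
```

The inverse permutation only moves each distorted value back to its place, so the decrypted error never exceeds the coding error. With the XOR cipher, the +1 change on the ciphertext value 127 carries into the high bit, and the decrypted pixel jumps from 255 to 0. A real block cipher such as AES is worse still, because one changed ciphertext bit garbles the whole 16-byte block.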
The encryption used in the proposed solution is based on permuting pixel values with pseudo-random permutations. These pseudo-random permutations are generated using "logarithmic signatures", as described in [10–12], with a secret pass phrase as the key. The pass phrase can also be generated and managed automatically by the surveillance system.
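The following is a minimal sketch of pass-phrase-keyed permutation encryption for one 16×16 block (256 pixels in raster order). It does not use logarithmic signatures. As a plainly swapped-in technique, it hashes the pass phrase with SHA-256, seeds Python's Mersenne Twister with the digest, and shuffles the pixel indices with `random.shuffle` (a Fisher-Yates shuffle). This is not cryptographically secure, and the function names are hypothetical.

```python
import hashlib
import random

BLOCK = 16  # 16x16 block -> 256 pixels

def keyed_permutation(pass_phrase, n=BLOCK * BLOCK):
    """Derive a permutation of range(n) from a pass phrase.

    Stand-in for the logarithmic-signature generator (illustration only):
    SHA-256 of the pass phrase seeds random.Random, whose shuffle is a
    Fisher-Yates shuffle.
    """
    digest = hashlib.sha256(pass_phrase.encode("utf-8")).digest()
    perm = list(range(n))
    random.Random(int.from_bytes(digest, "big")).shuffle(perm)
    return perm

def encrypt_block(pixels, perm):
    """Output position i takes pixels[perm[i]] (pixels in raster order)."""
    return [pixels[p] for p in perm]

def decrypt_block(pixels, perm):
    """Apply the inverse permutation: put each pixel back in place."""
    out = [0] * len(pixels)
    for i, p in enumerate(perm):
        out[p] = pixels[i]
    return out
```

For example, `decrypt_block(encrypt_block(block, perm), perm)` returns `block` for any 256-pixel `block`. The encrypted block also contains exactly the same pixel values, only rearranged.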
The identity-revealing regions in a video are encrypted on a block basis. The region of interest is expanded to an area whose width and height are integral multiples of 16. This expansion allows the region to be encrypted in 16×16 blocks, the standard unit of coding in most compression algorithms. The blocks that cover the selected regions are determined, and block-based encryption is then applied. The size of the block, that is, the number of pixels in it, affects the strength of the encryption. A small block is easy to attack because the set of possible permutations is small, while a larger block significantly increases the encryption strength. However, larger blocks also increase the number of pixels randomized by the permutation, and they result in a higher bitrate because of the loss in spatial correlation. A 16×16 block is selected because it balances encryption strength against the penalty due to the loss in correlation. Video codecs typically encode and decode video one 16×16 block, known as a macroblock (MB), at a time. The 16×16 block is also a unit of rate control. The quantization parameter (QP) can be adjusted per MB to perform better rate control and reduce the bitrate increase resulting from the encrypted regions. Keeping the encryption block size fixed is also necessary to support automatic recognition and decryption of encrypted regions without transmitting additional information to the decoders.
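The region expansion can be sketched as follows. A detected ROI rectangle (x, y, width, height in pixels) is grown outward to the 16-pixel grid, and the raster-order indices of the macroblocks that cover it are listed. The function names and the 352×288 example frame are hypothetical, and the sketch assumes frame dimensions that are multiples of 16.

```python
MB = 16  # macroblock size in pixels

def expand_roi(x, y, w, h, frame_w, frame_h):
    """Grow an ROI rectangle outward to the 16x16 macroblock grid.

    Returns (x0, y0, x1, y1) with every coordinate a multiple of 16
    (x1 and y1 exclusive). Assumes frame_w and frame_h are multiples of 16.
    """
    x0 = (x // MB) * MB
    y0 = (y // MB) * MB
    x1 = min(-(-(x + w) // MB) * MB, frame_w)  # ceil to the next multiple of 16
    y1 = min(-(-(y + h) // MB) * MB, frame_h)
    return x0, y0, x1, y1

def covering_macroblocks(x, y, w, h, frame_w, frame_h):
    """Raster-order indices of the macroblocks that cover the ROI."""
    x0, y0, x1, y1 = expand_roi(x, y, w, h, frame_w, frame_h)
    mbs_per_row = frame_w // MB
    return [row * mbs_per_row + col
            for row in range(y0 // MB, y1 // MB)
            for col in range(x0 // MB, x1 // MB)]

# A 30x40 face detected at (37, 20) in a hypothetical 352x288 frame.
print(expand_roi(37, 20, 30, 40, 352, 288))            # (32, 16, 80, 64)
print(covering_macroblocks(37, 20, 30, 40, 352, 288))  # [24, 25, 26, 46, 47, 48, 68, 69, 70]
```

Here the 30×40 face grows to a 48×48 area of nine macroblocks. The extra pixels are the price of aligning encryption with the codec's coding unit.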
For each block to be encrypted in a frame, a sequence of pseudo-random permutations is applied to the cleartext sequence of blocks to yield the encrypted block sequence. Each key choice yields a sequence of random permutations ( ) of periodicity . For the sake of economy, we do not present the method of generating pseudo-random permutations here; a complete discussion can be found in [10–12]. The size of the theoretical key space (the number of logarithmic signatures, each providing a different pseudo-random permutation generator) by far exceeds ( ), making a brute-force attack impossible. The encryption key can be generated dynamically from the frame number and block number, and it can be varied on a per-block or per-frame basis if desired.
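One way to vary the key per frame and per block can be sketched as follows. A distinct permutation is derived for each macroblock from the pass phrase, the frame number, and the MB index. This derivation (hashing the three values with SHA-256 and seeding a Fisher-Yates shuffle) is a hypothetical illustration, not the authors' method. The paper instead draws successive elements of a logarithmic-signature-based sequence.

```python
import hashlib
import random

def block_permutation(pass_phrase, frame_no, mb_index, n=256):
    """Permutation for one macroblock, derived from (pass phrase, frame, MB).

    Hypothetical derivation for illustration: SHA-256 over the three values
    seeds random.Random, whose shuffle performs a Fisher-Yates shuffle.
    Not cryptographically secure.
    """
    material = f"{pass_phrase}|{frame_no}|{mb_index}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(material).digest(), "big")
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm

# Neighbouring blocks in the same frame get unrelated permutations,
# while the same (frame, block) pair always reproduces the same one.
p_a = block_permutation("site-7 pass phrase", frame_no=0, mb_index=24)
p_b = block_permutation("site-7 pass phrase", frame_no=0, mb_index=25)
```

Because the permutation is recomputed from the frame and block numbers, a decoder holding the pass phrase needs no per-block key material in the stream.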
3.2. Security Analysis
The cryptographic robustness of the technique we use has been discussed at length and established theoretically in [10–12]; here we make only a few simple observations. First, a logarithmic signature is selected by means of the secret pass phrase key for the symmetric group of degree 256, which is the group of all permutations of the 256 pixel positions in a 16×16 block. A seed in the range [1, 256!] is also selected by means of the secret key, and the sequence of random permutations , ( ), is generated and applied to the blocks to be encrypted. The periodicity of the random sequence of permutations is , significantly larger than what is considered adequate by modern standards. Because the number of logarithmic signatures is gigantic (much larger than ), a brute-force attack is out of the question. If a fixed permutation were used for all blocks within all frames, one might consider a cryptanalytic attack based on the constancy of the permutation. In our scheme, however, different blocks within frames are encrypted with different, distinct elements of the random sequence of permutations. Finally, it is well established that knowledge of the statistical distribution of pixel values in no way allows the encrypted image to be reconstructed. For instance, in  applying an appropriate permutation to the pixels of a grey-scale image of Marilyn Monroe produces an image of John Wayne.
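Two of the observations above can be checked with a few lines of Python. The first is the number of possible pixel permutations of a 16×16 block, 256!, which is computed exactly here and is not the paper's key-space figure. The second is why the Marilyn Monroe to John Wayne example is possible: two images with identical pixel-value histograms are always related by some permutation. `permutation_between` is a hypothetical helper that constructs one such permutation.

```python
import math
from collections import defaultdict, deque

# A 16x16 block has 256 pixel positions, so there are 256! permutations.
print(math.factorial(256).bit_length())  # 1684 (bits), versus 128 for an AES-128 key

def permutation_between(src, dst):
    """Return perm with [src[p] for p in perm] == dst, or None.

    Such a permutation exists exactly when src and dst contain the same
    multiset of pixel values, i.e., when their histograms are identical.
    """
    if sorted(src) != sorted(dst):
        return None
    positions = defaultdict(deque)
    for i, v in enumerate(src):
        positions[v].append(i)
    # Give each destination pixel the next unused source position holding
    # the same value.
    return [positions[v].popleft() for v in dst]
```

Because a histogram is shared by every rearrangement of the same pixels, an attacker who sees only the histogram of an encrypted block cannot tell which of those images it came from.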
Tools used in the proposed surveillance system.
Tools Used To Support the Feature
Pixel domain encryption
Selective encryption of regions of interest
Balance Security Needs
Allows decryption of just the selected regions/frames
Permutation based encryption
4. Performance Evaluation
4.1. Compression Independence
Our experiments show that the quality of decrypted regions is good for H.264-encoded videos with a QP of up to 26. When the same set of experiments was repeated, the bounds that ensure acceptable quality for decrypted regions were a QP of at most 6 for H.263 and MPEG-4 and a bitrate of at least 3 Mbps for MPEG-2. The experiments show that the video has to be recorded with good quality in order to preserve the quality of the decrypted regions.
4.1.1. Fixed Region of Interest QP for Low Bitrate Surveillance
Limiting the distortion of the encrypted regions, referred to here as the Region of Interest (ROI), allows surveillance systems to record video at lower bitrates: the relatively higher quality of the ROI keeps the quality of decrypted regions at an acceptable level. The upper bound on the ROI QP, however, increases the bitrate. As in the previous case, this increase is the cost of providing privacy in video surveillance systems.
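Fixing the ROI QP amounts to a per-macroblock QP map. Rate control picks a frame QP as usual, and encrypted MBs are capped at the ROI bound. The sketch assumes an encoder that accepts a per-MB QP map; the function name and that interface are hypothetical. The default cap of 26 is the H.264 value from Section 4.1, and other codecs need their own bound.

```python
def macroblock_qps(frame_qp, encrypted_mbs, num_mbs, roi_qp_max=26):
    """Per-macroblock QP map with an upper bound on encrypted (ROI) MBs.

    frame_qp: QP chosen by rate control for the whole frame.
    encrypted_mbs: set of raster-order indices of MBs holding encrypted pixels.
    roi_qp_max: upper bound for the ROI (26 is the H.264 bound used above).
    """
    return [min(frame_qp, roi_qp_max) if mb in encrypted_mbs else frame_qp
            for mb in range(num_mbs)]

# Coarse frame QP for the background, capped QP for the two encrypted MBs.
print(macroblock_qps(32, {1, 2}, 4))  # [32, 26, 26, 32]
```

The cap only lowers the QP of encrypted MBs. When rate control already chooses a QP below the bound, the map is unchanged, so the bitrate penalty appears only when the frame QP is coarser than the ROI bound.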
4.2. Robustness to Transcoding
Video surveillance systems can use different formats, so video may need to be converted from one format to another. If regions of a video are encrypted with a conventional cipher, any transcoding or recoding would "corrupt" the encryption, and decryption would not be possible. The proposed permutation-based encryption, however, survives such transcoding and recoding operations. The key benefit of the proposed system is that only the endpoints (the capture end and the authorized playback end) have to be aware of the encryption.
The system is evaluated with H.264-to-MPEG-2 and H.264-to-MPEG-4 video transcoders, using the Crew sequence at resolution. The experiments used the video encoders available in the Intel Integrated Performance Primitives (IPP) SDK. Face detection for the experiments was done manually, and the face regions were input to the system. As in the compression independence experiments, regions of interest are identified, encrypted, and encoded with H.264. The H.264 video is then transcoded to MPEG-2 and MPEG-4, simulating legacy codec support in a video surveillance system.
Experiments show that the video has to be recorded with good quality in order to preserve the quality of the decrypted regions. When lower-bitrate surveillance is necessary, the encoder can enforce an upper bound on the QP used for the encrypted blocks. The compression independence experiments showed that, for H.264 video, a QP of at most 26 is necessary to maintain the quality of the decrypted video, and this value is used as the basis for comparisons. It was chosen as the base because it gives a good tradeoff between quality and bitrate when H.264 is used.
Upper bounds for transcoding.
MPEG-2 bitrate of 3 Mbps
MPEG-4 QP of 4
H.263 QP of 4
Auto Detection Performance Summary for Crew Video.
Correctly Classified MBs
Incorrectly Classified as Encrypted
False Negatives (marked as not encrypted)
4.2.1. Transcoding with a Fixed ROI QP
Limiting the distortion of the encrypted regions, referred to as Region of Interest (ROI) here, allows surveillance systems to record video at lower bitrates. The relatively higher quality of ROI maintains the quality of decrypted regions at an acceptable level. Lowering the upper bound on the ROI QP increases the video bitrate. However, for transcoding purposes, lowering the ROI QP below the upper bound (26) can decrease the overall bitrate of the transcoded video. With higher quality for the ROI (lower QP), transcoders can use a lower bitrate and still preserve the quality of the decrypted regions (ROI) at an acceptable level as the impact of lower transcoder bitrate on high quality ROI would be small.
4.3. Automatic Detection of Encrypted Regions
4.3.1. Encrypted Block Detection Using DCT
With this approach, the randomness of a block is measured by examining its high-frequency DCT coefficients. A DCT is applied to all macroblocks in the decoded video. A block is marked as a candidate for decryption when non-zero coefficients are present in the bottom-right block of the DCT block (the high-frequency coefficients). It is marked as encrypted if the sum of the absolute values of the high-frequency coefficients in that bottom-right block is greater than 5. The threshold was determined experimentally by encoding videos at various bitrates and evaluating the results.
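The DCT test can be sketched in pure Python on a decoded 16×16 luma block, given as a list of rows. Two details are assumptions because the text does not give them: the size of the bottom-right corner (`hf_size=4`) and the orthonormal DCT scaling. The threshold of 5 from the text is only meaningful under the authors' own scaling. Function names are hypothetical.

```python
import math
import random

def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def dct2(block):
    """2-D DCT of a square block, computed as C * X * C^T."""
    n = len(block)
    c = dct_matrix(n)
    tmp = [[sum(c[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]                   # C * X
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]                  # (C * X) * C^T

def looks_encrypted_dct(block, hf_size=4, threshold=5.0):
    """Flag a decoded macroblock whose high-frequency DCT energy is large.

    Sums |coefficient| over the bottom-right hf_size x hf_size corner.
    hf_size and the orthonormal scaling are assumptions; the threshold of 5
    comes from the text and depends on the authors' DCT scaling.
    """
    coeffs = dct2(block)
    n = len(coeffs)
    corner = range(n - hf_size, n)
    return sum(abs(coeffs[k][l]) for k in corner for l in corner) > threshold

# Usage: a smooth 16x16 gradient versus the same pixels shuffled.
smooth = [[16 * r + c for c in range(16)] for r in range(16)]
flat = [v for row in smooth for v in row]
random.Random(7).shuffle(flat)
shuffled = [flat[16 * r:16 * r + 16] for r in range(16)]
print(looks_encrypted_dct(smooth), looks_encrypted_dct(shuffled))
```

The smooth block stays below the threshold because a linear ramp such as 16r + c has DCT energy only in the first row and column of coefficients. Shuffling the same pixels spreads energy into the high-frequency corner.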
4.3.2. Encrypted Block Detection Using Row-Column Differences
This approach is similar to edge detection in a block: each pixel value is compared with its neighbors, first along rows and then along columns. Whenever the difference between neighboring pixels is greater than 11, a big-pixel-difference count is incremented by 1. If the total number of big pixel differences is greater than 115, the block is marked as encrypted. The thresholds for this method were also determined experimentally, using the Crew video (Crew.yuv) for tuning.
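The row-column test can be sketched as follows, using the thresholds quoted above (11 and 115). It assumes a decoded luma block given as a list of rows and takes the absolute difference between neighbours. For a 16×16 block there are 240 horizontal and 240 vertical neighbour pairs. The function name is hypothetical.

```python
def looks_encrypted_rowcol(block, diff_threshold=11, count_threshold=115):
    """Row-column difference test for a decoded macroblock.

    Counts neighbouring pixel pairs (along rows, then along columns) whose
    absolute difference exceeds diff_threshold; the block is flagged as
    encrypted when that count exceeds count_threshold.
    """
    n_rows, n_cols = len(block), len(block[0])
    big = 0
    for r in range(n_rows):            # horizontal neighbours
        for c in range(n_cols - 1):
            if abs(block[r][c + 1] - block[r][c]) > diff_threshold:
                big += 1
    for c in range(n_cols):            # vertical neighbours
        for r in range(n_rows - 1):
            if abs(block[r + 1][c] - block[r][c]) > diff_threshold:
                big += 1
    return big > count_threshold

gentle = [[8 * r + 4 * c for c in range(16)] for r in range(16)]
steep = [[16 * r + c for c in range(16)] for r in range(16)]
print(looks_encrypted_rowcol(gentle), looks_encrypted_rowcol(steep))  # False True
```

The second case shows how natural content can cause false positives with this test. The steep but perfectly smooth gradient steps by 16 between rows, so all 240 vertical pairs exceed 11 and the block is flagged. A DCT-based test would not flag this ramp, because its energy stays in the lowest-frequency row and column.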
4.3.3. Performance of Automatic Encrypted Block Detection
Experiments were conducted to detect the encrypted blocks in the Crew and CarTag videos. The Crew video has 237 frames with 6023 encrypted blocks out of a total of 93,852 blocks. The CarTag video has 240 frames with 3525 encrypted blocks out of a total of 324,000 blocks.
The DCT-based method clearly outperforms the row-column method but is computationally more expensive. When a block is incorrectly marked as encrypted, the decryption performed on it (i.e., the inverse permutation) effectively encrypts the block. Since the false positive rate for the Crew video is very low, the effect of incorrect classification is minimal, and users can interactively undo the decryption when necessary. Increasing the ROI quality (by using a lower QP) is also important. It allows the rest of the image to be compressed more aggressively without losing ROI quality, and higher ROI quality also increases the auto-detection accuracy.
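The effect of a false positive, and the interactive undo, can be shown with a short example. "Decrypting" a clear macroblock applies the inverse permutation and scrambles it, and re-applying the forward permutation restores it exactly. The permutation i -> 97·i mod 256 is a hypothetical toy chosen only because it is easy to check (97 is odd, hence coprime with 256). It is not a secure key.

```python
def encrypt_block(pixels, perm):
    """Forward permutation: output position i takes pixels[perm[i]]."""
    return [pixels[p] for p in perm]

def decrypt_block(pixels, perm):
    """Inverse permutation: move each pixel back to where it came from."""
    out = [0] * len(pixels)
    for i, p in enumerate(perm):
        out[p] = pixels[i]
    return out

# Hypothetical toy permutation of range(256); NOT a secure key.
perm = [(97 * i) % 256 for i in range(256)]

clear_mb = list(range(256))                    # clear MB misclassified as encrypted
false_positive = decrypt_block(clear_mb, perm)  # "decrypting" scrambles it
undone = encrypt_block(false_positive, perm)    # re-applying perm undoes that
assert false_positive != clear_mb
assert undone == clear_mb
```

Because decryption is a pure permutation, a wrong decision loses no information and can always be reversed exactly.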
The detection performance of these methods drops when the video naturally contains high-frequency regions; we observed this in scenes with grass and ocean waves. The CarTag video has grass and leaves in the background, which result in a large number of false positives. Even so, the DCT method has a recall of 100% and detects all the encrypted blocks. The decreased performance takes the form of increased false positives, which can be addressed interactively when the surveillance videos are reviewed.
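The quantities discussed above (recall and false positives) can be made precise with macroblock-level definitions, sketched below. The function name is hypothetical and the counts in the example are toy values, not results from the experiments.

```python
def detection_metrics(tp, fp, fn, tn):
    """Macroblock-level detection metrics.

    tp: encrypted MBs flagged as encrypted;  fp: clear MBs flagged as encrypted;
    fn: encrypted MBs that were missed;      tn: clear MBs correctly left alone.
    """
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return {"recall": recall, "precision": precision,
            "false_positive_rate": false_positive_rate}

# Toy counts only: every encrypted MB found, two clear MBs wrongly flagged.
print(detection_metrics(tp=8, fp=2, fn=0, tn=90))
```

For CarTag, tp + fn = 3525 encrypted blocks and fp + tn = 324,000 - 3525 = 320,475 clear blocks. A recall of 100% means fn = 0. The false positives then lower the precision without affecting the recall.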
4.4. Comparative Evaluation
Comparative evaluation of the proposed solution.
Video selective encryption method
Robustness to Transcoding
General bit-rate increases
Transform-domain scrambling coefficients 
Invertible cryptographic obscuration 
Skin tone replacement 
Lower quality ROI 
This paper presents a system for encrypting selected regions in videos. The system can be used to ensure privacy in video surveillance by hiding the identity-revealing regions of the video. The encrypted video can be transcoded and/or decrypted at a later time with the right decryption keys. The proposed system is independent of the compression algorithms used. The system was tested with H.264-to-MPEG-2, H.264-to-MPEG-4, and H.264-to-H.263 video transcoders to verify video quality. The proposed reversible encryption increases the video bitrate, and the increase grows with the number of encrypted blocks. Experiments with H.264-to-MPEG-4 transcoding show an increase of up to 23% for high-bitrate videos with about 7% of blocks encrypted. This increase can be reduced by keeping the ROI QP constant and increasing the frame QP. The increase in bitrate depends on the type of video and the size of the encrypted regions, and it is a reasonable cost to pay for protecting individual privacy. The proposed solution does not require any additional information to detect and decrypt encrypted regions. A DCT-based method was developed to detect the encrypted regions automatically, making the system truly independent of the compression algorithms used.
- Senior A, Pankanti S, Hampapur A, et al.: Enabling video privacy through computer vision. IEEE Security and Privacy 2005, 3(3):50-57. 10.1109/MSP.2005.65
- Dufaux F, Ebrahimi T: Scrambling for privacy protection in video surveillance systems. IEEE Transactions on Circuits and Systems for Video Technology 2008, 18(8):1168-1174.
- Carrillo P, Kalva H, Magliveras S: Compression independent object encryption for ensuring privacy in video surveillance. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '08), June 2008, Hannover, Germany 273-276.
- Martínez-Ponte I, Desurmont X, Meessen J, Delaigle J: Robust human face hiding ensuring privacy. Proceedings of the Workshop on the Integration of Knowledge, Semantics and Digital Media Technology (WIAMIS '05), April 2005, Montreux, Switzerland.
- Martin K, Plataniotis KN: Privacy protected surveillance using secure visual object coding. IEEE Transactions on Circuits and Systems for Video Technology 2008, 18(8):1152-1162.
- Boult TE: PICO: privacy through invertible cryptographic obscuration. Proceedings of the Computer Vision for Interactive and Intelligent Environment, November 2005 27-38.
- Berger M: Privacy mode for acquisition cameras and camcorders. US patent no. 6,067,399, May 2000.
- Chen D, Chang Y, Yan R, Yang J: Tools for protecting the privacy of specific individuals in video. EURASIP Journal on Advances in Signal Processing 2007, 2007.
- Wee SJ, Apostolopoulos JG: Secure scalable streaming enabling transcoding without decryption. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), 2001 1: 437-440.
- Magliveras SS, Memon ND: Random permutations from logarithmic signatures. In Proceedings of the 1st Great Lakes Computer Science Conference Computing in the 90's, 1989, Lecture Notes in Computer Science. Volume 507. Springer; 91-97.
- Magliveras SS, Memon ND: Algebraic properties of cryptosystem PGM. Journal of Cryptology 1992, 5(3):167-183. 10.1007/BF02451113
- Magliveras SS, van Trung T, Stinson DR: New approaches to designing public key cryptosystems using one-way functions and trap-doors in finite groups. Journal of Cryptology 2002, 15: 285-297. 10.1007/s00145-001-0018-3
- Socek D, Kalva H, Magliveras SS, Marques O, Culibrk D, Furht B: New approaches to encryption and steganography for digital videos. Multimedia Systems 2007, 13(3):191-204. 10.1007/s00530-007-0083-z
- Magliveras SS: A cryptosystem from logarithmic signatures of finite groups. In Proceedings of the 29th Midwest Symposium on Circuits and Systems, 1986. Elsevier; 972-975.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.