Markov Modelling of Fingerprinting Systems for Collision Analysis
© Neil J. Hurley et al. 2008
Received: 8 May 2007
Accepted: 3 December 2007
Published: 12 December 2007
Multimedia fingerprinting, also known as robust or perceptual hashing, aims at representing multimedia signals through compact and perceptually significant descriptors (hash values). In this paper, we examine the probability of collision of a certain general class of robust hashing systems that, in its binary alphabet version, encompasses a number of existing robust audio hashing algorithms. Our analysis relies on modelling the fingerprint (hash) symbols by means of Markov chains, which is generally realistic due to the hash synchronization properties usually required in multimedia identification. We provide theoretical expressions of performance, and show that the use of -ary alphabets is advantageous with respect to binary alphabets. We show how these general expressions explain the performance of Philips fingerprinting, whose probability of collision had only been previously estimated through heuristics.
Multimedia fingerprinting, also known as robust or perceptual hashing, aims at representing multimedia signals through compact and perceptually significant descriptors (hash values). Such descriptors are obtained through a hashing function that maps signals surjectively onto a sufficiently lower-dimensional space. This function is akin to a cryptographic hashing function in the sense that, in order to perform nearly unique identification from the hash values, perceptually different signals (according to some relevant distance) must lead with high probability to clearly different descriptors. Equivalently, the probability of collision ( ) between the descriptors corresponding to perceptually different signals must be kept low. Unlike in cryptographic hashing, however, signals that are perceptually close must lead to similar robust hashes. Despite this difference, the probability of collision remains the parameter that determines the "resolution" of a method for identification purposes.
A large number of robust hashing algorithms have been proposed recently. This flurry of activity calls for a more systematic examination of robust hashing strategies and their performance properties. In this paper, we take a step in that direction by examining the probability of collision of a certain general class of robust hashing systems, rather than analyzing a particular method. In its binary alphabet version, the class considered broadly encompasses several existing algorithms, in particular, a number of robust audio hashing algorithms [1–4]. We will show that the -ary alphabet version of the class provides an advantage over the binary version for fixed storage size. In order to keep our exposition simple, other issues such as robustness to distortions or to desynchronization are not considered in this analysis. The study of the tradeoffs brought about by the simultaneous consideration of these issues is left as further work. We must also note that we will be dealing with unintentional collisions due to the inherent properties of the signals to be hashed. A related problem not tackled in this paper is the analysis of intentional forgeries of signals—perhaps under distortion constraints—in order to maximize the probability of collision.
In any hashing system, a distance measure must be established in order to determine the closeness between hash values. The distance commonly used for comparing sequences of discrete-alphabet symbols is the Hamming distance, defined as the number of positions at which the symbols with the same index differ in the two sequences. Therefore, when comparing any two -ary symbols, their Hamming distance can only take the values 0 or 1.
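The definition above can be illustrated with a minimal Python sketch. The function name `hamming_distance` and the example sequences are hypothetical and chosen only for illustration.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length symbol sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # Each position contributes an elemental distance of 0 or 1.
    return sum(1 for x, y in zip(a, b) if x != y)


# Two hypothetical 4-ary hash sequences (symbols 0..3) of length 6.
h1 = [0, 3, 2, 2, 1, 0]
h2 = [0, 1, 2, 3, 1, 2]
print(hamming_distance(h1, h2))  # 3 (positions 1, 3 and 5 differ)
```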
for all . Furthermore, we assume that the process is stationary, that is, with statistics independent of . We will also focus without loss of generality on one particular element of the feature vector. Hence, we will write the relevant random variables of the feature vector as and to represent the distributions of the feature value at and , respectively, for any , dropping the implicit index .
Finally note that, although methods which deal with real-valued fingerprints could be deemed in principle to belong to this class (using very large values of ), they rely on the use of mean square error distances instead of the Hamming distance. Thus, their study is not covered by the class of methods studied here.
Lowercase boldface letters such as represent column vectors, while matrices are represented by upper case Roman letters such as . is a matrix with the elements of in the diagonal and zero elsewhere. The symbols and denote the identity and the all-zero matrices, respectively, whereas denotes an all-ones vector, all of suitable size depending on the context. denotes the trace of . The operator stacks sequentially the columns of an matrix into an column vector. The symbol denotes the Kronecker (or direct) product of two matrices, and denotes their Hadamard (component-wise) product. Finally, denotes the Kronecker delta function.
2. Probability of Collision
To fix a point of operation, we consider hash sequences of symbols (assumed integer) which have fixed bit size (storage size). We investigate the probability of collision between two such independent sequences of symbols generated from the Markov chain with transition matrix , whose elements are defined in (4). Note that is a column-stochastic matrix, so that .
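As a rough illustration of this setting, the following Python sketch checks that a matrix is column-stochastic and samples a hash sequence from the corresponding Markov chain. It assumes the convention that entry `P[i][j]` is the probability of moving to symbol `i` given current symbol `j`, so that each column sums to one; the function names, the example matrix and the uniform choice of the first symbol are assumptions made for illustration only.

```python
import random


def is_column_stochastic(P, tol=1e-12):
    """True if all entries are nonnegative and every column sums to 1."""
    n = len(P)
    for j in range(n):
        column = [P[i][j] for i in range(n)]
        if any(p < 0 for p in column) or abs(sum(column) - 1.0) > tol:
            return False
    return True


def sample_hash(P, n_symbols, rng, initial=None):
    """Sample n_symbols symbols from the chain with P[i][j] = Pr(next=i | current=j)."""
    n = len(P)
    # Assumption: the first symbol is uniform unless one is supplied.
    state = rng.randrange(n) if initial is None else initial
    sequence = [state]
    for _ in range(n_symbols - 1):
        column = [P[i][state] for i in range(n)]
        state = rng.choices(range(n), weights=column)[0]
        sequence.append(state)
    return sequence


# Hypothetical binary transition matrix (columns sum to 1).
P = [[0.8, 0.3],
     [0.2, 0.7]]
rng = random.Random(0)
seq = sample_hash(P, 16, rng)
print(is_column_stochastic(P), seq)
```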
with the Hamming distance between the elements of the two sequences. If the random variables were independent, we could apply the central limit theorem (CLT) to for large , in order to compute the probability (6). Although there are short-term dependencies created by the Markov chain, these vanish in the long term. Then we may invoke a broader version of the CLT for locally correlated signals . In summary, the result in  states that, provided the second and third moments of are bounded, then tends to the normal distribution. Finally, notice that is discrete, and then applying the CLT entails approximating a distribution with support on the nonnegative integers using a distribution with support on the whole real line.
We investigate this direct approach in Section 4. Finally, in Section 6 we propose a Chernoff bound to , which is useful when the CLT assumption is not accurate or when the exact computation presents computational difficulties.
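The CLT approximation discussed in this section can be sketched in Python as follows. The sketch assumes that a collision is declared when the Hamming distance does not exceed some threshold, and that the mean and variance of the distance are already available; the function names and the numerical example (independent, equiprobable bits) are assumptions for illustration.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, written with erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def clt_collision_probability(mean, variance, threshold):
    """Gaussian approximation of Pr(distance <= threshold)."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return normal_cdf((threshold - mean) / math.sqrt(variance))


# Hypothetical example: 256 independent equiprobable bits, so the distance
# between two sequences has mean 256 * 0.5 = 128 and variance 256 * 0.25 = 64.
p_clt = clt_collision_probability(mean=128.0, variance=64.0, threshold=64.0)
print(p_clt)
```

Since the true distance is integer-valued, a continuity correction (adding 0.5 to the threshold) is a common refinement; it is omitted here to keep the sketch minimal.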
3. Mean and Variance of Hamming Distance
In this section, we derive the mean and variance of the Hamming distance using the Markov chain of symbol transitions , defined by (4). To proceed, we assume that represents an irreducible, aperiodic Markov chain.
Using the probabilities (12) and (13), we can derive the mean and variance of the Hamming distance between two independent hash sequences of symbols, assuming that the process starts in the equilibrium distribution (11). This is tantamount to assuming , in which case and , that is, we can drop the index and write . When the initial symbol is chosen with uniform probability from this condition holds if the transition matrix is symmetric. Even if all values for the initial symbol are not equiprobable in reality, the assumption is not too demanding whenever convergence to equilibrium is fast. We investigate a more general case for binary hashes in Section 5.
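For the mean alone, the equilibrium assumption leads to a simple numerical check, sketched below in Python. If both sequences are independent and start in equilibrium, each pair of symbols differs with probability one minus the sum of the squared equilibrium probabilities, and the mean Hamming distance follows by linearity of expectation. This is an elementary consequence of the assumptions rather than a transcription of the expressions derived in this section, and the variance, which requires the correlations between elemental distances, is not covered. The function names and the example matrix are hypothetical.

```python
def equilibrium(P, iters=10_000, tol=1e-15):
    """Equilibrium distribution of a column-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(P[i][j] * pi[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def mean_hamming_distance(P, n_symbols):
    """Mean distance between two independent stationary sequences of n_symbols."""
    pi = equilibrium(P)
    # Two independent symbols drawn from pi coincide with probability sum(pi_i^2).
    p_differ = 1.0 - sum(p * p for p in pi)
    return n_symbols * p_differ


# Hypothetical binary chain with equilibrium distribution (0.6, 0.4).
P = [[0.8, 0.3],
     [0.2, 0.7]]
print(mean_hamming_distance(P, 100))  # approximately 48.0
```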
4. The Stochastic Process of Elemental Distances
In this section, we will investigate the stochastic process of elemental distances, that is, the process that generates the sequence . Through an analysis of this process, we arrive at a full expression for the probability of collision, which is exact in the case of binary hashing sequences with symmetric transition matrices. This is possible because, as we will show, the elemental distance process is itself a Markov chain when and the transition matrix is symmetric. Even for the case , we note that the elemental distance process is well approximated by a Markov chain, and then the expression obtained for the probability of collision can be interpreted as a good approximation to the true collision probability.
It follows that, whenever the diagonal elements of are all equal and the off-diagonals are all equal, the dependence of on factors from (23) and (24), and is independent of the time-step . In this case, the process of elemental distances is itself a stationary Markov chain. Let us assume that has the structure with and . In this case, as , we can see that with and . As we have discussed above, this is the structure that allows the dependence on in (23) and (24) to be cancelled. For , observe that symmetry implies that is always of the form above, so the conditions are always fulfilled in that case.
On the other hand, even when the elemental distances do not follow a Markov chain, since , the equilibrium probability, the elemental distance process is well approximated by the Markov chain with transition matrix obtained by replacing in (23) and (24) with , such that . From now on, we will refer loosely to the elemental distance Markov chain, meaning, when appropriate, the Markov chain derived from this approximation.
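The condition under which the elemental distances form a Markov chain can be checked numerically. The Python sketch below tests whether the probability that the next two symbols coincide depends on the current pair of symbols only through whether they are equal or different, which is the usual lumpability condition for the joint chain of the two independent sequences. It assumes a column-stochastic matrix with `P[i][j]` the probability of moving to symbol `i` from symbol `j`; all names and example matrices are hypothetical.

```python
def prob_next_symbols_equal(P, x, y):
    """Pr(next symbols coincide | current symbols x and y), independent chains."""
    n = len(P)
    return sum(P[s][x] * P[s][y] for s in range(n))


def spread(values):
    return max(values) - min(values) if values else 0.0


def lumpable_by_distance(P, tol=1e-12):
    """True if the next-distance law depends on (x, y) only through x == y."""
    n = len(P)
    same = [prob_next_symbols_equal(P, x, x) for x in range(n)]
    diff = [prob_next_symbols_equal(P, x, y)
            for x in range(n) for y in range(n) if x != y]
    return spread(same) < tol and spread(diff) < tol


def distance_chain(P):
    """2x2 column-stochastic matrix T[d_next][d_now]; meaningful only if lumpable."""
    stay0 = prob_next_symbols_equal(P, 0, 0)   # distance 0 -> distance 0
    back0 = prob_next_symbols_equal(P, 0, 1)   # distance 1 -> distance 0
    return [[stay0, back0],
            [1.0 - stay0, 1.0 - back0]]


P_sym = [[0.9, 0.1],
         [0.1, 0.9]]          # binary, symmetric
P_asym = [[0.8, 0.3],
          [0.2, 0.7]]         # binary, not symmetric
print(lumpable_by_distance(P_sym), lumpable_by_distance(P_asym))
print(distance_chain(P_sym))
```

For the binary symmetric example with diagonal value 0.9, the distance stays unchanged with probability 0.9² + 0.1² = 0.82 and flips with probability 2 × 0.9 × 0.1 = 0.18, up to floating-point rounding.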
4.1. Probability of Collision
Expression (28) gives the exact probability of collision when the sequence of elemental distances is a Markov chain. In other cases, it will lead to an approximation. Consequently, the analysis is exact for and symmetric, in which case ( ) can be determined easily from .
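When the elemental distances do form a two-state Markov chain, the distribution of the total distance can also be computed numerically by dynamic programming, as in the Python sketch below. This is an alternative numerical route rather than the closed-form expression (28). The sketch assumes a 2x2 column-stochastic matrix `T[d_next][d_now]` for the distance chain, an initial distribution `init` over the first elemental distance, and a collision event defined as the total distance not exceeding a threshold; all names and values are hypothetical.

```python
def distance_sum_distribution(T, init, n_symbols):
    """Return [Pr(D = 0), ..., Pr(D = n_symbols)] for D the sum of elemental distances."""
    # state[d][s]: probability that the latest elemental distance is d
    # and the running sum of distances equals s.
    state = [[0.0] * (n_symbols + 1) for _ in range(2)]
    state[0][0] = init[0]
    state[1][1] = init[1]
    for _ in range(n_symbols - 1):
        new = [[0.0] * (n_symbols + 1) for _ in range(2)]
        for d in range(2):
            for s in range(n_symbols):
                p = state[d][s]
                if p == 0.0:
                    continue
                new[0][s] += T[0][d] * p       # next elemental distance is 0
                new[1][s + 1] += T[1][d] * p   # next elemental distance is 1
        state = new
    return [state[0][s] + state[1][s] for s in range(n_symbols + 1)]


def collision_probability(T, init, n_symbols, threshold):
    """Pr(D <= threshold) under the two-state distance chain."""
    dist = distance_sum_distribution(T, init, n_symbols)
    return sum(dist[:threshold + 1])


# Hypothetical distance chain of a binary symmetric hash (diagonal value 0.9).
T = [[0.82, 0.18],
     [0.18, 0.82]]
print(collision_probability(T, [0.5, 0.5], n_symbols=32, threshold=4))
```

Setting `init = [1.0, 0.0]` corresponds to forcing the first elemental distance to zero.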
5. Binary Hashes with Symmetric Transition Matrix
While (31) holds under the assumption that the distribution of is the equilibrium distribution, it is also possible to derive the exact mean and variance of from an arbitrary initial distribution. This case is interesting, since, although the symbol sequences are assumed to be generated from independent sources, at the application level, the first bit of the hash sequence corresponding to the input signal is sometimes aligned with that of the hash sequences in the database. We can handle this scenario by assuming that the distance between the initial pair of bits is zero.
5.1. Exact Mean and Variance
6. Chernoff Bounding
For large and small probabilities the CLT can exhibit large deviations from the true probabilities. This is because the CLT gives an approximation based only on the first two moments of the real distribution. Also, the exact computation (28) can run into numerical difficulties due to the combinatorial terms involved. It is therefore of interest to see what can be obtained by means of Chernoff bounding on (6). Apart from the interest of a strict upper bound, this strategy also provides the error exponent followed by the integral of the tail of the distribution of .
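A generic numerical version of this idea can be sketched in Python under the two-state distance chain model. For any s ≥ 0, Markov's inequality gives Pr(D ≤ t) ≤ exp(st) E[exp(−sD)], and the expectation can be evaluated by propagating an exponentially tilted probability vector through the chain. The sketch minimises the bound over a grid of s values, so every grid point yields a valid (if not necessarily the tightest) bound. It is not the closed-form bound derived in this section; the function names, the grid and the example values are assumptions.

```python
import math


def log_mgf_neg(T, init, n_symbols, s):
    """log E[exp(-s * D)] for the sum D of a two-state elemental distance chain."""
    w = (1.0, math.exp(-s))                  # weight per elemental distance 0 / 1
    v = [init[0] * w[0], init[1] * w[1]]
    log_scale = 0.0
    for _ in range(n_symbols - 1):
        v = [w[d] * (T[d][0] * v[0] + T[d][1] * v[1]) for d in range(2)]
        total = v[0] + v[1]
        log_scale += math.log(total)         # renormalise to avoid underflow
        v = [v[0] / total, v[1] / total]
    return log_scale + math.log(v[0] + v[1])


def chernoff_collision_bound(T, init, n_symbols, threshold, s_grid=None):
    """Upper bound on Pr(D <= threshold), minimised over a grid of s >= 0."""
    if s_grid is None:
        s_grid = [0.05 * k for k in range(201)]   # s in [0, 10]
    exponent = min(s * threshold + log_mgf_neg(T, init, n_symbols, s)
                   for s in s_grid)
    return math.exp(min(exponent, 0.0))


# Hypothetical check with independent equiprobable elemental distances.
iid = [[0.5, 0.5],
       [0.5, 0.5]]
bound = chernoff_collision_bound(iid, [0.5, 0.5], n_symbols=20, threshold=1)
print(bound)
```

In this independent example the exact value is Pr(D ≤ 1) = 21/2^20, so the bound can be compared directly against it.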
7. Empirical Results
Matlab source code and data associated with the empirical results given below can be downloaded from http://www.ihl.ucd.ie.
7.1. Synthetic Markov Chains
The CLT approximation has good agreement in the binary case for , but is significantly less accurate for 4-ary hashes. This is due to the fact that in the second case, the pdf of is significantly skewed as zero distances are more likely to happen. Due to this, the CLT approximation underestimates the tail of the true distribution. The Chernoff bound, also shown in Figure 1, follows the same shape as the exact distribution and is tighter for high values of than the CLT approximation.
7.2. The Philips Method
We show in this subsection how the Markov modelling that we have described is applicable to the hashing method proposed by Haitsma et al. , commonly known as the Philips method. Moreover, we show how previous work on modelling this particular method allows the parameters of the Markov chain to be obtained analytically.
In previous work , we developed a model that allows the analysis of the performance of the Philips method under additive noise and desynchronisation. Using this model, the transition matrix of the Markov chain associated to the bitstream of the Philips hash can be determined analytically as follows. In  we analysed the bit error that results from desynchronization, the lack of alignment between the original framing used in the acquisition stage and the framing that takes place in the identification stage.
where is the correlation coefficient corresponding to that band and that level of desynchronization. This model was shown therein to give very good agreement with empirical results, even with real audio (and hence nonstationary) input signals.
In the results presented in Figure 2, and hence the correction factor for this value of is . In summary, our analysis is able to tackle dependencies without resorting to any heuristics.
7.2.1. Real Audio Signals
Although our model assumes stationarity, which is clearly not the case for real audio signals, good agreement is found between the model predictions and empirical data. The greatest discrepancy appears in the AC/DC audio and may be due to greater dynamics in this song. To improve the results, we could apply the approach used in , where real audio signals are approximated by stationary stretches, and apply our model separately to each stretch. While this approach can provide the probability of collision within each stationary stretch, combining these into an overall probability of collision could prove problematic.
We have examined the probability of collision of a certain general class of robust hashing systems that can be described by means of Markov chains. We have given theoretical expressions for the performance of general chains of -ary hashes, by deriving the mean and variance of the distance between independent hashes and applying a CLT approximation for the probability distribution. We have been able to derive an expression for the distribution, which is exact for binary symmetric hashes and gives a very good approximation otherwise. We have confirmed the accuracy of the Gaussian distribution on binary hashes once the hash sequence is sufficiently large. Moreover, we derived the binary transition matrix for the Philips method and showed that the Markov chain model has very good agreement with empirical results for this method. While we have shown that for , -ary chains have an advantage over binary chains from the point of view of collision, higher order alphabets will inevitably lead to a degradation of performance under additive noise and desynchronisation error. The performance tradeoffs that result will be examined in future work.
Using (17) and (A.12) in (15) we finally obtain (21).
B. Variance of Binary Symmetric Hash Sequence
Finally, inserting (36) and (B.5) into (15), we arrive at (39).
- Haitsma J, Kalker T, Oostveen J: Robust audio hashing for content identification. Proceedings of the International Workshop on Content-Based Multimedia Indexing (CBMI '01), September 2001, Brescia, Italy, 117-125.
- Mihçak MK, Venkatesan R: A perceptual audio hashing algorithm: a tool for robust audio identification and information hiding. In Proceedings of the 4th International Workshop on Information Hiding (IHW '01), April 2001, Pittsburgh, Pa, USA, Lecture Notes in Computer Science. Volume 2137. Springer; 51-65.
- Baluja S, Covell M: Content fingerprinting using wavelets. Proceedings of the 3rd European Conference on Visual Media Production (CVMP '06), November 2006, London, UK, 209-212.
- Kim S, Yoo CD: Boosted binary audio fingerprint based on spectral subband moments. Proceedings of the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '07), April 2007, Honolulu, Hawaii, USA, 1: 241-244.
- Haitsma J, Kalker T: A highly robust audio fingerprinting system. Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR '02), October 2002, Paris, France, 107-115.
- Blum M: On the central limit theorem for correlated random variables. Proceedings of the IEEE 1964, 52(3):308-309.
- Magnus JR, Neudecker H: Matrix Differential Calculus with Applications in Statistics and Econometrics. 2nd edition. John Wiley & Sons, New York, NY, USA; 1999.
- Balado F, Hurley NJ, McCarthy EP, Silvestre GCM: Performance analysis of robust audio hashing. IEEE Transactions on Information Forensics and Security 2007, 2(2):254-266.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.