Fingerprint template protection using minutia-pair spectral representations

Storage of biometric data requires some form of template protection in order to preserve the privacy of people enrolled in a biometric database. One approach is to use a Helper Data System. Here it is necessary to transform the raw biometric measurement into a fixed-length representation. In this paper we extend the spectral function approach of Stanko and Skoric [WIFS2017], which provides such a fixed-length representation for fingerprints. First, we introduce a new spectral function that captures different information from the minutia orientations. It is complementary to the original spectral function, and we use both of them to extract information from a fingerprint image. Second, we construct a helper data system consisting of zero-leakage quantisation followed by the Code Offset Method. We show empirical data which demonstrates that applying our helper data system causes only a small performance penalty compared to fingerprint authentication based on the unprotected spectral functions.


A. Biometric template protection
Biometric authentication has become popular because of its convenience. Biometrics cannot be forgotten or left at home. Although biometric data is not exactly secret (we are leaving a trail of fingerprints, DNA etc.), it is important to protect biometric data for privacy reasons. Unprotected storage of biometric data could reveal medical conditions and would allow cross-matching of entries in different databases. Large-scale availability of unprotected biometric data would make it easier for malevolent parties to leave misleading traces at crime scenes (e.g. artificial fingerprints [13], synthesized DNA [8]). One of the easiest ways to properly protect a biometric database against breaches and insider attacks (scenarios where the attacker has access to decryption keys) is to store biometrics in hashed form, just like passwords. An error-correction step has to be added to get rid of the measurement noise. To prevent critical leakage from the error-correction redundancy data, one uses a Helper Data System (HDS) [11], [5], [17], for instance a Fuzzy Extractor or a Secure Sketch [10], [6], [3]. The best known and simplest HDS scheme is the code-offset method (COM). The COM utilizes a linear binary error-correction code and thus requires a fixed-length representation of the biometric measurement. Such a representation is not straightforward when the measurement noise can cause features of the biometric to appear or disappear. For instance, some minutiae may not be detected in every image captured from the same finger.
A fixed-length representation called spectral minutiae was introduced by Xu et al. [24], [21], [22], [23]. For every detected minutia of sufficient quality, the method evaluates a Fourier-like spectral function on a fixed-size two-dimensional grid; the contributions from the different minutiae are added up. Disappearance of minutiae or appearance of new ones does not affect the size of this representation. One of the drawbacks of Xu et al.'s construction is that phase information is discarded in order to obtain translation invariance. Nandakumar [14] proposed a variant which does not discard the phase information. However, it reveals personalised reliability data, which makes it difficult to use in a privacy-preserving scheme. A minutia-pair based variant of Xu et al.'s technique was introduced in [18]. It has a more compact grid and reduced computation times. Minutia pairs (and even triplets) were used in [7], [9], but with a different attacker model that allows encryption keys to exist that are not accessible to the attacker.

B. Contributions and outline
First we extend the pair-based spectral minutiae method [18] by introducing a new spectral function that captures different information from the minutia orientations. Then we use the spectral functions as the basis for a template protection system. Our HDS consists of two stages. In the first stage, we discretise the analog spectral representation using a zero-leakage HDS [5], [17]. This first HDS reduces quantisation noise, and the helper data reveals no information about the quantised data. Discretisation of the spectral functions typically yields only one bit per grid point. We concatenate the discrete data from all the individual grid points into one long bitstring. In the second stage we apply the Code Offset Method. Our code of choice is a Polar code, because Polar codes are low-complexity capacity-achieving codes with flexible rate. We present False Accept vs. False Reject tradeoffs at various stages of the data processing. We introduce the 'superfinger' enrollment method, in which we average the spectral functions from multiple enrollment images. By combining three enrollment images in this way, and constructing a Polar code specifically tuned to the individual bit error rate of each bit position, we achieve an Equal Error Rate around 1% for a high-quality fingerprint database, and around 6% for a low-quality database.
The outline of the paper is as follows. In Section II we introduce notation and briefly review helper data systems, the spectral minutiae representation, and Polar codes. In Section III we introduce the new spectral function. In Section IV we explain our experimental approach and motivate certain design choices such as the number of discretisation intervals and the use of a Gaussian approximation. We introduce two methods for averaging enrollment images. Section V contains our results, mostly in the form of ROC curves. In Section VI we discuss the results and identify topics for future work.

A. Notation and terminology
We use capitals to represent random variables, and lowercase for their realizations. Sets are denoted by calligraphic font. The set $\mathcal{S}$ is defined as $\mathcal{S} = \{0, \ldots, N-1\}$. The mutual information (see e.g. [4]) between X and Y is I(X; Y). The probability density function (pdf) of the random variable X ∈ R is written as f(x) and its cumulative distribution function (cdf) as F(x). We denote the number of minutiae found in a fingerprint by Z. The coordinates of the j'th minutia are $\boldsymbol{x}_j = (x_j, y_j)$ and its orientation is $\theta_j$. We write $\boldsymbol{x} = (\boldsymbol{x}_j)_{j=1}^{Z}$ and $\boldsymbol{\theta} = (\theta_j)_{j=1}^{Z}$. We will use the abbreviations FRR = False Reject Rate, FAR = False Accept Rate, EER = Equal Error Rate, ROC = Receiver Operating Characteristic. Bitwise xor of binary strings is denoted as ⊕.

B. Helper Data Systems
A HDS is a cryptographic primitive that allows one to reproducibly extract a secret from a noisy measurement. A HDS consists of two algorithms: Gen (generation) and Rec (reconstruction), see Fig. 1. The Gen algorithm takes a measurement X as input and generates the secret S and helper data W. The Rec algorithm has as input a noisy measurement Y and the helper data; it outputs an estimator Ŝ. If Y is sufficiently close to X then Ŝ = S. The helper data should not reveal much about S. Ideally it holds that I(W; S) = 0. This is known as Zero Leakage helper data.
on M grid points. The first-stage enrollment procedure Gen1 is applied to each $x_i$ individually, yielding short (mostly one-bit) secrets $s_i$ and zero-leakage helper data $w_i$. The $s_1, \ldots, s_M$ are concatenated into a string k. Residual noise in k is dealt with by the second-stage HDS (Code Offset Method), whose Gen2 produces a secret c and helper data r. A hash h(c||z) is computed, where z is salt. The hash and the salt are stored. In the verification phase, the noisy y is processed as shown in the bottom half of Fig. 2. The reconstructed secret ĉ is hashed with the salt z; the resulting hash is compared to the stored hash.

D. Minutia-pair spectral representation
Minutiae are special features in a fingerprint, e.g. ridge endings and bifurcations. We briefly describe the minutia-pair spectral representation introduced in [18]. For minutia indices a, b ∈ {1, …, Z} the distance and angle between these minutiae are given by $R_{ab} = |\boldsymbol{x}_a - \boldsymbol{x}_b|$ and $\tan\varphi_{ab} = \frac{y_a - y_b}{x_a - x_b}$. The spectral function $M_{x\theta}$ is defined in Eq. (1), where σ is a width parameter. The spectral function is evaluated on a discrete (q, R) grid. The variable q is integer and can be interpreted as the Fourier conjugate of an angular variable, i.e. a harmonic. The function $M_{x\theta}$ is invariant under translations of $\boldsymbol{x}$. When a rotation of the whole fingerprint image is applied over an angle δ, the spectral function transforms in a simple way, given by Eq. (2).
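The pair-based evaluation on a (q, R) grid can be sketched as follows. This is a minimal illustration and not a reproduction of Eq. (1): it assumes that each minutia pair contributes a harmonic factor exp(i q φ_ab), a Gaussian of width σ centred on R_ab, and a phase factor exp(i(θ_b − θ_a)). The function name `spectral_m_xtheta` and the use of `atan2` to resolve the quadrant of φ_ab are hypothetical choices made for this sketch.

```python
import cmath
import math

def spectral_m_xtheta(minutiae, q_values, r_values, sigma):
    """Assumed pair-based spectral function on a (q, R) grid.

    minutiae: list of (x, y, theta) tuples.
    Returns a dict mapping (q, R) to a complex value.
    """
    grid = {(q, r): 0j for q in q_values for r in r_values}
    for a in range(len(minutiae)):
        xa, ya, theta_a = minutiae[a]
        for b in range(a + 1, len(minutiae)):
            xb, yb, theta_b = minutiae[b]
            r_ab = math.hypot(xa - xb, ya - yb)      # pair distance
            phi_ab = math.atan2(ya - yb, xa - xb)    # pair angle
            for r in r_values:
                # Assumed Gaussian weighting of width sigma around R_ab.
                weight = math.exp(-((r - r_ab) ** 2) / (2 * sigma ** 2))
                for q in q_values:
                    phase = q * phi_ab + (theta_b - theta_a)
                    grid[(q, r)] += weight * cmath.exp(1j * phase)
    return grid
```

Under these assumptions the sketch depends only on coordinate differences, so it is translation invariant, and rotating all minutiae by δ multiplies the value at harmonic q by exp(i q δ).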

E. Zero Leakage Helper Data Systems
We briefly review the ZLHDS developed in [5], [17] for quantisation of an enrollment measurement X ∈ R. The density function of X is f, and the cumulative distribution function is F. The verification measurement is Y. The X and Y are considered to be noisy versions of an underlying 'true' value. They have zero mean and variance $\sigma_X^2$, $\sigma_Y^2$, respectively. The correlation between X and Y can be characterised by writing $Y = \lambda X + V$, where λ ∈ [0, 1] is the attenuation parameter and V is zero-mean noise independent of X, with variance $\sigma_V^2$. It holds that $\sigma_Y^2 = \lambda^2\sigma_X^2 + \sigma_V^2$. We consider the identical conditions case: the amount of noise is the same during enrollment and reconstruction. In this situation we have $\sigma_Y^2 = \sigma_X^2$ and hence $\lambda^2 = 1 - \sigma_V^2/\sigma_X^2$.
The quantisation boundaries are given by $\Omega_\alpha = F^{-1}\big(\sum_{j=0}^{\alpha-1} p_j\big)$, where $p_j$ is the probability that X falls in the j'th quantisation interval. The Gen algorithm produces the secret s as $s = \max\{\alpha \in \mathcal{S} : x \geq \Omega_\alpha\}$ and the helper data w ∈ [0, 1) as $w = \big[F(x) - \sum_{j=0}^{s-1} p_j\big]/p_s$. The inverse relation, for computing x as a function of s and w, is given by $\xi_{s,w} = F^{-1}\big(\sum_{j=0}^{s-1} p_j + w\,p_s\big)$. The Rec algorithm computes the estimator ŝ as the value in $\mathcal{S}$ for which it holds that $y \in (\tau_{\hat{s},w}, \tau_{\hat{s}+1,w})$, where the parameters τ are decision boundaries. In the case of Gaussian noise these boundaries are given by $\tau_{\alpha,w} = \frac{\lambda}{2}(\xi_{\alpha-1,w} + \xi_{\alpha,w})$. Here it is understood that $\xi_{-1,w} = -\infty$ and $\xi_{N,w} = \infty$, resulting in $\tau_{0,w} = -\infty$, $\tau_{N,w} = \infty$.
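The above scheme ensures that I(W; S) = 0 and that the reconstruction errors are minimized.
In Fig. 2, Gen2 computes the helper data r as r = Syn k, where Syn denotes the syndrome function of the linear code. The c in Fig. 2 is equal to k. The Rep2 computes the reconstruction $\hat{k} = k' \oplus \mathrm{SynDec}(r \oplus \mathrm{Syn}\,k')$, where k′ is the noisy string obtained in the verification phase and SynDec is the syndrome decoder.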
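The syndrome form of the Code Offset Method can be illustrated with a toy code. The sketch below is not the code used in the paper: it swaps in the (7,4) Hamming code, which corrects a single bit flip, purely to keep the example short. The names `syn`, `syn_dec`, `gen2` and `rec2` are hypothetical and mirror Syn, SynDec, Gen2 and the second-stage reconstruction.

```python
# Parity-check matrix of the (7,4) Hamming code: column c (1..7) is c in binary.
H = [[(col >> bit) & 1 for col in range(1, 8)] for bit in range(3)]

def syn(word):
    """Syndrome of a 7-bit word."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

def syn_dec(syndrome):
    """Lowest-weight error pattern that has the given syndrome."""
    position = syndrome[0] + 2 * syndrome[1] + 4 * syndrome[2]
    error = [0] * 7
    if position:
        error[position - 1] = 1
    return error

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def gen2(k):
    """Enrollment: the helper data is the syndrome of k."""
    return syn(k)

def rec2(k_noisy, r):
    """Reconstruction: estimate the error from the syndrome difference, then undo it."""
    return xor(k_noisy, syn_dec(xor(r, syn(k_noisy))))
```

Because Syn is linear, r ⊕ Syn k′ equals the syndrome of the error pattern k ⊕ k′, so reconstruction succeeds whenever that pattern is within the correction capability of the code.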

G. Polar codes
Polar codes, proposed by Arıkan [2], are a class of linear block codes that get close to the Shannon limit even at small code length. They are based on the repeated application of the polarisation operation $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ on two bits of channel input. Applying this operation creates two virtual channels, one of which is better than the original channel and one worse. For n channel inputs, repeating this procedure in the end yields m near-perfect virtual channels, with m/n close to capacity, and n − m near-useless channels. The m-bit message is sent over the good channels, while the bad ones are 'frozen', i.e. used to send a fixed string known a priori by the recipient. The most popular decoder is the Successive Cancellation Decoder (SCD), which sequentially estimates the message bits $(c_i)_{i=1}^{m}$ according to the frozen bits and the previously estimated bits $\hat{c}_1, \ldots, \hat{c}_{i-1}$. Polar codes have recently been adopted for the next-generation wireless standard (5G), especially for control channels, which have short block length (≤ 1024).
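The repeated application of the polarisation operation can be sketched as a recursive transform over GF(2). This is only the encoding transform, under the assumption that the input length is a power of two; the choice of frozen positions and the successive cancellation decoder are not shown, and the bit-reversal permutation used in some formulations is omitted. The function name `polar_transform` is hypothetical.

```python
def polar_transform(u):
    """Multiply the bit vector u by the n-fold Kronecker power of [[1, 0], [1, 1]] over GF(2).

    len(u) must be a power of two.
    """
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    first, second = u[:half], u[half:]
    # (first, second) -> (first xor second, second), then recurse on each half.
    mixed = [a ^ b for a, b in zip(first, second)]
    return polar_transform(mixed) + polar_transform(second)
```

For two bits this maps (u1, u2) to (u1 ⊕ u2, u2). Over GF(2) the transform is its own inverse, which gives a convenient sanity check.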

III. A NEW SPECTRAL FUNCTION
Consider Fig. 3 (modified from [20]). The invariant angle $\beta_a$ is defined as the angle from the orientation of minutia a to the connecting line ab, taken in the positive direction. (The $\beta_b$ is defined analogously.) Modulo 2π it holds that $\theta_a + \beta_a = \varphi_{ab}$ and $\theta_b + \beta_b = \varphi_{ab} + \pi$. The spectral function (1) uses only the invariant angle $\beta_a - \beta_b + \pi = \theta_b - \theta_a$. The second invariant angle, which can be written e.g. as $\pi - \beta_a - \beta_b = \theta_a + \theta_b - 2\varphi_{ab}$, is not used. We therefore now introduce a new spectral function, denoted as $M_{x\beta}$, which incorporates this second invariant angle. We will use $M_{x\theta}$, $M_{x\beta}$ and their fusion.
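The two invariant angles can be checked numerically. The sketch below does not reproduce the definition of $M_{x\beta}$; it only computes the phase factors of $\theta_b - \theta_a$ and $\theta_a + \theta_b - 2\varphi_{ab}$ for one minutia pair. The function names and the use of `atan2` for $\varphi_{ab}$ are assumptions made for the illustration.

```python
import cmath
import math

def invariant_phases(a, b):
    """Unit phase factors of the two invariant angles of a minutia pair.

    a, b: (x, y, theta) tuples.
    """
    xa, ya, theta_a = a
    xb, yb, theta_b = b
    phi_ab = math.atan2(ya - yb, xa - xb)
    first = cmath.exp(1j * (theta_b - theta_a))
    second = cmath.exp(1j * (theta_a + theta_b - 2 * phi_ab))
    return first, second

def rotate(minutia, delta):
    """Rotate a minutia about the origin by delta; its orientation turns with it."""
    x, y, theta = minutia
    return (x * math.cos(delta) - y * math.sin(delta),
            x * math.sin(delta) + y * math.cos(delta),
            theta + delta)
```

A rotation adds δ to $\theta_a$, $\theta_b$ and $\varphi_{ab}$ alike, so both phase factors are unchanged; a translation leaves all three angles untouched.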

A. Databases
We use the MCYT, FVC2000, and FVC2002 databases. The MCYT database [15] contains good-quality images from 100 individuals: 10 fingers per individual and 12 images per finger. FVC2000 and FVC2002 contain low-quality images (only index and middle fingers [12]). Each FVC database contains 100 fingers, 8 images per finger. In FVC2002, images number 3, 4, 5, and 6 have an exceptionally large angular displacement, so they are omitted from the experiments. We extract the minutia position and orientation $(x_j, y_j, \theta_j)$ by using VeriFinger software [1]. For MCYT we evaluate the spectral functions on the same grid as [18], namely R ∈ {16, 22, 28, …, 130} and q ∈ {1, 2, …, 16}, and we maintain σ = 2.3 pixels. For the FVC databases we use the same grid, and σ = 3.2 pixels turns out to be a good choice. The average number of minutiae that can be reliably found is Z = 35.
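The grid dimensions follow directly from the stated ranges. The short sketch below enumerates the grid and counts how many real-valued components it yields, assuming one real and one imaginary component per grid point; the variable names are hypothetical.

```python
# Grid as stated in the text: R in {16, 22, ..., 130}, q in {1, ..., 16}.
R_VALUES = list(range(16, 131, 6))
Q_VALUES = list(range(1, 17))

n_grid_points = len(R_VALUES) * len(Q_VALUES)

# One real and one imaginary component per grid point.
components_per_function = 2 * n_grid_points

# Two spectral functions when fusion is applied.
components_with_fusion = 2 * components_per_function
```

This gives 20 radii and 16 harmonics, i.e. 320 grid points, 640 real components per spectral function and 1280 with fusion, which matches the bit-string lengths quoted for one bit per component.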

B. No image rotation
As mentioned in [18], during the reconstruction procedure one can try different rotations of the verification image, but it results only in a minor improvement of the EER. For this reason we do not apply image rotation.

C. Quantization methods
Before quantization all spectral functions are normalized to zero mean and unit variance, where the variance is taken of the real and imaginary part together. We quantize the real and imaginary part of the spectral functions separately. We study two methods: 'hard thresholding' (without helper data) and the Zero Leakage quantisation of Section II-E. The hard thresholding gives a bit value '1' if Re M > 0 and '0' otherwise, and likewise for Im M. We will show results for this method mainly to demonstrate the advantages of Zero Leakage quantisation.
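Normalisation followed by hard thresholding might look as follows. The text does not spell out the normalisation in detail, so this sketch adopts one possible reading as an assumption: the real and imaginary parts of all grid points are pooled into a single sample, and that pooled sample is shifted and scaled to zero mean and unit variance. The interleaved bit order (real part first, then imaginary part) is also an assumption.

```python
import statistics

def normalise(spectral_values):
    """Scale complex values so the pooled real and imaginary parts have mean 0, variance 1."""
    parts = [z.real for z in spectral_values] + [z.imag for z in spectral_values]
    mean = statistics.fmean(parts)
    std = statistics.pstdev(parts)
    return [complex((z.real - mean) / std, (z.imag - mean) / std)
            for z in spectral_values]

def hard_threshold(spectral_values):
    """One bit per real part and one per imaginary part: 1 if positive, else 0."""
    bits = []
    for z in spectral_values:
        bits.append(1 if z.real > 0 else 0)
        bits.append(1 if z.imag > 0 else 0)
    return bits
```

Hard thresholding needs no helper data, which is why it serves as the baseline against which the zero-leakage quantiser is compared.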

D. Gaussian probability distributions
When using the ZLHDS formulas we will assume that the spectral functions are Gaussian-distributed. Figs. 4 and 5 illustrate that this assumption is not far away from the truth. (Note that we often see correlations between the real and imaginary part. This has no influence on the ZLHDS.)

E. Zero leakage quantization

1) Signal to noise ratio; setting N
In the ZLHDS of Section II-E, the optimal choice of the parameter N (number of quantization intervals) depends on the signal to noise ratio. Fig. 6 shows a comparison between N = 2 and N = 3. At low noise it is obvious that N = 3 extracts more information from the source than N = 2. At $\sigma_V/\sigma_X$ larger than approximately 0.3, there is a regime where N = 3 can extract more in theory, but is hindered in practice by the high bit error rate. At $\sigma_V/\sigma_X > 0.55$ the N = 2 'wins' in all respects.
Fig. 6: Lines without markers: Mutual information between the enrolled key S and the reconstructed key Ŝ given helper data W, as a function of $\sigma_V/\sigma_X$. Markers: bit error rate as a function of $\sigma_V/\sigma_X$. The curves follow equations (22) and (26) from [18].
For our data set, we define a $\sigma_X^2(q, R)$ for every grid point (q, R) as the variance of M(q, R) over all images in the database. The noise $\sigma_V^2(q, R)$ is the variance over all available images of the same finger, averaged over all fingers. Figs. 7 and 8 show the noise-to-signal ratio. Note the large amount of noise; even the best grid points have $\sigma_V/\sigma_X > 0.45$. Fig. 6 tells us that setting N = 2 is the best option, and this is the choice we make. At N = 2 we extract two bits per grid point from each spectral function (one from Re M, one from Im M). Hence our bit string k (see Fig. 2) derived from $M_{x\theta}$ has length 640. When we apply fusion of $M_{x\theta}$ and $M_{x\beta}$ this becomes 1280. For N = 2 the formulas in Section II-E simplify to $A_0 = (-\infty, 0)$, $A_1 = [0, \infty)$, $p_0 = p_1 = \frac{1}{2}$, $\xi_{0,w} = F^{-1}(\frac{w}{2})$, $\xi_{1,w} = F^{-1}(\frac{1+w}{2})$, $\tau_{1,w} = \frac{\lambda}{2}(\xi_{0,w} + \xi_{1,w})$. Since we work with Gaussian distributions, F is the Gaussian cdf ('probability function').
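For N = 2 and a zero-mean Gaussian F, the first-stage Gen and Rec can be sketched as below. The sketch assumes equiprobable intervals ($p_0 = p_1 = 1/2$), so that $w = 2F(x) - s$, and takes the attenuation λ as an input parameter rather than estimating it. The function names `zl_gen` and `zl_rec` and the small clamping constant are hypothetical.

```python
from statistics import NormalDist

def zl_gen(x, sigma_x=1.0):
    """Enrollment: secret bit s and helper data w in [0, 1) for a zero-mean Gaussian X."""
    cdf = NormalDist(0.0, sigma_x).cdf
    s = 1 if x >= 0 else 0
    w = 2.0 * cdf(x) - s          # (F(x) - s/2) / (1/2)
    return s, w

def zl_rec(y, w, lam, sigma_x=1.0):
    """Reconstruction: compare y with the decision boundary tau_{1,w}."""
    inv_cdf = NormalDist(0.0, sigma_x).inv_cdf
    eps = 1e-12                   # keeps inv_cdf away from 0 and 1

    def clamp(p):
        return min(max(p, eps), 1.0 - eps)

    xi0 = inv_cdf(clamp(w / 2.0))           # enrolled value if s were 0
    xi1 = inv_cdf(clamp((1.0 + w) / 2.0))   # enrolled value if s were 1
    tau1 = lam * (xi0 + xi1) / 2.0
    return 1 if y >= tau1 else 0
```

The helper data w is the position of x inside its interval, measured in probability mass. For Gaussian X it is uniform on [0, 1) whichever interval x lies in, which is the zero-leakage property for a single component.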

2) Enrollment and reconstruction
We have experimented with three different enrollment methods:
E1: A single image is used.
E2: We take the first t images of a finger and calculate the average spectral function. We call this the 'superfinger' method. In the ZLHDS calculations the signal-to-noise ratio of the average spectral function is used.
E3: For each of t images we calculate an enrollment string k. We apply bitwise majority voting on these strings. (This requires odd t.) The reconstruction boundaries are calculated based on the superfinger method, i.e. as in E2.
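The two multi-image enrollment methods can be sketched as follows, assuming each spectral function is stored as a flat list of complex grid values and each enrollment string as a list of bits. The function names are hypothetical.

```python
def superfinger(spectral_functions):
    """E2: average the spectral functions of t images, grid point by grid point."""
    t = len(spectral_functions)
    return [sum(values) / t for values in zip(*spectral_functions)]

def majority_vote(bitstrings):
    """E3: bitwise majority over an odd number t of enrollment strings."""
    t = len(bitstrings)
    if t % 2 == 0:
        raise ValueError("majority voting needs an odd number of strings")
    return [1 if sum(column) > t // 2 else 0 for column in zip(*bitstrings)]
```

Averaging works for any t, whereas majority voting is only unambiguous for odd t.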

Reconstruction:
We study fingerprint authentication with genuine pairs and impostor pairs. For pedagogical reasons we will present results at each stage of the signal processing: (1) spectral function domain, before quantisation; (2) binarized domain, without HDS; (3) with ZLHDS; (4) with ZLHDS and discarding the highest-noise grid points. In the spectral function domain the fingerprint matching is done via a correlation score [18]. In the binarized domain we look at the Hamming distance between the enrolled k and the reconstructed $\hat{k}$. For all cases we will show ROC curves in order to visualise the FAR-FRR tradeoff as a function of the decision threshold.
Let the number of images per finger be denoted as M, and the number of fingers in a database as L. For the MCYT database, which is larger, we take only one random image per impostor finger, resulting in $O(ML^2)$ data points.
E2+E3: For genuine pairs we compare the superfinger to the remaining M − t images. Thus we have (M − t)L data points. Impostor pairs are generated as for E1.
Note: The VeriFinger software was not able to extract information for every image.
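In the binarized domain, the FAR-FRR tradeoff can be computed from lists of Hamming distances for genuine and impostor pairs. The sketch below assumes that a comparison is accepted when the distance does not exceed the threshold, and it approximates the EER by the threshold at which FAR and FRR are closest. The function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def far_frr(genuine, impostor, threshold):
    """FAR and FRR when a pair is accepted if its distance is <= threshold."""
    frr = sum(d > threshold for d in genuine) / len(genuine)
    far = sum(d <= threshold for d in impostor) / len(impostor)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Approximate EER: mean of FAR and FRR at the threshold where they are closest."""
    best_gap, best_rate = None, None
    for threshold in range(max(genuine + impostor) + 1):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate
```

Sweeping the threshold and plotting the resulting (FAR, FRR) pairs gives the ROC curve.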

A. FAR/FRR rates before error correction
For each of the data processing steps/options before application of the Code Offset Method, we investigate the False Accept rates and False Reject rates. We identify a number of trends.
• Figs. 9 and 10 show ROC curves. All the non-analog curves were made under the implicit assumption that for each decision threshold (number of bit flips) an error-correcting code can be constructed that enforces that threshold, i.e. decoding succeeds only if the number of bit flips is below the threshold. Unsurprisingly, we see in the figures that quantisation causes a performance penalty. Furthermore the penalty is clearly less severe when the ZLHDS is used. Finally, it is advantageous to discard some grid points that have bad signal-to-noise ratio. For the curves labeled 'ZLHDS+reliable components' only the least noisy 512 bits of k were kept (1024 in the case of fusion). Our choice for the number 512 is not entirely arbitrary: it fits error-correcting codes. Note in Fig. 10 that ZLHDS with reliable component selection performs better than analog spectral functions without reliable component selection. (But not better than analog with selection.)
• The E2 and E3 enrollment methods perform better than E1. Furthermore, performance increases with t. A typical example is shown in Fig. 11.
• The spectral functions $M_{x\theta}$ and $M_{x\beta}$ individually have roughly the same performance. Fusion yields a noticeable improvement. An example is shown in Fig. 12. (We implemented fusion in the analog domain as addition of the two similarity scores.)
• Tables I to V show Equal Error Rates and Bit Error Rates. We see that enrollment methods E2 and E3 have similar performance, with E2 yielding a somewhat lower genuine-pair BER than E3.
• In Table I it may look strange that the EER in the rightmost column is sometimes lower than in the 'analog' column. We think this happens because there is no reliable component selection in the 'analog' procedure.
• Ideally the impostor BER is 50%. In the tables we see that the impostor BER can get lower than 50% when the ZLHDS is used and the enrollment method is E2. On the other hand, it is always around 50% in the 'No HDS' case. This seems to contradict the Zero Leakage property of the helper data system. The ZLHDS is supposed not to leak, i.e. the helper data should not help impostors. However, the zero-leakage property is guaranteed to hold only if the variables are independent. In real-life data there are correlations between grid points and correlations between the real and imaginary part of a spectral function.
Fig. 9: Performance result for several processing methods. FVC2000. Enrollment method E2 with t = 3.
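Reliable component selection, as used for the 'ZLHDS+reliable components' curves, can be sketched in two steps: estimate a noise-to-signal ratio per component, then keep the least noisy positions. The sketch assumes population variances and a simple nested-list data layout (one list of values per finger); both are illustrative choices, as are the function names.

```python
import statistics

def noise_to_signal(samples_per_finger):
    """sigma_V / sigma_X for one real-valued component.

    samples_per_finger: one list of values per finger, one value per image.
    """
    all_values = [v for finger in samples_per_finger for v in finger]
    var_x = statistics.pvariance(all_values)                 # over all images
    var_v = statistics.fmean([statistics.pvariance(finger)   # within a finger,
                              for finger in samples_per_finger])  # averaged over fingers
    return (var_v / var_x) ** 0.5

def select_reliable(ratios, n_keep):
    """Indices of the n_keep components with the lowest noise-to-signal ratio."""
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    return sorted(order[:n_keep])
```

The same index set has to be applied at enrollment and at verification, so it should be derived from database-wide statistics rather than from an individual's data.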

B. Error correction: Polar codes
The error rates in the genuine reconstructed $\hat{k}$ are rather high, at least 0.21. In order to apply the Code Offset Method with a decent message size it is necessary to use a code that has a high rate even at small codeword length. Consider the case of fusion of $M_{x\theta}$ and $M_{x\beta}$. The codeword length is 1280 bits (1024 if reliable component selection is performed). Suppose we need to distinguish between $2^{20}$ users. Then the message length needs to be at least 20 bits, in spite of the high bit error rate. Furthermore, the security of the template protection is determined by the entropy of the data that is input into the hash function (see Fig. 2); it would be preferable to have at least 64 bits of entropy.
We constructed a number of Polar codes tuned to the signal-to-noise ratios of the individual grid points. The codes are designed to find a set of reliable channels, which are then assigned to the information bits. Each code yields a certain FAR (impostor string accidentally decoding correctly) and FRR (genuine reconstruction string failing to decode correctly), and hence can be represented as a point in an ROC plot. This is shown in Fig. 13. For the MCYT database we have constructed a Polar code with message length 25 at an EER around 1.2% (compared to 0.7% before error correction). For the FVC2000 database we have constructed a Polar code with message length 15 at an EER around 6% (compared to 3.3% EER before error correction). Note that the error correction is an indispensable part of the privacy protection and inevitably leads to a performance penalty. However, we see that the penalty is not that bad, especially for high-quality fingerprints.
From our results we also see that even under the best circumstances (high-quality MCYT database) the entropy of the extracted string is severely limited (≤ 25 bits). In order to achieve a reasonable security level of the hash, at least two fingers need to be combined. We do not see this as a drawback of our helper data system; given that the EER for one finger is around 1%, which is impractical in real-life applications, it is necessary anyhow to combine multiple fingers.
We experimented with random codebooks to see if we could extract more entropy from the data than with Polar codes. At low code rates, a code based on random codewords can be practical to implement. Let the message size be ℓ, and the codeword size m. A random table needs to be stored of size $2^\ell \cdot m$ bits, and the process of decoding consists of computing $2^\ell$ Hamming distances. We split the 1024 reliable bits into 4 groups of m = 256 bits, for which we generated random codebooks, for various values of ℓ. The total message size is k = 4ℓ and the total codeword size is n = 4m. The results are shown in Fig. 13. In short: random codebooks give hardly any improvement over Polar codes.
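A random codebook with minimum-distance decoding can be sketched for one group of m bits. The text does not say how the codebook is combined with the biometric string; since a random codebook is not linear, the sketch assumes the offset form of the Code Offset Method (helper data = k ⊕ codeword) instead of the syndrome form. All names, the seed and the parameter values used below are illustrative.

```python
import random

def make_codebook(ell, m, seed=0):
    """Table of 2**ell random codewords of m bits each (2**ell * m bits of storage)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(m)] for _ in range(2 ** ell)]

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def decode(codebook, word):
    """Index of the nearest codeword; computes 2**ell Hamming distances."""
    return min(range(len(codebook)),
               key=lambda i: hamming_distance(codebook[i], word))

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def enroll(codebook, message, k):
    """Assumed offset form: the helper data hides the codeword with the enrolled bits k."""
    return xor(k, codebook[message])

def reconstruct(codebook, helper, k_noisy):
    """Shift the noisy bits by the helper data and decode to the nearest codeword."""
    return decode(codebook, xor(k_noisy, helper))
```

With ℓ = 4 and m = 256 the table holds 16 codewords, so decoding is a 16-way nearest-neighbour search per group, and four such groups cover the 1024 reliable bits.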

VI. SUMMARY AND DISCUSSION
A Helper Data System protects privacy but causes a fingerprint recognition degradation in the form of increased EER. We have built a HDS from a spectral function representation of fingerprint data, combined with a Zero Leakage quantisation scheme. It turns out that our HDS causes only a very small EER penalty when the fingerprint quality is high. The best results were obtained with the 'superfinger' enrollment method (E2, taking the average over multiple enrollment images in the spectral function domain), and with fusion of the $M_{x\theta}$ and $M_{x\beta}$ functions. The superfinger method performs slightly better than the E3 method and also has the advantage that it is not restricted to an odd number of enrollment captures.
For the high-quality MCYT database, our HDS achieves an EER around 1% and extracts a 1024-bit string with ≤ 25 bits of entropy. In practice multiple fingers need to be used in order to obtain an acceptable EER. This automatically increases the entropy of the hashed data. The entropy can be further increased by employing tricks like the Spammed Code Offset Method [19].
As topics for future work we mention (i) testing the HDS on more databases; (ii) further optimisation of parameter choices such as the number of reliable components, and the number of minutiae used in the computation of the spectral functions; (iii) further tweaking of the Polar codes.