An efficient privacy-preserving comparison protocol in smart metering systems

In smart grids, providing power consumption statistics to the customers and generating recommendations for managing electrical devices are considered to be effective methods that can help to reduce energy consumption. Unfortunately, providing power consumption statistics and generating recommendations rely on highly privacy-sensitive smart meter consumption data. From the past experience, we see that it is essential to find scientific solutions that enable the utility providers to provide such services for their customers without damaging customers’ privacy. One effective approach relies on cryptography, where sensitive data is only given in the encrypted form to the utility provider and is processed under encryption without leaking content. The proposed solutions using this approach are very effective for privacy protection but very expensive in terms of computation and communication. In this paper, we focus on an essential operation for designing a privacy-preserving recommender system for smart grids, namely comparison, that takes two encrypted values and outputs which one is greater than the other one. We improve the state-of-the-art comparison protocol based on Homomorphic Encryption in terms of computation and communication by 56 and 25 %, respectively, by introducing algorithmic changes and data packing. As the smart meters are very limited devices, the overall improvement achieved is promising for the future deployment of such cryptographic protocols for enabling privacy enhanced services in smart grids.


Introduction
Smart grids, as the next generation of power grid, utilize both communication technologies and information processing to monitor and manage power grids and to enhance the reliability, efficiency, and sustainability of power generation. One of the advantages of smart grids compared to traditional power grids is the ability to observe the power consumption of households in very short time intervals, on the order of seconds to minutes. As a result of this fine-grained data reporting, it is possible to provide power consumption statistics to the consumers, which might help to reduce the overall consumption by changing customer behavior, as pointed out in several works [1][2][3][4][5]. For example, Honebein et al. [6] defined people as the only true smart part of a smart grid; therefore, monitoring, understanding, and promoting the end-users' roles from passive to active is considered a fundamental action in smart grids. To this end, several utility companies already provide their customers with devices and smartphone applications to monitor their real-time consumption. Furthermore, one of the goals of the utility providers, balancing supply and demand, also known as demand response (DR), can be achieved more effectively if the utility provider can also provide statistics about the power usage in the surrounding area and generate personalized recommendations, for example, to manage electrical devices like electric cars, heating systems, and ovens in the household [7].
Providing statistics on power consumption and generating personalized recommendations to inform customers depend heavily on the smart meter consumption readings. Unfortunately, these readings are highly privacy-sensitive [8][9][10]. The utility provider can use the readings from the smart meters for other purposes, misuse them, or even transfer them to other entities without the consent of the customers. As seen in many cases, privacy is considered to be a big challenge for using smart meters to the fullest extent, e.g., for enabling personalized services such as generating recommendations.
In this paper, we assume that the utility provider generates statistics and recommendations for the customers so that the customers can adjust their electrical devices in the most cost-effective and environmentally friendly manner. To achieve this, we rely on cryptography, which provides us with tools to create privacy-by-design algorithms. For instance, there are already a number of studies on computing bills and aggregating data [11][12][13][14]. The main idea in this research line is to provide only the encrypted power consumption to the utility provider and to enable processing of the encrypted data without decrypting any sensitive information. This way, the utility provider cannot access the content but can still run the algorithms required for the service. Unfortunately, the cryptographic algorithms for this purpose are expensive in terms of computation and communication, and they mostly require smart meters to be involved in the computation [15][16][17][18]. Since smart meters are very limited devices, improving the efficiency of the cryptographic algorithms is a challenge.
We address the efficiency problem of a fundamental operation, namely comparison, which is required to design any recommender system. In our setting, the encrypted consumption readings are collected from the customers by an aggregator, and the utility provider has the decryption key. For privacy reasons, the aggregator cannot transfer the data directly to the utility provider but can cooperate with the utility provider to generate recommendations. One important step in the system is to compare values that are only available in encrypted form. More precisely, the aggregator has two encrypted values, and it needs to know which one is greater than the other without revealing their contents to anyone, including itself.
There are numerous comparison protocols designed for comparing encrypted values [15,16,18]. In this paper, we improve the run-time of the state-of-the-art comparison protocol that relies on homomorphic encryption by 56 % by introducing algorithmic changes. Furthermore, we reduce the communication cost of the protocol by 25 % by deploying data packing [19,20]. Together, these improvements increase the overall efficiency of the comparison protocol with encrypted inputs, bringing smart meters one step closer to running privacy-preserving cryptographic protocols based on homomorphic encryption.
Note that a secure comparison protocol with encrypted values is needed in many applications besides generating recommendations, such as face recognition [17], fingercode authentication [21], and K-means clustering [22].
Therefore, the protocol we improve in this paper provides a significant performance improvement for other applications as well.

Preliminaries
In this section, we describe the application setting, the security assumptions, and the cryptographic tools used in this work. We also present the symbols and their descriptions in Table 1.

Application setting
In our application setting, we define three roles: (1) smart meters installed at the households, (2) a data aggregator, and (3) a utility provider. Smart meters measure, encrypt, and send consumers' power consumption to the data aggregator, which collects and analyzes the encrypted power consumption. Then, the utility provider generates recommendations for its customers by running a cryptographic protocol with the data aggregator. The output of the cryptographic protocol, which depends on the purpose of the recommender system, is in encrypted form; thus, it is available neither to the data aggregator nor to the utility provider. The output is then revealed to the customer by using another protocol, secure decryption, which is explained in [23].

Security model
The proposed protocol in this work is built on the semi-honest adversarial model, where the data aggregator and the utility provider are honest in the sense that they faithfully follow the designed protocol but will try to infer information from the protocol execution transcript. This assumption is realistic since companies are expected to properly perform the services specified in the service level agreement when engaging in a collaboration. We assume that the utility provider is the only party holding the private keys, while the smart meters and the data aggregator have the public keys for the encryption schemes. We also assume that the parties do not collude.

Homomorphic encryption
In this work, we rely on two additively homomorphic cryptosystems, Paillier [24] and Damgård, Geisler and Krøigaard (DGK) [15]. An additively homomorphic encryption scheme preserves certain structure that can be exploited to process ciphertexts without decryption. Given $E_{pk}(m_1)$ and $E_{pk}(m_2)$, a new ciphertext whose decryption yields the sum of the plaintext messages $m_1$ and $m_2$ can be obtained by performing a certain operation over the ciphertexts:
\[ D_{sk}\bigl(E_{pk}(m_1) \otimes E_{pk}(m_2)\bigr) = m_1 + m_2. \]
Consequently, exponentiation of any ciphertext with a public value $e$ yields the encrypted product of the original plaintext and the exponent:
\[ D_{sk}\bigl(E_{pk}(m)^{e}\bigr) = e \cdot m. \]

Paillier cryptosystem
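The Paillier encryption function for a given message $m \in \mathbb{Z}_{\eta}$ is defined as follows:
\[ c = E_{pk}(m) = g^{m} \cdot \tau^{\eta} \bmod \eta^{2}, \]
where $\eta$ is the product of two distinct large prime numbers $p$ and $q$, the ciphertext $c \in \mathbb{Z}^{*}_{\eta^{2}}$, $\tau \in_R \mathbb{Z}^{*}_{\eta}$, and $g$ is a generator of order $\eta$. The decryption function is
\[ D_{sk}(c) = \frac{L(c^{\lambda_\eta} \bmod \eta^{2})}{L(g^{\lambda_\eta} \bmod \eta^{2})} \bmod \eta, \qquad L(x) = \frac{x - 1}{\eta}, \]
where $\lambda_\eta$ is the Carmichael value, that is, the smallest positive integer such that $a^{\lambda_\eta} \equiv 1 \pmod{\eta}$ for all $a \in \mathbb{Z}^{*}_{\eta}$. The public key is $(g, \eta)$ and the private key is $\lambda_\eta$.
The homomorphic property can be shown as below:
\[ E_{pk}(m_1) \cdot E_{pk}(m_2) = (g^{m_1} \tau_1^{\eta}) \cdot (g^{m_2} \tau_2^{\eta}) = g^{m_1 + m_2} (\tau_1 \tau_2)^{\eta} = E_{pk}(m_1 + m_2) \bmod \eta^{2}. \tag{3} \]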
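The additive property can be checked numerically with a toy implementation. The following is a minimal sketch, not the implementation used in this work: the tiny primes, the choice $g = \eta + 1$, and the helper names `keygen`, `encrypt`, `decrypt`, `add`, and `scalar_mul` are illustrative assumptions, and the parameters are far too small to be secure.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy key generation with tiny primes (illustration only, not secure)."""
    eta = p * q
    lam = math.lcm(p - 1, q - 1)   # Carmichael value of eta
    g = eta + 1                    # assumed choice: eta + 1 has order eta modulo eta^2
    return (g, eta), lam

def encrypt(pk, m):
    """c = g^m * tau^eta mod eta^2 with a fresh random tau coprime to eta."""
    g, eta = pk
    eta2 = eta * eta
    while True:
        tau = secrets.randbelow(eta - 1) + 1
        if math.gcd(tau, eta) == 1:
            break
    return pow(g, m, eta2) * pow(tau, eta, eta2) % eta2

def decrypt(pk, lam, c):
    """m = L(c^lam mod eta^2) / L(g^lam mod eta^2) mod eta, with L(x) = (x - 1) / eta."""
    g, eta = pk
    eta2 = eta * eta
    L = lambda x: (x - 1) // eta
    return L(pow(c, lam, eta2)) * pow(L(pow(g, lam, eta2)), -1, eta) % eta

def add(pk, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (pk[1] ** 2)

def scalar_mul(pk, c, e):
    """Raising a ciphertext to a public exponent e multiplies the plaintext by e."""
    return pow(c, e, pk[1] ** 2)

if __name__ == "__main__":
    pk, sk = keygen()
    c1, c2 = encrypt(pk, 1200), encrypt(pk, 345)
    print(decrypt(pk, sk, add(pk, c1, c2)))                      # sum of the two readings
    print(decrypt(pk, sk, add(pk, c1, scalar_mul(pk, c2, -1))))  # their difference
```

The exponent $-1$ in the last line corresponds to the ciphertext inversion used for subtraction under encryption; it relies on Python 3.8+ support for negative exponents in three-argument `pow`.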

DGK cryptosystem
We also use the DGK cryptosystem [15,25], which is used in constructing cryptographic protocols [17,23] for its efficiency due to its small message size. For generating the public and the private keys, there are three parameters: $k$, $t$, and $\ell$, where $\ell < t < k$. The process of key generation is as follows:
1. Choose two distinct $t$-bit prime numbers $v_p$, $v_q$.
2. Construct two distinct prime numbers $p$ and $q$, where $v_p \mid (p - 1)$ and $v_q \mid (q - 1)$, such that $n = pq$ is a $k$-bit RSA modulus.
3. Choose $u$, the smallest prime number greater than $\ell + 2$.
4. Choose a random $r$ that is a $2.5t$-bit integer [15].
5. Choose $g$ and $h$ such that $\mathrm{ord}(g) = u v_p v_q$ and $\mathrm{ord}(h) = v_p v_q$.
The public and the private keys are $pk = (n, g, h, u)$ and $sk = (p, q, v_p, v_q)$, respectively. The encryption of a plaintext $m \in \mathbb{Z}_u$ is given as follows:
\[ E_{pk}(m) = g^{m} h^{r} \bmod n. \]
To decrypt the ciphertext $c$, one can build a look-up table for all $m \in \mathbb{Z}_u$ values and obtain $m$ from $c^{v_p} \bmod p = (g^{v_p})^{m} \bmod p$. Moreover, the DGK scheme can efficiently check whether a ciphertext is an encryption of zero or not. To achieve this, we check whether $c^{v_p v_q} \bmod n = 1$, or, more efficiently, whether $c^{v_p v_q} \bmod p = 1$ or $c^{v_p v_q} \bmod q = 1$, since $u < p$.
In the rest of the paper, we denote the ciphertext of a message $m$ by $[m]$ for the Paillier cryptosystem and by $[\![m]\!]$ for the DGK.
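The key generation steps, the look-up-table decryption, and the zero check can be illustrated with a toy instance. This is a sketch under stated assumptions: the parameters $u = 11$, $v_p = 13$, $v_q = 17$, $p = 859$, $q = 1123$ are hand-picked tiny values (so that $u v_p \mid p - 1$ and $u v_q \mid q - 1$), the randomness $r$ is drawn below $v_p v_q$ instead of being a $2.5t$-bit integer, and the helper names are hypothetical.

```python
import secrets

def element_of_order(d, prime_factors, p):
    """Return an element of exact multiplicative order d modulo the prime p (d divides p - 1)."""
    for x in range(2, p):
        y = pow(x, (p - 1) // d, p)            # the order of y divides d
        if all(pow(y, d // f, p) != 1 for f in prime_factors):
            return y                           # no proper divisor of d works, so the order is d
    raise ValueError("no element of the requested order found")

def crt(a_p, a_q, p, q):
    """Combine residues modulo p and modulo q into one residue modulo p * q."""
    return (a_p + p * ((a_q - a_p) * pow(p, -1, q) % q)) % (p * q)

# Toy parameters (assumed for illustration; real keys are hundreds of digits long).
U, VP, VQ = 11, 13, 17
P, Q = 859, 1123                               # P - 1 = 2*3*11*13, Q - 1 = 2*3*11*17
N = P * Q
G = crt(element_of_order(U * VP, [U, VP], P),
        element_of_order(U * VQ, [U, VQ], Q), P, Q)   # ord(G) = U * VP * VQ
H = crt(element_of_order(VP, [VP], P),
        element_of_order(VQ, [VQ], Q), P, Q)          # ord(H) = VP * VQ

def encrypt(m):
    """c = g^m * h^r mod n for a plaintext m in Z_u."""
    r = secrets.randbelow(VP * VQ)
    return pow(G, m, N) * pow(H, r, N) % N

# Look-up table mapping (g^{v_p})^m mod p back to m.
TABLE = {pow(G, VP * m, P): m for m in range(U)}

def decrypt(c):
    """Recover m from c^{v_p} mod p using the look-up table."""
    return TABLE[pow(c, VP, P)]

def is_zero(c):
    """Zero check: c^{v_p v_q} mod p equals 1 exactly when the plaintext is 0."""
    return pow(c, VP * VQ, P) == 1

if __name__ == "__main__":
    print(decrypt(encrypt(3) * encrypt(5) % N))   # homomorphic sum of 3 and 5
    print(is_zero(encrypt(0)), is_zero(encrypt(4)))
```

The zero check needs only one modular exponentiation and no table, which is why the comparison sub-protocol is phrased in terms of detecting an encrypted zero.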

Secure comparison protocol with secret inputs
In this section, we describe the state-of-the-art secure comparison protocol (SCP), which takes two encrypted inputs and outputs, in encrypted form, which one is greater. The SCP based on the DGK construction introduced in [25] is one of the most widely used comparison protocols due to its efficiency. The DGK comparison protocol is a sub-protocol in the SCP, where each party possesses a secret but plaintext value. The sub-protocol also uses the DGK cryptosystem for efficiency reasons.
The comparison protocol in [25] is modified and used by Erkin et al. in [17], and Veugen proposed an improved DGK comparison protocol (IDCP) in [18]. In the following, we describe the SCP construction.
For the sake of simplicity, we use the names Alice and Bob for the data aggregator and the utility provider, respectively. We assume that Bob has the secret key $sk$ and Alice has access to two encrypted values, $[a]$ and $[b]$, and wants to know if $a < b$.
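Initially, Alice computes
\[ [z] = [2^{\ell} + a - b] = [2^{\ell}] \cdot [a] \cdot [b]^{-1}, \]
where $a$ and $b$ are $\ell$-bit values, and then obtains the result of the comparison as follows:
\[ [z_{\ell}] = \bigl[2^{-\ell} \cdot (z - (z \bmod 2^{\ell}))\bigr] = \bigl([z] \cdot [z \bmod 2^{\ell}]^{-1}\bigr)^{2^{-\ell}}, \tag{5} \]
where $2^{-\ell}$ is the multiplicative inverse of $2^{\ell}$ in the plaintext space and $[z_{\ell}]$ is the encryption of the most significant bit of $z$, which is the result of the comparison. If $z_{\ell} = 1$, then we have $a \ge b$, and otherwise $a < b$. A more efficient method of computing $[z_{\ell}]$ is based on the IDCP, where we can compute $z_{\ell} = \lfloor z / 2^{\ell} \rfloor$ and $[a < b] = [1 - z_{\ell}] = [1] \cdot [z_{\ell}]^{-1}$, but we still need to compute $z \bmod 2^{\ell}$. A more detailed explanation regarding the computation of $[z_{\ell}]$ is provided in the following sections.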

Computing z mod 2
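Notice that Alice has access only to $[z]$, and interaction with Bob, who has the private key, is needed to compute the modulo reduction, $[z \bmod 2^{\ell}]$. However, Alice cannot give $[z]$ directly to Bob since this value reveals information on the difference of $a$ and $b$. Therefore, Alice masks $[z]$ using a random value as follows:
\[ [d] = [z + r] = [z] \cdot [r], \tag{6} \]
where $r$ is a $(\kappa + \ell)$-bit uniformly random number and $\kappa$ is a security parameter. After masking, Alice sends $[d]$ to Bob to perform the modulo reduction, where Bob first decrypts $[d]$, then computes $\hat{d} = d \bmod 2^{\ell}$ and sends $[\hat{d}]$ and $[\lfloor d / 2^{\ell} \rfloor]$ back to Alice. Subsequently, to obtain $[z \bmod 2^{\ell}]$, Alice computes $[\tilde{z} \bmod 2^{\ell}] = [\hat{d} - (r \bmod 2^{\ell})] = [\hat{d}] \cdot [r \bmod 2^{\ell}]^{-1}$. Note that $\tilde{z} \bmod 2^{\ell} = z \bmod 2^{\ell}$ if $\hat{d} > r \bmod 2^{\ell}$. When $\hat{d} < r \bmod 2^{\ell}$, an underflow occurs, and Alice has to add $2^{\ell}$ to the result to make the value positive again. Therefore, Alice needs to determine whether $\hat{d} > r \bmod 2^{\ell}$ or not. This is achieved by computing an encrypted value, $[\lambda]$, which shows the relation between $\hat{d}$ and $r \bmod 2^{\ell}$. Then, Alice can perform the following computation to obtain $[z \bmod 2^{\ell}]$:
\[ [z \bmod 2^{\ell}] = [\tilde{z} \bmod 2^{\ell} + \lambda 2^{\ell}] = [\tilde{z} \bmod 2^{\ell}] \cdot [\lambda]^{2^{\ell}}. \tag{7} \]
Alice can obtain $[z_{\ell}]$ by using Eq. 5.
$[z_{\ell}]$ can be computed more efficiently as follows:
\[ [z_{\ell}] = [\delta(d) - \delta(r) - \lambda] = [\delta(d)] \cdot [\delta(r)]^{-1} \cdot [\lambda]^{-1}, \tag{8} \]
where $\delta(x) = \lfloor x / 2^{\ell} \rfloor$. For computing $[\lambda]$, we run a secure comparison protocol with private inputs as described in the following section.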
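The arithmetic behind the masked modulo reduction can be checked on plaintext values before any encryption is involved. The sketch below is an illustration only: the function name `compare_via_masking`, the default sizes `ell=16` and `kappa=40`, and the convention that the underflow bit equals 1 exactly when Bob's reduced value is smaller than Alice's reduced mask are assumptions made for this example. In the protocol itself every quantity marked as secret would be handled under encryption.

```python
import secrets

def compare_via_masking(a, b, ell=16, kappa=40):
    """Plaintext walk-through of the masked comparison; returns 1 if a < b, else 0.

    Assumes 0 <= a, b < 2**ell.
    """
    two_ell = 1 << ell
    z = two_ell + a - b                      # z = 2^ell + a - b, an (ell + 1)-bit value
    r = secrets.randbits(ell + kappa)        # Alice's random mask
    d = z + r                                # the masked value that Bob decrypts
    d_hat = d % two_ell                      # Bob's reduction of d
    r_hat = r % two_ell                      # Alice's reduction of r
    lam = 1 if d_hat < r_hat else 0          # underflow indicator (assumed convention)

    z_mod = d_hat - r_hat + lam * two_ell    # corrected z mod 2^ell
    assert z_mod == z % two_ell

    z_ell = (d >> ell) - (r >> ell) - lam    # most significant bit of z via floor divisions
    assert z_ell == (z - z_mod) >> ell
    return 1 - z_ell                         # [a < b] = 1 - z_ell

if __name__ == "__main__":
    print(compare_via_masking(120, 345))     # a < b
    print(compare_via_masking(345, 120))     # a > b
```

The two internal assertions confirm that both routes to the most significant bit agree for any mask, which is why only the single bit relating the two reduced values has to be computed by the sub-protocol.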

Computing [ λ]
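This protocol outputs an encrypted bit, which shows whether $\hat{d} > \hat{r}$ or not, where $\hat{r} = r \bmod 2^{\ell}$. However, unlike the original problem of comparing encrypted $a$ and $b$, in this protocol Alice and Bob possess $\hat{r}$ and $\hat{d}$ in plaintext, respectively. Let $d_i$ and $r_i$ denote the $i$-th bits of $\hat{d}$ and $\hat{r}$. Based on this setting, the IDCP for computing $[\lambda]$ securely works as follows:
1. Bob sends a bitwise encryption of his input, $[\![d_0]\!], \ldots, [\![d_{\ell-1}]\!]$, to Alice.
2. Alice chooses a uniformly random $s \in \{1, -1\}$ and computes, for $i = 0, \ldots, \ell - 1$,
\[ [\![c_i]\!] = \Bigl[\!\!\Bigl[\, d_i - r_i + s + 3 \sum_{j=i+1}^{\ell-1} d_j \oplus r_j \Bigr]\!\!\Bigr]. \tag{9} \]
3. Alice blinds each $[\![c_i]\!]$ with a uniformly random nonzero exponent to obtain $[\![e_i]\!]$, then permutes the $[\![e_i]\!]$ and sends them to Bob. Note that if $c_t = 0$, where $t \in \{0, \ldots, \ell - 1\}$, then $e_t = 0$ as well.
4. Bob checks whether there is a zero among the $e_i$ values. If none of the $e_i$ values is an encrypted zero, then he sets $\tilde{\lambda} = 0$; otherwise, $\tilde{\lambda} = 1$. Then he encrypts $\tilde{\lambda}$ and sends $[\tilde{\lambda}]$ to Alice.
5. Alice corrects $[\tilde{\lambda}]$ to obtain $[\lambda]$ as follows:
\[ [\lambda] = \begin{cases} [\tilde{\lambda}] & \text{if } s = 1, \\ [1] \cdot [\tilde{\lambda}]^{-1} & \text{if } s = -1. \end{cases} \tag{10} \]
After obtaining $[\lambda]$, Alice computes $[z \bmod 2^{\ell}]$ and $[z_{\ell}]$ based on Eqs. 7 and 5, respectively.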
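The role of the zero among the blinded values can be seen on plaintext bits. The following sketch assumes the usual DGK form of the per-bit value, namely the bit difference plus $s$ plus three times the number of differing higher-order bits, and assumes that the corrected bit should equal 1 exactly when Bob's value is smaller than Alice's. Both are assumptions of this example; blinding, permutation, and encryption are omitted, and the helper names are hypothetical.

```python
def bits(x, ell):
    """Little-endian list of the ell low-order bits of x."""
    return [(x >> i) & 1 for i in range(ell)]

def c_values(d, r, s, ell):
    """Per-bit values c_i = d_i - r_i + s + 3 * sum_{j > i} (d_j XOR r_j)."""
    db, rb = bits(d, ell), bits(r, ell)
    out = []
    for i in range(ell):
        differing_above = sum(db[j] ^ rb[j] for j in range(i + 1, ell))
        out.append(db[i] - rb[i] + s + 3 * differing_above)
    return out

def lambda_bit(d, r, s, ell):
    """Bob's zero check followed by Alice's correction based on s (requires d != r)."""
    lam_tilde = 1 if 0 in c_values(d, r, s, ell) else 0   # Bob only learns whether a zero exists
    return lam_tilde if s == 1 else 1 - lam_tilde          # Alice undoes the effect of s

if __name__ == "__main__":
    print(c_values(0b0101, 0b0110, 1, 4))    # exactly one zero, at the first differing bit from the top
    print(lambda_bit(0b0101, 0b0110, -1, 4))
```

A zero can only appear at the most significant position where the two inputs differ, and only for one value of $s$, so Bob's observation alone does not tell him the outcome. The case of equal inputs is excluded here; it requires the input adjustment that the text describes separately.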

Efficient privacy-preserving comparison protocol
In this section, we describe a new version of the original SCP based on the DGK construction, which is significantly more efficient in terms of run-time and communication cost.

Proposed comparison protocol
Complexity analysis and experimental results reveal that the XOR operation in computing $c_i$ in Eq. 9 has a significant impact on the overall efficiency of the DGK comparison protocol for the following two reasons:
1. Veugen [18] proposed a more efficient technique of computing XOR, where $[\![r_i \oplus d_i]\!] = [\![d_i]\!]$ when $r_i = 0$; otherwise, $[\![r_i \oplus d_i]\!] = [\![1]\!] \cdot [\![d_i]\!]^{-1}$ (recall that Alice and Bob have access to the values $r$ and $d$, respectively, and Alice is computing the XOR). Thus, if $r_i$ equals 1, one multiplication and one exponentiation with a negative exponent have to be computed over DGK ciphertexts, which affects the performance of the DGK comparison protocol significantly.
2. Since the equation that involves XOR is computed during the protocol with encrypted inputs, it is not possible to introduce pre-computation for $c_i$ to obtain a more efficient protocol.
Table 2 shows that computing c i constitutes 70 % of the overall run-time of the IDCP for Alice.
Based on these two facts, we propose a more efficient way of computing $c_i$ that does not rely on the original XOR computation. The value $c_i$ can be rewritten as given in Eq. 11, which is computed in three steps:
1. Bob computes the encrypted value $t_i$ and sends it to Alice.
2. Alice computes $v_i$.
3. Alice computes the encrypted $c_i$ as given in Eq. 12.
Note that Alice can pre-compute $v_i$ and that the factor "3" is not needed in the computation of $c_i$. After computing all $c_i$ values, Alice masks each $c_i$ and sends the masked values to Bob, who checks whether any of the given masked $c_i$ is zero, then generates $[\tilde{\lambda}]$ and sends it to Alice. She corrects $[\tilde{\lambda}]$ based on the value $s$ to obtain $[\lambda]$, and then computes Eqs. 7 and 5 to obtain $[z_{\ell}]$ as in the original protocol. Note that we compare $2\hat{d}$ and $2\hat{r}$ instead of $\hat{d}$ and $\hat{r}$, respectively, for technical reasons explained in the following section.

Correctness proof of computing c i
In this section, we prove the correctness of generating $c_i$ by Eq. 12. In order to do that, we check whether Eq. 12 generates an encrypted zero under the same conditions as Eq. 9. Table 3 shows the values of $c_i$ computed by the efficient privacy-preserving comparison protocol (EPPCP) and by the IDCP, which are denoted as $c^{E}_i$ and $c^{I}_i$, respectively. Table 3 analyzes the existence of a zero in $c^{E}_i$ based on $s$, $\hat{d}$, $\hat{r}$, and $S_{i+1}$. Based on this table, the value of $c^{I}_i$ can be zero under two conditions: $\{\hat{d} < \hat{r},\ s = 1,\ d_i = 0,\ r_i = 1\}$ and $\{\hat{d} > \hat{r},\ s = -1,\ d_i = 1,\ r_i = 0\}$. However, $c^{E}_i$ generates a zero under more conditions than $c^{I}_i$ does. For instance, if $\{S_{i+1} = 2,\ s = -1,\ \hat{d} > \hat{r},\ d_i = r_i = 1\}$, then $c^{E}_i = 0$. Table 3 shows that the values of $c^{I}_i$ can be zero in conditions 4 and 5; however, the $c^{E}_i$ values are zero in conditions 3, 4, 5, and 6, depending on the assumed values of $S_{i+1}$ for each condition. We note that if $d_1 d_0 * r_1 r_0 = 2$ and $d_i = r_i$ for $2 \le i \le \ell - 1$, then the value of $c^{E}_0$ becomes zero. To fix this problem, we compare $2\hat{d}$ and $2\hat{r}$ instead of $\hat{d}$ and $\hat{r}$. Therefore, Eq. 12 does not generate a zero in conditions 3 and 6. Furthermore, for the comparison protocol to work when $\hat{d} = \hat{r}$, we compare $3\hat{d}$ and $3\hat{r} + 1$ instead of $\hat{d}$ and $\hat{r}$, respectively, as suggested similarly in [17].

Data packing
According to Table 2, Paillier decryption of $[d]$ (Eq. 6) accounts for more than 62 % of the comparison protocol execution time at Bob's side. We decrease the run-time of Paillier decryption by employing data packing similar to [19,20]. The main idea behind data packing is to efficiently use the message space of the Paillier cryptosystem, which is much larger than the values to be compared. Assume that $z$ and $r$ are $\ell$-bit and $(\ell + \kappa)$-bit integers, respectively. Then, $d = z + r$ is an $(\ell + \kappa + 1)$-bit integer. Let the message space of the Paillier cryptosystem be $\mathbb{Z}_{\eta}$ with $\eta = pq$; then Alice packs $\rho = \lfloor \log_2 \eta / (\ell + \kappa + 1) \rfloor$ masked values $d^{(0)}, \ldots, d^{(\rho-1)}$ into one Paillier message as follows:
\[ [\tilde{d}] = \Bigl[\sum_{j=0}^{\rho-1} d^{(j)} \cdot 2^{j(\ell + \kappa + 1)}\Bigr] = \prod_{j=0}^{\rho-1} [d^{(j)}]^{2^{j(\ell + \kappa + 1)}}, \]
and sends $[\tilde{d}]$ to Bob. Then, Bob computes $D_{sk}([\tilde{d}])$, unpacks the $\rho$ different values, and performs the modulo reduction on each unpacked value.
Employing the data packing technique not only reduces the number of very expensive Paillier decryptions to be performed but also decreases the number of encrypted messages to be transmitted.
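The packing arithmetic itself is independent of the cryptosystem and can be sketched on plain integers. In the example below, the slot width of $\ell + \kappa + 1$ bits follows the text, while the function names, the concrete sizes ($\ell = 16$, $\kappa = 40$, a 2048-bit modulus), and the conservative capacity rule (using one bit less than the modulus length so that the packed value always stays below the modulus) are assumptions for illustration.

```python
import secrets

def capacity(modulus_bits, slot_bits):
    """Number of slots that fit below a modulus of the given bit length (conservative)."""
    return (modulus_bits - 1) // slot_bits

def pack(values, slot_bits):
    """Place the j-th value at bit offset j * slot_bits; each value must fit in one slot."""
    packed = 0
    for j, v in enumerate(values):
        assert 0 <= v < (1 << slot_bits)
        packed += v << (j * slot_bits)     # under encryption: ciphertext raised to 2^(j * slot_bits)
    return packed

def unpack(packed, count, slot_bits):
    """Recover the individual values from one decrypted packed plaintext."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (j * slot_bits)) & mask for j in range(count)]

if __name__ == "__main__":
    ell, kappa = 16, 40
    slot = ell + kappa + 1
    rho = capacity(2048, slot)
    masked = [secrets.randbits(slot) for _ in range(rho)]   # stand-ins for the masked values
    recovered = unpack(pack(masked, slot), rho, slot)
    reduced = [d % (1 << ell) for d in recovered]           # Bob's per-slot modulo reduction
    print(rho, recovered == masked, len(reduced))
```

With these example sizes, a single decryption would serve a few dozen comparisons at once, which is the source of the savings in both decryption time and transmitted ciphertexts.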

Performance analysis
In this section, we analyze the number of operations over ciphertexts, since they are computationally expensive compared to operations on plaintexts and dominate the protocol run-time, and we provide experimental results for run-time performance. For this purpose, we implemented the EPPCP using C++ and the SeComLib [26] library on a Linux machine running Ubuntu 14.04 LTS with a 64-bit microprocessor and 8 GB of RAM. The experiments are repeated for 10,000 comparisons. Table 4 provides more information about the parameters and their corresponding values in our implementation. Table 5 shows the computational complexity of the original DGK comparison protocol, the IDCP, and the EPPCP. Note that the numbers of multiplications and exponentiations refer to the computation of $c_i$. According to Table 5, the original DGK comparison protocol suffers from high computational complexity regarding the number of multiplications and exponentiations over ciphertexts. Veugen [18] presented two improvements to decrease the computational cost of the DGK comparison protocol, namely an efficient method to compute XOR and an algorithm that masks fewer $c_i$ values, which results in a lower number of exponentiations with positive exponents. However, according to Table 5, the new technique of computing XOR has only a slight impact on the overall number of multiplications and exponentiations. Moreover, Table 2 shows that the computation of $e_i$ takes 15 % of the protocol run-time at Alice's side (the improvement for computing $e_i$ [18] is not applied in the implementation); therefore, even a significant improvement in the computation of $e_i$ does not have a significant influence on the overall run-time.
Table 5 shows that the computational complexity of computing $c_i$ in the EPPCP is reduced to multiplications over ciphertexts only, and there is no exponentiation with a positive or negative exponent. According to Table 6, this low computational complexity results in a 91 % decrease in the run-time of computing $c_i$ compared to the IDCP. This improvement also reduces the run-time of all computations performed by Alice by 64 %. Table 2 also shows that Paillier decryption accounts for 62 % of the IDCP run-time at Bob's side. According to Table 6, deploying data packing decreases the run-time of the Paillier decryption and of all of Bob's computations by 85 and 53 %, respectively. Table 7 shows the running times of the Paillier decryption (PD), the computation of $c_i$, and the total run-time (online phase) of both the EPPCP and the IDCP for different key sizes. It shows that the EPPCP achieves better efficiency than the IDCP for large key sizes as well.
According to Table 8, running the EPPCP 10,000 times takes 41 s, whereas the IDCP takes 93 s. Table 8 also shows that the pre-computation phase takes more time in the EPPCP as a result of the new method of computing $c_i$, which allows more of the computation to be performed before run-time. The communication cost between Alice and Bob is decreased by 25 % in the EPPCP because of the data packing technique.
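As a quick consistency check, the two run-times quoted from Table 8 can be related to the overall computational improvement stated in the abstract, assuming that improvement refers to the total online run-time over 10,000 comparisons:
\[ \frac{93\,\mathrm{s} - 41\,\mathrm{s}}{93\,\mathrm{s}} = \frac{52}{93} \approx 0.559, \]
that is, a reduction of roughly 56 %.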

Security and privacy of comparison protocol
In this section, we provide a security sketch of the proposed privacy-preserving comparison protocol in the semi-honest model.For a more elaborate security proof, we refer readers to [25].
As mentioned before, smart meters encrypt the power consumption using the Paillier cryptosystem, which is semantically secure under the decisional composite residuosity assumption (for more information about the security of the Paillier cryptosystem, we refer the reader to [24]); thus, Alice (data aggregator) has only encrypted values. Here, we show that not only does Bob (utility provider) not learn anything about the given encrypted values, but also Alice does not learn any information about the encrypted output of the algorithm at the end of the proposed comparison protocol.
Alice computes $[z \bmod 2^{\ell}]$ together with Bob without revealing any information about $[z]$ to him. Since this value reveals information on the distance between $a$ and $b$, Alice masks $[z]$ by adding a random value, $[d] = [z + r]$, and sends $[d]$ to Bob instead of $[z]$. Since $r$ is a uniformly random $(\kappa + \ell)$-bit value, the decrypted value $d$ statistically hides $z$ from Bob. Bob sends back $[d \bmod 2^{\ell}]$ to Alice in encrypted form, which means that she cannot learn any information about the content of $[z]$ and only obtains $[z \bmod 2^{\ell}]$. Then Alice sends the $e_i$ values, which are the masked and permuted $c_i$ values, to Bob, who checks for the existence of an encrypted zero among the given $e_i$. Therefore, Bob only receives a list of uniformly random values. Moreover, using a binary random value $s$ in the computation of $c_i$ prevents Bob from drawing any conclusions about the result of the comparison by checking $e_i$. Since Alice is not authorized to know the result of the comparison, Bob only sends the encrypted value of $\tilde{\lambda}$, $[\tilde{\lambda}]$, to Alice. Then, she can only correct $[\tilde{\lambda}]$ based on $s$ to obtain $[\lambda]$ and compute $[z_{\ell}]$.

Conclusions
Comparing consumers' power consumption profiles is a necessary part of smart grids for a number of services, including generating personalized recommendations. Since personal profiles contain private information about consumers' power consumption, privacy-preserving approaches should be considered. One of the most effective approaches is based on using cryptographic tools that enable processing encrypted data. Unfortunately, secure and privacy-sensitive versions of such services are computationally expensive, which hinders their deployment.

Table 1
Symbols and their descriptions

Table 2
Run-time performance for several steps of the IDCP

Table 3
Different conditions based on s, d and r

Table 4
Parameters and their values used in the implementation

Table 6
Run-time performance of the several steps of the EPPCP and the improvements compared to the IDCP

Table 7
Performance of the IDCP and the EPPCP for different key lengths

Table 8
Overall performance of the IDCP and the EPPCP