Low-cost data partitioning and encrypted backup scheme for defending against co-resident attacks

To prevent the leakage of and damage to user data caused by co-resident attacks in the cloud environment, a data partitioning and XOR-encrypted backup (P&XE) scheme is proposed. After the data have been divided into blocks, backup data are generated by XORing data blocks together and are then encrypted using a random string. Compared with the existing scheme, the proposed scheme resolves the conflict between data security and survivability via encrypted backup. At the same time, because each XOR-encrypted backup is shared by multiple data blocks, the storage overhead of the user is reduced. In this paper, existing probabilistic models are used to compare an existing scheme and the P&XE scheme in terms of data security, data survivability, and user storage overhead, and the overall performances of the two schemes on these three aspects are compared using control variables. Finally, the experimental results demonstrate the effectiveness of the P&XE scheme at improving user data security and survivability and reducing user storage overhead.


Introduction
Cloud computing provides users with various computing resources and storage resources in an on-demand and ubiquitous manner through the network, thereby substantially reducing users' computing and storage overhead [1][2][3]. Virtualization technology is an important part of cloud computing. To effectively utilize physical resources, a cloud service provider typically allocates multiple virtual machines (VMs) of various tenants to the same physical machine, which is called virtual machine co-residence [4]. Despite the logical isolation of a VM from its underlying hardware and from other VMs that are hosted on the same server, the co-resident architecture can be exploited by attackers, thereby exposing the cloud environment to a huge potential threat [5][6][7][8][9]. For example, when an attacker co-resides with its target virtual machine, it can bypass the logical isolation to illegally access (steal) or destroy user data. In the relevant literature, the probability that data cannot be stolen is called the data security, and the probability that data cannot be corrupted is called the data survivability [10][11][12].
In recent years, research on resisting co-resident attacks has yielded fruitful results.
The most straightforward solution for resisting co-resident attacks is to eliminate the side channel [13]. There are also studies [14][15][16][17] that demonstrate the vulnerability of virtual machine monitors. Once an attacker controls a virtual machine monitor, all virtual machines that are running on the same physical machine will face significant security risks. Therefore, a mechanism that is based on removing virtual machine monitors for defending against such attacks was proposed in [18]. However, the above solution requires the modification or even redesign of the existing system architecture. Based on Intel cache allocation technology, a mitigation mechanism was proposed in [19] for defending against co-resident attacks on the last-level cache (LLC) of multicore processors in cloud servers. Another mechanism for mitigating co-resident attacks by detecting abnormal behavior based on system components (CPU, cache, etc.) was proposed in [20,21]. The HomeAlone defense mechanism, which was designed in [22], identifies malicious co-residents via the analysis of the side channel. Network flow watermarking technology was introduced in [23] to mitigate co-resident attacks by detecting malicious virtual machines in the same network. Based on LLC access collision, a covert channel communication method was proposed in [24] for virtual machine co-resident detection. A defensive mechanism, namely, virtual private cloud (VPC), was introduced in [25] for mitigating co-resident attacks in the Amazon Elastic cloud. Then, [26] further evaluated the performance of VPC technology. The virtual machine allocation strategy was first proposed in [27] for mitigating co-resident attacks by increasing the difficulty of the attacker co-residing with the target. Then, a new virtual machine placement strategy, namely, PSSF, was proposed in [28,29]; it increases the security of virtual machines by prioritizing the physical machines that users have previously used or are currently using, which makes it more difficult for malicious users to co-reside with their targets. In addition, a game-theory-based approach was used in [30] and [31,32] to increase the difficulty of co-residence, thereby reducing the probability of co-resident attacks.
The prior works that address co-resident attacks have mostly focused on addressing side channels or VM allocations, which typically requires the modification of the existing cloud system architecture or assistance from the cloud service provider. In [33,34], a new solution to the problem of co-resident attacks in the cloud environment was proposed. This solution works from the user's side and is based on the data partitioning technique: a user's information is divided into multiple separate data blocks, and each of these blocks is handled by a separate VM. For data that are useful only when complete [35,36], data partitioning has been used as an effective method for protecting sensitive information in the cloud. For example, the stripping method using data partitioning and image analysis was proposed in [36,37] for protecting image data with sensitive information in the cloud. In [38], data partitioning techniques were used in conjunction with remote backup algorithms to enhance the security of data that are stored on cloud servers. Data partitioning techniques were first introduced in [33] for solving the problem of co-resident attacks. In addition, that work identifies the best data partitioning strategy (the optimal number of user VMs) for mitigating the effects of co-resident attacks.
Data partitioning technology can effectively improve the security of data: unless the attacker can access all the independent data blocks, the complete information cannot be obtained. However, data partitioning reduces the survivability of the data because the corruption of any data block destroys the integrity of the information, thereby rendering the data unusable. To improve the survivability of data, users can create replicas of each block [39]; however, copying the data increases the probability of the data being stolen. The trade-off between data security and data survivability in traditional information systems was studied in [35] without considering co-resident attacks. Then, [40] modeled the effects of co-resident attacks on data partitioning and replication backup schemes. The partition and replica backup (P&R) scheme was proposed to determine the optimal partitioning and backup strategy against co-resident attacks and to balance data security, data survivability, and user storage overhead. However, the P&R scheme imposes high storage overhead on users.
To reduce the user storage overhead and improve the data security and survivability, this paper proposes the partitioning and XOR-encrypted backup (P&XE) scheme. Via data partitioning and encrypted backup, the user's storage overhead is reduced, and the data security and data survivability are improved.
The remainder of the paper is organized as follows: Section 2 introduces the existing P&R scheme and attack model. Section 3 introduces the P&XE scheme. Section 4 presents the formulas for measuring data security and user storage overhead. The P&XE and P&R solutions are compared in terms of data security, data survivability, and user storage overhead in Section 5. Section 6 presents the conclusions of this work.
Existing scheme and attack models

Existing scheme
Users have sensitive information that must be protected. The attacker's actions may result in unauthorized access to (stealing of) the information and/or in its corruption, thereby rendering the information impossible for the user to use. To prevent information from being stolen (data security), the user divides it into x blocks of data (see Fig. 1), where x > 1 (the maximum number of blocks can be limited according to the scenario/demand). Unless the attacker has access to all x blocks of data, the data are safe. However, the data can only be used if their integrity is maintained. If any block of the information has been corrupted, the information integrity is lost and the user cannot use the information. To avoid this scenario, the user enhances the data survivability by creating y_i replicas of each data block i (1 ≤ i ≤ x) (see Fig. 1). The data partitioning/replication scheme is denoted as R = (x, y_1, ..., y_x), in which the user divides the data into x blocks and the number of replicas of the i-th block is y_i. To destroy the user's data, the attacker must destroy all y_i copies of at least one data block i. To steal the information, the attacker must acquire at least one copy of every data block. Creating more blocks makes the information more difficult to steal but more vulnerable to corruption. Creating more copies of each block makes the data less susceptible to corruption but more vulnerable to theft. The optimal data partitioning scheme should balance the security and survivability of the data.
Suppose there are n servers in the cloud computing system. After the user divides the data into separate blocks and creates multiple copies of these blocks (a total of k data blocks), the user sends k requests to the cloud resource management system (RMS) to create a VM for each data block. The RMS creates k user virtual machines (UVMs) and distributes these UVMs randomly to available physical servers. A server can receive between 0 and k UVMs, and the k UVMs can be distributed across between 1 and min(n, k) servers.

Attack Model
An attacker attempts to access a user's information to steal or destroy it. The attacker can access the data of a UVM only if the attacker's virtual machine (AVM) is located on the same server as that UVM. To co-reside with the user's UVMs, the attacker submits m requests to create m AVMs in the same cloud system. The RMS creates an AVM for each request and randomly distributes the AVMs among the n servers. If an AVM co-resides with UVMs on the same server, it can construct a side channel to each co-resident UVM and steal or destroy the data with a specified probability. Suppose the probability of an attacker stealing data is t, and the probability of corrupting data is c. For convenience of discussion and without loss of generality, the following assumptions are made:
1. The same data protection measures are used in all physical servers. The event that the attacker builds a side channel and steals or corrupts the data is the same for all servers where the AVM and the UVM are co-resident; hence, if an AVM successfully builds a side channel in one server, other AVMs that co-reside with UVMs can also successfully construct a side channel.
2. An attacker can steal data from all UVMs that are co-located with the AVM in the same server with probability t and damage the data with probability c.
3. The probabilities t and c do not depend on the numbers of UVMs and AVMs that co-reside in the same server.
4. The probabilities t and c are not necessarily equal.
For example, if an attacker obtains encrypted data, the data cannot be decrypted and used; however, the data can be destroyed (c > t). Conversely, if the data are write-protected, stealing is easier than destroying (t > c).
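The random placement and the attack model described above can be explored with a small Monte Carlo simulation. The following is a minimal sketch rather than the probabilistic model of [40]: the function name `simulate_pr`, the trial count, and the seed are illustrative assumptions, and theft and corruption are drawn as two independent events, each decided once per trial in the spirit of assumption 1.

```python
import random


def simulate_pr(n, copies, m, t, c, trials=20000, seed=0):
    """Estimate (T, C) for a P&R scheme R = (x, y_1, ..., y_x).

    n      -- number of servers
    copies -- list [y_1, ..., y_x]; copies[i] replicas exist for block i
    m      -- number of attacker VMs (AVMs)
    t, c   -- probabilities of stealing / corrupting via a side channel
    """
    rng = random.Random(seed)
    stolen = corrupted = 0
    for _ in range(trials):
        # Servers that host at least one AVM after random placement.
        attacked = {rng.randrange(n) for _ in range(m)}
        # Random placement of every replica (one UVM per replica).
        placement = [[rng.randrange(n) for _ in range(y)] for y in copies]
        # Theft needs at least one co-resident replica of EVERY block.
        can_steal = all(any(s in attacked for s in blk) for blk in placement)
        # Corruption needs ALL replicas of AT LEAST ONE block co-resident.
        can_corrupt = any(all(s in attacked for s in blk) for blk in placement)
        # Assumption (illustrative): one side-channel outcome per trial.
        if can_steal and rng.random() < t:
            stolen += 1
        if can_corrupt and rng.random() < c:
            corrupted += 1
    return stolen / trials, corrupted / trials


if __name__ == "__main__":
    # Example: R = (5, 3, 3, 3, 3, 3) on 30 servers against 10 AVMs.
    print(simulate_pr(n=30, copies=[3] * 5, m=10, t=0.2, c=0.6))
```

Raising the entries of `copies` makes the theft condition easier to satisfy and the corruption condition harder, which is the conflict between security and survivability discussed in this section.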
To increase the difficulty of data theft, data partitioning technology is used to divide the data into multiple blocks, thereby improving the data security; however, this improvement also increases the probability of data corruption. To reduce the probability of data corruption, multiple copies of each data block are created. Data partitioning and replication therefore pull in opposite directions: partitioning favors data security, whereas replication favors data survivability. Although increasing the numbers of blocks and data replicas at the same time can improve both the security and the survivability of user data, it also imposes significant storage overhead on the user.

Partitioning and XOR-encrypted backup scheme

Backup data generation process
This section describes the process of generating backup data in the P&XE scheme (Fig. 2). The P&XE scheme consists of two parts:
1. Original data partitioning. The user divides the data to be protected into x blocks via data partitioning technology, and D_origin = (D_1, D_2, …, D_x).
2. XOR-encrypted backup of data blocks. The XOR-encrypted backup data are generated by XORing multiple blocks of data with a random string (RS) of the user. The number of data blocks that are used to generate the XOR-encrypted backup data is called the group size, which is denoted as g (2 ≤ g ≤ x).
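The generation of the i-th XOR-encrypted backup data starts from data block D_i, which is XORed with the g − 1 data blocks that follow it; finally, the RS is used to encrypt the backup data. The formula is as follows:

$$\mathrm{XOR}_i = D_i \oplus D_{i+1} \oplus \cdots \oplus D_{i+g-1} \oplus RS, \quad 1 \le i \le x \qquad (1)$$

where a block index that exceeds x wraps around to the beginning (D_{x+1} = D_1, and so on), so that x XOR-encrypted backup data are generated in total. In the following, x = 5 is used as an example to illustrate the process of generating XOR-encrypted backup data.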
According to Fig. 3, a change in the group size (g) only affects the number of times a data block appears in the generation of XOR-encrypted backup data and does not increase the number of XOR-encrypted backup data. In other words, the number of UVMs that are used in the P&XE scheme depends only on the number of data blocks x: x UVMs hold the original blocks and x UVMs hold the backups, for a total of 2x. This is also why the P&XE scheme can maintain high security and low user storage overhead if the number of user data blocks is increased (see Section 5.4 for the analysis).
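The generation process described in this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names `split_blocks`, `xor_bytes`, and `make_backups` are hypothetical, blocks are zero-padded to equal length, the RS has the same length as a block, and block indices wrap around so that exactly x backups are produced.

```python
import secrets


def split_blocks(data: bytes, x: int) -> list[bytes]:
    """Split data into x equal-length blocks (zero-padded; an assumption)."""
    size = -(-len(data) // x)  # ceiling division
    padded = data.ljust(size * x, b"\0")
    return [padded[k * size:(k + 1) * size] for k in range(x)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(a, b))


def make_backups(blocks: list[bytes], g: int, rs: bytes) -> list[bytes]:
    """Backup i (0-based) = blocks[i] ^ ... ^ blocks[i+g-1] ^ rs, indices mod x."""
    x = len(blocks)
    if not 2 <= g <= x:
        raise ValueError("group size must satisfy 2 <= g <= x")
    if any(len(b) != len(rs) for b in blocks):
        raise ValueError("blocks and rs must have the same length")
    backups = []
    for i in range(x):
        acc = rs
        for j in range(g):
            acc = xor_bytes(acc, blocks[(i + j) % x])  # wrap-around index
        backups.append(acc)
    return backups


if __name__ == "__main__":
    blocks = split_blocks(b"sensitive user data!", 5)  # x = 5
    rs = secrets.token_bytes(len(blocks[0]))           # user's random string
    backups = make_backups(blocks, g=3, rs=rs)
    # x original blocks plus x backups: 2x UVMs regardless of g.
    print(len(blocks) + len(backups))
```

Changing `g` alters how many blocks feed each backup but never the length of the returned list, which mirrors the observation that the number of UVMs depends only on x.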

Data Recovery Process
The XOR operation is commutative (a ⊕ b = b ⊕ a) and associative, and every value is its own inverse (a ⊕ a = 0). According to formula (1), there are g XOR-encrypted backup data that are related to D_i. When the i-th data block D_i is destroyed, one of these g XOR-encrypted backup data is selected according to the formula for generating the XOR-encrypted backup data. XORing both sides of that equation with the RS and with the other g − 1 data blocks of the group cancels every term except D_i, so the data D_i can be restored via the exclusive or operation. In the following, x = 5 is used as an example to demonstrate the data recovery process.
If g = 3, all XOR-encrypted backup data are generated as above. Assume that data D_4 are damaged. According to the above formula, the XOR-encrypted backup data that are related to D_4 are XOR_2, XOR_3, and XOR_4:

$$\mathrm{XOR}_2 = D_2 \oplus D_3 \oplus D_4 \oplus RS, \quad \mathrm{XOR}_3 = D_3 \oplus D_4 \oplus D_5 \oplus RS, \quad \mathrm{XOR}_4 = D_4 \oplus D_5 \oplus D_1 \oplus RS$$

According to the properties of the exclusive or operation, swapping D_4 with XOR_2, XOR_3, or XOR_4 in these formulas gives

$$D_4 = \mathrm{XOR}_2 \oplus D_2 \oplus D_3 \oplus RS = \mathrm{XOR}_3 \oplus D_3 \oplus D_5 \oplus RS = \mathrm{XOR}_4 \oplus D_5 \oplus D_1 \oplus RS$$

and D_4 is recovered by selecting any one of these equations and performing the XOR operation.
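The recovery step can be illustrated with the following Python sketch. It is a hedged example under assumptions: the function names `xor_all`, `make_backups`, and `recover` are hypothetical, indices are 0-based, group indices wrap around modulo x, and a destroyed block or backup is represented by `None`.

```python
from functools import reduce


def xor_all(chunks):
    """Bytewise XOR of a list of equal-length byte strings."""
    return reduce(lambda a, b: bytes(p ^ q for p, q in zip(a, b)), chunks)


def make_backups(blocks, g, rs):
    """Backup i = blocks[i] ^ ... ^ blocks[i+g-1] ^ rs (indices mod x)."""
    x = len(blocks)
    return [xor_all([rs] + [blocks[(i + j) % x] for j in range(g)])
            for i in range(x)]


def recover(lost, blocks, backups, g, rs):
    """Rebuild block `lost` from any intact backup whose group contains it."""
    x = len(blocks)
    for j in range(g):
        i = (lost - j) % x  # backup i covers blocks i .. i+g-1 (mod x)
        if backups[i] is None:
            continue  # this backup was destroyed as well
        others = [(i + d) % x for d in range(g) if (i + d) % x != lost]
        if any(blocks[o] is None for o in others):
            continue  # another block of this group is missing
        # XORing with rs and the other group members leaves only the lost block.
        return xor_all([backups[i], rs] + [blocks[o] for o in others])
    raise ValueError("no usable backup for the lost block")


if __name__ == "__main__":
    blocks = [b"D1", b"D2", b"D3", b"D4", b"D5"]  # x = 5
    rs = b"\x5a\xa5"
    backups = make_backups(blocks, g=3, rs=rs)
    damaged = list(blocks)
    damaged[3] = None                              # D_4 is destroyed
    print(recover(3, damaged, backups, g=3, rs=rs))
```

For the lost index 3 (block D_4) and g = 3, the loop visits backups 3, 2, and 1 in that order, which correspond to XOR_4, XOR_3, and XOR_2 in the 1-based notation of the text.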

Theoretical analysis of data security and data survivability
This section analyzes the impact of the P&XE scheme on data security and data survivability and compares the impacts of P&R and P&XE on data security and data survivability.
According to the above, the security and survivability of user data are related to the number of blocks and the number of copies of the data, respectively. Consider the scheme R = (5,3,3,3,3,3) as an example, in which the data are divided into 5 blocks, each with 3 copies. In the P&R scheme [40], there are 3 copies of each block. To steal the data, the attacker must obtain at least one of the three copies of every block. To corrupt the data, the attacker only needs to destroy all copies of any one block. For the P&XE scheme, R = (5,3,3,3,3,3) corresponds to g = 2 because, to corrupt the data, the attacker must destroy an original block and the two pieces of XOR-encrypted backup data that are associated with it. Therefore, the P&XE scheme offers the same protection against data corruption as the P&R scheme. However, for data theft, since the XOR-encrypted backup data are encrypted with the user's random string (RS), the attacker cannot obtain other user data through the XOR-encrypted backup data; therefore, with respect to data theft, the P&XE scheme corresponds to R = (5,1,1,1,1,1), and the data can be stolen successfully only when the attacker steals every original block. Hence, P&XE outperforms the P&R scheme in addressing data theft. If XOR-encrypted backup data are stolen, the attacker cannot use them because it cannot crack the user's RS, and it cannot obtain other data of the user from them. When the attacker destroys the data, not only the user's original data but also all XOR-encrypted backup data that are related to the original data must be destroyed. Therefore, the P&XE scheme can improve the survivability of user data without reducing the security of user data.

Probabilities of data Theft and data corruption
To measure the impacts of P&XE and P&R on data security, data survivability, and user storage overhead, the measurement formulas in [40] are used: T(R) is used for data security, W(R) for data survivability, and O(R) for user's storage overhead.
Consider the following scenario: there are n servers in the cloud environment, k UVMs, m AVMs, and the data partitioning/replication scheme R = (x, y_1, …, y_x). Let p(n,k,m) and w(n,k,m) denote the probability that the attacker's AVMs co-reside with all UVMs and the probability that the attacker's AVMs co-reside with at least one UVM, respectively [40]. If the number of AVMs is known, the probability of the data being stolen, T(R), and the probability of the data being corrupted, C(R), are computed from these co-residence probabilities using the formulas in [40]. When the value of m is uncertain, but the distribution form and range of m are known, with μ(l) = Pr(m = l) and m_min ≤ l ≤ m_max, the probabilities of data theft and data corruption are obtained by averaging over m:

$$T = \sum_{l=m_{\min}}^{m_{\max}} \mu(l)\, T(R \mid m = l), \qquad C = \sum_{l=m_{\min}}^{m_{\max}} \mu(l)\, C(R \mid m = l)$$

Figure 4 shows the relationship between the data theft probability T (x-axis) and the data corruption probability C (y-axis) under various numbers of servers under the P&XE scheme, where c = t = 1; n = 30 or 50; and m = 10, m = 30, or 10 ≤ m ≤ 30. According to Fig. 4, for the same group size, as the number of blocks increases, the probability of the user data being stolen is reduced, and the probability of the user data being damaged increases. At the same time, when the number of the attacker's virtual machines is constant, increasing the number of physical machines can improve the security and survivability of user data.
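The treatment of an uncertain number of AVMs amounts to a weighted average over the distribution of m. The sketch below illustrates only this averaging step. The names `w_single` and `expected_over_m` are hypothetical, and `w_single` is a deliberately simple stand-in (the case of a single UVM, k = 1) rather than the general p(n,k,m) and w(n,k,m) of [40].

```python
def w_single(n: int, m: int) -> float:
    """Probability that at least one of m randomly placed AVMs lands on the
    server of one given UVM (k = 1); each AVM misses with probability 1 - 1/n."""
    return 1.0 - (1.0 - 1.0 / n) ** m


def expected_over_m(prob_given_m, mu):
    """Average prob_given_m(l) over mu = {l: Pr(m = l)}."""
    if abs(sum(mu.values()) - 1.0) > 1e-9:
        raise ValueError("mu must sum to 1")
    return sum(weight * prob_given_m(l) for l, weight in mu.items())


if __name__ == "__main__":
    n, t = 30, 0.2
    # Assumption for illustration: m is uniform on 10..30.
    mu = {l: 1.0 / 21 for l in range(10, 31)}
    theft = expected_over_m(lambda l: t * w_single(n, l), mu)
    print(theft)
```

Any function that returns the theft or corruption probability for a known m can be passed as `prob_given_m`, so the same helper applies once the formulas of [40] are implemented.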
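Let k be the number of UVMs that are created by the user (k = y_1 + … + y_x under the P&R scheme and k = 2x under the P&XE scheme) and let O_vm be the overhead associated with creating one VM. The user's overhead that is associated with creating k UVMs is:

$$O(R) = k \cdot O_{vm}$$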
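As a small worked example of the overhead measure, the following sketch counts UVMs for both schemes. The function names `overhead_pr` and `overhead_pxe` are hypothetical, and the comparison assumes one UVM per stored block or backup, as in the text.

```python
def overhead_pr(copies, o_vm=1.0):
    """P&R overhead: one UVM per replica, k = y_1 + ... + y_x."""
    return sum(copies) * o_vm


def overhead_pxe(x, o_vm=1.0):
    """P&XE overhead: x original blocks plus x XOR-encrypted backups."""
    if x < 2:
        raise ValueError("the P&XE scheme requires x > 1")
    return 2 * x * o_vm


if __name__ == "__main__":
    # Compare R = (x, 3, ..., 3) under P&R with P&XE for the same x.
    for x in range(2, 7):
        print(x, overhead_pr([3] * x), overhead_pxe(x))
```

With three replicas per block the P&R overhead grows as 3x, whereas the P&XE overhead stays at 2x for any group size.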

Experimental comparison
In this section, the P&XE scheme and the P&R scheme [40] are compared in terms of the probability of data theft (T), the probability of data corruption (C), and the user storage overhead (O). Following [40], let T*, C*, and O* denote the constraints on T, C, and O, respectively. After the thresholds of two of the parameters have been defined, the solution that optimizes the remaining parameter is found. Then, by controlling the variables, the overall performances of the two schemes on T, C, and O are compared. Finally, the feasibility of the P&XE scheme is evaluated in terms of the time cost of XOR. (Since the P&XE scheme requires x > 1, there are no corresponding data in the experimental results for point x = 1.)

Increasing the number of blocks reduces the probability of user data being stolen because it makes the attacker less likely to obtain the complete data. Under the same number of AVMs, the more data blocks there are, the lower the probability that the AVMs co-reside with all of them and the lower the probability of data being stolen. If m = 30, the P&XE scheme identifies a scheme that satisfies C < C* when x is 9. As the number of data blocks increases, the probability of user data being stolen is reduced; hence, if x is 10, the scheme satisfies C < C*.

Figure 6 compares the results with the optimal value of T for the P&R scheme with n = 50, t = 0.2, C* = 0.05, and c = 0.6. Compared with Fig. 5, as the number of servers increases, the probability that the attacker's AVMs co-reside with the user's UVMs is reduced; hence, the probability of user data being stolen is reduced. At the same time, the probability of data being stolen decreases as the number of data blocks increases.

Probability of data theft (T) comparison
According to Figs. 5 and 6, the P&XE scheme can effectively reduce the probability of user data being stolen because under the P&XE scheme, the probability of user data being stolen depends only on the number of blocks. Under the P&XE scheme, regardless of the group size, there is only one copy of each data block for the user. Therefore, only when a malicious user obtains all the original data of the user can the data be successfully stolen.

Figures 7 and 8 show the relationships among T, C, and O when C* = 0.05, t = 0.2, c = 0.6, and O_vm = 1 in the P&XE scheme. According to Fig. 7, when the number of AVMs is large (m = 30), increasing the number of blocks does not reduce the probability of data being stolen or the probability of data being corrupted. If the AVMs are distributed across all servers, any partitioning/backup strategy will fail. The probability of such an event occurring increases as n decreases and/or as m increases.

Figure 9 compares the performances of P&XE and P&R on C under various T* limits with n = 50, t = 0.2, and c = 0.6. The experimental results demonstrate that under the same T* limit, the P&XE scheme makes the user data less likely to be destroyed and realizes higher security because the P&XE scheme can guarantee data security and data survivability at the same time. Due to the characteristics of the P&XE scheme, to corrupt the data, the attacker must destroy the original data and all related XOR-encrypted backup data. However, since the XOR-encrypted backup data are encrypted by the user's random string, the attacker cannot decrypt the original data through XOR-encrypted backup data; therefore, when stealing data, the attacker must obtain all the original data. Hence, the P&XE scheme better protects the security of the user data.

Figure 10 shows the variations in T, C, and O at various values of T* in the P&XE scheme when t = 0.2, c = 0.6, and O_vm = 1. With the relaxation of T*, users can reduce the probability of data corruption by using more UVMs (increasing the number of data blocks or increasing the number of XOR-encrypted backups). Figure 11 shows that in the P&XE scheme, under the same T* constraint, as C* decreases, users will use more UVMs to protect against data corruption. In this case, the increase in UVMs is due to the increase in the amount of XOR-encrypted backup data. Similarly, under the same C* constraint, as T* decreases, users must also use more UVMs to prevent data theft. In this case, the increase in UVMs is due to the increase in the number of blocks.

Figures 12 and 13 compare the user storage overhead between the P&XE and P&R schemes under various C* and T* constraints when t = 0.2, c = 0.6, and O_vm = 1. With the relaxation of T*, users require fewer UVMs to satisfy the T* requirements. In Fig. 13, when C* = 0.05 and T* = 0.03, the P&XE scheme uses more UVMs. This is because under the P&XE scheme, the number of data blocks is at least 2, so at least 2 XOR-encrypted backup data are generated and the user's minimum overhead is 4. In contrast, in the P&R scheme, the data are not partitioned in this case, and only the replication backup is used; hence, the overhead is lower than that of the P&XE scheme.

User storage cost (O) comparison
To compare the overall performances of the P&XE and P&R schemes in terms of T, C, and O, the trends of T, C, and O under the two schemes are compared below while the remaining variables are controlled.

Overall comparison
Figures 14 and 15 compare the security of data from two aspects. Fig. 14 takes the scheme R for which C is best under the P&XE scheme (the case in which the group size is equal to the number of blocks, namely, x = g) and compares T with that of the P&R scheme under the same scheme R. According to Fig. 14, as the number of blocks increases, the probability of data being stolen under the P&R scheme increases because both the number of blocks and the number of copies of each block increase. As the probability of stealing any piece of data increases, the probability of an attacker obtaining the complete data increases. Under the P&XE scheme, the data security depends only on the number of blocks: the greater the number of blocks, the higher the security of the data.

Figure 15 considers the scheme R for which the group size of the P&XE scheme is 2 and shows the change in T with the number of blocks. When the group size is fixed (namely, for the P&R scheme, the number of copies of each piece of data is the same), T of the P&R scheme decreases as the number of blocks increases. This is because the number of copies of each block is the same, so the probability of obtaining a copy of any given block is the same, while the number of blocks to be acquired increases, thereby increasing the difficulty for attackers to obtain the full data. Therefore, the probability of an attacker stealing data is reduced.

Figure 16 compares C under the same scheme R with respect to T (with the same T as the reference standard, namely, no backup after the data have been partitioned). Since there is no replication backup, there is only one copy of each block. As the number of blocks increases, the probability of an attacker destroying at least one block increases; therefore, the probability of data corruption under the P&R scheme increases. Under the P&XE scheme, XOR-encrypted backup data do not affect the probability of data being stolen.
If n = 30 and C takes its minimum (x = g), as the number of blocks increases, the number of XOR-encrypted backup data related to each piece of data also increases; hence, the probability of data corruption decreases.
In Fig. 17, n = 50 and C takes its maximum value (g = 2) for comparison. As the number of blocks increases, the probability of user data being corrupted under the P&XE scheme increases because the number of blocks increases while the number of XOR-encrypted backup data per block remains unchanged. The probability of the attacker destroying a given block is unchanged, but with more data blocks, the probability that at least one block is destroyed increases; hence, the probability of user data being destroyed increases.

Figure 18 compares the user storage overhead between the P&R scheme and the P&XE scheme in the same scenario R with respect to C. The experiment selects the group size g = x (this is the case in which the data have the strongest survivability under the P&XE scheme, namely, C is minimal). In this case, the number of UVMs that are used by the P&R scheme is x^2, and the number of UVMs that are used by the P&XE scheme is 2x. If x > 2, the overhead of the P&XE scheme is smaller than the overhead that is generated by the P&R scheme. According to the figure, as the number of blocks increases, the storage overhead of the P&R scheme increases sharply to realize the same data survivability, whereas that of the P&XE scheme increases relatively slowly.

Time overhead
According to Table 1, the data recovery time increases as the group size increases and/or as the data size increases, which accords with our expectations. If the number of data blocks is 10, the recovery time for 1 GB of data is 70 s in the case of g = 10. This time is acceptable compared to the cost of purchasing more virtual machines for increased security.
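The dependence of the recovery time on the group size and the data size can be probed with a simple timing harness such as the one below. This is a rough sketch under assumptions: the function names `xor_int` and `time_recovery` are hypothetical, the XOR is performed on Python integers, and the absolute timings depend on the machine and are not expected to match Table 1.

```python
import os
import time


def xor_int(chunks):
    """XOR equal-length byte strings via arbitrary-precision integers."""
    size = len(chunks[0])
    acc = 0
    for chunk in chunks:
        acc ^= int.from_bytes(chunk, "big")
    return acc.to_bytes(size, "big")


def time_recovery(block_size, g, repeats=3):
    """Return the best wall-clock time (seconds) to rebuild one lost block."""
    rs = os.urandom(block_size)
    group = [os.urandom(block_size) for _ in range(g)]
    backup = xor_int([rs] + group)       # XOR-encrypted backup of the group
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        rebuilt = xor_int([backup, rs] + group[1:])  # recover group[0]
        best = min(best, time.perf_counter() - start)
    assert rebuilt == group[0]           # sanity check on correctness
    return best


if __name__ == "__main__":
    for g in (2, 5, 10):
        print(g, time_recovery(block_size=1 << 20, g=g))
```

Each recovery XORs g + 1 operands of one block each, so the work grows with both the group size and the block size, in line with the trend reported above.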

Conclusions
As one of the most dangerous types of attack in the cloud environment, co-resident attacks pose a substantial threat to user data. The P&XE scheme effectively reduces the storage overhead of users while increasing the security and survivability of user data through data partitioning and XOR-encrypted backup. In the P&R scheme, increasing the security of data requires the maximization of the number of data blocks, which may reduce the data survivability. In contrast, increasing the survivability of data requires the maximization of the number of copies of each data block, which, in turn, reduces the data security. Maximizing the number of blocks and increasing the number of copies per block of data both increase the user's storage overhead. The P&XE scheme compensates for the insufficiency of the P&R scheme in balancing data security and data survivability and, at the same time, reduces the user's storage overhead. The experimental results demonstrate that the P&XE scheme reduces the user's overhead and improves the security and survivability of user data.