- Research
- Open access

# RETRACTED ARTICLE: IoT devices and data availability optimization by ANN and KNN

*EURASIP Journal on Information Security*
**volume 2024**, Article number: 2 (2024)

## Abstract

Extensive research has been conducted to enhance the availability of IoT devices and data by focusing on the rapid prediction of instantaneous fault rates and temperatures. Temperature plays a crucial role in device availability because it significantly affects equipment performance and lifespan. It is a vital indicator for predicting equipment failure, and effective temperature management improves availability and efficiency. In the proposed optimization scheme for IoT device and data availability, the artificial neural network (ANN) algorithm and the K-Nearest Neighbours (KNN) algorithm are used together. A preliminary algorithm for availability optimization is chosen, and the target is divided into two parts: data optimization and equipment optimization. Suitable models are constructed for each part, and the KNN-driven neural network algorithm is employed to solve the proposed optimization model. The verification results demonstrate the effectiveness of the proposed scheme. Compared with the benchmark method, the availability forward fault-tolerant method, and the heuristic optimization algorithm, the maximum temperature was reduced by 2.0750 °C, and the average availability of IoT devices improved by 27.03%, 15.76%, and 10.85%, respectively. The deadline fulfillment rates of these three comparison methods were 100%, 87.89%, and 84.4%, respectively. The optimization algorithm is efficient at eliminating fault signals and at improving the prediction of deadline satisfaction. Furthermore, it exhibits strategic foresight in the decision-making process.

## 1 Introduction

An effective way to optimize Internet of Things (IoT) device and data availability, and to analyze them accurately and in a timely manner, is to estimate their availability with artificial neural network (ANN) algorithms. This approach can help the support team save on maintenance costs while carrying out daily maintenance [1, 2]. To optimize availability, the mean time to failure (MTTF) of IoT devices, the mean squared error (MSE) of IoT data, and the accuracy of unstructured data are analyzed, and missing values are estimated using the Manhattan distance [3]. However, most current ANN models suffer from problems such as long iterations, large mean squared errors between estimated and actual values, and high average peak temperatures, which considerably affect the maintenance team's cost control [4]. Therefore, different algorithms should be used depending on the situation. For instance, for the IoT device and data availability optimization problem, ANN feedback control-driven models can be selected as the preliminary algorithm [5, 6]. Subsequently, the data is divided into training data and testing data, and optimization is based on the iterative model with the aim of achieving both data and device availability. A calculation example is used to analyze real IoT data, employing the K-nearest neighbor (KNN) model and establishing the relevant mathematical algorithms and optimization conditions. The proposed optimization model is solved on the AQ and Wafer datasets, ultimately achieving the goal of reducing construction period and cost. The results of this study have significant value for multiple industries and applications. Firstly, the manufacturing industry can greatly benefit from our findings. The availability of IoT devices and data is crucial for achieving intelligent manufacturing and improving production efficiency. 
By predicting the instantaneous failure rate of the system, potential problems can be identified in advance, thereby reducing downtime and improving production efficiency. Secondly, the healthcare industry may also benefit from our study. IoT devices play a critical role in health monitoring and disease diagnosis. By optimizing the availability of equipment and data, the quality and efficiency of healthcare services can be improved. In addition, the energy and environmental protection industries may also benefit from our findings. IoT devices play a crucial role in energy management and environmental monitoring. By improving the availability of equipment and data, energy efficiency and environmental protection can be improved. Lastly, the retail and e-commerce industries may also benefit from our study. IoT devices and data play an important role in inventory management and customer behavior analysis. By optimizing the availability of equipment and data, the efficiency and effectiveness of commercial operations can be improved. Overall, optimizing the availability of IoT devices and data can bring significant value to various industries and applications. However, it is important to note that each application scenario may have specific needs and challenges that require customized adjustments to optimization strategies. The research will be conducted in four parts. The first part is an overview of IoT devices and data availability optimization based on ANN algorithms. The second part is research on IoT data and device optimization based on artificial neural algorithms. The third part is the experimental verification of the second part. The fourth part is a summary of the research content and points out shortcomings.

## 2 Related works

Currently, there is limited research on the application of the KNN algorithm to IoT devices and data. Cui et al. proposed a novel machine learning method based on the KNN algorithm, which exhibited significant speed improvements (5–30 times faster) compared to traditional methods such as ANN and Bayesian optimization. This method also enabled optimization with reduced dataset sizes. The effectiveness and efficiency of this approach were verified through experiments involving four different antenna instances, resulting in a satisfactory optimal antenna design at a minimal cost [7]. In response to the increasing cloud service advertisements, Alkalbani et al. highlighted the need for effective interaction between the cloud service market and consumers. Existing literature often focused on algorithm development while assuming the availability of cloud service information, neglecting the importance of its effective discovery on the internet. To address this gap, the researchers proposed a framework for cloud service discovery, considering three metrics—accuracy, recall, and F-score, as benchmarks. Machine learning methods including KNN, decision trees, and naive Bayesian algorithms were employed to evaluate accuracy. Experimental results demonstrated the applicability and efficiency of the proposed framework for effective cloud service discovery [8]. Secor presented strategies to accelerate molecular simulations using ANN. By creating an ANN propagator and implementing it in one-dimensional and two-dimensional proton transfer systems, the study showcased nuclear quantum effects, including hydrogen tunneling effects. The layered, multi-time step algorithms enabled parallelization and scalability to higher dimensions, proving valuable in quantum dynamics simulations across various chemical and biological processes [9]. Sahu et al. 
proposed an economical and dense spatiotemporal air quality monitoring system that offered enhanced mobility and lower maintenance costs by utilizing low-cost sensors. A novel local non-parametric calibration algorithm, based on metric learning, was introduced, leading to a notable improvement of 4–20% in the R-2 value compared to conventional non-parametric methods. The experiment outcomes provided valuable insights into the benefits and limitations of these sensors, demonstrating their potential as a complement to existing regulatory monitoring networks [10]. Rianjanu et al. presented a straightforward and efficient approach for estimating the sensitivity of gas sensors based on solubility and vapor pressure. Quartz crystal microbalance sensors coated with polyvinyl acetate nanofibers were employed for empirical testing. Chemometrics technology, including the KNN machine learning algorithm, was utilized to develop prediction models. This method accurately predicted sensor sensitivity and also offered a means for selecting appropriate sensing materials [11].

Liu et al. designed and demonstrated a graphene comparator capable of directly calculating absolute distances using the zero bandgap and hole transmission characteristics of graphene. Ferroelectric hafnium zirconium oxide double gate graphene transistors were fabricated as the fundamental units of these comparators. By employing the KNN algorithm, the accuracy of the graphene comparator array could surpass 80%. These ferroelectric graphene comparators held great potential for wide applications in robots, safety systems, autonomous vehicles, and sensor networks [12]. In order to address the limitation of insufficient local feature extraction in three-dimensional semantic segmentation, Luo et al. proposed a KNN-based structured model for a three-dimensional semantic segmentation network that directly processes scattered point clouds. Experimental results demonstrated that this approach effectively improved the accuracy of semantic segmentation by solving the issue of inadequate local feature extraction [13]. Cloud data centers serve as key infrastructures that directly impact service delivery reliability. The IT architecture of these data centers, which acts as the carrier for cloud computing services, plays a critical role in determining service reliability. However, most existing research primarily focuses on the connectivity between IT architecture and services and overlooks the processing process. To bridge this research gap, a hierarchical Colored Generalized Stochastic Petri Nets (CGSPN) approach is proposed to describe and comprehend the processing process. This approach takes into consideration not only the processing process but also the availability of equipment and data, making it more profound compared to previous research methods. Additionally, it offers flexibility by adjusting optimization strategies based on real-world situations, rather than relying solely on pre-set simulation processes. 
Moreover, this research method incorporates the latest advancements in ANNs and KNN technology, making it cutting-edge and enhancing optimization effectiveness [14].

In summary, scholars and scientists have made significant advancements in the fields of K-nearest neighbors (KNN) and optimization of IoT data and devices. Numerous improved algorithms have been developed to enhance the efficiency of dataset processing and optimization. However, there are still certain limitations in current research. For example, these approaches often fail to account for real-time changes in IoT data, resulting in suboptimal performance. Furthermore, many algorithms overlook the complex nature of IoT devices, leading to inefficient optimization strategies. Considering the impressive data processing capabilities of the KNN algorithm and the identified shortcomings in existing IoT data and device optimization methods, leveraging this approach to optimize the availability of IoT data and devices holds great promise in IoT platform architecture. The proposed work aims to address the aforementioned limitations by incorporating real-time data analysis and considering the multifaceted nature of IoT devices in the optimization process. This comprehensive approach will result in more accurate and efficient optimization, ultimately enhancing the overall performance of IoT systems.

## 3 IoT devices and data availability optimization based on ANN and KNN

The optimization of IoT device and data availability is achieved through the utilization of ANNs and KNN. Equipment and data availability are improved by employing intelligent models for accurate monitoring and prediction. Through these models, equipment failure and data corruption can be predicted, enabling early warning and proactive maintenance. Simultaneously, extensive analysis and processing of large volumes of complex IoT data are conducted to enhance data accuracy and integrity. This approach enhances the efficiency and stability of equipment operations, while also establishing a robust foundation for efficient data utilization and security.

### 3.1 Availability model based on IoT devices and data

According to the architecture of the IoT system, the computing power of a single-processor device model is inherently limited. Therefore, a multi-processor IoT device that integrates a GPU and CPUs is considered, and its basic architecture is shown in Fig. 1.

In Fig. 1, \({\rho }_{G}\) represents the GPU, which supports multiple independent discrete operating frequencies, and \({\rho }_{M}\) represents the \(M\) homogeneous CPUs. Ensuring the long-term trouble-free operation of such IoT devices has become one of the focuses of academia and industry in recent years [14]. The availability of IoT devices is evaluated, based on the degree of damage, using the Mean Time To Failure (MTTF) of the devices, as shown in Eq. (1).

In Eq. (1), \({T}_{\mathrm{exp}} \left(\Gamma \right)\) is the error reporting time of the independent tasks in the task set \(\Gamma\), \({E}_{i} \left(1\le i \le N\right)\) is the running time of the task \({\Gamma }_{i}\), and \({T}_{\mathrm{exe}} \left(\Gamma \right)\) is the total running time of the \(N\) independent tasks. Hot-electron gate current in transistors tends to create low-impedance paths, which lead to permanent failures [15]. The \({MTTF}_{p}\) equation related to permanent errors is therefore established, as shown in Eq. (2).

In Eq. (2), \({A}_{TDDB}\) is the fitting constant, \(\nu\) is the operating voltage, \(T\) is the temperature, and \({\nu }_{1}\), \({\nu }_{2}\), \(\rho\), \(A\), \(B\), and \(C\) are empirical fitting constants. The availability of IoT devices is shown in Eq. (3).

In Eq. (3), \({MTTF}_{T}\) is the mean time to failure for transient errors, and \({MTTF}_{p}\) is the mean time to failure for permanent errors. \({MTTR}_{T}\) is the mean time to repair for transient errors, and \({MTTR}_{p}\) is the mean time to repair for permanent errors. Because sensors are prone to mechanical and electrical failures, the mean squared error (MSE) between the estimated and original values of IoT data is calculated, as shown in Eq. (4).
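
Eq. (3) is not reproduced in this text, so the following is only a minimal sketch of how the four quantities could combine. It assumes the standard steady-state definition, availability = MTTF / (MTTF + MTTR), and it further assumes that transient and permanent failures are independent, so that the two availabilities multiply. The function names and the example figures are hypothetical and are not taken from the study.

```python
def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: the fraction of time the device is up."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


def combined_availability(mttf_t: float, mttr_t: float,
                          mttf_p: float, mttr_p: float) -> float:
    """Availability under transient (T) and permanent (p) failures.

    Assumption (not from the source): the two failure modes are independent,
    so the device is available only when it is up with respect to both.
    """
    return availability(mttf_t, mttr_t) * availability(mttf_p, mttr_p)


if __name__ == "__main__":
    # Hypothetical figures in hours, for illustration only.
    print(combined_availability(900.0, 100.0, 9900.0, 100.0))
```

Under these assumptions, shortening either repair time or lengthening either time to failure raises the combined availability, which is the lever the temperature management in this section acts on.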

In Eq. (4), \({\widehat{x}}_{mn}\) is the estimate of \({x}_{mn}\) produced by the IoT data availability scheme; if \({x}_{mn}\) is not erroneous, then \({\widehat{x}}_{mn}= {x}_{mn}\). The accuracy of unstructured IoT data is given by Eq. (5).

In Eq. (5), \({M}_{cor}\) represents the number of correctly classified data samples, and \({M}_{d}\) represents the total number of samples. The instantaneous error rate is easily affected by attenuation factors, because transient faults can be masked or eliminated as they propagate, as shown in Fig. 2.
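
The two data-quality measures described for Eqs. (4) and (5) can be sketched as follows. This is an illustration under stated assumptions: the MSE is taken as the plain average of squared differences over all entries of the data matrix, and the accuracy as the ratio \({M}_{cor}/{M}_{d}\). The function names and the sample values are hypothetical.

```python
def mse(estimated, original):
    """Mean squared error over all entries of two equally shaped matrices."""
    if len(estimated) != len(original):
        raise ValueError("matrices must have the same number of rows")
    total, count = 0.0, 0
    for row_hat, row in zip(estimated, original):
        if len(row_hat) != len(row):
            raise ValueError("rows must have the same length")
        for x_hat, x in zip(row_hat, row):
            total += (x_hat - x) ** 2
            count += 1
    return total / count


def accuracy(m_cor: int, m_d: int) -> float:
    """Share of correctly classified samples, M_cor / M_d."""
    if m_d <= 0 or not 0 <= m_cor <= m_d:
        raise ValueError("need 0 <= m_cor <= m_d and m_d > 0")
    return m_cor / m_d


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]       # original values (illustrative)
    x_hat = [[1.0, 2.0], [3.0, 5.0]]   # estimates; only one entry differs
    print(mse(x_hat, x), accuracy(45, 50))
```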

Figure 2 shows the entire process of a representative instantaneous fault propagating between layers. Some transient faults are masked or eliminated by signal attenuation, whereas others are transmitted from the component level to the system level in the form of data or commands and cause terminal failures.

### 3.2 IoT data and devices optimization based on ANN

It is crucial to design an ANN-based method to identify the instantaneous fault rate and temperature of IoT devices, given the high timeliness requirements of the IoT [16]. As a classic machine learning solution, ANN is characterized by universality and accuracy. A classic ANN architecture diagram is shown in Fig. 3.

In Fig. 3, the hidden layer of the ANN sits between the input and output layers, and each layer has one or more neurons. The nodes in adjacent layers are connected through weight factors. The output value is propagated through the neurons using the weight factors and the deviation (bias) values of the neuron nodes. The number of neuron nodes in the hidden layer is set to \({n}_{hid}\), and the output of node \(j\) is shown in Eq. (6).

In Eq. (6), \({\omega }_{ij}\) and \({b}_{ij}\) represent the weight factor and the deviation between the \(i\)th and the \(j\)th node, respectively. Both can be trained and adjusted as required. The error \(err\) is calculated as shown in Eq. (7).

In Eq. (7), \({n}_{out}\) is the number of neuron nodes in the output layer, \({G}_{i}\) is the true value of the \(i\)th output node, and \({O}_{i}\) is the value computed by the ANN for the \(i\)th output node. During backpropagation, the loss is propagated from the output layer back through the hidden layer toward the input layer, and accuracy improves as the trainable parameters are adjusted continuously. The instantaneous fault rate is calculated for the critical value of the system, as shown in Eq. (8).
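
Because Eqs. (6) and (7) are not reproduced here, the sketch below only illustrates the general computation they describe: a weighted sum plus a deviation term passed through an activation, and an error between true values \({G}_{i}\) and outputs \({O}_{i}\). The sigmoid activation, the use of one bias per receiving node, and the averaging over \({n}_{out}\) are assumptions made for illustration, not details confirmed by the text.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation (an assumed choice; the source does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(inputs, weights, biases):
    """Outputs of one layer.

    weights[i][j] is the weight factor from input node i to node j;
    biases[j] is the deviation of node j (assumption: one bias per node).
    """
    outputs = []
    for j in range(len(biases)):
        z = sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        outputs.append(sigmoid(z))
    return outputs


def output_error(true_values, outputs):
    """Mean squared difference between true values G_i and ANN outputs O_i."""
    n_out = len(outputs)
    return sum((g - o) ** 2 for g, o in zip(true_values, outputs)) / n_out


if __name__ == "__main__":
    hidden = layer_output([0.2, 0.7], [[0.5, -0.3], [0.1, 0.8]], [0.0, 0.1])
    out = layer_output(hidden, [[1.0], [-1.0]], [0.0])
    print(out, output_error([1.0], out))
```

Training would repeatedly adjust `weights` and `biases` to reduce `output_error`, which is the role backpropagation plays in the description above.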

In Eq. (8), \({N}_{com}\) is the total number of component types, \(\alpha\) is the proportion of type \(i\) components, \({Q}_{C}^{sys}\) is the charge threshold of IoT devices, \({Q}_{ci}\) is the charge threshold of components in the device, \({n}_{train}\) is the number of samples, \({n}_{in}\) is the number of input layer neuron nodes for training data, and \({n}_{out}\) is the number of output layer neuron nodes. The corresponding tuning and calculation are carried out for each variable, such as the temperature, voltage, neutron flux, and charge threshold of the IoT equipment. The charge threshold is the critical charge value: if it is exceeded, a transient fault occurs. The temperature can be calculated once the ANN has been trained.

Closed-loop control has been studied extensively for many practical problems. An open-loop system requires precise knowledge of every detail of the system, whereas a closed-loop system does not need a comprehensive model of the entire system, because feedback compensates for discrepancies between the estimated and the actual behaviour. Consequently, for IoT systems characterized by high levels of uncertainty, robustness plays a crucial role, as depicted in Fig. 4.

Figure 4 illustrates a device that operates under feedback control, making it particularly well-suited for in-depth analysis. This device comprises several components, including a PID (proportional integral derivative) controller, TA controller, TB controller, and EDF scheduler. These elements collectively govern the temperature of IoT data and devices, as well as the processor’s utilization efficiency. Subsequently, they provide feedback on the control scheduling scheme. The EDF scheduler plays a pivotal role in determining task and replica scheduling, while the primary controller focuses on optimizing IoT availability. The PID controller ensures feedback control by computing the discrepancies between the desired setpoints and measured variables, using proportional, integral, and derivative calculations. The TB controller adjusts processor utilization by adding or removing replicas for tasks. On the other hand, when the TB controller is unable to process certain tasks, the TA controller manages the utilization of remaining processors by controlling the number of IoT tasks themselves [17]. Furthermore, the feedback control structure is defined in Eq. (9).
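
As an illustration of the scheduling element of this structure, the sketch below shows the earliest-deadline-first (EDF) rule and the deadline miss rate it produces, which is the quantity the feedback loop monitors. It is a simplified, non-preemptive model in which all tasks are released at time zero; the task tuples and the function name are hypothetical, and the scheduler used in the study may differ.

```python
import heapq


def edf_schedule(tasks):
    """Run tasks in earliest-deadline-first order.

    tasks: iterable of (name, execution_time, deadline), all released at t = 0.
    Returns the execution order and the fraction of tasks that missed
    their deadline.
    """
    heap = [(deadline, name, exec_time) for name, exec_time, deadline in tasks]
    heapq.heapify(heap)  # the smallest deadline is always popped first
    clock = 0.0
    order, missed = [], []
    while heap:
        deadline, name, exec_time = heapq.heappop(heap)
        clock += exec_time
        order.append(name)
        if clock > deadline:
            missed.append(name)
    miss_rate = len(missed) / len(order) if order else 0.0
    return order, miss_rate


if __name__ == "__main__":
    print(edf_schedule([("a", 2.0, 10.0), ("b", 1.0, 2.0), ("c", 3.0, 4.0)]))
```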

In Eq. (9), \(\Delta U\) is the change in processor utilization, \(\Delta Err\left(t\right)\) is the difference between the deadline miss rate threshold set by the system and the current deadline miss rate of the system, \({C}_{p}\), \({C}_{I}\), and \({C}_{D}\) are the coefficients of the PID controller, \(IW\) is the time window over which the errors are summed, and \(DW\) is a second time window, used for the derivative term.
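
Eq. (9) is not reproduced in this text. A minimal sketch of a discrete PID law that is consistent with the symbols above is given here, assuming the common form in which the integral term sums the errors over the last \(IW\) sampling periods and the derivative term is the error change over the last \(DW\) periods. The class name, the sign convention (threshold minus measured miss rate), and the gains in the example are assumptions for illustration.

```python
from collections import deque


class PIDUtilizationController:
    """Discrete PID controller returning a utilization adjustment (delta U)."""

    def __init__(self, c_p: float, c_i: float, c_d: float, iw: int, dw: int):
        if iw < 1 or dw < 1:
            raise ValueError("window lengths must be at least 1")
        self.c_p, self.c_i, self.c_d = c_p, c_i, c_d
        self.iw, self.dw = iw, dw
        # Keep only as much history as the two windows need.
        self.errors = deque(maxlen=max(iw, dw + 1))

    def update(self, miss_rate_threshold: float, measured_miss_rate: float) -> float:
        # Assumed sign convention: a negative error means too many misses.
        err = miss_rate_threshold - measured_miss_rate
        self.errors.append(err)
        history = list(self.errors)
        integral = sum(history[-self.iw:])
        if len(history) > self.dw:
            derivative = (err - history[-1 - self.dw]) / self.dw
        else:
            derivative = 0.0  # not enough samples yet
        return self.c_p * err + self.c_i * integral + self.c_d * derivative


if __name__ == "__main__":
    ctrl = PIDUtilizationController(c_p=0.5, c_i=0.25, c_d=0.5, iw=2, dw=1)
    for measured in (0.5, 0.25, 0.0):
        print(ctrl.update(0.0, measured))
```

With this sign convention, a miss rate above the threshold yields a negative adjustment, so the controller lowers the requested utilization until deadlines are met again.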

### 3.3 IoT data and devices optimization based on KNN

The research on IoT data and device optimization based on multiple datasets is planned to be conducted from three aspects, as shown in Fig. 5.

Figure 5 presents an overview of the data association-driven structured data availability optimization scheme for IoT, aimed at enhancing the availability of associated structured data in the IoT ecosystem. The scheme encompasses several key components. Firstly, a novel method for evaluating the initial values of IoT structured data with missing and unreliable information is proposed based on KNN. This method utilizes KNN to assess the initial values, ensuring more accurate evaluations. Secondly, an availability optimization algorithm is developed specifically for IoT's structured data. Building upon the initial values obtained from the iterative correction algorithm, this algorithm further improves the accuracy of the values. Additionally, a new independent judgment mechanism is introduced to validate the evaluation results independently. The research project contributes to efficiently addressing the challenge of optimizing the usability of IoT’s structured data. The efficacy of the proposed scheme is evaluated through comprehensive simulation tests, enabling the verification of the research outcomes. The effectiveness of the scheme is then validated by comparing the results of simulation experiments with real-world measured data.

The KNN algorithm first standardizes all data in the dataset \(X\) and then estimates the missing (low-availability) values of a sample from the weighted average of its \(K\) closest samples. The standardization is given by Eq. (10).

In Eq. (10), \({x}_{mn}^{*}\) is the standardized data, and \({\mu }_{n}\) is the mean of the high availability data in column \(n\) of observation data \(X\), as shown in Eq. (11).

By using Eq. (11), the “Manhattan distance” between missing samples and other samples in the dataset can be determined, as shown in Eq. (12).

In Eq. (12), \({x}_{i}^{*}\) is the sample to be estimated, and \({x}_{in}^{*}\) and \({x}_{jn}^{*}\) contribute to \(d\left({x}_{i}^{*}, {x}_{j}^{*} \right)\) only when neither \({x}_{in}\) nor \({x}_{jn}\) is an outlier. In other words, if \({x}_{in}^{*}\) or \({x}_{jn}^{*}\) contains low availability data, that pair is not used as a reference for availability optimization [18]. The missing values are then estimated from the standardized data, as shown in Eq. (13).

In Eq. (13), the set \({\theta }_{i}\) of the \(K\) nearest neighbours of \({x}_{i}^{*}\) is determined by the Manhattan distance. From this, the distance-related weights are calculated to obtain Eq. (14).

In Eq. (14), \({x}_{in}^{*}\) is the normalized value, and \({\widehat{x}}_{in}\) is the estimate of the missing value obtained after affine transformation. Once a value has been estimated, it is no longer treated as missing, and it is used when estimating other missing values of the same sensor or of other sensors. The KNN-based availability optimization method is simple and effective, but its estimation accuracy is limited, so the initial estimate is further refined iteratively using orthogonal matching pursuit, as shown in Eq. (15).
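
The KNN estimation steps described for Eqs. (10) to (14) can be sketched as follows. Since the equations are not reproduced here, several details are assumptions: standardization is taken as a z-score over the observed entries of each column, neighbour weights are taken as inverse Manhattan distances, and missing entries are represented by `None`. All function names are hypothetical, and the orthogonal matching pursuit refinement of Eq. (15) is not included.

```python
def column_stats(data):
    """Mean and standard deviation of the observed (non-None) entries per column."""
    stats = []
    for n in range(len(data[0])):
        observed = [row[n] for row in data if row[n] is not None]
        if not observed:
            stats.append((0.0, 1.0))  # nothing observed: leave the scale neutral
            continue
        mu = sum(observed) / len(observed)
        var = sum((x - mu) ** 2 for x in observed) / len(observed)
        stats.append((mu, var ** 0.5 or 1.0))  # guard against constant columns
    return stats


def manhattan(a, b):
    """Manhattan distance over the columns observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum(abs(x - y) for x, y in shared)


def knn_impute(data, k=2, eps=1e-9):
    """Fill None entries with a distance-weighted average of the k nearest rows."""
    stats = column_stats(data)
    std = [[None if x is None else (x - mu) / sigma
            for x, (mu, sigma) in zip(row, stats)] for row in data]
    result = [list(row) for row in data]
    for i, row in enumerate(std):
        for n, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(std):
                if j == i or other[n] is None:
                    continue
                d = manhattan(row, other)
                if d is not None:
                    candidates.append((d, other[n]))
            if not candidates:
                continue  # no usable neighbour; leave the entry missing
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            weights = [1.0 / (d + eps) for d, _ in nearest]
            est = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
            row[n] = est  # the estimate is reused for later missing values
            mu, sigma = stats[n]
            result[i][n] = est * sigma + mu  # back to the original scale
    return result


if __name__ == "__main__":
    readings = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.0, None]]  # illustrative
    print(knn_impute(readings, k=1))
```

Distances are computed only over columns observed in both samples, mirroring the rule that low availability entries are not used as a reference.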

In Eq. (15), \({c}_{j}\) is the \(j\)th column of \(X\), and \({\widehat{c}}_{j}\) is the corresponding column of \(\widehat{X}\). Because data with high availability have no missing values, \(\sum_{{c}_{j}\in U}{c}_{j}{w}_{j}\) is a constant. Since \(w\) is known in each iteration and each column is independent, the availability-optimized data can be obtained from the orthogonal matching pursuit iterations once the residual matrix has been processed [19, 20]. The flow chart of the system is shown in Fig. 6.

Figure 6 introduces a novel method for assessing the initial values of IoT structured data based on KNN. This method enables an initial valuation of IoT structured data that may contain missing or unreliable information. Subsequently, algorithms are developed to optimize the availability of structured data in the IoT domain, enhancing the accuracy of the estimated values by refining the initial estimates. Additionally, a new independent judgment mechanism is incorporated to ensure unbiased evaluation and validity testing of the optimized results. Throughout these steps, the effectiveness of the proposed approach is validated through simulation tests and a comparative analysis with real-world data. The selection of ANN and KNN as the optimization algorithms is based on their respective advantages. ANN, inspired by the human brain’s neural networks, can effectively handle the diversity and complexity of IoT devices and data [21,22,23]. Its self-learning capability makes it well-suited for processing large-scale data efficiently. On the other hand, the KNN algorithm is an instance-based learning method, particularly suitable for handling data with high variability and uncertainty. Its simplicity facilitates implementation and interpretation. Although other algorithms such as decision trees and support vector machines also offer advantages, ANN and KNN have distinct strengths in managing the complex and dynamic nature of IoT devices and data. It is important to acknowledge that no single algorithm can excel in all scenarios, and exploring and implementing alternative algorithms may be beneficial in the future.

## 4 IoT data and device analysis based on ANN and KNN

In this section, extensive simulation experiments were conducted to evaluate the effectiveness of the proposed method in enhancing the availability of IoT devices and data. The project initially employed ANN to predict the transient failure rate and temperature of IoT devices. Based on these predictions, the proposed method was assessed from various perspectives, including temperature, reliability, the meeting of task deadlines, and availability of IoT devices. Through experimental analysis, the proposed method successfully optimized the correctness and reliability of IoT's structured data. To validate the correctness of IoT’s structured data, particularly when handling missing values and outliers, six open-source datasets were utilized. These datasets served as benchmarks to verify the accuracy and robustness of the proposed method. Overall, these simulation experiments provided substantial evidence regarding the efficacy of the proposed method in enhancing the availability and performance of the device. The thorough evaluation and validation approach adopted in this project contributes to the reliability and credibility of the research outcomes.

### 4.1 IoT devices availability optimization based on ANN

The simulation experiment was conducted on a device equipped with a 2.4-GHz Intel i7 quad-core processor and 8 GB of DDR4 memory, using the Windows versions of MATLAB x64 and OMNeT++. A task scheduling process was simulated using OMNeT++ and MATLAB x64, and the changes in neuron nodes in training and testing data are shown in Fig. 7.

Figure 7a shows that, as the number of hidden layer neuron nodes in the ANN varied, the MSE of transient fault rate prediction on both the training and the testing data showed a decreasing trend. When the number exceeded 30, the ANN overfitted. As shown in Fig. 7b, for the test data, the MSE between the true and estimated values of temperature increased when the number exceeded 39. To avoid overfitting, the number of nodes used for temperature estimation was set to 380, and the instantaneous failure rate of IoT devices is shown in Fig. 8.

Figure 8a is an example in which the SPICE simulator was used to obtain instantaneous fault rate data for 1000 IoT devices; 800 of these samples were used to train the neural network, and the remaining 200 samples were used for the experiment. After training, the maximum error between the actual failure rate and the predicted result was 2.51%, and the minimum error was 0%. During the experiment, the maximum instantaneous failure rate was 2.88% and the minimum value was 0%. The maximum difference between the actual data in Fig. 8b and the estimated data obtained through this neural network was 1.47%, and the minimum difference was 0%. In the actual data, the maximum and minimum values were only 1.79% and 0%, respectively.
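
The evaluation protocol described above, an 800/200 split followed by the maximum and minimum prediction error, can be sketched as below. The shuffling with a fixed seed and the function names are assumptions for illustration; the study does not state how the 800 training samples were chosen, and the values in the example are synthetic placeholders rather than experimental data.

```python
import random


def train_test_split(samples, n_train, seed=0):
    """Shuffle a copy of the samples and split it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    return shuffled[:n_train], shuffled[n_train:]


def error_range(actual, predicted):
    """Minimum and maximum absolute error between actual and predicted rates."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return min(errors), max(errors)


if __name__ == "__main__":
    device_ids = range(1000)  # stands in for the 1000 simulated devices
    train, test = train_test_split(device_ids, n_train=800)
    print(len(train), len(test))
    # Synthetic fault rates, used only to show how the error range is computed.
    print(error_range([0.010, 0.020, 0.030], [0.010, 0.025, 0.020]))
```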

The proposed scheme was compared with the benchmark method NBK and two advanced benchmark availability optimization methods SR and EA. Firstly, NBK (No Backup) is a method of optimizing device availability in the event of instantaneous and permanent failures of IoT devices. Therefore, the NBK scheme is considered the benchmark method. Shared Recovery (SR) is a forward fault-tolerant method that utilizes idle time in IoT systems to improve the availability of IoT devices. All tasks share a replica to improve the availability of IoT devices. Evolutionary Algorithm (EA) uses biological evolution heuristic algorithms to search for optimal solutions for tasks and their number of replicas in IoT to improve the average available time of IoT devices and enhance their availability. The analysis results of feedback control are shown in Fig. 9.

In Fig. 9a, the comparison of algorithms for NBK, SR, EA, and optimization models shows that the average system reliability of the optimization model reached 0.7599, which was higher than the performance of NBK (0.2674), SR (0.6287), and EA (0.6294) algorithms. This indicated that the new IoT availability optimization method can improve the availability of terminals, and further optimize the availability of terminals by continuously adjusting the availability of access to them. The completion rate of task deadlines under different temperatures is shown in Fig. 10.

Figure 10 compares the deadline fulfillment rate of the proposed model with that of the three benchmark methods. The new method achieves a deadline fulfillment rate of 100%, which matches or exceeds the rates of the NBK, SR, and EA algorithms (100%, 87.89%, and 84.4%, respectively). Notably, the proposed method executed faster because it did not require additional task replication. The comparison of the processor utilization of IoT devices between the three benchmark methods and the proposed method under different task loads and computing resources is shown in Fig. 11.

Figure 11 shows the impact of the three comparison methods and the proposed method on the processor utilization of IoT devices under different environmental temperatures. Compared with NBK, SR, and EA, the proposed method improves the average availability of IoT devices by 27.03%, 15.76%, and 10.85%, respectively.

### 4.2 IoT data availability analysis based on KNN

The AQ dataset was collected by 48 low-power sensors deployed in a sensor network in Europe, as shown in Fig. 12.

Figure 12 demonstrates that for sample missing rates of 0.1%, 1%, and 10%, both the maximum and minimum variances of the sample are 0. This indicates that the least squares method provides an optimal solution for achieving structured data availability in the IoT. However, as the missing rate increases, estimating all missing data becomes more challenging. Therefore, it is important to develop robust methods to handle higher missing rates and improve estimation accuracy in IoT applications.

Finally, Fig. 13 shows the overfitting problem in the Wafer dataset when the missing rate is set to 10% due to a lack of sample size. When the mean square error was greater and the COST was lower, the algorithm automatically selected the calculation result of the initial value as the final residual value.

## 5 Conclusion

An iterative model based on the ANN and KNN algorithms was used to enhance the availability of IoT devices and data, and several metrics were validated. Using the ANN to predict system fault rates yielded satisfactory outcomes, although overfitting occurred when the number of nodes in the hidden layer exceeded 30. After training, the model exhibited maximum and minimum errors of 2.51% and 0%, respectively. In the experiments, the system fault rates ranged from 0% to 2.88%. For the NBK, SR, EA, and optimization model algorithms, the average system reliability values at 50 °C were obtained as 62.5970 °C, 63.9250 °C, 61.510 °C, and 60.850 °C, respectively. Compared to other schemes, the proposed approach reduced the maximum temperature by 2.0750 °C. The improved method increased average IoT device availability by 27.03%, 15.76%, and 10.85%, respectively. For the Wafer dataset, 141 samples were collected from 22 sensors with a defect rate of 10%. However, when the missing rate exceeded 10%, overfitting became significant due to the limited number of observed samples. Additionally, the model's time complexity was analyzed, considering data collection, training, and testing. Training the model took 6 h, and running it on different hardware configurations required 8 h, so efficiency is a concern when dealing with large datasets.

Although the iterative model has shown promising results in optimizing IoT device and data availability, it has certain limitations. Firstly, computational complexity is an issue, particularly for large-scale data processing, where it can reduce efficiency and waste computing resources. Overfitting may occur when the number of nodes in the hidden layer surpasses 30, which further increases computational complexity and difficulty. Secondly, the model's scalability is limited by the diverse and complex nature of IoT devices and data: different types and scales of IoT devices and data may not yield the expected results. For example, the model performs well for a defect rate of 10% in the Wafer dataset but experiences severe overfitting when the missing rate exceeds 10% due to the limited number of observed samples. Finally, the model might not perform as anticipated under specific conditions. Factors such as cost efficiency are crucial in practical applications, yet the model does not consider them, which may affect its performance and effectiveness.

This study has made preliminary advances in optimizing IoT device and data availability, but future research should address these limitations. Firstly, model optimization is essential, focusing on improving computational complexity, scalability, and performance under specific conditions by exploring more efficient algorithms. Secondly, the model could be applied to other network types, such as social networks and sensor networks. Lastly, known limitations should be addressed, for example by incorporating cost factors into the model to align it with real-world requirements. The findings of this study thus provide guidance for optimizing IoT device and data availability.

## Availability of data and materials

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

## Change history

### 08 May 2024

This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1186/s13635-024-00160-9

## Acknowledgements

None.

## Funding

No funds, grants, or other support was received.

## Author information

### Contributions

Zhiqiang Chen and Zhihua Song: original draft preparation. Tao Zhang: methodology. Yong Wei: writing—review and editing. All authors read and approved the final manuscript.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests.

## Additional information

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Chen, Z., Song, Z., Zhang, T. *et al.* RETRACTED ARTICLE: IoT devices and data availability optimization by ANN and KNN. *EURASIP J. on Info. Security* **2024**, 2 (2024). https://doi.org/10.1186/s13635-023-00145-0

DOI: https://doi.org/10.1186/s13635-023-00145-0