Energy-Aware Hierarchical Data Aggregation and Trust-Based Data Integrity Verification for WSNs

Wireless sensor networks (WSNs) have attracted considerable attention because of their wide range of applications. Sensor nodes are generally equipped with limited power resources and deployed in harsh environments, where replacing these resources is a tedious task. Minimizing energy consumption is therefore a prime task for prolonging network lifetime. To address the challenge of data aggregation, we introduce a novel combined mechanism that performs clustering and trust computation to improve data aggregation. In this scheme, nodes are classified as normal, advanced and super nodes based on their residual energy. The proposed model uses a hierarchical scheme with a new mechanism for determining the optimal number of clusters and selecting cluster heads. After the cluster heads are selected, we apply a trust computation scheme that provides sensing trust, link trust and node trust, where node trust is computed from direct and indirect trust. This trust mechanism is applied hop by hop to maintain data integrity. The experimental study shows that the proposed approach achieves better performance while maintaining the security of the WSN.


Introduction
Nowadays, the demand for wireless sensor networks has increased owing to advances in communication and their widespread use in real-world applications such as medical applications [1], smart homes [2], military applications [3], environmental monitoring [4], traffic administration [5] and many more [6]. A sensor network is composed of a large number of tiny sensor nodes. Depending on the application, these nodes sense and collect data such as temperature, pressure, humidity, light and voltage, and the collected data are generally stored as multidimensional data. Moreover, sensor nodes are resource-constrained and are generally deployed in unattended or even hostile areas where replacing batteries and other resources is not possible. Because of these issues, such networks are not considered a feasible solution in critical application scenarios that require high quality of service and a long network lifetime. Hence, reducing traffic and computation overhead is the primary means of prolonging network lifetime. To deal with the network lifetime issue, energy-aware routing schemes are widely adopted in various real-time systems. These routing protocols are categorized into route-processing, network-structure, network-operation and communication-initiator-based protocols. Figure 1 shows the classification and sub-classification of these routing protocols. The main aim of a routing scheme is to carry data to the destination node. Hierarchical routing has gained significant attention as a way to improve network lifetime. However, these techniques do not consider the quality of the data collected by sensor nodes. Sensor nodes collect data and transmit it to the next hop; during this process, redundant data cause additional energy consumption, which degrades network lifetime.
To overcome this issue, data aggregation is widely adopted; it helps to minimize outliers in the data and reduces the retransmission frequency. Data aggregation combines the data coming from child nodes in an energy-efficient manner using aggregation functions such as SUM, AVG, MIN, MAX and COUNT. These functions reduce data redundancy, which increases network lifetime while ensuring better quality of data collection. Several data aggregation techniques have been reported, such as compressive sensing based data aggregation [7], heuristic approaches [8] and entropy-based data aggregation [9]. Similarly, optimization-based schemes such as particle swarm optimization (PSO) [10], genetic algorithms (GA) [11] and firefly optimization [12] have been adopted to improve aggregation performance. Generally, data aggregation techniques can be classified as structure-based [15,16] and structure-free [17,18] approaches. In structure-based approaches, sensor nodes are organized into structures that collect the data, aggregate it and transmit it to the base station. These structures follow chain-based, tree-based, cluster-based and hierarchical cluster-based models. The superior node is designated as the leader node in chain-based structures, the root node in tree-based structures and the cluster head in cluster-based mechanisms; these nodes are responsible for data collection and aggregation. However, these techniques fail to achieve the desired performance in terms of accuracy, fault tolerance, security and latency [13]. Security and energy consumption are the two major challenges in maintaining the quality of data aggregation, and secure data aggregation has therefore become a prime concern for the research community.
ISSN: 00333077 \ 5638 www.psychologyandeducation.net
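As a minimal illustration of the aggregation functions listed above, the sketch below shows an aggregator (for example, a cluster head) reducing several child readings to a single value per function, so that one value is forwarded instead of every reading. The function name `aggregate` and the sample readings are hypothetical and are not part of the proposed scheme.

```python
from statistics import mean

# Map each aggregation function named in the text to a Python reducer.
AGGREGATORS = {
    "SUM": sum,
    "AVG": mean,
    "MIN": min,
    "MAX": max,
    "COUNT": len,
}


def aggregate(readings, fn):
    """Combine child-node readings at an aggregator node using function `fn`."""
    return AGGREGATORS[fn](readings)


if __name__ == "__main__":
    child_readings = [21.5, 22.0, 21.5, 23.0]  # hypothetical temperature samples
    for fn in AGGREGATORS:
        print(fn, aggregate(child_readings, fn))
```

Only the aggregated value travels upstream, which is where the energy saving comes from; note that SUM and COUNT together are enough to recover AVG at the next level of a hierarchy.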
The main security requirements are integrity, confidentiality, authentication and freshness. We propose a novel combined scheme that performs both clustering and trust computation to enhance the data aggregation process. In the proposed method, nodes are classified as normal, advanced and super nodes based on their residual energy. After cluster head selection, we perform trust computation, which provides sensing trust, link trust and node trust; the node trust is further computed from direct and indirect trust. This trust mechanism is applied hop by hop to maintain data integrity. The rest of the article is organized as follows: Section II presents a brief literature review of existing data aggregation techniques, Section III presents the proposed energy-efficient and secure data aggregation scheme, Section IV describes the experimental analysis, and Section V presents concluding remarks.

Literature Survey
This section presents a brief review of existing techniques for secure and energy-efficient data aggregation in wireless sensor networks. Kang et al. [14] focused on achieving a tradeoff between network delay and energy cost for data aggregation tasks. Duty-cycled WSNs are currently adopted, in which communication and sensing capabilities are periodically switched on and off to minimize energy consumption when a node is idle. This periodic switching complicates data aggregation. To deal with this issue, the authors introduced a distributed delay-efficient data aggregation scheduling (DEDAS-D) approach for duty-cycled WSNs. Haseeb et al. [15] used a structure-based data aggregation mechanism and incorporated security aspects into Internet of Things (IoT) systems built on WSNs. The first phase of this approach performs node clustering based on varying communication radii, which helps to mitigate the energy hole around the base station. In the next phase, the A-star heuristic algorithm is applied to obtain routing paths. Finally, a security scheme based on unbreakable one-time pad (OTP) encryption is applied to protect the communication links. Fang et al. [18] developed a combined scheme focusing on both energy-efficient and secure data aggregation, called cluster-based secure data aggregation (CSDA). This approach is a modified version of CPDA (cluster-based private data aggregation) and SMART, two schemes that suffer from computational complexity and data loss. To overcome these issues, the authors incorporated an intrusion detection scheme to secure the network against sinkhole and selective forwarding attacks. Moreover, the scheme uses data slicing to minimize energy consumption. Using a similar data slicing process, Hua et al. [19] developed an energy-efficient secure data aggregation approach to prevent node compromise in the network.
Their article presented Adaptive Slice-based Secure Data Aggregation (ASSDA), which takes the limited resources of nodes into account. Merad Boudia et al. [20] discussed the importance of WSNs in IoT technology and elaborated on the advantages of data aggregation for managing energy consumption and network security. In these networks, false data injection and impersonation attacks are challenging issues that affect network performance. Generally, data can be verified using either an end-to-end or a hop-by-hop approach. The end-to-end approach verifies data only after they are received at the destination, which leads to the loss of legitimate data. In contrast, the hop-by-hop approach verifies data at each hop, which significantly improves the aggregation process. Hence, the authors presented a hop-by-hop verification scheme that uses the Elliptic Curve ElGamal (ECEG) protocol and message authentication code (MAC) modules to provide security. A distributed computing scheme is then applied for concealed data aggregation. Shobana et al. [21] proposed a cluster-based systematic data aggregation model (CSDAM) for WSNs. In the first phase, the sensor network is created and clusters containing active and sleeping nodes are formed. The cluster head is then selected based on its residual energy level and its geographic distance to the base station, and it acts as the aggregator node. Further, a three-stage data aggregation scheme is presented that uses a threshold to decide which data are aggregated. Hu et al. [22] presented a chain-based privacy-preserving data aggregation scheme. In this approach, the nodes are first arranged in a tree topology, and the leaf nodes connect with other nodes to form a chain topology. To ensure security, the tail node divides its data into fragments, keeps one fragment and distributes the others to neighboring nodes. These nodes inject fake fragments to divert adversaries.
Roslin et al. [23] observed that during the data fragmentation phase, adversaries can inject forged fragments that degrade the quality of aggregation; hence, fragments must be verified to confirm the correctness of the data. To deal with this issue, the authors presented a trust-based data aggregation approach that verifies the integrity of data fragments. The approach constructs a tree based on trust values and encrypts data with a shared symmetric key. The encrypted data are then divided into fragments, and a homomorphic MAC tag is incorporated. After receiving the signed blocks, the aggregator nodes perform SUM aggregation on the data. Boubiche et al. [24] reported that most existing secure data aggregation techniques rely on encryption to protect the data, but key generation and distribution consume additional energy. To overcome this issue, the authors presented a secure data aggregation approach called SDAW (secure data aggregation watermarking-based scheme in homogeneous WSNs). For security, a lightweight fragile watermarking scheme is developed to authenticate the data for aggregation. The links between sensor nodes and aggregator nodes, and between aggregators and the base station, are also secured using watermarking. Qi et al. [25] developed an asymmetric key encryption scheme based on elliptic curve cryptography for WSNs. First, a key generation scheme generates cryptographic keys periodically. Next, a homomorphic encryption scheme is applied so that encrypted data can be aggregated without decryption. Finally, hop-by-hop verification is performed using a rotation MAC generation algorithm.

Proposed Model
This section presents the proposed solution for energy-aware and secure data aggregation, which aims to improve the energy efficiency and reliability of communication. We assume that the network has only one sink node and that the locations of the other sensor nodes can be obtained using our previous mechanism []. We model the sensor network as an edge-weighted graph $G = (V, E)$, where $V$ denotes the vertex set. Each vertex in $V$ represents a sensor node, including the sink node, which is denoted as $s$. Each sensor node operates at a specific power level; the set of power levels is denoted as $P$, and the power level consumed by node $v$ is denoted as $p(v) \in P$. An edge $(u, v) \in E$ exists if $u$ and $v$ are within communication range of each other, and it represents a bidirectional communication link between the two nodes. A data collection request is denoted as $q = (s, t)$, where $s$ is the requesting sink node and $t$ is the data source node. As discussed before, hierarchical network communication is widely adopted, where the cluster head performs data processing, aggregation and transmission tasks. Hence, optimal cluster formation and cluster head selection are important phases for improving network performance. We assume that $N$ sensor nodes are deployed uniformly in a 2D square region. All sensor nodes and the base station are stationary after deployment, and the cluster head (CH) performs data aggregation. We use the first-order radio model to compute the energy required to transmit an $l$-bit data packet over a distance $d$:

$$E_{Tx}(l,d) = \begin{cases} l E_{elec} + l\,\varepsilon_{fs} d^2, & d < d_0 \\ l E_{elec} + l\,\varepsilon_{mp} d^4, & d \ge d_0 \end{cases} \tag{1}$$

where $E_{elec}$ denotes the energy dissipated per bit to operate the transmitter or receiver circuitry, $\varepsilon_{fs}$ is the amplifier energy per bit in the free-space model, and $\varepsilon_{mp}$ is the amplifier energy per bit in the multipath model, which applies beyond the threshold distance $d_0$ given as:

$$d_0 = \sqrt{\frac{\varepsilon_{fs}}{\varepsilon_{mp}}} \tag{2}$$

Similarly, the energy required to receive an $l$-bit packet is given as:

$$E_{Rx}(l) = l E_{elec} \tag{3}$$
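The transmit and receive energy model described above can be sketched as follows. The numeric constants are assumptions: typical values used in LEACH-style studies, included only for illustration, since this section does not state the values used in the simulations.

```python
import math

# Assumed constants (typical first-order radio model values, illustration only)
E_ELEC = 50e-9       # J/bit, transmitter/receiver electronics (E_elec)
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier (eps_fs)
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier (eps_mp)

# Crossover distance d0: the free-space and multipath amplifier terms are equal here.
D0 = math.sqrt(EPS_FS / EPS_MP)


def tx_energy(l_bits: int, d: float) -> float:
    """Energy (J) to transmit an l-bit packet over distance d (metres)."""
    if d < D0:
        return l_bits * E_ELEC + l_bits * EPS_FS * d ** 2
    return l_bits * E_ELEC + l_bits * EPS_MP * d ** 4


def rx_energy(l_bits: int) -> float:
    """Energy (J) to receive an l-bit packet; only the electronics term applies."""
    return l_bits * E_ELEC


if __name__ == "__main__":
    print(f"d0 = {D0:.1f} m")
    for d in (20, 50, 100, 150):
        print(d, tx_energy(4000, d), rx_energy(4000))
```

With these assumed constants $d_0$ is roughly 88 m, and the $d^4$ growth beyond it is why long CH-to-sink links dominate the energy budget in the cluster derivation that follows.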

Optimal number of clusters
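Selecting the optimal number of clusters plays an important role in improving communication performance and minimizing energy consumption. Consider a network area of $M \times M$ square meters in which $N$ sensor nodes are deployed uniformly. The distances from a node to the base station and from a node to its cluster head are denoted $d_{toBS}$ and $d_{toCH}$, respectively. Assuming that the CH-to-BS distance exceeds $d_0$, the energy dissipated by a CH in one round, in which it receives and aggregates an $L$-bit message from each member and forwards the result to the base station, can be given as:

$$E_{CH} = L E_{elec}\left(\frac{N}{k} - 1\right) + L E_{DA}\frac{N}{k} + L E_{elec} + L\,\varepsilon_{mp}\, d_{toBS}^4 \tag{4}$$

where $k$ is the number of clusters, $E_{DA}$ is the energy consumed per bit for data aggregation, and $d_{toBS}$ is the distance between the CH and the base station, computed as:

$$d_{toBS} = \sqrt{(x_{CH} - x_{BS})^2 + (y_{CH} - y_{BS})^2} \tag{5}$$

Similarly, the energy dissipated by a cluster member node is given as:

$$E_{nonCH} = L E_{elec} + L\,\varepsilon_{fs}\, d_{toCH}^2 \tag{6}$$

where the expected squared distance between a cluster member and its CH is given as:

$$E[d_{toCH}^2] = \iint (x^2 + y^2)\,\rho(x,y)\,dx\,dy = \frac{M^2}{2\pi k} \tag{7}$$

where $\rho(x,y)$ denotes the node distribution and $A = M^2$ is the network area. Thus, the energy consumed in each cluster per round can be given as:

$$E_{cluster} \approx E_{CH} + \frac{N}{k} E_{nonCH} \tag{8}$$

Using Eq. (4) and Eq. (6), the total energy consumed per round can be expressed as:

$$E_{total} = k\,E_{cluster} = L\left(2N E_{elec} + N E_{DA} + k\,\varepsilon_{mp}\, d_{toBS}^4 + N\varepsilon_{fs}\frac{M^2}{2\pi k}\right) \tag{9}$$

Setting $\partial E_{total}/\partial k = 0$ gives the optimal number of clusters:

$$k_{opt} = \sqrt{\frac{N}{2\pi}}\sqrt{\frac{\varepsilon_{fs}}{\varepsilon_{mp}}}\,\frac{M}{d_{toBS}^2} \tag{10}$$

Based on the distance and the optimal number of clusters from Eq. (5) and (10), the probability of a node becoming a cluster head is given as:

$$p_{opt} = \frac{k_{opt}}{N} \tag{11}$$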
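The cluster-count optimisation above can be checked numerically. The sketch below assumes the LEACH-style per-round energy model (multipath for CH-to-sink, free space for member-to-CH) and typical constants that are not taken from this paper; it evaluates the total per-round energy as a function of $k$ and the closed-form minimiser $k_{opt}$, with $p_{opt} = k_{opt}/N$. The deployment values are hypothetical.

```python
import math

# Assumed constants (typical LEACH-style values, not taken from this paper)
E_ELEC = 50e-9       # J/bit, electronics
E_DA = 5e-9          # J/bit/signal, data aggregation cost
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier


def total_round_energy(k, n, m_side, d_to_bs, l_bits=4000):
    """Per-round network energy for k clusters in an m_side x m_side field."""
    e_d_to_ch_sq = m_side ** 2 / (2 * math.pi * k)  # expected squared member-to-CH distance
    return l_bits * (2 * n * E_ELEC + n * E_DA
                     + k * EPS_MP * d_to_bs ** 4
                     + n * EPS_FS * e_d_to_ch_sq)


def optimal_clusters(n, m_side, d_to_bs):
    """Closed-form minimiser of total_round_energy over k."""
    return (math.sqrt(n / (2 * math.pi)) * math.sqrt(EPS_FS / EPS_MP)
            * m_side / d_to_bs ** 2)


if __name__ == "__main__":
    n, m_side, d_to_bs = 100, 100.0, 100.0  # hypothetical deployment
    k_opt = optimal_clusters(n, m_side, d_to_bs)
    p_opt = k_opt / n  # probability that a node becomes CH
    print(f"k_opt = {k_opt:.2f}, p_opt = {p_opt:.3f}")  # k_opt = 3.50, p_opt = 0.035
    for k in (1, 2, 3, 4, 5, 8):
        print(k, total_round_energy(k, n, m_side, d_to_bs))
```

Because the energy has the form $ak + b/k$, it is convex in $k$, so the printed energies fall until roughly $k_{opt}$ and rise afterwards.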

CH selection
This scheme categorizes sensor nodes into three types: normal, advanced and super nodes. Nodes with higher energy levels are classified as advanced or super nodes, whereas the remaining nodes are treated as normal nodes. Let a fraction $m$ of the $N$ nodes be advanced nodes, and let a fraction $m_0$ of these advanced nodes be super nodes. The initial energies of normal, advanced and super nodes are denoted as $E_0$, $E_0(1+a)$ and $E_0(1+b)$, respectively. Because of their higher energy, advanced and super nodes should have a higher probability of becoming a cluster head. We therefore assign an optimal weight to the probability $p_{opt}$ of Eq. (11). Let $p_{nrm}$, $p_{adv}$ and $p_{sup}$ denote the weighted election probabilities of normal, advanced and super nodes, respectively. These probabilities can be computed as: (12) Using these probabilities, we derive a new threshold function for selecting a node $s_i$ as CH for each node type. The threshold function for normal nodes is given as:

$$T(s_i) = \begin{cases} \dfrac{p_{nrm}}{1 - p_{nrm}\left(r \bmod \frac{1}{p_{nrm}}\right)}, & s_i \in G' \\ 0, & \text{otherwise} \end{cases} \tag{13}$$

where $r$ denotes the current round, $G'$ is the set of normal nodes that have not been CH in the last $1/p_{nrm}$ rounds, and $1/p_{nrm}$ is the number of rounds in each epoch. Similarly, the thresholds for advanced and super nodes are computed as:

$$T(s_i) = \begin{cases} \dfrac{p_{adv}}{1 - p_{adv}\left(r \bmod \frac{1}{p_{adv}}\right)}, & s_i \in G'' \\ \dfrac{p_{sup}}{1 - p_{sup}\left(r \bmod \frac{1}{p_{sup}}\right)}, & s_i \in G''' \\ 0, & \text{otherwise} \end{cases} \tag{14}$$

where $G''$ and $G'''$ are the sets of advanced and super nodes that have not been CH in their last $1/p_{adv}$ and $1/p_{sup}$ rounds, respectively.

Incorporating security for secure data aggregation
The previous section describes the complete process of data aggregation, in which we select the optimal number of clusters and the cluster heads to maximize network lifetime. Maintaining security during aggregation is, however, a challenging task. To maintain data integrity, we present a trust computation strategy comprising data sensing trust, link trust and node trust.
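The CH selection step for the three node types can be sketched as below. The exact form of the weighted probabilities in Eq. (12) is not recoverable here, so the sketch assumes they are proportional to initial energy and normalised so that the expected number of CHs per round stays $N p_{opt}$; it also assumes the usual LEACH rule that an eligible node becomes CH when a uniform random draw falls below its threshold. All function names are hypothetical.

```python
import random


def weighted_probabilities(p_opt, m, m0, a, b):
    """Assumed energy-weighted election probabilities for the three node types.

    A fraction m of nodes are advanced and a fraction m0 of those are super, with
    initial energies E0, E0(1+a), E0(1+b). Scaling by initial energy and dividing
    by the mean initial energy keeps the expected CH count at N * p_opt.
    """
    mean_energy = (1 - m) + m * (1 - m0) * (1 + a) + m * m0 * (1 + b)  # units of E0
    return {
        "normal": p_opt / mean_energy,
        "advanced": p_opt * (1 + a) / mean_energy,
        "super": p_opt * (1 + b) / mean_energy,
    }


def threshold(p, r, eligible):
    """LEACH-style threshold T = p / (1 - p * (r mod 1/p)) for eligible nodes."""
    if not eligible:
        return 0.0
    epoch = max(1, round(1 / p))  # rounds per epoch for this node type
    return p / (1 - p * (r % epoch))


def elect_cluster_heads(nodes, probs, r, rng):
    """Return ids of nodes elected CH in round r.

    A node is eligible if it has not been CH within the last epoch of its type.
    """
    heads = []
    for node in nodes:
        p = probs[node["type"]]
        epoch = max(1, round(1 / p))
        last = node["last_ch_round"]
        eligible = last is None or r - last >= epoch
        if rng.random() < threshold(p, r, eligible):
            node["last_ch_round"] = r
            heads.append(node["id"])
    return heads


if __name__ == "__main__":
    probs = weighted_probabilities(p_opt=0.1, m=0.2, m0=0.5, a=1.0, b=2.0)
    rng = random.Random(1)  # seeded for repeatability
    types = ["normal"] * 80 + ["advanced"] * 10 + ["super"] * 10
    nodes = [{"id": i, "type": t, "last_ch_round": None} for i, t in enumerate(types)]
    for r in range(5):
        print(r, elect_cluster_heads(nodes, probs, r, rng))
```

Super nodes get the shortest epoch and so can serve as CH most often, which matches the intent that higher-energy nodes carry more of the aggregation load.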

Computing the sensing trust
The sensing trust helps to maintain consistency and fault tolerance in the network. Data sensed by nodes in the same cluster generally exhibit spatio-temporal correlation, meaning that the readings collected in the current cluster are similar. Their probability density function can be described as:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{15}$$

where $\mu$ denotes the mean value and $\sigma^2$ denotes the variance. The closer a reading $x$ is to $\mu$, the higher its trust. For node $i$, the sensing trust value can be computed as: (16) where $x_i$ denotes the value of data sensed by node $i$. Further, to resist data modification attacks, we use the mean absolute deviation.
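The sensing trust described above can be sketched as follows. Eq. (16) is not recoverable from this text, so the sketch assumes the trust of a reading is the Gaussian density at that reading divided by the peak density (1 at the cluster mean, approaching 0 far from it); it also assumes a mean-absolute-deviation screen that flags readings deviating by more than a hypothetical factor of 2 times the MAD.

```python
import math
from statistics import mean, pvariance


def gaussian_pdf(x, mu, var):
    """Normal density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def sensing_trust(readings):
    """Assumed sensing trust: density at each reading divided by the peak density."""
    mu = mean(readings)
    var = pvariance(readings, mu)
    if var == 0:  # all readings identical: every node fully consistent
        return [1.0] * len(readings)
    peak = gaussian_pdf(mu, mu, var)
    return [gaussian_pdf(x, mu, var) / peak for x in readings]


def mad_outliers(readings, factor=2.0):
    """Flag readings whose deviation from the mean exceeds factor * MAD,
    where MAD is the mean absolute deviation (the factor is an assumption)."""
    mu = mean(readings)
    mad = mean(abs(x - mu) for x in readings)
    return [abs(x - mu) > factor * mad for x in readings]


if __name__ == "__main__":
    cluster = [21.0, 21.2, 20.9, 21.1, 35.0]  # hypothetical readings; last is tampered
    print([round(t, 3) for t in sensing_trust(cluster)])
    print(mad_outliers(cluster))
```

A single tampered value inflates $\sigma^2$ and therefore softens its own Gaussian penalty, which is one reason a separate deviation-based check against modification attacks is useful.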

Computing the link trust
The trust value of a communication link is evaluated based on the packet error rate and the packet loss rate. The bit error rate probability $P_b$ is obtained as: (17) Based on this, the packet error rate can be computed as $PER = 1 - (1 - P_b)^{l}$, where $l$ is the number of payload bits in a packet. Similarly, the packet loss rate can be computed as:

$$PLR = 1 - \frac{N_{recv}}{N_{sent}} \tag{18}$$

where $N_{recv}$ denotes the number of packets received successfully and $N_{sent}$ denotes the total number of packets sent. Based on the packet error rate and the packet loss rate, the link quality can be computed as follows: (19)

Computing the node trust
To compute the node trust, we require two parameters, known as direct trust and indirect trust. Direct trust is obtained from a node's own observations, whereas indirect trust is recommended by third-party nodes. The direct trust value is computed as: (20) where $n_s$ denotes the number of successful communications, $E_{res}$ denotes the residual energy level, and $n_f$ denotes the number of unsuccessful transmissions. Similarly, the indirect trust can be computed as: (21) where $n$ denotes the number of recommender nodes and $T_{kj}$ is the trust value of node $j$ recommended by node $k$. Finally, the node trust value can be computed as: (22) where $w$ is the trust weight.
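The link and node trust computations above can be sketched as follows. Only the packet error rate (assuming independent bit errors) and the packet loss rate follow directly from the text; the forms used for link quality (19), direct trust (20), indirect trust (21) and the weighted node trust (22) are assumptions chosen to match the listed inputs, and the default weight `w=0.6` is hypothetical.

```python
def packet_error_rate(ber, payload_bits):
    """PER assuming independent bit errors: 1 - (1 - BER)^bits."""
    return 1 - (1 - ber) ** payload_bits


def packet_loss_rate(received, sent):
    """Fraction of sent packets that were not received."""
    return 1 - received / sent if sent else 0.0


def link_trust(ber, payload_bits, received, sent):
    """Assumed link quality: probability a packet is neither lost nor corrupted."""
    return (1 - packet_error_rate(ber, payload_bits)) * (1 - packet_loss_rate(received, sent))


def direct_trust(successes, failures, residual_energy, initial_energy):
    """Assumed direct trust: beta-reputation expectation scaled by remaining energy."""
    reputation = (successes + 1) / (successes + failures + 2)
    return reputation * (residual_energy / initial_energy)


def indirect_trust(recommendations):
    """Assumed indirect trust: mean of the trust values reported by recommenders."""
    return sum(recommendations) / len(recommendations) if recommendations else 0.0


def node_trust(direct, indirect, w=0.6):
    """Assumed node trust: convex combination with weight w on direct trust."""
    return w * direct + (1 - w) * indirect


if __name__ == "__main__":
    lt = link_trust(ber=1e-4, payload_bits=1000, received=95, sent=100)
    dt = direct_trust(successes=18, failures=2, residual_energy=0.4, initial_energy=0.5)
    it = indirect_trust([0.8, 0.75, 0.9])
    print(round(lt, 3), round(dt, 3), round(node_trust(dt, it), 3))
```

The `+1`/`+2` terms keep a node with no interaction history at a neutral reputation of 0.5 rather than 0 or undefined, and weighting direct trust more heavily limits the influence of dishonest recommenders.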

Results and discussion
This section presents the experimental analysis of the proposed energy-aware secure data aggregation scheme for WSNs. The approach is simulated in MATLAB on a Windows machine equipped with 16 GB RAM, an 8 GB NVIDIA graphics card, 1 TB of storage and an Intel i5 processor. The proposed model is compared with existing techniques, namely Trust Assisted Global and Greedy Congestion-aware Data Aggregation (TAG-GCDA) [26], the Hamming residue method (HRM) [27] and the belief-based trust evaluation mechanism (BTEM) [28], in terms of communication overhead, energy consumption, packet delivery ratio and packet loss rate. The table below lists the complete set of simulation parameters used in this work. Communication overhead measures the total amount of packet data transmitted from source to destination, either directly or hop by hop; it includes data collection, security operations and data aggregation. Figure 1 shows the comparative analysis of communication overhead for scenarios with 50 to 500 nodes.

Fig.1. Comparative analysis of communication overhead
In increasing order of performance, the schemes rank as BTEM, HRM, TAG-GCDA and the proposed approach, with average communication overheads of 139.5, 115.8, 90 and 58.7 KB, respectively. For the existing techniques, the overhead grows as the number of nodes increases. The graph shows that the overhead of the proposed approach with 500 nodes is lower than that of the existing techniques with 200 nodes. For 50 nodes, the proposed approach improves on BTEM, HRM and TAG-GCDA by 50%, 66% and 33%, respectively; for 500 nodes, the improvements over the same techniques are 52%, 42.85% and 25.46%. In the next phase, we measure the total energy consumed by the network during the entire simulation period. This includes the energy consumed by data sensing, collection, processing and aggregation at the cluster head. We use the same experimental setup as in Figure 1. The energy consumption for this scenario is depicted in Figure 2.
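As a small arithmetic check on the averages reported above, the relative reduction in communication overhead of the proposed approach against each baseline can be computed as shown below. This assumes "improvement" means relative reduction with respect to the baseline; the resulting figures are derived from the reported averages and are not stated in the paper.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100 * (baseline - proposed) / baseline


# Average communication overheads (KB) reported in this section
averages_kb = {"BTEM": 139.5, "HRM": 115.8, "TAG-GCDA": 90.0}
proposed_kb = 58.7

for name, baseline in averages_kb.items():
    print(f"{name}: {relative_reduction(baseline, proposed_kb):.1f}% lower")
```

Run as is, this prints reductions of 57.9%, 49.3% and 34.8% against BTEM, HRM and TAG-GCDA, respectively.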

Conclusion
In this work, we have presented a novel data aggregation scheme to improve network lifetime. Conventional data aggregation schemes suffer from high energy consumption and fail to maintain security during the data aggregation process. Hence, we present a combined approach that addresses both objectives. First, we present energy-aware cluster formation and cluster head selection, in which nodes are divided into normal, advanced and super nodes based on their energy levels. We proposed a model to compute the optimal number of clusters and to select CHs; the CH node is responsible for aggregation. Further, we present a trust computation mechanism to incorporate security. The experimental study shows improved performance in terms of communication overhead, energy consumption, packet delivery ratio and packet drop rate.