Survey on Machine Learning in 5G

Abstract: The core of the next-generation 5G wireless network is the heterogeneous network, and its promise cannot be fulfilled until Artificial Intelligence (AI) is deployed in the network. Existing 4G networks are centrally managed and reactive: every upgrade, and every rise in demand for network resources, requires additional hardware. 5G addresses this limitation by using prediction and traffic learning to increase performance and bandwidth. A heterogeneous network provides better Quality of Service (QoS) and makes explicit use of network resources, but its diversity makes traffic control difficult, because differing protocols and data transfer rates make the traffic hard to control and manage. To solve this problem, advanced techniques such as AI, Machine Learning (ML) and Deep Learning (DL), which are proactive, predictive and adaptive, are employed in the 5G network. In this paper we discuss how these techniques are deployed in 5G to reduce network traffic and thereby increase network efficiency.


A. Port based IP traffic classification
TCP and UDP multiplex different streams between IP endpoints with the help of port numbers. Many applications use a 'well-known' port on which other hosts may initiate communication. The application is inferred by looking up the destination port number of the TCP SYN packet in the Internet Assigned Numbers Authority (IANA) list of registered ports. This approach has limitations, however. First, some applications may not have their ports registered with IANA (for example, peer-to-peer applications such as Napster and Kazaa). Second, an application may use ports other than its well-known ports to get around operating system access control restrictions. In addition, server ports are sometimes allocated dynamically as needed. Although port-based traffic classification is the fastest and simplest method, several studies have shown that it performs poorly, e.g., less than 70% accuracy in classifying flows [11][12].
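The port lookup described above can be sketched in a few lines. This is a minimal illustration, not a production classifier: the table `WELL_KNOWN_PORTS` is a hypothetical excerpt of the IANA registry and the function name `classify_by_port` is an assumption introduced here.

```python
# Hypothetical excerpt of the IANA registry: destination port -> application.
WELL_KNOWN_PORTS = {
    21: "ftp",
    22: "ssh",
    25: "smtp",
    53: "dns",
    80: "http",
    443: "https",
}


def classify_by_port(dst_port):
    """Return the application registered for dst_port, or 'unknown'."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


# A registered port is labelled; an unregistered or dynamic port is not.
print(classify_by_port(80))
print(classify_by_port(6881))
```

The second call shows the weakness noted in the text: a flow on a port that is not in the table cannot be labelled at all.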

B. Payload based IP traffic classification
This approach inspects the contents of each packet to determine the application. Packet payloads are examined bit by bit to locate bit streams that contain a known signature. If such a bit stream is found, the packets can be labelled accurately. This approach is commonly used for P2P traffic detection and network intrusion detection. Its major drawbacks are that privacy laws may not allow administrators to inspect the payload, and that it imposes significant complexity and processing load on the traffic identification device: because it analyses the full payload, it requires substantial computational power and storage capacity [2].

C. Protocol Behavior or Heuristics Based Classification
In this method, traffic is classified based on connection-level patterns and network protocol behavior, by identifying and observing patterns of host behavior at the transport layer. The advantage of this classification is that access to the packet payload is not needed [10][2].

D. Classification based on flow statistics and traffic properties
The preceding techniques are limited by their dependence on the inferred semantics of the information gathered through deep inspection of packet content (payload and port numbers). Newer approaches rely on the statistical characteristics of traffic to identify the application [4][7][6][9]. The underlying assumption is that traffic at the network layer has statistical properties that are unique to certain classes of applications and that allow different source applications to be distinguished from each other. These properties are measured at the network or transport layer and include the distribution of flow duration, flow idle time, packet inter-arrival time and packet lengths. This method can usually determine the application type but not the specific client; for example, it cannot determine whether a flow belongs to Skype or to MSN Messenger voice traffic specifically. The advantage of this approach is that no packet payload inspection is involved.
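As a rough illustration of the statistical properties listed above, the following sketch computes a few of them for a single flow. The input format (a list of timestamp and length pairs) and the function name `flow_statistics` are assumptions made for this example only.

```python
from statistics import mean


def flow_statistics(packets):
    """packets: list of (timestamp_seconds, length_bytes) in arrival order."""
    times = [t for t, _ in packets]
    lengths = [n for _, n in packets]
    # Gaps between consecutive packets of the same flow.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "duration": times[-1] - times[0],
        "mean_packet_length": mean(lengths),
        "mean_interarrival": mean(gaps) if gaps else 0.0,
        "max_idle_time": max(gaps) if gaps else 0.0,
    }


# Hypothetical flow of four packets.
example_flow = [(0.0, 60), (0.5, 1500), (1.0, 1500), (3.0, 40)]
stats = flow_statistics(example_flow)
print(stats)
```

None of these values require reading the payload, which is the advantage the text attributes to this family of methods.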

II. MACHINE LEARNING
Machine learning has been applied in almost every field. ML techniques have been used effectively in applications such as speech recognition, bioinformatics and computer vision. Machine learning is mainly used for prediction and classification; in networking, it is mainly used for performance prediction and intrusion detection. Machine learning constructs models that learn from data and make decisions directly, without being explicitly programmed and without following a fixed set of rules.
A model is trained by providing it with data sets; when exposed to new data, it is able to learn, predict and improve by itself. Machine learning algorithms can be classified into three categories: supervised learning, unsupervised learning and reinforcement learning [3].
In supervised learning, the model is trained on a labeled data set. When new test data is given, the model uses what it learned from the training data to predict the output. Supervised learning is mainly used for regression and classification problems.
In unsupervised learning, the training data set is unlabelled, and the model finds patterns and relationships among the data. It is mainly used for clustering and association problems. In reinforcement learning, the model is not given a training data set; it learns from feedback obtained by interacting with its environment.

A. Naïve Bayes
Naïve Bayes is a classification algorithm that relies on Bayes' theorem. To control traffic in a network, the classifier is trained on flow features and then used to classify the network traffic [26].

a. Pre-processing
In this step, IP packets are collected from the network and their headers are parsed. A flow is commonly defined as a sequence of IP packets sharing the same 5-tuple: source IP, source port, destination IP, destination port and transport layer protocol [25].
Since we focus on a statistical approach to classification, the statistical features of each flow are extracted and discretized to represent the traffic flows.
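The grouping of packets into flows by 5-tuple can be sketched as follows. The packet record layout (a dict with keys such as `src_ip` and `proto`) is a hypothetical format chosen for illustration; a real capture tool would supply its own.

```python
from collections import defaultdict


def group_into_flows(packets):
    """Group packet records into flows keyed by the 5-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["src_port"],
               pkt["dst_ip"], pkt["dst_port"], pkt["proto"])
        flows[key].append(pkt)
    return dict(flows)


# Hypothetical packets: two belong to one HTTP flow, one to a DNS flow.
packets = [
    {"src_ip": "192.0.2.1", "src_port": 40000, "dst_ip": "192.0.2.9",
     "dst_port": 80, "proto": "tcp", "length": 60},
    {"src_ip": "192.0.2.1", "src_port": 40000, "dst_ip": "192.0.2.9",
     "dst_port": 80, "proto": "tcp", "length": 1500},
    {"src_ip": "192.0.2.1", "src_port": 40001, "dst_ip": "192.0.2.53",
     "dst_port": 53, "proto": "udp", "length": 80},
]
flows = group_into_flows(packets)
print(len(flows))
```

Each resulting list of packets is the unit from which per-flow statistical features are then computed.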

b. Correlation Based Feature Selection
In this method, the statistical features extracted during pre-processing are used to represent traffic flows, and feature selection [16] is applied to remove irrelevant and redundant features from the feature set. Correlation-based feature subset selection is used in the experiments; it searches for a subset of features with high class-specific correlation and low inter-correlation. The correlation coefficient is denoted 'r', where n represents the number of instances, x indicates the attribute to be tested for correlation, and y represents the attribute to be tested against x. Finally, 0.75 is selected as the threshold value.
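The formula for r is not reproduced in the text. Assuming it is the standard Pearson correlation coefficient, which is consistent with the definitions of n, x and y given above, a minimal sketch looks like this. How the 0.75 threshold is applied is also an assumption here: the hypothetical helper `redundant_pairs` simply reports feature pairs whose absolute correlation exceeds it.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denom = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return (n * sxy - sx * sy) / denom if denom else 0.0


def redundant_pairs(features, threshold=0.75):
    """features: dict name -> values. Return pairs with |r| > threshold."""
    names = sorted(features)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(pearson_r(features[a], features[b])) > threshold]


# Hypothetical features: "a" and "b" move together, "c" is unrelated.
features = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, -1, 1]}
print(redundant_pairs(features))
```

One feature from each reported pair could then be dropped, keeping the subset with low inter-correlation that the text describes.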

c. Feature Discretization
Discretization [30] is the process of converting numeric values into intervals and associating each interval with a nominal symbol. These symbols are then used as new values in place of the original numeric values. The new dataset is smaller than the previous one, i.e., a discretized feature has fewer possible values than a non-discretized one. The key step in discretization is the choice of intervals, which may be determined by an expert in the field or by a discretization algorithm. There are two approaches to discretization: one is to discretize each feature without knowledge of the classes in the training set (unsupervised discretization); the other is to make use of the classes when discretizing (supervised discretization) [15].
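As one concrete case of unsupervised discretization, the sketch below uses equal-width binning. The text does not say which discretization algorithm was used, so this choice, the function name `equal_width_bins` and the symbol names `bin0`, `bin1`, ... are assumptions for illustration.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a nominal symbol 'bin0' .. 'bin{n_bins-1}'."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    symbols = []
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        symbols.append(f"bin{idx}")
    return symbols


# Hypothetical packet lengths in bytes, discretized into three intervals.
print(equal_width_bins([40, 60, 700, 1500], 3))
```

The class labels are never consulted, which is what makes this the unsupervised variant; a supervised method would choose the interval boundaries using the classes.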

d. Naïve Bayes Classification
A Naïve Bayes (NB) ML algorithm [6] has a simple structure consisting of a class node as the parent node of all other nodes. The basic structure of the Naïve Bayes classifier is shown in Fig. 3, in which C represents the main class and a, b, c and d represent the other feature or attribute nodes of a particular sample. No other connections are allowed in a Naïve Bayes structure. Naïve Bayes has been used as an effective classifier. It is easy to construct compared to other classifiers because the structure is given a priori, and hence no structure learning procedure is required. Naïve Bayes works very well over a large number of datasets, especially where the features used to characterize each sample are not strongly correlated.
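A minimal sketch of a Naïve Bayes classifier over discretized (nominal) features is given below. Because the structure is fixed in advance, training reduces to counting. The Laplace smoothing and the function names are assumptions introduced here, not details taken from the text.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (class, feature index) -> counts
    values_seen = defaultdict(set)          # feature index -> distinct values
    for features, label in zip(samples, labels):
        for i, value in enumerate(features):
            feature_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    return class_counts, feature_counts, values_seen


def predict_naive_bayes(model, features):
    """Return the class with the highest log posterior score."""
    class_counts, feature_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = log(count / total)
        for i, value in enumerate(features):
            # Laplace smoothing (an assumption) avoids log(0).
            num = feature_counts[(label, i)][value] + 1
            den = count + len(values_seen[i])
            score += log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical discretized flows: (packet size symbol, duration symbol).
samples = [("small", "short"), ("small", "short"),
           ("large", "long"), ("large", "long")]
labels = ["dns", "dns", "ftp", "ftp"]
model = train_naive_bayes(samples, labels)
print(predict_naive_bayes(model, ("small", "short")))
```

Each feature contributes an independent factor to the score, which mirrors the structure in which the class node is the only parent of every feature node.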

B. K-Nearest Neighbor
It is a classification algorithm that assigns a new data point to a class based on its closeness to the stored training data: similar data points are grouped together, and a new point is assigned to the group it is closest to [5].
It is a non-parametric algorithm that does not require any prior knowledge about the distribution of the data, which enhances the robustness of the model. In a network, traffic can be classified with K-Nearest Neighbor by assigning each new flow the class of its nearest neighbors. K is a positive integer. For every new data point to be classified, we determine which neighboring group it is closest to [30].
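The nearest-neighbour rule can be sketched as follows, assuming Euclidean distance and a simple majority vote among the K closest training points (both are common choices, not details given in the text). The function name `knn_classify` is hypothetical.

```python
from collections import Counter
from math import dist


def knn_classify(train_points, train_labels, query, k=3):
    """Label query by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature flows belonging to two traffic classes.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(points, labels, (2, 2)))
```

No model is fitted in advance: all work happens at query time, which is why the algorithm needs no assumption about how the data are distributed.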

C. Support Vector Machine
It is a supervised machine learning algorithm mainly used for classification and regression. The data is plotted in n-dimensional space, where n is the number of features used for training [13][14]. Classification is then performed by finding the hyperplane that separates the two classes. In networking, the classifier is trained on network features and tested with new data, after which it can predict the class of new incoming traffic.
We must first train the classifier and then cross-validate it with the test data [17]. To obtain accurate predictions with an SVM classifier, a suitable kernel function must be chosen and its parameters tuned. The process involved is as follows: step 1: training the SVM classifier; step 2: classifying new data with the SVM classifier; step 3: tuning the SVM classifier.
In the learning phase, the classifier is made to learn about the fundus images: the feature vector of each image is fed to the classifier and the output is labelled [19]. In the testing phase, the feature vector of an unknown image is fed to the classifier and the lesion is classified. The extracted features of the image are given to the classifier so that it can classify the fundus image of the retina accurately. For non-linear classification, SVM uses a kernel function to map the data to a higher-dimensional space.

IV. UNSUPERVISED LEARNING ALGORITHM
In unsupervised learning algorithms, the training data sets are unlabeled. These algorithms are mainly used for clustering problems. In a network, traffic can be clustered based on its features. The unsupervised algorithms used in networking include K-Means and DBSCAN.

A. K-Means
K-means is an unsupervised learning algorithm used to draw inferences from datasets using only input vectors, without referring to known, labelled outcomes. The algorithm groups data into clusters and finds the patterns present in the dataset. A cluster is a group of data points that share certain similarities [19].
K-Means randomly selects k centroids and then works iteratively, performing two tasks. First, each data point is assigned to its closest centroid using the standard Euclidean distance, which measures the similarity between flows of data. Next, each centroid is recalculated as the mean of the data points assigned to it.

International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181 http://www.ijert.org
Here the training data contains payload, which is used to label each flow with its source application. The learning process produces two outputs: the first is a description of each cluster, and the second is the application composition of each cluster. During classification, the packet sizes of a new flow are recorded and compared against the clusters. The flow is assigned to the application that is dominant in the matching cluster [23][24].
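The two alternating K-Means tasks, assignment to the closest centroid and recomputation of each centroid as a mean, can be sketched as below. The fixed iteration count, the seeded random initialisation and the function name `k_means` are assumptions made to keep the example short and repeatable.

```python
import random
from math import dist


def k_means(points, k, iterations=20, seed=0):
    """Return (centroids, clusters) after a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # random initial centroids
    for _ in range(iterations):
        # Task 1: assign every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Task 2: move each centroid to the mean of its assigned points.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical flows described by two features, forming two groups.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, 2)
print(centroids)
```

A practical implementation would normally stop when the assignments no longer change rather than after a fixed number of passes.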

B. DBSCAN
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based clustering algorithm that forms clusters from dense regions of objects. Its parameters are eps (ε) and min points. Clusters in DBSCAN are formed from core points together with the points that are directly density-reachable and density-reachable from them.
The data are collected using online tools, features are selected from the packets, and the data are then fitted to the model for training and testing. Finally, the flows are predicted and classified. The steps involved in DBSCAN are as follows. Let Y = {Y1, Y2, Y3, …} be the set of data points.
1) Start with an arbitrary point that has not yet been visited.
2) Extract the neighbours of that point using ε.
3) If there are sufficient neighbours, the clustering process starts and the point is marked as visited; otherwise the point is marked as noise.
4) If a point is found to be part of the cluster, then its ε-neighbours are also part of the cluster, and the procedure from step 2 is repeated for all ε-neighbour points. This is repeated until all points of the cluster are determined.
5) A new unvisited point is retrieved and processed.
6) The process stops when all points are marked as visited.
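The DBSCAN steps listed above can be sketched as follows. This is a plain illustration with a quadratic neighbour search; the convention that a point counts as its own neighbour, the label `NOISE = -1` and the function name `dbscan` are assumptions of this example.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_points):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)           # None means not yet visited

    def neighbours(i):
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue                        # already visited
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE               # too few neighbours
            continue
        labels[i] = cluster_id              # i is a core point: new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id      # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is also a core point
        cluster_id += 1
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9), (50, 50)]
print(dbscan(points, eps=1.5, min_points=3))
```

Unlike K-Means, the number of clusters is not supplied in advance; it follows from eps and min points.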
V. ARTIFICIAL NEURAL NETWORK
An artificial neural network is one of the learning algorithms used within machine learning. It consists of many layers for learning and analysing data. It learns in a way loosely modelled on the human brain and is mainly used for pattern recognition and data classification. Neural networks are trained using examples rather than being programmed explicitly.

It contains three layers:
• Input layer
• Hidden layer
• Output layer
It may also contain multiple hidden layers. The hidden layer is mainly used for feature extraction and calculation. Feed-forward and feedback are the two topologies of neural networks [22].

A. Backpropagation algorithm
It is the most important algorithm for training a neural network, and it is used to train the weights of a multilayer feed-forward network. Here it is mainly used to predict network traffic effectively in a heterogeneous network.
Each neuron has weights that must be maintained. Forward propagation consists of neuron activation, neuron transfer and forward propagation through the layers.
Next, the error is calculated at the output and propagated back through the hidden layer; this involves the transfer derivative and error propagation. The network is then trained by repeatedly forwarding inputs and backpropagating the error [16].
Finally, the trained network is used to predict the network traffic.
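The forward pass, error backpropagation and weight update described above can be sketched for a network with one hidden layer and a single output. The sigmoid transfer function, the squared-error objective, the learning rate and all function names are assumptions of this example; the text does not specify them.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_network(n_inputs, n_hidden, seed=1):
    """Each weight list ends with a bias term."""
    rng = random.Random(seed)
    return {
        "hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                   for _ in range(n_hidden)],
        "output": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)],
    }


def forward(net, x):
    """Neuron activation followed by the sigmoid transfer, layer by layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1])
              for ws in net["hidden"]]
    out_w = net["output"]
    output = sigmoid(sum(w * h for w, h in zip(out_w[:-1], hidden)) + out_w[-1])
    return hidden, output


def train(net, data, rate=0.5, epochs=2000):
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(net, x)
            # Output error scaled by the transfer derivative.
            delta_out = (target - output) * output * (1.0 - output)
            # Error propagated back to each hidden neuron.
            delta_hidden = [delta_out * net["output"][i] * h * (1.0 - h)
                            for i, h in enumerate(hidden)]
            # Weight updates for the output and hidden layers.
            for i, h in enumerate(hidden):
                net["output"][i] += rate * delta_out * h
            net["output"][-1] += rate * delta_out
            for ws, d in zip(net["hidden"], delta_hidden):
                for i, xi in enumerate(x):
                    ws[i] += rate * d * xi
                ws[-1] += rate * d


def mean_squared_error(net, data):
    return sum((t - forward(net, x)[1]) ** 2 for x, t in data) / len(data)


# Hypothetical toy task: learn the logical OR of two inputs.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
net = make_network(2, 2)
error_before = mean_squared_error(net, data)
train(net, data)
print(error_before, mean_squared_error(net, data))
```

The hidden-layer deltas are computed before any weight is changed, so the backward pass uses the same weights as the forward pass that produced the error.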

VI. WORK FLOW OF MACHINE LEARNING IN NETWORK
Machine learning plays an important role in next-generation networks. The steps involved in applying it to networking are:
• Step 1: Problem formulation

A. Problem formulation
Because the training process in machine learning is time consuming, the problem must be formulated correctly at the beginning of the process. There should be a strong relation between the problem and the data collected. Machine learning problems are categorized as clustering, classification and decision making, and the problem statement should fall under one of these categories [8][15][16].
This helps in identifying the learning model and in collecting data. When problem formulation is not done properly, it leads to an unsuitable learning model and unsatisfactory performance.

B. Data collection
There are two types of data collection: offline and online.
In online data collection, real-time data are collected; they can be used as feedback for the model and also as re-training data. Offline data can be collected from repositories [17][18][19].
For the classification of network traffic, we use a dataset built from real-world traffic flows, named 'wide'. The wide dataset consists of traffic flows that were randomly selected from the WIDE trace and carefully identified by manual inspection. It consists of 3416 instances in 7 classes (bt, dns, ftp, http, smtp, yahoomsg, ssh) with 22 attributes. The features extracted in the process [13] are listed in Table 1.
Monitoring and measurement tools allow online and offline data to be collected effectively and securely. The collected data can also be stored for model adaptation. After data collection, the data are divided for the training (learning), validation and testing phases.

C. Data analysis
Data analysis consists of two phases:
• Pre-processing
• Feature extraction
Pre-processing removes noise from the collected data. The features of the data are then extracted, which is a prerequisite for learning and training [10]. The types of features that can be extracted from the network are:
• Packet-level features
• Flow-level features
Packet-level features are statistics of packet size, such as mean, root mean square and variance.
Flow-level features include mean flow duration and mean number of packets per flow.
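As an illustration of packet-level feature extraction, the sketch below computes the mean, root mean square and variance of a list of packet sizes. Reading the "root" statistic in the text as root mean square is an assumption, as are the use of the population variance and the function name `packet_level_features`.

```python
from math import sqrt
from statistics import mean, pvariance


def packet_level_features(sizes):
    """Simple statistics over the packet sizes (in bytes) of a trace."""
    return {
        "mean": mean(sizes),
        "rms": sqrt(mean([s * s for s in sizes])),
        "variance": pvariance(sizes),
    }


# Hypothetical packet sizes.
print(packet_level_features([2, 4, 4, 4, 5, 5, 7, 9]))
```

The three values are related: the squared root mean square equals the variance plus the squared mean, which offers a quick consistency check on extracted features.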

D. Model construction
This process involves model selection, training and tuning. A suitable learning model and algorithm must be selected according to the size of the data set.
Training fits the model to the data set that was collected at the beginning of the process.
Tuning adjusts the model's parameters by comparing its output against the training data so that its performance improves.

E. Model validation
It involves cross-validation during testing to measure the accuracy of the model. This helps in optimizing the model and maintaining the overall performance of the system.
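One common form of cross-validation is k-fold validation, sketched below. The text does not say which variant is meant, so this is an assumption; `fit` and `predict` are hypothetical callables standing in for whichever learning model was selected.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(samples, labels, k, fit, predict):
    """Average accuracy over k folds; each fold is held out once for testing."""
    accuracies = []
    for train, test in k_fold_splits(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


# Fold sizes for ten samples and three folds.
print([len(test) for _, test in k_fold_splits(10, 3)])
```

Because every sample is used for testing exactly once, the averaged accuracy depends less on one particular train and test split.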

F. Deployment and inference
In the deployment and inference stage, the trade-offs and stability of the model are monitored to check its accuracy and to confirm that the preceding steps have been carried out in the best way.

VII. TRAFFIC CLASSIFICATION
It is a process in which network traffic is categorized into a number of traffic classes based on its parameters. Network traffic is first captured, and the features of the selected data are extracted. Training is then performed using a data sampling method; finally, the algorithm is applied and the results are calculated.

VIII. REINFORCEMENT LEARNING PERSPECTIVE
It enables the model to learn automatically on its own and to make decisions by continuously interacting with the environment. When combined with deep learning, it becomes a solution to problems that are otherwise intractable in the real world [9].
It has three components:
• First, a policy function, which defines the agent's behavior
• Second, a value function, which evaluates states
• Third, a model, which represents learned knowledge

IX. DEEP LEARNING IN HETEROGENEOUS NETWORK
The deep learning mechanism operates in three phases [8]:
• Initial phase
• Training phase
• Action or running phase

Initial phase
In this phase the relevant data for the deep learning system is obtained. Traditional OSPF routing is used to simulate communication between different routers under different conditions and to record the traffic patterns in the network.

Training phase
The training algorithm contains two main parts. The greedy layer-wise training method is used to initialize the deep learning system, and the backpropagation algorithm is used to fine-tune the deep neural networks. The training period is executed in each router.

Running phase
In the running phase, the system is executed and its performance is measured.

X. CONCLUSION
AI is used as a tool to improve 5G technologies. AI algorithms have not been widely used in networking until recently because of limited progress in learning methods in past years. The heterogeneous network is the basis of the next-generation network, where network traffic plays a major role in degrading performance. In this paper we discussed machine learning techniques and their implementation in the 5G heterogeneous network to increase its performance by reducing network traffic.