<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">WSN</journal-id><journal-title-group><journal-title>Wireless Sensor Network</journal-title></journal-title-group><issn pub-type="epub">1945-3078</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/wsn.2012.44015</article-id><article-id pub-id-type="publisher-id">WSN-18621</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Data Categorization and Noise Analysis in Mobile Communication Using Machine Learning Algorithms
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>aghavendra</surname><given-names>Phani Kumar</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Malleswara</surname><given-names>Rao</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Dsvgk</surname><given-names>Kaladhar</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Department of Electronics and Communication Engineering, GITAM Institute of Technology, GITAM University, Visakhapatnam, India</addr-line></aff><aff id="aff2"><addr-line>Department of Bioinformatics, GITAM Institute of Science, GITAM University, Visakhapatnam, India</addr-line></aff><author-notes><corresp id="cor1">* E-mail: <email>phanikrch@gitam.edu</email> (APK)</corresp></author-notes><pub-date pub-type="epub"><day>24</day><month>04</month><year>2012</year></pub-date><volume>04</volume><issue>04</issue><fpage>113</fpage><lpage>116</lpage><history><date date-type="received"><day>16</day><month>01</month><year>2012</year></date><date date-type="rev-recd"><day>24</day><month>02</month><year>2012</year></date><date date-type="accepted"><day>11</day><month>03</month><year>2012</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc.</copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
Machine learning and pattern recognition provide well-defined algorithms that, applied to complex data, can estimate traffic levels and identify heavy-traffic hours within a cluster. In this paper, base-station traffic and the noise levels in the busy hour are predicted. The J48 pruned tree contains 23 nodes, and the busy traffic hour falls in a leaf for East Godavari. Based on the CART results, the signal-to-noise ratio was predicted at 55. SimpleKMeans placed about 53% of the instances inside the cluster and 47% outside it. DBSCAN clustering found the most noise in the Srikakulam data. Apriori and Genetic search identified MOR (number of successful originating calls) as the best-associated attribute, with a 12:1 ratio.
 
</p></abstract><kwd-group><kwd>Traffic</kwd><kwd>MOR</kwd><kwd>Data Mining</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>The classification (or automated categorization) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of digital information in communication technology and the ensuing need to organize it [<xref ref-type="bibr" rid="scirp.18621-ref1">1</xref>]. Technological advances produce a flood of large data sets, which create massive data-analytic problems and can easily lead to flawed inferences. Statisticians might benefit from learning more about wireless signal controls and from devising ways to use data on those controls in their analyses [<xref ref-type="bibr" rid="scirp.18621-ref2">2</xref>-<xref ref-type="bibr" rid="scirp.18621-ref4">4</xref>].</p><p>Machine learning comprises well-defined algorithms, data structures, and theories of learning, with applications such as the automated categorization or classification of text into predefined categories [<xref ref-type="bibr" rid="scirp.18621-ref1">1</xref>]. Machine learning has been a central research area in artificial intelligence since the mid-1950s, driven by the goal of understanding the phenomenon of learning from data [<xref ref-type="bibr" rid="scirp.18621-ref5">5</xref>].</p><p>In pattern recognition and data mining, partitioning a large set of objects into homogeneous clusters is a fundamental operation [<xref ref-type="bibr" rid="scirp.18621-ref6">6</xref>,<xref ref-type="bibr" rid="scirp.18621-ref7">7</xref>]. Scientific data provide a platform for searching large databases for hidden patterns, and data mining extends inductive learning techniques to evaluate the usefulness of cases retrieved from large data sets [<xref ref-type="bibr" rid="scirp.18621-ref8">8</xref>].</p><p>In this paper we describe an application of machine learning to an important communication problem: the detection of busy traffic hours at the base stations of an area. 
We cover the application from the formulation of the problem to the delivery of a system for field testing, using data that include soft handoff traffic, busy traffic hour, soft handoff rate, number of calls, originating calls, paging response and call termination rates. The primary purpose of the paper is to present to the machine learning research community an application of general importance in communication technology.</p></sec><sec id="s2"><title>2. Methodology</title><p>The input dataset is stored in the ARFF (Attribute-Relation File Format) used by the Waikato Environment for Knowledge Analysis (WEKA). The communication dataset has 15 attributes and 78 instances.</p><sec id="s2_1"><title>2.1. J48</title><p>J48 is the WEKA implementation of the C4.5 decision tree learner. It induces decision tree models for classification with a greedy, top-down strategy: at each node it chooses the attribute that best splits the training instances (C4.5 uses the gain ratio criterion), and the grown tree is then pruned.</p></sec><sec id="s2_2"><title>2.2. CART</title><p>CART (Classification and Regression Trees) is a data exploration and prediction algorithm that constructs binary decision trees: classification trees predict a categorical class, while regression trees predict a numeric value.</p></sec><sec id="s2_3"><title>2.3. SimpleKMeans</title><p>In SimpleKMeans clustering, each instance is assigned to the cluster with the nearest centroid, and the similarity of two clusters is defined as the similarity of their centroids. The centroid of a cluster is the point whose attribute values are the means of the attribute values of all points in that cluster.</p></sec><sec id="s2_4"><title>2.4. DBScan</title><p>DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm: it finds clusters from the estimated density distribution of the data points. It starts from an arbitrary point that has not yet been visited and retrieves that point's neighbourhood (all points within a distance ε); if the neighbourhood contains sufficiently many points, a cluster is started. 
Otherwise, the point is labelled as noise.</p></sec><sec id="s2_5"><title>2.5. Apriori</title><p>Apriori is a classic algorithm for learning association rules from databases. It finds frequent itemsets level by level, relying on the property that every subset of a frequent itemset must itself be frequent, and then derives association rules from those itemsets.</p></sec><sec id="s2_6"><title>2.6. Genetic Search</title><p>Genetic search is a search method, used with subset-scoring evaluators, that explores multiple candidate solutions simultaneously. Candidate solutions are recombined with one another, and a population of them is maintained on the basis of fitness.</p></sec></sec><sec id="s3"><title>3. Results</title><p>Mobile communication depends on the transmitted signal and on the number of users in the cluster; the signal traffic of those users is carried by the base stations. In this paper, the clustering and association analyses were conducted on data surveyed from various base stations in a clustered area.</p><p>The J48 pruned tree has 13 leaves (4 for East Godavari, 2 for Vizag, 4 for Vijayanagaram and 3 for Srikakulam) and a total size of 23 nodes (<xref ref-type="fig" rid="fig1">Figure 1</xref>). The busy traffic hour appears in a leaf for East Godavari.</p><p>The CART decision tree has 3 leaf nodes and 5 branches, and predicts a signal-to-noise ratio of 55 (<xref ref-type="fig" rid="fig2">Figure 2</xref>).</p><p>SimpleKMeans provided the centroids for the clustered dataset; about 53% of the instances fall inside the cluster and 47% outside it (<xref ref-type="fig" rid="fig3">Figure 3</xref>).</p><p>DBSCAN clustering found the most noise in the Srikakulam data, and five clusters were identified from the clustering results (<xref ref-type="fig" rid="fig4">Figure 4</xref>).</p><p>Based on Apriori and Genetic search, the best-associated attribute is MOR (number of successful originating calls), with a ratio of 12:1.</p></sec><sec id="s4"><title>4. 
Discussion</title><p>Mobile communication traffic data analysis has often been used as a background application to motivate data mining problems [<xref ref-type="bibr" rid="scirp.18621-ref9">9</xref>]. Data mining tools that search for a minimal set of differences between classes of data rest on the premise that a short list of essential differences is easier to read and understand than detailed descriptions; such tools summarize large data sets to find the data that really matter instead of generating extensive and lengthy descriptions [<xref ref-type="bibr" rid="scirp.18621-ref10">10</xref>].</p><p>A data mining algorithm has been proposed that incrementally mines user moving patterns in a mobile computing environment and exploits the mining results to develop data allocation schemes that improve the overall performance of a mobile system [<xref ref-type="bibr" rid="scirp.18621-ref11">11</xref>]. Data collected from mobile phones have the potential to provide insight into the relational dynamics of individuals: distinctive temporal and spatial patterns in their physical proximity and calling patterns allow the prediction of individual-level outcomes such as job satisfaction [<xref ref-type="bibr" rid="scirp.18621-ref12">12</xref>].</p><p>Group pattern mining locates groups of mobile users associated by physical distance and the amount of time spent together; performance evaluation of the method indicates that a suitable segment size and alpha value must be selected to obtain the best results [<xref ref-type="bibr" rid="scirp.18621-ref13">13</xref>]. Mining frequent subtrees from databases of labelled trees is a new research field with many practical applications in areas such as computer networks, Web mining, bioinformatics and XML document mining. 
These applications need the expressive power of labelled trees to capture the complex relations among data entities [<xref ref-type="bibr" rid="scirp.18621-ref14">14</xref>].</p><p>Data mining of the traffic generated by mobile users at a base station is about finding useful knowledge in the raw data they produce. Performance evaluation shows that as the number of characteristics increases, the number of rules increases dramatically; therefore, only the relevant characteristics should be chosen so that the number of rules remains acceptable [<xref ref-type="bibr" rid="scirp.18621-ref15">15</xref>].</p></sec><sec id="s5"><title>5. Conclusion</title><p>Group patterns of mobile users can be used to develop data allocation schemes that improve the overall performance of a mobile system without interruption, which is important as traffic rates increase dramatically. Based on the CART results, the signal-to-noise ratio was predicted at 55. Further development of intelligent data analysis for mobile communication from the machine learning perspective is needed in future work.</p></sec><sec id="s6"><title>6. Acknowledgements</title><p>The authors acknowledge the Department of ECE, GITAM University, for providing the necessary research facilities.</p></sec><sec id="s7"><title>REFERENCES</title></sec></body><back><ref-list><title>References</title><ref id="scirp.18621-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">F. Sebastiani, “Machine Learning in Automated Text Categorization,” ACM Computing Surveys, Vol. 34, No. 1, 2002, pp. 1-47. doi:10.1145/505282.505283</mixed-citation></ref><ref id="scirp.18621-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">L. Wilkinson, “The Future of Statistical Computing,” Technometrics, Vol. 50, No. 4, 2008, pp. 418-435. 
doi:10.1198/004017008000000460</mixed-citation></ref><ref id="scirp.18621-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">I. F. Akyildiz, W. Su, Y. Sankarasubramaniam and E. Cayirci, “Wireless Sensor Networks: A Survey,” Computer Networks, Vol. 38, No. 4, 2002, pp. 393-422. 
doi:10.1016/S1389-1286(01)00302-4</mixed-citation></ref><ref id="scirp.18621-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">D. S. V. G. K. Kaladhar, T. Uma Devi, P. V. Lakshmi, R. Harikrishna Reddy, R. K. SriTeja Ayayangar V. and P. V. Nageswara Rao, “Analysis of E. coli Promoter Regions Using Classification, Association and Clustering Algorithms,” Advances in Intelligent and Soft Computing, Vol. 132, 2012, pp. 169-177.  
doi:10.1007/978-3-642-27443-5_20</mixed-citation></ref><ref id="scirp.18621-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, Vol. 1, No. 1, 1986, pp. 81-106. 
doi:10.1007/BF00116251</mixed-citation></ref><ref id="scirp.18621-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">M. R. Anderberg, “Cluster Analysis for Applications,” Academic Press, Waltham, 1973.</mixed-citation></ref><ref id="scirp.18621-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">R. S. Michalski and R. E. Stepp, “Automated Construction of Classifications: Conceptual Clustering versus Numerical Taxonomy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 5, No. 4, 1983, pp. 396-410. doi:10.1109/TPAMI.1983.4767409</mixed-citation></ref><ref id="scirp.18621-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">D. S. V. G. K. Kaladhar and B. Chandana, “Data Mining, Inference and Prediction of Cancer Datasets Using Learning Algorithms,” International Journal of Science and Advanced Technology, Vol. 1, No. 3, 2011, pp. 68-77.</mixed-citation></ref><ref id="scirp.18621-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">T. J. Wang, B. S. Yang, J. Gao, D. Q. Yang, S. W. Tang, H. Y. Wu, K. D. Liu and J. Pei, “MobileMiner: A Real World Case Study of Data Mining in Mobile Communication,” Proceedings of the 35th SIGMOD International Conference on Management of Data, Rhode Island, 29 June-2 July 2009, pp. 1083-1086. 
doi:10.1145/1559845.1559988</mixed-citation></ref><ref id="scirp.18621-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">T. Menzies and Y. Hu, “Data Mining for Very Busy People,” Computer, Vol. 36, No. 11, 2003, pp. 22-29. 
doi:10.1109/MC.2003.1244531</mixed-citation></ref><ref id="scirp.18621-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">W.-C. Peng and M.-S. Chen, “Developing Data Allocation Schemes by Incremental Mining of User Moving Patterns in a Mobile Computing System,” IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 1, 2003, pp. 70-85. doi:10.1109/TKDE.2003.1161583</mixed-citation></ref><ref id="scirp.18621-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">N. Eagle, A. Pentland and D. Lazer, “Inferring Friendship Network Structure by Using Mobile Phone Data,” Proceedings of the National Academy of Sciences, Vol. 106, No. 36, 2009, pp. 15274-15278. 
doi:10.1073/pnas.0900282106</mixed-citation></ref><ref id="scirp.18621-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">J. Goh and D. Taniar, “Mining Frequency Pattern from Mobile Users,” Knowledge-Based Intelligent Information and Engineering Systems, Vol. 3215, 2004, pp. 795-801.  
doi:10.1007/978-3-540-30134-9_106</mixed-citation></ref><ref id="scirp.18621-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Y. Chi, R. R. Muntz, S. Nijssen and J. N. Kok, “Frequent Subtree Mining: An Overview,” Fundamenta Informaticae, Special Issue on Advances in Mining Graphs, Trees and Sequences, Vol. 66, No. 1-2, 2004, pp. 161-198.</mixed-citation></ref><ref id="scirp.18621-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">J. Y. Goh and D. Taniar, “Mobile Data Mining by Location Dependencies,” Intelligent Data Engineering and Automated Learning, Vol. 3177, 2004, pp. 225-231.</mixed-citation></ref></ref-list></back></article>