<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JCC</journal-id><journal-title-group><journal-title>Journal of Computer and Communications</journal-title></journal-title-group><issn pub-type="epub">2327-5219</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jcc.2015.311030</article-id><article-id pub-id-type="publisher-id">JCC-61321</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Mutual Information-Based Modified Randomized Weights Neural Networks
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Jian</surname><given-names>Tang</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Zhiwei</surname><given-names>Wu</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Meiying</surname><given-names>Jia</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Zhuo</surname><given-names>Liu</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Research Institute of Computing Technology, Beifang Jiaotong University, Beijing, China</addr-line></aff><aff id="aff2"><addr-line>State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, China</addr-line></aff><pub-date pub-type="epub"><day>19</day><month>11</month><year>2015</year></pub-date><volume>03</volume><issue>11</issue><fpage>191</fpage><lpage>197</lpage><history><date date-type="received"><day>October</day>	<month>2015</month>	</date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
   Randomized weights neural networks offer fast learning and good generalization with a single hidden layer. The input weights of the hidden layer are generated randomly, so the hidden layer’s outputs, computed through a chosen activation function, carry some randomness. The output weights are computed with the pseudo-inverse. Mutual information quantifies the mutual dependence of two variables on the basis of probability theory. In this paper, the hidden layer outputs that are closely related to the predicted variable are selected with a simple mutual information-based feature selection method. The hidden nodes with high mutual information values are retained as a new hidden layer, which reduces the size of the hidden layer. The output weights of the new hidden layer are learned with the pseudo-inverse method. The proposed method is compared with the original randomized algorithm on the concrete compressive strength benchmark dataset. 
 
</p></abstract><kwd-group><kwd>Randomized Weights Neural Networks</kwd><kwd> Mutual Information</kwd><kwd> Feature Selection</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Machine learning (ML)-based data analysis has become a focus of research in many disciplines. The most widely used methods for constructing prediction models are back-propagation neural networks (BPNN) and support vector machines (SVM) [<xref ref-type="bibr" rid="scirp.61321-ref1">1</xref>]. However, BPNN suffers from local optima, uncontrolled convergence speed and over-fitting. Although SVM can handle small-sample modeling problems with good generalization, the quadratic programming (QP) problem and the large kernel matrix are difficult to handle for large training datasets. A special learning algorithm for single-hidden-layer feed-forward networks (SLFNs), i.e., randomized weights neural networks, was proposed to overcome the shortcomings caused by gradient-based learning algorithms [<xref ref-type="bibr" rid="scirp.61321-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.61321-ref3">3</xref>]. Its characteristics include: 1) input weights of the hidden layer are chosen randomly; 2) the hidden layer neurons need not be adjusted; and 3) output weights are computed analytically using the pseudo-inverse or the least squares method. The commonly used pseudo-inverse-based calculation of the output weights has two advantages: a) the optimal solution to the least squares problem is obtained; and b) the optimal output weight matrix has minimal norm. Therefore, the randomized weights neural network algorithm learns faster, and it has been applied successfully [<xref ref-type="bibr" rid="scirp.61321-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.61321-ref5">5</xref>]. 
Thus, the pseudo-inverse-based randomized algorithm avoids the local minima problem and offers good testing performance and short training time [<xref ref-type="bibr" rid="scirp.61321-ref6">6</xref>]. However, how to control and estimate the randomness of the input weights remains an open issue. Studies show that, for feed-forward networks, a small norm of the weights matters more than the number of nodes for good generalization performance [<xref ref-type="bibr" rid="scirp.61321-ref7">7</xref>]. The norms of the hidden weights generated by deep learning are small [<xref ref-type="bibr" rid="scirp.61321-ref8">8</xref>]. Therefore, a randomized algorithm for nonlinear system identification with a deep learning modification was proposed, which uses deep learning as a pre-training technique to obtain the input weights of the hidden layers [<xref ref-type="bibr" rid="scirp.61321-ref9">9</xref>]. In this way, input weights and output weights with small norms are obtained by combining deep learning with the least squares approach. However, a long training time is needed. An effective and simple method for controlling and estimating the randomness still needs to be developed.</p><p>Mutual information (MI) quantifies the mutual dependence of two variables on the basis of probability theory and information theory, and it has therefore been widely used in feature selection. MI is more comprehensive than other common feature selection methods for selecting optimal input variables [<xref ref-type="bibr" rid="scirp.61321-ref10">10</xref>]. However, the popular MI-based feature selection method is computationally expensive [<xref ref-type="bibr" rid="scirp.61321-ref11">11</xref>]. A simple MI-based feature selection method is used in [<xref ref-type="bibr" rid="scirp.61321-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.61321-ref13">13</xref>]. 
For randomized weights neural networks, if the randomness of the input weights cannot be controlled effectively or simply, can the hidden layer’s outputs be controlled instead? That is to say, only the hidden layer outputs that are more closely related to the predicted variable are selected to calculate the output weights with the pseudo-inverse method.</p><p>Motivated by the above problems, a modified randomized weights neural network based on MI is proposed in this paper. First, the input variables and the randomly chosen input weights are fed into an activation function to produce the outputs of the hidden layer. Then, the MI values between these hidden layer outputs and the predicted variable are calculated, and the outputs with MI values higher than a preset threshold are selected. Finally, the pseudo-inverse method is used to compute the weights between the selected hidden layer outputs and the predicted variable. In this way, the randomness of the input weights is controlled to some degree. A simulation based on the concrete compressive strength benchmark dataset is used to validate the proposed method.</p></sec><sec id="s2"><title>2. 
Randomized Weights Neural Networks</title><p>Suppose that an SLFN with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x4.png" xlink:type="simple"/></inline-formula> hidden nodes is represented as:</p><disp-formula id="scirp.61321-formula132"><label>(1)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x5.png"  xlink:type="simple"/></disp-formula><p>where</p><disp-formula id="scirp.61321-formula133"><label>(2)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x6.png"  xlink:type="simple"/></disp-formula><p><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x7.png" xlink:type="simple"/></inline-formula> denotes the activation function of the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x8.png" xlink:type="simple"/></inline-formula> hidden node, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x9.png" xlink:type="simple"/></inline-formula> denotes the input weights connecting the input layer to the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x10.png" xlink:type="simple"/></inline-formula> hidden node, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x11.png" xlink:type="simple"/></inline-formula> is the bias of the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x12.png" xlink:type="simple"/></inline-formula> hidden node, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x13.png" xlink:type="simple"/></inline-formula> is the output weight connecting the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x14.png" xlink:type="simple"/></inline-formula> hidden node to the output layer, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x15.png" xlink:type="simple"/></inline-formula> is the mapping output of the hidden layer, which can be denoted 
as</p><disp-formula id="scirp.61321-formula134"><label>(3)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x16.png"  xlink:type="simple"/></disp-formula><p>Then, Equation (1) can be rewritten as:</p><disp-formula id="scirp.61321-formula135"><label>(4)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x17.png"  xlink:type="simple"/></disp-formula><p>where</p><disp-formula id="scirp.61321-formula136"><label>(5)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x18.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.61321-formula137"><label>(6)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x19.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.61321-formula138"><label>(7)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x20.png"  xlink:type="simple"/></disp-formula><p>Theoretically, SLFNs with randomized input weights can approximate any continuous target function, provided that there are enough hidden nodes. 
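As a hedged illustration of this hidden-layer mapping, the following Python sketch computes the hidden-layer outputs from randomly drawn input weights and biases. The sigmoid activation, the uniform sampling range [-1, 1], the toy data and the name hidden_layer_output are assumptions made for illustration only; they are not specified in the text above.
<preformat>
```python
import numpy as np


def hidden_layer_output(X, W, b):
    """Return the N x L matrix H whose entry (i, j) is g(w_j . x_i + b_j)."""
    # Sigmoid activation g, assumed here for illustration.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


rng = np.random.default_rng(0)
n_samples, n_inputs, n_hidden = 5, 3, 4

X = rng.normal(size=(n_samples, n_inputs))               # toy input samples
W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))    # random input weights, never trained
b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases, never trained

H = hidden_layer_output(X, W, b)                         # one column per hidden node
```
</preformat>
Each column of H holds the outputs of one hidden node over all samples; only the output weights remain to be learned. 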
Given a training set <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x21.png" xlink:type="simple"/></inline-formula>, randomized weights neural networks aim to reach the smallest training error and the smallest norm of the output weights jointly:</p><disp-formula id="scirp.61321-formula139"><label>(8)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x22.png"  xlink:type="simple"/></disp-formula><p>The solution is determined analytically by the expression below:</p><disp-formula id="scirp.61321-formula140"><label>(9)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x23.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x24.png" xlink:type="simple"/></inline-formula> is the Moore-Penrose generalized inverse of the matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x25.png" xlink:type="simple"/></inline-formula>.</p><p>The Moore-Penrose generalized inverse is used because the matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x26.png" xlink:type="simple"/></inline-formula> may be singular and/or non-square. 
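A minimal sketch of this analytic solution follows, assuming NumPy's numpy.linalg.pinv as the Moore-Penrose routine and random toy matrices in place of real hidden-layer outputs and targets; the variable names are illustrative. For comparison, it also evaluates the standard normal-equation form, which is valid only when the hidden-layer output matrix has full column rank.
<preformat>
```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(20, 5))    # toy hidden-layer outputs: 20 samples, 5 hidden nodes
T = rng.normal(size=(20, 1))    # toy target values

# Output weights as pseudo-inverse times targets (minimum-norm least squares solution).
beta = np.linalg.pinv(H) @ T

# Full-column-rank case: the pseudo-inverse equals (H^T H)^{-1} H^T,
# so solving the normal equations gives the same output weights.
beta_normal = np.linalg.solve(H.T @ H, H.T @ T)
```
</preformat>
When the matrix is rank-deficient, the normal-equation form is no longer applicable, whereas the pseudo-inverse still returns the minimum-norm least squares solution. 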
The relations between <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x27.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x28.png" xlink:type="simple"/></inline-formula> include: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x29.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x30.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x31.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x32.png" xlink:type="simple"/></inline-formula>.</p><p>In particular, when <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x33.png" xlink:type="simple"/></inline-formula> has full column rank,</p><disp-formula id="scirp.61321-formula141"><label>(10)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x34.png"  xlink:type="simple"/></disp-formula><p>and when <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x35.png" xlink:type="simple"/></inline-formula> has full row rank,</p><disp-formula id="scirp.61321-formula142"><label>
(11)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x36.png"  xlink:type="simple"/></disp-formula></sec><sec id="s3"><title>3. Mutual Information Based Feature Selection</title><sec id="s3_1"><title>3.1. Mutual Information</title><p>Information entropy quantifies the uncertainty of random variables and measures the amount of information shared by them. Thus, it has been widely used in many fields. The entropy can be represented as:</p><disp-formula id="scirp.61321-formula143"><label>(12)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x37.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x38.png" xlink:type="simple"/></inline-formula> is the marginal probability density.</p><p>Mutual information (MI) measures the mutual dependence of two variables and is defined as:</p><disp-formula id="scirp.61321-formula144"><label>(13)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x39.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x40.png" xlink:type="simple"/></inline-formula> is the joint probability density, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x41.png" xlink:type="simple"/></inline-formula> is the conditional entropy when <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x42.png" xlink:type="simple"/></inline-formula> is known, which is calculated as</p><disp-formula 
id="scirp.61321-formula145"><label>(14)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x43.png"  xlink:type="simple"/></disp-formula><p>For continuous random variables,</p><disp-formula id="scirp.61321-formula146"><label>(15)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x44.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.61321-formula147"><label>(16)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x45.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.61321-formula148"><label>(17)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x46.png"  xlink:type="simple"/></disp-formula></sec><sec id="s3_2"><title>3.2. Simple Feature Selection Based on Mutual Information</title><p>The mutual information feature selection (MIFS) algorithm can be described as follows: calculate the MI value between each input feature and the output variable; select the input features with the larger MI values and penalize the remaining features that have large MI values with the already selected features; and obtain the best input feature subset with a greedy search [<xref ref-type="bibr" rid="scirp.61321-ref14">14</xref>]. This method is time-consuming when selecting features from high-dimensional data.</p><p>A simple MI-based method is: 1) calculate the MI value between each input feature and the output variable; 2) set a threshold value for the MI based on prior knowledge; 3) select the features whose MI values are higher than the threshold. How to select the optimal threshold value is an open question.</p></sec></sec><sec id="s4"><title>4. 
MI-Based Modified Randomized Weights Neural Networks</title><p>The proposed MI-based modified randomized weights neural network model is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p><p>As shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>, after obtaining the mapping outputs of the hidden layer nodes <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x47.png" xlink:type="simple"/></inline-formula>, ∙∙∙ , <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x48.png" xlink:type="simple"/></inline-formula>, the MI values between these outputs and the predicted variable are calculated with:</p><disp-formula id="scirp.61321-formula149"><label>(18)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x49.png"  xlink:type="simple"/></disp-formula><p>Given the preset threshold value <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x50.png" xlink:type="simple"/></inline-formula>, the following equation is used to select the hidden layer’s outputs:</p><disp-formula id="scirp.61321-formula150"><label>(19)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x51.png"  xlink:type="simple"/></disp-formula><p>We denote the hidden layer’s outputs with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x52.png" xlink:type="simple"/></inline-formula> as:</p><disp-formula id="scirp.61321-formula151"><label>(20)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x53.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x54.png" xlink:type="simple"/></inline-formula> is the number of selected hidden layer outputs.</p><p>Therefore, <inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/61321x55.png" xlink:type="simple"/></inline-formula> has less randomness than the original <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x56.png" xlink:type="simple"/></inline-formula>. The output weights are again computed using the Moore-Penrose method with:</p><disp-formula id="scirp.61321-formula152"><label>(21)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x57.png"  xlink:type="simple"/></disp-formula><p>Considering the selection of the learning parameters, the MI-based randomized weights algorithm can be represented as the following optimization problem:</p><disp-formula id="scirp.61321-formula153"><label>(22)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/61321x58.png"  xlink:type="simple"/></disp-formula><p>Intelligent optimization methods can be used to address this problem.</p></sec><sec id="s5"><title>5. Application on Modeling Concrete Compressive Strength</title><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> MI based modified randomized weights neural networks model</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/61321x59.png"/></fig><p>The concrete compressive strength data were obtained from the experimental studies of the group led by I.C. Yeh at Chung Hua University in Taiwan [<xref ref-type="bibr" rid="scirp.61321-ref15">15</xref>]. The dataset contains 1030 samples, each with nine columns. The first seven columns are input parameters, namely the contents of cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate and fine aggregate per cubic meter of concrete. 
The eighth column is the age of the concrete in days, and the last column is the concrete compressive strength.</p><p>For L = 300, the MI values between the hidden layer’s outputs and the predicted variable are shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p><p><xref ref-type="fig" rid="fig2">Figure 2</xref> shows that the maximum MI value is almost 10 times the minimum value. Thus, the hidden layer’s outputs are not stable, and it is necessary to select the outputs with high MI values.</p><p>The original randomized weights algorithm and the MI-based modified version are compared for different numbers of hidden nodes and different preset MI threshold values. To reduce the effect of the random initial weights, the mean root mean square error (MRMSE) over 100 repeated runs is used to estimate the model’s prediction accuracy. Statistical results are shown in <xref ref-type="table" rid="table1">Table 1</xref>.</p><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> MI values between hidden layer’s outputs and predicted variable</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/61321x60.png"/></fig><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Statistical results (MRMSEs) for different learning parameters over 100 repeated runs</title></caption><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="2"  ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x61.png" xlink:type="simple"/></inline-formula> L</th><th align="center" valign="middle"  rowspan="2"  >Original method (MRMSEs)</th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="7"  >MI based modified method (MRMSEs, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x62.png" xlink:type="simple"/></inline-formula>) with 
different <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x63.png" xlink:type="simple"/></inline-formula></th></tr></thead><tr><td align="center" valign="middle" >0.1*<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x64.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" >0.2*<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x65.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" >0.3*<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x66.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" >0.4*<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x67.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" >0.5*<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x68.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" >0.6*<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x69.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" >0.7*<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x70.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x71.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >L = 10</td><td align="center" valign="middle" >12.37</td><td align="center" valign="middle" >(12.41, 10)</td><td align="center" valign="middle" >(12.27, 9.8)</td><td align="center" valign="middle" >(12.57, 9.1)</td><td align="center" valign="middle" >(13.97, 6.6)</td><td align="center" valign="middle" >(14.76, 4.9)</td><td align="center" valign="middle" >(16.42, 3.1)</td><td align="center" valign="middle" >--</td><td align="center" valign="middle" 
>0.2703</td></tr><tr><td align="center" valign="middle" >L = 20</td><td align="center" valign="middle" >10.36</td><td align="center" valign="middle" >(10.25, 20)</td><td align="center" valign="middle" >(10.31, 19.2)</td><td align="center" valign="middle" >(10.73, 15.4)</td><td align="center" valign="middle" >(11.73, 9.85)</td><td align="center" valign="middle" >(13.60, 6.03)</td><td align="center" valign="middle" >(15.04, 3.75)</td><td align="center" valign="middle" >--</td><td align="center" valign="middle" >0.3319</td></tr><tr><td align="center" valign="middle" >L = 30</td><td align="center" valign="middle" >9.713</td><td align="center" valign="middle" >(9.675, 29.9)</td><td align="center" valign="middle" >(9.702, 28.1)</td><td align="center" valign="middle" >(10.16, 21.3)</td><td align="center" valign="middle" >(10.95, 13.3)</td><td align="center" valign="middle" >(12.33, 8.07)</td><td align="center" valign="middle" >(14.77, 4.69)</td><td align="center" valign="middle" >--</td><td align="center" valign="middle" >0.3572</td></tr><tr><td align="center" valign="middle" >L = 40</td><td align="center" valign="middle" >9.518</td><td align="center" valign="middle" >(9.444, 39.9)</td><td align="center" valign="middle" >(9.621, 37.2)</td><td align="center" valign="middle" >(9.787, 27.2)</td><td align="center" valign="middle" >(10.61, 16.2)</td><td align="center" valign="middle" >(11.78, 9.11)</td><td align="center" valign="middle" >(13.89, 5.09)</td><td align="center" valign="middle" >--</td><td align="center" valign="middle" >0.3707</td></tr><tr><td align="center" valign="middle" >L = 50</td><td align="center" valign="middle" >9.755</td><td align="center" valign="middle" >(9.799, 49.9)</td><td align="center" valign="middle" >(9.638, 46.2)</td><td align="center" valign="middle" >(9.591, 33.2)</td><td align="center" valign="middle" >(10.21, 19.3)</td><td align="center" valign="middle" >(11.21, 11.6)</td><td align="center" valign="middle" >(13.19, 6.37)</td><td 
align="center" valign="middle" >--</td><td align="center" valign="middle" >0.3755</td></tr><tr><td align="center" valign="middle" >L = 60</td><td align="center" valign="middle" >10.17</td><td align="center" valign="middle" >(10.12, 59.8)</td><td align="center" valign="middle" >(9.772, 53.9)</td><td align="center" valign="middle" >(9.486, 36.2)</td><td align="center" valign="middle" >(10.16, 19.7)</td><td align="center" valign="middle" >(11.26, 10.8)</td><td align="center" valign="middle" >(13.02, 6.36)</td><td align="center" valign="middle" >--</td><td align="center" valign="middle" >0.4042</td></tr><tr><td align="center" valign="middle" >L = 70</td><td align="center" valign="middle" >10.38</td><td align="center" valign="middle" >(10.71, 69.8)</td><td align="center" valign="middle" >(10.14, 62.9)</td><td align="center" valign="middle" >(9.453, 41.8)</td><td align="center" valign="middle" >(9.973, 23.1)</td><td align="center" valign="middle" >(11.02, 12.83)</td><td align="center" valign="middle" >(12.14, 7.19)</td><td align="center" valign="middle" >--</td><td align="center" valign="middle" >0.4050</td></tr><tr><td align="center" valign="middle" >L = 80</td><td align="center" valign="middle" >11.18</td><td align="center" valign="middle" >(11.33, 79.8)</td><td align="center" valign="middle" >(10.62, 70.5)</td><td align="center" valign="middle" >(9.625, 44.8)</td><td align="center" valign="middle" >(9.738, 24.1)</td><td align="center" valign="middle" >(10.79, 13.0)</td><td align="center" valign="middle" >(12.68, 7.15)</td><td align="center" valign="middle" >--</td><td align="center" valign="middle" >0.4185</td></tr><tr><td align="center" valign="middle" >L = 90</td><td align="center" valign="middle" >12.48</td><td align="center" valign="middle" >(12.22, 89.7)</td><td align="center" valign="middle" >(11.24, 80.2)</td><td align="center" valign="middle" >(9.634, 53.3)</td><td align="center" valign="middle" >(9.736, 27.93)</td><td align="center" valign="middle" >(10.56, 
15.28)</td><td align="center" valign="middle" >(11.50, 8.96)</td><td align="center" valign="middle" >(13.72, 4.82)</td><td align="center" valign="middle" >0.4113</td></tr><tr><td align="center" valign="middle" >L = 100</td><td align="center" valign="middle" >13.30</td><td align="center" valign="middle" >(13.06, 100)</td><td align="center" valign="middle" >(12.88, 99.0)</td><td align="center" valign="middle" >(12.20, 89.6)</td><td align="center" valign="middle" >(10.47, 70.5)</td><td align="center" valign="middle" >(9.885, 48.5)</td><td align="center" valign="middle" >(9.614, 32.5)</td><td align="center" valign="middle" >(10.00, 22.6)</td><td align="center" valign="middle" >0.4186</td></tr><tr><td align="center" valign="middle" >L = 200</td><td align="center" valign="middle" >535.1</td><td align="center" valign="middle" >(488.6, 199)</td><td align="center" valign="middle" >(47.26, 1.65)</td><td align="center" valign="middle" >(12.38, 94.0)</td><td align="center" valign="middle" >(9.678, 46.92)</td><td align="center" valign="middle" >(10.01, 23.87)</td><td align="center" valign="middle" >(10.95, 12.51)</td><td align="center" valign="middle" >(12.46, 5.99)</td><td align="center" valign="middle" >0.4612</td></tr><tr><td align="center" valign="middle" >L = 300</td><td align="center" valign="middle" >167.5</td><td align="center" valign="middle" >(166.4, 2.98)</td><td align="center" valign="middle" >(253.1, 240)</td><td align="center" valign="middle" >(20.24, 128)</td><td align="center" valign="middle" >(10.69, 15.53)</td><td align="center" valign="middle" >(9.646, 31.5)</td><td align="center" valign="middle" >(10.51, 15.4)</td><td align="center" valign="middle" >(12.05, 7.33)</td><td align="center" valign="middle" >0.4798</td></tr><tr><td align="center" valign="middle" >L = 400</td><td align="center" valign="middle" >132.2</td><td align="center" valign="middle" >(135.3, 397)</td><td align="center" valign="middle" >(156.7, 314)</td><td align="center" valign="middle" 
>(58.05, 167)</td><td align="center" valign="middle" >(11.66, 81.5)</td><td align="center" valign="middle" >(9.874, 41.2)</td><td align="center" valign="middle" >(10.24, 19.9)</td><td align="center" valign="middle" >(11.26, 9.53)</td><td align="center" valign="middle" >0.4854</td></tr><tr><td align="center" valign="middle" >L = 500</td><td align="center" valign="middle" >121.4</td><td align="center" valign="middle" >(118.1, 496)</td><td align="center" valign="middle" >(379.7, 382)</td><td align="center" valign="middle" >(512.9, 197)</td><td align="center" valign="middle" >(12.56, 93.6)</td><td align="center" valign="middle" >(9.957, 45.1)</td><td align="center" valign="middle" >(10.16, 21.6)</td><td align="center" valign="middle" >(10.95, 11.0)</td><td align="center" valign="middle" >0.5070</td></tr></tbody></table></table-wrap><p><xref ref-type="table" rid="table1">Table 1</xref> shows that: 1) The maximum MI value between the hidden layer’s outputs and the predicted variable generally increases with the number of hidden nodes; 2) The smallest prediction errors over the different learning parameters (L,<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/61321x72.png" xlink:type="simple"/></inline-formula>) occur at L = 30 to 40, which may therefore be the best range for this benchmark dataset; 3) The largest prediction errors occur at about L = 200, which may be related to the Moore-Penrose pseudoinverse method; 4) With L = 40, the modified approach improves the prediction performance only slightly. For the other values of L, however, a suitable preset MI threshold can improve the prediction performance considerably. The large prediction errors at L = 200 can therefore be avoided with the MI-based modified approach, which suggests that the proposed method is more robust than the original randomized weights algorithm.</p></sec><sec id="s6"><title>6. 
Conclusion</title><p>This paper proposes a modified randomized weights neural network based on mutual information. The input weights of the hidden layer are generated randomly, as in the standard randomized algorithm, but not all of the hidden layer’s outputs are used to compute the output weights. A simple feature selection method based on mutual information selects the hidden layer’s outputs, and only the selected outputs are used to compute the output weights with the pseudoinverse method. The concrete compressive strength benchmark dataset is used to validate the method. Future research will address the theoretical analysis and validate the idea on more benchmark datasets.</p></sec><sec id="s7"><title>Acknowledgements</title><p>The research was sponsored by the postdoctoral National Natural Science Foundation of China (2013M532118, 2015T81082), the National Natural Science Foundation of China (61573364, 61273177), the State Key Laboratory of Synthetical Automation for Process Industries, and the China National 863 Projects (2015AA043802).</p></sec><sec id="s8"><title>Cite this paper</title><p>Jian Tang, Zhiwei Wu, Meiying Jia, Zhuo Liu (2015) Mutual Information-Based Modified Randomized Weights Neural Networks. Journal of Computer and Communications, 03, 191-197. doi: 10.4236/jcc.2015.311030</p></sec></body><back><ref-list><title>References</title><ref id="scirp.61321-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Shang, C., Yang, F., Huang, D.X. and Lu, W.X. (2014) Data-Driven Soft Sensor Development Based on Deep Learning. Journal of Process Control, 24, 223-233. http://dx.doi.org/10.1016/j.jprocont.2014.01.012</mixed-citation></ref><ref id="scirp.61321-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Pao, Y.H. and Takefuji, Y. (1992) Functional-Link Net Computing: Theory, System Architecture, and Functionalities. IEEE Computer, 25, 76-79. 
http://dx.doi.org/10.1109/2.144401</mixed-citation></ref><ref id="scirp.61321-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Igelnik, B. and Pao, Y.H. (1995) Stochastic Choice of Basis Functions in Adaptive Function Approximation and the Functional-Link Net. IEEE Transactions on Neural Networks, 6, 1320-1329. http://dx.doi.org/10.1109/72.471375</mixed-citation></ref><ref id="scirp.61321-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Huang, G.B., Chen, L. and Siew, C.K. (2006) Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes. IEEE Transactions on Neural Networks, 17, 879-892. 
http://dx.doi.org/10.1109/TNN.2006.875977</mixed-citation></ref><ref id="scirp.61321-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Tapson, J. and Schaik, A.V. (2013) Learning the Pseudoinverse Solution to Network Weights. Neural Networks, 45, 94-100. http://dx.doi.org/10.1016/j.neunet.2013.02.008</mixed-citation></ref><ref id="scirp.61321-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Alhamdoosh, M. and Wang, D.H. (2014) Fast Decorrelated Neural Network Ensembles with Random Weights. Information Sciences, 264, 104-117. http://dx.doi.org/10.1016/j.ins.2013.12.016</mixed-citation></ref><ref id="scirp.61321-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Bartlett, P.L. (1997) For Valid Generalization, the Size of the Weights Is More Important Than the Size of the Network. IEEE Conference on Neural Information Processing Systems, MIT Press, Cambridge, 134-140.</mixed-citation></ref><ref id="scirp.61321-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Bengio, Y., Lamblin, P., Popovici, D. and Larochelle, H. (2007) Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems, MIT Press, Cambridge, 153-160.</mixed-citation></ref><ref id="scirp.61321-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">de la Rosa, E. and Yu, W. (2015) Nonlinear System Identification Using Deep Learning and Randomized Algorithms. IEEE International Conference on Information and Automation (ICIA2015), Lijiang, 274-279. 
http://dx.doi.org/10.1109/ICInfA.2015.7279298</mixed-citation></ref><ref id="scirp.61321-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Liu, H.W., Sun, J.G., Liu, L. and Zhang, H.J. (2009) Feature Selection with Dynamic Mutual Information. Pattern Recognition, 42, 1330-1339. http://dx.doi.org/10.1016/j.patcog.2008.10.028</mixed-citation></ref><ref id="scirp.61321-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Peng, H.C., Long, F.H. and Ding, C. (2005) Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1226-1238. http://dx.doi.org/10.1109/TPAMI.2005.159</mixed-citation></ref><ref id="scirp.61321-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Tan, C. and Li, M.L. (2008) Mutual Information-Induced Interval Selection Combined with Kernel Partial Least Squares for Near-Infrared Spectral Calibration. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 71, 1266-1273. http://dx.doi.org/10.1016/j.saa.2008.03.033</mixed-citation></ref><ref id="scirp.61321-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Tang, J., Chai, T.Y., Yu, W. and Zhao, L.J. (2012) Feature Extraction and Selection Based on Vibration Spectrum with Application to Estimate the Load Parameters of Ball Mill in Grinding Process. Control Engineering Practice, 20, 991-1004. http://dx.doi.org/10.1016/j.conengprac.2012.03.020</mixed-citation></ref><ref id="scirp.61321-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Battiti, R. (1994) Using Mutual Information for Selecting Features in Supervised Neural Net Learning. IEEE Transactions on Neural Networks, 5, 537-550. 
http://dx.doi.org/10.1109/72.298224</mixed-citation></ref><ref id="scirp.61321-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Yeh, I.C. (1998) Modeling of Strength of High Performance Concrete Using Artificial Neural Networks. Cement and Concrete Research, 28, 1797-1808. http://dx.doi.org/10.1016/S0008-8846(98)00165-3</mixed-citation></ref></ref-list></back></article>