<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JSIP</journal-id><journal-title-group><journal-title>Journal of Signal and Information Processing</journal-title></journal-title-group><issn pub-type="epub">2159-4465</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jsip.2012.32026</article-id><article-id pub-id-type="publisher-id">JSIP-19565</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Complex Valued Recurrent Neural Network: From Architecture to Training
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Minin</surname><given-names>Alexey</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Knoll</surname><given-names>Alois</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Zimmermann</surname><given-names>Hans-Georg</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib></contrib-group><aff id="aff2"><addr-line>Siemens AG, Corporate Technology, München, Germany</addr-line></aff><aff id="aff1"><addr-line>Technische Universität München, Robotics and Embedded Systems, München, Germany</addr-line></aff><author-notes><corresp id="cor1">* E-mail: <email>alexey.minin@gmail.com</email></corresp></author-notes><pub-date pub-type="epub"><day>30</day><month>05</month><year>2012</year></pub-date><volume>03</volume><issue>02</issue><fpage>192</fpage><lpage>197</lpage><history><date date-type="received"><day>07</day><month>12</month><year>2011</year></date><date date-type="rev-recd"><day>11</day><month>02</month><year>2012</year></date><date date-type="accepted"><day>18</day><month>03</month><year>2012</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc.</copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Recurrent Neural Networks were invented a long time ago, and dozens of different architectures have been published. In this paper we generalize recurrent architectures to a state space model, and we also generalize the numbers the network can process to the complex domain. We show how to train the recurrent network in the complex valued case, and we present the theorems and procedures to make the training stable. We also show that the complex valued recurrent neural network is a generalization of the real valued counterpart and that it has specific advantages over the latter. We conclude the paper with a discussion of possible applications and scenarios for using these networks.
 
</p></abstract><kwd-group><kwd>Complex Valued Neural Networks; Complex Valued System Identification; Recurrent Neural Networks; Complex Valued Recurrent Neural Networks</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>This paper aims to give complete guidance, from state space models with complex parameters to a special type of complex valued recurrent neural network. It is unique in transferring the models suggested by Zimmermann in [<xref ref-type="bibr" rid="scirp.19565-ref1">1</xref>] to the complex valued case. Moreover, it presents a new approach to the problems with transition functions that arise in the complex valued case, and a new treatment of the error function that gives complex valued neural networks specific advantages. A lot of research on complex valued recurrent neural networks is currently ongoing; see, for example, the works of Mandic [<xref ref-type="bibr" rid="scirp.19565-ref2">2</xref>,<xref ref-type="bibr" rid="scirp.19565-ref3">3</xref>], Adali [<xref ref-type="bibr" rid="scirp.19565-ref4">4</xref>] and Xu et al. [<xref ref-type="bibr" rid="scirp.19565-ref5">5</xref>]. Mandic and Adali have pointed out the advantages of complex valued neural networks in many papers. This paper provides the neural network community with a new architecture that shows better results in its complex valued version.</p><p>We start with a description of state space models and then give a detailed explanation of complex valued neural networks. We discuss complex valued system identification, the properties of the error function in the complex valued case, complex valued back-propagation, and the singular points of the transition functions. The paper ends with a short discussion of the applications and advantages that arise from the complex valued version of the considered architecture.</p><p>State space techniques can be used to model recurrent dynamical systems. 
There are two principal ways of modeling dynamical systems: 1) use a feed-forward neural network with delayed inputs, or 2) use a recurrent architecture and model the dynamics itself. The first approach is based on Takens' theorem [<xref ref-type="bibr" rid="scirp.19565-ref6">6</xref>], which states that a dynamical system, or more precisely its attractor, can be reconstructed from a set of past observations of the system. This holds for deterministic (including chaotic) systems, but in real world applications feed-forward networks with delayed inputs are often unsuitable for forecasting the states of dynamical systems. Recurrent architectures, which represent the dynamics in the recurrent connection weights, are therefore the more natural way of forecasting dynamical systems. This approach was first suggested by Elman [<xref ref-type="bibr" rid="scirp.19565-ref7">7</xref>] and later extended by Zimmermann [<xref ref-type="bibr" rid="scirp.19565-ref8">8</xref>]. As an example, we consider a so-called open system, for which we build a state space model based on a complex valued recurrent neural network. Such an open system (open means that the system is driven not only by its internal state changes but also by external stimuli) is given as follows:</p><disp-formula id="scirp.19565-formula150297"><label>(1)</label><graphic position="anchor" xlink:href="9-3400165\1aee237f-8cdc-4624-84ed-e705a654e75f.jpg"  xlink:type="simple"/></disp-formula><p>Here, the states of the system (<img src="9-3400165\1db41d6a-976f-45a1-a9ea-3e476bf6a546.jpg" />) depend on the previous states as well as on a system input <img src="9-3400165\b66a19e6-33d5-48ba-8dda-a7ee0b0cb330.jpg" /> through a non-linear function f. The output of the system depends on the current state, mapped through another non-linear function g. 
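</p><p>As a minimal sketch of how an open system of the form (1) is unrolled in time, the following Python snippet iterates a state transition f and an output map g over an input sequence of complex numbers. Since Equation (1) is only available as an image, the sketch assumes the form x_t = f(x_{t-1}, u_t), y_t = g(x_t); the helper simulate_open_system and the particular choices of f and g are hypothetical illustrations, not the model used by the authors.</p><preformat preformat-type="code">
```python
import cmath


def simulate_open_system(f, g, x0, inputs):
    """Unroll an open system x_t = f(x_{t-1}, u_t), y_t = g(x_t).

    The time-index convention is an assumption; states, inputs and outputs
    may all be complex numbers.
    """
    x = x0
    states, outputs = [], []
    for u in inputs:
        x = f(x, u)            # new state: previous state plus external stimulus
        states.append(x)
        outputs.append(g(x))   # observable output read off the current state
    return states, outputs


def f(x, u):
    # Hypothetical non-linear transition: damped rotation of the state, squashed by tanh.
    return cmath.tanh(0.9 * cmath.exp(0.3j) * x + u)


def g(x):
    # Hypothetical output map: a fixed complex gain applied to the state.
    return (1 - 0.5j) * x


states, outputs = simulate_open_system(f, g, 0j, [1 + 0j, 0.5j, 0j, 0j])
print(outputs)
```
</preformat><p>In the last two steps the input is zero, so the state changes only through its own recurrence, i.e., through the internal part of the open system.</p><p>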
A graphical representation is given in <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p><p>In the rest of this paper, we use networks described by Equation (1) and <xref ref-type="fig" rid="fig1">Figure 1</xref>. To generalize the approach, we now assume that the behavior of the dynamical system is described by complex numbers, which means that <img src="9-3400165\bfa859bb-6711-4a89-9762-49865d89ad6c.jpg" /> and the functions <img src="9-3400165\514b6648-6bdf-4a45-9f86-13b23e8b0e91.jpg" /> are defined on the domain of complex numbers.</p></sec><sec id="s2"><title>2. Complex Valued Neural Network</title><p>The Complex Valued Recurrent Neural Network (hereafter CVRNN) is a straightforward generalization of the real valued RNN. The algorithms used for CVRNNs can also be used for RNNs without loss of generality. To describe the CVRNN, we start with the feed-forward path and then discuss the complex valued error back-propagation algorithm (hereafter CVEBP) and the training of such architectures.</p><sec id="s2_1"><title>2.1. Architecture Description and Feed Forward Path</title><p>The system represented by <xref ref-type="fig" rid="fig1">Figure 1</xref> and Equation (1) can be realized as follows (as suggested by Zimmermann [<xref ref-type="bibr" rid="scirp.19565-ref9">9</xref>]): consider a set of three-layer feed-forward networks (hereafter FFNN) whose hidden layers are connected to each other. This connection represents the evolution of the corresponding dynamical system inside the RNN. The structure of this type of network is shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p><p>The dynamical system evolves under two influences: 1) the internal evolution of the system state, governed by the matrix A and the activation function, and 2) the external stimuli. The matrices B and C convert the external stimulus into the state (in the sense of data compression) and produce the output from the state (state decompression), respectively. 
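</p><p>To make the roles of A, B and C concrete, the following Python sketch unfolds a small CVRNN over a few time steps. Because Equation (2) appears only as an image, the sketch assumes the form s_t = tanh(A s_{t-1} + B u_t), y_t = C s_t, with tanh applied elementwise; the matrix sizes, the values and the helper names matvec and cvrnn_forward are hypothetical. The same three matrices are reused at every step, in line with the shared weights concept of [<xref ref-type="bibr" rid="scirp.19565-ref8">8</xref>].</p><preformat preformat-type="code">
```python
import cmath


def matvec(M, v):
    """Product of a complex matrix (list of rows) with a complex vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def cvrnn_forward(A, B, C, inputs, s0):
    """Unfold s_t = tanh(A s_{t-1} + B u_t), y_t = C s_t with shared A, B, C."""
    s = s0
    states, outputs = [], []
    for u in inputs:
        net = [p + q for p, q in zip(matvec(A, s), matvec(B, u))]
        s = [cmath.tanh(z) for z in net]   # state transition (A) and input compression (B)
        states.append(s)
        outputs.append(matvec(C, s))       # state decompression into the output (C)
    return states, outputs


# Hypothetical sizes: 2 complex inputs, 3 complex states, 1 complex output.
A = [[0.1 + 0.1j, 0.0, 0.05j],
     [0.0, 0.2 - 0.1j, 0.0],
     [0.05, 0.0, 0.1j]]
B = [[0.3, 0.1j],
     [0.2j, 0.1],
     [0.1, 0.1]]
C = [[0.5, 0.5j, -0.2]]
inputs = [[1 + 1j, 0.5j], [0.0, 1.0], [0.2 - 0.3j, 0.0]]

states, outputs = cvrnn_forward(A, B, C, inputs, s0=[0j, 0j, 0j])
print(outputs)
```
</preformat><p>Because the loop reuses A, B and C at every step, the number of weights does not grow with the length of the input sequence.</p><p>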
Therefore, one can write the following system of equations, which describes the system in <xref ref-type="fig" rid="fig2">Figure 2</xref>:</p><disp-formula id="scirp.19565-formula150298"><label>(2)</label><graphic position="anchor" xlink:href="9-3400165\98ed6b42-59f0-4b7a-8f2c-c71e583d14a4.jpg"  xlink:type="simple"/></disp-formula><p>where we have selected <img src="9-3400165\8ae4eb94-a977-4721-9b6b-995f4df1b95e.jpg" /> as the activation function <img src="9-3400165\b52a7533-a54d-4c17-9d80-db147cda06c8.jpg" />, which performs the non-linear transformation of the state. Thus, all temporal relations of the dynamical system are represented by the matrix <img src="9-3400165\4d389b60-6e79-45c1-94e9-b2ebf0e22b03.jpg" /> (to be learned during training), while the compression and decompression abilities are represented by the matrices B and C, respectively (note that all elements of the matrices <img src="9-3400165\bc5a20fb-33b1-415a-a35c-b8a3f977b311.jpg" /> are complex numbers).</p><p>A word about the weight matrices: the matrices between the layers are the same at every time step (the “shared weights concept” suggested in [<xref ref-type="bibr" rid="scirp.19565-ref8">8</xref>]). It is exactly this property that makes the network recurrent. Next, we discuss back propagation for the shared weights concept.</p><p>In summary, the RNN has actuation inputs <img src="9-3400165\bec384f1-f9f8-4ba1-aa81-d056969bc130.jpg" />, observable outputs <img src="9-3400165\b0445534-7495-47f6-9d55-129c6e3c337f.jpg" />, and states <img src="9-3400165\3011ef89-df70-4322-b37a-ab64cbad9b3f.jpg" />, which evolve under the matrix <img src="9-3400165\fc7dc330-d86f-4c92-8aee-cda152803484.jpg" /> and a non-linear activation function <img src="9-3400165\402be428-79cf-4051-a593-e0695eb66255.jpg" />.</p></sec><sec id="s2_2"><title>2.2. 
Error Back Propagation for the CVRNN</title><p>The first variant of complex valued error back propagation was described by Leung and Haykin in [<xref ref-type="bibr" rid="scirp.19565-ref7">7</xref>]. First, we have to define the error function. Since complex numbers have no “greater/less than” relation, the error function must produce a real number, so that training results can be compared and training can be guided in the direction of error reduction. Training the whole network then means finding the network parameters, i.e., the weights, that minimize the error function:</p><disp-formula id="scirp.19565-formula150299"><label>(3)</label><graphic position="anchor" xlink:href="9-3400165\be5a3f26-02ca-40df-930d-6247d3eea30a.jpg"  xlink:type="simple"/></disp-formula><p>where w denotes the weights, f the activation function, u the complex valued inputs of the system, and <img src="9-3400165\90c796c4-bf36-42fb-98d7-45530dc94b54.jpg" /> the observables.</p><p>One class of functions that produce a real valued output from complex arguments is the following:</p><disp-formula id="scirp.19565-formula150300"><label>(4)</label><graphic position="anchor" xlink:href="9-3400165\f37a6a6a-cd9b-4daa-85f0-0f4b176e5265.jpg"  xlink:type="simple"/></disp-formula><p>where the overbar denotes complex conjugation <img src="9-3400165\1db0d80a-c5f6-42b7-a3a8-aaaace0036b0.jpg" />.</p><p>This error function is not analytic, i.e., the complex derivative <img src="9-3400165\d6cfa023-4bb3-4379-ac00-782ddaa5745a.jpg" /> is not defined over the entire range of input values. 
Therefore, standard back propagation, which relies on the complex derivative, cannot be applied directly.</p><p>A necessary condition for the analyticity of a function E is given by the Cauchy-Riemann conditions:</p><disp-formula id="scirp.19565-formula150301"><label>(5)</label><graphic position="anchor" xlink:href="9-3400165\0addda34-dfaa-4c55-9f93-8500394d373c.jpg"  xlink:type="simple"/></disp-formula><p>where the function E is written as:</p><disp-formula id="scirp.19565-formula150302"><label>(6)</label><graphic position="anchor" xlink:href="9-3400165\253b6df3-d869-4769-a283-0b798d000cba.jpg"  xlink:type="simple"/></disp-formula><p>where <img src="9-3400165\7cb3b3c2-9c59-47ea-8ff4-25599f403164.jpg" /> are real differentiable functions of two real variables.</p><p>Requiring the error function to produce a real output means that <img src="9-3400165\c6f80b37-85a5-4a26-828a-729fd766b3e2.jpg" />.</p><disp-formula id="scirp.19565-formula150303"><label>(7)</label><graphic position="anchor" xlink:href="9-3400165\1268b6f9-051a-4082-8d90-d71c177c18a5.jpg"  xlink:type="simple"/></disp-formula><p>If we want <img src="9-3400165\ff5218e5-78f6-4a9d-86b1-7b3a35ebf2ef.jpg" />, we have to take an error function similar to (4). Since our error function satisfies <img src="9-3400165\b1a2f087-5538-47cd-b5dc-21d1fc7b50ed.jpg" />, the optimality conditions are given by:</p><disp-formula id="scirp.19565-formula150304"><label>(8)</label><graphic position="anchor" xlink:href="9-3400165\a8e21bfb-9e86-4ba0-ab4f-296e39a1197b.jpg"  xlink:type="simple"/></disp-formula><p>The function <img src="9-3400165\81c926a6-6937-46b7-bc85-a449b4435765.jpg" /> is a mapping of the type <img src="9-3400165\c5937c38-ee10-4ccf-b170-e5174cc5ac8c.jpg" /> instead of <img src="9-3400165\5b50c109-e4c0-4b76-ad03-a2fa890aefbf.jpg" />. 
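</p><p>The failure of analyticity can be checked numerically. The Python sketch below takes a single output error e = x + iy, evaluates E(e) = e times its conjugate as in (4), and splits it into u(x, y) = x^2 + y^2 and v(x, y) = 0 in the sense of (6) and (7). It then evaluates the residuals of the Cauchy-Riemann conditions (5) with central finite differences and compares difference quotients of E taken along the real and the imaginary axis. The helper names are illustrative only.</p><preformat preformat-type="code">
```python
def E(e):
    """Error function (4) for one complex output error e: E = e * conj(e)."""
    return e * e.conjugate()      # complex type, but the imaginary part is exactly 0


def u(x, y):
    return E(complex(x, y)).real  # real part of E, here x**2 + y**2


def v(x, y):
    return E(complex(x, y)).imag  # imaginary part of E, identically 0


def partials(func, x, y, h=1e-6):
    """Central finite differences (d/dx, d/dy) of a real function of x and y."""
    return ((func(x + h, y) - func(x - h, y)) / (2 * h),
            (func(x, y + h) - func(x, y - h)) / (2 * h))


def cauchy_riemann_residuals(e):
    """Return (u_x - v_y, u_y + v_x); both vanish wherever E is complex differentiable."""
    ux, uy = partials(u, e.real, e.imag)
    vx, vy = partials(v, e.real, e.imag)
    return ux - vy, uy + vx


def difference_quotients(e, h=1e-6):
    """Difference quotients of E at e along the real and the imaginary direction."""
    along_real = (E(e + h) - E(e)) / h
    along_imag = (E(e + 1j * h) - E(e)) / (1j * h)
    return along_real, along_imag


for e in (0j, 1 + 0j, 0.5 - 2j):
    print(e, cauchy_riemann_residuals(e), difference_quotients(e))
```
</preformat><p>The residuals equal (2x, 2y) up to rounding, so the Cauchy-Riemann conditions hold only at e = 0. Away from that point the two difference quotients disagree (about 2 versus about 0 at e = 1), i.e., E has no complex derivative there, while it is perfectly smooth as a function of the two real variables x and y.</p><p>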
To calculate the derivative of the function <img src="9-3400165\6939e811-2ff5-4c8a-b21e-52a95d80adc6.jpg" />, one should use the so-called Wirtinger derivatives (discussed, e.g., by Brandwood [<xref ref-type="bibr" rid="scirp.19565-ref10">10</xref>]); the Wirtinger derivatives with respect to z and <img src="9-3400165\b609eec4-5f92-49a1-a303-18198261e25f.jpg" /> are calculated as follows:</p><disp-formula id="scirp.19565-formula150305"><label>(9)</label><graphic position="anchor" xlink:href="9-3400165\22a542c6-48d9-408e-ae4a-e32900576496.jpg"  xlink:type="simple"/></disp-formula><p>For real functions of complex variables, <img src="9-3400165\fbad3ffd-3894-4786-b9ff-84429e40016c.jpg" />; therefore, the error function can be minimized in either direction, <img src="9-3400165\463f2838-52f5-4f5d-9545-a43c561364cb.jpg" /> or <img src="9-3400165\e6317a67-82a5-4d6a-ab3a-12ae7d192958.jpg" />.</p><p>The derivatives of the error function are thus defined in the Wirtinger sense.</p><p>Note that this error function “minimizes” the complex number, which in our case is <img src="9-3400165\180e2d53-9e5e-49ca-9ef5-407d0ba908b5.jpg" />; in Euler notation (Equation (10)), driving this number to zero means that the mismatch in both amplitude and phase is removed:</p><disp-formula id="scirp.19565-formula150306"><label>(10)</label><graphic position="anchor" xlink:href="9-3400165\16b7c53c-d442-4f80-8838-7a19589e6721.jpg"  xlink:type="simple"/></disp-formula><p>This error function has several desirable properties, which we now describe in more detail. 
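</p><p>Before the analytic discussion, the Wirtinger derivatives of the error function can be checked numerically. The Python sketch below assumes that Equation (9) is the standard definition, d/dz = (d/dx - i d/dy)/2 and d/dconj(z) = (d/dx + i d/dy)/2 with z = x + iy, and evaluates both derivatives of E = z conj(z) by finite differences. One expects dE/dz = conj(z) and dE/dconj(z) = z, so the two derivatives are complex conjugates of each other, as is typical for real valued functions of complex variables. The sketch also confirms that in Euler notation z = r exp(i phi) the error function reduces to r^2. The helper name wirtinger is hypothetical.</p><preformat preformat-type="code">
```python
import cmath


def E(z):
    """Error function (4) of a complex error z, returned as a real number."""
    return (z * z.conjugate()).real


def wirtinger(func, z, h=1e-6):
    """Finite-difference Wirtinger derivatives (dfunc/dz, dfunc/dconj(z)) at z.

    Assumes the standard definitions d/dz = (d/dx - i d/dy) / 2 and
    d/dconj(z) = (d/dx + i d/dy) / 2 with z = x + i y.
    """
    dx = (func(z + h) - func(z - h)) / (2 * h)            # partial derivative along x
    dy = (func(z + 1j * h) - func(z - 1j * h)) / (2 * h)  # partial derivative along y
    return 0.5 * (dx - 1j * dy), 0.5 * (dx + 1j * dy)


z = 0.8 - 0.6j
dz, dzbar = wirtinger(E, z)
print(dz, z.conjugate())      # dE/dz should be close to conj(z)
print(dzbar, z)               # dE/dconj(z) should be close to z

# Euler notation z = r exp(i phi): E depends on the amplitude r only, E = r**2.
r, phi = cmath.polar(z)
print(E(z), r ** 2)
print(E(cmath.rect(r, phi + 1.0)))   # same amplitude, different phase, same E
```
</preformat><p>The last line evaluates E at the same amplitude but a different phase and returns the same value, so E itself vanishes only when the whole complex error vanishes.</p><p>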
We rewrite (4) in Euler notation:</p><disp-formula id="scirp.19565-formula150307"><label>(11)</label><graphic position="anchor" xlink:href="9-3400165\c75ffe8a-facc-4bf1-a0da-1ea2a9ae7412.jpg"  xlink:type="simple"/></disp-formula><p>The discriminant of (11) is never positive, and it equals zero only when the equation has a single root:</p><p><img src="9-3400165\319fd94d-00da-41e3-ba79-96928f9f74d4.jpg" /></p><p>We can also rewrite (4) as follows:</p><disp-formula id="scirp.19565-formula150308"><label>(12)</label><graphic position="anchor" xlink:href="9-3400165\31ccc9e7-686d-4de5-95c3-ae97bafe9f0f.jpg"  xlink:type="simple"/></disp-formula><p>One can see that the error function (4) minimizes both the real and the imaginary part of the complex number.</p><p>Having defined a suitable error function, we can now describe the CVEBP. The procedure for the CVEBP is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>. It follows the description for the RNN in [<xref ref-type="bibr" rid="scirp.19565-ref9">9</xref>]. The “ladder” algorithm allows a local and efficient computation of the partial derivatives of the error with respect to the weights of the recurrent network. The advantage of the algorithm shown in <xref ref-type="fig" rid="fig3">Figure 3</xref> is that it unites the equations, the architecture and the locality of the CVEBP in a single scheme.</p><p><xref ref-type="fig" rid="fig3">Figure 3</xref> shows the CVEBP for the shared matrix <img src="9-3400165\eb23b9bc-04a1-452b-9f8c-611a6672fb61.jpg" /> in the case where all network parameters are complex numbers.</p></sec><sec id="s2_3"><title>2.3. 
Weight Update Rule for the CVRNN</title><p>To find the training rule for the weight update, we use the Taylor expansion of the error function:</p><disp-formula id="scirp.19565-formula150309"><label>(13)</label><graphic position="anchor" xlink:href="9-3400165\5cecdcfa-7c16-4ebd-906b-5cf30fb72e89.jpg"  xlink:type="simple"/></disp-formula><p>where (note that <img src="9-3400165\4d7c481e-773b-4abe-b4ec-30495e6c5e79.jpg" /> has to be equal to <img src="9-3400165\e6462648-54e3-41e5-95d1-d6af66ed15f8.jpg" /> for the Taylor expansion to exist):</p><disp-formula id="scirp.19565-formula150310"><label>(14)</label><graphic position="anchor" xlink:href="9-3400165\8b2f84ee-9a25-45df-a875-63eff97a2238.jpg"  xlink:type="simple"/></disp-formula><p>Following Johnson [<xref ref-type="bibr" rid="scirp.19565-ref11">11</xref>], two useful theorems can be applied to calculate the derivatives.</p><p>Theorem 1. If the function <img src="9-3400165\d3240692-01b9-408d-947b-b7d14b852ced.jpg" /> is real valued and analytic with respect to <img src="9-3400165\efd10586-c195-4cde-83b9-d41ac44e406e.jpg" /> or <img src="9-3400165\6bc7fe0a-7f65-462e-abd0-6f67fc2dbcd2.jpg" />, all stationary points can be found by setting the derivatives in Equation (9) with respect to either <img src="9-3400165\2c561a65-27be-46a4-a4b7-ad75492a0837.jpg" /> or <img src="9-3400165\6cf2f0eb-b601-4085-bf68-04fb7480bfcf.jpg" /> to zero.</p><p>Theorem 2. 
By treating <img src="9-3400165\8ac46e0c-732d-4c9c-b128-c1098211ef9d.jpg" /> and <img src="9-3400165\456f07b0-774a-4108-a93a-27b19d3761da.jpg" /> as independent variables, the quantity pointing in the direction of the maximum rate of change of <img src="9-3400165\b47970c1-1c56-408a-be78-70d757e6cf9d.jpg" /> is <img src="9-3400165\ce703c68-aa26-4955-a95f-6c71e85a2ccb.jpg" />.</p><p>Proofs of both theorems are given by Johnson in [<xref ref-type="bibr" rid="scirp.19565-ref11">11</xref>].</p><p>Following Kalra et al. [<xref ref-type="bibr" rid="scirp.19565-ref12">12</xref>] and Li and Adali [<xref ref-type="bibr" rid="scirp.19565-ref13">13</xref>], if the minimization proceeds in the direction of <img src="9-3400165\0ebba4c4-77e0-4af5-87c7-00dc0c95e066.jpg" />, then <img src="9-3400165\285b308f-b33d-4e39-b6bb-e27f0f5dbb74.jpg" />. If, instead, we minimize in the direction of <img src="9-3400165\88f04317-f89e-483e-b07a-33d1276dbc4d.jpg" />, the result is <img src="9-3400165\47809ba8-24d4-44d7-aa3a-98e5f7b8345b.jpg" />, which need not be negative, so such a step does not necessarily reduce the error.</p><p>Following Theorem 2 and Equation (7), we consider <img src="9-3400165\006a5a24-57ca-4ab6-bfd8-f2d7520218c2.jpg" />. The Taylor expansion exists, since the derivatives are defined, and we obtain a training rule that optimizes the weights in the direction of <img src="9-3400165\b3206138-0da5-44aa-9454-263c0ed9d79e.jpg" />:</p><disp-formula id="scirp.19565-formula150311"><label>(15)</label><graphic position="anchor" xlink:href="9-3400165\becec072-6211-4cf1-92f0-4870b7a8ec60.jpg"  xlink:type="simple"/></disp-formula><p>Notice that the scheme in <xref ref-type="fig" rid="fig3">Figure 3</xref> is very similar to that of the real valued RNN, except that conjugations appear instead of transposes.</p><p>Note also that this error function is universal in the sense that it optimizes both the real and the imaginary part of the complex number. 
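</p><p>The update rule and the conjugated backward pass can be illustrated with a deliberately small Python sketch. It assumes a CVRNN with a single complex state, s_t = tanh(a s_{t-1} + b u_t) and y_t = c s_t, as a scalar stand-in for the matrices A, B and C, and it uses the error function (4) summed over time. The gradients with respect to conj(a), conj(b) and conj(c) are accumulated backwards through time (summed over all steps, because the weights are shared), multiplying by conjugated local derivatives where a real valued network would use transposes. Each weight is then replaced by w − η dE/dconj(w), i.e., moved in the direction of the conjugated weights. The data, the learning rate and all helper names are hypothetical; a finite-difference check of dE/dconj(w) = (dE/dx + i dE/dy)/2 is included to verify the backward pass.</p><preformat preformat-type="code">
```python
import cmath


def forward(params, inputs):
    """Scalar CVRNN: s_t = tanh(a s_{t-1} + b u_t), y_t = c s_t, starting from s_0 = 0."""
    a, b, c = params
    states = [0j]
    for u in inputs:
        states.append(cmath.tanh(a * states[-1] + b * u))
    outputs = [c * s for s in states[1:]]
    return states, outputs


def loss(params, inputs, targets):
    """Error function (4) summed over time: sum of (y_t - d_t) * conj(y_t - d_t)."""
    _, outputs = forward(params, inputs)
    return sum(abs(y - d) ** 2 for y, d in zip(outputs, targets))


def conj_grads(params, inputs, targets):
    """Backpropagation through time for dE/dconj(w), w in (a, b, c).

    Every forward map is holomorphic, so each backward step multiplies by the
    complex conjugate of the local derivative instead of a transpose.
    """
    a, b, c = params
    states, outputs = forward(params, inputs)
    ga = gb = gc = 0j
    ds_next = 0j                               # dE/dconj(s_t) arriving from step t + 1
    for t in reversed(range(len(inputs))):
        s_prev, s = states[t], states[t + 1]
        e = outputs[t] - targets[t]            # dE/dconj(y_t) = y_t - d_t
        gc += e * s.conjugate()
        ds = c.conjugate() * e + ds_next       # total dE/dconj(s)
        dnet = (1 - s * s).conjugate() * ds    # tanh'(net) = 1 - tanh(net)**2
        ga += dnet * s_prev.conjugate()
        gb += dnet * inputs[t].conjugate()
        ds_next = a.conjugate() * dnet
    return [ga, gb, gc]


def numerical_conj_grad(params, inputs, targets, k, h=1e-6):
    """dE/dconj(w_k) = (dE/dx + i dE/dy) / 2 by central finite differences."""
    def shifted(delta):
        p = list(params)
        p[k] += delta
        return loss(p, inputs, targets)
    dx = (shifted(h) - shifted(-h)) / (2 * h)
    dy = (shifted(1j * h) - shifted(-1j * h)) / (2 * h)
    return 0.5 * (dx + 1j * dy)


def train(params, inputs, targets, lr=0.1, epochs=200):
    """Gradient descent in the direction of the conjugated weights."""
    params = list(params)
    history = [loss(params, inputs, targets)]
    for _ in range(epochs):
        grads = conj_grads(params, inputs, targets)
        params = [w - lr * g for w, g in zip(params, grads)]
        history.append(loss(params, inputs, targets))
    return params, history


# Hypothetical toy data and small initial weights.
inputs = [1 + 0j, 0.5j, -0.3 + 0.2j, 0.1j, 0.4 + 0j, -0.2j]
targets = [0.3 + 0.1j, 0.2j, -0.1 + 0.2j, 0.05j, 0.2 - 0.1j, -0.1j]
params0 = [0.1 + 0.1j, 0.2 - 0.1j, 0.3 + 0.2j]

print(conj_grads(params0, inputs, targets))
print([numerical_conj_grad(params0, inputs, targets, k) for k in range(3)])
params, history = train(params0, inputs, targets)
print(history[0], history[-1])
```
</preformat><p>To first order, a step w − η dE/dconj(w) changes E by −2η|dE/dconj(w)|^2, so for a small enough η the error cannot increase. In the sketch, the error function (4) enters only through the output error e_t = y_t − d_t, which keeps the backward pass short.</p><p>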
It has a simple derivative, and, as a function of the complex error, it is a paraboloid, which means that it has only one minimum and grows smoothly away from it. Typical convergence of the error during training is shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>.</p><p>Note that this error function is real valued: <xref ref-type="fig" rid="fig4">Figure 4</xref> shows the modulus, i.e., exactly the error function. The angle error is:</p><disp-formula id="scirp.19565-formula150312"><label>(16)</label><graphic position="anchor" xlink:href="9-3400165\b7809392-1025-4eb8-8b73-8922f9b37b0d.jpg"  xlink:type="simple"/></disp-formula><p>After presenting the CVEBP and discussing the convergence of the error, we now discuss the final aspect of the CVRNN, the activation (or transition) function.</p></sec><sec id="s2_4"><title>2.4. Activation Function in the Complex Valued Domain</title><p>It is well known that, for real valued networks, the activation function is required to be continuous (ideally bounded) and to have at least a first derivative defined over the whole search space.</p><p>Unfortunately, due to Liouville's theorem [<xref ref-type="bibr" rid="scirp.19565-ref10">10</xref>], these requirements cannot be met by non-constant complex analytic functions. Moreover, all non-linear analytic transition functions either grow without bound (for example, the sine function) or have singular points (for example, the tanh function, see <xref ref-type="fig" rid="fig5">Figure 5</xref>).</p><p>In view of the following Theorem 3, several remedies are possible:</p><p>Theorem 3 (Liouville). If a function is complex differentiable and bounded on the whole complex plane, it is constant.</p><p>A proof of this theorem is given in Remmert [<xref ref-type="bibr" rid="scirp.19565-ref14">14</xref>].</p><p>Remedy 1. 
Choose bounded functions that are only real differentiable (with respect to the real and imaginary parts) but not complex differentiable:</p><disp-formula id="scirp.19565-formula150313"><label>(17)</label><graphic position="anchor" xlink:href="9-3400165\23113a94-382c-4c5c-aabd-ddd660a59114.jpg"  xlink:type="simple"/></disp-formula><p>Remedy 2. Constrain the optimization procedure so that it stays in a region without singularities:</p><disp-formula id="scirp.19565-formula150314"><label>(18)</label><graphic position="anchor" xlink:href="9-3400165\07cff2cf-4d12-43e0-bcbd-78b13e2e65a7.jpg"  xlink:type="simple"/></disp-formula><p>Remedy 3. Use the Wirtinger calculus: all real analytic functions are differentiable in the complex domain in the Wirtinger sense.</p><p>One can also try to replace the function in its problematic regions (i.e., near singular points) by a different function that has no singularity there, as in (19):</p><disp-formula id="scirp.19565-formula150315"><label>(19)</label><graphic position="anchor" xlink:href="9-3400165\6d30cbac-6f66-4220-996e-6827e5c26094.jpg"  xlink:type="simple"/></disp-formula><p>The result of such an experiment is shown in <xref ref-type="fig" rid="fig6">Figure 6</xref>.</p><p>In practice, the CVRNN can be used with the non-linear activation function <img src="9-3400165\321b1c36-f6f4-452d-bdd0-6e2b89454567.jpg" /> as long as the weights are initialized with small values and the error minimization proceeds in the correct direction (i.e., the error decreases and the weight update steps become smaller as training proceeds). In this regime the weights do not grow above 1, which means that they do not approach the singularities of the function.</p></sec></sec><sec id="s3"><title>3. Summary and Outlook</title><p>In this paper we discussed several aspects of CVRNNs.</p><p>We presented the architecture of the CVRNN and discussed its feed-forward operation, the CVEBP back-propagation algorithm and the weight update rule. 
We discussed problems with the activation and error functions and showed how to overcome them.</p><p>CVRNNs offer advantages in many areas, such as continuous time modeling, modeling of electrical devices and energy grids, robust time series prediction and physical models of the brain. Future work will focus on applications and evaluation of CVRNNs.</p></sec></body><back><ref-list><title>References</title><ref id="scirp.19565-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">H.-G. Zimmermann and R. Neuneier, “Modeling Dynamical Systems by Recurrent Neural Networks,” Data Mining II, Second International Conference on Data Mining, Cambridge, 5-7 July 2000, pp. 557-566.</mixed-citation></ref><ref id="scirp.19565-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">S.-L. Goh and D. Mandic, “A Complex-Valued RTRL Algorithm for Recurrent Neural Networks,” Neural Computation, Vol. 16, No. 12, 2004, pp. 2699-2713.</mixed-citation></ref><ref id="scirp.19565-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">P. Mandic, “Complex Valued Recurrent Neural Networks for Noncircular Complex Signals,” International Joint Conference on Neural Networks, 14-19 June 2009, pp. 1987-1992. doi:10.1109/IJCNN.2009.5178960</mixed-citation></ref><ref id="scirp.19565-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">T. Adali and H. Li, “A Practical Formulation for Computation of Complex Gradients and its Application to Maximum Likelihood ICA,” 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 2, 2007, pp. II-633-II-636. 
doi:10.1109/ICASSP.2007.366315</mixed-citation></ref><ref id="scirp.19565-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">D. P. Xu, H. S. Zhang and L. J. Liu, “Convergence Analysis of Three Classes of Split-Complex Gradient Algorithms for Complex-Valued Recurrent Neural Networks,” Neural Computation, Vol. 22, No. 10, 2010, pp. 2655-2677. doi:10.1162/NECO_a_00021</mixed-citation></ref><ref id="scirp.19565-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">F. Takens, “Detecting Strange Attractors in Turbulence,” Dynamical Systems and Turbulence, Lecture Notes in Mathematics, Vol. 898, Springer-Verlag, New York, 1981, pp. 366-381.</mixed-citation></ref><ref id="scirp.19565-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">H. Leung and S. Haykin, “The Complex Backpropagation Algorithm,” IEEE Transactions on Signal Processing, Vol. 39, No. 9, 1991, pp. 2101-2104. doi:10.1109/78.134446</mixed-citation></ref><ref id="scirp.19565-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">H.-G. Zimmermann, A. Minin and V. Kusherbaeva, “Historical Consistent Complex Valued Recurrent Neural Network,” Lecture Notes in Computer Science, Part 1, Vol. 6791, 2011, pp. 185-192. 
doi:10.1007/978-3-642-21735-7_23</mixed-citation></ref><ref id="scirp.19565-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">H.-G. Zimmermann, A. Minin and V. Kusherbaeva, “Comparison of the Complex Valued and Real Valued Neural Networks Trained with Gradient Descent and Random Search Algorithms,” 19th European Symposium on Artificial Neural Networks, Bruges, 27-29 April 2011, pp. 216-222.</mixed-citation></ref><ref id="scirp.19565-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">D. H. Brandwood, “A Complex Gradient Operator and Its Application in Adaptive Array Theory,” IEE Proceedings F: Communications, Radar and Signal Processing, Vol. 130, No. 1, 1983, pp. 11-16.</mixed-citation></ref><ref id="scirp.19565-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">D. Johnson, “Optimization Theory,” Optimization Theory Page from the Connexions Project. 
http://cnx.org/content/m11240/latest/</mixed-citation></ref><ref id="scirp.19565-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">P. Kalra, A. Gangal and D. Chauhan, “Performance Evaluation of Complex Valued Neural Networks Using Various Error Functions,” World Academy of Science, Engineering and Technology, Vol. 29, 2007, pp. 27-32.</mixed-citation></ref><ref id="scirp.19565-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">H. L. Li and T. Adali, “A Class of Complex ICA Algorithms Based on the Kurtosis Cost Function,” IEEE Transactions on Neural Networks, Vol. 19, No. 3, 2008, pp. 408-420. doi:10.1109/TNN.2007.908636</mixed-citation></ref><ref id="scirp.19565-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">R. Remmert, “Theory of Complex Functions,” Springer, New York, 1991. doi:10.1007/978-1-4612-0939-3</mixed-citation></ref></ref-list></back></article>