<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">ALAMT</journal-id><journal-title-group><journal-title>Advances in Linear Algebra &amp; Matrix Theory</journal-title></journal-title-group><issn pub-type="epub">2165-333X</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/alamt.2014.41002</article-id><article-id pub-id-type="publisher-id">ALAMT-43474</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  The Better Accuracy of Strassen-Winograd Algorithms (FastMMW)
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>D’Alberto</surname><given-names>Paolo</given-names></name><xref ref-type="aff" rid="aff1"><sub>1</sub></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><label>1</label><addr-line>FastMMW, San Jose, USA</addr-line></aff><author-notes><corresp id="cor1">* E-mail:<email>paolo@fastmmw.com</email></corresp></author-notes><pub-date pub-type="epub"><day>05</day><month>03</month><year>2014</year></pub-date><volume>04</volume><issue>01</issue><fpage>9</fpage><lpage>39</lpage><history><date date-type="received"><day>2</day>	<month>January</month>	<year>2014</year></date><date date-type="rev-recd"><day>10</day>	<month>February</month>	<year>2014</year>	</date><date date-type="accepted"><day>17</day>	<month>February</month>	<year>2014</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
   
   
   The first error theory and bounds for Fast Matrix Multiplication based on the Strassen-Winograd algorithms (FastMMW) were formulated in the 1970s. The theory introduced the concept now known as weakly-stable error analysis, where the error bounds must use matrix norms instead of component-wise bounds. While the theory debunked the instability myth by using matrix scaling and a clean and simple analysis, its bounds are available only as properties of the whole matrices; they are too coarse and pessimistic, they are at times used to suggest instability, and they are not used for algorithm optimization. We build on top of the original theory in order to reformulate the bounds: we show that tighter norm-wise and component-wise bounds are achievable by orthogonal algorithm optimizations. To achieve even better discrimination and circumvent the use of norm bounds, we develop an error theory by using communication and statistics concepts: we investigate lower and upper bounds, we estimate the practical bounds, and we investigate the algorithmic nature of the error for the class of random matrices. The theory and tools are not limited to random matrices and we can foresee further investigations of different matrix classes and algorithms. We propose new and more accurate algorithms. We show that we can improve theoretically and empirically the maximum absolute error of any FastMMW algorithm by 10% - 20% per recursion (we reduce the error by half for 4 recursions). Our theory and practice, in turn, will provide a kick start for the development of hybrid algorithms as accurate as the vendor GEMM implementation, and in certain cases even more accurate for random matrices.
  
 
</p></abstract><kwd-group><kwd>Matrix Multiplications; Algorithms; Performance; Error Analysis</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>We are interested in the analysis, design, and implementation of the algorithms known as Fast Matrix Multiplication based on the variants of Strassen and Winograd (i.e., Fast Matrix Multiplication by Winograd’s algorithms, FastMMW; see [<xref ref-type="bibr" rid="scirp.43474-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.43474-ref2">2</xref>] , based on the bilinear technique). We are drawn towards these algorithms because of their beauty and performance advantages; here, we address a novel theory and measures to resolve the ever-present concerns about numerical stability and practical use of the FastMMW. We argue that the theory and the tools available are basically unchanged since their introduction forty years ago, probably because these bounds have been used to discourage the use of the FastMMW, with a few exceptions [<xref ref-type="bibr" rid="scirp.43474-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.43474-ref4">4</xref>] . Here, we improve those bounds by more accurate algorithms and we introduce new tools.</p><p>The FastMMWs are used and investigated in several contexts and there is a wealth of related work; furthermore, with every new architecture there are new implementations, and thus new results. For example, we have a small contribution in the exploration of a large set of architectures. In practice, performance is the main reason for the FastMMW proposal and we often hear that accuracy is the main concern. 
Here, we address the accuracy of the FastMMW for any type of matrices and we provide new tools to handle the class of matrices broadly called random.</p><p>We show that, using the theory developed by Brent [<xref ref-type="bibr" rid="scirp.43474-ref5">5</xref>] , Miller [<xref ref-type="bibr" rid="scirp.43474-ref6">6</xref>] , and Bini and Lotti [<xref ref-type="bibr" rid="scirp.43474-ref7">7</xref>] , we can design bilinear algorithms (thus Strassen-Winograd variants) that are more accurate than what was known. The theory will provide the tools so that we can estimate the accuracy gain a priori and for any algorithm. These bounds are still based on the theory of weak stability and thus on norm-wise bounds: they provide an estimate of the maximum error for the whole computation, that is, for the whole result matrix. They considerably overestimate the error for most components of the result matrix because they do not offer appropriate component-wise bounds. This is a practical and important concern that we will address in the following, presenting evidence for random matrices.</p><p>In our work, the design of kernel algorithms, we struggle to provide performance and accuracy estimates for our algorithms. We struggle because we do not always know the context where our algorithms will be applied. The application of random matrices for such estimates is common. Random matrices are useful tools to describe the range (e.g., values in the range [0, 1]) and the values of matrices without a clear pattern (e.g., with normal distribution); that is, random. This class of matrices is special: they have full rank and they are dense. Being without pattern makes them a little remote from being a benchmark, in the common sense of the term; however, they are ideal for the testing of algorithms. There are a few good reasons why researchers, like us, entertain the testing with these special matrices; in the following we share three.</p><p>First, there is a known unknown effect. 
If we design a new algorithm and we want to test its performance and accuracy, it is plainly impossible to test every possible matrix. We know that we do not know what matrices will be used, so we substitute our unknown matrices with something random. It is an understandable and common misdemeanor.</p><p>Second, there is a practical appeal. Random matrices are easy to generate with different statistical properties: uniformly distributed in an interval such as [−1, 1] or Normal with specific mean and variance, just to give two common continuous distributions (e.g., see the user guide to random matrices [<xref ref-type="bibr" rid="scirp.43474-ref8">8</xref>] , as suggested by a reviewer). Here, the absence of a pattern, or randomness in the sense of Kolmogorov [<xref ref-type="bibr" rid="scirp.43474-ref9">9</xref>] , should be associated with the error committed during the computation; however, the simplest way is to control the random nature of the input matrices<sup>1</sup>. The absence of a pattern assures the testing of every instruction and its contribution to the output. The independent contribution of each instruction, weighted according to its importance, is the ultimate reward. Often, the former helps bring forward the latter. Nonetheless, these matrices are non-singular with probability one and thus they may avoid important corner cases in the context of matrix factorization.</p><p>Third, there is a statistical appeal. We know and we control the statistical features of the input and we measure the statistical properties of the output quite easily. We can use such information to derive the so-called transfer function for the algorithms; we provide a formal definition starting from Section 4. The random matrix is a tool for the computation of the transfer function. If we obtain an adequate transfer function, we can estimate statistical properties of the output. 
Among these properties, there are also the distribution range of the maximum absolute error and its location in the result matrix: the maximum error is the most common measure to estimate the accuracy of an algorithm. We are interested in a component-wise transfer function so as to estimate component-wise error bounds.</p><p>The transfer function has specific properties that will allow us to extend the weak-stability error analysis and our optimizations to random matrices and to obtain component-wise bounds.</p><p>We can state our contribution in one paragraph. Here, we present a methodology that will improve the FastMMW error bounds and will provide the estimate and measure of the error for each component of the result matrix for random matrices. This allows us to have a complete error-analysis tool set: we shall model the error, measure the error (by experiments in IEEE-754 single-precision floating-point arithmetic), validate the model, and ultimately we shall design and implement more accurate algorithms. These algorithms enrich the FastMMW software package. In turn, all performance tables, plots, and other graphical presentations are drawn automatically using the Python FastMMW package. The self-contained software will help any independent validation and reproduction of all the following results: ultimately, it will simplify the exploration of new algorithms in the future and the transfer of FastMM algorithms to a larger audience.</p><p>We organize the paper as follows: in Section 2, we introduce the theory of weak stability, that is, the most successful error analysis as of today, with the relative references, a description of the main ideas, and our main difference. In Section 3, we bridge together the error analysis with tools used in linear time-series analysis. In Section 4, we introduce and formalize the transfer function. We present a complexity analysis using the transfer function in Section 5. 
Using this analysis, we propose optimizations in Section 6. In Section 7, we show the practical relation between the transfer-function complexity and the weak stability. We draw our conclusions in Section 8.</p></sec><sec id="s2"><title>2. Weak Stability</title><p>Obviously, an algorithm is a constructive way of representing and computing a function. The output of an algorithm is often an approximation to the true result due to the approximation of the data and of intermediary states of the computation. Stability analysis is the ability to quantify the maximum or the expected error an algorithm will introduce during and at the end of its computation.</p><p>The most natural way to quantify an error is by estimating the difference between what we should compute and what we compute instead. So if the ideal computation with ideal representation of the inputs and outputs is a matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x6.png" xlink:type="simple"/></inline-formula> then we will represent the computed values as<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x7.png" xlink:type="simple"/></inline-formula>. 
Of course, the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x8.png" xlink:type="simple"/></inline-formula> Matrix Multiplication is simply the computation <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x9.png" xlink:type="simple"/></inline-formula> (where even the input matrices <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x10.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x11.png" xlink:type="simple"/></inline-formula> may be affected by an initial error).</p><p>The component-wise comparison between <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x12.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x13.png" xlink:type="simple"/></inline-formula> is</p><disp-formula id="scirp.43474-formula54780"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x14.png"  xlink:type="simple"/></disp-formula><p>This is a matrix representing the absolute difference of the two matrices and the equality is meant to be component wise <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x15.png" xlink:type="simple"/></inline-formula> where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x16.png" xlink:type="simple"/></inline-formula> represents the row and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x17.png" xlink:type="simple"/></inline-formula> the column of the matrices, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x18.png" xlink:type="simple"/></inline-formula>.</p><p>We know that for any Matrix Multiplication (MM) using <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x19.png" xlink:type="simple"/></inline-formula> 
operations the error is:</p><disp-formula id="scirp.43474-formula54781"><label>(1)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x20.png"  xlink:type="simple"/></disp-formula><p>The interpretation is simple and important. Given any component error <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x21.png" xlink:type="simple"/></inline-formula>, this is bounded from above by the size of the matrices <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x22.png" xlink:type="simple"/></inline-formula>, by the precision of the hardware <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x23.png" xlink:type="simple"/></inline-formula>, and by the absolute value of the operands <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x24.png" xlink:type="simple"/></inline-formula>; that is the dot product of i-th <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x25.png" xlink:type="simple"/></inline-formula> row and the j-th <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x26.png" xlink:type="simple"/></inline-formula> column (i.e., <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x27.png" xlink:type="simple"/></inline-formula>). In fact, each component error has the same bounds because the result matrix is the computation of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x28.png" xlink:type="simple"/></inline-formula> independent dot products. The bound is a function of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x29.png" xlink:type="simple"/></inline-formula> because it involves <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x30.png" xlink:type="simple"/></inline-formula> additions (and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x31.png" xlink:type="simple"/></inline-formula> multiplications): <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x32.png" xlink:type="simple"/></inline-formula> represents the maximum addition-chain length and how the error at the beginning of the chain will carry on to the end. Finally, the hardware data representation uses a format that has unit roundoff, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x33.png" xlink:type="simple"/></inline-formula>, which means that in absolute terms the error is larger for large numbers and thus for arithmetic with large numbers.</p><p>The first bounds for the FastMM accuracy are by Brent [<xref ref-type="bibr" rid="scirp.43474-ref5">5</xref>] . 
He studied the Winograd algorithm [<xref ref-type="bibr" rid="scirp.43474-ref10">10</xref>] , where the inner product can be organized to save one half of the scalar multiplications, and Strassen’s recursive algorithm, with 7 recursive MMs and 18 matrix additions [<xref ref-type="bibr" rid="scirp.43474-ref1">1</xref>] . Brent proposed a specific bound for Strassen-like algorithms. After forty years, these are the de facto best bounds, which define the error bounds for FastMM. We restate Brent’s bounds here and provide an intuitive meaning.</p><p>For both Strassen-like and Winograd-like algorithms (for the latter, Higham provides a complete analysis [<xref ref-type="bibr" rid="scirp.43474-ref11">11</xref>] [<xref ref-type="bibr" rid="scirp.43474-ref12">12</xref>] , Chapter 23, Theorem 23.3, where he traces back the recurrence equations and solutions):</p><disp-formula id="scirp.43474-formula54782"><label>(2)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x34.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x35.png" xlink:type="simple"/></inline-formula> is the recursion point where the fast algorithm yields to the general MM or leaf computation. 
The constant <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x36.png" xlink:type="simple"/></inline-formula> is a function of the algorithm; for Strassen’s 18 addition algorithm <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x37.png" xlink:type="simple"/></inline-formula>, whereas for Winograd’s 15 addition algorithm <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x38.png" xlink:type="simple"/></inline-formula>. Here, the estimate is not a matrix but a real number. With the notation <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x39.png" xlink:type="simple"/></inline-formula>, we identify the norm operation, which here and in the literature is called the max norm, defined as</p><disp-formula id="scirp.43474-formula54783"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x40.png"  xlink:type="simple"/></disp-formula><p>Intuitively, the ratio <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x41.png" xlink:type="simple"/></inline-formula> is the number of recursion steps from the original problem size <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x42.png" xlink:type="simple"/></inline-formula> to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x43.png" xlink:type="simple"/></inline-formula>. 
</p><p>The factor <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x44.png" xlink:type="simple"/></inline-formula> is a function of the longest path of additions (i.e., matrix additions) within a recursion step of the algorithm.</p><p>Without loss of generality and to simplify the equation, consider the matrices scaled so that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x45.png" xlink:type="simple"/></inline-formula>, and we can neglect the factor <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x46.png" xlink:type="simple"/></inline-formula> because it is an architecture feature and not an algorithm feature. Thus we can turn our attention to the two important ideas of this equation: first, the error is a function of the leaf computation (i.e., the error at the leaf); second, the error we commit is a multiplicative factor of the leaf error, in particular a function of the number of recursion levels <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x47.png" xlink:type="simple"/></inline-formula>.</p><p>For example, for one level of recursion, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x48.png" xlink:type="simple"/></inline-formula>, we can estimate a multiplicative factor of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x49.png" xlink:type="simple"/></inline-formula> to the leaf error. 
In practice, for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x50.png" xlink:type="simple"/></inline-formula> levels of recursion, in the worst-case scenario we should have a multiplicative factor of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x51.png" xlink:type="simple"/></inline-formula> (or <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x52.png" xlink:type="simple"/></inline-formula> for Winograd). 
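</p><p>As a rough illustration of how the worst-case factor grows with the number of recursion levels, the following sketch assumes per-level factors of 12 for Strassen and 18 for Winograd. These values come from the classical Brent and Higham bounds and are an assumption here, because the corresponding formulas in this text are images. The function names and the leaf size of 128 are hypothetical.</p><preformat>
```python
import math

# Assumed per-level growth factors (classical Brent/Higham bounds);
# they are not read from the formula images of this paper.
STRASSEN_FACTOR = 12
WINOGRAD_FACTOR = 18


def recursion_levels(n, n0):
    """Number of halvings needed to go from size n down to leaf size n0."""
    return int(round(math.log2(n / n0)))


def worst_case_factor(n, n0, per_level_factor):
    """Worst-case multiplier applied to the leaf error after all levels."""
    return per_level_factor ** recursion_levels(n, n0)


if __name__ == "__main__":
    # Hypothetical leaf size of 128; each doubling of n adds one level.
    for n in (256, 512, 1024):
        print(n,
              worst_case_factor(n, 128, STRASSEN_FACTOR),
              worst_case_factor(n, 128, WINOGRAD_FACTOR))
```
</preformat><p>Under these assumptions, three levels of recursion already multiply the leaf error by 1728 for Strassen and by 5832 for Winograd, which is why shortening the addition chain matters.</p><p>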
The best-case scenario is for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x53.png" xlink:type="simple"/></inline-formula>, the shortest addition chain, and thus the factor is just <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x54.png" xlink:type="simple"/></inline-formula>. 
This means that the error will be linear as a function of the problem size (<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x55.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x56.png" xlink:type="simple"/></inline-formula> is a constant parameter).</p><p>We understand from this equation that we can apply two different optimizations: we improve the leaf computation [<xref ref-type="bibr" rid="scirp.43474-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.43474-ref14">14</xref>] or we decrease the length of the addition chains [<xref ref-type="bibr" rid="scirp.43474-ref7">7</xref>]. We investigated the former approach; for the latter, Bini and Lotti present a Winograd algorithm with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x57.png" xlink:type="simple"/></inline-formula> (unfortunately, with neither an implementation nor experiments). In this work, we show a third approach: reducing the catastrophic effects of long chains.</p><p>Miller [<xref ref-type="bibr" rid="scirp.43474-ref6">6</xref>] showed that the estimate of Equation (1) cannot be applied to FastMM because parts of the result matrix have different bounds (i.e., the bounds are not uniform). He proposed instead a norm-wise bound of the following form:</p><disp-formula id="scirp.43474-formula54784"><label>(3)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x58.png"  xlink:type="simple"/></disp-formula><p>Miller infers that Brent’s bounds are the best we can obtain for FastMM because Equation (2) satisfies the form of Equation (3). In practice, Miller’s argument introduces and evaluates different ways to compute bounds for bilinear forms (FastMM), and he introduces the terminology known today: Brent stability and restricted Brent stability (in honor of the original author) and the more common terms weak and strong stability. 
In the literature and in this paper, weak stability means Brent stability (norm-wise bounds).</p><p>In Section 4, we shall present graphical tools and bounds that will make Miller’s error bounds [<xref ref-type="bibr" rid="scirp.43474-ref6">6</xref>] obvious. In short and intuitively, the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x59.png" xlink:type="simple"/></inline-formula> for FastMM are not independent; the algorithm will focus, for lack of a better term, the error in specific locations of the matrix. We can provide point-wise bounds that satisfy Miller’s bounds and serve as an optimization tool; thus we can develop more accurate FastMM algorithms.</p><p>Bini and Lotti [<xref ref-type="bibr" rid="scirp.43474-ref7">7</xref>] provide the first framework to describe by recurrence equations the error of any bilinear algorithm, each expressed by matrices and matrix products. Their idea is to keep separate the (block) error that will affect the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x60.png" xlink:type="simple"/></inline-formula>, and they split the matrix error into quadrants.</p><disp-formula id="scirp.43474-formula54785"><label>(4)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x61.png"  xlink:type="simple"/></disp-formula><p>They estimate the error as a multiplicative factor for each quadrant, but they do not care about their order. 
They call it the stability vector, and we represent the error location by means of a matrix:</p><disp-formula id="scirp.43474-formula54786"><label>(5)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x62.png"  xlink:type="simple"/></disp-formula><p>The component/matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x63.png" xlink:type="simple"/></inline-formula> is the maximum error associated with component/quadrant <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x64.png" xlink:type="simple"/></inline-formula>.</p><p>This is a fundamental idea that we shall expand further in this paper for the design of more accurate algorithms. For example, the Winograd algorithm, as presented by Higham ([<xref ref-type="bibr" rid="scirp.43474-ref12">12</xref>] Chap. 23) and with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x65.png" xlink:type="simple"/></inline-formula> as above, has the stability vector</p><disp-formula id="scirp.43474-formula54787"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x66.png"  xlink:type="simple"/></disp-formula><p>The stability factor of an algorithm is the maximum of the stability vector: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x67.png" xlink:type="simple"/></inline-formula>. For Strassen’s algorithm they show the stability vector</p><disp-formula id="scirp.43474-formula54788"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x68.png"  xlink:type="simple"/></disp-formula><p>and we can see that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x69.png" xlink:type="simple"/></inline-formula> immediately. The stability factor is again used to solve a recurrence equation. 
The authors provide exactly the same bounds as proposed by Brent and by Miller’s equation. Bini and Lotti then classify fast algorithms by their stability vectors: two algorithms are equivalent if their stability vectors differ only by permutations of their components. Clearly, an algorithm <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x70.png" xlink:type="simple"/></inline-formula> is more accurate than an algorithm <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x71.png" xlink:type="simple"/></inline-formula> if and only if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x72.png" xlink:type="simple"/></inline-formula>.</p><p>This classification and its methodology are powerful, but there are a few points that we are going to expand and use:</p><p>・ The stability error is a function of the algorithm representation (matrix forms) instead of code or experimental data. We shall introduce an empirical and theoretical measure, the transfer function, for any algorithm, to cope with experimental error analysis, graphical tools, and the inherently coarse grain of the error bounds.</p><p>・ We tailor our tools to random matrices. 
However, the theory developed by Bini and Lotti will justify the same optimization on any matrix.</p><sec id="s2_1"><title>2.1. Re-Bound: Component-Wise Bounds</title><p>The last point is quite important and we expand it here. This section derives from the Bini and Lotti framework, but it is completely original. The classification and the bounds are based on applying the stability vector recursively, unchanged, and thus assume the worst-case scenario. We can improve the bounds if we improve the algorithms. Consider the addition of two result matrices computed by the Strassen algorithm and their identical stability vector</p><disp-formula id="scirp.43474-formula54789"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x73.png"  xlink:type="simple"/></disp-formula><p>Obviously, if we add them together the stability factor is additive, because the computations are independent and because we estimate the worst case (i.e., if we add two matrices that have been computed using Strassen’s algorithm, the resulting stability factor, the maximum expected error, is the sum of the stability factors).</p><disp-formula id="scirp.43474-formula54790"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x74.png"  xlink:type="simple"/></disp-formula><p>However, if we have a way to permute the computation so that the stability vector is rotated one shift clockwise (this is always possible, as we show in Section 5), the error estimate is different:</p><disp-formula id="scirp.43474-formula54791"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x75.png"  xlink:type="simple"/></disp-formula><p>As we may appreciate from the Strassen algorithm, the error is on a diagonal; we may take advantage of the specific layout of the error to write better algorithms. 
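</p><p>The effect of rotating the stability vector before an addition can be sketched numerically. The sketch assumes a one-level Strassen stability vector with 12 on the diagonal and 4 off the diagonal; these values are inferred from the two-level maximum and minimum (144 and 16) quoted in this section, not read from the formula images, and the helper names are hypothetical.</p><preformat>
```python
# Assumed one-level stability vector for Strassen, laid out by quadrant.
# The values 12 and 4 are inferred from the 144 and 16 quoted in the text.
STRASSEN = [[12, 4],
            [4, 12]]


def add_vectors(a, b):
    """Worst-case error of a sum of two results: quadrant errors add."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def rotate_clockwise(v):
    """Shift the four quadrant entries one position clockwise."""
    (c00, c01), (c10, c11) = v
    return [[c10, c00],
            [c11, c01]]


def stability_factor(v):
    """The stability factor is the largest entry of the stability vector."""
    return max(max(row) for row in v)


plain = add_vectors(STRASSEN, STRASSEN)
rotated = add_vectors(STRASSEN, rotate_clockwise(STRASSEN))
print(stability_factor(plain), stability_factor(rotated))
```
</preformat><p>With these assumed values the plain sum has stability factor 24, while the sum with one rotated operand spreads the error evenly and has stability factor 16.</p><p>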
If we do nothing and we perform two levels of recursion, the stability vector will be:</p><disp-formula id="scirp.43474-formula54792"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x76.png"  xlink:type="simple"/></disp-formula><p>This means that the stability factor is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x77.png" xlink:type="simple"/></inline-formula> (for each recursion <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x78.png" xlink:type="simple"/></inline-formula>) and we have a large difference between the maximum and minimum error (i.e., 144 and 16, respectively).</p><p>If we exploit the location of the error using what we call orthogonal algorithms (see Section 6), we can obtain the following stability vector:</p><disp-formula id="scirp.43474-formula54793"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x79.png"  xlink:type="simple"/></disp-formula><p>This means that the stability factor is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x80.png" xlink:type="simple"/></inline-formula> (for each recursion we have at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x81.png" xlink:type="simple"/></inline-formula>) and we have a smaller difference between the maximum and minimum error (i.e., 96 and 32, respectively). The error variance is smaller. 
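</p><p>The unpermuted two-level case can be reproduced with a small sketch. It assumes that the two-level stability vector is the Kronecker product of the one-level vector with itself, and that the one-level Strassen vector has 12 on the diagonal and 4 off the diagonal; both are assumptions chosen because they reproduce the 144 and 16 quoted above. The orthogonal variant is not modeled here.</p><preformat>
```python
# Assumed one-level stability vector for Strassen (inferred, see lead-in).
STRASSEN = [[12, 4],
            [4, 12]]


def kron(a, b):
    """Kronecker product of two small matrices given as lists of lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def spread(v):
    """Return (maximum, minimum) entry of a stability vector."""
    flat = [x for row in v for x in row]
    return max(flat), min(flat)


two_levels = kron(STRASSEN, STRASSEN)
for row in two_levels:
    print(row)
print(spread(two_levels))
```
</preformat><p>Under these assumptions the ratio between the largest and smallest entry is 9 without permutations, against a ratio of 3 for the 96 and 32 quoted for the orthogonal algorithms.</p><p>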
For more recursions, the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x82.png" xlink:type="simple"/></inline-formula> per recursion does not change. We have <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x83.png" xlink:type="simple"/></inline-formula>; thus we have a more accurate algorithm for any matrix.</p><p>Bini and Lotti provide several classes of algorithms for the Winograd variant: the most accurate has the stability vector</p><disp-formula id="scirp.43474-formula54794"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x84.png"  xlink:type="simple"/></disp-formula><p>and for two recursions</p><disp-formula id="scirp.43474-formula54795"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x85.png"  xlink:type="simple"/></disp-formula><p>There are actually three orthogonal permutations to be applied to have any advantage</p><disp-formula id="scirp.43474-formula54796"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x86.png"  xlink:type="simple"/></disp-formula><p>This means that the stability factor is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x87.png" xlink:type="simple"/></inline-formula> (for each recursion we have at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x88.png" xlink:type="simple"/></inline-formula>) and we have a smaller difference between the maximum and minimum error.</p><p>Notice that all previous stability vectors are computed automatically using the matrix notation introduced by the original authors. The permutations are also applied in matrix notation and automatically. Once the set of matrices is specified (i.e., the algorithm), we can compute the stability vector with and without orthogonal permutations for any recursion level. The bottom line is that this section presents a constructive proof of how to write more accurate FastMM algorithms based on bilinear techniques using permutations. It also shows that we can actually write component-wise bounds.</p><p>However, further discussion about the automatic generation of stability vectors is beyond the scope of this paper. When we present our code-generation tools, which take the matrix form and generate code, we will provide more details and the mathematical notation for how this can be achieved using Kronecker and matrix products.</p></sec></sec><sec id="s3"><title>3. Series Connection</title><p>The intuition behind Miller’s result is that the components of the error <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x89.png" xlink:type="simple"/></inline-formula> are not independent. The MM is a sequence of additions and multiplications; thus we model it as a bank of filters.</p><p>The easiest way to introduce this connection and its implications is by describing a common experiment. Choose a reference MM, for example, DGEMM (double-precision General Matrix Multiplication, see [<xref ref-type="bibr" rid="scirp.43474-ref15">15</xref>]; we use Goto’s BLAS [<xref ref-type="bibr" rid="scirp.43474-ref16">16</xref>]). Choose a comparison algorithm, for example SGEMM. 
We run the following experiment.</p><p>・ Let us choose the number of iterations T, 100 say, and a dimension <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x90.png" xlink:type="simple"/></inline-formula>, 200 say.</p><p>・ Per iteration <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x91.png" xlink:type="simple"/></inline-formula>, we create two random matrices <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x92.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x93.png" xlink:type="simple"/></inline-formula> of sizes <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x94.png" xlink:type="simple"/></inline-formula> and with components in the range [−1, 1]. 
We compute the reference <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x95.png" xlink:type="simple"/></inline-formula> by DGEMM. Then we compute the comparative result <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x96.png" xlink:type="simple"/></inline-formula> by SGEMM.</p><p>・ We compute <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x97.png" xlink:type="simple"/></inline-formula>. 
This is a multidimensional time series where for each component we have a single time series <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x98.png" xlink:type="simple"/></inline-formula>.</p><p>This is a very common experiment used to estimate the maximum error (or the maximum relative error) given a reference and an experimental algorithm. For example, we record only <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x99.png" xlink:type="simple"/></inline-formula> at each iteration and then we can determine features such as:</p><p>・ worst-case estimate <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x100.png" xlink:type="simple"/></inline-formula>, the estimate of Equation (2);</p><p>・ empirical distribution of the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x101.png" xlink:type="simple"/></inline-formula>, expectation <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x102.png" xlink:type="simple"/></inline-formula>, variance <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x103.png" xlink:type="simple"/></inline-formula>.</p><p>Often the input matrices have some statistical properties, but they can be from benchmarks as well. 
In practice, we use the experiments above to estimate parameters, such as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x104.png" xlink:type="simple"/></inline-formula>, used in Equation (2). This equation addresses only the question of the error magnitude. However, this experiment and Equation (2) do not tell us where the error is. Miller’s result suggested that the error is not equal everywhere. In this work, we shall show that there is a pattern and that we can provide such a bound. To do so, we transform the problem from a single-bound problem to a multidimensional time-series variance estimation. We clarify the connection in the following.</p><p>Assume that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x105.png" xlink:type="simple"/></inline-formula> is large, that the rounding errors are independent of each other (e.g., each is a function only of the input data of the instruction), and that a rounding error can be positive or negative. In this scenario, it is common to model each error (data representation and operation) as the realization of independent processes (round off, ceiling, and other approximations are just as good as long as the errors are independent) with finite mean and variance.</p><p>Then each series <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x106.png" xlink:type="simple"/></inline-formula> represents the result of a large sum of independent rounding errors; this is a moving average (in the literature <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x107.png" xlink:type="simple"/></inline-formula>) of a so-called stationary process. As the output of an MA(n) filter, each series is completely described by its first two moments: mean and variance. 
We estimate those empirically as follows:</p><p>・ The estimate of the mean</p><disp-formula id="scirp.43474-formula54797"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x108.png"  xlink:type="simple"/></disp-formula><p>Of course, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x109.png" xlink:type="simple"/></inline-formula> is a matrix and what we compute is a component-wise mean.</p><p>・ The estimate of the variance</p><disp-formula id="scirp.43474-formula54798"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x110.png"  xlink:type="simple"/></disp-formula><p>Of course, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x111.png" xlink:type="simple"/></inline-formula> is a matrix, this is a component-wise variance, and the square operation is component-wise.</p><p>If the assumptions about the independence of the errors hold, if the size of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x112.png" xlink:type="simple"/></inline-formula> is large enough, and if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x113.png" xlink:type="simple"/></inline-formula> is large, then we know that the estimates are consistent: they converge in probability to the real mean and variance. 
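</p><p>The experiment and the two component-wise estimates can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the setup used for the measurements: a plain row-by-column product in double precision stands in for DGEMM, the same product with every intermediate result rounded to single precision stands in for SGEMM, the sizes are kept tiny, and the variance is normalized by T (the normalization in the formula image may differ). All function names are hypothetical.</p><preformat>
```python
import random
import struct


def to_single(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matmul(a, b, rnd=lambda x: x):
    """Row-by-column product; rnd is applied after every multiply and add."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc = rnd(acc + rnd(a[i][k] * b[k][j]))
            c[i][j] = acc
    return c


def random_matrix(rng, n):
    """Random n-by-n matrix with single-precision entries in [-1, 1]."""
    return [[to_single(rng.uniform(-1.0, 1.0)) for _ in range(n)]
            for _ in range(n)]


def error_moments(n=8, iterations=50, seed=0):
    """Component-wise sample mean and variance of reference minus comparison."""
    rng = random.Random(seed)
    errors = []  # one n-by-n error matrix per iteration (the time series)
    for _ in range(iterations):
        a, b = random_matrix(rng, n), random_matrix(rng, n)
        ref = matmul(a, b)               # stand-in for DGEMM
        cmp_ = matmul(a, b, to_single)   # stand-in for SGEMM
        errors.append([[ref[i][j] - cmp_[i][j] for j in range(n)]
                       for i in range(n)])
    mean = [[sum(e[i][j] for e in errors) / iterations for j in range(n)]
            for i in range(n)]
    var = [[sum((e[i][j] - mean[i][j]) ** 2 for e in errors) / iterations
            for j in range(n)] for i in range(n)]
    return mean, var


if __name__ == "__main__":
    mean, var = error_moments()
    print(max(max(row) for row in var))
```
</preformat><p>Both returned matrices have the shape of the result matrix, so the variance can be inspected per component or drawn as a heat map.</p><p>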
In more practical terms, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x114.png" xlink:type="simple"/></inline-formula> as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x115.png" xlink:type="simple"/></inline-formula> because of the stationarity of the series and, more interestingly, we can bound in probability the maximum error using the variance</p><disp-formula id="scirp.43474-formula54799"><label>(6)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x116.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x117.png" xlink:type="simple"/></inline-formula> is the possible maximum error realization during any experiment (in the experimental section we present evidence that the maximum is at <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x118.png" xlink:type="simple"/></inline-formula>), <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x119.png" xlink:type="simple"/></inline-formula> is the cumulative distribution function of a normal distribution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x120.png" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x121.png" xlink:type="simple"/></inline-formula> is the probability that the error is twice sigma. Here, we abuse the notation, providing a single bound instead of a matrix bound. In practice, where there is a large variance, there is a large error. Also, where there is a small variance, there is likely a small error.</p><p>Furthermore, we can infer a relation across components of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x122.png" xlink:type="simple"/></inline-formula>: if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x123.png" xlink:type="simple"/></inline-formula> with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x124.png" xlink:type="simple"/></inline-formula>, then very likely</p><disp-formula id="scirp.43474-formula54800"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x125.png"  xlink:type="simple"/></disp-formula><p>Of course, Equation (6) has such a small and clean probability because we use 
the normal distribution of the output <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x126.png" xlink:type="simple"/></inline-formula>. Note that a stationary process is defined by its first two moments, mean and variance; combined with the fact that the output is a collection of a large number of independent observations, a normal distribution is a very good approximation. Nonetheless, with different distributions there will be different bounds, but the main idea remains valid; that is, large variance is associated with large error.</p><p>In practice, the matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x127.png" xlink:type="simple"/></inline-formula> provides a component-wise error and, if we like, we can provide meaningful bounds on the maximum error, though these bounds hold only with some probability. In time-series analysis, the relation between the input variance and the output variance is a function of the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x127.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x128.png" xlink:type="simple"/></inline-formula> and it is called the spectral transfer function [<xref ref-type="bibr" rid="scirp.43474-ref17">17</xref>] Chapter 4.12 [<xref ref-type="bibr" rid="scirp.43474-ref18">18</xref>] Chapter 4.4<sup>2</sup>.</p><disp-formula id="scirp.43474-formula54801"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x129.png"  xlink:type="simple"/></disp-formula><p><sup>2</sup>Also, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x130.png" xlink:type="simple"/></inline-formula> is a property of the scalar error randomness and its mean; it is not a property of the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x130.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x131.png" xlink:type="simple"/></inline-formula> and thus of the algorithm.</p></sec><sec id="s4"><title>4. Transfer Function</title><p>One of the best ways to showcase the power of the transfer function is to present the matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x132.png" xlink:type="simple"/></inline-formula> as a heat map and to work through examples. For example, the reference <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x132.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x133.png" xlink:type="simple"/></inline-formula> is obtained using Goto’s DGEMM. We then estimate the transfer function of four freely available algorithms: GSGEMM (Goto’s SGEMM), SW, SWOPT, and SSTRA. The (S/D)GEMM routines are from [<xref ref-type="bibr" rid="scirp.43474-ref16">16</xref>]; the algorithms SW, SWOPT, and SSTRA are from [<xref ref-type="bibr" rid="scirp.43474-ref19">19</xref>].</p><p>We take a small problem where the matrices are of size <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x134.png" xlink:type="simple"/></inline-formula> and the recursion point is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x134.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x135.png" xlink:type="simple"/></inline-formula>, so that the fast algorithm performs one level of recursion. 
We perform 10,000 iterations and compute two statistical features of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x134.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x135.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x136.png" xlink:type="simple"/></inline-formula>: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x134.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x135.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x136.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x137.png" xlink:type="simple"/></inline-formula> and the distribution of the maximum error (see <xref ref-type="fig" rid="fig1">Figure 1</xref>). The latter is the classic interpretation and estimation of the maximum error, with information about the actual range of the maximum error. We can see that GSGEMM is better than SSTRA and SSTRA is better than SW and SWOPT. The transfer function maintains the same ordering in accuracy (GSGEMM &lt; SSTRA &lt; SW = SWOPT) but clearly specifies where the maximum error is likely to be. We see that the Strassen implementation has hot spots on the main diagonal, while both of Winograd’s algorithms have a very cool (dark) top-left quadrant.</p><p>In the previous section, we discussed that the transfer function has meaning for large <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x138.png" xlink:type="simple"/></inline-formula>. 
We may wonder whether a problem of size <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x138.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x139.png" xlink:type="simple"/></inline-formula> is large enough to describe what we are looking for. We present a much larger problem with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x138.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x139.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x140.png" xlink:type="simple"/></inline-formula> with recursion point <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x138.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x139.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x140.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x141.png" xlink:type="simple"/></inline-formula>; that is, with only one recursion, as before. This recursion point is actually the optimal recursion point for the testing architecture (AMD A8) and for single-precision algorithms, so this is a realistic test. In <xref ref-type="fig" rid="fig2">Figure 2</xref>, we show that the heat map distribution is unchanged. The order of accuracy did not change (GSGEMM &lt; SSTRA &lt; SW = SWOPT for maximum errors and variances). Only the value of the error increases, according to the problem size. This is true for 2 and 3 levels of recursion as well. 
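</p><p>As a rough illustration of the experiment described above, the following Python sketch estimates a transfer function empirically as the component-wise variance of the error over repeated random products, together with the maximum error of each run. It is only a sketch under stated assumptions and not the FastMMW implementation: the function names strassen_one_level and estimate_transfer_function are hypothetical, NumPy's matmul stands in for Goto's SGEMM and DGEMM, the matrix size is assumed even (the handling of odd sizes such as n = 21 is not reproduced), and the double-precision reference is assumed to be computed from the same single-precision operands.</p><preformat>
```python
import numpy as np


def strassen_one_level(a, b):
    """One Strassen recursion step on even-sized square matrices.

    The seven half-size products are delegated to NumPy's matmul, which
    stands in for the leaf GEMM (an assumption of this sketch).
    """
    h = a.shape[0] // 2
    a11, a12, a21, a22 = a[:h, :h], a[:h, h:], a[h:, :h], a[h:, h:]
    b11, b12, b21, b22 = b[:h, :h], b[:h, h:], b[h:, :h], b[h:, h:]
    m1 = (a11 + a22) @ (b11 + b22)
    m2 = (a21 + a22) @ b11
    m3 = a11 @ (b12 - b22)
    m4 = a22 @ (b21 - b11)
    m5 = (a11 + a12) @ b22
    m6 = (a21 - a11) @ (b11 + b12)
    m7 = (a12 - a22) @ (b21 + b22)
    c = np.empty((2 * h, 2 * h), dtype=m1.dtype)
    c[:h, :h] = m1 + m4 - m5 + m7
    c[:h, h:] = m3 + m5
    c[h:, :h] = m2 + m4
    c[h:, h:] = m1 - m2 + m3 + m6
    return c


def estimate_transfer_function(mm, n, iterations, low, high, seed=0):
    """Return (per-component error variance, per-run maximum absolute error)."""
    rng = np.random.default_rng(seed)
    errors = np.empty((iterations, n, n))
    for t in range(iterations):
        # Single-precision operands drawn uniformly from the range [low, high].
        a = rng.uniform(low, high, size=(n, n)).astype(np.float32)
        b = rng.uniform(low, high, size=(n, n)).astype(np.float32)
        # The double-precision product of the same operands is the reference.
        reference = a.astype(np.float64) @ b.astype(np.float64)
        errors[t] = mm(a, b).astype(np.float64) - reference
    variance = errors.var(axis=0)                # matrix shown as a heat map
    max_error = np.abs(errors).max(axis=(1, 2))  # one maximum per run
    return variance, max_error


if __name__ == "__main__":
    var_fast, max_fast = estimate_transfer_function(
        strassen_one_level, 16, 200, -1.0, 1.0)
    var_ref, max_ref = estimate_transfer_function(np.matmul, 16, 200, -1.0, 1.0)
    print("largest variance, one-level Strassen:", var_fast.max())
    print("largest variance, plain matmul:      ", var_ref.max())
```
</preformat><p>Under these assumptions, plotting the returned variance matrix as an image would give a heat map in the spirit of the figures, and a histogram of the per-run maxima would give the corresponding maximum-error distribution. The sizes and iteration counts in the sketch are illustrative and much smaller than those used in the experiments.</p><p>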
In this section, for presentation purposes, we present heat maps for relatively small problems.</p><p>It is simple to appreciate the close relationship between the heat map and the stability vector. We recall that the stability vector is:</p><disp-formula id="scirp.43474-formula54802"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x142.png"  xlink:type="simple"/></disp-formula><p>The transfer function as a heat map is a graphical representation of the stability vector and the error distribution; see <xref ref-type="fig" rid="fig1">Figure 1</xref>. We shall show that the transfer function is more descriptive than the stability vector, in that it captures subtle changes in the error analysis.</p><p>The nature of the recursive algorithm is captured quite beautifully by the transfer function; see Figures 3 and 4.</p><p>Now we have a clear picture of Miller’s bounds and of the non-uniform distribution of the error, which cannot be modeled by Equation (1). Here, we put to use the recursive division, Bini and Lotti’s framework, and the line of thought of the proof of Equation (2). 
We know that, when applying a single recursion of Strassen’s algorithm, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x143.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x143.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x144.png" xlink:type="simple"/></inline-formula> have the larger error; we also know that Winograd’s algorithm has the larger error on <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x143.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x144.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x145.png" xlink:type="simple"/></inline-formula>. The proof shows a direction of the error in which different quadrants of the result matrix are affected differently and with different magnitudes.</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> Parameters: n = 21, n<sub>0</sub> = 20, range = [−1, 1], and iterations = 10,000 (Top) Transfer Function and maximum (white: high error, dark: low error) (Bottom) Maximum Error histogram and maximum (maximum) error</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x146.png"/></fig><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> Parameters: n = 2001, n<sub>0</sub> = 2000, range = [−1, 1], and iterations = 1000 (Top) Transfer Function and maximum σ<sup>2</sup> (Bottom) Maximum Error histogram and maximum (maximum) error</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x147.png"/></fig><fig 
id="fig3"  position="float"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> (Left) n = 42, n<sub>0</sub> = 20, range = [−1, 1], two recursions, and iterations = 10,000 (Right) n = 86, n<sub>0</sub> = 20, range = [−1, 1], three recursions, and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x148.png"/></fig><fig-group id="fig4"><label><xref ref-type="fig" rid="fig4">Figure 4</xref></label><caption><title> (Left) n = 175, n<sub>0</sub> = 20, range = [−1, 1], 4 recursions, and iterations = 10,000 (Right) n = 350, n<sub>0</sub> = 20, range = [−1, 1], 5 recursions, and iterations = 10,000.</title></caption><fig id ="fig4_1"><label></label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x150.png"/></fig><fig id ="fig4_2"><label></label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x149.png"/></fig></fig-group><p>The proof uses the maximum among the errors in order to write a recurrence equation and to provide a solution. Notice that, if we use Bini and Lotti’s framework, we can directly compute the solution of the recurrence equation for each quadrant and then for each component. The heat map is a consistent estimate of such a computation (as we did in Section 2.1). The heat map gives a clear picture for one recursive step, which seems obvious now; however, it also conveys concise and coherent information for two or more recursions, and it is a beautiful example of fractals.</p><p>The transfer function is a way to represent and compute the point-wise root-mean-square error, and this is a common theme in several previous publications; for example, the original work presented by Welch [<xref ref-type="bibr" rid="scirp.43474-ref20">20</xref>] for the fixed-point FFT, which is a stable computation. 
Here, we use the variance to drive algorithm optimizations and to infer the maximum error as well.</p><p>Error Change</p><p>There are also disconnections between the transfer function, the direction of the error, and Equation (2). Here we investigate the hidden differences before delving into the commonalities.</p><p>In our experience, the error of FastMM is connected to the sign/range of the matrices, in the sense that versions of Winograd’s algorithm are known to be as accurate as the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x151.png" xlink:type="simple"/></inline-formula> MM algorithm for positive matrices (probability matrices). We present evidence here as well. This accuracy is counterintuitive with respect to the bounds available in the literature; that is, Strassen’s algorithm should always be more accurate than Winograd’s and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x151.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x152.png" xlink:type="simple"/></inline-formula> MM algorithms should always be more accurate than Strassen’s. Counterintuitively, we show that the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x151.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x152.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x153.png" xlink:type="simple"/></inline-formula> MM algorithm is less accurate for positive matrices; this will affect the accuracy of FastMM because the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x154.png" xlink:type="simple"/></inline-formula> MM algorithm is the leaf computation.</p><p>Naturally, we wonder whether any change in the range of the input matrices will affect the transfer function. 
For example, if instead of using matrices in the range <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x155.png" xlink:type="simple"/></inline-formula> we choose the range <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x155.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x156.png" xlink:type="simple"/></inline-formula>, will the transfer function change? Will the error change?</p><p>1. Let us start by considering the effects on the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x157.png" xlink:type="simple"/></inline-formula> MM. First, the variance of the error for [0, 1] matrices is smaller than for [−1, 1] matrices; however, for positive matrices it has a larger transfer function and maximum error. Intuitively, we can think of a bias of the error contribution for positive matrices. For other matrices, the mean error is smaller because of compensation, which introduces a randomization.</p><p>This simple observation will explain why fast algorithms have the opportunity to be as accurate as regular algorithms for random positive matrices (i.e., they resolve to use matrices in the [−1, 1] range).</p><p>2. The transfer-function shape changes for FastMMs. In <xref ref-type="fig" rid="fig5">Figure 5</xref>, the transfer function has changed for the algorithm SWOPT: it is as if SWOPT has no error in <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x158.png" xlink:type="simple"/></inline-formula>, and this holds for deeper recursions; see Figures 6 and 7.</p><p>Note that SWOPT magnifies the error onto two quadrants, instead of three. The transfer function of SWOPT is similar to that of SSTRA, though the maximum error differs, especially for deep recursions. Interestingly, SW is as accurate as GSGEMM for recursion depths smaller than or equal to three. 
For deeper recursions, SW loses its edge in accuracy. We will provide an explanation in the following section when we introduce a complexity theory using transfer functions.</p></sec><sec id="s5"><title>5. Error Directions</title><p>Let us introduce a few definitions useful for the notation, for the error complexity and, finally, for the design of more accurate algorithms. These notations stem from the stability vectors previously introduced. Let us consider the error matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x159.png" xlink:type="simple"/></inline-formula> where the matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x159.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x160.png" xlink:type="simple"/></inline-formula>. Without loss of generality, we consider square matrices <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x159.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x160.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x161.png" xlink:type="simple"/></inline-formula>. 
We can identify the transfer function of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x159.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x160.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x161.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x162.png" xlink:type="simple"/></inline-formula> from <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x159.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x160.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x161.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x162.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x163.png" xlink:type="simple"/></inline-formula> samples <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x159.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x160.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x161.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x162.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x163.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x164.png" xlink:type="simple"/></inline-formula> as</p><disp-formula 
id="scirp.43474-formula54803"><label>(7)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x165.png"  xlink:type="simple"/></disp-formula><p>Of course, the transfer function is a matrix, and we identify the error direction in a transfer function using matrix notation and sub-matrices as in Equation (4). We can summarize the transfer function by the hot sub-matrices of the error matrix.</p><p>・ The transfer function of the SSTRA algorithm will be identified as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x166.png" xlink:type="simple"/></inline-formula> to highlight the main diagonal error.</p><p>・ The SWOPT and SW algorithms have the same transfer function and we identify it as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x167.png" xlink:type="simple"/></inline-formula>; for positive matrices SWOPT has <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x167.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x168.png" xlink:type="simple"/></inline-formula>.</p><p>・ We can identify the GSGEMM algorithm transfer function as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x169.png" xlink:type="simple"/></inline-formula>, where every component is affected identically.</p><p>For example, if we have the addition of two matrices such as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x170.png" xlink:type="simple"/></inline-formula> and these are the result of two independent MM algorithms, we can see that the error and the transfer function of this operation are naturally given by the addition of the transfer functions:</p><disp-formula id="scirp.43474-formula54804"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x171.png"  xlink:type="simple"/></disp-formula><p>For example, if <inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x172.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x172.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x173.png" xlink:type="simple"/></inline-formula> are both computed using GSGEMM, the same algorithm</p><disp-formula id="scirp.43474-formula54805"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x174.png"  xlink:type="simple"/></disp-formula><p>This is true because we add the component-wise square variances and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x175.png" xlink:type="simple"/></inline-formula>. Clearly, the transfer function depends on the algorithm used because it will affect the shape and the maximum error or power considering the statistical meaning. It is also a function of the problem size and the range of the matrices (because the range affects the basic assumption of the error distribution).</p><fig id="fig5"  position="float"><label><xref ref-type="fig" rid="fig5">Figure 5</xref></label><caption><title> Parameters: n = 21, n<sub>0</sub> = 20, range = [0, 1], and iterations = 10,000 (Top) Transfer Function and maximum σ<sup>2</sup> (Bottom) Maximum Error histogram and maximum (maximum) error</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x176.png"/></fig><fig id="fig6"  position="float"><label><xref ref-type="fig" rid="fig6">Figure 6</xref></label><caption><title> (Left) n = 42, n<sub>0</sub> = 20, range = [0, 1], two recursions, and iterations = 10,000 (Right) n = 86, n<sub>0</sub> = 20, range = [0, 1], three recursions, and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x177.png"/></fig><fig id="fig7"  position="float"><label><xref ref-type="fig" rid="fig7">Figure 7</xref></label><caption><title> (Left) n = 175, n<sub>0</sub> = 20, range = [0, 1], 4 recursions, and iterations = 
10,000 (Right) n = 350, n<sub>0</sub> = 20, range = [0, 1], 5 recursions, and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x178.png"/></fig><p>The transfer function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x179.png" xlink:type="simple"/></inline-formula> defines an Abelian group with respect to the operation of matrix addition + as above:</p><p>・ Closure: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x180.png" xlink:type="simple"/></inline-formula> where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x180.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x181.png" xlink:type="simple"/></inline-formula>.</p><p>・ Associativity <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x182.png" xlink:type="simple"/></inline-formula>.</p><p>・ Commutativity <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x183.png" xlink:type="simple"/></inline-formula>.</p><p>・ (almost) Identity element <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x184.png" xlink:type="simple"/></inline-formula> where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x184.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x185.png" xlink:type="simple"/></inline-formula>: where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x184.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x185.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x186.png" xlink:type="simple"/></inline-formula>, which is an appropriate transfer function of the ideal computation.</p><p>・ Orthogonal (or Inverse) element <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x187.png" xlink:type="simple"/></inline-formula> where<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x187.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x188.png" xlink:type="simple"/></inline-formula>.</p><p>Theoretically, there is an identity element: the matrix zero or the transfer function of the ideal computation. Here, we rather introduce the almost identity element because it is a real computation and it takes the role of the classic <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x189.png" xlink:type="simple"/></inline-formula> algorithm.</p><p>Again, if we restrict the family of algorithms, there may be no orthogonal transfer function to one transfer function. 
For example, for the Winograd algorithm with transfer function<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula>, its orthogonal is<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x191.png" xlink:type="simple"/></inline-formula>: intuitively <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x191.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x192.png" xlink:type="simple"/></inline-formula> that is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x191.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x192.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x193.png" xlink:type="simple"/></inline-formula> and (0) has no overlap with<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x191.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x192.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x193.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x194.png" xlink:type="simple"/></inline-formula>. We can compute this by having the Winograd algorithm applied to the first quadrant <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x191.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x192.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x193.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x194.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x195.png" xlink:type="simple"/></inline-formula> only. This (orthogonal) algorithm is a hybrid, in the sense that we mix different division processes within the same recursion level. 
These hybrids are beyond the scope of this paper and we turn our attention to a family of algorithms obtained by permutations for which there are weakly-orthogonal transfer functions: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x191.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x192.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x193.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x194.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x195.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x196.png" xlink:type="simple"/></inline-formula>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x191.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x192.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x193.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x194.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x195.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x196.png" 
xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x197.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x191.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x192.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x193.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x194.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x195.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x196.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x197.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x198.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x190.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x191.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x192.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x193.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x194.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x195.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x196.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x197.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x198.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x199.png" xlink:type="simple"/></inline-formula> have partial overlap. In this work, weakly orthogonal transfer functions have an intuitive statistical meaning: two algorithms can have little heat overlap.</p><p>In practice, if <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x200.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x201.png" xlink:type="simple"/></inline-formula> have the same parameters (number of recursions, sizes, and range of the operands) and produce the same error distribution, we can write</p><disp-formula id="scirp.43474-formula54806"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x202.png"  xlink:type="simple"/></disp-formula><p>which emphasizes that the shape of the transfer function stays the same and the intensity of the variance should double.</p><p>Let us consider Strassen’s algorithm as presented in <xref ref-type="table" rid="table1">Table 1</xref> and apply the transfer-function addition. For example, we assume a single recursion level, after which we yield to GSGEMM. Notice that we take advantage of the fact that MM for [0, 1] has a transfer function about twice as large as that for matrices in [−1, 1]. 
Notice that, in the context of FastMM for positive matrices, a leaf computation involving mixed-sign matrices shifts the range to [−1, 1], instead of [0, 2] as in the addition of two positive matrices, and thus changes the maximum value. This affects the error as well. We shall explain in the appendix (<xref ref-type="table" rid="table2">Table 2</xref>) how to take full advantage of this property and why there are Winograd’s algorithms that can be very accurate for positive matrices.</p><p>The left column in <xref ref-type="table" rid="table1">Table 1</xref> represents the computation. The central column in <xref ref-type="table" rid="table1">Table 1</xref> gives the transfer function for input matrices in the range [−1, 1], and the right column gives it for the range [0, 1]. This is basically the contribution of one recursion step to the transfer function. As in previous works, we would like to quantify the magnitude of the maximum variance (maximum error). This boils down to computing the largest contribution for each quadrant and writing a recurrence equation for each type of input. Equation (8) represents the recurrence equation for [−1, 1].</p><disp-formula id="scirp.43474-formula54807"><label>(8)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x203.png"  xlink:type="simple"/></disp-formula><p>We present a solution for the above error complexity in Equation (9), where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x204.png" xlink:type="simple"/></inline-formula> denotes the transfer function of the leaf computation:</p><disp-formula id="scirp.43474-formula54808"><label>(9)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x205.png"  xlink:type="simple"/></disp-formula><p>From the experimental results presented previously, we see that this estimate is quite adequate and has a simple explanation. 
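</p><p>As an illustration only, the per-quadrant estimate discussed above can be probed empirically with a small sketch. It is not the FastMMW implementation: it uses the textbook formulation of one Strassen step (the order of operations may differ from Table 1), emulates single precision by rounding through the standard struct module, and lets a plain triple loop stand in for GSGEMM. All helper names (f32, add, gemm32, gemm64, split, strassen_one_level, quadrant_max_errors) are hypothetical.</p><preformat>
```python
import random
import struct


def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add(X, Y, sign=1.0):
    """Element-wise X + sign * Y, rounded to single precision."""
    return [[f32(x + sign * y) for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def gemm32(X, Y):
    """Classical leaf product with single-precision accumulation
    (an assumed stand-in for GSGEMM)."""
    cols = list(zip(*Y))
    out = []
    for row in X:
        out_row = []
        for col in cols:
            acc = 0.0
            for x, y in zip(row, col):
                acc = f32(acc + f32(x * y))
            out_row.append(acc)
        out.append(out_row)
    return out


def gemm64(X, Y):
    """Double-precision reference product."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def split(M):
    """Return the quadrants M11, M12, M21, M22 of an even-sized square matrix."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen_one_level(A, B):
    """One textbook Strassen step; the seven products use gemm32 as the leaf."""
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = gemm32(add(A11, A22), add(B11, B22))
    M2 = gemm32(add(A21, A22), B11)
    M3 = gemm32(A11, add(B12, B22, -1.0))
    M4 = gemm32(A22, add(B21, B11, -1.0))
    M5 = gemm32(add(A11, A12), B22)
    M6 = gemm32(add(A21, A11, -1.0), add(B11, B12))
    M7 = gemm32(add(A12, A22, -1.0), add(B21, B22))
    C11 = add(add(add(M1, M4), M5, -1.0), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(add(M1, M2, -1.0), M3), M6)
    return C11, C12, C21, C22


def quadrant_max_errors(n, low, high, seed=0):
    """Maximum absolute error per result quadrant (C11, C12, C21, C22)
    for random inputs drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    A = [[f32(rng.uniform(low, high)) for _ in range(n)] for _ in range(n)]
    B = [[f32(rng.uniform(low, high)) for _ in range(n)] for _ in range(n)]
    reference = split(gemm64(A, B))
    computed = strassen_one_level(A, B)
    return [max(abs(c - r) for rc, rr in zip(Q, R) for c, r in zip(rc, rr))
            for Q, R in zip(computed, reference)]


if __name__ == "__main__":
    # Compare the two input ranges discussed in the text.
    for low, high in ((-1.0, 1.0), (0.0, 1.0)):
        print((low, high), quadrant_max_errors(16, low, high))
```
</preformat><p>Calling quadrant_max_errors(n, -1.0, 1.0) and quadrant_max_errors(n, 0.0, 1.0) for the same n returns four numbers each, one per quadrant, which can be set against the two right-hand columns of <xref ref-type="table" rid="table1">Table 1</xref>. A single seed and a small n give only a rough indication; averaging over many seeds would be needed to approximate a variance.</p><p>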
Notice that FastMM algorithms are more accurate in absolute terms for matrices in the range [−1, 1]. However, their error grows more slowly for positive matrices. We present the analysis for the Winograd variants in Appendix 8.</p></sec><sec id="s6"><title>6. Orthogonality</title><p>Strassen’s algorithm has a very distinctive direction of the transfer function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x206.png" xlink:type="simple"/></inline-formula>, and we can show that there is an orthogonal algorithm that has <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x207.png" xlink:type="simple"/></inline-formula>.</p><p>Strassen’s algorithm computes the following, obvious matrix computation:</p><disp-formula id="scirp.43474-formula54809"><label>(10)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x208.png"  xlink:type="simple"/></disp-formula><p>Its orthogonal counterpart is the following:</p><disp-formula id="scirp.43474-formula54810"><label>(11)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x209.png"  xlink:type="simple"/></disp-formula><p>The permutation is logical only; we do not actually need to move data. 
If we apply one recursion level and repeat the same bound estimation as in <xref ref-type="table" rid="table1">Table 1</xref>, we find that the transfer function is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x210.png" xlink:type="simple"/></inline-formula> and the</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> SSTRA algorithm and estimated transfer function</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x211.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x212.png" xlink:type="simple"/></inline-formula>[−1, 1]</th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x213.png" xlink:type="simple"/></inline-formula>[0, 1]</th></tr></thead><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x214.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x215.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x216.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x217.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x218.png" xlink:type="simple"/></inline-formula>[−1, 1]</td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x219.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x220.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x221.png" xlink:type="simple"/></inline-formula>[−1, 1]</td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x222.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x223.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x224.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x225.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x226.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x227.png" xlink:type="simple"/></inline-formula>[−1, 1]</td></tr><tr><td align="center" valign="middle" 
><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x228.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x229.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x230.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x231.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x232.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x233.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x234.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x235.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x236.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x237.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td 
align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x238.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x239.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x240.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x241.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x242.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x243.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x244.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x245.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x246.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x247.png" 
xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x248.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x249.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x250.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x251.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x252.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x253.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x254.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x255.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x256.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x257.png" 
xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x258.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x259.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x260.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x261.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x262.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x263.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x264.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x265.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x266.png" xlink:type="simple"/></inline-formula></td></tr></tbody></table></table-wrap><p>recursion bounds are identical to those in Equation (9).</p><p>Now we can do something interesting. 
Consider a matrix result <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x267.png" xlink:type="simple"/></inline-formula> with transfer function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x268.png" xlink:type="simple"/></inline-formula> and another matrix result <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x269.png" xlink:type="simple"/></inline-formula> with transfer function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x270.png" xlink:type="simple"/></inline-formula>; then we have</p><disp-formula id="scirp.43474-formula54811"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x271.png"  xlink:type="simple"/></disp-formula><p>and, more importantly, a small error increase.</p><p>The coefficient can be estimated as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x272.png" xlink:type="simple"/></inline-formula>, but for simplicity here we set it to <inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x273.png" xlink:type="simple"/></inline-formula> (for example, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x274.png" xlink:type="simple"/></inline-formula> for one recursion, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x275.png" xlink:type="simple"/></inline-formula> for two, and progressively smaller values beyond that).</p><p>So let us take Strassen’s algorithm again and introduce a permutation instruction that allows us to switch on</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> SW algorithm and estimated transfer function</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x276.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x277.png" xlink:type="simple"/></inline-formula>[−1, 1]</th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x278.png" xlink:type="simple"/></inline-formula>[0, 1]</th></tr></thead><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x279.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x280.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" 
></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >3</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x281.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x282.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x283.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >4</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x284.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x285.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x286.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >5</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x287.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x288.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x289.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >6</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x290.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" 
><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x291.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x292.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >7</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x293.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >8</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x294.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x295.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x296.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >9</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x297.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >10</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x298.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >11</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x299.png" xlink:type="simple"/></inline-formula></td><td 
align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x300.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x301.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >12</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x302.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x303.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x304.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >13</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x305.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >14</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x306.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x307.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x308.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >15</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x309.png" xlink:type="simple"/></inline-formula></td><td align="center" 
valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >16</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x310.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x311.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x312.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >17</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x313.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >18</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x314.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >19</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x315.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x316.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x317.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >20</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x318.png" 
xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x319.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x320.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >21</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x321.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x322.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x323.png" xlink:type="simple"/></inline-formula></td></tr></tbody></table></table-wrap><p>and off the orthogonal algorithm (<xref ref-type="table" rid="table3">Table 3</xref>) and see what could be a reasonable bound.</p><p>If we write the recurrence equation as we did in Equation (8) and solve it to estimate the magnitude, then we should explicitly introduce the coefficient <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x324.png" xlink:type="simple"/></inline-formula>. 
Because the bound is a function of the recursion level, we specify the coefficient as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x325.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x326.png" xlink:type="simple"/></inline-formula> is the recursion level.</p><disp-formula id="scirp.43474-formula54812"><label>(12)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x327.png"  xlink:type="simple"/></disp-formula><p>For Winograd’s variants such as SW and SWOPT, the orthogonal transformation is a little more complicated:</p><disp-formula id="scirp.43474-formula54813"><label>(13)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x328.png"  xlink:type="simple"/></disp-formula><p>Once again, the permutations are logical only; there is no data movement.</p><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Orthogonal SSTRA algorithm and estimated transfer function</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x329.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x330.png" xlink:type="simple"/></inline-formula>[−1, 1]</th><th align="center" valign="middle" ><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x331.png" xlink:type="simple"/></inline-formula>[0, 1]</th></tr></thead><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x332.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x333.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x334.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x335.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x336.png" xlink:type="simple"/></inline-formula>[−1, 1]</td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x337.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x338.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x339.png" xlink:type="simple"/></inline-formula>[−1, 1]</td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x340.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" 
><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x341.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >ORTHOGONAL 1, 2</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x342.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x343.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x344.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x345.png" xlink:type="simple"/></inline-formula>[−1, 1]</td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x346.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x347.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x348.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x349.png" 
xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x350.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x351.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x352.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x353.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x354.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x355.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >ORTHOGONAL 0, 3</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x356.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x357.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x358.png" 
xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x359.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x360.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x361.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x362.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x363.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >ORTHOGONAL 1, 2</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x364.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x365.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x366.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x367.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x368.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x369.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x370.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x371.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x372.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x373.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x374.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x375.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x376.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x377.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" 
></td></tr><tr><td align="center" valign="middle" >ORTHOGONAL 0, 3</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x378.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x379.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x380.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x381.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x382.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x383.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x384.png" xlink:type="simple"/></inline-formula></td></tr></tbody></table></table-wrap><p>In Section 2.1, we introduce an algorithm that actually applies all four possible directions. 
These are the regular one, the two above, and the following:</p><disp-formula id="scirp.43474-formula54814"><label>(14)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x385.png"  xlink:type="simple"/></disp-formula><p>We call this algorithm <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x386.png" xlink:type="simple"/></inline-formula> and we shall present a description in Section 8. Basically, we overlap the four error directions evenly (<xref ref-type="table" rid="table4">Table 4</xref>).</p><p>Note. We applied these same permutations to Strassen and Winograd stability vectors in Section 2.1 to lower the coefficient <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x387.png" xlink:type="simple"/></inline-formula>, which represents the error of the algorithms in Equation (2). These are optimizations that improve the accuracy of the algorithms using two different measures. The transfer function is more empirical because it is determined by experimentation, and it should provide tighter bounds (e.g., the SWOPT algorithm for</p><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> SW-4Permute algorithm and estimated transfer function</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x388.png" xlink:type="simple"/></inline-formula></th></tr></thead><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x389.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x390.png" xlink:type="simple"/></inline-formula></td></tr><tr><td 
align="center" valign="middle" ></td><td align="center" valign="middle" >ORTHOGONAL 0, 2, 3</td></tr><tr><td align="center" valign="middle" >3</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x391.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >4</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x392.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >5</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x393.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >ORTHOGONAL 0, 1, 3</td></tr><tr><td align="center" valign="middle" >6</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x394.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >7</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x395.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >ORTHOGONAL 0, 1, 2</td></tr><tr><td align="center" valign="middle" >8</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x396.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >9</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x397.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >10</td><td align="center" valign="middle" ><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x398.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >11</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x399.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >12</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x400.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >13</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x401.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >ORTHOGONAL 0, 1, 3</td></tr><tr><td align="center" valign="middle" >14</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x402.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >15</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x403.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >ORTHOGONAL 0, 2, 3</td></tr><tr><td align="center" valign="middle" >16</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x404.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >17</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x405.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >18</td><td align="center" valign="middle" ><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x406.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >ORTHOGONAL 0, 1, 3</td></tr><tr><td align="center" valign="middle" >19</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x407.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >20</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x408.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >21</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x409.png" xlink:type="simple"/></inline-formula></td></tr></tbody></table></table-wrap><p>matrix input [0, 1]).</p><p>How will it work in practice? In short, the orthogonal algorithm improves the transfer function significantly, which improves the maximum error as well. We summarize the error in Figures 8, 9, 10, 11 and 12.</p><p>We can see that the error direction changes dramatically and that the transfer function of the fast algorithms gets closer to that of the regular MM. From the simple theory we developed, we understand that we cannot achieve a truly uniform distribution by using orthogonal algorithm transformations. What we can do is attenuate the effect of the recursion, which narrows the error into a specific location, so as to avoid the overlap of large errors close to the same geographical location.</p><p>A curiosity: the SWOPT orthogonal algorithm has a clearly defined hot spot on the right side of the result matrix. 
Such a biased error may suggest that the affected part of the matrix is small and could be computed separately: for 5 recursions, we may have to recompute a very small matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x410.png" xlink:type="simple"/></inline-formula> and shave the error by about 10% to 20%. In case the reader is wondering about any relation with the permutations introduced in [<xref ref-type="bibr" rid="scirp.43474-ref21">21</xref>], those permutations do not change the error direction and thus the error; we tested them but do not report the results here.</p><fig id="fig8"  position="float"><label><xref ref-type="fig" rid="fig8">Figure 8</xref></label><caption><title> Parameters: n = 42, n<sub>0</sub> = 20, range = [−1, 1], and iterations = 10,000 (Left) Transfer Function and maximum σ<sup>2</sup> (Right) Maximum Error histogram and maximum (maximum) error</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x411.png"/></fig><fig id="fig9"  position="float"><label><xref ref-type="fig" rid="fig9">Figure 9</xref></label><caption><title> (Left) n = 21, n<sub>0</sub> = 20, range = [−1, 1], one recursion, and iterations = 10,000 (Right) n = 86, n<sub>0</sub> = 20, range = [−1, 1], three recursions, and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x412.png"/></fig></sec><sec id="s7"><title>7. Error Practice</title><p>In this section, we wrap up the experimental results by showing the properties of fast algorithms using relatively small matrix sizes. 
The goal is to compare what we can predict using the transfer function versus the maximum error.</p><p>We present different views of the error, starting with the maximum error and the maximum transfer function; for short, we use the term heat to indicate the transfer function.</p><p>In <xref ref-type="table" rid="table5">Table 5</xref>, we present the maximum heat, the maximum error and their ratio for matrices in the range [0, 1]. In <xref ref-type="table" rid="table6">Table 6</xref>, we present the results for the range [−1, 1]. We ran 10,000 iterations to compute the maximum error and maximum heat.</p><p>For every algorithm and matrix range, the heat and maximum error are consistent measures of each other and, in particular, we show that the orthogonal permutation always improves both. We also present the ratio between maximum error and maximum heat to provide the multiplicative factor to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x413.png" xlink:type="simple"/></inline-formula>. We notice that the factor ranges between 4 and 10. A value of 4 means that the error is closer to a normal distribution. Once again, we see that GSGEMM has smaller error and heat for the range [−1, 1]. As a rule of thumb, the maximum error and heat are about half of those for matrices in the range [0, 1]. Combined with the large multiplicative factor of 10, this suggests that the GSGEMM error distribution for the range [−1, 1] has fat tails and is not normal.</p><p>We can appreciate quantitatively that the permutation algorithms reduce the heat and the maximum error by half.</p><p>Maximum Heat vs. Maximum Error Location.</p><p>There is a correlation between the values of the maximum error and the maximum heat. The correlation is used to show that we can design better algorithms. 
Here, we address the geographical correlation: we show that the transfer function maps the most likely locations for the error.</p><p>In Figures 13 and 14, we present the heat map for the maximum error for all algorithms for matrices of size <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x414.png" xlink:type="simple"/></inline-formula> and with positive and mixed ranges. The FastMM algorithms apply 4 recursions; for example, the Strassen algorithm has a cluster of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x414.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x415.png" xlink:type="simple"/></inline-formula> error diagonal matrices. In practice, for each of the 10,000 iterations, we store and plot the location and the value of the maximum error. It is clear that the transfer function predicts the location of the maximum error. We can also better appreciate the ability of the orthogonal algorithms to spread the maximum error. The distribution of the error is not as random as for the GSGEMM algorithm, but it is closer to it.</p><p>The goal of the orthogonal permutation is to change the pattern of the error in subcomputations in such a way as to avoid their maximum contribution. As a result, we spread the error across the result matrix. 
Alternatively, we can guide the distribution of the error to target any part of the result matrix; this could be inva-</p><fig id="fig10"  position="float"><label><xref ref-type="fig" rid="fig10">Figure 10</xref></label><caption><title> (Left) n = 175, n<sub>0</sub> = 20, range = [−1, 1], 4 recursions, and iterations = 10,000 (Right) n = 350, n<sub>0</sub> = 20, range = [−1, 1], 5 recursions, and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x416.png"/></fig><table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 5</xref></label><caption><title> Range [0, 1], maximum error, maximum heat, max/heat ~ 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Size</th><th align="center" valign="middle" >GSGEMM</th><th align="center" valign="middle" >SSTRA</th><th align="center" valign="middle" >SW</th><th align="center" valign="middle" >SWOPT</th></tr></thead><tr><td align="center" valign="middle" >20</td><td align="center" valign="middle" >2.25e-06/3.70e-07/6.08</td><td align="center" valign="middle" >2.25e-06/3.70e-07/6.08</td><td align="center" valign="middle" >2.25e-06/3.70e-07/6.08</td><td align="center" valign="middle" >2.25e-06/3.70e-07/6.08</td></tr><tr><td align="center" valign="middle" >21</td><td align="center" valign="middle" >2.45e-06/3.95e-07/6.21</td><td align="center" valign="middle" >4.39e-06/7.36e-07/5.96</td><td align="center" valign="middle" >2.49e-06/3.80e-07/6.56</td><td align="center" valign="middle" >3.99e-06/6.60e-07/6.05</td></tr><tr><td align="center" valign="middle" >42</td><td align="center" valign="middle" >6.72e-06/1.09e-06/6.16</td><td align="center" valign="middle" >1.83e-05/3.32e-06/5.50</td><td align="center" valign="middle" >6.75e-06/1.05e-06/6.40</td><td align="center" valign="middle" >1.50e-05/2.83e-06/5.29</td></tr><tr><td align="center" valign="middle" >50</td><td 
align="center" valign="middle" >8.20e-06/1.35e-06/6.06</td><td align="center" valign="middle" >2.47e-05/3.95e-06/6.24</td><td align="center" valign="middle" >8.03e-06/1.21e-06/6.65</td><td align="center" valign="middle" >1.88e-05/3.41e-06/5.50</td></tr><tr><td align="center" valign="middle" >64</td><td align="center" valign="middle" >1.30e-05/1.91e-06/6.81</td><td align="center" valign="middle" >3.43e-05/5.48e-06/6.25</td><td align="center" valign="middle" >1.08e-05/1.63e-06/6.64</td><td align="center" valign="middle" >3.04e-05/4.96e-06/6.14</td></tr><tr><td align="center" valign="middle" >70</td><td align="center" valign="middle" >1.38e-05/2.25e-06/6.15</td><td align="center" valign="middle" >3.97e-05/6.51e-06/6.09</td><td align="center" valign="middle" >1.07e-05/1.82e-06/5.86</td><td align="center" valign="middle" >3.14e-05/5.50e-06/5.70</td></tr><tr><td align="center" valign="middle" >86</td><td align="center" valign="middle" >1.88e-05/3.16e-06/5.94</td><td align="center" valign="middle" >7.81e-05/1.51e-05/5.16</td><td align="center" valign="middle" >2.19e-05/3.61e-06/6.05</td><td align="center" valign="middle" >6.49e-05/1.20e-05/5.41</td></tr><tr><td align="center" valign="middle" >90</td><td align="center" valign="middle" >1.93e-05/3.36e-06/5.74</td><td align="center" valign="middle" >8.84e-05/1.63e-05/5.43</td><td align="center" valign="middle" >2.44e-05/3.80e-06/6.41</td><td align="center" valign="middle" >7.13e-05/1.53e-05/4.65</td></tr><tr><td align="center" valign="middle" >100</td><td align="center" valign="middle" >2.12e-05/3.80e-06/5.57</td><td align="center" valign="middle" >9.22e-05/1.76e-05/5.24</td><td align="center" valign="middle" >2.40e-05/3.62e-06/6.62</td><td align="center" valign="middle" >7.31e-05/1.44e-05/5.09</td></tr><tr><td align="center" valign="middle" >120</td><td align="center" valign="middle" >3.10e-05/4.71e-06/6.58</td><td align="center" valign="middle" >1.20e-04/2.15e-05/5.56</td><td align="center" valign="middle" 
>3.10e-05/4.36e-06/7.11</td><td align="center" valign="middle" >9.84e-05/1.85e-05/5.32</td></tr><tr><td align="center" valign="middle" >150</td><td align="center" valign="middle" >4.23e-05/7.19e-06/5.88</td><td align="center" valign="middle" >1.72e-04/3.23e-05/5.32</td><td align="center" valign="middle" >3.92e-05/6.11e-06/6.42</td><td align="center" valign="middle" >1.38e-04/2.58e-05/5.34</td></tr><tr><td align="center" valign="middle" >175</td><td align="center" valign="middle" >5.54e-05/9.17e-06/6.04</td><td align="center" valign="middle" >3.75e-04/6.81e-05/5.51</td><td align="center" valign="middle" >1.26e-04/2.15e-05/5.86</td><td align="center" valign="middle" >4.25e-04/8.11e-05/5.24</td></tr><tr><td align="center" valign="middle" >200</td><td align="center" valign="middle" >6.13e-05/1.08e-05/5.69</td><td align="center" valign="middle" >4.17e-04/7.89e-05/5.28</td><td align="center" valign="middle" >8.54e-05/1.19e-05/7.18</td><td align="center" valign="middle" >3.16e-04/6.15e-05/5.14</td></tr><tr><td align="center" valign="middle" >250</td><td align="center" valign="middle" >4.72e-05/7.34e-06/6.43</td><td align="center" valign="middle" >5.93e-04/1.03e-04/5.79</td><td align="center" valign="middle" >1.28e-04/2.16e-05/5.91</td><td align="center" valign="middle" >4.41e-04/9.89e-05/4.46</td></tr><tr><td align="center" valign="middle" >300</td><td align="center" valign="middle" >7.16e-05/1.05e-05/6.79</td><td align="center" valign="middle" >8.22e-04/1.45e-04/5.67</td><td align="center" valign="middle" >1.49e-04/1.78e-05/8.38</td><td align="center" valign="middle" >5.62e-04/1.09e-04/5.17</td></tr><tr><td align="center" valign="middle" >350</td><td align="center" valign="middle" >7.92e-05/1.33e-05/5.96</td><td align="center" valign="middle" >1.73e-03/3.06e-04/5.67</td><td align="center" valign="middle" >4.06e-04/6.95e-05/5.84</td><td align="center" valign="middle" >1.51e-03/3.39e-04/4.45</td></tr><tr><td align="center" valign="middle" >400</td><td align="center" 
valign="middle" >9.57e-05/1.55e-05/6.16</td><td align="center" valign="middle" >2.06e-03/3.53e-04/5.85</td><td align="center" valign="middle" >3.76e-04/4.40e-05/8.53</td><td align="center" valign="middle" >1.29e-03/2.58e-04/5.01</td></tr><tr><td align="center" valign="middle" >Size</td><td align="center" valign="middle" >SSTRA-Permute</td><td align="center" valign="middle" >SW-4Permute</td><td align="center" valign="middle" >SW-Permute</td><td align="center" valign="middle" >SWOPT-Permute</td></tr><tr><td align="center" valign="middle" >20</td><td align="center" valign="middle" >2.25e-06/3.70e-07/6.08</td><td align="center" valign="middle" >2.25e-06/3.70e-07/6.08</td><td align="center" valign="middle" >2.25e-06/3.70e-07/6.08</td><td align="center" valign="middle" >2.25e-06/3.70e-07/6.08</td></tr><tr><td align="center" valign="middle" >21</td><td align="center" valign="middle" >4.39e-06/7.36e-07/5.96</td><td align="center" valign="middle" >2.49e-06/3.80e-07/6.56</td><td align="center" valign="middle" >2.49e-06/3.80e-07/6.56</td><td align="center" valign="middle" >3.99e-06/6.60e-07/6.05</td></tr><tr><td align="center" valign="middle" >42</td><td align="center" valign="middle" >1.53e-05/3.03e-06/5.06</td><td align="center" valign="middle" >6.29e-06/9.95e-07/6.32</td><td align="center" valign="middle" >6.74e-06/1.06e-06/6.37</td><td align="center" valign="middle" >1.37e-05/2.78e-06/4.94</td></tr><tr><td align="center" valign="middle" >50</td><td align="center" valign="middle" >1.89e-05/3.61e-06/5.25</td><td align="center" valign="middle" >7.49e-06/1.14e-06/6.57</td><td align="center" valign="middle" >7.08e-06/1.20e-06/5.90</td><td align="center" valign="middle" >1.70e-05/3.33e-06/5.11</td></tr><tr><td align="center" valign="middle" >64</td><td align="center" valign="middle" >2.73e-05/5.01e-06/5.45</td><td align="center" valign="middle" >1.05e-05/1.53e-06/6.84</td><td align="center" valign="middle" >9.35e-06/1.63e-06/5.74</td><td align="center" valign="middle" 
>2.69e-05/4.82e-06/5.58</td></tr><tr><td align="center" valign="middle" >70</td><td align="center" valign="middle" >3.28e-05/5.95e-06/5.51</td><td align="center" valign="middle" >1.07e-05/1.71e-06/6.27</td><td align="center" valign="middle" >1.01e-05/1.81e-06/5.57</td><td align="center" valign="middle" >2.98e-05/5.39e-06/5.53</td></tr><tr><td align="center" valign="middle" >86</td><td align="center" valign="middle" >6.49e-05/1.27e-05/5.13</td><td align="center" valign="middle" >1.90e-05/2.93e-06/6.48</td><td align="center" valign="middle" >2.12e-05/3.11e-06/6.80</td><td align="center" valign="middle" >5.67e-05/1.14e-05/4.97</td></tr><tr><td align="center" valign="middle" >90</td><td align="center" valign="middle" >6.81e-05/1.36e-05/5.02</td><td align="center" valign="middle" >1.99e-05/3.02e-06/6.60</td><td align="center" valign="middle" >1.91e-05/3.25e-06/5.88</td><td align="center" valign="middle" >6.47e-05/1.26e-05/5.11</td></tr><tr><td align="center" valign="middle" >100</td><td align="center" valign="middle" >8.60e-05/1.48e-05/5.83</td><td align="center" valign="middle" >2.25e-05/3.22e-06/6.96</td><td align="center" valign="middle" >2.11e-05/3.48e-06/6.06</td><td align="center" valign="middle" >6.76e-05/1.38e-05/4.90</td></tr><tr><td align="center" valign="middle" >120</td><td align="center" valign="middle" >9.35e-05/1.80e-05/5.19</td><td align="center" valign="middle" >2.49e-05/3.88e-06/6.43</td><td align="center" valign="middle" >2.40e-05/4.18e-06/5.74</td><td align="center" valign="middle" >9.26e-05/1.76e-05/5.26</td></tr><tr><td align="center" valign="middle" >150</td><td align="center" valign="middle" >1.43e-04/2.71e-05/5.28</td><td align="center" valign="middle" >3.20e-05/5.11e-06/6.28</td><td align="center" valign="middle" >3.22e-05/5.40e-06/5.95</td><td align="center" valign="middle" >1.34e-04/2.46e-05/5.46</td></tr><tr><td align="center" valign="middle" >175</td><td align="center" valign="middle" >2.62e-04/5.29e-05/4.95</td><td align="center" 
valign="middle" >5.97e-05/9.97e-06/5.99</td><td align="center" valign="middle" >7.48e-05/1.42e-05/5.28</td><td align="center" valign="middle" >2.25e-04/4.72e-05/4.77</td></tr><tr><td align="center" valign="middle" >200</td><td align="center" valign="middle" >3.04e-04/6.07e-05/5.00</td><td align="center" valign="middle" >6.52e-05/1.03e-05/6.31</td><td align="center" valign="middle" >6.78e-05/1.12e-05/6.06</td><td align="center" valign="middle" >2.68e-04/5.66e-05/4.74</td></tr><tr><td align="center" valign="middle" >250</td><td align="center" valign="middle" >4.23e-04/7.84e-05/5.40</td><td align="center" valign="middle" >8.66e-05/1.27e-05/6.84</td><td align="center" valign="middle" >8.88e-05/1.52e-05/5.86</td><td align="center" valign="middle" >3.77e-04/7.21e-05/5.23</td></tr><tr><td align="center" valign="middle" >300</td><td align="center" valign="middle" >6.02e-04/1.12e-04/5.36</td><td align="center" valign="middle" >1.02e-04/1.51e-05/6.75</td><td align="center" valign="middle" >1.01e-04/1.67e-05/6.06</td><td align="center" valign="middle" >5.43e-04/1.01e-04/5.37</td></tr><tr><td align="center" valign="middle" >350</td><td align="center" valign="middle" >1.15e-03/2.17e-04/5.32</td><td align="center" valign="middle" >2.26e-04/3.39e-05/6.66</td><td align="center" valign="middle" >2.38e-04/4.32e-05/5.52</td><td align="center" valign="middle" >9.68e-04/1.96e-04/4.95</td></tr><tr><td align="center" valign="middle" >400</td><td align="center" valign="middle" >1.24e-03/2.50e-04/4.94</td><td align="center" valign="middle" >2.66e-04/3.82e-05/6.97</td><td align="center" valign="middle" >2.68e-04/4.10e-05/6.53</td><td align="center" valign="middle" >1.25e-03/2.34e-04/5.34</td></tr></tbody></table></table-wrap><table-wrap id="table6" ><label><xref ref-type="table" rid="table6">Table 6</xref></label><caption><title> Range [−1, 1], maximum error, maximum heat, max/heat ~ 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Size</th><th align="center" 
valign="middle" >GSGEMM</th><th align="center" valign="middle" >SSTRA</th><th align="center" valign="middle" >SW</th><th align="center" valign="middle" >SWOPT</th></tr></thead><tr><td align="center" valign="middle" >20</td><td align="center" valign="middle" >1.57e-06/1.45e-07/10.86</td><td align="center" valign="middle" >1.57e-06/1.45e-07/10.86</td><td align="center" valign="middle" >1.57e-06/1.45e-07/10.86</td><td align="center" valign="middle" >1.57e-06/1.45e-07/10.86</td></tr><tr><td align="center" valign="middle" >21</td><td align="center" valign="middle" >1.68e-06/1.50e-07/11.17</td><td align="center" valign="middle" >2.35e-06/3.16e-07/7.43</td><td align="center" valign="middle" >4.23e-06/3.88e-07/10.92</td><td align="center" valign="middle" >3.58e-06/3.85e-07/9.29</td></tr><tr><td align="center" valign="middle" >42</td><td align="center" valign="middle" >3.75e-06/2.85e-07/13.18</td><td align="center" valign="middle" >7.30e-06/1.11e-06/6.58</td><td align="center" valign="middle" >2.18e-05/1.66e-06/13.13</td><td align="center" valign="middle" >1.81e-05/1.66e-06/10.94</td></tr><tr><td align="center" valign="middle" >50</td><td align="center" valign="middle" >4.06e-06/3.36e-07/12.10</td><td align="center" valign="middle" >8.25e-06/1.26e-06/6.52</td><td align="center" valign="middle" >2.05e-05/1.90e-06/10.80</td><td align="center" valign="middle" >2.19e-05/1.90e-06/11.51</td></tr><tr><td align="center" valign="middle" >64</td><td align="center" valign="middle" >5.71e-06/4.22e-07/13.53</td><td align="center" valign="middle" >1.07e-05/1.52e-06/7.03</td><td align="center" valign="middle" >2.14e-05/2.29e-06/9.35</td><td align="center" valign="middle" >2.47e-05/2.29e-06/10.76</td></tr><tr><td align="center" valign="middle" >70</td><td align="center" valign="middle" >6.37e-06/4.61e-07/13.84</td><td align="center" valign="middle" >1.17e-05/1.66e-06/7.05</td><td align="center" valign="middle" >2.46e-05/2.48e-06/9.92</td><td align="center" valign="middle" 
>3.06e-05/2.48e-06/12.33</td></tr><tr><td align="center" valign="middle" >86</td><td align="center" valign="middle" >8.97e-06/5.61e-07/16.00</td><td align="center" valign="middle" >2.36e-05/3.87e-06/6.10</td><td align="center" valign="middle" >6.76e-05/7.20e-06/9.39</td><td align="center" valign="middle" >7.15e-05/7.13e-06/10.03</td></tr><tr><td align="center" valign="middle" >90</td><td align="center" valign="middle" >8.28e-06/5.85e-07/14.17</td><td align="center" valign="middle" >2.45e-05/4.12e-06/5.96</td><td align="center" valign="middle" >7.70e-05/7.50e-06/10.27</td><td align="center" valign="middle" >6.08e-05/7.34e-06/8.28</td></tr><tr><td align="center" valign="middle" >100</td><td align="center" valign="middle" >1.03e-05/6.48e-07/15.94</td><td align="center" valign="middle" >2.89e-05/4.38e-06/6.61</td><td align="center" valign="middle" >7.36e-05/8.11e-06/9.07</td><td align="center" valign="middle" >6.92e-05/8.05e-06/8.60</td></tr><tr><td align="center" valign="middle" >120</td><td align="center" valign="middle" >1.26e-05/7.84e-07/16.09</td><td align="center" valign="middle" >2.97e-05/5.02e-06/5.92</td><td align="center" valign="middle" >9.15e-05/9.25e-06/9.89</td><td align="center" valign="middle" >8.74e-05/9.30e-06/9.39</td></tr><tr><td align="center" valign="middle" >150</td><td align="center" valign="middle" >1.60e-05/9.62e-07/16.65</td><td align="center" valign="middle" >3.99e-05/6.05e-06/6.59</td><td align="center" valign="middle" >9.86e-05/1.12e-05/8.78</td><td align="center" valign="middle" >1.01e-04/1.11e-05/9.06</td></tr><tr><td align="center" valign="middle" >175</td><td align="center" valign="middle" >1.96e-05/1.13e-06/17.39</td><td align="center" valign="middle" >8.19e-05/1.35e-05/6.07</td><td align="center" valign="middle" >2.64e-04/3.07e-05/8.60</td><td align="center" valign="middle" >2.98e-04/3.05e-05/9.78</td></tr><tr><td align="center" valign="middle" >200</td><td align="center" valign="middle" >2.45e-05/1.28e-06/19.10</td><td 
align="center" valign="middle" >9.72e-05/1.53e-05/6.36</td><td align="center" valign="middle" >3.01e-04/3.43e-05/8.76</td><td align="center" valign="middle" >3.32e-04/3.41e-05/9.76</td></tr><tr><td align="center" valign="middle" >250</td><td align="center" valign="middle" >1.36e-05/1.15e-06/11.82</td><td align="center" valign="middle" >1.05e-04/1.81e-05/5.81</td><td align="center" valign="middle" >3.49e-04/4.12e-05/8.48</td><td align="center" valign="middle" >3.28e-04/4.05e-05/8.10</td></tr><tr><td align="center" valign="middle" >300</td><td align="center" valign="middle" >1.78e-05/1.37e-06/13.05</td><td align="center" valign="middle" >1.26e-04/2.09e-05/6.04</td><td align="center" valign="middle" >4.24e-04/4.73e-05/8.95</td><td align="center" valign="middle" >4.13e-04/4.75e-05/8.69</td></tr><tr><td align="center" valign="middle" >350</td><td align="center" valign="middle" >2.62e-05/1.59e-06/16.43</td><td align="center" valign="middle" >2.64e-04/4.66e-05/5.68</td><td align="center" valign="middle" >1.11e-03/1.31e-04/8.42</td><td align="center" valign="middle" >9.40e-04/1.30e-04/7.23</td></tr><tr><td align="center" valign="middle" >400</td><td align="center" valign="middle" >2.63e-05/1.81e-06/14.48</td><td align="center" valign="middle" >2.99e-04/5.29e-05/5.66</td><td align="center" valign="middle" >1.14e-03/1.45e-04/7.86</td><td align="center" valign="middle" >1.20e-03/1.45e-04/8.29</td></tr><tr><td align="center" valign="middle" >Size</td><td align="center" valign="middle" >SSTRA-Permute</td><td align="center" valign="middle" >SW-4Permute</td><td align="center" valign="middle" >SW-Permute</td><td align="center" valign="middle" >SWOPT-Permute</td></tr><tr><td align="center" valign="middle" >20</td><td align="center" valign="middle" >1.57e-06/1.45e-07/10.86</td><td align="center" valign="middle" >1.57e-06/1.45e-07/10.86</td><td align="center" valign="middle" >1.57e-06/1.45e-07/10.86</td><td align="center" valign="middle" >1.57e-06/1.45e-07/10.86</td></tr><tr><td 
align="center" valign="middle" >21</td><td align="center" valign="middle" >2.35e-06/3.16e-07/7.43</td><td align="center" valign="middle" >4.23e-06/3.88e-07/10.92</td><td align="center" valign="middle" >4.23e-06/3.88e-07/10.92</td><td align="center" valign="middle" >3.58e-06/3.85e-07/9.29</td></tr><tr><td align="center" valign="middle" >42</td><td align="center" valign="middle" >6.85e-06/9.23e-07/7.42</td><td align="center" valign="middle" >1.47e-05/1.59e-06/9.24</td><td align="center" valign="middle" >1.62e-05/1.63e-06/9.94</td><td align="center" valign="middle" >1.45e-05/1.64e-06/8.81</td></tr><tr><td align="center" valign="middle" >50</td><td align="center" valign="middle" >7.85e-06/1.05e-06/7.47</td><td align="center" valign="middle" >1.45e-05/1.82e-06/7.95</td><td align="center" valign="middle" >1.71e-05/1.87e-06/9.11</td><td align="center" valign="middle" >1.69e-05/1.88e-06/9.00</td></tr><tr><td align="center" valign="middle" >64</td><td align="center" valign="middle" >9.59e-06/1.24e-06/7.74</td><td align="center" valign="middle" >2.28e-05/2.20e-06/10.37</td><td align="center" valign="middle" >2.21e-05/2.27e-06/9.74</td><td align="center" valign="middle" >1.96e-05/2.29e-06/8.57</td></tr><tr><td align="center" valign="middle" >70</td><td align="center" valign="middle" >1.00e-05/1.38e-06/7.23</td><td align="center" valign="middle" >2.18e-05/2.41e-06/9.04</td><td align="center" valign="middle" >2.97e-05/2.48e-06/11.96</td><td align="center" valign="middle" >2.32e-05/2.47e-06/9.37</td></tr><tr><td align="center" valign="middle" >86</td><td align="center" valign="middle" >1.79e-05/2.87e-06/6.23</td><td align="center" valign="middle" >5.31e-05/6.68e-06/7.95</td><td align="center" valign="middle" >5.59e-05/7.07e-06/7.91</td><td align="center" valign="middle" >6.00e-05/7.01e-06/8.56</td></tr><tr><td align="center" valign="middle" >90</td><td align="center" valign="middle" >2.11e-05/2.99e-06/7.05</td><td align="center" valign="middle" >5.36e-05/6.98e-06/7.68</td><td 
align="center" valign="middle" >5.41e-05/7.34e-06/7.37</td><td align="center" valign="middle" >5.65e-05/7.26e-06/7.78</td></tr><tr><td align="center" valign="middle" >100</td><td align="center" valign="middle" >2.01e-05/2.99e-06/6.72</td><td align="center" valign="middle" >5.98e-05/7.50e-06/7.97</td><td align="center" valign="middle" >6.22e-05/7.95e-06/7.83</td><td align="center" valign="middle" >8.18e-05/7.93e-06/10.33</td></tr><tr><td align="center" valign="middle" >120</td><td align="center" valign="middle" >2.61e-05/3.33e-06/7.82</td><td align="center" valign="middle" >6.89e-05/8.60e-06/8.01</td><td align="center" valign="middle" >6.97e-05/9.08e-06/7.68</td><td align="center" valign="middle" >7.41e-05/9.13e-06/8.11</td></tr><tr><td align="center" valign="middle" >150</td><td align="center" valign="middle" >2.96e-05/4.47e-06/6.61</td><td align="center" valign="middle" >8.31e-05/1.05e-05/7.93</td><td align="center" valign="middle" >9.04e-05/1.10e-05/8.22</td><td align="center" valign="middle" >9.52e-05/1.10e-05/8.67</td></tr><tr><td align="center" valign="middle" >175</td><td align="center" valign="middle" >5.04e-05/7.64e-06/6.60</td><td align="center" valign="middle" >1.86e-04/2.77e-05/6.70</td><td align="center" valign="middle" >2.27e-04/3.00e-05/7.56</td><td align="center" valign="middle" >2.13e-04/3.00e-05/7.09</td></tr><tr><td align="center" valign="middle" >200</td><td align="center" valign="middle" >6.09e-05/8.48e-06/7.18</td><td align="center" valign="middle" >2.57e-04/3.10e-05/8.28</td><td align="center" valign="middle" >2.38e-04/3.35e-05/7.10</td><td align="center" valign="middle" >2.30e-04/3.35e-05/6.86</td></tr><tr><td align="center" valign="middle" >250</td><td align="center" valign="middle" >6.63e-05/1.13e-05/5.86</td><td align="center" valign="middle" >2.64e-04/3.73e-05/7.07</td><td align="center" valign="middle" >3.10e-04/4.04e-05/7.69</td><td align="center" valign="middle" >3.14e-04/4.03e-05/7.79</td></tr><tr><td align="center" valign="middle" 
>300</td><td align="center" valign="middle" >8.26e-05/1.27e-05/6.51</td><td align="center" valign="middle" >3.05e-04/4.31e-05/7.08</td><td align="center" valign="middle" >3.76e-04/4.67e-05/8.06</td><td align="center" valign="middle" >3.91e-04/4.67e-05/8.38</td></tr><tr><td align="center" valign="middle" >350</td><td align="center" valign="middle" >1.32e-04/2.41e-05/5.47</td><td align="center" valign="middle" >8.76e-04/1.14e-04/7.67</td><td align="center" valign="middle" >8.67e-04/1.27e-04/6.81</td><td align="center" valign="middle" >8.70e-04/1.28e-04/6.82</td></tr><tr><td align="center" valign="middle" >400</td><td align="center" valign="middle" >1.45e-04/2.40e-05/6.07</td><td align="center" valign="middle" >8.05e-04/1.28e-04/6.28</td><td align="center" valign="middle" >9.91e-04/1.43e-04/6.93</td><td align="center" valign="middle" >9.81e-04/1.42e-04/6.90</td></tr></tbody></table></table-wrap><fig id="fig11"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref>1</label><caption><title> (Left) n = 21, n<sub>0</sub> = 20, range = [0, 1], two recursions, and iterations = 10,000 (Right) n = 86, n<sub>0</sub> = 20, range = [0, 1], three recursions, and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x417.png"/></fig><fig id="fig12"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref>2</label><caption><title> (Left) n = 175, n<sub>0</sub> = 20, range = [0, 1], 4 recursions, and iterations = 10,000 (Right) n = 350, n<sub>0</sub> = 20, range = [0, 1], 5 recursions, and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x418.png"/></fig><p>luable in the case we know where to have the maximum accuracy. 
This tailoring of the algorithm to an accuracy goal for the result is novel and powerful; in contrast, it is not possible with regular matrix multiplication because its error is uniformly distributed.</p><p>Brent’s Connection.</p><p>Now we show that the error is a function of the algorithm. Let us start with Equation (2), which we present here again.</p><disp-formula id="scirp.43474-formula54815"><label>(15)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x419.png"  xlink:type="simple"/></disp-formula><p>In <xref ref-type="fig" rid="fig15">Figure 15</xref>, we present the estimate of the factor <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x420.png" xlink:type="simple"/></inline-formula> of the equation. Here, we use <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x421.png" xlink:type="simple"/></inline-formula> instead of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x422.png" xlink:type="simple"/></inline-formula> to emphasize that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x423.png" xlink:type="simple"/></inline-formula> is the measured value 
of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x424.png" xlink:type="simple"/></inline-formula> in previous equations, and their values are different. In this case, we measure the maximum error <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x425.png" xlink:type="simple"/></inline-formula>. 
We measure the maximum error of the leaf <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x426.png" xlink:type="simple"/></inline-formula>. We divide the left-hand side of Equation (15) by <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x427.png" xlink:type="simple"/></inline-formula>; thus we have <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x428.png" xlink:type="simple"/></inline-formula>. 
</p><fig id="fig13"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref>3</label><caption><title> Orthogonal FastMMW (Left) n = 175, n<sub>0</sub> = 20, range = [0, 1], 4 recursions, and iterations = 10,000 (Right) n = 175, n<sub>0</sub> = 20, range = [−1, 1], 4 recursions, and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x431.png"/></fig><fig id="fig14"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref>4</label><caption><title> (Left) n = 175, n<sub>0</sub> = 20, range = [0, 1], 4 recursions, and iterations = 10,000 (Right) n = 175, n<sub>0</sub> = 20, range = [−1, 1], 4 recursions, and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x432.png"/></fig><p>We estimate that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x429.png" xlink:type="simple"/></inline-formula> is linear in <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x430.png" xlink:type="simple"/></inline-formula> and thus we divide the LHS by <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x433.png" 
xlink:type="simple"/></inline-formula>. The linear relation is adequate because the leaf computation is based on a <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x434.png" xlink:type="simple"/></inline-formula> algorithm that satisfies this property. Thus, we estimate <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x435.png" xlink:type="simple"/></inline-formula> from the equation:</p><disp-formula id="scirp.43474-formula54816"><label>(16)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x436.png"  xlink:type="simple"/></disp-formula><p>For comparison purposes, we also show the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x437.png" xlink:type="simple"/></inline-formula> obtained when we use GSGEMM, which gives the minimum bound <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x438.png" xlink:type="simple"/></inline-formula>; recall that the current bounds for Strassen and Winograd call for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x439.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x440.png" xlink:type="simple"/></inline-formula>, respectively. Stating the obvious first, these bounds say that a large <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x441.png" xlink:type="simple"/></inline-formula> is bad and a small <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x442.png" xlink:type="simple"/></inline-formula> is good.</p><p>As a 
function of the range of the input, we have different factors. For the range [−1, 1] we have clear factors: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x443.png" xlink:type="simple"/></inline-formula> for Winograd’s variants and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x444.png" xlink:type="simple"/></inline-formula> for Strassen. We also notice that the orthogonal algorithms obtained by permutations consistently provide better <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x445.png" xlink:type="simple"/></inline-formula>s and are therefore more accurate. For positive matrices in the range [0, 1], we see clearly that the SW algorithm is as accurate as GSGEMM for a small number of recursions. Three recursions are a sweet spot; with more, we notice a difference in the accuracy of the algorithm.</p><p>So even when we use the standard way to measure the error and the standard bounds, we correctly reproduce what we already know about the algorithms, and we show that orthogonal permutations affect the maximum error.</p><p>In <xref ref-type="fig" rid="fig16">Figure 16</xref>, we show the different estimate of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x446.png" xlink:type="simple"/></inline-formula> obtained using the maximum of the transfer function (instead of the maximum error). 
In practice, the maximum is about <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x447.png" xlink:type="simple"/></inline-formula>, so well to the right of the estimate given by the transfer function. The advantage of using the transfer function is a clearer picture, in which the orthogonal permutations and the different algorithms are well resolved. Notice that both methods order the algorithms consistently.</p><p>Recursion Connection.</p><p>In this work, we introduce the transfer function to estimate the recursive effect on the error, so as to create a different recurrence equation to solve. Our goal was to achieve a simplified bound such as the one in Equation (17):</p><fig id="fig15"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref>5</label><caption><title> Computation of the multiplicative factor <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x449.png" xlink:type="simple"/></inline-formula> (Left) maximum error, range = [−1, 1], and iterations = 10,000 (Right) maximum error, range = [0, 1], and iterations = 10,000. 
On the ordinate, we present X (we recall that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x450.png" xlink:type="simple"/></inline-formula> is the multiplicative factor), and on the abscissa we present N</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x448.png"/></fig><fig id="fig16"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref>6</label><caption><title> Computation of the multiplicative factor <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x452.png" xlink:type="simple"/></inline-formula> (Left) maximum transfer function, range = [−1, 1], and iterations = 10,000 (Right) maximum transfer function, range = [0, 1], and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x451.png"/></fig><disp-formula id="scirp.43474-formula54817"><label>(17)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x453.png"  xlink:type="simple"/></disp-formula><p>Even simpler, using <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x454.png" xlink:type="simple"/></inline-formula> as before:</p><disp-formula id="scirp.43474-formula54818"><label>(18)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x455.png"  xlink:type="simple"/></disp-formula><p>The bound in Equation (18) is simpler to explain to any developer because it quantifies the intuitive idea that more recursive calls increase the error: the multiplicative factor is specific to the algorithm and constant at each recursion step.</p><p>Both equations provide a means of comparing different algorithms and their accuracy. 
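</p><p>As a rough illustration of how a per-recursion factor can be read off measured data, the following Python sketch assumes a simplified model in which the maximum error after r recursions is about X<sup>r</sup> times the maximum error of the leaf computation. This form is an assumption based on the surrounding text, not a transcription of Equation (18), and the helper name per_level_factor is hypothetical. The inputs are the maximum errors reported in Table 6 (range [−1, 1]) for the leaf size n = 20 and for n = 350, with the five recursions stated in the caption of Figure 12.</p><preformat>
```python
# Maximum errors copied from Table 6 (range [-1, 1]).
LEAF_MAX_ERROR = 1.57e-06  # n = 20, the leaf size n0 (same for every algorithm)
MAX_ERROR_N350 = {
    "GSGEMM": 2.62e-05,
    "SSTRA": 2.64e-04,
    "SW": 1.11e-03,
    "SWOPT": 9.40e-04,
}
RECURSIONS = 5  # n = 350 with n0 = 20, as stated in the Figure 12 caption


def per_level_factor(max_error, leaf_error, recursions):
    """Solve max_error = X**recursions * leaf_error for X (assumed model)."""
    return (max_error / leaf_error) ** (1.0 / recursions)


factors = {
    name: per_level_factor(err, LEAF_MAX_ERROR, RECURSIONS)
    for name, err in MAX_ERROR_N350.items()
}

# Print the algorithms from the smallest to the largest estimated factor.
for name in sorted(factors, key=factors.get):
    print(f"{name:7s} X ~ {factors[name]:.2f}")
```
</preformat><p>Under this assumed model, the factors for this single problem size order the algorithms as GSGEMM, SSTRA, SWOPT, SW. One size is only an illustration and does not replace the estimates over all sizes shown in the figures.</p><p>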
We can actually plug in GSGEMM, which should provide a practical and theoretical lower bound (X = 2).</p><p>In <xref ref-type="fig" rid="fig17">Figure 17</xref>, we estimate the factor <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x456.png" xlink:type="simple"/></inline-formula> of Equation (2) using the maximum of the transfer function. In <xref ref-type="fig" rid="fig18">Figure 18</xref>, we present the analogous estimate using the maximum error. Let us start with the obvious: <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x457.png" xlink:type="simple"/></inline-formula> tries to explain the multiplicative factor of the leaf error so as to estimate the error of the algorithms. We again present GSGEMM to provide a lower bound.</p><fig id="fig17"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref>7</label><caption><title> Computation of the multiplicative factor X (Left) maximum transfer function, range = [−1, 1], and iterations = 10,000 (Right) maximum transfer function, range = [0, 1], and iterations = 10,000. We report X on the ordinate (we recall that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x459.png" xlink:type="simple"/></inline-formula> is multiplicative, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x460.png" xlink:type="simple"/></inline-formula>) and the problem size N = n on the abscissa</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x458.png"/></fig><fig id="fig18"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 
1</xref>8</label><caption><title> Computation of the multiplicative factor X (Left) maximum error, range = [−1, 1], and iterations = 10,000 (Right) maximum error, range = [0, 1], and iterations = 10,000</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-2230043x461.png"/></fig><p>We notice that the transfer function and the maximum error provide very similar estimates of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x462.png" xlink:type="simple"/></inline-formula>. The order of accuracy is consistent. We see once again that for positive matrices SW should not have more than three recursions.</p><p>The recurrence solutions presented in Equations (9), (12), (19), and (20) are consistent estimates of the bounds; that is, they allow us to compare the accuracy of the algorithms. However, they seem to overestimate the measured errors.</p></sec><sec id="s8"><title>8. Conclusions</title><p>In summary, from the theory of weak stability we introduce the family of orthogonal algorithms for the FastMMW algorithms. The combination of the regular and orthogonal algorithms allows us to write more accurate FastMMW algorithms. We show this theoretically by deriving better error bounds. We extend the error bounds, and the way we compute them, so that we can model corner cases: we introduce the transfer function. In fact, for the family of random matrices, the weak stability bounds cannot capture idiosyncrasies when positive random matrices are used as operands.</p><p>Recalling conversations we had about error analysis, we now understand better why Winograd’s algorithms are viewed with suspicion by some, even for positive matrices. In contrast, we have always found our Winograd implementation quite accurate for positive matrices. The misunderstanding is related to the assumption that all fast algorithms have the same error analysis properties. 
In this work, we show that we can actually estimate and expect accuracy: this is a property of the algorithms, their implementation, and the way we use them. Hopefully, a better understanding of these algorithms will provide adequate standards and error estimates to guide experiments and data collection, and to make sense of large errors in experimental results (e.g., [<xref ref-type="bibr" rid="scirp.43474-ref22">22</xref>]), which are sometimes due to external factors rather than to the algorithm.</p><p>We have the opportunity to open a new chapter and create new interest in this beautiful field.</p></sec><sec id="s9"><title>Acknowledgements</title><p>This work stems from a question raised during a conversation with Matthew Badin, Alexandru Nicolau, and Michael Dillencourt. The question was: where is the error located? Once we answered the question with the transfer function, we wanted to reduce the error by randomizing the error pattern, in particular by permutations. Marco Bodrato had the idea of permuting the computation among recursion calls. We revisited the permutation used by David Wise and checked the original use of the permutations. Wise’s permutations did not help because they are symmetric and they just reverse the order of the computation. Random permutations involving the result matrix did change it by disrupting the pattern. The randomization and the permutations provided almost the same distribution as the original matrix multiplication: Richard Brent guided us to make sense of the preliminary result. At this stage, we had a randomized algorithm. This reduced the maximum error a little. So instead of applying random permutations, we tried to understand which computation and permutation we could use systematically. The orthogonal permutations were crystallized and applied to random matrices: we achieved a better transfer function and a better error. 
Nicholas Higham asked whether or not this approach can be extended to general matrices and thus provide better bounds. The answer was yes, thanks to the theory developed by Dario Bini and Grazia Lotti in their original work. Orthogonal algorithms are applied to the stability vectors, thus reducing the asymptotic stability factor. We shared the preliminary draft of this work with all of the above and we thank them for being our sounding board, our reference, and our standard. In particular, we thank Richard Brent for his feedback, moral support, and cleanup of the earlier drafts.</p></sec><sec id="s10"><title>Appendix</title><p>SWOPT</p><p>If we take the expected transfer function from the experiments and estimate the transfer function, then we can explain the transfer function for matrices in the range [−1, 1] very nicely. If we take the minimum and the maximum of the transfer function we have a ratio as in <xref ref-type="fig" rid="fig1">Figure 1</xref> of 4.</p><p>However, for matrices in the range [0, 1] and for the algorithm SWOP, the addition of the transfer function related to the matrix <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x463.png" xlink:type="simple"/></inline-formula> does not explain the result in <xref ref-type="table" rid="table7">Table 7</xref>. 
The transfer function estimation described above falls short of explaining the experimental results.</p><p>The computation of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x464.png" xlink:type="simple"/></inline-formula> is based on the following set of operations executed left to right:</p><disp-formula id="scirp.43474-formula54819"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x465.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.43474-formula54820"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x466.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.43474-formula54821"><graphic  xlink:href="http://html.scirp.org/file/2-2230043x467.png"  xlink:type="simple"/></disp-formula><p>The computation of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x468.png" xlink:type="simple"/></inline-formula> has a right matrix operand that will be in the range [−1, 1] and thus has a smaller transfer function. 
There are two interesting features about the computation: the order of the additions such as <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x468.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x469.png" xlink:type="simple"/></inline-formula> is the same in <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x468.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x469.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x470.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x468.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x469.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x470.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x471.png" xlink:type="simple"/></inline-formula> and with higher probability to be positive. 
Also <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x468.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x469.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x470.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x471.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x472.png" xlink:type="simple"/></inline-formula> has higher probability to be positive implying that the errors are aligned so that the cancellation in <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x468.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x469.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x470.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x471.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x472.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x473.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x468.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x469.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x470.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x471.png" 
xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x472.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x473.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x474.png" xlink:type="simple"/></inline-formula> are better than expected.</p><table-wrap id="table7" ><label><xref ref-type="table" rid="table7">Table 7</xref></label><caption><title> SWOPT Algorithm and Estimated transfer function</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x475.png" xlink:type="simple"/></inline-formula></th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x476.png" xlink:type="simple"/></inline-formula>[−1, 1]</th><th align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x477.png" xlink:type="simple"/></inline-formula>[0, 1]</th></tr></thead><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x478.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x479.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x480.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x481.png" xlink:type="simple"/></inline-formula></td><td 
align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x482.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x483.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >3</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x484.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >4</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x485.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >5</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x486.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x487.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x488.png" xlink:type="simple"/></inline-formula>[−1, 1]</td></tr><tr><td align="center" valign="middle" >6</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x489.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x490.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/2-2230043x491.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >7</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x492.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >8</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x493.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >9</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x494.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x495.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x496.png" xlink:type="simple"/></inline-formula>likely in [0, 1] (see next)</td></tr><tr><td align="center" valign="middle" >10</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x497.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >11</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x498.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >12</td><td align="center" valign="middle" 
><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x499.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x500.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x501.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >13</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x502.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >14</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x503.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x504.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x505.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >15</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x506.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x507.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x508.png" xlink:type="simple"/></inline-formula>Perfect cancellation</td></tr><tr><td align="center" valign="middle" >16</td><td align="center" valign="middle" 
><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x509.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >17</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x510.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x511.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x512.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >18</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x513.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x514.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x515.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >19</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x516.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x517.png" xlink:type="simple"/></inline-formula></td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x518.png" xlink:type="simple"/></inline-formula>Impossible</td></tr></tbody></table></table-wrap><p><sup>1</sup>The statistical properties of the error will 
not necessarily follow the statistical properties of the input matrices.</p><p>SW as accurate as SGEMM in theory.</p><p>Let us consider the algorithm deployed in SW and presented in <xref ref-type="table" rid="table6">Table 6</xref>. If we take the expected transfer function from the experiments and estimate the transfer function, then we can explain the transfer function for matrices in the range [−1, 1] very nicely. When we consider the other range [0, 1] and we observe that a few of the leaf computations are MMs of mixed sign, we can estimate half of the contribution. The Fast MM for positive matrices is actually more accurate because not all leaf computations involve positive matrices.</p><p>The error we commit in the mixed-sign leaf MMs is smaller than for positive leaf MMs. It is commonly held that the Winograd algorithm is more accurate because there is no true subtraction of the matrix products. Actually, an algorithm with only additions of the matrix products will require subtractions in the inputs of the matrix products.</p><p>We can create a recursive equation to estimate the transfer functions:</p><disp-formula id="scirp.43474-formula54822"><label>(20)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x520.png"  xlink:type="simple"/></disp-formula><p>With a factor of 2.5 and for a small number of recursions, SW is very close to the regular MM. We can forecast, and we show in practice, that for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x521.png" xlink:type="simple"/></inline-formula>, SW is as accurate as SGEMM and for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x521.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/2-2230043x522.png" xlink:type="simple"/></inline-formula> we had better switch algorithms. 
This is a proof of the consistently better accuracy of this particular variant of the Winograd algorithm for positive matrices.</p><p>SW-4PermuteL: four-permutation algorithm.</p><disp-formula id="scirp.43474-formula54823"><label>(21)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-2230043x523.png"  xlink:type="simple"/></disp-formula></sec><sec id="s11"><title>NOTES</title></sec></body><back><ref-list><title>References</title><ref id="scirp.43474-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Strassen, V. (1969) Gaussian Elimination Is Not Optimal. Numerische Mathematik, 14, 354-356. http://dx.doi.org/10.1007/BF02165411</mixed-citation></ref><ref id="scirp.43474-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Douglas, C.C., Heroux, M., Slishman, G. and Smith, R.M. (1994) GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen’s Matrix-Matrix Multiply Algorithm. Journal of Computational Physics, 110, 1-10. http://dx.doi.org/10.1006/jcph.1994.1001</mixed-citation></ref><ref id="scirp.43474-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Demmel, J. and Higham, N. (1992) Stability of Block Algorithms with Fast Level-3 BLAS. ACM Transactions on Mathematical Software, 18, 274-291. http://dx.doi.org/10.1145/131766.131769</mixed-citation></ref><ref id="scirp.43474-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Demmel, J., Dumitriu, J., Holtz, O. and Kleinberg, R. (2006) Fast Matrix Multiplication Is Stable.</mixed-citation></ref><ref id="scirp.43474-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Brent, R.P. (1970) Error Analysis of Algorithms for Matrix Multiplication and Triangular Decomposition Using Winograd’s Identity. Numerische Mathematik, 16, 145-156. 
http://dx.doi.org/10.1007/BF02308867</mixed-citation></ref><ref id="scirp.43474-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Miller, W. (1975) Computational Complexity and Numerical Stability. SIAM Journal on Computing, 4, 97-107. http://dx.doi.org/10.1137/0204009</mixed-citation></ref><ref id="scirp.43474-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Bini, D. and Lotti, G. (1980) Stability of Fast Algorithms for Matrix Multiplication. Numerische Mathematik, 36, 63-72. http://dx.doi.org/10.1007/BF01395989</mixed-citation></ref><ref id="scirp.43474-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Edelman, A. and Rao, N. (2005) Random Matrix Theory. Acta Numerica, 14, 233-297. http://dx.doi.org/10.1017/S0962492904000236</mixed-citation></ref><ref id="scirp.43474-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Kolmogorov, A.N. and Uspenskii, V.A. (1987) Algorithms and Randomness. Theory of Probability and Its Applications, 32, 389-412. http://dx.doi.org/10.1137/1132060</mixed-citation></ref><ref id="scirp.43474-ref10"><label>10</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Winograd</surname><given-names> S. </given-names></name>,<etal>et al</etal>. (<year>1968</year>)<article-title>A New Algorithm for Inner Product</article-title><source> IEEE Transactions on Computers</source><volume> 17</volume>,<fpage> 693</fpage>-<lpage>694</lpage>.<pub-id pub-id-type="doi"></pub-id></mixed-citation></ref><ref id="scirp.43474-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Higham, N.J. (1990) Exploiting Fast Matrix Multiplication within the Level 3 BLAS. 
ACM Transactions on Mathematical Software, 16, 352-368. http://dx.doi.org/10.1145/98267.98290</mixed-citation></ref><ref id="scirp.43474-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Higham, N.J. (2002) Accuracy and Stability of Numerical Algorithms. 2nd Edition, SIAM, Philadelphia. http://dx.doi.org/10.1137/1.9780898718027</mixed-citation></ref><ref id="scirp.43474-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Badin, M., D’Alberto, P., Bic, L., Dillencourt, M. and Nicolau, A. (2011) Improving the Accuracy of High Performance BLAS Implementations Using Adaptive Blocked Algorithms. In Proceedings of the 2011 23rd International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD ’11, Washington, DC, IEEE Computer Society, 26-29 October 2011, 120-127.</mixed-citation></ref><ref id="scirp.43474-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Castaldo, A.M., Clint Whaley, R. and Chronopoulos, A.T. (2008) Reducing Floating Point Error in Dot Product Using the Superblock Family of Algorithms. SIAM Journal on Scientific Computing, 31, 1156-1174. http://dx.doi.org/10.1137/070679946</mixed-citation></ref><ref id="scirp.43474-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Dongarra, J.J., Du Croz, J., Duff, I.S. and Hammarling, S. (1990) A Set of Level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16, 1-17. http://dx.doi.org/10.1145/77626.79170</mixed-citation></ref><ref id="scirp.43474-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Goto, K. and van de Geijn, R.A. (2008) Anatomy of High-Performance Matrix Multiplication. 
ACM Transactions on Mathematical Software. http://dx.doi.org/10.1145/1356052.1356053</mixed-citation></ref><ref id="scirp.43474-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Priestley, M.B. (1981) Spectral Analysis and Time Series. Academic Press Inc, New York.</mixed-citation></ref><ref id="scirp.43474-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Brockwell, P.J. and Davis, R.A. (2006) Time Series: Theory and Methods. Springer, New York.</mixed-citation></ref><ref id="scirp.43474-ref19"><label>19</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>D’Alberto</surname><given-names> P.</given-names></name>,<name name-style="western"><surname> Bodrato</surname><given-names> M.</given-names></name> and <name name-style="western"><surname>Nicolau</surname><given-names> A.</given-names></name>,<etal>et al</etal>. (<year>2011</year>)<article-title>Exploiting Parallelism in Matrix-Computation Kernels for Symmetric Multiprocessor Systems: Matrix-Multiplication and Matrix-Addition Algorithm Optimizations by Software Pipelining and Threads Allocation</article-title><source> ACM Transactions on Mathematical Software</source><volume> 38</volume>,<fpage> 1</fpage>-<lpage>2</lpage>.<pub-id pub-id-type="doi"></pub-id></mixed-citation></ref><ref id="scirp.43474-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Welch, P.D. (1969) A Fixed-Point Fast Fourier Transform Error Analysis. IEEE Transactions on Audio and Electroacoustics, 17, 151-157. http://dx.doi.org/10.1109/TAU.1969.1162035</mixed-citation></ref><ref id="scirp.43474-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Loos, S. and Wise, D.S. 
(2009) Strassen’s Matrix Multiplication Relabeled.</mixed-citation></ref><ref id="scirp.43474-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Li, J.J., Ranka, S. and Sahni, S. (2011) Strassen’s Matrix Multiplication on GPUs. 2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), Tainan, 7-9 December 2011, 157-164.</mixed-citation></ref></ref-list></back></article>