<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JCC</journal-id><journal-title-group><journal-title>Journal of Computer and Communications</journal-title></journal-title-group><issn pub-type="epub">2327-5219</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jcc.2016.44004</article-id><article-id pub-id-type="publisher-id">JCC-65093</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  A Two-Stage Algorithm of High Resolution Image Alignment for Mobile Applications
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>en-You</surname><given-names>Huang</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Lan-Rong</surname><given-names>Dung</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Tang-Suan</surname><given-names>Hong</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib></contrib-group><aff id="aff3"><addr-line>Institute of Communications Engineering, National Chiao Tung University, Taiwan</addr-line></aff><aff id="aff2"><addr-line>Department of Electrical and Computer Engineering, National Chiao Tung University, Taiwan</addr-line></aff><aff id="aff1"><addr-line>Institute of Electrical Control Engineering, National Chiao Tung University, Taiwan</addr-line></aff><author-notes><corresp id="cor1">* E-mail:<email>hry76519@gmail.com(EH)</email>;</corresp></author-notes><pub-date pub-type="epub"><day>18</day><month>03</month><year>2016</year></pub-date><volume>04</volume><issue>04</issue><fpage>36</fpage><lpage>51</lpage><history><date date-type="received"><day>1</day>	<month>February</month>	<year>2016</year></date><date date-type="rev-recd"><day>accepted</day>	<month>26</month>	<year>March</year>	</date><date date-type="accepted"><day>29</day>	<month>March</month>	<year>2016</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). 
http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Global motion estimation (GME) algorithms are widely applied in computer vision and video processing. In previous works, the image resolutions are usually low in order to meet real-time requirements (e.g., video stabilization). However, in some mobile-device applications (e.g., panoramic stitching of image sequences), high resolution is necessary to obtain a panoramic image of satisfactory quality, and the computational cost then becomes too high for the low-power requirement of mobile devices. The full search algorithm can obtain the global minimum, but at an extremely high computational cost, while the typical fast algorithms may suffer from the local minimum problem. This paper proposes a fast algorithm to deal with 2560 &#215; 1920 high-resolution (HR) image sequences. The proposed method estimates the motion vector by a two-level coarse-to-fine scheme that exploits only sparse reference blocks (25 blocks in this paper) in each level to determine the global motion vector, so the computational cost is significantly decreased. To increase the effective search range and robustness, the predictive motion vector (PMV) technique is used. A comparison of computational complexity shows that the proposed algorithm requires fewer addition operations than the typical three-step search (TSS) algorithm for estimating the global motion of HR images, without the local minimum problem. The quantitative evaluations show that our method is comparable to the full search algorithm (FSA), which is considered the gold-standard baseline.
 
</p></abstract><kwd-group><kwd>Global Motion Estimation</kwd><kwd> Block Matching</kwd><kwd> High Resolution Image Alignment</kwd><kwd> Mobile Applications</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Global motion estimation (GME) has been widely applied in video processing and computer vision for decades. Examples include video stabilization, motion compensation, and the popular panoramic image stitching, which produces a photograph with a wide field of view. Stitching a sequence of images involves two major tasks: 1) finding the global motion of each image with respect to the previous one and then 2) stitching the sequence to produce a panoramic image. After all the images have been aligned, various algorithms can be used to stitch them. For instance, one may refer to the efficient methods that find the optimal seams [<xref ref-type="bibr" rid="scirp.65093-ref1">1</xref>] - [<xref ref-type="bibr" rid="scirp.65093-ref4">4</xref>], or to the more complicated ones [<xref ref-type="bibr" rid="scirp.65093-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref6">6</xref>]. This study focuses on global motion estimation, so a discussion of the stitching algorithms is beyond the scope of this paper; interested readers may refer to the related works.</p><p>Global motion estimation algorithms can be classified into two categories: direct methods [<xref ref-type="bibr" rid="scirp.65093-ref7">7</xref>] - [<xref ref-type="bibr" rid="scirp.65093-ref18">18</xref>] and feature-based methods [<xref ref-type="bibr" rid="scirp.65093-ref19">19</xref>] - [<xref ref-type="bibr" rid="scirp.65093-ref24">24</xref>]. The direct methods aim to obtain the global motion by globally minimizing some cost function computed directly from the image pixels. 
In contrast, the feature-based methods first locate a sparse set of keypoints in the images and then obtain the global motion parameters by matching the feature correspondences. In this paper, we focus on the direct methods, since feature description and feature matching are computationally expensive, which makes the feature-based methods unsuitable for the low-power requirement of mobile devices.</p><p>The direct methods can be classified into two subgroups: full-search-type algorithms [<xref ref-type="bibr" rid="scirp.65093-ref7">7</xref>] - [<xref ref-type="bibr" rid="scirp.65093-ref12">12</xref>] and fast algorithms [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>] - [<xref ref-type="bibr" rid="scirp.65093-ref18">18</xref>]. The original full search algorithm compares all the positions in the search window, which makes the computation extremely expensive. There are some accelerated versions of the full search algorithm, e.g., projection-based methods [<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>], or methods that skip some checking points by means of mathematical inequalities [<xref ref-type="bibr" rid="scirp.65093-ref11">11</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref12">12</xref>]. Unfortunately, the computation still increases significantly as the search range and block size increase. In contrast, the fast algorithms such as three-step search (TSS) [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref14">14</xref>], four-step search [<xref ref-type="bibr" rid="scirp.65093-ref15">15</xref>], special pattern (diamond, hexagon, etc.) search [<xref ref-type="bibr" rid="scirp.65093-ref16">16</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref17">17</xref>], and gradient descent [<xref ref-type="bibr" rid="scirp.65093-ref18">18</xref>] are well known for their quick convergence. 
However, the fast algorithms usually suffer from the local minimum problem. In panoramic stitching, especially when the user captures the sequence with large motions, misalignments may produce apparent discontinuities along the seam lines. Therefore, the fast algorithms are not suitable for panoramic stitching.</p><p>Another issue is the power consumption of mobile devices. The power consumption is reduced if 1) the computational cost and 2) the number of memory accesses are as low as possible. Therefore, we aim to develop an algorithm that is fast and requires few memory accesses while keeping the accuracy as close to that of the full search algorithm as possible.</p><p>The remainder of this paper is organized as follows. Section 2 gives a brief review of related works. The details of the proposed algorithm are described in Section 3. The comparison of computational complexity is presented in Section 4. Accuracy verification is presented in Section 5. Finally, conclusions are summarized in Section 6.</p></sec><sec id="s2"><title>2. Related Works</title><p>The full-search-type algorithms usually consider all the positions in a search window and determine the motion vectors by minimizing some cost function. The traditional full search algorithm (FSA) computes the sum of absolute differences (SAD) between a reference block and the candidate blocks by block matching to determine the motion vector of the reference block. The FSA is able to find the global minimum in the search window, but its computational cost is extremely high. There are some accelerated versions of the FSA: Tu et al. [<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>] proposed the projection method to accelerate the SAD computations of block matching; Puglisi et al. 
[<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>] proposed a modified version of the projection method [<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>], which is more efficient since only the candidate blocks that satisfy certain conditions are further used to compute the SAD; another kind of accelerated version skips the computation of the SAD by means of mathematical inequalities [<xref ref-type="bibr" rid="scirp.65093-ref11">11</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref12">12</xref>]. Although these accelerated versions significantly reduce the computational cost compared with the original FSA, the cost still increases greatly as the search range and block size grow with the image resolution.</p><p>On the other hand, the fast algorithms assume that the energy function (e.g., SAD) is unimodal, so that the solution converges quickly to the minimum under different optimization strategies. The typical fast algorithms, for example, the three-step search (TSS) algorithms [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref14">14</xref>], four-step search (4SS) [<xref ref-type="bibr" rid="scirp.65093-ref15">15</xref>], special pattern (e.g., diamond, hexagon) search [<xref ref-type="bibr" rid="scirp.65093-ref16">16</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref17">17</xref>], and gradient descent [<xref ref-type="bibr" rid="scirp.65093-ref18">18</xref>], are well known for their quick convergence. The TSS algorithm determines the motion vector of one block in three iterations, and requires significantly fewer SAD computations than the FSA. The 4SS algorithm is a modified version of TSS that is more robust and accurate than TSS. The special pattern search algorithms are also well known for their fast convergence and comparable peak signal-to-noise ratio (PSNR) quality in video coding. 
However, real-world scenes are complex and the SAD is not unimodal in general, hence the local minimum problem may arise. <xref ref-type="fig" rid="fig1">Figure 1</xref> illustrates the local minimum problem for the 1-D case. The solution may converge to a local minimum instead of the global minimum if the initial position is close to the local minimum.</p></sec><sec id="s3"><title>3. The Proposed Algorithm</title><p>The proposed algorithm is a two-level scheme that processes the QVGA versions of the original HR images, followed by a motion vector refinement in the HR images. Although the down-sampled image loses some detail, the major structures and edges in the image are usually preserved. The QVGA global motion multiplied by 8 approximates the HR global motion, hence we only need to search a small range in the HR domain to refine the HR global motion vector. This significantly reduces the computational cost of the HR full search.</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> The illustration of local minimum problem for 1-D case</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x7.png"/></fig><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> The overview of the proposed algorithm</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x8.png"/></fig><p>The overall flowchart is shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>. We first down-sample all the HR images to obtain the QVGA ones, and select only <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x6.png" xlink:type="simple"/></inline-formula> reference blocks uniformly distributed in the center of the image in order to reduce the computational cost. 
The motion of every block is predicted from the global motion vector of the previous image or the local motion vector of an already processed neighboring block, in order to extend the effective search range. Once the motion vectors of all blocks are obtained, we apply the simplest clustering algorithm to reject the “outliers” among the motion vectors and obtain the global motion vector, which is equal to the center of the largest cluster. Finally, we use the global motion vectors of the QVGA images to predict the motion of the HR images and refine the motion vectors of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x9.png" xlink:type="simple"/></inline-formula> blocks in a small search range, followed by the same motion vector clustering, to obtain the final HR global motion vectors of all images.</p><p>In the following subsections, we describe the details of the proposed algorithm. Section 3.1 first briefly reviews the traditional full search and then describes the global motion estimation at QVGA resolution, including the details of the predictive motion estimation and the clustering algorithm. Section 3.2 describes the HR global motion refinement.</p><sec id="s3_1"><title>3.1. QVGA (320 &#215; 240) Global Motion Estimation</title><sec id="s3_1_1"><title>3.1.1. Reference Blocks Reduction</title><p>Block-based motion estimation algorithms usually divide the image of size <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x10.png" xlink:type="simple"/></inline-formula> into sub-blocks of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x11.png" xlink:type="simple"/></inline-formula> pixels, as shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>. 
The full search algorithm computes the sum of absolute differences (SAD) between a block of the reference image <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x12.png" xlink:type="simple"/></inline-formula> and all the candidate blocks within a search window in the previous image <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x13.png" xlink:type="simple"/></inline-formula>. The local motion of one reference block is determined by the criterion of minimizing the SAD:</p><disp-formula id="scirp.65093-formula1199"><label>(1)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/4-1730319x14.png"  xlink:type="simple"/></disp-formula><p>After all the local motion vectors of the reference blocks are estimated, the global motion vector is obtained by averaging all the local motion vectors or by some statistical method, e.g., histogram or least squares.</p><p><xref ref-type="fig" rid="fig4">Figure 4</xref> shows the full search algorithm within a <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x15.png" xlink:type="simple"/></inline-formula> search window. 
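As an illustration of the block matching in Equation (1), the following minimal Python sketch evaluates the SAD at every displacement inside the search window and keeps the smallest one. The function names sad and full_search, the representation of an image as a list of rows, and the (dy, dx) sign convention are assumptions made for this example and are not taken from the paper.
<preformat>
```python
def sad(ref_block, image, top, left):
    """Sum of absolute differences between ref_block and the equally
    sized window of image whose top-left corner is (top, left)."""
    total = 0
    for r, row in enumerate(ref_block):
        for c, value in enumerate(row):
            total += abs(value - image[top + r][left + c])
    return total


def full_search(ref_block, prev_image, top, left, search_range):
    """Exhaustively test every displacement (dy, dx) within
    +/- search_range and return the one with the minimum SAD."""
    n_rows, n_cols = len(ref_block), len(ref_block[0])
    height, width = len(prev_image), len(prev_image[0])
    best_mv, best_sad = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Ignore candidates that would leave the previous image.
            if not (y >= 0 and x >= 0
                    and height >= y + n_rows and width >= x + n_cols):
                continue
            cost = sad(ref_block, prev_image, y, x)
            if best_sad is None or best_sad > cost:
                best_mv, best_sad = (dy, dx), cost
    return best_mv, best_sad
```
</preformat>
Here (top, left) is the position of the reference block in the current image, and the returned pair is the displacement to the best-matching window in the previous image together with its SAD. The two nested loops over the displacements make the quadratic growth of the cost with the search range directly visible. 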
Assuming that the total number of blocks is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x16.png" xlink:type="simple"/></inline-formula>, the number of addition operations for one global motion estimation is</p><disp-formula id="scirp.65093-formula1200"><label>(2)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/4-1730319x19.png"  xlink:type="simple"/></disp-formula><fig id="fig3"  position="float"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> The traditional block based motion estimation algorithms divide the image into subblocks with size of N &#215; N</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x17.png"/></fig><fig id="fig4"  position="float"><label><xref ref-type="fig" rid="fig4">Figure 4</xref></label><caption><title> The full search algorithm compares all the positions in the search window to determine the motion vector</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x18.png"/></fig><p>For a QVGA (320 &#215; 240) image with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x20.png" xlink:type="simple"/></inline-formula>, there are <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x21.png" xlink:type="simple"/></inline-formula> reference blocks for global motion estimation. In fact, neighboring reference blocks usually have similar motion vectors, hence only sparse reference blocks are needed. 
The reference blocks should have the following properties:</p><p>• Repeatability: The reference blocks in <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x22.png" xlink:type="simple"/></inline-formula> should be able to find their true correspondences in <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x23.png" xlink:type="simple"/></inline-formula>. This means that blocks near the four boundaries of the image are not suitable.</p><p>• Independence: The reference blocks in image <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x24.png" xlink:type="simple"/></inline-formula> should be as uncorrelated with each other as possible, so that the estimated global motion approaches the true camera motion rather than the motion of moving objects.</p><p>Taking the above properties into consideration, we uniformly take <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x25.png" xlink:type="simple"/></inline-formula> sample blocks in the center of the image, as shown in <xref ref-type="fig" rid="fig5">Figure 5</xref>. We select the samples only from the central part of the image, of size <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x26.png" xlink:type="simple"/></inline-formula>. 
The central part of the image is uniformly divided into <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x27.png" xlink:type="simple"/></inline-formula> large blocks, and the reference blocks of size <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x28.png" xlink:type="simple"/></inline-formula> are sampled from the centers of these large blocks. This sampling scheme significantly reduces the number of reference blocks. The comparison of accuracy between the proposed method and the traditional full search algorithm is presented in Section 5.</p><p>Although the number of reference blocks <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x29.png" xlink:type="simple"/></inline-formula> is reduced to 25, the computational cost still increases significantly as the search range R and block size N increase. The block size N cannot be small, since the reference blocks have to contain enough representative information. 
In this paper, we set <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x30.png" xlink:type="simple"/></inline-formula> for QVGA global motion estimation, which is a common size in many related works. In panoramic stitching of image sequences, users may move the camera by a large amount between two successive images, hence the search range R should be as large as possible. However, according to (2), if R increases by a factor of M, the total number of addition operations increases by a factor of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x31.png" xlink:type="simple"/></inline-formula>. Hence we need a way to increase the search range while keeping the computational cost unchanged. This paper utilizes the idea of the predictive motion vector (PMV), which is widely applied in video coding [<xref ref-type="bibr" rid="scirp.65093-ref25">25</xref>].</p></sec><sec id="s3_1_2"><title>3.1.2. Predictive Motion Estimation</title><p>Intra/inter-frame motion prediction has been widely used in video coding for local motion vector prediction. The intra-frame prediction passes a likely motion to adjacent reference blocks, so that the blocks can find the global minimum beyond the search range R, i.e., the search range is effectively “extended” without any additional computation. <xref ref-type="fig" rid="fig6">Figure 6</xref> illustrates the intra-frame motion prediction. 
The inter-frame prediction assumes the motions of camera between two successive frames (or photoshots) are similar, hence the motion of previous frame can be exploited to predict the possible motion of current frame. <xref ref-type="fig" rid="fig7">Figure 7</xref> shows the inter-frame motion prediction.</p><p><xref ref-type="fig" rid="fig8">Figure 8</xref> gives an example for a pair of images in an outdoor sequence. In this example, the global motion vector is greater than the search range in horizontal direction. It is clear that the global motion vector estimated with prediction is more reliable since there is exactly a dominant motion in the histogram.</p><fig id="fig5"  position="float"><label><xref ref-type="fig" rid="fig5">Figure 5</xref></label><caption><title> The proposed reference blocks selection scheme</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x32.png"/></fig><fig id="fig6"  position="float"><label><xref ref-type="fig" rid="fig6">Figure 6</xref></label><caption><title> Intra-frame motion prediction</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x33.png"/></fig><fig id="fig7"  position="float"><label><xref ref-type="fig" rid="fig7">Figure 7</xref></label><caption><title> Inter-frame motion prediction</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x34.png"/></fig><fig-group id="fig8"><label><xref ref-type="fig" rid="fig8">Figure 8</xref></label><caption><title> An example for a pair of outdoor images. 
(a) The original images; (b) The histogram of 300 motion vectors computed by full search without prediction; (c) The histogram of 300 motion vectors computed by full search with prediction.</title></caption><fig id ="fig8_1"><label>(b)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x35.png"/></fig><fig id ="fig8_2"><label> (c)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x36.png"/></fig></fig-group><p>In the proposed algorithm, the reference blocks of the first row use the inter-frame prediction and the remaining rows use the intra-frame prediction, as shown in <xref ref-type="fig" rid="fig9">Figure 9</xref>. In this paper, we propose a PMV selection scheme for the intra-frame prediction, as shown in <xref ref-type="fig" rid="fig10">Figure 10</xref>. We consider only three processed blocks adjacent to the current block, since neighboring blocks give a reliable prediction. To increase robustness, the PMV is estimated by choosing the median of the neighboring MVs:</p><disp-formula id="scirp.65093-formula1201"><label>(3)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/4-1730319x37.png"  xlink:type="simple"/></disp-formula><p>The motion vectors of the blocks in the first row and first column are less precise, since these blocks take their PMV from the GMV of the previous frame or from the motion vectors of blocks that are not adjacent to them. For this reason, the motion vectors of the blocks in the first row and first column are used only for intra-frame prediction, and only the remaining 16 motion vectors are selected for further processing.</p><p>The simplest way to estimate the GMV is to average all the motion vectors of the reference blocks, i.e., <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x38.png" xlink:type="simple"/></inline-formula>. 
However, the averaged motion vector suffers from the “outliers” problem, since all motion vectors have equal weights <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x39.png" xlink:type="simple"/></inline-formula>. The outlier values significantly influence the averaged value, especially when the number of blocks is small. To alleviate this problem, we apply a simple clustering algorithm to automatically group the motion vectors into several (at least one) clusters, and the final global motion vector of a QVGA image is obtained by averaging all the motion vectors in the largest cluster.</p><fig id="fig9"  position="float"><label><xref ref-type="fig" rid="fig9">Figure 9</xref></label><caption><title> PMV selection for all the blocks. The reference blocks in the 1st row are predicted by the inter-frame prediction. The blocks in the rest of rows are predicted by the intra-frame prediction</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x40.png"/></fig><fig id="fig10"  position="float"><label><xref ref-type="fig" rid="fig10">Figure 10</xref></label><caption><title> The proposed intra-frame PMV selection scheme. The motion vectors of green blocks are exploited to predict the motion of white block</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x41.png"/></fig></sec><sec id="s3_1_3"><title>3.1.3. Threshold-Order-Dependent Clustering</title><p>The threshold-order-dependent (TOD) clustering [<xref ref-type="bibr" rid="scirp.65093-ref26">26</xref>] is the simplest algorithm in data clustering. 
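A minimal Python sketch of one common sequential formulation of such clustering is given here as an illustration; that Algorithm 1 follows exactly these steps is an assumption. Each vector joins the nearest existing cluster when its distance to that cluster's center does not exceed a threshold, and otherwise starts a new cluster. The names tod_cluster and global_motion, the Euclidean distance, and the recomputation of the center as the mean of the members are illustrative choices, not details confirmed by the paper.
<preformat>
```python
def tod_cluster(vectors, threshold):
    """Cluster 2-D motion vectors in a single pass.
    The result depends on the order of the input vectors."""
    clusters = []  # each cluster: {"center": (x, y), "members": [...]}
    for v in vectors:
        nearest, nearest_dist = None, None
        for cluster in clusters:
            cx, cy = cluster["center"]
            dist = ((v[0] - cx) ** 2 + (v[1] - cy) ** 2) ** 0.5
            if nearest_dist is None or nearest_dist > dist:
                nearest, nearest_dist = cluster, dist
        if nearest is not None and threshold >= nearest_dist:
            # Close enough: join the cluster and update its center.
            nearest["members"].append(v)
            n = len(nearest["members"])
            nearest["center"] = (sum(m[0] for m in nearest["members"]) / n,
                                 sum(m[1] for m in nearest["members"]) / n)
        else:
            # Too far from every center: start a new cluster.
            clusters.append({"center": (float(v[0]), float(v[1])),
                             "members": [v]})
    return clusters


def global_motion(vectors, threshold):
    """Center of the largest cluster, taken as the global motion vector."""
    clusters = tod_cluster(vectors, threshold)
    largest = max(clusters, key=lambda c: len(c["members"]))
    return largest["center"]
```
</preformat>
With the vectors (10, 2), (11, 2), (10, 3), (11, 3) and one outlier (40, -20), a threshold of 3 yields two clusters, and the outlier has no effect on the returned center of the larger one. 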
The clustering result depends on the order of the input data and on the distance threshold <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x42.png" xlink:type="simple"/></inline-formula> (i.e., the radius of a cluster). <xref ref-type="fig" rid="fig11">Figure 11</xref> shows an example of 2D data clustering. The only parameter is the distance threshold, which controls the final number of clusters. If the threshold is small, there may be many isolated clusters; conversely, if the threshold is large, the largest cluster probably contains “outliers”, which decrease the precision of our global motion estimation. In this paper, we set <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x43.png" xlink:type="simple"/></inline-formula> empirically. The details of the TOD algorithm are shown in Algorithm 1.</p><fig id="fig11"  position="float"><label><xref ref-type="fig" rid="fig11">Figure 11</xref></label><caption><title> An example of clustering</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x44.png"/></fig><disp-formula id="scirp.65093-formula1202"><graphic  xlink:href="http://html.scirp.org/file/4-1730319x45.png"  xlink:type="simple"/></disp-formula><p>After motion clustering, the global motion vector is equal to the center of the largest cluster, for example, the red cluster shown in <xref ref-type="fig" rid="fig11">Figure 11</xref>. Clustering the motion vectors thus avoids the outlier problem. Note that the computational complexity of TOD is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x46.png" xlink:type="simple"/></inline-formula>; however, the number of data points (motion vectors) in this paper is only 16, hence the computational cost of TOD is negligible compared with that of motion estimation.</p></sec></sec>
<sec id="s3_2"><title>3.2. Global Motion Refinement for HR (2560 &#215; 1920) Images</title><p>After the GMV of the QVGA image is obtained, we only need to refine the global motion of the HR image, exploiting the fact that its contents are similar to those of the QVGA version. The division of the image is the same as in the QVGA version. We multiply the GMV of the QVGA image by 8 to obtain the PMV of the HR image, and simply perform the traditional full search in a small search range. Note that the search range in the HR domain is reduced to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x47.png" xlink:type="simple"/></inline-formula>, and the size N of the reference blocks is multiplied by 8, i.e., <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x48.png" xlink:type="simple"/></inline-formula>. We compute the motion vectors of only the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x49.png" xlink:type="simple"/></inline-formula> reference blocks corresponding to the 16 blocks in the QVGA image, i.e., we ignore the unreliable motion vectors of the blocks in the first row and first column. 
Finally, the refined global motion vector is determined by TOD clustering of the motion vectors of the 16 blocks. <xref ref-type="fig" rid="fig12">Figure 12</xref> illustrates the refinement of the HR global motion vector. The complete proposed algorithm is described in Algorithm 2.</p><fig-group id="fig12"><label>Figure 12</label><caption><title> Illustration of the motion estimation in the high-resolution domain.</title></caption><fig id ="fig12_1"><label></label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x50.png"/></fig><fig id ="fig12_2"><label></label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x51.png"/></fig></fig-group><p>Because the proposed global motion estimation algorithm uses predictive motion vectors, the search range is effectively enlarged without additional computation. The computational complexity is compared in the next section.</p></sec></sec><sec id="s4"><title>4. Computational Complexity</title><p>In this section, the proposed algorithm is compared with four algorithms: the full search algorithm (FSA), the three-step search (TSS) algorithm [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>], and the two projection-based methods [<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>]. The factor <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x52.png" xlink:type="simple"/></inline-formula> is set to 89.74 in [<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>]. We compare only the addition (subtraction) operations in block matching, since the other computations are comparatively minor.</p><p><xref ref-type="table" rid="table1">Table 1</xref> lists the computational cost of motion estimation for one block. 
The proposed algorithm is a modified version of FSA, so its computational complexity for one block is the same as that of FSA. However, the proposed method processes only 25 blocks in QVGA and 16 blocks in HR, whereas the other methods process all the blocks to determine the global motion vector. We first compare the computational costs at QVGA resolution with block size <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x53.png" xlink:type="simple"/></inline-formula> and search range <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x54.png" xlink:type="simple"/></inline-formula>. <xref ref-type="table" rid="table2">Table 2</xref> shows that our method requires 12 times fewer additions than FSA (54,080,000 versus 648,960,000) because we consider only 25 of the 300 blocks, but it is still slower than the other methods, especially the TSS algorithm. For the HR motion vectors, our method provides the effective search range:</p><disp-formula id="scirp.65093-formula1203"><label>(4)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/4-1730319x55.png"  xlink:type="simple"/></disp-formula><p>Hence, in our case, the effective search range of the HR motion vectors is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x56.png" xlink:type="simple"/></inline-formula>. 
With the block size <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x57.png" xlink:type="simple"/></inline-formula>, <xref ref-type="table" rid="table3">Table 3</xref> shows that our method is much faster than the algorithms proposed in [<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>] and even faster than the fast TSS algorithm [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>]. <xref ref-type="table" rid="table4">Table 4</xref> lists the speed ratio with respect to FSA for HR global motion estimation. The proposed method is 4671.58 times faster than FSA and 18.05 times faster than the projection-based method [<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>].</p><p>Moreover, the effective search range of our method is more than 272, since we apply the PMVs to extend the QVGA search range. The effective search range in QVGA is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x58.png" xlink:type="simple"/></inline-formula>. Note that we only take <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x59.png" xlink:type="simple"/></inline-formula> to be twice <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x60.png" xlink:type="simple"/></inline-formula> as a conservative estimate. 
In fact, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x61.png" xlink:type="simple"/></inline-formula> could be more than <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x62.png" xlink:type="simple"/></inline-formula>, since each PMV is selected from adjacent blocks whose own motion vectors were predicted from their neighbors. 
With <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x63.png" xlink:type="simple"/></inline-formula>, the overall effective search range is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x64.png" xlink:type="simple"/></inline-formula>, which is usually sufficient for ordinary HR panoramic stitching. 
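</p><p>The totals and speed ratios quoted above for Tables 3 and 4 can be re-derived with a few lines of Python. The per-block addition counts and block counts below are copied from Table 3; the per-block count for [10] is taken as 37,601,630, which is the value consistent with the listed total of 11,280,489,000 additions over 300 blocks. The variable names are hypothetical.</p><preformat>
```python
# Per-block addition counts copied from Table 3 (full-image methods).
PER_BLOCK = {
    "FSA": 9_732_915_200,
    "TSS": 2_385_101,
    "[9]": 75_774_143,
    "[10]": 37_601_630,
}
FULL_IMAGE_BLOCKS = 300

totals = {name: cost * FULL_IMAGE_BLOCKS for name, cost in PER_BLOCK.items()}
# The proposed method: 25 QVGA blocks plus 16 HR blocks.
totals["ours"] = 25 * 2_163_200 + 16 * 35_684_352

# Speed ratio with respect to FSA, as in Table 4.
speed_ratio = {name: totals["FSA"] / total for name, total in totals.items()}

for name in totals:
    print(f"{name:5s} {totals[name]:>20,d} {speed_ratio[name]:10.2f}")

# Ratio of the projection-based method [10] to the proposed method.
ratio_vs_ref10 = totals["[10]"] / totals["ours"]
print(f"{ratio_vs_ref10:.2f}")
```
</preformat><p>The 25 QVGA blocks account for 54,080,000 additions and the 16 HR blocks for 570,949,632, which together give the 625,029,632 additions listed for the proposed method.</p><p>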
The computational costs are shown in <xref ref-type="table" rid="table5">Table 5</xref>, and <xref ref-type="table" rid="table6">Table 6</xref> lists the speed ratio with respect to FSA in HR motion estimation with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x65.png" xlink:type="simple"/></inline-formula>. No additional computation is required, since we apply the PMVs to extend the overall search range.</p><p>The proposed algorithm can be further accelerated for sequential panoramic stitching, because users usually capture an image sequence with horizontal camera motion and only slight vertical jitter. In such applications, we can reduce the Y-direction search range to speed up the proposed algorithm.</p></sec><sec id="s5"><title>5. Accuracy Verifications</title><p>The proposed algorithm is a fast version of FSA that determines the global motion vector from only 25 reference blocks; hence, we need to verify the accuracy of our method against the original FSA, which is considered the golden baseline. 
In this section, we apply two different comparisons to verify the accuracy of our</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Computation costs for each block</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >No. Additions/Block</th></tr></thead><tr><td align="center" valign="middle" >FSA</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x66.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >TSS [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>]</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x67.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>]</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x68.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>]</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x69.png" xlink:type="simple"/></inline-formula></td></tr><tr><td align="center" valign="middle" >Our method</td><td align="center" valign="middle" ><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x70.png" xlink:type="simple"/></inline-formula></td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Computation costs for whole QVGA image with<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x71.png" 
xlink:type="simple"/></inline-formula>,<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x71.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x72.png" xlink:type="simple"/></inline-formula></title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >No. Add./Block</th><th align="center" valign="middle" >No. Blocks</th><th align="center" valign="middle"  colspan="2"  >Total No. Add.</th></tr></thead><tr><td align="center" valign="middle" >FSA</td><td align="center" valign="middle" >2,163,200</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >648,960,000</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >TSS [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>]</td><td align="center" valign="middle" >23,040</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >6,912,000</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>]</td><td align="center" valign="middle" >131,487</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >39,446,100</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>]</td><td align="center" valign="middle" >66,403</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >19,920,900</td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Our method</td><td align="center" valign="middle" >2,163,200</td><td align="center" valign="middle" >25</td><td align="center" valign="middle" >54,080,000</td><td align="center" valign="middle" ></td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref 
ref-type="table" rid="table3">Table 3</xref></label><caption><title> Computation costs for whole HR image with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x73.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x74.png" xlink:type="simple"/></inline-formula></title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >No. Add./Block</th><th align="center" valign="middle" >No. Blocks</th><th align="center" valign="middle" >Total No. Add.</th></tr></thead><tr><td align="center" valign="middle" >FSA</td><td align="center" valign="middle" >9,732,915,200</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >2,919,874,560,000</td></tr><tr><td align="center" valign="middle" >TSS [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>]</td><td align="center" valign="middle" >2,385,101</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >715,530,300</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>]</td><td align="center" valign="middle" >75,774,143</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >22,732,242,900</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>]</td><td align="center" valign="middle" >37,601,630</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >11,280,489,000</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >Our method</td><td align="center" valign="middle" >(QVGA) 2,163,200</td><td align="center" valign="middle" >(QVGA) 25</td><td align="center" valign="middle"  rowspan="2"  >625,029,632</td></tr><tr><td 
align="center" valign="middle" >(HR) 35,684,352</td><td align="center" valign="middle" >(HR) 16</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Speed ratio of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x75.png" xlink:type="simple"/></inline-formula> with respect to FSA for HR global motion estimation</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >Total No. Add.</th><th align="center" valign="middle" >Speed Ratio</th></tr></thead><tr><td align="center" valign="middle" >FSA</td><td align="center" valign="middle" >2,919,874,560,000</td><td align="center" valign="middle" >1.00</td></tr><tr><td align="center" valign="middle" >TSS [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>]</td><td align="center" valign="middle" >715,530,300</td><td align="center" valign="middle" >4080.71</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>]</td><td align="center" valign="middle" >22,732,242,900</td><td align="center" valign="middle" >128.45</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>]</td><td align="center" valign="middle" >11,280,489,000</td><td align="center" valign="middle" >258.84</td></tr><tr><td align="center" valign="middle" >Our method</td><td align="center" valign="middle" >625,029,632</td><td align="center" valign="middle" >4671.58</td></tr></tbody></table></table-wrap><table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 5</xref></label><caption><title> Computation costs for whole HR image with<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x76.png" xlink:type="simple"/></inline-formula>,<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x76.png" 
xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x77.png" xlink:type="simple"/></inline-formula></title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >No. Add./Block</th><th align="center" valign="middle" >No. Blocks</th><th align="center" valign="middle" >Total No. Add.</th></tr></thead><tr><td align="center" valign="middle" >FSA</td><td align="center" valign="middle" >36,610,015,232</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >10,983,004,569,600</td></tr><tr><td align="center" valign="middle" >TSS [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>]</td><td align="center" valign="middle" >2,667,300</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >800,190,000</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>]</td><td align="center" valign="middle" >284,931,263</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >85,479,378,900</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>]</td><td align="center" valign="middle" >140,949,854</td><td align="center" valign="middle" >300</td><td align="center" valign="middle" >42,284,956,200</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >Our method</td><td align="center" valign="middle" >(QVGA) 2,163,200</td><td align="center" valign="middle" >(QVGA) 25</td><td align="center" valign="middle"  rowspan="2"  >625,029,632</td></tr><tr><td align="center" valign="middle" >(HR) 35,684,352</td><td align="center" valign="middle" >(HR) 16</td></tr></tbody></table></table-wrap><table-wrap id="table6" ><label><xref ref-type="table" rid="table6">Table 6</xref></label><caption><title> Speed ratio of <inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/4-1730319x78.png" xlink:type="simple"/></inline-formula> with respect to FSA for HR global motion estimation</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >Total No. Add.</th><th align="center" valign="middle" >Speed Ratio</th></tr></thead><tr><td align="center" valign="middle" >FSA</td><td align="center" valign="middle" >10,983,004,569,600</td><td align="center" valign="middle" >1.00</td></tr><tr><td align="center" valign="middle" >TSS [<xref ref-type="bibr" rid="scirp.65093-ref13">13</xref>]</td><td align="center" valign="middle" >800,190,000</td><td align="center" valign="middle" >13725.50</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref9">9</xref>]</td><td align="center" valign="middle" >85,479,378,900</td><td align="center" valign="middle" >128.49</td></tr><tr><td align="center" valign="middle" >[<xref ref-type="bibr" rid="scirp.65093-ref10">10</xref>]</td><td align="center" valign="middle" >42,284,956,200</td><td align="center" valign="middle" >259.74</td></tr><tr><td align="center" valign="middle" >Our method</td><td align="center" valign="middle" >625,029,632</td><td align="center" valign="middle" >17571.97</td></tr></tbody></table></table-wrap><p>method. Section 5.1 compares the global motion vectors obtained by FSA and by our method. The second comparison uses the PSNR in the overlapped region of every two successive images. For panoramic stitching applications, the overlapped regions of two successive images should be as similar as possible so that there are fewer visual discontinuities. In Section 5.2, we use the PSNR as a quantitative index to show that the global motion vector errors do not affect the stitching quality, since most PSNR differences are below 0.5 dB, which is usually regarded as a negligible, noise-level difference.</p><sec id="s5_1"><title>5.1. 
Estimation Errors</title><p>If we disregard its expensive computation, the GMV obtained by FSA can serve as the ground truth, since FSA takes all the candidate blocks in the search windows into consideration. Therefore, we compare the GMV obtained by our method with the GMV obtained by FSA. The GMV error is defined as follows:</p><disp-formula id="scirp.65093-formula1204"><label>(5)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/4-1730319x79.png"  xlink:type="simple"/></disp-formula><p>We tested our method on six real-world panoramic sequences: three indoor scenes and three outdoor scenes. Each sequence contains at least 19 images, and a total of 148 pairs of successive images were used to estimate the global motion vectors. All the sequences were captured with a hand-held camera, and the dominant motion was horizontal. <xref ref-type="fig" rid="fig13">Figure 13</xref> shows sample images from the six sequences.</p><p><xref ref-type="fig" rid="fig14">Figure 14</xref> shows the histograms of the GMV errors calculated by (5). The first column shows the X-direction error histograms and the second column shows the Y-direction error histograms. The horizontal axis of each histogram is the GMV error (in pixels) and the vertical axis is the number of images. We can observe that more images have large GMV errors in the indoor sequences than in the outdoor ones. Indoor scenes contain more homogeneous regions (e.g., ceilings and walls), which degrade the accuracy of motion estimation, and larger disparities, which increase the variance of the local motion vectors. Outdoor scenes usually contain more texture, which helps block matching, and their disparities are usually small; hence fewer images have large GMV errors than in the indoor scenes. <xref ref-type="fig" rid="fig14">Figure 14</xref> shows that the accuracy of our method is comparable to that of FSA. 
Even in the indoor sequences with large motions, which are challenging for global motion estimation, our method achieved satisfactory accuracy (GMV errors within 5 pixels) for most of the images.</p></sec><sec id="s5_2"><title>5.2. Quantitative Evaluation</title><p>The GMV errors alone are not enough to describe how accurate our method is. Another way to evaluate the accuracy of the GMV is to measure the similarity of the overlapped region of two images: the more accurate the GMV, the smaller the difference between the two images in the overlapped region. A well-known index of the similarity of two images is the peak signal-to-noise ratio (PSNR) [<xref ref-type="bibr" rid="scirp.65093-ref27">27</xref>], which is defined as follows:</p><disp-formula id="scirp.65093-formula1205"><label>(6)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/4-1730319x80.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x81.png" xlink:type="simple"/></inline-formula> is the overlapped region, as shown in <xref ref-type="fig" rid="fig15">Figure 15</xref>, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x82.png" xlink:type="simple"/></inline-formula> is the number of pixels in the overlapped region. The PSNR value computed in the overlapped region of two images using the GMV obtained by FSA</p><fig-group id="fig13"><label>Figure 13</label><caption><title> Test sequences for accuracy verification. (1)-(3) are indoor sequences. (4)-(6) are outdoor sequences. All the sequences were captured with a hand-held camera with mainly horizontal motion. 
The image resolution is 2560 &#215; 1920 pixels.</title></caption><fig id ="fig13_1"><label> (2)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x83.png"/></fig><fig id ="fig13_2"><label>(3)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x84.png"/></fig><fig id ="fig13_3"><label> (4)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x85.png"/></fig></fig-group><fig-group id="fig14"><label>Figure 14</label><caption><title> Histograms of the GMV errors. (a) GMV errors of the indoor sequences (in pixels); (b) GMV errors of the outdoor sequences (in pixels). The horizontal axis is the GMV error (in pixels) and the vertical axis is the number of images.</title></caption><fig id ="fig14_1"><label> (b)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x86.png"/></fig></fig-group><p>should be the highest one. We only need to show that the difference between the PSNR obtained with the GMV of FSA and the PSNR obtained with the GMV of our method is small enough. 
Therefore, we used the same sequences as in Section 5.1 to compare the PSNR differences, which are defined as follows:</p><disp-formula id="scirp.65093-formula1206"><label>(7)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/4-1730319x87.png"  xlink:type="simple"/></disp-formula><p><xref ref-type="fig" rid="fig16">Figure 16</xref>(a) shows the histograms of PSNR differences of indoor sequences and <xref ref-type="fig" rid="fig16">Figure 16</xref>(b) shows the</p><fig id="fig15"  position="float"><label>Figure 15</label><caption><title> Only the overlapped regions in two successive images are used to compute the PSNR value</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x88.png"/></fig><fig-group id="fig16"><label>Figure 16</label><caption><title> The histograms of PSNR differences. (a) The PSNR differences of indoor sequences; (b) The PSNR differences of outdoor sequences.</title></caption><fig id ="fig16_1"><label>(b)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x89.png"/></fig><fig id ="fig16_2"><label></label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-1730319x90.png"/></fig></fig-group><p>histograms of PSNR differences of outdoor sequences. The horizontal axis is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x91.png" xlink:type="simple"/></inline-formula> (in dB) and the vertical axis is the number of image pairs. 
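</p><p>The overlap PSNR and the PSNR difference can be illustrated with the following Python sketch. It assumes that (6) is the standard PSNR with a peak value of 255 evaluated over the overlapped region only, and that (7) is the PSNR obtained with the FSA GMV minus the PSNR obtained with our GMV; both readings are assumptions, since the equations are only available as images. The function name overlap_psnr, the synthetic images and the two motion vectors are hypothetical.</p><preformat>
```python
import math


def overlap_psnr(img_a, img_b, gmv, peak=255.0):
    """PSNR between img_a and img_b over the region where they overlap
    after shifting img_b by the global motion vector gmv = (dx, dy)."""
    dx, dy = gmv
    height, width = len(img_a), len(img_a[0])
    squared_error, count = 0.0, 0
    for y in range(max(0, dy), min(height, height + dy)):
        for x in range(max(0, dx), min(width, width + dx)):
            diff = img_a[y][x] - img_b[y - dy][x - dx]
            squared_error += diff * diff
            count += 1
    if count == 0:
        raise ValueError("images do not overlap for this motion vector")
    mse = squared_error / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def scene(x, y):
    """Synthetic scene intensity used to build two overlapping views."""
    return (17 * x + 31 * y) % 256


WIDTH, HEIGHT = 8, 6
img_a = [[scene(x, y) for x in range(WIDTH)] for y in range(HEIGHT)]
# Second view: shifted by (3, 1) pixels, with a small checkerboard noise.
img_b = [[scene(x + 3, y + 1) + (x + y) % 2 for x in range(WIDTH)]
         for y in range(HEIGHT)]

psnr_fsa = overlap_psnr(img_a, img_b, gmv=(3, 1))   # reference GMV
psnr_ours = overlap_psnr(img_a, img_b, gmv=(2, 1))  # GMV off by one pixel
delta_psnr = psnr_fsa - psnr_ours
print(psnr_fsa, psnr_ours, delta_psnr)
```
</preformat><p>On this small, highly textured synthetic pair a one-pixel GMV error already produces a large PSNR drop; on the real sequences the measured differences are reported in the histograms of Figure 16.</p><p>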
Note that most of the <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-1730319x92.png" xlink:type="simple"/></inline-formula> values are less than 0.5 dB, which is usually considered a noise-level difference; that is, the similarity of the overlapped region in two images aligned with our GMV is comparable to that obtained with the GMV of FSA.</p></sec></sec><sec id="s6"><title>6. Conclusions</title><p>This paper proposed a fast global motion estimation algorithm for HR (2560 &#215; 1920) image alignment in mobile applications. The proposed method is a modified version of the full search algorithm that considers only 25 reference blocks uniformly distributed in the center of the image. By applying the predictive motion vector scheme, our method can handle large camera motions and is even faster than the typical three-step search (TSS) algorithm. The local minimum problem is avoided, since the proposed method is a kind of full search algorithm. Six real-world sequences with a total of 148 pairs of successive images were used to verify our method by comparing its GMV errors and overlap similarity with those of FSA. The first comparison shows that the GMV differences between our method and FSA are less than 5 pixels in both the X and Y directions in most cases. The second comparison shows that the similarity of the overlapped region in two images using our GMVs is comparable to that using the GMVs of FSA.</p><p>In the future, we will focus on the challenging problems that make block matching difficult, such as illumination differences between two successive images and large disparities in the scene. These problems exist in real-world image sequences, especially in indoor scenes. 
We will aim to improve the algorithm to alleviate the problems and increase the accuracy.</p></sec><sec id="s7"><title>Acknowledgements</title><p>The authors would like to thank the Editor and the referee for their comments. This work was supported in part by the National Science Council, Taiwan, under Grant No. 98-2221-E-009-138.</p></sec><sec id="s8"><title>Cite this paper</title><p>Ren-You Huang,Lan-Rong Dung,Tang-Suan Hong, (2016) A Two-Stage Algorithm of High Resolution Image Alignment for Mobile Applications. Journal of Computer and Communications,04,36-51. doi: 10.4236/jcc.2016.44004</p></sec></body><back><ref-list><title>References</title><ref id="scirp.65093-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Milgram, D.L. (1975) Computer Methods for Creating Photomosaics. IEEE Transactions on Computers, 11, 1113-1119. http://dx.doi.org/10.1109/t-c.1975.224142</mixed-citation></ref><ref id="scirp.65093-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Davis, J. (1998) Mosaics of Scenes with Moving Objects. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Santa Barbara, 23-25 Jun 1998, 354-360. http://dx.doi.org/10.1109/cvpr.1998.698630</mixed-citation></ref><ref id="scirp.65093-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Agarwala, A., Dontcheva, M., Agrawala, M., Drucker, S., Colburn, A., Curless, B., Salesin, D. and Cohen, M. (2004) Interactive Digital Photomontage. ACM Transactions on Graphics (TOG), 23, 294-302. http://dx.doi.org/10.1145/1186562.1015718</mixed-citation></ref><ref id="scirp.65093-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Xiong, Y. and Pulli, K. (2010) Fast Panorama Stitching for High-Quality Panoramic Images on Mobile Phones. IEEE Transactions on Consumer Electronics, 56, 298-306. 
http://dx.doi.org/10.1109/tce.2010.5505931</mixed-citation></ref><ref id="scirp.65093-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Zomet, A., Levin, A., Peleg, S. and Weiss, Y. (2006) Seamless Image Stitching by Minimizing False Edges. IEEE Transactions on Image Processing, 15, 969-977. http://dx.doi.org/10.1109/tip.2005.863958</mixed-citation></ref><ref id="scirp.65093-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Jia, J. and Tang, C.-K. (2005) Eliminating Structure and Intensity Misalignment in Image Stitching. Tenth IEEE International Conference on Computer Vision (ICCV), 2, 1651-1658. http://dx.doi.org/10.1109/iccv.2005.87</mixed-citation></ref><ref id="scirp.65093-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Dufaux, F. and Konrad, J. (2000) Efficient, Robust, and Fast Global Motion Estimation for Video Coding. IEEE Transactions on Image Processing, 9, 497-501. http://dx.doi.org/10.1109/83.826785</mixed-citation></ref><ref id="scirp.65093-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Su, Y., Sun, M.-T. and Hsu, V. (2005) Global Motion Estimation from Coarsely Sampled Motion Vector Field and the Applications. IEEE Transactions on Circuits and Systems for Video Technology, 15, 232-242. http://dx.doi.org/10.1109/TCSVT.2004.841656</mixed-citation></ref><ref id="scirp.65093-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Tu, C., Tran, T.D., Prince, J.L. and Topiwala, P.N. (2000) Projection-Based Block-Matching Motion Estimation. International Symposium on Optical Science and Technology, 374-383.</mixed-citation></ref><ref id="scirp.65093-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Puglisi, G. and Battiato, S. (2011) A Robust Image Alignment Algorithm for Video Stabilization Purposes. 
IEEE Transactions on Circuits and Systems for Video Technology, 21, 1390-1400. http://dx.doi.org/10.1109/tcsvt.2011.2162689</mixed-citation></ref><ref id="scirp.65093-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Battiato, S., Bruna, A.R. and Puglisi, G. (2010) A Robust Block Based Image/Video Registration Approach for Mobile Image Devices. IEEE Transactions on Multimedia, 12, 622-635. http://dx.doi.org/10.1109/tmm.2010.2060474</mixed-citation></ref><ref id="scirp.65093-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Zhu, C., Qi, W.-S. and Ser, W. (2005) Predictive Fine Granularity Successive Elimination for Fast Optimal Block-Matching Motion Estimation. IEEE Transactions on Image Processing, 14, 213-221. http://dx.doi.org/10.1109/TIP.2004.840702</mixed-citation></ref><ref id="scirp.65093-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Koga, T. (1981) Motion-Compensated Interframe Coding for Video Conferencing. Proceedings of National Telecommunication Conference, New Orleans, 29 November-3 December 1981, G5.3.1-G5.3.5.</mixed-citation></ref><ref id="scirp.65093-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Li, R., Zeng, B. and Liou, M.L. (1994) A New Three-Step Search Algorithm for Block Motion Estimation. IEEE Transactions on Circuits and Systems for Video Technology, 4, 438-442. http://dx.doi.org/10.1109/76.313138</mixed-citation></ref><ref id="scirp.65093-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Po, L.-M. and Ma, W.-C. (1996) A Novel Four-Step Search Algorithm for Fast Block Motion Estimation. IEEE Transactions on Circuits and Systems for Video Technology, 6, 313-317. http://dx.doi.org/10.1109/76.499840</mixed-citation></ref><ref id="scirp.65093-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Zhu, S. and Ma, K.-K. 
(2000) A New Diamond Search Algorithm for Fast Block-Matching Motion Estimation. IEEE Transactions on Image Processing, 9, 287-290. http://dx.doi.org/10.1109/tip.2000.826791</mixed-citation></ref><ref id="scirp.65093-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Zhu, C., Lin, X. and Chau, L.-P. (2002) Hexagon-Based Search Pattern for Fast Block Motion Estimation. IEEE Transactions on Circuits and Systems for Video Technology, 12, 349-355. http://dx.doi.org/10.1109/TCSVT.2002.1003474</mixed-citation></ref><ref id="scirp.65093-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Liu, L.-K. and Feig, E. (1996) A Block-Based Gradient Descent Search Algorithm for Block Motion Estimation in Video Coding. IEEE Transactions on Circuits and Systems for Video Technology, 6, 419-422. http://dx.doi.org/10.1109/76.510936</mixed-citation></ref><ref id="scirp.65093-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Battiato, S., Gallo, G., Puglisi, G. and Scellato, S. (2007) SIFT Features Tracking for Video Stabilization. 14th International Conference on Image Analysis and Processing (ICIAP), Modena, 10-14 September 2007, 825-830. http://dx.doi.org/10.1109/iciap.2007.4362878</mixed-citation></ref><ref id="scirp.65093-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Yang, J., Schonfeld, D. and Mohamed, M. (2009) Robust Video Stabilization Based on Particle Filter Tracking of Projected Camera Motion. IEEE Transactions on Circuits and Systems for Video Technology, 19, 945-954. http://dx.doi.org/10.1109/TCSVT.2009.2020252</mixed-citation></ref><ref id="scirp.65093-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Farin, D. and de With, P.H.N. (2005) Evaluation of a Feature-Based Global-Motion Estimation System. 
Visual Communications and Image Processing, 12 July 2005 SPIE—The International Society for Optical Engineering, Beijing, 59603X-59603X-12.</mixed-citation></ref><ref id="scirp.65093-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Bosco, A., Bruna, A., Battiato, S., Bella, G. and Puglisi, G. (2008) Digital Video Stabilization through Curve Warping Techniques. IEEE Transactions on Consumer Electronics, 54, 220-224. http://dx.doi.org/10.1109/tce.2008.4560078</mixed-citation></ref><ref id="scirp.65093-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Fang, X., Luo, B., Zhao, H., Tang, J. and Zhai, S. (2010) New Multi-Resolution Image Stitching with Local and Global Alignment. IET Computer Vision, 4, 231-246. http://dx.doi.org/10.1049/iet-cvi.2009.0025</mixed-citation></ref><ref id="scirp.65093-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Bay, H., Ess, A., Tuytelaars, T. and Van Gool, L. (2008) Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding, 110, 346-359. http://dx.doi.org/10.1016/j.cviu.2007.09.014</mixed-citation></ref><ref id="scirp.65093-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Srinivasan, R. and Rao, K.R. (1985) Predictive Coding Based on Efficient Motion Estimation. IEEE Transactions on Communications, 33, 888-896.  http://dx.doi.org/10.1109/tcom.1985.1096398</mixed-citation></ref><ref id="scirp.65093-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Kandel, A. (1999) Introduction to Pattern Recognition: Statistical, Structural, Neural, and Fuzzy Logic Approaches. World Scientific, Singapore.</mixed-citation></ref><ref id="scirp.65093-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Gonzalez, R.C. and Woods, R.E. (2002) Digital Image Processing. 
2nd Edition, Prentice-Hall, Inc., Upper Saddle River.</mixed-citation></ref></ref-list></back></article>