<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN" "JATS-journalpublishing1-4.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.4" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">jtts</journal-id>
      <journal-title-group>
        <journal-title>Journal of Transportation Technologies</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2160-0481</issn>
      <issn pub-type="ppub">2160-0473</issn>
      <publisher>
        <publisher-name>Scientific Research Publishing</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.4236/jtts.2026.161009</article-id>
      <article-id pub-id-type="publisher-id">jtts-148667</article-id>
      <article-categories>
        <subj-group>
          <subject>Article</subject>
        </subj-group>
        <subj-group>
          <subject>Engineering</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Spatiotemporal Evolution Patterns and Intelligent Forecasting of Passenger Flow in Megacity High-Speed Rail Hubs: A Case Study of Guangzhou South Railway Station</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name name-style="western">
            <surname>Liu</surname>
            <given-names>Kangni</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="aff1"><label>1</label> Guangzhou Railway Polytechnic, Guangzhou, China </aff>
      <author-notes>
        <fn fn-type="conflict" id="fn-conflict">
          <p>The author declares no conflicts of interest regarding the publication of this paper.</p>
        </fn>
      </author-notes>
      <pub-date pub-type="epub">
        <day>26</day>
        <month>11</month>
        <year>2025</year>
      </pub-date>
      <pub-date pub-type="collection">
        <month>11</month>
        <year>2025</year>
      </pub-date>
      <volume>16</volume>
      <issue>01</issue>
      <fpage>142</fpage>
      <lpage>149</lpage>
      <history>
        <date date-type="received">
          <day>05</day>
          <month>12</month>
          <year>2025</year>
        </date>
        <date date-type="accepted">
          <day>06</day>
          <month>01</month>
          <year>2026</year>
        </date>
        <date date-type="published">
          <day>09</day>
          <month>01</month>
          <year>2026</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2026 by the authors and Scientific Research Publishing Inc.</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="open-access">
          <license-p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (<ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link>).</license-p>
        </license>
      </permissions>
      <self-uri content-type="doi" xlink:href="https://doi.org/10.4236/jtts.2026.161009">https://doi.org/10.4236/jtts.2026.161009</self-uri>
      <abstract>
        <p>Passenger flow in high-speed rail (HSR) hubs serves as a “barometer” for factor mobility within urban agglomerations, and its accurate forecasting is crucial for capacity allocation and emergency management. This paper focuses on two core aspects: passenger flow characterization and intelligent forecasting methodology. Taking Guangzhou South Railway Station (GSRS) as a typical case, it uses multi-source big data to mine the fine-grained spatiotemporal distribution patterns and structural characteristics of hub passenger flow. A hybrid VMD-CNN-BiLSTM-Attention-XGBoost forecasting model that integrates time series decomposition, deep learning, and ensemble learning is then constructed. The study finds that passenger flow exhibits a pattern of “dual peaks on weekdays for commuting and a single peak on weekends for leisure”, with the Shenzhen/Hong Kong SAR direction accounting for over 30% of the total. The hybrid model achieves markedly higher forecasting accuracy (MAPE = 3.76%) than the benchmark models. This research provides methodological and decision-making support for the transition of mega HSR hubs from “experience-based operation” to “data-driven” precise governance.</p>
      </abstract>
      <kwd-group kwd-group-type="author-generated" xml:lang="en">
        <kwd>High-Speed Rail Hub</kwd>
        <kwd>Passenger Flow Characteristics</kwd>
        <kwd>Spatiotemporal Patterns</kwd>
        <kwd>Variational Mode Decomposition (VMD)</kwd>
        <kwd>Hybrid Deep Learning Model</kwd>
        <kwd>Passenger Flow Forecasting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec1">
      <title>1. Introduction</title>
      <p>As a critical node in China’s “Eight Vertical and Eight Horizontal” HSR network and the core gateway of the Guangdong-Hong Kong SAR-Macao Greater Bay Area (GBA), Guangzhou South Railway Station (GSRS) serves over 350,000 passengers daily [<xref ref-type="bibr" rid="B1">1</xref>]. Minor fluctuations in its passenger flow can trigger significant ripple effects on urban transportation. Traditional passenger flow analysis often stops at aggregate totals or simple temporal statistics and fails to reveal the underlying spatiotemporal heterogeneity, purpose-based structure, and multi-factor driving mechanisms [<xref ref-type="bibr" rid="B2">2</xref>]. In forecasting, single time-series or regression models struggle to respond to multiple external shocks such as holidays, weather, and major events, which leads to forecast failures at critical junctures [<xref ref-type="bibr" rid="B3">3</xref>].</p>
      <p>Therefore, answering the following two core scientific questions is of urgent importance for enhancing hub operational resilience: First, what refined and quantifiable regular patterns do passenger flows in mega HSR hubs exhibit across spatiotemporal dimensions? Second, how can a high-accuracy forecasting model be constructed that simultaneously captures the intrinsic temporal patterns of passenger flow and external complex factors? This paper aims to systematically address these questions through multi-dimensional, long-time-series data analysis and modeling of GSRS, forming a replicable and scalable analytical and forecasting framework.</p>
    </sec>
    <sec id="sec2">
      <title>2. Research Framework and Methodology</title>
      <p>This research follows the logical sequence of “Characterization Analysis-Pattern Mining-Model Construction-Forecasting Application”. First, multi-source data covering the period from January 2021 to December 2023 are used, including HSR ticket data (daily granularity), metro AFC data (hourly granularity), and urban Points of Interest (POI) data [<xref ref-type="bibr" rid="B2">2</xref>]. These datasets were anonymized, temporally aligned, and fused with a rule-based matching approach on temporal and spatial keys. Second, methods such as spatiotemporal heatmaps, cluster analysis, and OD linkage strength models are employed to deconstruct passenger flow characteristics along three dimensions: time, space, and structure. Finally, to address the nonlinear and non-stationary nature of passenger flow series, a VMD-CNN-BiLSTM-Attention-XGBoost hybrid forecasting model is constructed and compared against benchmark models such as ARIMA, Prophet, and single LSTM/XGBoost [<xref ref-type="bibr" rid="B4">4</xref>]-[<xref ref-type="bibr" rid="B6">6</xref>].</p>
    </sec>
    <sec id="sec3">
      <title>3. Multi-Dimensional Characterization of Passenger Flow at Guangzhou South Railway Station</title>
      <sec id="sec3dot1">
        <title>3.1. Temporal Distribution: Multi-Level Periodicity and Shock Effects</title>
        <p>Analysis reveals a stable “three-level periodicity” structure in the temporal distribution of passenger flow at GSRS [<xref ref-type="bibr" rid="B1">1</xref>], with the holiday and event peaks of the annual cycle quantified in <bold>Table 1</bold>:</p>
        <p><bold>Daily Cycle</bold>: Exhibits a distinct pattern of “dual peaks on weekdays, single peak on weekends”. The weekday morning peak (8:00-10:00) is dominated by commuters on the Guangzhou-Shenzhen and Guangzhou-Zhuhai corridors, while the evening peak (18:00-20:00) combines arriving and departing flows. Weekend peaks are more evenly distributed between 13:00 and 18:00.</p>
        <p><bold>Weekly Cycle</bold>: Passenger volume climbs from Monday, reaches its weekday peak on Thursday and Friday, and hits the weekly maximum on Saturday, approximately 25% higher than the average weekday volume.</p>
        <p><bold>Annual Cycle</bold>: Characterized by four major peak periods: the “Spring Festival Extreme Peak”, “Summer Transport Sub-peak”, “Short Holiday Pulse Peaks”, and the “Canton Fair Peak” [<xref ref-type="bibr" rid="B1">1</xref>].</p>
        <p><bold>Table 1.</bold> Passenger flow impact indices for key holidays at GSRS (2023).</p>
        <table-wrap id="tbl1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th>Holiday/Event</th>
                <th>Peak Daily Passenger Volume (10,000 persons)</th>
                <th>Increase vs. Normal Weekday</th>
                <th>Passenger Flow Impact Index*</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>National Day (Oct 1)</td>
                <td>55.3</td>
                <td>70.2%</td>
                <td>1.70</td>
              </tr>
              <tr>
                <td>Spring Festival Travel (28<sup>th</sup> day of the 12<sup>th</sup> lunar month)</td>
                <td>53.8</td>
                <td>65.5%</td>
                <td>1.66</td>
              </tr>
              <tr>
                <td>Canton Fair (Phase I Opening Day)</td>
                <td>46.2</td>
                <td>42.1%</td>
                <td>1.42</td>
              </tr>
              <tr>
                <td>Labor Day (May 1)</td>
                <td>49.1</td>
                <td>51.0%</td>
                <td>1.51</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>*Note: Impact Index = Peak Daily Volume / Monthly Average Volume.</p>
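        <p>As a rough arithmetic check of the index definition in the table note, the following sketch inverts the formula to recover the baseline volume implied by each row of Table 1. The function names are illustrative only and are not part of the study; the three numbers per row are copied from the table.</p>
        <code language="python">
```python
# Rows copied from Table 1: (label, peak daily volume in 10,000 persons, impact index).
TABLE_1 = [
    ("National Day", 55.3, 1.70),
    ("Spring Festival Travel", 53.8, 1.66),
    ("Canton Fair", 46.2, 1.42),
    ("Labor Day", 49.1, 1.51),
]


def impact_index(peak_volume, baseline_volume):
    """Impact index as defined in the table note: peak divided by baseline."""
    return peak_volume / baseline_volume


def implied_baseline(peak_volume, index):
    """Invert the definition to recover the baseline that a reported index implies."""
    return peak_volume / index


# Baseline (10,000 persons) implied by each row, rounded to one decimal place.
baselines = {label: round(implied_baseline(peak, idx), 1) for label, peak, idx in TABLE_1}
```
        </code>
        <p>Under this reading, all four rows imply a baseline of about 32.4 to 32.5 (10,000 persons), so the reported indices are mutually consistent with a single reference volume.</p>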
      </sec>
      <sec id="sec3dot2">
        <title>3.2. Spatial Distribution: Directional Agglomeration and Transfer Choice</title>
        <p>Spatial distribution shows strong “directional agglomeration” and “transfer dependency”, as shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p>
        <p>OD Agglomeration: The Shenzhen/Hong Kong SAR direction accounts for the highest share (31.3%). Together with Zhuhai/Macao (19.0%) and Changsha/Wuhan (18.2%), it makes up the top three dominant flows, which constitute nearly 70% of the total and form a stable “GBA Core Corridor” [<xref ref-type="bibr" rid="B2">2</xref>].</p>
        <fig id="fig1">
          <label>Figure 1</label>
          <graphic xlink:href="https://html.scirp.org/file/3501039-rId13.jpeg?20260109101036" />
        </fig>
        <p><bold>Figure 1.</bold> Spatiotemporal heatmap of passenger flow (Left) and major OD flow diagram (Right).</p>
        <p>Transfer Structure: Metro is the overwhelmingly dominant transfer mode, with a share of 65.2%. Notably, the combined share of taxis and ride-hailing services can surge to over 35% during late-night hours (after 22:00) and under adverse weather conditions, demonstrating significant “spatiotemporal elasticity”.</p>
      </sec>
      <sec id="sec3dot3">
        <title>3.3. Passenger Composition: Purpose Segmentation and Passenger Profiling</title>
        <p>Passenger categorization was performed using a combination of K-means clustering (applied to features such as booking lead time, travel frequency, ticket class, and temporal patterns) and rule-based classification (e.g., same-day return trips on weekdays classified as business travel).</p>
        <p>Through analysis of ticket class, booking lead time, and POI correlation, passengers are profiled in detail [<xref ref-type="bibr" rid="B2">2</xref>]:</p>
        <p><bold>Business Travelers (42.3%)</bold>: High-frequency travelers on the Guangzhou-Shenzhen and Guangzhou-Zhuhai corridors, concentrated on weekdays, highly sensitive to departure/arrival times, and with the lowest tolerance for transfer time.</p>
        <p><bold>Tourists (28.7%)</bold>: Concentrated on weekends, holidays, and summer travel periods, often carrying luggage, showing greater concern for wayfinding signage and rest facilities within the hub.</p>
        <p><bold>Commuters (8.9%)</bold>: Exhibit stable “tidal” characteristics and are a primary source of pressure on metro systems during peak hours.</p>
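        <p>The combination of clustering and rules described at the start of this subsection can be sketched as follows. This is a minimal illustration, not the study's implementation: the two toy features (booking lead time in days, trips per month), the sample values, and the function names are assumptions, and real features would normally be standardized before clustering.</p>
        <code language="python">
```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def rule_label(is_weekday, same_day_return):
    """Rule quoted in the text: a same-day return trip on a weekday is business travel."""
    return "business" if is_weekday and same_day_return else "unclassified"


# Hypothetical toy data: (booking lead time in days, trips per month).
passengers = [(1.0, 12.0), (14.0, 1.0), (2.0, 10.0), (20.0, 1.0), (1.5, 11.0), (16.0, 2.0)]
labels, centroids = kmeans(passengers, k=2)
```
        </code>
        <p>On this toy data the frequent short-lead-time travelers and the occasional long-lead-time travelers fall into separate clusters; the rule-based label can then confirm or override the cluster assignment for individual trips.</p>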
      </sec>
    </sec>
    <sec id="sec4">
      <title>4. A Hybrid Deep Learning Forecasting Model Based on VMD-CNN-BiLSTM-Attention and XGBoost</title>
      <sec id="sec4dot1">
        <title>4.1. Overall Model Architecture and Core Innovations</title>
        <p>To address the highly nonlinear, non-stationary nature of HSR hub passenger flow series and the complex influence of multiple external factors on them [<xref ref-type="bibr" rid="B3">3</xref>], this study proposes a combined forecasting framework that integrates hybrid deep learning and ensemble learning. The architecture is designed so that each component handles a different aspect of the forecasting challenge in sequence: VMD handles non-stationarity and multi-scale patterns; CNN extracts local spatial-temporal features; BiLSTM captures long-term bidirectional dependencies; Attention focuses on relevant historical periods; and XGBoost integrates the deep features with external variables for robust ensemble learning. The core innovation lies in its three-stage architecture of Decomposition-Reconstruction-Fusion.</p>
        <p>1) Signal Decomposition Layer: Employs Variational Mode Decomposition (VMD) to adaptively decompose the original passenger flow series into a set of quasi-stationary sub-series [<xref ref-type="bibr" rid="B7">7</xref>].</p>
        <p>2) Deep Forecasting Layer: For each decomposed sub-series, a CNN-BiLSTM-Attention neural network is designed for deep feature extraction and forecasting [<xref ref-type="bibr" rid="B6">6</xref>][<xref ref-type="bibr" rid="B8">8</xref>].</p>
        <p>3) Ensemble Output Layer: The deep forecasting outputs are combined with external features and fed into an XGBoost model for nonlinear ensemble and residual correction [<xref ref-type="bibr" rid="B4">4</xref>].</p>
        <p>The mathematical representation of the model is:</p>
        <disp-formula id="FD1">
          <mml:math>
            <mml:mrow>
              <mml:msub>
                <mml:mover accent="true">
                  <mml:mi>Y</mml:mi>
                  <mml:mo>^</mml:mo>
                </mml:mover>
                <mml:mi>t</mml:mi>
              </mml:msub>
              <mml:mo>=</mml:mo>
              <mml:msub>
                <mml:mi>F</mml:mi>
                <mml:mtext>XGBoost</mml:mtext>
              </mml:msub>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:msub>
                    <mml:mrow>
                      <mml:mo>[</mml:mo>
                      <mml:msup>
                        <mml:mtext>IMF</mml:mtext>
                        <mml:mn>1</mml:mn>
                      </mml:msup>
                      <mml:mo>,</mml:mo>
                      <mml:mo>⋯</mml:mo>
                      <mml:mo>,</mml:mo>
                      <mml:msup>
                        <mml:mtext>IMF</mml:mtext>
                        <mml:mi>K</mml:mi>
                      </mml:msup>
                      <mml:mo>]</mml:mo>
                    </mml:mrow>
                    <mml:mi>t</mml:mi>
                  </mml:msub>
                  <mml:mo>,</mml:mo>
                  <mml:msub>
                    <mml:mstyle mathvariant="bold" mathsize="normal">
                      <mml:mi>E</mml:mi>
                    </mml:mstyle>
                    <mml:mi>t</mml:mi>
                  </mml:msub>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:mo>+</mml:mo>
              <mml:msub>
                <mml:mi>ϵ</mml:mi>
                <mml:mi>t</mml:mi>
              </mml:msub>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>where <inline-formula><mml:math><mml:msup><mml:mtext>IMF</mml:mtext><mml:mi>k</mml:mi></mml:msup></mml:math></inline-formula> (<italic>k</italic> = 1, …, <italic>K</italic>) is the <italic>k</italic>-th modal component forecasted by the deep sub-network, <inline-formula><mml:math><mml:mrow><mml:msub><mml:mstyle mathvariant="bold" mathsize="normal"><mml:mi>E</mml:mi></mml:mstyle><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the vector of external features at time <italic>t</italic>, and <inline-formula><mml:math><mml:msub><mml:mi>ϵ</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> is the residual error term.</p>
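        <p>To make the input of the ensemble stage concrete, the sketch below assembles one input row for a single time step by concatenating the per-component forecasts with the external feature vector. The numeric values, the layout of the external features, and the helper name are hypothetical; only the concatenation itself follows from the formula above.</p>
        <code language="python">
```python
def build_ensemble_row(imf_forecasts, external_features):
    """Concatenate the K per-component forecasts with the external feature vector E_t."""
    return list(imf_forecasts) + list(external_features)


# Hypothetical forecasts of K = 6 components at one time step t.
imf_hat_t = [0.4, -1.2, 3.1, 0.8, 5.0, 32.5]
# Assumed layout of E_t: holiday flag, Canton Fair flag, temperature.
e_t = [1.0, 0.0, 27.5]

row_t = build_ensemble_row(imf_hat_t, e_t)

# Summing the component forecasts gives a purely additive reconstruction,
# which the ensemble stage is meant to refine using the external features.
additive_baseline = sum(imf_hat_t)
```
        </code>
        <p>Each such row, paired with the observed passenger volume at the same time step, would form one training sample for the ensemble stage.</p>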
      </sec>
      <sec id="sec4dot2">
        <title>4.2. Stage 1: Passenger Flow Series Decomposition via VMD</title>
        <p>VMD decomposes the original signal <italic>f</italic>(<italic>t</italic>) into <italic>K</italic> band-limited Intrinsic Mode Functions (IMFs) <inline-formula><mml:math><mml:mrow><mml:msub><mml:mi> u </mml:mi><mml:mi> k </mml:mi></mml:msub><mml:mrow><mml:mo> ( </mml:mo><mml:mi> t </mml:mi><mml:mo> ) </mml:mo></mml:mrow></mml:mrow></mml:math></inline-formula> by solving a constrained variational problem [<xref ref-type="bibr" rid="B8">8</xref>]:</p>
        <disp-formula id="FD2">
          <mml:math>
            <mml:mrow>
              <mml:munder>
                <mml:mrow>
                  <mml:mi>min</mml:mi>
                </mml:mrow>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mo>{</mml:mo>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>u</mml:mi>
                        <mml:mi>k</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                    <mml:mo>}</mml:mo>
                  </mml:mrow>
                  <mml:mo>,</mml:mo>
                  <mml:mrow>
                    <mml:mo>{</mml:mo>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>ω</mml:mi>
                        <mml:mi>k</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                    <mml:mo>}</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:munder>
              <mml:mrow>
                <mml:mo>{</mml:mo>
                <mml:mrow>
                  <mml:munderover>
                    <mml:mstyle mathsize="140%" displaystyle="true">
                      <mml:mo>∑</mml:mo>
                    </mml:mstyle>
                    <mml:mrow>
                      <mml:mi>k</mml:mi>
                      <mml:mo>=</mml:mo>
                      <mml:mn>1</mml:mn>
                    </mml:mrow>
                    <mml:mi>K</mml:mi>
                  </mml:munderover>
                  <mml:msubsup>
                    <mml:mrow>
                      <mml:mrow>
                        <mml:mo>‖</mml:mo>
                        <mml:mrow>
                          <mml:msub>
                            <mml:mo>∂</mml:mo>
                            <mml:mi>t</mml:mi>
                          </mml:msub>
                          <mml:mrow>
                            <mml:mo>[</mml:mo>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:mo>(</mml:mo>
                                <mml:mrow>
                                  <mml:mi>δ</mml:mi>
                                  <mml:mrow>
                                    <mml:mo>(</mml:mo>
                                    <mml:mi>t</mml:mi>
                                    <mml:mo>)</mml:mo>
                                  </mml:mrow>
                                  <mml:mo>+</mml:mo>
                                  <mml:mfrac>
                                    <mml:mi>j</mml:mi>
                                    <mml:mrow>
                                      <mml:mi>π</mml:mi>
                                      <mml:mi>t</mml:mi>
                                    </mml:mrow>
                                  </mml:mfrac>
                                </mml:mrow>
                                <mml:mo>)</mml:mo>
                              </mml:mrow>
                              <mml:mo>∗</mml:mo>
                              <mml:msub>
                                <mml:mi>u</mml:mi>
                                <mml:mi>k</mml:mi>
                              </mml:msub>
                              <mml:mrow>
                                <mml:mo>(</mml:mo>
                                <mml:mi>t</mml:mi>
                                <mml:mo>)</mml:mo>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo>]</mml:mo>
                          </mml:mrow>
                          <mml:msup>
                            <mml:mtext>e</mml:mtext>
                            <mml:mrow>
                              <mml:mo>−</mml:mo>
                              <mml:mi>j</mml:mi>
                              <mml:msub>
                                <mml:mi>ω</mml:mi>
                                <mml:mi>k</mml:mi>
                              </mml:msub>
                              <mml:mi>t</mml:mi>
                            </mml:mrow>
                          </mml:msup>
                        </mml:mrow>
                        <mml:mo>‖</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mn>2</mml:mn>
                    <mml:mn>2</mml:mn>
                  </mml:msubsup>
                </mml:mrow>
                <mml:mo>}</mml:mo>
              </mml:mrow>
              <mml:mo>,</mml:mo>
              <mml:mspace width="1em" />
              <mml:mtext>s.t.</mml:mtext>
              <mml:mspace width="1em" />
              <mml:munderover>
                <mml:mstyle mathsize="140%" displaystyle="true">
                  <mml:mo>∑</mml:mo>
                </mml:mstyle>
                <mml:mrow>
                  <mml:mi>k</mml:mi>
                  <mml:mo>=</mml:mo>
                  <mml:mn>1</mml:mn>
                </mml:mrow>
                <mml:mi>K</mml:mi>
              </mml:munderover>
              <mml:msub>
                <mml:mi>u</mml:mi>
                <mml:mi>k</mml:mi>
              </mml:msub>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mi>t</mml:mi>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:mo>=</mml:mo>
              <mml:mi>f</mml:mi>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mi>t</mml:mi>
                <mml:mo>)</mml:mo>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>The decomposition number <italic>K</italic> was determined through a combination of spectral analysis and trial evaluation on a validation set. <italic>K</italic> = 6 was selected as it clearly separated the series into interpretable components: ultra-high-frequency noise, daily periodicity, weekly periodicity, holiday/monthly periodicity, seasonal trend, and long-term trend.</p>
      </sec>
      <sec id="sec4dot3">
        <title>4.3. Stage 2: CNN-BiLSTM-Attention Forecasting Sub-Network for Each IMF Component</title>
        <p>Each IMF component is input into an independent, identically structured deep neural network consisting of three layers:</p>
        <p>1) 1D Convolutional Layer (1D-CNN)</p>
        <p>Extracts local fluctuation patterns:</p>
        <disp-formula id="FD3">
          <mml:math>
            <mml:mrow>
              <mml:msub>
                <mml:mi>h</mml:mi>
                <mml:mtext>conv</mml:mtext>
              </mml:msub>
              <mml:mo>=</mml:mo>
              <mml:mtext>ReLU</mml:mtext>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>W</mml:mi>
                    <mml:mtext>conv</mml:mtext>
                  </mml:msub>
                  <mml:mo>∗</mml:mo>
                  <mml:msub>
                    <mml:mi>x</mml:mi>
                    <mml:mtext>window</mml:mtext>
                  </mml:msub>
                  <mml:mo>+</mml:mo>
                  <mml:msub>
                    <mml:mi>b</mml:mi>
                    <mml:mtext>conv</mml:mtext>
                  </mml:msub>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>2) Bidirectional Long Short-Term Memory Layer (BiLSTM)</p>
        <p>Comprehensively learns the dynamic evolution of each IMF within its full context:</p>
        <disp-formula id="FD4">
          <mml:math>
            <mml:mrow>
              <mml:msub>
                <mml:mi>H</mml:mi>
                <mml:mi>t</mml:mi>
              </mml:msub>
              <mml:mo>=</mml:mo>
              <mml:mrow>
                <mml:mo>[</mml:mo>
                <mml:mrow>
                  <mml:msub>
                    <mml:mover accent="true">
                      <mml:mi>h</mml:mi>
                      <mml:mo>→</mml:mo>
                    </mml:mover>
                    <mml:mi>t</mml:mi>
                  </mml:msub>
                  <mml:mo>;</mml:mo>
                  <mml:msub>
                    <mml:mover accent="true">
                      <mml:mi>h</mml:mi>
                      <mml:mo>←</mml:mo>
                    </mml:mover>
                    <mml:mi>t</mml:mi>
                  </mml:msub>
                </mml:mrow>
                <mml:mo>]</mml:mo>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>3) Attention Mechanism Layer</p>
        <p>Dynamically weights historical time steps, allowing the model to focus on key historical periods relevant to the current forecast [<xref ref-type="bibr" rid="B8">8</xref>]: <inline-formula><mml:math><mml:mrow><mml:mi> C </mml:mi><mml:mo> = </mml:mo><mml:mstyle displaystyle="true"><mml:msub><mml:mo> ∑ </mml:mo><mml:mi> t </mml:mi></mml:msub><mml:mrow><mml:msub><mml:mi> α </mml:mi><mml:mi> t </mml:mi></mml:msub><mml:msub><mml:mi> H </mml:mi><mml:mi> t </mml:mi></mml:msub></mml:mrow></mml:mstyle></mml:mrow></mml:math></inline-formula>, where <italic>α<sub>t</sub></italic> is the attention weight assigned to time step <italic>t</italic>. The context vector <italic>C</italic> serves as the final deep feature representation for forecasting that IMF component.</p>
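        <p>The three operations of the sub-network can be illustrated with a small pure-Python sketch: a valid 1D convolution followed by ReLU, the concatenation of forward and backward hidden states, and attention pooling into a context vector. The recurrent LSTM cells themselves are not implemented here, and the attention scores are taken as given because the text does not specify the scoring function; all function names are illustrative assumptions.</p>
        <code language="python">
```python
from math import exp


def conv1d_relu(window, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation form) followed by ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(window) - width + 1):
        z = sum(w * x for w, x in zip(kernel, window[start:start + width])) + bias
        out.append(max(0.0, z))
    return out


def concat_bidirectional(forward_states, backward_states):
    """H_t = [forward h_t ; backward h_t] for every time step t."""
    return [f + b for f, b in zip(forward_states, backward_states)]


def softmax(scores):
    """Numerically stable softmax producing the attention weights alpha_t."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(hidden_states, scores):
    """Context vector C = sum over t of alpha_t * H_t."""
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]


# Toy inputs (hypothetical): a difference kernel highlights local drops.
features = conv1d_relu([1.0, 3.0, 2.0, 5.0], [1.0, -1.0])
states = concat_bidirectional([[1.0], [0.0]], [[0.0], [1.0]])
context = attention_context(states, [0.0, 0.0])
```
        </code>
        <p>With equal scores the attention weights are uniform, so the context vector reduces to the plain average of the hidden states; unequal scores shift the weight toward the time steps the model considers most relevant.</p>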
      </sec>
      <sec id="sec4dot4">
        <title>4.4. Stage 3: Multi-Source Feature Fusion and XGBoost Ensemble</title>
        <p>This stage aims to fuse the intrinsic temporal patterns extracted by the deep network with rich external influencing factors.</p>
        <p>1) External Feature Engineering: A feature pool containing over 30 features across five categories is constructed (temporal, historical, economic/event, weather, competing transport). Examples include: binary indicators for public holidays and Canton Fair periods; temperature, precipitation, and visibility as weather variables; and average ticket prices for competing transport modes (e.g., flights, coaches).</p>
        <p>2) XGBoost Nonlinear Ensemble: The forecasted values of all IMF components are concatenated with the external feature vector as input to XGBoost. XGBoost learns the complex nonlinear mapping between these features and the final passenger flow target, performing residual correction [<xref ref-type="bibr" rid="B4">4</xref>]. Its objective function is:</p>
        <disp-formula id="FD5">
          <mml:math>
            <mml:mrow>
              <mml:msup>
                <mml:mi>L</mml:mi>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mi>t</mml:mi>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:msup>
              <mml:mo>=</mml:mo>
              <mml:munderover>
                <mml:mstyle displaystyle="true" mathsize="140%">
                  <mml:mo>∑</mml:mo>
                </mml:mstyle>
                <mml:mrow>
                  <mml:mi>i</mml:mi>
                  <mml:mo>=</mml:mo>
                  <mml:mn>1</mml:mn>
                </mml:mrow>
                <mml:mi>n</mml:mi>
              </mml:munderover>
              <mml:mtext>
                 
              </mml:mtext>
              <mml:mi>l</mml:mi>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>y</mml:mi>
                    <mml:mi>i</mml:mi>
                  </mml:msub>
                  <mml:mo>,</mml:mo>
                  <mml:msubsup>
                    <mml:mover accent="true">
                      <mml:mi>y</mml:mi>
                      <mml:mo>^</mml:mo>
                    </mml:mover>
                    <mml:mi>i</mml:mi>
                    <mml:mrow>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mrow>
                          <mml:mi>t</mml:mi>
                          <mml:mo>−</mml:mo>
                          <mml:mn>1</mml:mn>
                        </mml:mrow>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                  </mml:msubsup>
                  <mml:mo>+</mml:mo>
                  <mml:msub>
                    <mml:mi>f</mml:mi>
                    <mml:mi>t</mml:mi>
                  </mml:msub>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>x</mml:mi>
                        <mml:mi>i</mml:mi>
                      </mml:msub>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:mo>+</mml:mo>
              <mml:mi>Ω</mml:mi>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:msub>
                    <mml:mi>f</mml:mi>
                    <mml:mi>t</mml:mi>
                  </mml:msub>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
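        <p>For reference, the terms of this objective can be unpacked following the standard formulation of Chen and Guestrin [<xref ref-type="bibr" rid="B4">4</xref>]. Here l is a differentiable loss between the observed value and the prediction, the superscript (t − 1) marks the prediction accumulated over the first t − 1 trees, and f with subscript t is the tree added at iteration t. The symbols T (number of leaves), w (vector of leaf weights), γ and λ (penalty coefficients) are not used elsewhere in this paper and are introduced here only for illustration. The values of γ and λ used in this study are not reported. Under that formulation the complexity penalty of a single tree is:</p>
        <disp-formula>
          <tex-math><![CDATA[\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda\,\lVert w \rVert^{2}]]></tex-math>
        </disp-formula>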
      </sec>
      <sec id="sec4dot5">
        <title>4.5. Model Performance Evaluation and Comparative Analysis</title>
        <p>The dataset was split chronologically into training (Jan 2021-Jun 2023), validation (Jul 2023-Sep 2023), and test (Oct 2023-Dec 2023) sets. Missing values were forward-filled, and outliers beyond three standard deviations were winsorized. Model hyperparameters and their tuning methods are listed in <bold>Table 2</bold>. On the held-out test set, the proposed model outperforms all benchmark models on every reported metric [<xref ref-type="bibr" rid="B5">5</xref>], as shown in <bold>Table 3</bold>.</p>
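        <p>The preprocessing and chronological split described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the use of the population standard deviation is an assumption, and in practice the clipping bounds would probably be estimated on the training window only to avoid leaking test-set information.</p>
        <code language="python" xml:space="preserve"><![CDATA[
```python
from datetime import date
from statistics import mean, pstdev


def forward_fill(values):
    """Replace each None with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # remains None if the series starts with a gap
    return filled


def winsorize_3sigma(values, k=3.0):
    """Clip values to mean +/- k standard deviations (population std assumed)."""
    mu, sigma = mean(values), pstdev(values)
    lo, hi = mu - k * sigma, mu + k * sigma
    return [min(max(v, lo), hi) for v in values]


def assign_split(day):
    """Map a date to the train/validation/test windows stated in the text."""
    if date(2021, 1, 1) <= day <= date(2023, 6, 30):
        return "train"
    if date(2023, 7, 1) <= day <= date(2023, 9, 30):
        return "validation"
    if date(2023, 10, 1) <= day <= date(2023, 12, 31):
        return "test"
    return None  # outside the study period
```
]]></code>
        <p>Splitting by date rather than at random keeps the evaluation faithful to a forecasting setting, since the model never sees observations later than those it is asked to predict.</p>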
        <p><bold>Table 2.</bold> Model hyperparameters and tuning strategy.</p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <table>
            <tbody>
              <tr>
                <td>
                  <bold>Component</bold>
                </td>
                <td>
                  <bold>Hyperparameter</bold>
                </td>
                <td>
                  <bold>Value/Tuning Method</bold>
                </td>
              </tr>
              <tr>
                <td>VMD</td>
                <td>K (modes)</td>
                <td>6 (spectral validation)</td>
              </tr>
              <tr>
                <td>CNN</td>
                <td>Filters</td>
                <td>64 (grid search)</td>
              </tr>
              <tr>
                <td>BiLSTM</td>
                <td>Units</td>
                <td>128 (grid search)</td>
              </tr>
              <tr>
                <td>Attention</td>
                <td>Mechanism</td>
                <td>Bahdanau (fixed)</td>
              </tr>
              <tr>
                <td>XGBoost</td>
                <td>Learning rate</td>
                <td>0.05 (Bayesian optimization)</td>
              </tr>
              <tr>
                <td>
                </td>
                <td>Max depth</td>
                <td>8</td>
              </tr>
              <tr>
                <td>
                </td>
                <td>N estimators</td>
                <td>300</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p><bold>Table 3.</bold> Comprehensive comparison of model forecasting performance (test set results).</p>
        <table-wrap id="tbl3">
          <label>Table 3</label>
          <table>
            <tbody>
              <tr>
                <td>
                  <bold>Model</bold>
                </td>
                <td>
                  <bold>MAPE (%)</bold>
                </td>
                <td>
                  <bold>RMSE (10k)</bold>
                </td>
                <td>
                  <bold>MAE (10k)</bold>
                </td>
                <td>
                  <bold>R<sup>2</sup></bold>
                </td>
                <td>
                  <bold>Peak Forecast Error (%)</bold>
                </td>
              </tr>
              <tr>
                <td>ARIMA</td>
                <td>8.72</td>
                <td>2.89</td>
                <td>2.21</td>
                <td>0.891</td>
                <td>18.5</td>
              </tr>
              <tr>
                <td>Prophet</td>
                <td>7.15</td>
                <td>2.45</td>
                <td>1.92</td>
                <td>0.922</td>
                <td>15.2</td>
              </tr>
              <tr>
                <td>LSTM</td>
                <td>6.83</td>
                <td>2.31</td>
                <td>1.85</td>
                <td>0.930</td>
                <td>13.8</td>
              </tr>
              <tr>
                <td>XGBoost</td>
                <td>5.94</td>
                <td>2.05</td>
                <td>1.67</td>
                <td>0.945</td>
                <td>11.3</td>
              </tr>
              <tr>
                <td>VMD-LSTM</td>
                <td>5.12</td>
                <td>1.78</td>
                <td>1.45</td>
                <td>0.958</td>
                <td>9.7</td>
              </tr>
              <tr>
                <td>VMD-CNN-BiLSTM-Attention-XGBoost (Ours)</td>
                <td>3.76</td>
                <td>1.32</td>
                <td>1.08</td>
                <td>0.977</td>
                <td>6.4</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Accuracy Improvement: Compared with the second-best model (VMD-LSTM), our model lowers MAPE from 5.12% to 3.76%, a relative reduction of 26.6%.</p>
        <p>Peak Forecasting Capability: For the National Day holiday peak contained in the test set, our model's forecast error for the single-day maximum passenger volume is 6.4%, compared with 9.7% for VMD-LSTM and 18.5% for ARIMA, indicating that it tracks extreme values more closely than the benchmarks.</p>
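        <p>For readers who wish to reproduce the comparison, the metrics reported in Table 3 can be computed as in the sketch below. The function names are hypothetical, and the definition of the peak forecast error (the percentage error on the day with the largest observed volume) is an assumption inferred from the description above rather than a formula given in the paper.</p>
        <code language="python" xml:space="preserve"><![CDATA[
```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def peak_error(y_true, y_pred):
    """Percentage error on the day with the largest observed volume (assumed definition)."""
    i = max(range(len(y_true)), key=lambda j: y_true[j])
    return 100.0 * abs(y_true[i] - y_pred[i]) / y_true[i]


def relative_reduction(baseline, improved):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - improved) / baseline


# MAPE values taken from Table 3: VMD-LSTM 5.12, proposed model 3.76
print(round(relative_reduction(5.12, 3.76), 1))  # 26.6
```
]]></code>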
      </sec>
    </sec>
    <sec id="sec5">
      <title>5. Forecasting Application: Insights for Passenger Flow Trends and Operational Implications</title>
      <p>Based on the model, forecasts for GSRS passenger flow in 2024 are generated, yielding the following management insights:</p>
      <p>Trend Forecast: The average daily passenger volume in 2024 is projected to reach 382,000, representing a year-on-year growth of approximately 9% [<xref ref-type="bibr" rid="B1">1</xref>]. Growth drivers primarily stem from the deepening integration of the Shenzhen-Hong Kong SAR innovation corridor and the full recovery of the summer tourism market.</p>
      <p>Pressure Warning: The model forecasts that single-day passenger volume during the 2024 National Day holiday may exceed 580,000. It is recommended to activate the “Large Passenger Flow Contingency Plan” in advance.</p>
      <p>Precision Scheduling Suggestions: Based on time-sliced forecast results, the timetable for metro shuttle trains can be dynamically adjusted. For instance, morning peak shuttle trains should precisely correspond to Guangzhou-Shenzhen intercity train arrivals between 7:45 and 8:30, achieving “train arrival-departure synchronization” and reducing average transfer waiting time from 8 minutes to under 5 minutes [<xref ref-type="bibr" rid="B9">9</xref>].</p>
      <p>Practical Implementation Considerations: Deploying such a forecasting system in practice would require reliable real-time data feeds (e.g., from ticket systems, weather APIs), adequate computational resources for model retraining, and staff trained in data science and operational analytics. Collaboration between transport authorities and technical teams would be essential for sustainable implementation.</p>
    </sec>
    <sec id="sec6">
      <title>6. Conclusions and Future Work</title>
      <p>Through an in-depth analysis of big data on passenger flow at GSRS, this study clearly delineates the multi-dimensional patterns of passenger flow in mega HSR hubs across time, space, and structure, and constructs a high-performance intelligent forecasting model. The main conclusions are as follows:</p>
      <p>1) Hub passenger flow exhibits stable multi-level periodicity and significant directional agglomeration, with business and tourist flows forming the main body and displaying distinct behavioral patterns [<xref ref-type="bibr" rid="B1">1</xref>][<xref ref-type="bibr" rid="B2">2</xref>].</p>
      <p>2) The proposed VMD-CNN-BiLSTM-Attention-XGBoost hybrid forecasting model effectively handles the complex characteristics of passenger flow series, achieving a test-set MAPE of 3.76% versus 8.72% for ARIMA and 5.12% for VMD-LSTM [<xref ref-type="bibr" rid="B7">7</xref>], which demonstrates its practical application value.</p>
      <p>3) Proactive operational scheduling based on high-accuracy forecasts is key to alleviating hub congestion and improving service quality [<xref ref-type="bibr" rid="B9">9</xref>].</p>
      <p>Future research can expand in two directions. First, real-time GPS, mobile phone signaling, and other finer-grained data can be incorporated to track and forecast the microscopic movement trajectories of passengers within the hub [<xref ref-type="bibr" rid="B2">2</xref>]. Second, the forecasting model can be integrated with a digital twin platform to develop a visual, interactive passenger flow simulation and decision support system, advancing HSR hub operation towards genuine “smart” management.</p>
    </sec>
    <sec id="sec7">
      <title>Funding</title>
      <p>1) National Natural Science Foundation of China: 2022 Guangdong Provincial Key Platform and Research Project “Innovation Platform for Integration of Industry and Education in Guangdong-Hong Kong SAR-Macao Rail Transit” (Grant No. 2022CJPT016); 2) Teaching and Research Cultivation Project of Guangzhou Railway Polytechnic: “Research and Practice of Modular Teaching in the Context of Sino-Foreign Cooperative Education—A Case Study of Railway Transportation Operation Management Major” (Project No.: GTXYY2203).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="B1">
        <label>1.</label>
        <citation-alternatives>
          <mixed-citation publication-type="report">Guangzhou Municipal Transportation Bureau (2024) Guangzhou Transportation Operation Annual Report (2023).</mixed-citation>
          <element-citation publication-type="report">
            <year>2024</year>
            <article-title>Guangzhou Transportation Operation Annual Report (2023)</article-title>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B2">
        <label>2.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Wang, X.D. and Li, S.M. (2023) Urban Rail Transit Passenger Flow Characteristics Analysis Based on Multi-Source Data Fusion. <italic>Journal of Transportation Systems Engineering and Information Technology</italic>, 23, 45-52.</mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Wang, X.D.</string-name>
              <string-name>Li, S.M.</string-name>
            </person-group>
            <year>2023</year>
            <article-title>Urban Rail Transit Passenger Flow Characteristics Analysis Based on Multi-Source Data Fusion</article-title>
            <source>Journal of Transportation Systems Engineering and Information Technology</source>
            <volume>23</volume>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B3">
        <label>3.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Wang, X.D. and Li, S.M. (2023) Identification and Governance of the “Last Mile” Transfer Efficiency Bottleneck in Comprehensive Transportation Hubs. <italic>Journal of Transportation Systems Engineering and Information Technology</italic>, 2, 1-10.</mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Wang, X.D.</string-name>
              <string-name>Li, S.M.</string-name>
            </person-group>
            <year>2023</year>
            <article-title>Identification and Governance of the “Last Mile” Transfer Efficiency Bottleneck in Comprehensive Transportation Hubs</article-title>
            <source>Journal of Transportation Systems Engineering and Information Technology</source>
            <volume>2</volume>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B4">
        <label>4.</label>
        <citation-alternatives>
          <mixed-citation publication-type="confproc">Chen, T. and Guestrin, C. (2016) XGBoost: A Scalable Tree Boosting System. <italic>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</italic>, San Francisco, 13-17 August 2016, 785-794. <pub-id pub-id-type="doi">10.1145/2939672.2939785</pub-id> <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1145/2939672.2939785">https://doi.org/10.1145/2939672.2939785</ext-link></mixed-citation>
          <element-citation publication-type="confproc">
            <person-group person-group-type="author">
              <string-name>Chen, T.</string-name>
              <string-name>Guestrin, C.</string-name>
            </person-group>
            <year>2016</year>
            <article-title>XGBoost: A Scalable Tree Boosting System</article-title>
            <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
            <fpage>785</fpage>
            <lpage>794</lpage>
            <pub-id pub-id-type="doi">10.1145/2939672.2939785</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B5">
        <label>5.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Liu, J.H. and Wang, Z.Q. (2022) A Review of Deep Learning-Based Models for Traffic Passenger Flow Forecasting. <italic>Journal of the China Railway Society</italic>, 44, 1-12.</mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Liu, J.H.</string-name>
              <string-name>Wang, Z.Q.</string-name>
            </person-group>
            <year>2022</year>
            <article-title>A Review of Deep Learning-Based Models for Traffic Passenger Flow Forecasting</article-title>
            <source>Journal of the China Railway Society</source>
            <volume>44</volume>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B6">
        <label>6.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Hochreiter, S. and Schmidhuber, J. (1997) Long Short-Term Memory. <italic>Neural Computation</italic>, 9, 1735-1780. <pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id> <pub-id pub-id-type="pmid">9377276</pub-id> <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1162/neco.1997.9.8.1735">https://doi.org/10.1162/neco.1997.9.8.1735</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Hochreiter, S.</string-name>
              <string-name>Schmidhuber, J.</string-name>
            </person-group>
            <year>1997</year>
            <article-title>Long Short-Term Memory</article-title>
            <source>Neural Computation</source>
            <volume>9</volume>
            <pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id>
            <pub-id pub-id-type="pmid">9377276</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B7">
        <label>7.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Dragomiretskiy, K. and Zosso, D. (2014) Variational Mode Decomposition. <italic>IEEE Transactions on Signal Processing</italic>, 62, 531-544. <pub-id pub-id-type="doi">10.1109/tsp.2013.2288675</pub-id> <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/tsp.2013.2288675">https://doi.org/10.1109/tsp.2013.2288675</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Dragomiretskiy, K.</string-name>
              <string-name>Zosso, D.</string-name>
            </person-group>
            <year>2014</year>
            <article-title>Variational Mode Decomposition</article-title>
            <source>IEEE Transactions on Signal Processing</source>
            <volume>62</volume>
            <pub-id pub-id-type="doi">10.1109/tsp.2013.2288675</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B8">
        <label>8.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Polosukhin, I., <italic>et al.</italic> (2017) Attention Is All You Need. <italic>Advances in Neural Information Processing Systems</italic>, 30, 5998-6008.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Vaswani, A.</string-name>
              <string-name>Shazeer, N.</string-name>
              <string-name>Parmar, N.</string-name>
              <string-name>Uszkoreit, J.</string-name>
              <string-name>Jones, L.</string-name>
              <string-name>Gomez, A.N.</string-name>
              <string-name>Polosukhin, I.</string-name>
            </person-group>
            <year>2017</year>
            <article-title>Attention Is All You Need</article-title>
            <source>Advances in Neural Information Processing Systems</source>
            <volume>30</volume>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B9">
        <label>9.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Chen, L. and Zhang, D. (2022) Forecasting High-Speed Rail Passenger Demand with Hybrid ARIMA and Machine Learning Models. <italic>Transportation Research Part A: Policy and Practice</italic>, 156, 78-92.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Chen, L.</string-name>
              <string-name>Zhang, D.</string-name>
            </person-group>
            <year>2022</year>
            <article-title>Forecasting High-Speed Rail Passenger Demand with Hybrid ARIMA and Machine Learning Models</article-title>
            <source>Transportation Research Part A: Policy and Practice</source>
            <volume>156</volume>
          </element-citation>
        </citation-alternatives>
      </ref>
    </ref-list>
  </back>
</article>