Dynamic Causal Relationships in Stock Market: A Three-Dimensional Granger Causality Network Approach

Panmeng Huang; Yuan Liu; Huanghao Chen; Jerome Yen; Naixue Xiong; Hui Bu

doi:10.4236/jss.2026.144046

Open Journal of Social Sciences > Vol.14 No.4, April 2026

Panmeng Huang¹*, Yuan Liu¹*, Huanghao Chen¹, Jerome Yen¹#, Naixue Xiong², Hui Bu³
¹Faculty of Science and Technology, University of Macau, Macau SAR, China.
²School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China.
³School of Economics and Management, Beihang University, Beijing, China.

Abstract

The traditional Granger-causality framework relies only on asset price movements. Clustering stocks, however, requires more information to gauge the strength of their bonds, including but not limited to trading volume and volatility. To build a more robust portfolio formation framework, this paper proposes a three-dimensional Granger-causality framework for studying the dynamic causal relationships among stocks in the Hong Kong (SAR, China) stock market (HKG). The Toda-Yamamoto Granger test (TY-Granger test) was applied to the constituent stocks of the Hang Seng Index (HSI). The data set contains prices, trading volume, and volatility from February 2020 to December 2024, from which 60 monthly Granger causality networks were constructed, each covering approximately 20 trading days. The daily Granger causality matrices within each month were combined using an e-value-based method. Stocks were then classified into three types of clusters (Influential, Affected, and Isolated) through multi-layer directed causal network feature extraction followed by unsupervised clustering with a Gaussian Mixture Model (GMM). This classification proved useful in supporting portfolio formation under different market conditions.

Keywords

Granger Causality, Hang Seng Index, Gaussian Mixture Model, Multi-Layer Directed Causal Network

Share and Cite:

Huang, P. , Liu, Y. , Chen, H. , Yen, J. , Xiong, N. and Bu, H. (2026) Dynamic Causal Relationships in Stock Market: A Three-Dimensional Granger Causality Network Approach. Open Journal of Social Sciences, 14, 907-925. doi: 10.4236/jss.2026.144046.

1. Introduction

Since Granger (1969) proposed the causality test, it has been widely applied to the causality analysis of financial assets, risk transmission, and market microstructure (Granger, 1969; Sims, 1980). The Granger causality test is a statistical time-series method designed to determine whether past values of one variable help predict another. Most previous studies have focused on cross-market relationships, such as those between stock indices and commodities like oil and gold, or among different stock markets. For example, Zhang et al. (2023) found that during COVID-19, both the stock market and the bond market acted as net transmitters of risk spillovers. Some studies focus on causal relationships at the individual stock level. Bai, Cui, & Zhang (2018) combined the K-means algorithm with the Granger causality test to analyze, based on the resulting clusters, the mutual relationships among the returns of forty individual Shanghai A-share stocks. Additionally, complex network models based on Granger causality have been widely adopted to identify asset clusters (Gao et al., 2018; Tang et al., 2019; Huang et al., 2021; de Pontes & Rêgo, 2022). These approaches construct a network in which assets form nodes and significant Granger-causal relationships form edges. Examining the strength of these edges identifies asset clusters and reveals asset influence and risk pathways across different market phases. Bu, Tang, & Wu (2019) applied complex network analysis, Granger causality tests and an improved PageRank algorithm to CSI 300 Index constituent stocks from 2006 to 2016. Their study revealed that co-movements in China’s stock market intensified during this period and that institutional buying power strengthened. However, existing Granger tests focus on causal relationships between variables in one or two dimensions and do not consider causal relationships across multiple dimensions. Moreover, existing stock clustering methods based on complex networks typically overlook differences in the strength of causal relationships across clusters.

As an Asian financial hub, Hong Kong SAR, China serves as a pivotal window for capital flows between China and Western countries. This paper studies the causal relationships among Hang Seng Index (HSI) constituent stocks, aiming to explore how stock causality evolves and which factors affect its strength. We construct a novel three-dimensional Granger causality framework and combine it with the e-value method (Vovk & Wang, 2021) to build a monthly Granger matrix. Based on the monthly Granger matrix, we employ a multi-layer directed causal network feature extraction method together with a Gaussian mixture model (GMM) to cluster stocks, dividing them into three cluster types according to their Granger causality strength. We find that the relationships among HSI constituent stocks exhibit significant time-varying characteristics and structural differences, especially following the adjustment of HSI weighted stocks after 2021.

This paper makes three contributions. First, we pioneer the construction of a Granger causality matrix that incorporates stock returns, trading volume and return volatility, enriching the Granger method with three-dimensional causal relationships between stocks.

Second, we propose a novel approach that merges daily Granger causality matrices into a monthly Granger matrix using the e-value merging method, which reflects the causal strength of stocks over a month more accurately. Based on the monthly Granger matrix and GMM clustering, we categorize HSI constituent stocks into three cluster types (Influence, Affected and Isolated) that reflect the direction and strength of causal relationships between stocks.

Third, analysis of the clustering results shows that role allocation is time-varying and cyclical. During external shocks or periods of rapid expectation shifts, constituent stocks undergo more concentrated role rearrangements. Moreover, the volatility of stocks with high market value and low isolation is similar to that of the HSI, and low-isolation stocks show better anti-risk performance. Among low-market-value stocks, those with low isolation earn higher excess returns than those with high isolation.

The remainder of this paper is structured in four parts. Section 2 reviews existing results on the Granger causality test and the main application scenarios of GMM and complex networks. Section 3 introduces the methodology and data used in this research. Section 4 discusses the empirical results. Finally, Section 5 concludes the study.

2. Literature Review

2.1. Granger Effects

Numerous studies have combined the Granger causality test with other econometric techniques, such as co-integration theory, the vector autoregressive (VAR) model and the vector error correction model (VECM), to explore the relationships between financial variables, especially asset prices. Ben Jebli & Ben Youssef (2016) used the VECM to explore the short- and long-run relationships between carbon emissions, economic growth, and energy consumption in Tunisia from 1980 to 2011. The results showed a long-term bidirectional causal relationship between all the considered variables. Ahmed et al. (2017) analyzed Pakistan’s KSE 100 Index and found that interest rate variables exhibit significant unidirectional Granger causality towards the stock index. However, these studies do not examine how the causal relationships change over time.

Another limitation of the traditional Granger causality test is its requirement that input variables be stationary or cointegrated, which many market data series cannot meet. Toda & Yamamoto (1995) proposed the TY-Granger test, which does not require the input data to be stationary and allows for integration of any order. This approach mitigates biases arising from non-stationarity in traditional methods, making it particularly suitable for analyzing non-stationary time series. Le & Chang (2015) employed the TY-Granger test to examine the causality between oil and stock prices. Ghosh & Kanjilal (2016) combined nonlinear threshold cointegration and the TY-Granger test to examine the causal relationships among international crude oil prices, SENSEX, and the rupee-dollar exchange rate across three phases partitioned by the 2008 financial crisis. The study found that crude oil prices exhibited unidirectional causal effects on both the Indian stock market and the rupee-dollar exchange rate in each phase.

In addition to Granger causality between asset prices, the trading volume and volatility of assets have also been studied as variables. Trading volume is an important indicator of market activity, and its size directly reflects the supply and demand relationship and the trend of the market. Silvapulle & Choi (1999) employed linear and non-linear Granger causality tests in the South Korean market, revealing a significant bidirectional causal relationship between stock returns and trading volume. Gündüz & Hatemi-J (2005) used the TY-Granger test to study stock prices and trading volume in Central and Eastern Europe and Türkiye, finding significant differences in causal links across markets. Rashid (2007) focused on the dynamic relationship between stock prices and trading volumes at the Karachi Stock Exchange. He found that trading volumes have major nonlinear predictive power over stock returns, whereas stock returns display linear causality towards trading volumes. Abinaya et al. (2016) used the TY-Granger test to study the causality between stock prices and trading volume using high-frequency minute data of Nifty 50 companies from July 2014 to June 2015. The study confirmed causal relationships between profitability and leverage for all 29 firms.

In financial markets, volatility measures the magnitude of price or return fluctuations of an underlying asset over a certain period of time. Będowska-Sójka & Kliber (2019) first confirmed bidirectional liquidity-volatility causality in emerging markets, with liquidity’s impact on volatility more pronounced. Shahzad et al. (2021) employed a time-varying Granger causality test to examine dynamic return-volatility causality across three commodity indices: energy, agriculture, and precious metals. The analysis indicates that energy exhibits more frequent associations with the other two categories compared to agriculture’s connections with precious metals. Khurshid & Kirkulak-Uludag (2021) employed Granger causality test and VAR-GARCH model to investigate the volatility spillover effects between oil price fluctuations and stock market returns in China, Brazil, and the other five countries. The results show that all seven countries’ stock markets exhibit positive but low constant conditional correlations with oil assets. Dutta (2018) identified key links between global oil prices and the U.S. energy sector via various volatility indices, finding significant short- and long-run causal relationships between Global Oil Volatility and U.S. Energy Sector Equity Volatility, with the oil market leading the energy sector equity market during turmoil. Similar research by Dai et al. (2020) illustrated that the fear index had a significant impact on realized stock market volatility in five selected developed markets.

Although numerous scholars have conducted Granger causality analyses among asset price or return, trading volume, and volatility, no existing research has simultaneously incorporated these three important elements to examine causal relationships between assets or stocks. If only a single factor is considered, the causal effects of other variables may be overlooked.

2.2. P-Value Merging Methods

Multiple testing of a single hypothesis is typically formulated as the task of combining a set of p-values. Fisher’s method is a classical statistical approach for aggregating multiple independent p-values into a single overall conclusion. An alternative is Stouffer’s method (Stouffer, 1977), which transforms the p-values to z-scores and then computes a combined z-score as the sum of the individual z-scores divided by the square root of their number. However, both methods have two limitations. First, they are highly sensitive to extremely small p-values, which can lead to the erroneous conclusion that Granger causality is significant for an entire month based on only a few significant days within that month. Second, even if no daily p-value is below 0.05, accumulating relatively small ones may still reject the null hypothesis of no monthly Granger causality, thereby increasing the risk of a Type I error.
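The second limitation can be illustrated with a minimal sketch that uses only the Python standard library. The function names `fisher_combine` and `stouffer_combine` are hypothetical, the inputs are assumed to be independent p-values strictly between 0 and 1, and the example values are made up for illustration rather than taken from the data.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of the Fisher statistic
    # Closed-form chi-square survival function for even degrees of freedom 2k.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combine(pvalues):
    """Stouffer's method: sum of z-scores divided by sqrt(k)."""
    nd = NormalDist()
    z = sum(-nd.inv_cdf(p) for p in pvalues) / math.sqrt(len(pvalues))
    return nd.cdf(-z)  # one-sided combined p-value


# Twenty hypothetical daily p-values, none of them below 0.05.
daily = [0.2] * 20
combined_fisher = fisher_combine(daily)
combined_stouffer = stouffer_combine(daily)
```

In this toy case both combined p-values fall below 0.05 even though no single day is significant, which is the accumulation effect described above.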

The Bonferroni method (Dunn, 1961) is another classic and fundamental statistical method in the field of multiple tests and comparisons. Its primary objective is to strictly control the family-wise error rate (FWER) in multiple tests. The Holm-Bonferroni method (Holm, 1979) is a dynamically optimized version of the traditional Bonferroni method, with higher statistical power while maintaining FWER control. However, the Bonferroni and Holm methods cannot be applied to sequential data, and they are essentially corrections for multiple hypothesis testing rather than tools for consolidating evidence.

Vovk & Wang (2021) proposed a novel merging method based on e-values. E-values are non-negative random variables calibrated to directly quantify the strength of evidence against a null hypothesis. They can be combined flexibly and validly through convex mixtures (e.g., the arithmetic mean) without strict independence assumptions, and they are less sensitive to extreme values. Unlike p-values, e-values provide a continuous and interpretable measure of evidence strength: larger values indicate stronger evidence against the null hypothesis. Vovk & Wang further demonstrated that the merging performance of e-values is superior to traditional p-value merging methods such as Fisher’s method and the Bonferroni method, as it directly quantifies the strength of evidence and avoids subjective threshold dependence.

2.3. Network-Based Causal Modeling

Network-based finance has increasingly modelled inter-asset dependence as graphs, with Granger causality naturally yielding directed networks that encode predictive influence rather than symmetric co-movement. Papana et al. (2017) illustrate this perspective by constructing time-varying financial networks from causality measures and demonstrate that network connectivity is regime-dependent, with stronger connectivity during crisis periods. Given that stock interactions propagate through multiple channels, multilayer network theory provides a principled justification for representing different relation types as separate layers instead of collapsing them into a single graph (Kivelä et al., 2014; Boccaletti et al., 2014). In accordance with extant literature on the subject, the methodology employed in this study involves the mapping of each stock into a causal-role feature space via multi-layer directed Granger-causality network feature extraction (a process which captures both influence exerted and influence received using standard directed-network importance measures such as PageRank and HITS) (Page et al., 1999; Kleinberg, 1999). Subsequently, a Gaussian Mixture Model (GMM) is employed to identify distinct latent role groups in a probabilistic, model-based manner (McLachlan & Peel, 2000). Finally, in order to mitigate spurious month-to-month label switching, temporal persistence is imposed using an HMM and a stable role path is decoded via the Viterbi algorithm (Rabiner, 1989; Forney, 1973). This framework directly addresses the limitation of similarity-driven clustering by separating stocks according to explicit influence/affectedness patterns in the causal network, ensuring stocks are classified by their causal functions in the portfolio.

3. Data and Research Methodology

In this section, we present a comprehensive overview of the methodological framework and techniques employed to construct stock clusters, shown in Figure 1.

Figure 1. Experimental framework.

3.1. The Toda Yamamoto Granger Causality Test

We use the Toda and Yamamoto (1995) causality test to examine causality among each stock’s daily log return, daily log trading volume, and volatility, where volatility is calculated as the 5-day rolling standard deviation of daily logarithmic returns. Suppose there are $N$ selected stocks and hence $3N$ variables. For any pair of distinct variables $X_{t}$ and $Y_{t}$ , we conduct a pairwise Granger causality test based on the following two equations:

Equation (1): Test whether Y Granger-causes X.

$X_{t} = α + \sum_{i = 1}^{p} β_{i} X_{t - i} + \sum_{i = p + 1}^{p + d_{\max}} β_{i} X_{t - i} + \sum_{i = 1}^{p} θ_{i} Y_{t - i} + \sum_{i = p + 1}^{p + d_{\max}} θ_{i} Y_{t - i} + ε_{1 t} .$ (1)

Equation (2): Test whether X Granger-causes Y.

$Y_{t} = φ + \sum_{i = 1}^{p} ω_{i} Y_{t - i} + \sum_{i = p + 1}^{p + d_{\max}} ω_{i} Y_{t - i} + \sum_{i = 1}^{p} ρ_{i} X_{t - i} + \sum_{i = p + 1}^{p + d_{\max}} ρ_{i} X_{t - i} + ε_{2 t} .$ (2)

In Equations (1) and (2), $p$ is the optimal lag order and $d_{\max}$ is the global maximum order of integration; both are essential parameters of the VAR model. The terms $α$ , $φ$ , $β_{i}$ , $θ_{i}$ , $ω_{i}$ , $ρ_{i}$ are scalar coefficients of the model, and the error terms are $ε_{1 t} \sim N (0, σ_{ε_{1}}^{2})$ and $ε_{2 t} \sim N (0, σ_{ε_{2}}^{2})$ .

To implement the TY-Granger test, we proceed as follows:

(1) Optimal Lag Selection

The VAR model requires selecting the optimal lag order. Insufficient lags may lead to residual autocorrelation and model specification bias, while excessive lags can result in loss of degrees of freedom, multicollinearity, and reduced test power. We determine the optimal lag p using the corrected Akaike Information Criterion (AICc) (Hurvich & Tsai, 1989) that is well-suited for addressing the small-sample, high-parameter dataset analyzed. Compared to the standard Akaike Information Criterion (AIC) (Akaike, 1974), AICc incorporates an additional small-sample correction term, which mitigates the small-sample underestimation bias inherent in AIC.

$AICc = - 2 \ln (L) + 2 k + \frac{2 k (k + 1)}{(s - k - 1)}$ . (3)

where L is the likelihood function value of the model; k denotes the total number of parameters to be estimated in the model; s represents the sample size.
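As a small worked instance of Equation (3), the sketch below evaluates AICc and picks the lag with the smallest value. The helper names `aicc` and `select_lag`, the candidate dictionary layout, and the numbers in the example are all hypothetical; returning infinity when $s - k - 1 \le 0$ is an assumption made here to rule out lags that exhaust the sample.

```python
import math


def aicc(log_likelihood, k, s):
    """Equation (3): AICc = -2 ln(L) + 2k + 2k(k + 1) / (s - k - 1)."""
    if s - k - 1 <= 0:
        return math.inf  # assumption: treat an over-parameterised model as inadmissible
    return -2.0 * log_likelihood + 2.0 * k + 2.0 * k * (k + 1) / (s - k - 1)


def select_lag(candidates):
    """candidates maps lag -> (log_likelihood, n_params, sample_size)."""
    return min(candidates, key=lambda lag: aicc(*candidates[lag]))


# Made-up fits: lag 2 has a slightly higher likelihood but more parameters.
example = {1: (-10.0, 3, 19), 2: (-9.5, 5, 18)}
best_lag = select_lag(example)
```

With so few observations the correction term grows quickly in $k$, so the shorter lag is preferred in this toy example.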

(2) Stationarity test and determination of the maximum integration order

A standard VAR requires stationary data, whereas the TY procedure only needs each variable’s order of integration. We consider differencing orders $d \in \{0, 1, 2\}$ because economic variables are not expected to exhibit stochastic trends of order three or higher. For each variable $X_{t}$ and $Y_{t}$ , we apply the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test to check whether the series differenced $d$ times is stationary. A differenced series is considered stationary only when the ADF test rejects its “non-stationary” null hypothesis and the KPSS test fails to reject its “stationary” null hypothesis. The smallest $d$ that satisfies both conditions is taken as the variable’s order of integration, and $d_{\max}$ is the largest such order across the variables.
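The search over differencing orders can be sketched as follows. The stationarity check is passed in as a callable because the text does not say which implementation was used; in practice it could wrap ADF and KPSS routines such as `adfuller` and `kpss` from `statsmodels.tsa.stattools`. The names `joint_rule`, `integration_order` and `global_d_max`, the 5% level, and the fallback to `max_d` when no order passes are assumptions of this sketch.

```python
import numpy as np


def joint_rule(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Stationary only if ADF rejects non-stationarity and KPSS does not reject stationarity."""
    return adf_pvalue < alpha and kpss_pvalue >= alpha


def integration_order(series, is_stationary, max_d=2):
    """Smallest d in {0, ..., max_d} whose d-times differenced series passes the check."""
    x = np.asarray(series, dtype=float)
    for d in range(max_d + 1):
        if is_stationary(x):
            return d
        x = np.diff(x)
    return max_d  # assumption: cap at max_d if no order passes


def global_d_max(series_list, is_stationary, max_d=2):
    """Largest integration order across all variables."""
    return max(integration_order(s, is_stationary, max_d) for s in series_list)
```

A caller would supply `is_stationary` as a function that runs both tests on the array it receives and returns `joint_rule(adf_p, kpss_p)`.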

(3) Hypothesis Testing

In the TY-Granger test, we test the null hypothesis $H_{0}$ : $X_{t}$ does not Granger-cause $Y_{t}$ or $Y_{t}$ does not Granger-cause $X_{t}$ against the alternative hypothesis $H_{1}$ : $X_{t}$ does Granger-cause $Y_{t}$ or $Y_{t}$ does Granger-cause $X_{t}$ .
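A minimal NumPy sketch of the test behind Equation (1) is given below. It assumes the usual Toda-Yamamoto recipe: estimate the lag-augmented regression by OLS with $p + d_{\max}$ lags, then apply a Wald test to the first $p$ coefficients $θ_{i}$ only, referring the statistic to a chi-square distribution with $p$ degrees of freedom. The function names are hypothetical, and the code does not guard against singular design matrices (for example, a constant series inside a short window).

```python
import math
import numpy as np


def chi2_sf(stat, df):
    """Chi-square survival function for integer df via the incomplete-gamma recurrence."""
    x = stat / 2.0
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(x)) if df % 2 else math.exp(-x)
    while a < df / 2.0 - 1e-12:
        if x > 0:
            q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def lagmat(series, n_lags):
    """Columns series[t-1], ..., series[t-n_lags] for t = n_lags, ..., T-1."""
    T = len(series)
    return np.column_stack([series[n_lags - i:T - i] for i in range(1, n_lags + 1)])


def ty_granger_pvalue(x, y, p, d_max):
    """p-value for H0: y does not Granger-cause x, following Equation (1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = p + d_max  # total number of lags in the augmented regression
    target = x[m:]
    Z = np.column_stack([np.ones(len(target)), lagmat(x, m), lagmat(y, m)])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ coef
    sigma2 = resid @ resid / (len(target) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    idx = np.arange(1 + m, 1 + m + p)  # first p lags of y; the extra d_max lags stay unrestricted
    theta = coef[idx]
    wald = theta @ np.linalg.solve(cov[np.ix_(idx, idx)], theta)
    return chi2_sf(wald, p)
```

Calling `ty_granger_pvalue(y, x, p, d_max)` with the arguments swapped gives the test for Equation (2).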

3.2. Construction of Monthly Granger Causality Matrix

(1) Daily Granger Causality

For each trading day, we calculate the daily Granger causality matrix using data from the preceding 20 trading days. The data cover three dimensions for each HSI constituent stock. For each pair of variables $X_{t}$ and $Y_{t}$ , the TY-Granger test is performed on this data window, and each test produces a p-value. The p-values of all variable pairs populate a $3N \times 3N$ matrix $P_{t}$ , which is referred to as the daily Granger causality matrix. Repeating this procedure for every trading day yields a total of n daily Granger causality matrices $P_{1}, P_{2}, \dots, P_{n}$ for the month.

(2) Merging Daily Granger Causality into a Monthly Matrix via E-values

To enhance the robustness of the study and mitigate the interference of short-term noise and extreme values, we aggregate the daily Granger causality matrices into a monthly Granger matrix using the e-value method of Vovk & Wang (2021). For each variable pair $X_{t}$ and $Y_{t}$ in the monthly matrix M, we first extract the p-values for that pair from the n daily matrices to form a p-value sequence of length n: $p^{(X_{t} \to Y_{t})} = [p_{1}^{(X_{t} \to Y_{t})}, p_{2}^{(X_{t} \to Y_{t})}, \dots, p_{n}^{(X_{t} \to Y_{t})}]$ . The sequence $p^{(X_{t} \to Y_{t})}$ contains the independent statistical evidence for the causal relationship “ $X_{t} \to Y_{t}$ ” on all trading days. We then convert each p-value in $p^{(X_{t} \to Y_{t})}$ to an e-value using the integral calibrator in Equation (4) (Vovk & Wang 2021):

$e = F (p) = \frac{1 - p + p \ln p}{p {(\ln p)}^{2}}$ . (4)

This yields the e-value sequence of length n: $e^{(X_{t} \to Y_{t})} = [e_{1}^{(X_{t} \to Y_{t})}, e_{2}^{(X_{t} \to Y_{t})}, \dots, e_{n}^{(X_{t} \to Y_{t})}]$ .

Next, we merge the daily e-values into a single monthly e-value using the arithmetic mean (Equation (5)) (Vovk & Wang 2021). The merged monthly e-value for “ $X_{t} \to Y_{t}$ ” is:

${\bar{e}}^{(X_{t} \to Y_{t})} = \frac{1}{n} \sum_{i = 1}^{n} e_{i}^{(X_{t} \to Y_{t})} .$ (5)

We then consider a causal relationship significant at the monthly level if its merged e-value is larger than $\sqrt{10}$ . This threshold corresponds to “substantial evidence” against the null hypothesis, as introduced by Vovk & Wang (2021).

Finally, we iterate through all 3N×3N variable pairs, repeat the above processes, and complete the construction of the entire monthly Granger causality matrix M.
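The calibration, averaging and thresholding steps can be vectorised over all variable pairs at once, as in the sketch below. The function names and the `(n_days, 3N, 3N)` array layout are assumptions; so is the numerical handling at the boundaries, where p-values are clipped away from zero and values within $10^{-6}$ of 1 are mapped to the limit $F(1) = 1/2$ to avoid a 0/0 evaluation. Diagonal entries (a variable paired with itself) are not treated specially here.

```python
import numpy as np


def calibrate(p):
    """Equation (4): e = (1 - p + p ln p) / (p (ln p)^2), with F(1) = 1/2 by continuity."""
    p = np.clip(np.asarray(p, dtype=float), 1e-300, 1.0)
    near_one = p > 1.0 - 1e-6
    safe = np.where(near_one, 0.5, p)  # placeholder value, overwritten below
    log_p = np.log(safe)
    e = (1.0 - safe + safe * log_p) / (safe * log_p ** 2)
    return np.where(near_one, 0.5, e)


def monthly_matrix(daily_p, threshold=np.sqrt(10.0)):
    """daily_p has shape (n_days, 3N, 3N); returns merged e-values and a significance mask."""
    e_bar = calibrate(daily_p).mean(axis=0)  # Equation (5), applied entry by entry
    return e_bar, e_bar > threshold


# Toy month with 20 days and 2 variables; the p-values are made up.
daily = np.full((20, 2, 2), 0.05)
daily[:, 0, 1] = 0.01
e_bar, significant = monthly_matrix(daily)
```

In this toy month only the entry whose daily p-values are all 0.01 clears the $\sqrt{10}$ threshold; a constant daily p-value of 0.05 calibrates to an e-value below 2.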

3.3. Multilayer Directed-Network-Based Classification of Stock Role

This section converts monthly Granger-causal relations among return, trading volume, and volatility into a multilayer network, extracts node-level features, and assigns each stock a monthly economic role. A 3-state Hidden Markov Model (HMM) smooths role sequences over time.

(1) Construction of the Multilayer Directed Network

For month $t$ , let the set of constituent stocks be $S_{t} = \{1, 2, \dots, N_{t}\}$ . We represent the three market dimensions as three separate directed layers: $l = 1$ (LogReturn), $l = 2$ (LogVolume), and $l = 3$ (Volatility). Let $(i, l) \to (j, m)$ denote a significant monthly relation from stock $i$ in layer $l$ to stock $j$ in layer $m$ .

For each layer $l$ , we construct a weighted adjacency matrix $W_{t}^{(l)} = (w_{i j}^{(l)})$ by counting significant links from $(i, l)$ to stock $j$ across target layers:

$w_{i j}^{(l)} = \sum_{m = 1}^{3} \mathbb{1} \{(i, l) \to (j, m) \text{ is significant}\}, \quad i \neq j .$ (6)

Hence $w_{i j}^{(l)} \in \{0, 1, 2, 3\}$ . The monthly multilayer directed network is represented by $\{W_{t}^{(1)}, W_{t}^{(2)}, W_{t}^{(3)}\}$ , and is used for feature extraction and role identification.
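Equation (6) amounts to summing blocks of the monthly significance matrix, as the following sketch shows. It assumes a variable ordering that the text does not specify: variable index = layer × N + stock, with layers numbered 0, 1, 2 in code. The function name `layer_adjacency` is hypothetical.

```python
import numpy as np


def layer_adjacency(significant):
    """Build W[l, i, j] from a boolean (3N, 3N) matrix of significant monthly links.

    ASSUMED ordering: row/column index = layer * N + stock.
    """
    sig = np.asarray(significant, dtype=bool)
    n = sig.shape[0] // 3
    blocks = sig.reshape(3, n, 3, n)      # axes: source layer, source stock, target layer, target stock
    W = blocks.sum(axis=2).astype(float)  # count significant links over target layers m
    for l in range(3):
        np.fill_diagonal(W[l], 0.0)       # i != j: drop links from a stock to itself
    return W
```

Each entry of the result lies in {0, 1, 2, 3}, matching the range stated above.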

(2) Node-Level Structural Features

For each month $t$ and each layer $l$ , a compact set of directed-network features is computed in order to capture influence exerted and influence received. These features include weighted out/in-strength, weighted PageRank, and HITS hub/authority scores.

Weighted Out-Strength and In-Strength:

Given $W_{t}^{(l)}$ , the weighted out-strength and in-strength of stock $i$ are:

$s_{i, \mathrm{out}}^{(l)} = \sum_{j = 1}^{N_{t}} w_{i j}^{(l)}, \quad s_{i, \mathrm{in}}^{(l)} = \sum_{j = 1}^{N_{t}} w_{j i}^{(l)} .$ (7)

We set $w_{i i}^{(l)} = 0$ to exclude self-loops.

Weighted PageRank:

Let $P^{(l)}$ be the row-normalised transition matrix derived from $W_{t}^{(l)}$ (with standard handling of zero-outdegree nodes). With damping factor $α \in (0, 1)$ , PageRank satisfies:

$p^{(l)} = α {(P^{(l)})}^{⊤} p^{(l)} + (1 - α) \frac{1}{N_{t}} 1 .$ (8)

Here $1$ denotes the all-ones vector.

HITS Hub and Authority Scores:

The HITS model differentiates between strong “senders” (hubs) and strong “receivers” (authorities). For each layer $l$ , hub and authority vectors are obtained via the coupled updates:

$a^{(l)} \leftarrow {(W_{t}^{(l)})}^{⊤} h^{(l)}, h^{(l)} \leftarrow W_{t}^{(l)} a^{(l)} .$ (9)

The iterations are normalised and repeated until convergence, yielding $h^{(l)}$ and $a^{(l)}$ .
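The three feature families in Equations (7) to (9) can be computed for one layer with plain NumPy, as sketched below. Several details are assumptions rather than statements from the text: the damping factor 0.85, treating zero-outdegree nodes as linking uniformly to all nodes, Euclidean normalisation in HITS, and the convergence tolerance. The function names are hypothetical.

```python
import numpy as np


def strengths(W):
    """Equation (7): weighted out-strength (row sums) and in-strength (column sums)."""
    return W.sum(axis=1), W.sum(axis=0)


def pagerank(W, alpha=0.85, tol=1e-10, max_iter=1000):
    """Equation (8) by power iteration; dangling rows are replaced by uniform rows (assumption)."""
    n = W.shape[0]
    out = W.sum(axis=1, keepdims=True)
    P = np.where(out > 0, W / np.where(out > 0, out, 1.0), 1.0 / n)
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = alpha * P.T @ p + (1.0 - alpha) / n
        if np.abs(new - p).sum() < tol:
            return new
        p = new
    return p


def hits(W, tol=1e-10, max_iter=1000):
    """Equation (9): coupled hub/authority updates with L2 normalisation."""
    n = W.shape[0]
    h = np.ones(n) / np.sqrt(n)
    a = np.zeros(n)
    for _ in range(max_iter):
        a = W.T @ h
        norm_a = np.linalg.norm(a)
        if norm_a == 0.0:  # empty graph: no hubs or authorities
            return np.zeros(n), np.zeros(n)
        a = a / norm_a
        new_h = W @ a
        new_h = new_h / np.linalg.norm(new_h)
        if np.abs(new_h - h).sum() < tol:
            return new_h, a
        h = new_h
    return h, a
```

For a star-shaped layer in which one stock sends links to all others, that stock receives the whole hub score while the receivers share the authority score equally.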

(3) Robust Normalisation and Cross-Layer Aggregation

To obtain comparable features across stocks within the same month, we robustly normalise each raw feature using the median and interquartile range (IQR):

$z = \frac{x - m}{\max (\mathrm{IQR}, ε)} .$ (10)

where $m$ is the sample median and $ε > 0$ is a small constant for numerical stability. For each feature $f \in F = \{\mathrm{out}, \mathrm{in}, \mathrm{pagerank}, \mathrm{hub}, \mathrm{authority}\}$ , we compute $z_{i, f}^{(l)}$ within month $t$ and aggregate across layers by ${\tilde{z}}_{i, f} = \mathrm{median}_{l \in \{1, 2, 3\}} z_{i, f}^{(l)}$ . We also define a composite connectivity score $\mathrm{total}_{i} = \sum_{f \in F} {\tilde{z}}_{i, f}$ and the final feature vector:

$x_{i} = {({\tilde{z}}_{i, \mathrm{out}}, {\tilde{z}}_{i, \mathrm{in}}, {\tilde{z}}_{i, \mathrm{pagerank}}, {\tilde{z}}_{i, \mathrm{hub}}, {\tilde{z}}_{i, \mathrm{authority}}, \mathrm{total}_{i})}^{⊤} .$ (11)
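The normalisation and aggregation in Equations (10) and (11) can be sketched as below. The input layout `(3 layers, N stocks, 5 features)`, the value of `eps`, the use of linearly interpolated percentiles for the IQR, and the reading of the composite score as the sum of the five layer-aggregated features are assumptions of this sketch; the function names are hypothetical.

```python
import numpy as np


def robust_z(x, eps=1e-9):
    """Equation (10): centre by the median and scale by max(IQR, eps)."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return (x - med) / max(q3 - q1, eps)


def feature_vectors(raw, eps=1e-9):
    """raw[l, i, f]: raw feature f of stock i in layer l; returns an (N, 6) matrix of x_i."""
    raw = np.asarray(raw, dtype=float)
    z = np.empty_like(raw)
    for l in range(raw.shape[0]):
        for f in range(raw.shape[2]):
            z[l, :, f] = robust_z(raw[l, :, f], eps)  # normalise across stocks
    z_tilde = np.median(z, axis=0)                    # median across the three layers
    total = z_tilde.sum(axis=1, keepdims=True)        # composite connectivity score
    return np.hstack([z_tilde, total])                # columns: out, in, pagerank, hub, authority, total
```

Because the scale is floored at `eps`, a feature that is constant across stocks maps to zeros rather than producing a division by zero.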

(4) Unsupervised Role Discovery via Gaussian Mixture Model

For each month $t$ , we fit a $K = 3$ component Gaussian mixture model (GMM) to $\{x_{i}\}_{i = 1}^{N_{t}}$ with parameters $θ = \{(π_{k}, μ_{k}, Σ_{k})\}_{k = 1}^{3}$ . The posterior responsibility of component $k$ for stock $i$ is:

$γ_{i k} = \Pr (z_{i} = k \mid x_{i}; θ) = \frac{π_{k} N (x_{i} \mid μ_{k}, Σ_{k})}{\sum_{j = 1}^{3} π_{j} N (x_{i} \mid μ_{j}, Σ_{j})} .$ (12)

We assign a hard cluster label by ${\hat{c}}_{i} = \arg \max_{k \in \{1, 2, 3\}} γ_{i k}$ , while retaining $γ_{i k}$ as an uncertainty indicator if needed.
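Equation (12) and the hard-label rule can be evaluated directly once mixture parameters are available, as in the sketch below. Estimating $θ$ itself requires an EM fit, for which a library such as scikit-learn's `GaussianMixture` could be used; that step is outside this sketch. The function names and the log-domain computation are choices made here for numerical stability, not details from the text.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of X."""
    d = X.shape[1]
    diff = X - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def responsibilities(X, weights, means, covs):
    """Equation (12): gamma[i, k] = pi_k N(x_i | mu_k, Sigma_k) / sum_j pi_j N(x_i | mu_j, Sigma_j)."""
    X = np.asarray(X, dtype=float)
    log_joint = np.column_stack([
        np.log(w) + log_gaussian(X, m, np.asarray(c, dtype=float))
        for w, m, c in zip(weights, means, covs)
    ])
    log_joint -= log_joint.max(axis=1, keepdims=True)  # stabilise before exponentiating
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)


def hard_labels(gamma):
    """Hard label: the component with the largest responsibility (0-based in code)."""
    return gamma.argmax(axis=1)
```

A point lying midway between two equally weighted components with equal covariances receives responsibilities of about one half for each, which is the kind of uncertainty the retained $γ_{i k}$ can flag.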

(5) Mapping Cluster Centres to Economic Roles

The three GMM components are mapped to economically interpretable roles using the component-wise feature means $μ_{k, f}$ . We first identify the most isolated component:

$k_{\mathrm{iso}} = \arg \min_{k \in \{1, 2, 3\}} μ_{k, \mathrm{total}} .$ (13)

Among the remaining components, we define an influence score:

$S_{k} = μ_{k, \mathrm{out}} + μ_{k, \mathrm{hub}} + μ_{k, \mathrm{pagerank}} - μ_{k, \mathrm{in}}, \quad k \in \{1, 2, 3\} \setminus \{k_{\mathrm{iso}}\} .$ (14)

Let $k_{\mathrm{inf}} = \arg \max_{k} S_{k}$ . We label component $k_{\mathrm{inf}}$ as Influence, component $k_{\mathrm{iso}}$ as Isolated, and the remaining component as Affected. The monthly role label for stock $i$ is $\mathrm{Category}_{i, t} = \mathrm{role} ({\hat{c}}_{i})$ .
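The mapping in Equations (13) and (14) is a few lines of array logic, sketched below. The column order of the component means is assumed to follow Equation (11), components are 0-based in code, and the function name `map_roles` is hypothetical.

```python
import numpy as np

# Assumed column order of the component means, following Equation (11).
FEATURES = ["out", "in", "pagerank", "hub", "authority", "total"]


def map_roles(means):
    """Map each of the three GMM components to 'Influence', 'Affected' or 'Isolated'."""
    means = np.asarray(means, dtype=float)
    col = {name: i for i, name in enumerate(FEATURES)}
    k_iso = int(np.argmin(means[:, col["total"]]))                   # Equation (13)
    score = (means[:, col["out"]] + means[:, col["hub"]]
             + means[:, col["pagerank"]] - means[:, col["in"]])      # Equation (14)
    rest = [k for k in range(means.shape[0]) if k != k_iso]
    k_inf = max(rest, key=lambda k: score[k])
    roles = {k: "Affected" for k in rest}
    roles[k_iso] = "Isolated"
    roles[k_inf] = "Influence"
    return roles
```

The isolated component is removed before the influence score is compared, so a weakly connected cluster can never be labelled Influence even if its in-strength mean is very low.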

(6) Temporal Smoothing of Role Sequences

Because clustering is performed independently each month, the raw sequence $\{\mathrm{Category}_{i, t}\}$ may exhibit short-lived fluctuations. We smooth roles over time with a 3-state HMM with a sticky transition structure:

$T_{s s} = p_{\mathrm{stay}}, \quad T_{s s'} = \frac{1 - p_{\mathrm{stay}}}{2} \ (s' \neq s) .$ (15)

Using emission probabilities proportional to $γ_{i k}$ , we decode the most probable role path via the Viterbi algorithm and obtain smoothed labels $\widetilde{\mathrm{Category}}_{i, t}$ . The monthly role labels serve as the grouping variable for downstream cross-role comparison.
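The smoothing step can be sketched as a standard log-domain Viterbi decoder with the transition matrix of Equation (15). The value `p_stay=0.9`, the uniform initial distribution, and the floor applied to emissions before taking logs are assumptions, since the text does not report them. The input `gamma` is assumed to hold one stock's monthly responsibilities with columns already aligned to roles, and `p_stay` must lie strictly between 0 and 1.

```python
import numpy as np


def viterbi_sticky(gamma, p_stay=0.9):
    """Most probable state path for one stock; gamma has shape (T months, K roles)."""
    gamma = np.asarray(gamma, dtype=float)
    T, K = gamma.shape
    trans = np.full((K, K), (1.0 - p_stay) / (K - 1))  # Equation (15), off-diagonal entries
    np.fill_diagonal(trans, p_stay)                    # Equation (15), diagonal entries
    log_t = np.log(trans)
    log_e = np.log(np.clip(gamma, 1e-12, None))        # floor avoids log(0)
    score = np.log(1.0 / K) + log_e[0]                 # uniform initial distribution (assumption)
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_t                  # cand[prev, cur]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_e[t]
    path = np.zeros(T, dtype=int)
    path[-1] = int(score.argmax())
    for t in range(T - 1, 0, -1):                      # backtrack through the stored pointers
        path[t - 1] = back[t, path[t]]
    return path
```

With a sticky transition matrix, a single month in which another role is only marginally more likely is absorbed into the surrounding role, while a sustained change in the responsibilities still produces a switch.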

3.4. Dataset

The stock portfolio we select is the HSI, the most representative benchmark index for the Hong Kong (SAR, China) stock market (HKG). The observation period spans from January 1, 2020, to December 31, 2024, encompassing 1227 trading days, with about 20 trading days per month on average. As the daily Granger causality matrix requires data from the previous 20 trading days, the actual data range starts from December 2019. The data were first partitioned by month, and for each trading day within a month, the TY-Granger test was carried out using the data from the previous 20 trading days. Stocks suspended for more than 10 days were excluded. To ensure data continuity, a forward-filling method is employed: the price of a suspended stock on a given trading day is filled with the price from the previous trading day, while its trading volume is set to 0. To eliminate differences in scale between variables, we applied a logarithmic transformation to the daily return and trading volume. We update our selection of HSI constituent stocks annually based on the HSI year-end reports from 2020 to 2024. The number of constituent stocks stood at 52 in 2020, 64 in 2021, 76 in 2022, 82 in 2023, and 83 in 2024. The daily stock data and the market capitalization data are from Yahoo Finance.
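The preprocessing described above can be sketched for a single stock as follows. Several choices are assumptions because the text leaves them open: suspended days are taken to be marked as NaN in the price series, the log of a zero trading volume is handled with `log(1 + volume)`, and the 5-day rolling volatility uses the sample standard deviation (`ddof=1`). The function name `build_features` is hypothetical.

```python
import numpy as np


def build_features(close, volume, vol_window=5):
    """Return aligned arrays (log return, log volume, rolling volatility) for one stock."""
    close = np.asarray(close, dtype=float).copy()
    volume = np.asarray(volume, dtype=float).copy()
    for t in range(1, len(close)):
        if np.isnan(close[t]):       # ASSUMPTION: NaN marks a suspended day
            close[t] = close[t - 1]  # forward-fill the price
            volume[t] = 0.0          # no trading on that day
    log_ret = np.diff(np.log(close))
    log_vol = np.log1p(volume[1:])   # ASSUMPTION: log(1 + v) so that zero volume stays finite
    windows = np.lib.stride_tricks.sliding_window_view(log_ret, vol_window)
    volat = windows.std(axis=1, ddof=1)  # rolling std of log returns
    k = vol_window - 1                   # drop days without a full volatility window
    return log_ret[k:], log_vol[k:], volat
```

All three outputs have the same length and are aligned to the last day of each rolling window, so they can be stacked directly into the three layers used by the causality tests.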

4. Empirical Results

4.1. Cluster Result

Figure 2 shows the change in the number of stocks in each cluster over five years. The horizontal axis represents the year and month, while the vertical axis represents the number of stocks in each cluster.

Figure 2. Change in the number of cluster stocks from 2020 to 2024.

Figure 2 reveals that the numbers of all three stock types varied significantly over time. The numbers of affected and isolated stocks fluctuated sharply, while the number of influence stocks fluctuated relatively mildly. Additionally, from 2020 to 2024, Isolated Type stocks were the majority, while Influence Type stocks were the minority; most stocks were a “weak link” in the causal network. The dominance of isolated stocks also indicates the significant effect of the HSI adjustment and optimization plan implemented after 2021. This optimization focuses on introducing more representative stocks from different industries so that the industry distribution of constituent stocks is closer to real market conditions. It improves the index’s coverage of the overall market while also achieving a reasonable level of representativeness for individual industry groups, better reflecting the overall trend of the HKG. The results also indicate that most of the time, especially from August 2021 to December 2022, the number of affected stocks was negatively correlated with the number of isolated stocks, reflecting that the market was mainly undergoing style rotation between trend-following and non-trend-following stocks. Influence Type stocks, meanwhile, may exert significant impacts on other stocks under specific conditions, as their market behavior and performance can potentially guide or reshape relevant trading trends.

4.2. The Relationship between Cluster Type and Market Value Factor

Market value is an important indicator for measuring the value of a company. Since market capitalization exhibits a strong correlation with stock price movements and considering our goal of constructing stock clusters that can aid in predicting future price trends, we integrate market capitalization with cluster types for a more comprehensive analytical framework. Figure 3 provides an intuitive representation of the intrinsic correlation between market capitalization scale and clusters. The x‑axis denotes the average market capitalization of each stock during the observation period. The y‑axis represents the proportion of 60 months in which each stock was classified as isolated. The color of each scatter point indicates the proportion of months in which the stock was categorized as either influential or affected. We can see that the higher the market value, the lower the overall isolation of stock.

We further examined the stocks falling in the four regions defined by high and low market capitalization and high and low isolation (a total of 35 stocks). The selected stocks are presented in Table 1.
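One way to read the four regions is as the corners of a grid cut at the 33.3% and 66.7% percentiles mentioned in the caption of Figure 3. The sketch below assumes exactly that reading. The helper names `percentile` and `corner_regions`, the linear-interpolation percentile rule, and the toy values are illustrative assumptions, not the authors’ procedure.

```python
def percentile(values, q):
    """Linearly interpolated percentile, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def corner_regions(stocks, low_q=100 / 3, high_q=200 / 3):
    """Assign each stock to one of four corner regions, or None.

    stocks: dict ticker -> (avg_market_cap, isolated_ratio), where
    isolated_ratio is the share of months the stock was labelled isolated.
    """
    caps = [cap for cap, _ in stocks.values()]
    isos = [iso for _, iso in stocks.values()]
    cap_lo, cap_hi = percentile(caps, low_q), percentile(caps, high_q)
    iso_lo, iso_hi = percentile(isos, low_q), percentile(isos, high_q)

    def level(x, lo, hi):
        if x <= lo:
            return "low"
        if x >= hi:
            return "high"
        return None  # middle band, excluded from the four regions

    regions = {}
    for ticker, (cap, iso) in stocks.items():
        cap_level = level(cap, cap_lo, cap_hi)
        iso_level = level(iso, iso_lo, iso_hi)
        if cap_level and iso_level:
            regions[ticker] = f"{cap_level}-cap/{iso_level}-isolation"
        else:
            regions[ticker] = None
    return regions

# Toy inputs (hypothetical tickers and values, not the paper's data).
toy = {
    "A": (900.0, 0.10),
    "B": (800.0, 0.90),
    "C": (50.0, 0.15),
    "D": (40.0, 0.85),
    "E": (400.0, 0.50),
    "F": (420.0, 0.55),
}
regions = corner_regions(toy)
```

In this toy set, tickers A to D each land in a different corner, while E and F fall in the middle band on both axes and are left unassigned.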

The beta coefficient measures the sensitivity of an asset to market fluctuations (Sharpe 1964). Introducing the beta coefficient helps us further analyze the differences among the four categories of stocks. Figure 4 presents the 5-year average beta coefficients of the four stock types and shows that the beta coefficients of high-market-cap stocks are larger than 1, whereas those of low-market-cap stocks are smaller than 1.

Figure 3. 5-year average: market cap vs isolated ratio (33.3% and 66.7% percentiles).

Table 1. High and low market value and high and low isolated stocks.

Figure 4. 5-year average beta value of four types of stocks.

This indicates that high-market-cap stocks are more volatile than the HSI, while low-market-cap stocks fluctuate less than the market. High-market-cap stocks with low isolation have a beta close to 1, showing a strong correlation with HSI movements. In contrast, high-market-cap stocks with high isolation have a higher beta (beta = 1.11), meaning that their price fluctuations are more pronounced than the market average. Low-isolation stocks show better risk resistance because of their low beta (less than 1).
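For readers who want to reproduce the interpretation of beta above, the following minimal sketch uses the textbook definition, the covariance of asset and market returns divided by the variance of market returns. The function name `beta` and the toy return series are assumptions for illustration; the return frequency and estimation window used in the study are not specified here.

```python
def beta(asset_returns, market_returns):
    """CAPM beta: Cov(r_asset, r_market) / Var(r_market)."""
    n = len(market_returns)
    if n != len(asset_returns) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    # The 1/(n-1) factors of covariance and variance cancel in the ratio.
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_returns, market_returns))
    var_m = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var_m

# Toy market returns (hypothetical) and two assets that scale them.
market = [0.01, -0.02, 0.015, -0.005, 0.02]
amplified = [1.2 * r for r in market]  # moves more than the market
damped = [0.8 * r for r in market]     # moves less than the market
beta_amplified = beta(amplified, market)
beta_damped = beta(damped, market)
```

An asset that amplifies market moves gets a beta above 1 and one that dampens them gets a beta below 1, which is the reading applied to the high- and low-market-cap groups.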

Alpha is a core indicator in finance for measuring the excess returns of investment portfolios. The higher the alpha, the better the return of the asset relative to the market average return (Jensen 1968). Figure 5 shows that, among low-market-cap stocks, the 5-year average alpha of low-isolation stocks exceeds that of high-isolation stocks, which indicates that low-isolation stocks achieve better returns thanks to capital spillover effects from other constituent stocks.

Figure 5. 5-year average annual alpha of two low market cap categories.
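The exact computation of alpha is not spelled out in this section. Assuming the standard definition from Jensen (1968), with the HSI as the market portfolio, alpha would take the form below. The symbols $\bar{R}_i$ (average return of stock $i$), $\bar{R}_m$ (average return of the HSI), $R_f$ (risk-free rate), and $\beta_i$ (beta of stock $i$ relative to the HSI) are notation introduced here for illustration.

```latex
\alpha_i = \bar{R}_i - \left[ R_f + \beta_i \left( \bar{R}_m - R_f \right) \right]
```

Under this definition, a positive $\alpha_i$ means the stock earned more than its market exposure alone would predict.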

4.3. Temporal Evolution of HSI Constituent Role Changes and Event Mapping (2020-2024)

This section examines changes in the roles of HSI constituents from 2020 to 2024. It uses the monthly role-change ratio as the main indicator, supplemented by the monthly count of stocks that switch roles. The analysis accounts for the annual expansion in index constituents and links fluctuations to major market events. A higher ratio suggests widespread role changes and major changes to key nodes and causal paths, whereas a lower ratio indicates more localized changes within an established structure. Figure 6 illustrates the changes in cluster roles from 2020 to 2024.

Figure 6. Monthly counts and ratios of role changes among HSI constituents (2020-2024).
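The monthly role-change count and ratio can be sketched as follows. This is an illustration under an explicit assumption: the ratio is taken as the number of stocks whose role differs from the previous month divided by the number of stocks present in both months, so that newly added constituents do not count as role changes. The paper does not state its exact denominator, and the function name `role_change_series` and the toy labels are hypothetical.

```python
def role_change_series(labels_by_month):
    """Monthly role-change counts and ratios.

    labels_by_month: list of dicts ticker -> role, in chronological order.
    Only tickers present in both consecutive months are compared, so that
    newly added constituents do not register as role changes.
    """
    series = []
    for prev, curr in zip(labels_by_month, labels_by_month[1:]):
        common = prev.keys() & curr.keys()
        changed = sum(1 for t in common if prev[t] != curr[t])
        ratio = changed / len(common) if common else 0.0
        series.append((changed, ratio))
    return series

# Toy labels for three months (hypothetical tickers, not the paper's data).
toy = [
    {"A": "isolated", "B": "affected", "C": "influential", "D": "isolated"},
    {"A": "affected", "B": "affected", "C": "influential", "D": "isolated"},
    {"A": "isolated", "B": "isolated", "C": "affected", "D": "isolated",
     "E": "isolated"},  # E joins the index in the third month
]
series = role_change_series(toy)
```

In the toy data, one of four stocks switches role in the second month and three of four in the third, while the new constituent E is ignored when the third month is compared with the second.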

(1) 2020 - Pandemic shock, then convergence

In Q1 2020, role changes surged as news of the pandemic, restrictions, and the global sell-off rapidly repriced “influence” in the network. The cyclical and financial sectors weakened, while parts of the technology, health care, and telecommunications sectors temporarily became new influence nodes, pushing the ratio of influence to a high level. A stimulus-driven rebound followed from April to July, which kept reshuffling elevated through sector rotation. From August to December, however, expectations stabilized and role changes fell, with most of the reshuffling occurring within existing roles.

(2) 2021 - Gradual adjustment under regulation and property stress

The early months of 2021 were characterized by liquidity and reopening, which kept the structure broadly stable with moderate turnover. However, from May onwards, platform regulation (internet, education, and data) and real-estate credit risk led to persistent but dispersed role migration, with some large platforms becoming less influential and others gaining prominence. Volatility was moderate, without the “collapse and rebuild” that occurred in 2020.

(3) 2022 - Second major break in March

Inflation, tightening expectations, and geopolitical uncertainty culminated in a “plunge and rebound” in March, creating sharp inflection points in returns and volatility and forcing widespread rewiring of causal links. The ratio of influence spiked to an exceptional level. After April, the market digested the new valuation regime, and role changes quickly cooled into frequent small adjustments rather than single-month extremes.

(4) 2023 - Policy-expectation rotation

In January and February, roles shifted towards consumption and offline services, driven by optimism about reopening. However, as the recovery disappointed and uncertainty persisted, changes became fine-tuning within an established structure, and the ratio of influence remained low to moderate for most months.

(5) 2024 - Capitulation to reconstruction

Early year historic lows triggered “cleansing” and defensive dominance with elevated role turnover. Subsequent policy support and recovery expectations drove a second reshuffling as financial.

5. Conclusion

By employing the TY-Granger test and the GMM clustering method, we obtained the monthly cluster type of each constituent stock and found that large-market-cap, low-isolation stocks perform better in the market than other stock groups, as they better reflect market trends and the effectiveness of investment diversification. Moreover, systemic risk events, including the COVID-19 pandemic and the conflict between Russia and Ukraine, had a significant impact on the strength of Granger causality among stocks.

Firstly, within the HSI, Isolated Type stocks constitute the predominant market segment, whereas Influence Type stocks are scarce. This structural characteristic is primarily attributable to the index’s weight cap mechanism and its ongoing sector diversification strategy. Diversifying the industry categories and increasing the total number of constituent stocks markedly enhance both the representativeness and the stability of the HSI.

Secondly, there is a significant correlation between market value and stock cluster type. By virtue of their high index weights and robust liquidity, large-market-cap stocks substantially outperform low-market-cap stocks in terms of market attention and trading activity, which makes them popular investment targets. In addition, the average beta coefficient of low-market-cap stocks is less than 1, which indicates that their price movements are less volatile than the market average. Further analysis using alpha reveals that low-market-cap, low-isolation stocks deliver better returns than low-market-cap, high-isolation stocks.

Thirdly, the monthly clustering results indicate that the causal roles of HSI constituents are time-varying and event-driven rather than static. The monthly role-change ratio is used to quantify “structural reorganization intensity,” and it identifies two episodes of large-scale reorganization: first, the impact of the pandemic in early 2020, and second, the escalation of the Russia-Ukraine conflict alongside macroeconomic tightening in March 2022. During these periods, role transitions were highly concentrated, indicating a non-stationary state in the causal network. By contrast, shifts in 2021 and 2023 were more dispersed and moderate, in line with the gradual changes in regulation, credit conditions, and policy expectations. In 2024, a two-stage role reconfiguration was observed: following an unusually subdued phase, policy measures were implemented, restoring market activity and producing a partial rebound in network restructuring intensity. This evidence suggests that portfolio construction and risk monitoring should adapt to differing market environments. When the role-change ratio shifts abruptly, relying on a single fixed causal structure becomes riskier, and shorter-window updates or scenario-based diversification strategies may be more appropriate. Conversely, when roles remain stable, historical causal structures hold greater value for asset allocation and hedging strategies. The intensity of role transitions thus provides an interpretable signal for identifying structural risk events, timing rebalancing, and monitoring the emergence of “impact” factors during periods of stress.

In conclusion, this study pioneers a three-dimensional Granger causality framework to investigate the dynamic trends within the HKG. This method not only fills a gap in existing approaches to HKG research and stock clustering but also provides a new avenue for research in Granger causality, stock market analysis, and risk management, offering both practical and theoretical value to the field.

Acknowledgements

We deeply appreciate the funding from the Basic and Applied Basic Research Foundation of Guangdong Province (Project No. 2023B1515130002) and the Shenzhen Science and Technology Plan Project (Shenzhen-Hong Kong SAR, China-Macau SAR, China Category C, No. SGDX20220530111001003), which made this research possible.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1]	Abinaya, P., Kumar, V. S., Balasubramanian, P., & Menon, V. K. (2016). Measuring Stock Price and Trading Volume Causality among Nifty50 Stocks: The Toda Yamamoto Method. In 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 1886-1890). IEEE.
[2]	Ahmed, R. R., Vveinhardt, J., Streimikiene, D., & Fayyaz, M. (2017). Multivariate Granger Causality between Macro Variables and KSE 100 Index: Evidence from Johansen Cointegration and Toda & Yamamoto Causality. Economic Research-Ekonomska Istraživanja, 30, 1497-1521.
[3]	Akaike, H. (1974). A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19, 716-723.
[4]	Bai, S., Cui, W., & Zhang, L. (2018). The Granger Causality Analysis of Stocks Based on Clustering. Cluster Computing, 22, 14311-14316.
[5]	Będowska-Sójka, B., & Kliber, A. (2019). The Causality between Liquidity and Volatility in the Polish Stock Market. Finance Research Letters, 30, 110-115.
[6]	Ben Jebli, M., & Ben Youssef, S. (2016). Renewable Energy Consumption and Agriculture: Evidence for Cointegration and Granger Causality for Tunisian Economy. International Journal of Sustainable Development & World Ecology, 24, 149-158.
[7]	Boccaletti, S., Bianconi, G., Criado, R., del Genio, C. I., Gómez-Gardeñes, J., Romance, M. et al. (2014). The Structure and Dynamics of Multilayer Networks. Physics Reports, 544, 1-122.
[8]	Bu, H., Tang, W., & Wu, J. (2019). Time-varying Comovement and Changes of Comovement Structure in the Chinese Stock Market: A Causal Network Method. Economic Modelling, 81, 181-204.
[9]	Dai, Z., Zhou, H., Wen, F., & He, S. (2020). Efficient Predictability of Stock Return Volatility: The Role of Stock Market Implied Volatility. The North American Journal of Economics and Finance, 52, Article ID: 101174.
[10]	de Pontes, L. S., & Rêgo, L. C. (2022). Impact of Macroeconomic Variables on the Topological Structure of the Brazilian Stock Market: A Complex Network Approach. Physica A: Statistical Mechanics and Its Applications, 604, Article ID: 127660.
[11]	Dunn, O. J. (1961). Multiple Comparisons among Means. Journal of the American Statistical Association, 56, 52-64.
[12]	Dutta, A. (2018). Oil and Energy Sector Stock Markets: An Analysis of Implied Volatility Indexes. Journal of Multinational Financial Management, 44, 61-68.
[13]	Forney, G. D. (1973). The Viterbi Algorithm. Proceedings of the IEEE, 61, 268-278.
[14]	Gao, X., Huang, S., Sun, X., Hao, X., & An, F. (2018). Modelling Cointegration and Granger Causality Network to Detect Long-Term Equilibrium and Diffusion Paths in the Financial System. Royal Society Open Science, 5, Article 172092.
[15]	Ghosh, S., & Kanjilal, K. (2016). Co-movement of International Crude Oil Price and Indian Stock Market: Evidences from Nonlinear Cointegration Tests. Energy Economics, 53, 111-117.
[16]	Granger, C. W. J. (1969). Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica, 37, 424-438.
[17]	Gündüz, L., & Hatemi-J, A. (2005). Stock Price and Volume Relation in Emerging Markets. Emerging Markets Finance and Trade, 41, 29-44.
[18]	Holm, S. (1979). A Simple Sequentially Rejective Multiple Test Procedure. Scandinavian Journal of Statistics, 6, 65-70. http://www.jstor.org/stable/4615733
[19]	Huang, C., Wen, S., Li, M., Wen, F., & Yang, X. (2021). An Empirical Evaluation of the Influential Nodes for Stock Market Network: Chinese A-Shares Case. Finance Research Letters, 38, Article ID: 101517.
[20]	Hurvich, C. M., & Tsai, C. (1989). Regression and Time Series Model Selection in Small Samples. Biometrika, 76, 297-307.
[21]	Jensen, M. C. (1968). The Performance of Mutual Funds in the Period 1945-1964. The Journal of Finance, 23, 389-416.
[22]	Khurshid, M., & Kirkulak-Uludag, B. (2021). Shock and Volatility Spillovers between Oil and Emerging Seven Stock Markets. International Journal of Energy Sector Management, 15, 933-948.
[23]	Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y., & Porter, M. A. (2014). Multilayer Networks. Journal of Complex Networks, 2, 203-271.
[24]	Kleinberg, J. M. (1999). Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, 46, 604-632.
[25]	Le, T., & Chang, Y. (2015). Effects of Oil Price Shocks on the Stock Market Performance: Do Nature of Shocks and Economies Matter? Energy Economics, 51, 261-274.
[26]	McLachlan, G., & Peel, D. (2000). Finite Mixture Models. Wiley.
[27]	Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web (Technical Report No. 1999-66). Stanford InfoLab, Stanford University. http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
[28]	Papana, A., Kyrtsou, C., Kugiumtzis, D., & Diks, C. (2017). Financial Networks Based on Granger Causality: A Case Study. Physica A: Statistical Mechanics and Its Applications, 482, 65-73.
[29]	Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77, 257-286.
[30]	Rashid, A. (2007). Stock Prices and Trading Volume: An Assessment for Linear and Nonlinear Granger Causality. Journal of Asian Economics, 18, 595-612.
[31]	Shahzad, F., Bouri, E., Mokni, K., & Ajmi, A. N. (2021). Energy, Agriculture, and Precious Metals: Evidence from Time-Varying Granger Causal Relationships for Both Return and Volatility. Resources Policy, 74, Article ID: 102298.
[32]	Silvapulle, P., & Choi, J. (1999). Testing for Linear and Nonlinear Granger Causality in the Stock Price-Volume Relation: Korean Evidence. The Quarterly Review of Economics and Finance, 39, 59-76.
[33]	Sims, C. A. (1980). Macroeconomics and Reality. Econometrica, 48, 1-48.
[34]	Stouffer, S. A. (1977). The American Soldier: Adjustment during Army Life. M A/A H Publishing.
[35]	Tang, Y., Xiong, J. J., Luo, Y., & Zhang, Y. (2019). How Do the Global Stock Markets Influence One Another? Evidence from Finance Big Data and Granger Causality Directed Network. International Journal of Electronic Commerce, 23, 85-109.
[36]	Toda, H. Y., & Yamamoto, T. (1995). Statistical Inference in Vector Autoregressions with Possibly Integrated Processes. Journal of Econometrics, 66, 225-250.
[37]	Vovk, V., & Wang, R. (2021). E-values: Calibration, Combination and Applications. The Annals of Statistics, 49, 1736-1754.
[38]	Zhang, P., Yin, S., & Sha, Y. (2023). Global Systemic Risk Dynamic Network Connectedness during the COVID-19: Evidence from Nonlinear Granger Causality. Journal of International Financial Markets, Institutions and Money, 85, Article ID: 101783.
