<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">OJBM</journal-id><journal-title-group><journal-title>Open Journal of Business and Management</journal-title></journal-title-group><issn pub-type="epub">2329-3284</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ojbm.2021.95127</article-id><article-id pub-id-type="publisher-id">OJBM-111962</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Business&amp;Economics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Can Machine Learning Unlock the Continuous Alpha? Empirical Study Based on China A-Share Market
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Lin</surname><given-names>Ya</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Ye</surname><given-names>Rendao</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Hangzhou Dianzi University, Hangzhou, China</addr-line></aff><pub-date pub-type="epub"><day>09</day><month>08</month><year>2021</year></pub-date><volume>09</volume><issue>05</issue><fpage>2358</fpage><lpage>2369</lpage><history><date date-type="received"><day>14</day><month>August</month><year>2021</year></date><date date-type="rev-recd"><day>13</day><month>September</month><year>2021</year></date><date date-type="accepted"><day>16</day><month>September</month><year>2021</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc.</copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  With the development of fintech and artificial intelligence, machine learning algorithms are widely used in quantitative investment. Using data on the companies listed in the China A-share market from February 2005 to July 2020, this paper builds quantitative stock selection models based on machine learning algorithms to obtain continuous alpha returns. The results show that machine learning algorithms can effectively identify the relationship between factors and returns and thereby improve the performance of quantitative stock selection models. The China A-share market is a weak-form efficient market: by mining factor information that the market has not fully digested, continuous alpha returns can be obtained. The ensemble algorithms represented by extremely randomized trees (ET) and the light gradient boosting machine (LGBM) perform best in stock market prediction.
 
</p></abstract><kwd-group><kwd>Quantitative Investment</kwd><kwd>Efficient Market Hypothesis</kwd><kwd>Machine Learning</kwd><kwd>Alpha Return</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>The Efficient Markets Hypothesis (EMH) is a theoretical cornerstone of modern financial economics (Zhang et al., 2016). Malkiel &amp; Fama (1970) systematically elaborated the EMH and, according to the information reflected in prices, divided markets into three types: strong-form, semi-strong-form, and weak-form efficient. Under the strong-form EMH, all assets are priced to reflect their true value. No information in the market remains unexploited, so continuous alpha returns cannot be obtained. The efficiency of China’s capital market has been tested empirically from many angles. For example, Zhang &amp; Zhang (2005) found that the weak-form EMH could not be rejected for China’s futures market based on logarithmic futures price series. Huang et al. (2008) pointed out that the market can achieve Pareto optimality in advance when the two information conditions of strong symmetry and strong perfection are both satisfied, but these strict conditions are often not met in practice. Based on an empirical analysis of panel data, Wang &amp; Su (2013) showed that the internal capital market of China’s listed companies was efficient. From the perspective of behavioral finance, Zhang (2015) and Ding et al. (2017) argued that investor “irrationality” is particularly prominent in China, which undermines the theoretical premise of the EMH. Moreover, the weak-form EMH is supported by the long-term performance of capital markets: most portfolios constructed by professional managers have failed to outperform the market index over the long term.</p><p>Views on the EMH have evolved into two investment philosophies. The first, based on the strong-form EMH, holds that the market is always correct. 
Specifically, information about an asset’s value is fully reflected in its price, because any temporary mispricing is quickly eliminated by the invisible hand of the market; thus, no one can obtain continuous alpha returns. The second philosophy, based on the weak-form EMH, holds that the market can be beaten: “smart money” can detect mispricing and earn alpha until the market reaches equilibrium. China’s capital market is still incomplete, and deviations in asset pricing and inconsistencies across markets give quantitative investment models opportunities to obtain alpha returns.</p><p>The quantitative investment model is an innovative form of financial technology that combines computer technology with securities price prediction, and it can improve both asset management efficiency and investment performance. Thanks to long-established capital markets and a favorable environment for fintech innovation, quantitative investment is quite mature in several developed countries, and it has also become a hot topic in China. As of February 2021, there were 567 quantitative public funds in China, among which China and Europe Quant Drive Hybrid (001980), Shanghai Investment Morgan Alpha Hybrid (377010), Guangfa Contrarian Strategy Hybrid (000747), and Invesco Great Wall Quantitative Selected Stock (000978) have all performed well. Quantitative investment can currently be divided into two types: technical and fundamental. The former focuses on the real-time market, uses computing power to find arbitrage opportunities, and competes on tick-level speed. 
The latter focuses on underlying value, seeking out undetected mispricing and profiting from it.</p><p>In China, against the macro background of revitalizing the real economy, steering capital from the virtual economy to the real economy, and promoting financial services for the real economy, the fundamental quantitative model has attracted growing attention, and the factor model is the most widely used among such models. The factor model is built on modern financial frameworks, including the capital asset pricing model (CAPM) and arbitrage pricing theory (APT) (Wang, 2016). Its factors are widely recognized by market participants as explaining and predicting stock returns and risks, so finding effective factors is crucial to the performance of the factor model (Wang, 2017). Under Markowitz’s assumptions, the CAPM holds that expected stock returns are linearly related only to systematic risk (Sharpe, 1964). However, complete efficiency is hard to achieve in a real market filled with asymmetric information, and stock prices can still be explained by factors other than systematic risk. To exploit this, many multi-factor models have been established, such as the Fama-French three-factor and five-factor models. On this basis, the effectiveness of factors in China’s market has been tested empirically. For example, Li et al. (2017) tested the Fama-French five-factor model in China’s stock market and found that it outperformed the CAPM, the three-factor model, and the Carhart four-factor model. From the three aspects of safety, cheapness, and quality, Hu &amp; Gu (2018) selected 8 anomaly factors as comprehensive indicators and tested the applicability of Buffett’s alpha strategy in the China A-share market. However, with the advent of the big data era, these traditional models are unable to digest massive and dynamic information. 
As a consequence, machine learning algorithms have been introduced into quantitative investment models.</p><p>Machine learning is a data mining technology based on artificial intelligence that has been widely used in finance, economics, psychology, biomedicine, and other fields. It can not only learn the complex relationships behind the data but also improve its performance through repeated training. Research on quantitative investment with machine learning algorithms in foreign markets is abundant. For example, Nair et al. (2010) used decision trees (DT), neural networks (NN), and naive Bayes to identify upward and downward trends in stock prices and compared the performance of different investment strategies. Kourentzes et al. (2014) found that an ensemble NN model was superior to a single model and that ensemble learning could improve the accuracy and robustness of prediction. Choudhry &amp; Garg (2008) developed a hybrid machine learning approach combining a genetic algorithm and a support vector machine (SVM) to predict stock prices. In China, Chen &amp; Yu (2014) used a heuristic algorithm to extract data features and then constructed an SVM-based quantitative stock selection model whose annualized return was significantly better than the benchmark over the same period. Yu et al. (2015) established a grey NN model based on the Shanghai Securities Composite Index and introduced the E-GARCH model to predict individual stock returns. Based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network, Sun &amp; Bi (2018) constructed a binary classification model of stock price rises and falls that showed strong profitability and generalization ability. Li et al.
(2019) compared the performance of more than 10 machine learning algorithms in stock price prediction, including Lasso regression, gradient boosting trees, and ensemble NNs.</p><p>In this paper, quantitative stock selection models with machine learning algorithms are built for the companies listed in the China A-share market. The contributions of this paper are threefold. Firstly, it enriches the empirical research on alpha returns in the China A-share market and provides empirical support for the weak-form EMH. Secondly, it combines machine learning algorithms with the classical multi-factor model for quantitative stock selection, which makes more efficient use of factor information and improves the performance of the investment model. Thirdly, it compares the performance of 16 algorithms in quantitative stock selection models, which enriches academic research in this emerging interdisciplinary field.</p></sec><sec id="s2"><title>2. Research Design</title><sec id="s2_1"><title>2.1. Model Design</title><p>The framework of the quantitative stock selection model with a machine learning algorithm is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p><p>As shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>, the task of the machine learning module is to learn, from the training sets, an asset return prediction function with good generalization ability. First, assume that</p><disp-formula id="e1"><label>(1)</label><tex-math>R_i = f(\mathbf{x}_i; \theta) + \varepsilon,</tex-math></disp-formula><p>where <inline-formula><tex-math>R_i</tex-math></inline-formula> is the stock return of the <inline-formula><tex-math>i</tex-math></inline-formula>-th company, <inline-formula><tex-math>f(\cdot)</tex-math></inline-formula> is the asset return prediction function, <inline-formula><tex-math>\mathbf{x}_i = (x_{i1}, \dots, x_{ik})</tex-math></inline-formula> is the factor vector of the <inline-formula><tex-math>i</tex-math></inline-formula>-th company, <inline-formula><tex-math>\theta</tex-math></inline-formula> is the parameter, and <inline-formula><tex-math>\varepsilon</tex-math></inline-formula> is the error term.</p><p>Then, using the asset return prediction function <inline-formula><tex-math>f(\cdot)</tex-math></inline-formula> trained in the machine learning module, the next-period returns of the stocks in the current stock pool are predicted. 
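</p><p>The machine learning module can be illustrated with the following minimal sketch. It is not the authors’ implementation. It assumes scikit-learn and NumPy, replaces the CSMAR sample with synthetic data, and uses the extremely randomized trees regressor (“ExtraTreesRegressor”) as one possible choice of f(·). The hyperparameters are tuned with scikit-learn’s “GridSearchCV”, in the spirit of the grid search described in Section 2.3. The sample sizes, the hyperparameter grid, and the linear function used to generate the synthetic returns are all hypothetical.</p><preformat>
```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic training set (not CSMAR data). Each row of X_train stands for a lagged
# factor vector x_i with 20 factors, as in Table 1. Each entry of y_train stands for
# the return R_i realized one period later, generated as R_i = f(x_i; theta) + eps
# with a linear f chosen only for this demo.
n_obs, n_factors = 600, 20
X_train = rng.standard_normal((n_obs, n_factors))
theta = 0.01 * rng.standard_normal(n_factors)
y_train = X_train @ theta + 0.02 * rng.standard_normal(n_obs)

# Hypothetical hyperparameter grid; the paper does not report the grid it searched.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
f_hat = search.best_estimator_  # fitted f(.), refit on all training rows

# Predict next-period returns for a synthetic current stock pool of 150 stocks.
X_pool = rng.standard_normal((150, n_factors))
predicted_returns = f_hat.predict(X_pool)
print(search.best_params_, predicted_returns[:3])
```
</preformat><p>For brevity, the grid search here uses plain k-fold cross-validation. For stock-month panel data, a time-ordered split such as scikit-learn’s “TimeSeriesSplit” would be a more conservative choice.</p><p>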
To avoid look-ahead bias, all factor data lag the stock return data by at least one period; that is, every factor value used for a prediction is already available when the prediction is made. Based on the predictions, the model buys or holds the stocks ranked in the top 1% of predicted returns, sells any stock whose ranking falls out of this range, and holds the selected stocks in an equal-weighted portfolio. Because short selling in the China A-share market is subject to high thresholds and restrictions, only long positions are considered.</p><p>Finally, the alpha return of the model is calculated as</p><disp-formula id="e2"><label>(2)</label><tex-math>\alpha = R_a - \left( R_f + \beta \left( R_M - R_f \right) \right),</tex-math></disp-formula><p>where <inline-formula><tex-math>R_a</tex-math></inline-formula> is the monthly return of the model portfolio, <inline-formula><tex-math>R_M</tex-math></inline-formula> is the monthly return of the market (benchmark), <inline-formula><tex-math>R_f</tex-math></inline-formula> is the risk-free rate (compounded monthly), and <inline-formula><tex-math>\beta</tex-math></inline-formula> is the sensitivity of the model return to fluctuations in the market return, calculated as</p><disp-formula id="e3"><label>(3)</label><tex-math>\beta = \frac{\operatorname{Cov}(R_a, R_M)}{\operatorname{Var}(R_M)},</tex-math></disp-formula><p>where <inline-formula><tex-math>\operatorname{Cov}(\cdot)</tex-math></inline-formula> denotes the covariance and <inline-formula><tex-math>\operatorname{Var}(\cdot)</tex-math></inline-formula> the variance.</p><p>If <inline-formula><tex-math>\alpha &gt; 0</tex-math></inline-formula>, the model outperforms the benchmark after adjusting for market risk; if <inline-formula><tex-math>\alpha = 0</tex-math></inline-formula>, it performs on par with the benchmark; and if <inline-formula><tex-math>\alpha &lt; 0</tex-math></inline-formula>, it underperforms the benchmark.</p></sec><sec id="s2_2"><title>2.2. Dynamic Time Window</title><p>At the beginning of each investment round, the training set is built from the realized factors and returns. Let the interval of the full sample be <inline-formula><tex-math>[1, T]</tex-math></inline-formula> and let the time window of every training set be <inline-formula><tex-math>w</tex-math></inline-formula>. Then the interval of the <inline-formula><tex-math>n</tex-math></inline-formula>-th training set is <inline-formula><tex-math>[n, n + w]</tex-math></inline-formula>. The design of the dynamic time window for <inline-formula><tex-math>w = 12</tex-math></inline-formula> months is shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p></sec><sec id="s2_3"><title>2.3. Machine Learning Algorithms</title><p>Machine learning encompasses many families of prediction functions and algorithms. 
In this paper, 16 representative algorithms are selected. The 8 linear algorithms are ordinary least squares (OLS) regression, partial least squares (PLS) regression, Ridge, Bayesian Ridge, Lasso, LassoLars, Elastic Net, and the linear support vector regression (LSVR) machine. The 3 single machine learning algorithms are the support vector regression (SVR) machine, the decision tree (DT), and the gradient boosting decision tree (GBDT). The 5 ensemble machine learning algorithms are random forest (RF), adaptive boosting (AdaBoost), extremely randomized trees (ET), extreme gradient boosting (XGBoost), and the light gradient boosting machine (LGBM). The algorithms are implemented in Python with the “sklearn”, “xgboost”, and “lightgbm” packages, and the grid search (“GridSearch”) method is used to tune the hyperparameters during training.</p></sec><sec id="s2_4"><title>2.4. Data Source and Sample Selection</title><p>The sample consists of the companies listed in the China A-share market from February 2005 to July 2020. This period spans several economic cycles, which strengthens the robustness of the empirical results. The portfolio is rebalanced monthly. The key variable is the monthly stock return with cash dividends reinvested, and the benchmark is the Shanghai Securities Composite Index (000001). The sample data are obtained from the CSMAR database. To improve the reliability and accuracy of the empirical results, the samples are filtered and processed as follows at the start of each round.</p><p>1) Exclude ST and *ST stocks and stocks listed for less than one year.</p><p>2) Exclude stocks whose data are largely missing because of prolonged trading suspensions.</p><p>3) Fill any remaining missing factor values with 0.</p><p>4) Standardize the data by z-score. 
Because the factors are measured on different scales, which would increase the complexity of the algorithms and degrade model performance, each factor is standardized by z-score.</p><p>Selecting effective factors is fundamental to enhancing the model’s ability to capture information and to improving investment performance. However, many research reports by securities firms are driven solely by data or solely by models. In this paper, the construction of the factor pool starts from a careful review of the literature. On this basis, each company is evaluated from four aspects, and the factors are divided into four types: transaction friction, profitability, valuation, and growth. The selected factors are shown in <xref ref-type="table" rid="table1">Table 1</xref>.</p></sec></sec><sec id="s3"><title>3. Empirical Results and Analysis</title><sec id="s3_1"><title>3.1. Portfolio Performance Analysis</title><p>The model is built on the factor pool and the dynamic time window design. With a time window of <inline-formula><tex-math>w = 12</tex-math></inline-formula> months, the number of rounds is <inline-formula><tex-math>N = 174</tex-math></inline-formula>. Following the data cleaning rules in the previous section, the samples of all China A-share companies are processed at the start of each round. 
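</p><p>The round structure and the per-round cleaning steps can be sketched as follows, assuming pandas and NumPy. This is an illustration, not the authors’ code. It reads the window so that round n trains on the w months n, …, n + w − 1 and predicts month n + w. That reading reproduces N = 174 for the T = 186 months from February 2005 to July 2020 with w = 12. The column names “is_st” and “months_listed” and the 50% data-coverage threshold used for step 2) are hypothetical, because the paper does not state them.</p><preformat>
```python
import numpy as np
import pandas as pd


def rolling_rounds(T: int, w: int):
    """Yield (train_months, test_month) for every round; months are numbered 1..T.

    Assumed reading of the dynamic window: round n trains on months n..n+w-1
    and predicts month n+w, which gives N = T - w rounds.
    """
    for n in range(1, T - w + 1):
        yield list(range(n, n + w)), n + w


def clean_cross_section(df: pd.DataFrame, factor_cols, min_coverage: float = 0.5) -> pd.DataFrame:
    """Apply cleaning steps 1)-4) of Section 2.4 to one round's cross-section.

    'is_st', 'months_listed' and min_coverage are hypothetical names/values.
    """
    # 1) Drop ST / *ST stocks and stocks listed for less than one year.
    out = df[~df["is_st"]]
    out = out[out["months_listed"] >= 12]
    # 2) Drop stocks whose factor data are largely missing (e.g. long suspensions).
    coverage = out[factor_cols].notna().mean(axis=1)
    out = out[coverage >= min_coverage].copy()
    # 3) Fill any remaining missing factor values with 0.
    out[factor_cols] = out[factor_cols].fillna(0.0)
    # 4) Z-score each factor across the stocks in this round.
    mean = out[factor_cols].mean()
    std = out[factor_cols].std(ddof=0).replace(0.0, 1.0)  # avoid dividing by zero
    out[factor_cols] = (out[factor_cols] - mean) / std
    return out


# T = 186 months (February 2005 to July 2020) and w = 12 give N = 174 rounds.
rounds = list(rolling_rounds(T=186, w=12))
print(len(rounds), rounds[0][1], rounds[-1][1])  # 174 13 186

# Made-up cross-section for illustration only.
demo = pd.DataFrame({
    "is_st": [False, True, False, False],
    "months_listed": [40, 60, 6, 30],
    "ROE": [0.10, 0.05, 0.20, np.nan],
    "PE": [15.0, 8.0, 30.0, 25.0],
})
cleaned = clean_cross_section(demo, ["ROE", "PE"])
print(cleaned.index.tolist())  # [0, 3]
```
</preformat><p>The sketch applies step 3) before step 4), following the order in Section 2.4. As a result, the filled zeros are standardized together with the observed factor values.</p><p>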
The number of samples entering the model in each round is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Transaction friction, profitability, valuation, and growth factors</title></caption><table><thead><tr><th align="center" valign="middle" >Type</th><th align="center" valign="middle" >Factor</th><th align="center" valign="middle" >Description</th></tr></thead><tbody><tr><td align="center" valign="middle"  rowspan="6"  >Transaction friction<sup>a</sup></td><td align="center" valign="middle" >ILLIQ</td><td align="center" valign="middle" >Amihud illiquidity measure</td></tr><tr><td align="center" valign="middle" >Psos</td><td align="center" valign="middle" >Pastor-Stambaugh liquidity measure</td></tr><tr><td align="center" valign="middle" >Roll</td><td align="center" valign="middle" >Roll spread measure estimated from daily returns</td></tr><tr><td align="center" valign="middle" >Zeros</td><td align="center" valign="middle" >Number of zero-return days in the month/number of trading days in the month</td></tr><tr><td align="center" valign="middle" >ZerosImp</td><td align="center" valign="middle" >Zeros/average daily turnover in the month (10,000)</td></tr><tr><td align="center" valign="middle" >ToverOs</td><td align="center" valign="middle" >Sum of the daily turnover rates in the month</td></tr><tr><td align="center" valign="middle"  rowspan="8"  >Profitability<sup>b</sup></td><td align="center" valign="middle" >ROE</td><td align="center" valign="middle" >Return on equity, TTM net profit/((closing shareholders’ equity + closing shareholders’ equity one year earlier)/2)</td></tr><tr><td align="center" valign="middle" >ROA</td><td align="center" valign="middle" >Return on assets, TTM net profit/((closing total assets + closing total assets one year earlier)/2)</td></tr><tr><td align="center" 
valign="middle" >TROA</td><td align="center" valign="middle" >Total return on assets, (TTM total profit + TTM financial expenses)/((closing total assets + closing total assets one year earlier)/2)</td></tr><tr><td align="center" valign="middle" >EPS</td><td align="center" valign="middle" >Earnings per share, net profit/total share capital</td></tr><tr><td align="center" valign="middle" >AdjEPS</td><td align="center" valign="middle" >EPS adjusted following Wu &amp; Wu (2003)</td></tr><tr><td align="center" valign="middle" >UnEPS</td><td align="center" valign="middle" >Unexpected EPS, adjusted EPS of the current period − adjusted EPS two periods earlier</td></tr><tr><td align="center" valign="middle" >SigmaEPS</td><td align="center" valign="middle" >Standard deviation of unexpected EPS over the previous five half-years</td></tr><tr><td align="center" valign="middle" >Sue</td><td align="center" valign="middle" >Standardized unexpected EPS, unexpected EPS/its standard deviation</td></tr><tr><td align="center" valign="middle"  rowspan="4"  >Valuation<sup>c</sup></td><td align="center" valign="middle" >PE</td><td align="center" valign="middle" >Closing price × total share capital/TTM net profit</td></tr><tr><td align="center" valign="middle" >PS</td><td align="center" valign="middle" >Closing price × total share capital/TTM gross income</td></tr><tr><td align="center" valign="middle" >Evm</td><td align="center" valign="middle" >Enterprise multiple, (total market capitalization + total liabilities − monetary funds)/EBITDA</td></tr><tr><td align="center" valign="middle" >Amv</td><td align="center" valign="middle" >A-share market value, A-share closing price × number of outstanding A-shares</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >Growth<sup>d</sup></td><td align="center" valign="middle" >Agr</td><td align="center" valign="middle" >Capital preservation and appreciation rate, closing total owners’ equity/opening total owners’ equity</td></tr><tr><td align="center" valign="middle" >Sgr</td><td align="center" valign="middle" >Sustainable growth rate, return on net assets × earnings retention rate/(1 − return on net assets × earnings retention rate)</td></tr></tbody></table></table-wrap><p><sup>a</sup>See Hu &amp; Gu, 2018; Zhang et al., 2014; Amihud, 2002; Goyenko et al., 2009; Jiang et al., 2018; Pastor &amp; Stambaugh, 2003; Roll, 1984. <sup>b</sup>See Wu &amp; Wu, 2003; Yang &amp; Huang, 2005; Yang et al., 2020; Zhang et al., 2020; Zhao, 1998; Chan &amp; Jegadeesh, 1996. <sup>c</sup>See Hu &amp; Gu, 2018; Jiang et al., 2018; Loughran &amp; Wellman, 2011. <sup>d</sup>See Li &amp; Liao, 2007; Zhang et al., 2020.</p><p>As <xref ref-type="fig" rid="fig3">Figure 3</xref> shows, the number of valid samples in the China A-share market trends upward overall, reflecting the number of listed companies, the proportion of ST and *ST companies, and the quality of company factor data. Because the size of the stock pool changes over time, the model buys or holds the stocks in the top 1% of predicted returns, rather than a fixed number of stocks, in each round of portfolio construction. After all rounds are completed, the overall performance of the quantitative stock selection model is analyzed. 
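</p><p>The portfolio step and the evaluation formulas can be illustrated with the NumPy sketch below. It is an assumption-laden illustration rather than the authors’ code. The rule for rounding the top 1% (round up, keeping at least one stock) is an assumption. Alpha and beta follow Equations (2) and (3), applied here to the sample means of the monthly series. The paper does not define its Sharpe ratio or win rate. The sketch therefore assumes a monthly, non-annualized Sharpe ratio of excess returns and a win rate equal to the share of months in which the portfolio beats the benchmark. All numbers in the example are made up.</p><preformat>
```python
import numpy as np


def top_percent_equal_weight(pred, realized, pct=0.01):
    """Long-only, equal-weight return of the stocks with the top `pct` of predicted returns."""
    pred = np.asarray(pred, dtype=float)
    realized = np.asarray(realized, dtype=float)
    k = max(1, int(np.ceil(pct * pred.size)))  # assumed rounding: at least one stock
    chosen = np.argsort(pred)[::-1][:k]        # indices of the k highest predictions
    return float(realized[chosen].mean())


def performance(ra, rm, rf):
    """Alpha, beta, Sharpe ratio and win rate from monthly return series.

    ra: portfolio returns, rm: benchmark returns, rf: risk-free rates (decimals).
    """
    ra, rm, rf = (np.asarray(s, dtype=float) for s in (ra, rm, rf))
    beta = np.cov(ra, rm)[0, 1] / np.var(rm, ddof=1)                  # Equation (3)
    alpha = ra.mean() - (rf.mean() + beta * (rm.mean() - rf.mean()))  # Equation (2), sample means
    excess = ra - rf
    sharpe = excess.mean() / excess.std(ddof=1)  # assumed: monthly, not annualized
    win_rate = float(np.mean(ra > rm))           # assumed: share of months beating the benchmark
    return {"alpha": float(alpha), "beta": float(beta), "sharpe": float(sharpe), "win_rate": win_rate}


# Three stocks, so the top 1% rounds up to a single stock.
print(top_percent_equal_weight([0.1, 0.3, 0.2], [0.01, 0.02, 0.03]))  # 0.02

# Monthly series built so that beta = 2 and alpha = 0.01 by construction (up to rounding).
rm = np.array([0.02, -0.03, 0.01, 0.0])
ra = 0.01 + 2.0 * rm
rf = np.zeros_like(rm)
print(performance(ra, rm, rf))
```
</preformat><p>The portfolio is re-ranked in full every month. In this sketch, “buy or hold” and “sell” therefore reduce to whether a stock is in the top set in consecutive rounds. Transaction costs are not modeled here.</p><p>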
The returns and risks of the investment portfolios are shown in <xref ref-type="table" rid="table2">Table 2</xref>.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>Performance of quantitative stock selection models based on 16 algorithms</title></caption><table><thead><tr><th align="center" valign="middle" >Algorithm</th><th align="center" valign="middle" >Monthly Return (%)</th><th align="center" valign="middle" >Alpha</th><th align="center" valign="middle" >Sharpe Ratio</th><th align="center" valign="middle" >Win Rate (%)</th></tr></thead><tbody><tr><td align="center" valign="middle" >OLS</td><td align="center" valign="middle" >2.35</td><td align="center" valign="middle" >0.0371</td><td align="center" valign="middle" >0.2920</td><td align="center" valign="middle" >60.92</td></tr><tr><td align="center" valign="middle" >PLS</td><td align="center" valign="middle" >2.82</td><td align="center" valign="middle" >0.0396</td><td align="center" valign="middle" >0.2994</td><td align="center" valign="middle" >66.09</td></tr><tr><td align="center" valign="middle" >Ridge</td><td align="center" valign="middle" >2.43</td><td align="center" valign="middle" >0.0374</td><td align="center" valign="middle" >0.2938</td><td align="center" valign="middle" >60.92</td></tr><tr><td align="center" valign="middle" >Bayesian Ridge</td><td align="center" valign="middle" >2.38</td><td align="center" valign="middle" >0.0399</td><td align="center" valign="middle" >0.3051</td><td align="center" valign="middle" >61.49</td></tr><tr><td align="center" valign="middle" >Lasso</td><td align="center" valign="middle" >2.50</td><td align="center" valign="middle" >0.0397</td><td align="center" valign="middle" >0.3100</td><td align="center" valign="middle" >62.64</td></tr><tr><td align="center" valign="middle" >LassoLars</td><td align="center" valign="middle" >2.50</td><td align="center" valign="middle" >0.0395</td><td align="center" 
valign="middle" >0.3087</td><td align="center" valign="middle" >62.64</td></tr><tr><td align="center" valign="middle" >ElasticNet</td><td align="center" valign="middle" >2.61</td><td align="center" valign="middle" >0.0391</td><td align="center" valign="middle" >0.3067</td><td align="center" valign="middle" >62.64</td></tr><tr><td align="center" valign="middle" >LSVR</td><td align="center" valign="middle" >3.24</td><td align="center" valign="middle" >0.0394</td><td align="center" valign="middle" >0.3080</td><td align="center" valign="middle" >62.64</td></tr><tr><td align="center" valign="middle" >SVR</td><td align="center" valign="middle" >2.35</td><td align="center" valign="middle" >0.0415</td><td align="center" valign="middle" >0.2585</td><td align="center" valign="middle" >60.34</td></tr><tr><td align="center" valign="middle" >DT</td><td align="center" valign="middle" >3.04</td><td align="center" valign="middle" >0.0426</td><td align="center" valign="middle" >0.2811</td><td align="center" valign="middle" >59.77</td></tr><tr><td align="center" valign="middle" >GBDT</td><td align="center" valign="middle" >2.53</td><td align="center" valign="middle" >0.0425</td><td align="center" valign="middle" >0.3153</td><td align="center" valign="middle" >63.79</td></tr><tr><td align="center" valign="middle" >RF</td><td align="center" valign="middle" >2.06</td><td align="center" valign="middle" >0.0396</td><td align="center" valign="middle" >0.2824</td><td align="center" valign="middle" >60.92</td></tr><tr><td align="center" valign="middle" >AdaBoost</td><td align="center" valign="middle" >3.09</td><td align="center" valign="middle" >0.0436</td><td align="center" valign="middle" >0.3150</td><td align="center" valign="middle" >62.64</td></tr><tr><td align="center" valign="middle" >ET</td><td align="center" valign="middle" >2.57</td><td align="center" valign="middle" >0.0511</td><td align="center" valign="middle" >0.3132</td><td align="center" valign="middle" 
>62.64</td></tr><tr><td align="center" valign="middle" >XGBoost</td><td align="center" valign="middle" >2.65</td><td align="center" valign="middle" >0.0402</td><td align="center" valign="middle" >0.2816</td><td align="center" valign="middle" >61.49</td></tr><tr><td align="center" valign="middle" >LGBM</td><td align="center" valign="middle" >3.13</td><td align="center" valign="middle" >0.0454</td><td align="center" valign="middle" >0.3397</td><td align="center" valign="middle" >64.94</td></tr></tbody></table></table-wrap><p>As <xref ref-type="table" rid="table2">Table 2</xref> shows, the quantitative stock selection models based on all 16 algorithms obtain positive alpha returns over the research period of nearly 16 years. Over the same period, the average and median monthly returns of the benchmark are 0.89% and 0.72%, respectively, so every algorithm outperforms the benchmark.</p><p>The portfolios derived from different algorithms perform differently. Firstly, in ascending order of overall performance, the algorithms rank as OLS, the other linear algorithms, the single machine learning algorithms, and the ensemble machine learning algorithms. 
Secondly, ET achieves the highest alpha of all algorithms and ranks in the top five in both Sharpe ratio and win rate. This suggests that ET predicts security prices well and can still select high-quality stocks when the market is volatile or falling. Thirdly, LGBM achieves the highest Sharpe ratio of all the algorithms, ranks second in both alpha and win rate, and performs well and steadily on all indicators. Finally, PLS achieves the highest win rate of all the algorithms but is only average on the other two indicators.</p></sec><sec id="s3_2"><title>3.2. The Influence of Time Window Selection on Model Performance</title><p>The choice of the dynamic time window <inline-formula><tex-math>w</tex-math></inline-formula> is one of the factors affecting model performance. The previous section used a 12-month window; this section examines the performance of the model under different dynamic time windows.</p><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title>Quantitative stock selection model performance under different dynamic time windows</title></caption><table><thead><tr><th align="center" valign="middle" >Window (Months)</th><th align="center" valign="middle" >Monthly Return (%)</th><th align="center" valign="middle" >Alpha</th><th align="center" valign="middle" >Sharpe Ratio</th><th align="center" valign="middle" >Win Rate (%)</th></tr></thead><tbody><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" >2.19</td><td align="center" valign="middle" >0.0382</td><td align="center" valign="middle" >0.2799</td><td align="center" valign="middle" >60.61</td></tr><tr><td align="center" valign="middle" >3</td><td align="center" valign="middle" >2.63</td><td align="center" valign="middle" >0.0388</td><td align="center" valign="middle" >0.2901</td><td align="center" valign="middle" >62.17</td></tr><tr><td align="center" valign="middle" >6</td><td align="center" valign="middle" >2.36</td><td align="center" valign="middle" >0.0404</td><td align="center" valign="middle" >0.2921</td><td align="center" valign="middle" >61.90</td></tr><tr><td align="center" valign="middle" >12</td><td align="center" valign="middle" >2.59</td><td align="center" valign="middle" >0.0406</td><td align="center" valign="middle" >0.2979</td><td align="center" valign="middle" >62.31</td></tr><tr><td align="center" valign="middle" >24</td><td align="center" valign="middle" >1.99</td><td align="center" valign="middle" >0.0327</td><td align="center" valign="middle" >0.2330</td><td align="center" valign="middle" >59.84</td></tr><tr><td align="center" valign="middle" >36</td><td align="center" valign="middle" >1.48</td><td align="center" valign="middle" >0.0263</td><td align="center" valign="middle" >0.1985</td><td align="center" valign="middle" >56.98</td></tr></tbody></table></table-wrap><p>As <xref ref-type="table" rid="table3">Table 3</xref> shows, alpha follows an inverted U shape as the dynamic time window lengthens. It exceeds 0.04 for both the 6-month and the 12-month windows. The 12-month window gives the best out-of-sample performance, with the highest alpha, Sharpe ratio, and win rate of the six windows tested.</p></sec></sec><sec id="s4"><title>4. Conclusion and Implications</title><p>The value of an investment is derived from the present value of all the cash flows it generates over its life, so an accurate judgment of the future value of an asset is the key to achieving excess investment returns. Both scholars and market investors have tried to build models with strong predictive and generalization ability. However, given the nonlinearity and high noise of financial data, traditional statistical models predict poorly. 
In this paper, 16 algorithms, including machine learning algorithms, are used to predict the returns of A-share listed companies, and portfolios are constructed from the forecasts. The results show the following. 1) The models based on machine learning algorithms outperform the other models. This is mainly because of their strong out-of-sample generalization ability, which allows the portfolios selected by the machine learning based quantitative stock selection models to far exceed the market benchmark. 2) The China A-share market follows the weak-form EMH: by mining factor information that the market has not fully digested, continuous alpha can still be achieved in the China A-share market. 3) The ensemble machine learning algorithms represented by ET and LGBM perform well in stock return prediction. A comparison of the 16 algorithms in quantitative stock selection shows that the ensemble machine learning algorithms have significant advantages in analyzing nonlinear, high-noise data and have strong out-of-sample generalization ability. To further promote the intelligent development of quantitative models and big data analysis, and to improve the efficiency and accuracy of data mining, this paper puts forward the following implications based on the above conclusions.</p><p>Firstly, machine learning algorithms should be applied to quantitative research in finance, economics, and management. Machine learning algorithms can effectively digest and exploit high-frequency, high-noise data and have better explanatory power for nonlinear or chaotic relationships in the data. Implemented as programs, machine learning algorithms are highly operable and generalize well. In addition, the “no free lunch” theorem of machine learning reminds us that no single algorithm is best for every problem, so algorithms should be chosen to suit the specific problem at hand. 
Some machine learning algorithms suffer from poor interpretability and behave as information black boxes, so more empirical studies are needed to test and analyze the scenarios in which they can be applied.</p><p>Secondly, the institutional design and regulatory mechanisms of the China A-share market should be optimized. Although the introduction of margin trading and securities lending and of stock index futures ended the absence of a short-selling mechanism in the A-share market, short selling is still subject to many restrictions, because the mechanism was introduced relatively late and remains imperfect, and because many investors behave irrationally. The high cost of short selling also limits the room for “smart money” to operate in the capital market, hindering progress toward market equilibrium and efficiency. Regulatory authorities should refine the institutional design and regulatory mechanisms of the capital market for the fintech era, building on stronger risk education for investors and better information disclosure in the capital market.</p><p>Thirdly, institutional and individual investors should identify and construct an effective factor pool to make full use of the advantages of quantitative stock selection models. The empirical study shows that quantitative models based on fundamental factors can still earn continuous alpha returns in the China A-share market. 
In addition, quantitative stock selection can effectively reduce the extent to which market participants are affected by factors such as cognitive biases and herd behavior, so investors are advised to consider quantitative models when making investment decisions.</p></sec><sec id="s5"><title>Funding</title><p>This work was supported by the Humanities and Social Science Projects of the Ministry of Education (19YJA910006), the NSF of Zhejiang Province (LY20A010019) and the Fundamental Research Funds for the Provincial Universities of Zhejiang (GK199900299012-204).</p></sec><sec id="s6"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>Lin, Y., &amp; Ye, R. D. (2021). Can Machine Learning Unlock the Continuous Alpha? Empirical Study Based on China A-Share Market. Open Journal of Business and Management, 9, 2358-2369. https://doi.org/10.4236/ojbm.2021.95127</p></sec></body><back><ref-list><title>References</title><ref id="scirp.111962-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Amihud, Y. (2002). Illiquidity and Stock Returns: Cross-Section and Time-Series Effects. Journal of Financial Markets, 5, 31-56. https://doi.org/10.1016/S1386-4181(01)00024-6</mixed-citation></ref><ref id="scirp.111962-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Chan, L. K. C., Jegadeesh, N., &amp; Lakonishok, J. (1996). Momentum Strategies. Journal of Finance, 51, 1681-1713. 
https://doi.org/10.1111/j.1540-6261.1996.tb05222.x</mixed-citation></ref><ref id="scirp.111962-ref3"><label>3</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Chen</surname><given-names> R. D.</given-names></name>, &amp; <name name-style="western"><surname>Yu</surname><given-names> H. H.</given-names></name> (<year>2014</year>). <article-title>Stock Selection Model Based on Support Vector Machine within Heuristic Algorithm</article-title>. <source>Systems Engineering</source>, <volume>2</volume>, <fpage>40</fpage>-<lpage>48</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Choudhry, R., &amp; Garg, K. (2008). A Hybrid Machine Learning System for Stock Market Forecasting. World Academy of Science, Engineering and Technology, 39, 315-318.</mixed-citation></ref><ref id="scirp.111962-ref5"><label>5</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Ding</surname><given-names> Z. G.</given-names></name>, <name name-style="western"><surname>Jin</surname><given-names> B.</given-names></name>, &amp; <name name-style="western"><surname>Xu</surname><given-names> D. C.</given-names></name> (<year>2017</year>). <article-title>Test of Efficient Market: Criticism of Behavioral Finance to EMH</article-title>. <source>Contemporary Economic Research</source>, <volume>3</volume>, <fpage>51</fpage>-<lpage>59</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Goyenko, R. Y., Holden, C. W., &amp; Trzcinka, C. A. (2009). Do Liquidity Measures Measure Liquidity? Journal of Financial Economics, 92, 153-181. 
https://doi.org/10.1016/j.jfineco.2008.06.002</mixed-citation></ref><ref id="scirp.111962-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Hu, Y., &amp; Gu, M. (2018). Buffett’s Alpha: Evidence from China Stock Market. Management World, 8, 41-54, 191.</mixed-citation></ref><ref id="scirp.111962-ref8"><label>8</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Huang</surname><given-names> Z. X.</given-names></name>, <name name-style="western"><surname>Zeng</surname><given-names> L. H.</given-names></name>, <name name-style="western"><surname>Jiang</surname><given-names> Q.</given-names></name>, &amp; <name name-style="western"><surname>Duan</surname><given-names> Z. D.</given-names></name> (<year>2008</year>). <article-title>Information Revelation and Capital Market Efficiency: Information Efficiency and Allocation Efficiency</article-title>. <source>China Economic Quarterly</source>, <volume>2</volume>, <fpage>665</fpage>-<lpage>684</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Jiang, F. W., Qi, X. L., &amp; Tang, G. H. (2018). Q-Theory, Mispricing, and Profitability Premium: Evidence from China. Journal of Banking &amp; Finance, 87, 135-149. 
https://doi.org/10.1016/j.jbankfin.2017.10.001</mixed-citation></ref><ref id="scirp.111962-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Kourentzes, N., Barrow, D. K., &amp; Crone, S. F. (2014). Neural Network Ensemble Operators for Time Series Forecasting. Expert Systems with Applications, 41, 4235-4244. 
https://doi.org/10.1016/j.eswa.2013.12.011</mixed-citation></ref><ref id="scirp.111962-ref11"><label>11</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names> B.</given-names></name>, <name name-style="western"><surname>Shao</surname><given-names> X. Y.</given-names></name>, &amp; <name name-style="western"><surname>Li</surname><given-names> Y. Y.</given-names></name> (<year>2019</year>). <article-title>Research on Machine Learning Driven Quantitative Investing</article-title>. <source>China Industrial Economics</source>, <volume>8</volume>, <fpage>61</fpage>-<lpage>79</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref12"><label>12</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names> J.</given-names></name>, &amp; <name name-style="western"><surname>Liao</surname><given-names> H.</given-names></name> (<year>2007</year>). <article-title>The Investment Tactics of the P/E Ratio: Appraise and Test</article-title>. <source>Business Management Journal</source>, <volume>6</volume>, <fpage>73</fpage>-<lpage>79</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref13"><label>13</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Li</surname><given-names> Z. B.</given-names></name>, <name name-style="western"><surname>Yang</surname><given-names> G. Y.</given-names></name>, <name name-style="western"><surname>Feng</surname><given-names> Y. C.</given-names></name>, &amp; <name name-style="western"><surname>Jing</surname><given-names> L.</given-names></name> (<year>2017</year>). <article-title>Fama-French Five-Factor Model in China Stock Market</article-title>. <source>Journal of Financial Research</source>, <volume>6</volume>, <fpage>191</fpage>-<lpage>206</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Loughran, T., &amp; Wellman, J. W. (2011). New Evidence on the Relation between the Enterprise Multiple and Average Stock Returns. Journal of Financial and Quantitative Analysis, 46, 1629-1650. https://doi.org/10.1017/S0022109011000445</mixed-citation></ref><ref id="scirp.111962-ref15"><label>15</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Malkiel</surname><given-names> B. G.</given-names></name>, &amp; <name name-style="western"><surname>Fama</surname><given-names> E. F.</given-names></name> (<year>1970</year>). <article-title>Efficient Capital Markets: A Review of Theory and Empirical Work</article-title>. <source>The Journal of Finance</source>, <volume>25</volume>, <fpage>383</fpage>-<lpage>417</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Nair, B. B., Mohandas, V. P., &amp; Sakthivel, N. R. (2010). A Decision Tree-Rough Set Hybrid System for Stock Market Trend Prediction. International Journal of Computer Applications, 6, 1-6. https://doi.org/10.5120/1106-1449</mixed-citation></ref><ref id="scirp.111962-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Pastor, L., &amp; Stambaugh, R. F. (2003). Liquidity Risk and Expected Stock Returns. Journal of Political Economy, 111, 642-685. https://doi.org/10.1086/374184</mixed-citation></ref><ref id="scirp.111962-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Roll, R. (1984). 
A Simple Implicit Measure of the Effective Bid-Ask Spread in an Efficient Market. Journal of Finance, 39, 1127-1139.  
https://doi.org/10.1111/j.1540-6261.1984.tb03897.x</mixed-citation></ref><ref id="scirp.111962-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Sharpe, W. F. (1964). Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk. Journal of Finance, 19, 425-442.  
https://doi.org/10.1111/j.1540-6261.1964.tb02865.x</mixed-citation></ref><ref id="scirp.111962-ref20"><label>20</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Sun</surname><given-names> D. C.</given-names></name>, &amp; <name name-style="western"><surname>Bi</surname><given-names> X. C.</given-names></name> (<year>2018</year>). <article-title>High-Frequency Trading Strategies Based on Deep Learning Algorithms and Their Profitability</article-title>. <source>Journal of University of Science and Technology of China</source>, <volume>11</volume>, <fpage>923</fpage>-<lpage>932</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Wang, F. J., &amp; Su, L. Z. (2013). Is Internal Capital Market Efficient in Chinese Listed Companies? Empirical Evidences from Multiple Divisions Listed Companies in H-Stock. Accounting Research, 1, 70-75, 96.</mixed-citation></ref><ref id="scirp.111962-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Wang, R. (2016). Research on Multiple-Factor Quantitative Stock Selection in A-Share Market. Master’s Thesis, Shanxi University of Finance &amp; Economics.</mixed-citation></ref><ref id="scirp.111962-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Wang, Y. (2017). Study on Multi-Factor Model Based on Grey Correlation Analysis Method. Master’s Thesis, Beijing Jiaotong University.</mixed-citation></ref><ref id="scirp.111962-ref24"><label>24</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Wu</surname><given-names> S. N.</given-names></name>, &amp; <name name-style="western"><surname>Wu</surname><given-names> C. P.</given-names></name> (<year>2003</year>). <article-title>An Empirical Study on Price Inertia Strategy and Earnings Inertia Strategy in China’s Stock Market</article-title>. <source>Economic Science</source>, <volume>4</volume>, <fpage>41</fpage>-<lpage>50</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Yang, S. E., &amp; Huang, L. (2005). Financial Crisis Warning Model Based on BP Neural Network. Systems Engineering-Theory &amp; Practice, 1, 12-18, 26.</mixed-citation></ref><ref id="scirp.111962-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Yang, W., Feng, L., Song, M., &amp; Li, C. T. (2020). Can the Reference Point Ratio Measure Stock Price Overvaluation? Evidence from Stock Crash Risk. Management World, 1, 167-186, 241.</mixed-citation></ref><ref id="scirp.111962-ref27"><label>27</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Yu</surname><given-names> Z. J.</given-names></name>, <name name-style="western"><surname>Yang</surname><given-names> S. L.</given-names></name>, <name name-style="western"><surname>Zhang</surname><given-names> Z.</given-names></name>, &amp; <name name-style="western"><surname>Jiao</surname><given-names> J.</given-names></name> (<year>2015</year>). <article-title>Stock Returns Prediction Based on Error-Correction Grey Neural Network</article-title>. <source>Chinese Journal of Management Science</source>, <volume>12</volume>, <fpage>20</fpage>-<lpage>26</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref28"><label>28</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhang</surname><given-names> L.</given-names></name>, <name name-style="western"><surname>Deng</surname><given-names> L. 
Y.</given-names></name>, &amp; <name name-style="western"><surname>Zhou</surname><given-names> Y.</given-names></name> (<year>2016</year>). <article-title>Contrarian Effect of Semi-Parametric Alpha Strategy</article-title>. <source>Chinese Journal of Management Science</source>, <volume>12</volume>, <fpage>30</fpage>-<lpage>38</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, N., Shi, H. W., Zheng, L., Shan, Z. H., &amp; Wu, H. X. (2020). PCANet-Based Multi-Factor Stock Selection Model for Value Growth. Computer Science, S2, 64-67.</mixed-citation></ref><ref id="scirp.111962-ref30"><label>30</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhang</surname><given-names> X. Y.</given-names></name>, &amp; <name name-style="western"><surname>Zhang</surname><given-names> Z. C.</given-names></name> (<year>2005</year>). <article-title>Empirical Tests on Efficiency of Commodity Futures Markets in China</article-title>. <source>Chinese Journal of Management Science</source>, <volume>6</volume>, <fpage>1</fpage>-<lpage>5</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref31"><label>31</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhang</surname><given-names> Y. P.</given-names></name> (<year>2015</year>). <article-title>Are Investors Really Rational: The Challenge of Behavioral Finance to Fama’s EMH</article-title>. <source>Academics</source>, <volume>1</volume>, <fpage>116</fpage>-<lpage>125</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref32"><label>32</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhang</surname><given-names> Z.</given-names></name>, <name name-style="western"><surname>Li</surname><given-names> Y. Z.</given-names></name>, <name name-style="western"><surname>Zhang</surname><given-names> Y. L.</given-names></name>, &amp; <name name-style="western"><surname>Liu</surname><given-names> X.</given-names></name> (<year>2014</year>). <article-title>A Test on Indirect Liquidity Measures in China Stock Market: An Empirical Analysis of the Direct and Indirect Measures of Bid-Ask Spread</article-title>. <source>China Economic Quarterly</source>, <volume>1</volume>, <fpage>233</fpage>-<lpage>262</lpage>.</mixed-citation></ref><ref id="scirp.111962-ref33"><label>33</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhao</surname><given-names> Y. L.</given-names></name> (<year>1998</year>). <article-title>Information Content of Accounting Earnings Disclosure: Empirical Evidence from Shanghai Stock Market</article-title>. <source>Economic Research Journal</source>, <volume>7</volume>, <fpage>42</fpage>-<lpage>50</lpage>.</mixed-citation></ref></ref-list></back></article>