<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">APE</journal-id><journal-title-group><journal-title>Advances in Physical Education</journal-title></journal-title-group><issn pub-type="epub">2164-0386</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ape.2019.94015</article-id><article-id pub-id-type="publisher-id">APE-94928</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Medicine&amp;Healthcare</subject><subject> Social Sciences&amp;Humanities</subject></subj-group></article-categories><title-group><article-title>
 
 
  Neural Network Algorithm in Predicting Football Match Outcome Based on Player Ability Index
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Chen</surname><given-names>Hengzhi</given-names></name><xref ref-type="aff" rid="aff1"><sub>1</sub></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><label>1</label><addr-line>Bashu Secondary School, Chongqing, China</addr-line></aff><pub-date pub-type="epub"><day>09</day><month>09</month><year>2019</year></pub-date><volume>09</volume><issue>04</issue><fpage>215</fpage><lpage>222</lpage><history><date date-type="received"><day>29</day><month>August</month><year>2019</year></date><date date-type="rev-recd"><day>7</day><month>September</month><year>2019</year></date><date date-type="accepted"><day>10</day><month>September</month><year>2019</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  When people want to predict the result of a football match, most of them rely on their own experience or on specialists’ opinions. Because artificial intelligence is good at analyzing large amounts of data, it is increasingly used in place of personal experience to improve prediction accuracy. Three typical algorithms are the convolutional neural network (CNN), random forest (RF) and support vector machine (SVM). In this paper, all three algorithms are applied to predict the results of football matches, and their accuracy is compared.
 
</p></abstract><kwd-group><kwd>Convolutional Neural Network</kwd><kwd>Football Match Prediction</kwd><kwd>FIFA</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>As football becomes more and more popular worldwide, football gambling is also developing quickly. Betting companies want to maximize their profit in this area, so accurate prediction of football results is required. If the prediction is based only on one’s experience, it cannot be very accurate: the workload of analyzing all the data on the players of both teams and on the past matches between them is too heavy for a person, and the result is to some extent subjective. In this situation, using artificial intelligence to predict the results is a good solution. A trained model applies the same criteria to every match, and it can process large amounts of data to generate a prediction. Havard Rue and Oyvind Salvesen collected the data of all the matches in the Premier League during the 1997-1998 season and used a Bayesian dynamic generalized linear model to predict the final ranking, with very good results. Patrick Lucey studied the relationship between goals and the shooting position as well as the strength of the defense. Artificial intelligence has been widely applied to predict match outcomes. Shi et al. investigated the usefulness of machine learning for the prediction of college basketball outcomes and found that feature selection is very important when building machine learning models (Shi et al., 2013). Loeffelholz et al. applied neural networks to predict NBA games, and their trained models beat basketball experts on game prediction (Loeffelholz et al., 2009). Jain and Kaur integrated a fuzzy approach and SVM to develop a hybrid fuzzy-SVM algorithm and applied it to forecast basketball match outcomes (Jain &amp; Kaur, 2017). 
Bayesian nets were applied to predict football results in the work of Joseph et al. (Joseph et al., 2006). Lucey et al. used logistic regression on engineered features and achieved improved expected goal value (Lucey et al., 2014). Na&#239;ve Bayes was shown to perform better than multivariate linear regression in predicting basketball match outcomes (Miljković et al., 2010). Random forests were applied to estimate the win probability for NFL games (Lock &amp; Nettleton, 2014). Domain knowledge was incorporated into extreme gradient boosted trees, and the investigation showed that domain knowledge is important for improving prediction accuracy (Berrar et al., 2019).</p><p>Furthermore, Cho et al. proposed a framework combining social network analysis and gradient boosting to predict soccer game outcomes and obtained improved results (Cho et al., 2018). So far, random forest and support vector machine have not been fully investigated and tested on soccer match outcome prediction. In this paper, the neural network, random forest and support vector machine algorithms are all used to predict the results of matches in the Spanish La Liga. All the data are collected from the FIFA website.</p></sec><sec id="s2"><title>2. Theoretical Background</title><p>Football is a complex sport, and many factors affect the final result. Surprising results therefore always occur, for reasons that cannot be predicted. For example, a player may get injured and have to leave the match, which cannot be known in advance. Hence, we need to focus on data that are objective and available before the match, such as the ability of the starters.</p><sec id="s2_1"><title>2.1. 
Data Analysis</title><p>The ability data of the starters are taken from the official FIFA website, where they are calculated from each player’s historical performance, including passing success rate, shooting success rate, ball control success rate and so on (<xref ref-type="fig" rid="fig1">Figure 1</xref>).</p><p>According to this chart of a player’s ability, FIFA concludes that his overall ability point is 94. For the matches recorded by FIFA, the ability data of all the starters can be found. There are 22 starters in a match, so we collect the data of all of them and then use these data to predict the result of the match by artificial intelligence.</p><p>The players’ ability is only one of the factors that influence the result of a match. As <xref ref-type="fig" rid="fig2">Figure 2</xref> shows, for the matches in La Liga from the 2008/2009 season to the 2015/2016 season, 35.4% of matches are not determined by players’ ability. Therefore, other important factors should be considered, such as the coach, who has a strong influence on the morale of the team. A good coach can make the team united and strong, while a bad coach can leave it disorganized and demoralized. However, the ability of a coach and the morale of a team are very difficult to quantify, and the coach of a team may change frequently for many complex reasons, such as a poor fit between the coach and the team manager. FIFA therefore does not provide data on a coach’s ability or a team’s morale, and these factors are not considered in this paper.</p><p>The lack of such data will not affect the prediction much. If a result is mainly due to the coach or the team’s morale, it will be an outlier, and such outliers are taken into account in the program.</p></sec><sec id="s2_2"><title>2.2. Other Factors</title><p>Besides the players’ ability and the teams’ ability, there are other factors, such as the weather. 
The performance of the players differs between sunny and rainy weather, since in the rain the ball may be more slippery and more difficult to control. However, the weather is difficult to predict before the match, and it has the same influence on both teams. In addition, many stadiums have a roof to reduce such influence as much as possible. For example, the Champions League Final of the 2016/2017 season was held in the Millennium Stadium, which has a roof to keep out the rain. Hence, the weather factor will not be considered in this paper.</p><p>Whether the match is a home game is also an important factor. There are 20 teams in La Liga, and each team plays every other team twice in one season, once at home and once away. If a team plays at home, the environment is familiar and most of the audience will cheer and applaud for them, which excites the players and boosts their morale; if a team plays away, the situation is the opposite: their stress increases and their morale decreases. Hence, in this paper, the players’ ability data already include this factor: we consider a player’s ability both when he plays at home and when he plays away. The program also distinguishes whether a team is playing at home or away. The rate of wins and draws in home games in La Liga from the 2008/2009 season to the 2015/2016 season is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p><p>Ability data are still missing for some players in La Liga. However, there are fewer than 15 such players, so this will not affect the prediction much. We use the average ability point, 72.6, for them.</p></sec></sec><sec id="s3"><title>3. Model and Experiment</title><p>For each model, we input the data of the 22 players of the two teams in one match. The output is “win”, “draw” or “lose”.</p><sec id="s3_1"><title>3.1. Support Vector Machine</title><p>First, we set the kernel as polynomial. 
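</p><p>The polynomial kernel scores a pair of feature vectors as (γ⟨x, x′⟩ + r)^d, while the linear kernel is the plain inner product ⟨x, x′⟩. The following minimal Python sketch evaluates both on two short ability vectors. The function names, the sample vectors and the values of gamma, r and d are illustrative assumptions and are not taken from the experiment.</p><preformat>
```python
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(
    x: Sequence[float],
    y: Sequence[float],
    gamma: float = 1.0,
    r: float = 0.0,
    d: int = 3,
) -> float:
    """Polynomial kernel: (gamma * inner(x, y) + r) ** d."""
    return (gamma * linear_kernel(x, y) + r) ** d


# Hypothetical, rescaled ability points for three players per team
# (illustration only; the experiment uses 22 players per match).
home = [0.94, 0.88, 0.81]
away = [0.90, 0.85, 0.79]

print("linear:", linear_kernel(home, away))
print("polynomial:", polynomial_kernel(home, away, gamma=0.5, r=1.0, d=2))
```
</preformat><p>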
With the polynomial kernel, the correct rate is only 0.503, which is below our expectation.</p><p><disp-formula><label>(1)</label><tex-math>K(x, x') = (\gamma \langle x, x' \rangle + r)^{d}</tex-math></disp-formula></p><p>Then we set the kernel as linear, and the correct rate increases to 0.542, which is acceptable.</p><p><disp-formula><label>(2)</label><tex-math>K(x, x') = \langle x, x' \rangle</tex-math></disp-formula></p></sec><sec id="s3_2"><title>3.2. Random Forest</title><p>In the random forest algorithm, each “tree” in the forest analyzes different factors and votes for the result, which reduces the variance of the combined prediction. We try different numbers of trees and find that the result is best when the number is between 400 and 800, where the correct rate reaches 0.520.</p><p>Then we set the maximum depth of each tree to 3, and the correct rate increases to 0.545.</p></sec><sec id="s3_3"><title>3.3. Neural Network</title><p>For the neural network algorithm, we choose the convolutional neural network to predict the result.</p><p>A traditional convolutional neural network has six layers to process the data, but we use only four layers in this paper: two convolution layers, one pooling layer and one fully connected layer. The purpose is to avoid overfitting. Since the convolutional neural network handles one-hot encoded targets better than a single class label, we write “win” as (1, 0, 0), “draw” as (0, 1, 0) and “lose” as (0, 0, 1). However, this encoding is not suitable for the random forest algorithm. The reason is that each tree decides independently whether each of the three components is 0 or 1, so the output may be (0, 0, 0) or (1, 1, 1), which is meaningless. In the convolutional neural network, we set the batch size to 80, the base learning rate to 0.30, the learning rate decay to 0.999, the regularization rate to 0.0020, the number of training steps to 400 and the moving average decay to 0.85. This gives the best result, with the correct rate fluctuating between 0.533 and 0.574, which means the prediction is not very stable. 
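</p><p>The one-hot target encoding described above, and the reason why three independent 0/1 decisions can produce a meaningless label, can be sketched in Python as follows. This is a minimal illustration and not the code used in the experiment; the index order (win, draw, lose) and all function names are assumptions.</p><preformat>
```python
from typing import Sequence, Tuple

# Assumed index order of the three outcomes.
CLASSES = ("win", "draw", "lose")


def one_hot(label: str) -> Tuple[int, ...]:
    """Return a vector with a single 1 at the index of the label."""
    index = CLASSES.index(label)
    return tuple(1 if i == index else 0 for i in range(len(CLASSES)))


def decode_scores(scores: Sequence[float]) -> str:
    """Pick the class with the largest score, so exactly one label is returned."""
    best = max(range(len(CLASSES)), key=lambda i: scores[i])
    return CLASSES[best]


def is_valid_one_hot(vector: Sequence[int]) -> bool:
    """A valid one-hot vector holds only 0s and 1s and sums to 1."""
    return all(v in (0, 1) for v in vector) and sum(vector) == 1


print(one_hot("draw"))
print(decode_scores([0.2, 0.1, 0.7]))
# Three independent binary decisions may give vectors such as these,
# which do not correspond to any single outcome.
print(is_valid_one_hot((0, 0, 0)), is_valid_one_hot((1, 1, 1)))
```
</preformat><p>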
We do not analyze the stability of the correct rate in this paper, so we simply take 0.574 as the final result.</p></sec></sec><sec id="s4"><title>4. Conclusion and Discussion</title><p>We take the best correct rate obtained by each of the three algorithms as its accuracy and compare them in <xref ref-type="fig" rid="fig4">Figure 4</xref>.</p><p>The accuracy of all three methods is between 54% and 58%. They are all acceptable, since they are all higher than the prediction accuracy of the well-known BBC football analyst Mark Lawrenson, which is only 52%. In addition, the accuracy of the convolutional neural network is higher than the prediction accuracy of the authoritative football betting organization Pinnacle Sports, which is only 55%. Of the three algorithms, the convolutional neural network performs the best.</p><p>However, all three algorithms seldom predict “draw”. This is because “draw” games are fewer than “win” games or “lose” games. A match between a stronger team and a weaker team seldom results in a draw, so a draw may be attributed mostly to strategy and morale factors, which are not considered in this paper. This is also mentioned in Ben Ulmer’s paper. We can also see that “lose” is predicted less often than “win”, because there are fewer “lose” games than “win” games in the training data (Tables 1-3).</p><p>In addition, the three algorithms differ in their ability to predict “win”, “draw” and “lose”. Random forest has the best ability to predict “win”, and the convolutional neural network has the best ability to predict “lose”. None of the three algorithms is able to predict “draw” correctly (<xref ref-type="fig" rid="fig5">Figure 5</xref>).</p><p>Although the accuracy of these three methods is acceptable, it can be further improved, since we only consider the data of players’ ability. 
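</p><p>As a rough check on the class balance discussed above, the actual outcome counts reported in Tables 1-3 (183 wins, 105 losses and 92 draws) can be turned into shares, which also give the accuracy of a trivial rule that always predicts the most frequent outcome. The sketch below is an illustrative calculation and not part of the original experiment; the variable and function names are assumptions.</p><preformat>
```python
from typing import Dict, Tuple

# Actual outcome counts reported in Tables 1-3 of this paper.
actual: Dict[str, int] = {"win": 183, "lose": 105, "draw": 92}


def outcome_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of matches that ended with each outcome."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline(counts: Dict[str, int]) -> Tuple[str, float]:
    """Most frequent outcome and the accuracy of always predicting it."""
    label = max(counts, key=lambda k: counts[k])
    return label, counts[label] / sum(counts.values())


shares = outcome_shares(actual)
label, accuracy = majority_baseline(actual)
for name, share in shares.items():
    print(name, round(share, 3))
print("always predicting", label, "gives accuracy", round(accuracy, 3))
```
</preformat><p>On the 380 matches summarized in the tables, always predicting “win” would be correct in about 48% of the cases, which gives a reference point for the accuracies reported above.</p><p>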
In the future, team-level data (the coaches’ ability and the teams’ morale) can also be considered. For example, if a team has recently won several consecutive matches, its morale may be much higher. Considering a team’s recent match results is therefore a good way to quantify its morale. More inputs will deepen the analysis of the data and thus improve accuracy.</p><p>In this paper, we only consider three popular machine learning algorithms. In the future, we may consider more algorithms and compare their accuracy. For example, the DNN algorithm constructed by Andrew Carter also achieves good prediction accuracy. Studying and comparing more algorithms will let us find the best algorithm to predict the results of football matches.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Random forest</title></caption><table><thead><tr><th align="center" valign="middle" >Number of matches</th><th align="center" valign="middle" >Wins</th><th align="center" valign="middle" >Losses</th><th align="center" valign="middle" >Draws</th></tr></thead><tbody><tr><td align="center" valign="middle" >Predicted</td><td align="center" valign="middle" >307</td><td align="center" valign="middle" >73</td><td align="center" valign="middle" >0</td></tr><tr><td align="center" valign="middle" >Actual</td><td align="center" valign="middle" >183</td><td align="center" valign="middle" >105</td><td align="center" valign="middle" >92</td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>SVM (linear kernel)</title></caption><table><thead><tr><th align="center" valign="middle" >Number of matches</th><th align="center" valign="middle" >Wins</th><th align="center" valign="middle" >Losses</th><th align="center" valign="middle" >Draws</th></tr></thead><tbody><tr><td align="center" valign="middle" >Predicted</td><td align="center" valign="middle" >291</td><td align="center" valign="middle" >89</td><td align="center" valign="middle" >0</td></tr><tr><td align="center" valign="middle" >Actual</td><td align="center" valign="middle" >183</td><td align="center" valign="middle" >105</td><td align="center" valign="middle" >92</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title>Neural network</title></caption><table><thead><tr><th align="center" valign="middle" >Number of matches</th><th align="center" valign="middle" >Wins</th><th align="center" valign="middle" >Losses</th><th align="center" valign="middle" >Draws</th></tr></thead><tbody><tr><td align="center" valign="middle" >Predicted</td><td align="center" valign="middle" >268</td><td align="center" valign="middle" >112</td><td align="center" valign="middle" >0</td></tr><tr><td align="center" valign="middle" >Actual</td><td align="center" valign="middle" >183</td><td align="center" valign="middle" >105</td><td align="center" valign="middle" >92</td></tr></tbody></table></table-wrap></sec><sec id="s5"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s6"><title>Cite this paper</title><p>Chen, H. Z. (2019). Neural Network Algorithm in Predicting Football Match Outcome Based on Player Ability Index. Advances in Physical Education, 9, 215-222. https://doi.org/10.4236/ape.2019.94015</p></sec></body><back><ref-list><title>References</title><ref id="scirp.94928-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Berrar, D., Lopes, P., &amp; Dubitzky, W. (2019). 
Incorporating Domain Knowledge in Machine Learning for Soccer Outcome Prediction. Machine Learning, 108, 97-126. 
https://doi.org/10.1007/s10994-018-5747-8</mixed-citation></ref><ref id="scirp.94928-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Cho, Y., Yoon, J., &amp; Lee, S. (2018). Using Social Network Analysis and Gradient Boosting to Develop a Soccer Win-Lose Prediction Model. Engineering Applications of Artificial Intelligence, 72, 228-240. https://doi.org/10.1016/j.engappai.2018.04.010</mixed-citation></ref><ref id="scirp.94928-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Jain, S., &amp; Kaur, H. (2017). Machine Learning Approaches to Predict Basketball Game Outcome. In 2017 3rd International Conference on Advances in Computing, Communication &amp; Automation (ICACCA) (pp. 1-7). Dehradun, India.  
https://doi.org/10.1109/ICACCAF.2017.8344688</mixed-citation></ref><ref id="scirp.94928-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Joseph, A., Fenton, N. E., &amp; Neil, M. (2006). Predicting Football Results Using Bayesian Nets and Other Machine Learning Techniques. Knowledge-Based Systems, 19, 544-553. https://doi.org/10.1016/j.knosys.2006.04.011</mixed-citation></ref><ref id="scirp.94928-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Lock, D., &amp; Nettleton, D. (2014). Using Random Forests to Estimate Win Probability before Each Play of an NFL Game. Journal of Quantitative Analysis in Sports, 10, 197-205. https://doi.org/10.1515/jqas-2013-0100</mixed-citation></ref><ref id="scirp.94928-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Loeffelholz, B., Bednar, E., &amp; Bauer, K. W. (2009). Predicting NBA Games Using Neural Networks. Journal of Quantitative Analysis in Sports, 5, 7. 
https://doi.org/10.2202/1559-0410.1156</mixed-citation></ref><ref id="scirp.94928-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Lucey, P., Bialkowski, A., Monfort, M., Carr, P., &amp; Matthews, I. (2014). Quality vs Quantity: Improved Shot Prediction in Soccer Using Strategic Features from Spatiotemporal Data. In Proceedings of 8th Annual MIT Sloan Sports Analytics Conference (pp. 1-9).</mixed-citation></ref><ref id="scirp.94928-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Miljkovic, D., Gajic, L., Kovacevic, A., &amp; Konjovic, Z. (2010). The Use of Data Mining for Basketball Matches Outcomes Prediction. In IEEE 8th International Symposium on Intelligent Systems and Informatics (pp. 309-312). Subotica, Serbia.  
https://doi.org/10.1109/SISY.2010.5647440</mixed-citation></ref><ref id="scirp.94928-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Shi, Z., Moorthy, S., &amp; Zimmermann, A. (2013). Predicting NCAAB Match Outcomes Using ML Techniques-Some Results and Lessons Learned. ECML/PKDD 2013 Workshop on Machine Learning and Data Mining for Sports Analytics.</mixed-citation></ref></ref-list></back></article>