<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">SN</journal-id><journal-title-group><journal-title>Social Networking</journal-title></journal-title-group><issn pub-type="epub">2169-3285</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/sn.2022.113003</article-id><article-id pub-id-type="publisher-id">SN-118870</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  A Sentiment Analysis Approach to Discover Public Panic: Based on Weibo Covid-19 Data
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wanjun</surname><given-names>Wu</given-names></name><xref ref-type="aff" rid="aff1"><sub>1</sub></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><label>1</label><addr-line>Bytedance Data Analysis Group, Beijing, China</addr-line></aff><pub-date pub-type="epub"><day>29</day><month>07</month><year>2022</year></pub-date><volume>11</volume><issue>03</issue><fpage>33</fpage><lpage>39</lpage><history><date date-type="received"><day>2</day><month>July</month><year>2022</year></date><date date-type="rev-recd"><day>26</day><month>July</month><year>2022</year></date><date date-type="accepted"><day>29</day><month>July</month><year>2022</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  <b>Background: </b>
  Weibo is a Twitter-like micro-blog platform in China where people post about real-life events and express their feelings in short texts. Since the outbreak of the Covid-19 pandemic, thousands of people have expressed their concerns and worries about the outbreak via Weibo, indicating the existence of public panic. <b>Methods: </b>This paper proposes a sentiment analysis approach to discover public panic. First, we use Octoparse to obtain Weibo posts about the hot topic Covid-19 Pandemic. Second, we break those posts down into independent words and clean the data by removing stop words. Then, we apply a sentiment score function that accounts for negative words, adverbs, and sentiment words to obtain the sentiment score of each Weibo post. <b>Results: </b>We observe the distribution of sentiment scores and derive a benchmark to evaluate public panic. We also apply the same process to posts under another topic to test the sentiment function, and the results suggest that the function works well.
 
</p></abstract><kwd-group><kwd>Sentiment Analysis</kwd><kwd>Data Analysis</kwd><kwd>Covid-19</kwd><kwd>Micro-Blog Data</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><sec id="s1_1"><title>1.1. Weibo</title><p>Weibo is a Twitter-like micro-blog platform in China where people post about real-life events and express their feelings in short texts. Since its launch in May 2009, Weibo has served as a real-time information network that helps people discover what is happening [<xref ref-type="bibr" rid="scirp.118870-ref1">1</xref>]. During the Covid-19 pandemic, thousands of individuals and media organizations have followed the latest news and expressed their concerns about the outbreak via Weibo, which gives us a chance to extract valuable intelligence from the massive user-generated text streams [<xref ref-type="bibr" rid="scirp.118870-ref2">2</xref>].</p><p>General anxiety and panic can be seen in the flood of posts related to the pandemic, and those emotions are amplified as the virus develops and evolves. Without a timely response, public fear and panic can impair people’s judgement, which may lead to the spread of rumours and even vicious incidents. Therefore, the timely, credible and accurate release of information is critical [<xref ref-type="bibr" rid="scirp.118870-ref3">3</xref>].</p></sec><sec id="s1_2"><title>1.2. Sentiment Analysis</title><p>To discover public panic, we can apply sentiment analysis to Weibo Covid-19 data, since Weibo posts expose individuals’ real-life status as well as what they really think. Sentiment analysis is a powerful technique for extracting people’s ideas and feelings from a given text [<xref ref-type="bibr" rid="scirp.118870-ref4">4</xref>], using either a machine-learning or a lexicon-based approach [<xref ref-type="bibr" rid="scirp.118870-ref5">5</xref>]. 
The machine-learning approach selects features of a given text and uses Naive Bayes, Maximum Entropy, Support Vector Machine and other classifiers for sentiment classification [<xref ref-type="bibr" rid="scirp.118870-ref6">6</xref>]. The lexicon-based approach uses the positive and negative sentiment words in a given text to classify it, relying on an open-source dictionary. In this paper, we choose the lexicon-based approach, which considers a sentence’s components comprehensively and then calculates the sentiment score.</p></sec><sec id="s1_3"><title>1.3. Paper Structure</title><p>This paper proposes a sentiment analysis approach to discover public panic; the overall methodology framework is shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>. First, we use the web scraping tool Octoparse to obtain Weibo posts about the hot topic Covid-19 Pandemic. Second, we break those posts down into independent words and clean the data by removing stop words. Then, we apply a sentiment score function that accounts for negative words, adverbs, and sentiment words to obtain the sentiment score of each Weibo post. Finally, we observe the distribution of sentiment scores and derive a benchmark to evaluate public panic.</p></sec></sec><sec id="s2"><title>2. Materials and Methods</title><sec id="s2_1"><title>2.1. Data</title><p>In this paper, we used Octoparse to obtain Weibo posts related to the hot topic Covid-19 Pandemic on 7 May. Octoparse is a web scraping tool that provides data extraction services, which we used to collect posts under a given hot topic on Weibo.</p></sec><sec id="s2_2"><title>2.2. Methodology</title><sec id="s2_2_1"><title>2.2.1. Framework</title><p>Properly segmenting and analyzing Chinese sentences is vital to sentiment analysis. 
First, we passed every post through Jieba (an open-source tool) to break whole sentences into independent words, as illustrated in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p><p>Second, we removed stop words from the segmented sentences. Stop words are common, high-frequency words such as “a”, “the”, “of”, and “and” [<xref ref-type="bibr" rid="scirp.118870-ref7">7</xref>], which are negligible in sentiment analysis. Removing them leaves only the keywords and reduces the dimension of the data that we need to handle.</p><p>The next step was to handle negative words and adverbs. Negative words are essential in determining whether a sentence’s attitude is positive or negative. For example, “I like watching basketball games.” and “I DO NOT like watching basketball games.” carry opposite sentiments. Likewise, adverbs play an important role in determining the emotional intensity of a sentence. The intensity of “I like watching basketball games very much.” is stronger than that of the version without “very much”. Section 2.2.2 explains how negative words and adverbs are processed.</p></sec><sec id="s2_2_2"><title>2.2.2. Sentiment Score Calculation</title><p>To account for the sentence components discussed above, we constructed a sentiment score function to quantify each Weibo post:</p><p><disp-formula><tex-math>\text{SentimentScore of a post} = \frac{\sum_{i,j} a_i \times (-1)^{j} \times w_i}{\#\,\text{words in the sentence}}</tex-math></disp-formula></p><p>i: the index of a sentiment word in the sentence.</p><p>w<sub>i</sub>: the score of sentiment word i.</p><p>j: the number of negative words in front of sentiment word i.</p><p>a<sub>i</sub>: the score of the adverb in front of sentiment word i.</p><p>First, we use the Boson NLP dictionary to assign the raw score w<sub>i</sub> of each sentiment word. 
The Boson NLP dictionary assigns each sentiment word an emotional intensity score. It is widely used for Chinese text because its lexicon draws on context-specific sources such as news and social media texts [<xref ref-type="bibr" rid="scirp.118870-ref8">8</xref>].</p><p>Next, to handle negative words, we inspect the three words preceding every sentiment word to check whether any of them is a negative word. Common Chinese negative words are listed in the negative words dictionary. If one negative word exists, we multiply the raw score w<sub>i</sub> by −1. If N negative words exist, we multiply the raw score w<sub>i</sub> by (−1)<sup>N</sup>. In this way, the impact of negative words is incorporated into the sentiment score.</p><p>Likewise, to handle adverbs, we inspect the three words preceding every sentiment word to check whether an adverb is present. Different adverbs carry different emotional intensities. For example, the intensity of “a little bit” is weaker than that of “so much”. The adverb dictionary contains common Chinese adverbs, each marked with a score reflecting its intensity. Once we find an adverb related to a sentiment word with score w<sub>i</sub>, we multiply w<sub>i</sub> by the intensity score a<sub>i</sub> to quantify the impact of the adverb.</p><p>After calculating the score of each sentiment word in the sentence, we add those scores together to get a total. Then, we divide the total by the number of sentiment words in the sentence to normalize for the length of the sentence.</p></sec></sec></sec><sec id="s3"><title>3. Data Analysis</title><sec id="s3_1"><title>3.1. Data Processing &amp; Visualization</title><p>After passing the Weibo Covid-19 data through the sentiment score function, we obtain a score for each post. 
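</p><p>The scoring procedure described in Section 2.2.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study. The lexicons are toy English stand-ins invented for the example (the study relies on the Boson NLP dictionary and on Chinese negative-word and adverb dictionaries), and the names sentiment_score, SENTIMENT, NEGATIONS, ADVERBS and WINDOW are hypothetical. Two details are also assumptions: when several adverbs fall inside the three-word window, the one closest to the sentiment word is used, and the summed score is divided by the number of sentiment words found.</p>
<preformat preformat-type="code">
```python
# Toy lexicons, invented for illustration only. The study itself uses the
# Boson NLP dictionary plus Chinese negative-word and adverb dictionaries.
SENTIMENT = {"like": 1.0, "hate": -1.5}
NEGATIONS = {"not", "never"}
ADVERBS = {"really": 1.5, "slightly": 0.5}
WINDOW = 3  # number of preceding tokens inspected, as described in the text


def sentiment_score(tokens):
    """Score a tokenized post whose stop words were already removed."""
    total = 0.0
    hits = 0
    for pos, token in enumerate(tokens):
        if token not in SENTIMENT:
            continue
        # The (up to) three tokens in front of the sentiment word.
        window = tokens[max(0, pos - WINDOW):pos]
        negation_count = sum(1 for t in window if t in NEGATIONS)
        # Assumption: if several adverbs are in the window, use the closest.
        adverb_weight = 1.0
        for t in reversed(window):
            if t in ADVERBS:
                adverb_weight = ADVERBS[t]
                break
        total += adverb_weight * (-1) ** negation_count * SENTIMENT[token]
        hits += 1
    # Assumption: normalize by the number of sentiment words found;
    # a post without any sentiment word scores 0.
    return total / hits if hits else 0.0


print(sentiment_score(["i", "like", "basketball"]))
print(sentiment_score(["i", "not", "like", "basketball"]))
print(sentiment_score(["i", "really", "hate", "epidemic"]))
```
</preformat>
<p>With these toy lexicons, the three calls print 1.0, -1.0 and -2.25: a negative word flips the sign of the sentiment word that follows it, and an adverb scales its magnitude.</p><p>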
The top 2 and bottom 2 posts are shown in <xref ref-type="table" rid="table1">Table 1</xref> and <xref ref-type="table" rid="table2">Table 2</xref>.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Scores of the top 2 posts, calculated by the sentiment score function; post content is translated into English for ease of understanding</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Top 2 Post Scores</th><th align="center" valign="middle" >Top 2 Post Content (Translated into English by Google)</th></tr></thead><tr><td align="center" valign="middle" >0.965</td><td align="center" valign="middle" >#Shanghai college entrance examination postponement# Let’s take a look at the 2021 college students who are fortunate enough to be sandwiched between the two-year college entrance examination postponed due to the epidemic. I wish this year’s college students great.</td></tr><tr><td align="center" valign="middle" >0.962</td><td align="center" valign="middle" >#JOYANG Sunshine Service#, I would like to thank the Joyoung after-sales for the Joyoung new wall breaker P771 pushed to me during the epidemic. It solved the problem in my life during the epidemic that I couldn’t go out for consumption, so I could enjoy a variety of food at home with more peace of mind. The machine performance introduction will be shared with you, thanks to Joyoung@Joyoung Sunshine Service</td></tr></tbody></table></table-wrap><p>The top- and bottom-scored posts suggest that our scoring function works well. 
The top 2 posts express their authors’ wishes and hopes, while the bottom 2 posts voice complaints about the inconvenience caused by the pandemic.</p><p>On the one hand, the output sentiment scores approximately follow a normal distribution (shown in <xref ref-type="fig" rid="fig3">Figure 3</xref> and <xref ref-type="table" rid="table3">Table 3</xref>), indicating that most posts simply state facts without much emotional venting. On the other hand, posts whose scores lie more than 3σ from the mean deserve closer attention, since they express more intense emotions.</p></sec><sec id="s3_2"><title>3.2. An Approach to Discover Public Panic</title><p>To test the effectiveness of our scoring strategy, we collected Weibo data under the tag #Come on for the college entrance examination, where most people express wishes and hopes for the coming exam. The distribution of sentiment scores under this tag is more favorable than that under #Covid-19, as shown in <xref ref-type="fig" rid="fig4">Figure 4</xref> and <xref ref-type="table" rid="table4">Table 4</xref>, indicating that our function can distinguish the mass sentiment under different topics.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>Scores of the bottom 2 posts, calculated by the sentiment score function; post content is translated into English for ease of understanding</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Bottom 2 Post Scores</th><th align="center" valign="middle" >Bottom 2 Post Content (Translated into English by Google)</th></tr></thead><tr><td align="center" valign="middle" >−1.02</td><td align="center" valign="middle" >This damn Covid-19 made me and my boyfriend become the Cowherd and Weaver Girl.</td></tr><tr><td align="center" valign="middle" >−0.93</td><td align="center" valign="middle" >I really hate this epidemic because the epidemic has disrupted a lot of plans.</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title>Distribution of output post scores under #Covid-19</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >mean</th><th align="center" valign="middle" >std</th><th align="center" valign="middle" >min</th><th align="center" valign="middle" >25%</th><th align="center" valign="middle" >50%</th><th align="center" valign="middle" >75%</th><th align="center" valign="middle" >max</th></tr></thead><tr><td align="center" valign="middle" >0.14</td><td align="center" valign="middle" >0.25</td><td align="center" valign="middle" >−1.02</td><td align="center" valign="middle" >0.01</td><td align="center" valign="middle" >0.15</td><td align="center" valign="middle" >0.29</td><td align="center" valign="middle" >0.96</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title>Distribution of output post scores under #Come on for the college entrance examination</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >#Tag</th><th align="center" valign="middle" >mean</th><th align="center" valign="middle" >std</th><th align="center" valign="middle" >min</th><th align="center" valign="middle" >25%</th><th align="center" valign="middle" >50%</th><th align="center" valign="middle" >75%</th><th align="center" valign="middle" >max</th></tr></thead><tr><td align="center" valign="middle" >#Come on for the college entrance examination</td><td align="center" valign="middle" >1.39</td><td align="center" valign="middle" >0.89</td><td align="center" valign="middle" >−0.56</td><td align="center" valign="middle" >0.73</td><td align="center" valign="middle" >1.22</td><td align="center" valign="middle" >1.97</td><td align="center" valign="middle" >2.98</td></tr><tr><td align="center" valign="middle" >#Covid-19</td><td align="center" valign="middle" >0.14</td><td align="center" valign="middle" >0.25</td><td align="center" valign="middle" >−1.02</td><td align="center" valign="middle" >0.01</td><td align="center" valign="middle" >0.15</td><td align="center" valign="middle" >0.29</td><td align="center" valign="middle" >0.96</td></tr></tbody></table></table-wrap><p>To discover public panic, we can use the proportion of posts whose scores are below zero as a benchmark. If that proportion is close to or greater than 50%, the topic warrants attention as a sign of significant public panic. In the #Covid-19 case, the proportion of posts whose scores are below zero is 23.5%. Thus, although some people express worries and complaints about the pandemic, many still show positive attitudes towards it.</p></sec></sec><sec id="s4"><title>4. Conclusions &amp; Discussion</title><p>This paper presents a sentiment analysis approach to discovering public panic via Weibo data. First, we use Octoparse to obtain Weibo posts about the hot topic Covid-19 Pandemic. Second, we break those posts down into independent words and clean the data by removing stop words. Then, we apply a sentiment score function that accounts for negative words, adverbs, and sentiment words to obtain the sentiment score of each Weibo post.</p><p>We observe the distribution of sentiment scores and derive a benchmark to evaluate public panic. We also apply the same process to posts under another topic to test the sentiment function, and the results suggest that the function works well.</p><p>To further improve our method, on the one hand, we can analyze more #Topics to gather enough distribution data to derive a confidence interval for the benchmark used to evaluate public panic. 
On the other hand, we can improve our sentiment dictionary and adverb dictionary to get a more precise sentiment function.</p></sec><sec id="s5"><title>Funding Statement</title><p>This work is sponsored by Shanghai Pujiang Program (20PJ1418400).</p></sec><sec id="s6"><title>Conflicts of Interest</title><p>The author declares no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>Wu, W.J. (2022) A Sentiment Analysis Approach to Discover Public Panic: Based on Weibo Covid-19 Data. Social Networking, 11, 33-39. https://doi.org/10.4236/sn.2022.113003</p></sec></body><back><ref-list><title>References</title><ref id="scirp.118870-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Bai, H. and Guang, Y. (2016) A Weibo-Based Approach to Disaster Informatics: Incidents Monitor in Post-Disaster Situation via Weibo Text Negative Sentiment Analysis. Natural Hazards, 83, 1177-1196. https://doi.org/10.1007/s11069-016-2370-5</mixed-citation></ref><ref id="scirp.118870-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Chung, S. and Aring, D. (2018) Integrated Real-Time Big Data Stream Sentiment Analysis Service. Journal of Data Analysis and Information Processing, 6, 46-66. https://doi.org/10.4236/jdaip.2018.62004</mixed-citation></ref><ref id="scirp.118870-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Ma, C. and Yan, X.K. (2020) Research Progress in Psychological Stress Response and Prevention and Control Strategies of COVID-19. Journal of Jilin University (Medicine Edition), 46, 649-654.</mixed-citation></ref><ref id="scirp.118870-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Karamitsos, I., Albarhami, S. and Apostolopoulos, C. (2019) Tweet Sentiment Analysis (TSA) for Cloud Providers Using Classification Algorithms and Latent Semantic Analysis. 
Journal of Data Analysis and Information Processing, 7, 276-294. https://doi.org/10.4236/jdaip.2019.74016</mixed-citation></ref><ref id="scirp.118870-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Redhu, S., et al. (2018) Sentiment Analysis Using Text Mining: A Review. International Journal on Data Science and Technology, 4, 49-53. https://doi.org/10.11648/j.ijdst.20180402.12</mixed-citation></ref><ref id="scirp.118870-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Xie, L.X., Zhou, M. and Sun, M.S. (2012) Hierarchical Structure Based Hybrid Approach to Sentiment Analysis of Chinese Micro Blog and Its Feature Extraction. Journal of Chinese Information Processing, 26, 73-83.</mixed-citation></ref><ref id="scirp.118870-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Asghar, M.Z., et al. (2014) A Review of Feature Extraction in Sentiment Analysis. Journal of Basic and Applied Scientific Research, 4, 181-186.</mixed-citation></ref><ref id="scirp.118870-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Guo, L.M., et al. (2019) Collaborative Filtering Recommendation Based on Trust and Emotion. Journal of Intelligent Information Systems, 53, 113-135. https://doi.org/10.1007/s10844-018-0517-4</mixed-citation></ref></ref-list></back></article>