<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN" "JATS-journalpublishing1-4.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.4" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">gep</journal-id>
      <journal-title-group>
        <journal-title>Journal of Geoscience and Environment Protection</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2327-4344</issn>
      <issn pub-type="ppub">2327-4336</issn>
      <publisher>
        <publisher-name>Scientific Research Publishing</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.4236/gep.2026.145016</article-id>
      <article-id pub-id-type="publisher-id">gep-151681</article-id>
      <article-categories>
        <subj-group>
          <subject>Article</subject>
        </subj-group>
        <subj-group>
          <subject>Earth</subject>
          <subject>Environmental Sciences</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Mapping Community Vulnerability to Hurricane Hazards in Coastal North Carolina Using Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name name-style="western">
            <surname>Dahal</surname>
            <given-names>Om</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <name name-style="western">
            <surname>Kalluri</surname>
            <given-names>Satya</given-names>
          </name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">0000-0002-2805-7055</contrib-id>
          <name name-style="western">
            <surname>Uprety</surname>
            <given-names>Dambar</given-names>
          </name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author" corresp="yes">
          <contrib-id contrib-id-type="orcid">0000-0003-1695-4303</contrib-id>
          <name name-style="western">
            <surname>Sun</surname>
            <given-names>Donglian</given-names>
          </name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
      </contrib-group>
      <aff id="aff1"><label>1</label> Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA, USA </aff>
      <aff id="aff2"><label>2</label> NOAA NESDIS, College Park, MD, USA </aff>
      <aff id="aff3"><label>3</label> Department of Information Systems and Business Analytics, Kent State University, Kent, OH, USA </aff>
      <author-notes>
        <fn fn-type="conflict" id="fn-conflict">
          <p>The authors declare no conflicts of interest regarding the publication of this paper.</p>
        </fn>
      </author-notes>
      <pub-date pub-type="epub">
        <day>09</day>
        <month>05</month>
        <year>2026</year>
      </pub-date>
      <pub-date pub-type="collection">
        <month>05</month>
        <year>2026</year>
      </pub-date>
      <volume>14</volume>
      <issue>05</issue>
      <fpage>265</fpage>
      <lpage>290</lpage>
      <history>
        <date date-type="received">
          <day>24</day>
          <month>02</month>
          <year>2026</year>
        </date>
        <date date-type="accepted">
          <day>26</day>
          <month>05</month>
          <year>2026</year>
        </date>
        <date date-type="published">
          <day>29</day>
          <month>05</month>
          <year>2026</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2026 by the authors and Scientific Research Publishing Inc.</copyright-statement>
        <copyright-year>2026</copyright-year>
        <license license-type="open-access">
          <license-p> This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link> ). </license-p>
        </license>
      </permissions>
      <self-uri content-type="doi" xlink:href="https://doi.org/10.4236/gep.2026.145016">https://doi.org/10.4236/gep.2026.145016</self-uri>
      <abstract>
        <p>Record-breaking hurricanes, followed by heavy rainfall and flooding, claim many lives and cause billions of dollars’ worth of property damage every year in the Atlantic coastal areas of the United States. These coastal areas are highly vulnerable to hurricane hazards, but not all communities are equally vulnerable because their exposure and coping abilities differ. It is therefore vital to determine the extent of vulnerability of different communities to support prevention, preparedness, response, and recovery efforts. Many physical, statistical, and data-driven methods have been employed to predict area-centered vulnerability to landslides and floods, primarily using geophysical explanatory variables, but they have rarely been applied to hurricane-induced hazards. This study makes three key contributions. First, it integrates geophysical, demographic, and social media data to assess community-level vulnerability to hurricane hazards. Second, it applies a Random Forest framework to model vulnerability at the census block level, capturing non-linear interactions among predictors. Third, it provides empirical evidence on the relative importance of explanatory variables, highlighting the role of real-time social media data in disaster vulnerability assessment. The results indicate strong predictive performance (R<sup>2</sup> = 0.93) and identify tweets, roads, elevation, NDVI, and water bodies as the most influential variables. The findings highlight the importance of integrating geophysical, demographic, and real-time social media data for accurate vulnerability assessment. This approach provides a scalable framework for disaster preparedness and risk management in coastal regions.</p>
      </abstract>
      <kwd-group kwd-group-type="author-generated" xml:lang="en">
        <kwd>Mapping Vulnerable Communities</kwd>
        <kwd>Hurricane Hazards</kwd>
        <kwd>Remote Sensing</kwd>
        <kwd>Social Media</kwd>
        <kwd>Demographic Data</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec1">
      <title>1. Introduction</title>
      <p>A disaster is the overall consequence of a hazard event ([<xref ref-type="bibr" rid="B16">16</xref>]). Vulnerability is a function of exposure and coping ability; it describes the extent to which people cannot deal with hazards because of the physical and social characteristics of the place where they live ([<xref ref-type="bibr" rid="B28">28</xref>]). The vulnerability of communities varies with their coping ability, which is a combination of resistance and resilience ([<xref ref-type="bibr" rid="B24">24</xref>]). Levels of risk depend on hazard intensity and levels of vulnerability. Therefore, the same hazard may have different impacts on different communities or places depending on their exposure and coping ability ([<xref ref-type="bibr" rid="B16">16</xref>]). Vulnerability has been conceptualized in three ways. The first treats it as pre-existing conditions that potentially expose humans to hazards, e.g., settlement in hazardous areas, where loss of life and property is likely when a natural event occurs. This is vulnerability caused by the biophysical setting of the area of residence ([<xref ref-type="bibr" rid="B24">24</xref>]). The second is social vulnerability, which stems from social marginalization due to age, race, disability, or income ([<xref ref-type="bibr" rid="B24">24</xref>]). Assessing social or community vulnerability requires selected demographic data, essentially disability, vulnerable age groups (children and the elderly), and poverty (an economic factor) ([<xref ref-type="bibr" rid="B2">2</xref>]). The third is the vulnerability of places, which combines biophysical and social risk within a specific geographic area ([<xref ref-type="bibr" rid="B24">24</xref>]). Multiple frameworks therefore explain the root causes of vulnerability to natural disasters, from social conditions inherent in the community to the surrounding biophysical environment, or a combination of both. It is crucial to consider a coupled human-environment system, together with proximity to hazards, to identify vulnerable communities ([<xref ref-type="bibr" rid="B8">8</xref>]).</p>
      <p>Identifying and mapping coastal communities at risk of hurricane flood hazards is crucial for every stage of disaster management: prevention, preparedness, response, and recovery. Remote sensing data analysis methods have been increasingly used for hurricane risk assessment ([<xref ref-type="bibr" rid="B13">13</xref>]). This is particularly promising given the increasing availability and high spatial resolution of remotely sensed data ([<xref ref-type="bibr" rid="B29">29</xref>]).</p>
      <p>Geotagged information from Twitter, Facebook, or Flickr has also proven highly applicable in hurricane impact studies because it provides valuable geometric, attribute, and semantic information. Social media-generated data can exhibit spatial patterns: posts made while a disaster strikes tend to originate closer to the affected areas and are likely to relate to the effects of the disaster. Therefore, social media-generated geographic information is a promising alternative to geospatial data for natural hazard analysis. In natural disaster hazard models, crowdsourced locations, messages, or images can be used as valuable complementary data ([<xref ref-type="bibr" rid="B16">16</xref>]).</p>
      <p>Statistical, physical, and data-driven (e.g., machine learning) models are typically used to predict natural hazard events and assess the associated risks. Although physical models are highly capable of predicting natural hazards and risks, they require ground-collected datasets, intensive computation, and a high level of expertise. Their most notable drawback is that predictions cannot be produced in a short time frame because data collection takes a long time. Similarly, numerical prediction models have systematic errors ([<xref ref-type="bibr" rid="B18">18</xref>]). To overcome these shortcomings, data-driven prediction modeling has been widely used. The strengths of machine learning models are that they do not require knowledge of the underlying physical processes, are quicker to develop, and allow fast training, validation, testing, and evaluation. Moreover, this approach has been reported to outperform conventional approaches in prediction accuracy, and data-driven algorithms have been reported to predict beyond the spatial and temporal range of the training datasets ([<xref ref-type="bibr" rid="B18">18</xref>]). Artificial Neural Networks (ANNs), Multilayer Perceptron (MLP), Adaptive Neuro-Fuzzy Inference System (ANFIS), Wavelet Neural Network (WNN), Support Vector Machine (SVM), Decision Tree (DT), and Ensemble Prediction Systems (EPSs) are the algorithms most favored by the natural hazards modeling community ([<xref ref-type="bibr" rid="B18">18</xref>]).</p>
      <p>In the hurricane hazard risk literature, socio-economic variables have been used less often than geophysical variables to analyze the vulnerability of coastal communities to hurricane hazards, although vulnerability is a multivariate, non-linear problem. This work therefore uses both types of variables.</p>
      <p>High winds and storm surge, coupled with flooding, are the leading cause of infrastructure damage and loss of life and property in the coastal United States ([<xref ref-type="bibr" rid="B12">12</xref>]). Extreme weather events (e.g., hurricanes, floods, fires) show an increasing trend. Consequently, the effects on life and property are expected to increase in the future ([<xref ref-type="bibr" rid="B4">4</xref>]; [<xref ref-type="bibr" rid="B14">14</xref>]), which will make coastal human communities more vulnerable ([<xref ref-type="bibr" rid="B14">14</xref>]).</p>
      <p>The demographic profile of coastal areas is vital to hurricane vulnerability analysis. About 94.7 million people (29.1 percent of the total U.S. population) live in coastline regions, of whom about 44.4 million live along the Atlantic coastline. The Atlantic coastline had 13.2 percent population growth between 2000 and 2017. The percentage of the population aged 85 and older is higher in coastline counties than in the United States as a whole ([<xref ref-type="bibr" rid="B7">7</xref>]). The population of Atlantic coastal areas is especially vulnerable to hurricanes because devastating hurricanes are frequent in this region: eight hurricanes made landfall in Atlantic coastal areas between 2000 and 2017, each of which caused more than $10 billion worth of damage ([<xref ref-type="bibr" rid="B7">7</xref>]).</p>
      <p>Hurricane Florence was one of the most devastating events of the 2018 hurricane season. It made landfall on September 14 and persisted until September 18, with a forward motion of about 3 - 4 miles per hour and a zone of tropical-storm-force winds nearly 400 miles wide ([<xref ref-type="bibr" rid="B9">9</xref>]). It was a Category 1 hurricane along the southeastern coast of North Carolina. It caused a total of 52 fatalities and estimated damage of approximately $24 billion, a significant portion of which occurred in North Carolina. About one million households lost power in North Carolina alone. Numerous trees were uprooted by hurricane-force winds, but most of the damage to homes and commercial buildings was caused by freshwater flooding, with approximately 74,563 structures flooded ([<xref ref-type="bibr" rid="B25">25</xref>]). Losses of agricultural products and livestock due to Hurricane Florence alone accounted for at least $1.1 billion ([<xref ref-type="bibr" rid="B9">9</xref>]).</p>
      <p>Because of its slow movement and persistent rain bands before and after landfall, Florence produced 10 to more than 30 inches of rainfall in New Hanover County and surrounding areas, setting a new two-decade rainfall record. This extreme rain resulted in record-breaking river floods across New Hanover County. Eighteen record-breaking streamflow peaks were recorded in North Carolina, some of them the highest on record since 1940 ([<xref ref-type="bibr" rid="B25">25</xref>]).</p>
      <p>New Hanover County is a coastline county located in the tidewater area of North Carolina. It has a total area of 328 square miles (850 km<sup>2</sup>), of which 192 square miles (500 km<sup>2</sup>) is land and the remaining 42% is water (<ext-link ext-link-type="uri" xlink:href="https://en.wikipedia.org/wiki/New_Hanover_County,_North_Carolina">https://en.wikipedia.org/wiki/New_Hanover_County,_North_Carolina</ext-link>). It is one of the most densely populated counties in North Carolina, with a population of 227,198 and 95,097 households according to 2017 Census Bureau estimates (<ext-link ext-link-type="uri" xlink:href="https://www.census.gov/acs/www/data/data-tables-and-tools/supplemental-tables/">https://www.census.gov/acs/www/data/data-tables-and-tools/supplemental-tables/</ext-link>). Wilmington, one of the largest cities in North Carolina, is located in New Hanover County (<xref ref-type="fig" rid="fig1">Figure 1</xref> left).</p>
      <fig id="fig1">
        <label>Figure 1</label>
        <graphic xlink:href="https://html.scirp.org/file/2173728-rId18.jpeg?20260529044916" />
      </fig>
      <p><bold>Figure 1.</bold> New Hanover County, North Carolina (left), and hurricane Florence wind track and swath (right).</p>
      <p>New Hanover County was one of the areas of the North Carolina coast hardest hit by Hurricane Florence and experienced the worst flash floods in its local history. Florence made landfall near Wrightsville Beach, in the coastal area of New Hanover County (<xref ref-type="fig" rid="fig1">Figure 1</xref>, right). In New Hanover County, up to 3 feet of flash flood water inundated the Northchase, Wrightsboro, and Ogden neighborhoods. Downtown Wilmington was inundated by 2 feet of floodwater from the Cape Fear River. As a result, the entire county was largely isolated from the outside world for several days due to access road closures ([<xref ref-type="bibr" rid="B25">25</xref>]).</p>
      <p>Given the devastating hurricane, rainfall, and flooding events that occur frequently in the coastal United States, it is vital to learn the levels of vulnerability of different communities to support damage prevention, preparedness, rescue, and recovery efforts. It is also necessary to know which variables should be given higher priority in such a study, and whether Random Forest (RF), a machine learning algorithm used extensively for prediction (mainly of areas prone to landslides and flooding), can be useful for predicting communities vulnerable to hurricane hazards. Thus, the objectives of this study are as follows:</p>
      <p>1) To identify the level of vulnerability of different communities in coastal New Hanover County, North Carolina, to Hurricane Florence using geophysical, socio-economic, and social media-generated variables.</p>
      <p>2) To examine the usefulness and applicability of the Random Forest algorithm for making categorical predictions of the vulnerability of coastal communities to hurricane hazards.</p>
      <p>The remainder of this paper is organized as follows. Section 2 describes the datasets used in the study. Section 3 outlines the methodology, including the Random Forest modeling framework and variable construction. Section 4 presents the results and analysis of vulnerability prediction. Section 5 concludes with key findings and directions for future research.</p>
    </sec>
    <sec id="sec2">
      <title>2. Data Used</title>
      <p>The datasets used in this study are listed here:</p>
      <p>Sentinel-2 multispectral data at high resolution (10 m) were used for land-use/land-cover (LULC) classification and to calculate the Normalized Difference Vegetation Index (NDVI).</p>
      <p>The digital elevation model (DEM) obtained from the U.S. Geological Survey (USGS) was used as the elevation dataset for the model.</p>
      <p>Road features were obtained from the North Carolina (NC) Department of Transportation (NCDOT).</p>
      <p>Major river features were those distributed by the NC Center for Geographic Information and Analysis.</p>
      <p>American Community Survey (ACS) data on age, disability, and poverty at the block group level were used as demographic explanatory variables.</p>
      <p>The real-time Twitter stream covered the Hurricane Florence period from September 14 to September 19, 2018.</p>
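      <p>The NDVI calculation mentioned in this list can be illustrated with a minimal sketch. It assumes the standard definition NDVI = (NIR - Red) / (NIR + Red) and that the near-infrared and red reflectances come from Sentinel-2 bands 8 and 4, both available at 10 m. The function name and the sample reflectance values are hypothetical and are not taken from the study.</p>
      <code language="python"><![CDATA[
```python
import math


def ndvi(nir, red):
    """Return NDVI for one pixel from near-infrared and red reflectance.

    Assumes the standard definition (NIR - Red) / (NIR + Red). For
    Sentinel-2 these would typically be band 8 (NIR) and band 4 (red).
    """
    denominator = nir + red
    if denominator == 0:
        # No signal in either band: NDVI is undefined for this pixel.
        return math.nan
    return (nir - red) / denominator


# Hypothetical reflectance values, for illustration only.
vegetated_pixel = ndvi(nir=0.5, red=0.1)   # strong NIR reflectance, high NDVI
water_pixel = ndvi(nir=0.02, red=0.06)     # NIR below red, negative NDVI
```
]]></code>
      <p>NDVI values range from -1 to 1, so the index can be used directly as a continuous explanatory variable without further scaling.</p>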
    </sec>
    <sec id="sec3">
      <title>3. Methodology</title>
      <sec id="sec3dot1">
        <title>3.1. Random Forest Classification and Regression</title>
        <p>Natural hazard risk prediction is a multivariate and non-linear task ([<xref ref-type="bibr" rid="B27">27</xref>]) because many disaster-inducing factors act in combination. Several machine learning algorithms have been employed for such predictive analysis, including support vector machines (SVM), artificial neural networks (ANN), and decision trees (DT). The major weakness of these algorithms is their inability to estimate each conditioning factor’s contribution to the total risk ([<xref ref-type="bibr" rid="B27">27</xref>]). CART decision trees are greedy; even with bagging, the trees can have structural similarities that result in highly correlated predictions. In Random Forest (RF), however, the trees are uncorrelated, or at least less correlated, because each tree is trained on a random sample of observations and considers only a random subset of variables at each split ([<xref ref-type="bibr" rid="B5">5</xref>]). RFs are modifications of the classification and regression trees (CART) algorithm ([<xref ref-type="bibr" rid="B20">20</xref>]). RF is a supervised classification and regression method that grows an ensemble of trees and lets them vote, so that the most frequently predicted class becomes the predicted class ([<xref ref-type="bibr" rid="B5">5</xref>]). RF is capable of estimating the contribution of each factor to the total effect ([<xref ref-type="bibr" rid="B27">27</xref>]). It also has high forecast accuracy, acceptable tolerance to outliers and noise, and good resistance to overfitting ([<xref ref-type="bibr" rid="B27">27</xref>]). The RF algorithm generates numerous binary trees, which are collectively called a forest ([<xref ref-type="bibr" rid="B19">19</xref>]). Each tree is grown from a bootstrap sample, and a random subset of variables is selected at each node. The “out-of-bag” error rate is calculated using the samples left out of the bootstrap sample ([<xref ref-type="bibr" rid="B19">19</xref>]). The mean decrease in accuracy and the mean decrease in Gini are calculated in the process and are then used to compute the variable importance score ([<xref ref-type="bibr" rid="B19">19</xref>]).</p>
        <p>Given an observation, each tree in the model predicts an outcome, and RF stores these outcomes as a list. If the model is a classifier, it returns the class with the highest count; if it is a regressor, it returns the average, as depicted in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p>
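        <p>The aggregation step described above can be sketched in a few lines. This is an illustrative sketch only: the function names and the example lists of per-tree outputs are hypothetical, and ties in the vote are resolved in favor of the class encountered first, which is an assumption rather than a rule stated in this paper.</p>
        <code language="python"><![CDATA[
```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    # most_common(1) yields [(label, count)]; ties go to the label seen first.
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the average of the per-tree numeric predictions."""
    return mean(tree_predictions)


# Hypothetical outputs of five trees for a single observation.
class_votes = [1, 3, 3, 2, 3]              # vulnerability categories
predicted_class = aggregate_classification(class_votes)

numeric_outputs = [2.0, 4.0, 6.0]          # continuous outputs of three trees
predicted_value = aggregate_regression(numeric_outputs)
```
]]></code>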
        <p>The RF algorithm uses a parallel ensemble method called “bagging”, or bootstrap aggregation, to generate classifiers. Bagging averages multiple estimates obtained from random subsamples of the data. A subset of observations is selected at random, with replacement, to form a subsample that is used to train one tree. The selection is repeated on the original observations until the specified number of trees is reached. This process is known as bootstrapping ([<xref ref-type="bibr" rid="B5">5</xref>]).</p>
        <fig id="fig2">
          <label>Figure 2</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId19.jpeg?20260529044921" />
        </fig>
        <p><bold>Figure 2.</bold> Random Forest process flow.</p>
        <p>Random Forests are built by specifying the number of trees and the number of variables (features, or columns) to be used in each tree. Then, for each tree, a sample of observations (rows) is drawn with replacement from the original dataset (the bootstrap sample), the given number of features is selected at random for each node split, and an unpruned decision tree is trained on the selected samples and features ([<xref ref-type="bibr" rid="B5">5</xref>]). All the trees are then aggregated to predict the label of a new observation by majority vote ([<xref ref-type="bibr" rid="B1">1</xref>]).</p>
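        <p>The building procedure can be illustrated with a deliberately small sketch that uses only the Python standard library. It is not the implementation used in this study: each tree is reduced to a single split (a decision stump), so selecting features per tree and per node coincide, and the toy dataset, labels, and function names are hypothetical.</p>
        <code language="python"><![CDATA[
```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature_ids):
    """Fit a one-split tree using only the candidate features."""
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    if best is None:
        # No candidate feature varies in this sample: predict a constant.
        constant = majority(labels)
        return lambda row: constant
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label


def train_forest(rows, labels, n_trees, max_features, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: row indices drawn with replacement.
        sample = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Random subset of feature columns considered for the split.
        feature_ids = sorted(rng.sample(range(n_features), max_features))
        forest.append(train_stump([rows[i] for i in sample],
                                  [labels[i] for i in sample], feature_ids))
    return forest


def predict(forest, row):
    """Aggregate the trees by majority vote."""
    return majority([tree(row) for tree in forest])


# Hypothetical toy data: two informative columns and one constant column.
rows = [[0, 1, 5], [1, 0, 5], [2, 3, 5], [3, 2, 5],
        [10, 11, 5], [11, 10, 5], [12, 13, 5], [13, 12, 5]]
labels = ["low"] * 4 + ["high"] * 4
forest = train_forest(rows, labels, n_trees=25, max_features=2, seed=0)
```
]]></code>
        <p>A full implementation would grow deep, unpruned trees and draw a fresh feature subset at every node; the stump is used here only to keep the bootstrap, feature-selection, and voting steps visible.</p>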
        <p><bold>Variable importance</bold></p>
        <p>Mean decrease in accuracy and mean decrease in Gini are widely used for measuring, ranking, and selecting variables by importance ([<xref ref-type="bibr" rid="B19">19</xref>]). Node purity in tree-based models is commonly evaluated with the drop in the sum of squared errors in regression problems and with the Gini impurity in classification problems, and both are widely applied in Random Forest algorithms ([<xref ref-type="bibr" rid="B5">5</xref>]; [<xref ref-type="bibr" rid="B11">11</xref>]). The greater the decrease in impurity produced by a variable, the greater its importance ([<xref ref-type="bibr" rid="B5">5</xref>]; [<xref ref-type="bibr" rid="B11">11</xref>]). The Gini impurity is computed by summing, over all classes, the probability of choosing an item of a class multiplied by the probability of misclassifying that item ([<xref ref-type="bibr" rid="B1">1</xref>]). It is given by Equation (1), in which <italic>n</italic> is the number of classes and <italic>P</italic>(<italic>i</italic>) is the probability of class <italic>i</italic> at the node.</p>
        <disp-formula id="FD1">
          <label>(1)</label>
          <mml:math>
            <mml:mrow>
              <mml:mi>G</mml:mi>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mi>k</mml:mi>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:mo>=</mml:mo>
              <mml:mstyle displaystyle="true">
                <mml:msubsup>
                  <mml:mo>∑</mml:mo>
                  <mml:mrow>
                    <mml:mi>i</mml:mi>
                    <mml:mo>=</mml:mo>
                    <mml:mn>1</mml:mn>
                  </mml:mrow>
                  <mml:mi>n</mml:mi>
                </mml:msubsup>
                <mml:mrow>
                  <mml:mi>P</mml:mi>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mi>i</mml:mi>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                  <mml:mo>×</mml:mo>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:mn>1</mml:mn>
                      <mml:mo>−</mml:mo>
                      <mml:mi>P</mml:mi>
                      <mml:mrow>
                        <mml:mo>(</mml:mo>
                        <mml:mi>i</mml:mi>
                        <mml:mo>)</mml:mo>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:mstyle>
            </mml:mrow>
          </mml:math>
        </disp-formula>
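        <p>Equation (1) can be checked numerically with a short sketch. The class probabilities are estimated here as label frequencies at a node, and the weighted decrease after a split is computed in the usual way; the function names and the example labels are hypothetical.</p>
        <code language="python"><![CDATA[
```python
from collections import Counter


def gini_impurity(labels):
    """Sum of P(i) * (1 - P(i)) over the classes present at a node."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum((count / total) * (1 - count / total)
               for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) * gini_impurity(left)
                         + len(right) * gini_impurity(right)) / n
    return gini_impurity(parent) - weighted_children


# Hypothetical node holding two vulnerability categories in equal shares.
parent_node = [1, 1, 2, 2]
pure_split = gini_decrease(parent_node, [1, 1], [2, 2])
```
]]></code>
        <p>A node containing a single class has an impurity of 0, and a two-class node with equal shares has an impurity of 0.5, so the perfect split in this example removes all of the parent impurity.</p>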
        <p>A split reduces impurity, so the Gini impurity of the parent node is higher than that of its child nodes ([<xref ref-type="bibr" rid="B27">27</xref>]). The Gini decreases attributed to each explanatory variable are combined to estimate its total contribution to the prediction of vulnerability ([<xref ref-type="bibr" rid="B27">27</xref>]). The variable importance is calculated by Equation (2).</p>
        <disp-formula id="FD2">
          <label>(2)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:msub>
                <mml:mi>P</mml:mi>
                <mml:mi>k</mml:mi>
              </mml:msub>
              <mml:mo>=</mml:mo>
              <mml:mfrac>
                <mml:mrow>
                  <mml:mstyle displaystyle="true">
                    <mml:msubsup>
                      <mml:mo>∑</mml:mo>
                      <mml:mrow>
                        <mml:mi>i</mml:mi>
                        <mml:mo>=</mml:mo>
                        <mml:mn>1</mml:mn>
                      </mml:mrow>
                      <mml:mi>n</mml:mi>
                    </mml:msubsup>
                    <mml:mrow>
                      <mml:mstyle displaystyle="true">
                        <mml:msubsup>
                          <mml:mo>∑</mml:mo>
                          <mml:mrow>
                            <mml:mi>j</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>1</mml:mn>
                          </mml:mrow>
                          <mml:mi>t</mml:mi>
                        </mml:msubsup>
                        <mml:mrow>
                          <mml:msub>
                            <mml:mi>D</mml:mi>
                            <mml:mrow>
                              <mml:mi>G</mml:mi>
                              <mml:mi>k</mml:mi>
                              <mml:mi>i</mml:mi>
                              <mml:mi>j</mml:mi>
                            </mml:mrow>
                          </mml:msub>
                        </mml:mrow>
                      </mml:mstyle>
                    </mml:mrow>
                  </mml:mstyle>
                </mml:mrow>
                <mml:mrow>
                  <mml:mstyle displaystyle="true">
                    <mml:msubsup>
                      <mml:mo>∑</mml:mo>
                      <mml:mrow>
                        <mml:mi>k</mml:mi>
                        <mml:mo>=</mml:mo>
                        <mml:mn>1</mml:mn>
                      </mml:mrow>
                      <mml:mi>m</mml:mi>
                    </mml:msubsup>
                    <mml:mrow>
                      <mml:mstyle displaystyle="true">
                        <mml:msubsup>
                          <mml:mo>∑</mml:mo>
                          <mml:mrow>
                            <mml:mi>i</mml:mi>
                            <mml:mo>=</mml:mo>
                            <mml:mn>1</mml:mn>
                          </mml:mrow>
                          <mml:mi>n</mml:mi>
                        </mml:msubsup>
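                        <mml:mrow>
                          <mml:mstyle displaystyle="true">
                            <mml:msubsup>
                              <mml:mo>∑</mml:mo>
                              <mml:mrow>
                                <mml:mi>j</mml:mi>
                                <mml:mo>=</mml:mo>
                                <mml:mn>1</mml:mn>
                              </mml:mrow>
                              <mml:mi>t</mml:mi>
                            </mml:msubsup>
                            <mml:mrow>
                              <mml:msub>
                                <mml:mi>D</mml:mi>
                                <mml:mrow>
                                  <mml:mi>G</mml:mi>
                                  <mml:mi>k</mml:mi>
                                  <mml:mi>i</mml:mi>
                                  <mml:mi>j</mml:mi>
                                </mml:mrow>
                              </mml:msub>
                            </mml:mrow>
                          </mml:mstyle>
                        </mml:mrow>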
                      </mml:mstyle>
                    </mml:mrow>
                  </mml:mstyle>
                </mml:mrow>
              </mml:mfrac>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>where <italic>P</italic><italic><sub>k</sub></italic> is the importance of the <italic>k</italic><italic><sup>th</sup></italic> variable, <italic>m</italic> is the total number of explanatory variables, <italic>n</italic> is the total number of classification trees, <italic>t</italic> is the total number of nodes, and <italic>D</italic><italic><sub>Gkij</sub></italic> is the Gini decrease value of the <italic>j</italic><italic><sup>th</sup></italic> node in the <italic>i</italic><italic><sup>th</sup></italic> tree that belongs to the <italic>k</italic><italic><sup>th</sup></italic> variable. The mean squared error (MSE) is obtained by Equation (3).</p>
        <disp-formula id="FD3">
          <label>(3)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mi>ε</mml:mi>
              <mml:mo>=</mml:mo>
              <mml:msup>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:msub>
                        <mml:mi>V</mml:mi>
                        <mml:mrow>
                          <mml:mtext>observed</mml:mtext>
                        </mml:mrow>
                      </mml:msub>
                      <mml:mo>−</mml:mo>
                      <mml:msub>
                        <mml:mi>V</mml:mi>
                        <mml:mrow>
                          <mml:mtext>response</mml:mtext>
                        </mml:mrow>
                      </mml:msub>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
                <mml:mn>2</mml:mn>
              </mml:msup>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>where ε is the squared error of a single prediction, which is averaged over all observations to obtain the mean squared error, <italic>V</italic><sub>observed</sub> is the observed value, and <italic>V</italic><sub>response</sub> is the value predicted by the model ([<xref ref-type="bibr" rid="B17">17</xref>]).</p>
        <p><bold>Out-of-Bag (OOB) error</bold></p>
        <p>Each tree in the random forest is constructed from a random sample of observations, usually called a bootstrap sample, so each tree is built from a different sample of the whole dataset. The observations left out when constructing a tree are called “out-of-bag” (OOB) observations, i.e., data that the tree has not seen (data outside its bootstrap sample). The prediction for an observation is made from only those trees for which the observation was not used. The error rate estimated from these predictions is known as the OOB error ([<xref ref-type="bibr" rid="B1">1</xref>]; [<xref ref-type="bibr" rid="B15">15</xref>]).</p>
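        <p>The OOB procedure described above can be illustrated with a minimal sketch. It is not the implementation used in this study: each “tree” is replaced by the mean of its bootstrap sample (an assumption made only to keep the example short), and the function name, label values, and parameters are hypothetical.</p>
        <code language="python"><![CDATA[
```python
import random

def oob_mse(y, n_trees=200, seed=0):
    """Estimate out-of-bag MSE with a stand-in learner.

    Each "tree" is only the mean of its bootstrap sample (an assumption
    of this sketch); a real forest would fit a decision tree instead.
    """
    rng = random.Random(seed)
    n = len(y)
    pred_sum = [0.0] * n   # sum of OOB predictions per observation
    pred_cnt = [0] * n     # number of trees for which the observation was OOB
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        in_bag_set = set(in_bag)
        tree_pred = sum(y[i] for i in in_bag) / n        # stand-in "tree"
        for i in range(n):
            if i not in in_bag_set:                      # observation is out-of-bag
                pred_sum[i] += tree_pred
                pred_cnt[i] += 1
    # Average the OOB predictions per observation, then average squared errors.
    errors = [(y[i] - pred_sum[i] / pred_cnt[i]) ** 2
              for i in range(n) if pred_cnt[i] > 0]
    return sum(errors) / len(errors)

labels = [1, 2, 2, 3, 4, 6, 5, 1, 2, 3]   # hypothetical vulnerability labels
print(round(oob_mse(labels), 3))
```
]]></code>
        <p>Because every observation is predicted only by trees that did not see it, the resulting error behaves like a validation error without requiring a separate hold-out set.</p>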
      </sec>
      <sec id="sec3dot2">
        <title>3.2. Training Features</title>
        <p>Ground observations from social media were used for training. NAPSG Foundation, GISCorps, and CEDR Digital maintained a Story Map displaying crowdsourced photos of the 2018 hurricane collected from Instagram, Twitter, Facebook, and online news media (<ext-link ext-link-type="uri" xlink:href="https://napsg.maps.arcgis.com/apps/StoryMapCrowdsource/index.html?appid=69b95886cf8e49a3a349c9d550174a91">https://napsg.maps.arcgis.com/apps/StoryMapCrowdsource/index.html?appid=69b95886cf8e49a3a349c9d550174a91</ext-link>). The collection consists of photos with brief descriptions by social media users illustrating events (e.g., hurricane impact, hurricane intensity, damage, storm surge, flooding, and rescue efforts) during and after the hurricane. After careful inspection, the photos and their descriptions were classified into four levels of severity and assigned vulnerability categories from 1 to 4, with 1 being the most at-risk (highest vulnerability) location and 4 being the least at-risk (lowest or no vulnerability) location. Within the study area, 99 locations appropriate for training input were identified from the Story Map. In addition, emergency shelters designated by New Hanover County for evacuating county residents during natural disaster emergencies, including hurricanes and floods, were treated as no-risk or least-risk locations and assigned to vulnerability categories 5 and 6. More locations were identified by inspecting satellite imagery and flood maps from Hurricane Florence and were assigned vulnerability categories from 1 to 6 depending on the severity of the observed impact. In total, 273 locations were identified as training features. These training features with vulnerability labels are summarized in <bold>Table 1</bold> and displayed on a map in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p>
        <p><bold>Table 1.</bold> Vulnerability categories and the number of locations for model training.</p>
        <table-wrap id="tbl1">
          <label>Table 1</label>
          <table>
            <tbody>
              <tr>
                <td colspan="2">Vulnerability Category</td>
                <td>Number of Locations</td>
              </tr>
              <tr>
                <td>1</td>
                <td>Very High</td>
                <td>59</td>
              </tr>
              <tr>
                <td>2</td>
                <td>High</td>
                <td>77</td>
              </tr>
              <tr>
                <td>3</td>
                <td>Moderate</td>
                <td>51</td>
              </tr>
              <tr>
                <td>4</td>
                <td>Low</td>
                <td>31</td>
              </tr>
              <tr>
                <td>5</td>
                <td>Relatively Low</td>
                <td>17</td>
              </tr>
              <tr>
                <td>6</td>
                <td>Very Low</td>
                <td>38</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <fig id="fig3">
          <label>Figure 3</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId27.jpeg?20260529044922" />
        </fig>
        <p><bold>Figure 3.</bold> Training point features corresponding to <bold>Table 1</bold> (left), and prediction polygon features (census blocks, right).</p>
        <p>The distribution of training samples across vulnerability categories (<bold>Table 1</bold>) indicates moderate class imbalance, with counts ranging from 17 locations (category 5) to 77 locations (category 2). This imbalance may influence model training by biasing predictions toward the more frequent classes. While Random Forest is relatively robust to class imbalance due to its ensemble structure, imbalanced class distributions can still affect predictive performance ([<xref ref-type="bibr" rid="B6">6</xref>]). Techniques such as stratified sampling, class weighting, or balanced subsampling could further improve model reliability.</p>
        <p>Future work will explicitly address class imbalance and evaluate performance differences across categories, particularly for extreme vulnerability levels.</p>
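        <p>As an illustration of the class weighting mentioned above, the following sketch computes inverse-frequency weights from the counts in <bold>Table 1</bold>. The weighting rule (number of samples divided by the number of classes times the class count) is a common heuristic and an assumption here; it was not applied in this study, and the function name is hypothetical.</p>
        <code language="python"><![CDATA[
```python
# Number of training locations per vulnerability category (Table 1).
counts = {1: 59, 2: 77, 3: 51, 4: 31, 5: 17, 6: 38}

def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * k) for c, k in class_counts.items()}

weights = balanced_class_weights(counts)
for category, w in sorted(weights.items()):
    print(category, round(w, 2))
```
]]></code>
        <p>Under this rule the rarest category (5) receives the largest weight and the most frequent category (2) the smallest, so that each category contributes equally in aggregate.</p>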
      </sec>
      <sec id="sec3dot3">
        <title>3.3. Labeling Criteria and Quality Control</title>
        <p>Vulnerability categories (1 - 6) were assigned based on observable damage severity and contextual information from crowdsourced images, descriptions, and satellite imagery. The classification followed these general rules:</p>
        <p><bold>Category 1 (Very High):</bold> severe structural damage, deep flooding, life-threatening conditions;</p>
        <p><bold>Category 2 (High):</bold> significant flooding or infrastructure damage;</p>
        <p><bold>Category 3 (Moderate):</bold> localized flooding or moderate disruption;</p>
        <p><bold>Category 4 (Low):</bold> minimal visible damage;</p>
        <p><bold>Categories 5 - 6 (Relatively Low and Very Low):</bold> safe zones such as shelters or unaffected areas.</p>
        <p>To reduce subjectivity, ambiguous cases were reviewed iteratively and assigned based on consensus interpretation of image evidence and contextual metadata. A subset of locations was re-evaluated to ensure consistency in labeling decisions.</p>
        <p>Example crowdsourced photos from the Story Map maintained by NAPSG Foundation, GISCorps, and CEDR Digital: a general map showing clusters of impacted locations (upper left); the location of a damaged gas station in Wilmington, NC, on 9/14/2018 (upper right); the location of a downed tree on a house on 9/14/2018 (lower left); and the location of an abandoned car in Wilmington, NC, on 9/15/2018 (lower right).</p>
      </sec>
      <sec id="sec3dot4">
        <title>3.4. Prediction Polygon Features</title>
        <p>Prediction polygon features are the polygons that receive the predictions made by the model. Since the goal of this work is to predict the vulnerability of communities, census blocks are well suited as prediction polygons because they encompass small communities with distinctive geophysical and demographic similarities. New Hanover County consists of 5069 census blocks as delineated by the US Census Bureau in the 2010 census (<xref ref-type="fig" rid="fig3">Figure 3</xref>, right).</p>
      </sec>
      <sec id="sec3dot5">
        <title>3.5. Explanatory Variables</title>
        <p>Vulnerability arises from one or a combination of geophysical, demographic, or socio-economic conditions of people or places. These conditions, information generated about them, and information about the hurricane itself can be defined as explanatory variables for hurricane disaster analysis. There is no consensus as to which factors should be given higher priority when deciding the level of vulnerability of coastal communities to hurricanes ([<xref ref-type="bibr" rid="B3">3</xref>]). This work used a combination of geophysical, demographic, and social media-generated information as explanatory variables, as described below.</p>
        <p><bold>Geophysical variables</bold></p>
        <p>1) Land use/land cover</p>
        <p>Sentinel-2, a satellite mission developed by the European Space Agency (ESA), provides high-resolution (10 m) multispectral surface reflectance imagery, which was used for land-use/land-cover (LULC) classification. The imagery was classified using the Semi-Automatic Classification Plugin for QGIS version 2.18. Out of 12 Sentinel-2 spectral bands, bands 1 (coastal aerosol), 9 (water vapor), and 10 (cirrus) were excluded from the classification dataset. The imagery was classified into nine land-use and land-cover classes: a) forest, b) ocean, c) river, d) lake/pond, e) road, f) residential, g) agricultural, h) commercial, and i) marsh. The maximum likelihood algorithm was used to classify the imagery; it calculates the probability distribution for each class from the training data and, based on Bayes' theorem, assigns each pixel to the most probable land cover class ([<xref ref-type="bibr" rid="B22">22</xref>]). The classified output raster was then resampled to 30 m to reduce the number of pixels to a level that the Forest-based Classification and Regression tool in ArcGIS Pro could process.</p>
        <p>2) Elevation</p>
        <p>A digital elevation model (DEM) dataset at 1/9 arc-second (approximately 3 m) resolution, obtained from the 3D Elevation Program (3DEP) of the USGS National Map Services (<ext-link ext-link-type="uri" xlink:href="https://www.usgs.gov/core-science-systems/ngp/3dep">https://www.usgs.gov/core-science-systems/ngp/3dep</ext-link>), was used as the elevation dataset for the model. The DEM was resampled to 30 m to overcome the computational limitation of the tool. The elevation of New Hanover County ranges from 0 m to 30 m.</p>
        <p>3) Slope</p>
        <p>Slope describes the steepness of a raster surface. The slope was calculated in degrees from the DEM described in the previous section. The planar method was used, in which the slope is measured as the maximum rate of change in value from a cell to its immediate neighbors. The slope was computed according to Equation (4).</p>
        <disp-formula id="FD4">
          <label>(4)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mtext>Slope degrees</mml:mtext>
              <mml:mo>=</mml:mo>
              <mml:mtext>ATAN</mml:mtext>
              <mml:msup>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:msup>
                        <mml:mrow>
                          <mml:mrow>
                            <mml:mo>[</mml:mo>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:mrow>
                                  <mml:mtext>d</mml:mtext>
                                  <mml:mi>z</mml:mi>
                                </mml:mrow>
                                <mml:mo>/</mml:mo>
                                <mml:mrow>
                                  <mml:mtext>d</mml:mtext>
                                  <mml:mi>x</mml:mi>
                                </mml:mrow>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo>]</mml:mo>
                          </mml:mrow>
                        </mml:mrow>
                        <mml:mn>2</mml:mn>
                      </mml:msup>
                      <mml:mo>+</mml:mo>
                      <mml:msup>
                        <mml:mrow>
                          <mml:mrow>
                            <mml:mo>[</mml:mo>
                            <mml:mrow>
                              <mml:mrow>
                                <mml:mrow>
                                  <mml:mtext>d</mml:mtext>
                                  <mml:mi>z</mml:mi>
                                </mml:mrow>
                                <mml:mo>/</mml:mo>
                                <mml:mrow>
                                  <mml:mtext>d</mml:mtext>
                                  <mml:mi>y</mml:mi>
                                </mml:mrow>
                              </mml:mrow>
                            </mml:mrow>
                            <mml:mo>]</mml:mo>
                          </mml:mrow>
                        </mml:mrow>
                        <mml:mn>2</mml:mn>
                      </mml:msup>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mn>1</mml:mn>
                    <mml:mo>/</mml:mo>
                    <mml:mn>2</mml:mn>
                  </mml:mrow>
                </mml:mrow>
              </mml:msup>
              <mml:mo>∗</mml:mo>
              <mml:mrow>
                <mml:mrow>
                  <mml:mn>180</mml:mn>
                </mml:mrow>
                <mml:mo>/</mml:mo>
                <mml:mi>π</mml:mi>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
        <p>where <inline-formula><mml:math><mml:mrow><mml:mfrac><mml:mrow><mml:mtext>d</mml:mtext><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mtext>d</mml:mtext><mml:mi>x</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></inline-formula> is the rate of change in the <italic>x</italic>-direction, and <inline-formula><mml:math><mml:mrow><mml:mfrac><mml:mrow><mml:mtext>d</mml:mtext><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mtext>d</mml:mtext><mml:mi>y</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:math></inline-formula> is the rate of change in the <italic>y</italic>-direction. Slope indicates the topographic change and variability of the surface. A lower slope means a flatter surface, which has a higher risk of flooding ([<xref ref-type="bibr" rid="B27">27</xref>]). The slope raster used as input is shown in <xref ref-type="fig" rid="fig6">Figure 6</xref>(a).</p>
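        <p>Equation (4) can be sketched for a single cell as follows. The central-difference estimate of the two rates of change is a simplifying assumption of this example (GIS tools typically use a weighted scheme over all eight neighbors), and the elevation window and function names are hypothetical.</p>
        <code language="python"><![CDATA[
```python
import math

def slope_degrees(dz_dx, dz_dy):
    """Equation (4): arctangent of the gradient magnitude, in degrees."""
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

def central_differences(window, cell_size):
    """Rates of change at the centre of a 3x3 elevation window.

    Simple central differences are an assumption of this sketch.
    """
    dz_dx = (window[1][2] - window[1][0]) / (2 * cell_size)
    dz_dy = (window[2][1] - window[0][1]) / (2 * cell_size)
    return dz_dx, dz_dy

# Hypothetical 3x3 window of elevations (m) on a 30 m grid.
window = [[10.0, 10.0, 10.0],
          [9.0, 10.0, 11.0],
          [10.0, 10.0, 10.0]]
dz_dx, dz_dy = central_differences(window, cell_size=30.0)
print(round(slope_degrees(dz_dx, dz_dy), 2))
```
]]></code>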
        <p>4) Stream Power Index (SPI)</p>
        <p>Stream Power Index (SPI) is a measure of the power of flowing water on the terrain surface. The higher the stream power index, the more erosion the flow can cause downstream. Stream power is a hydrological factor that can condition or explain how damaging a flood would be ([<xref ref-type="bibr" rid="B27">27</xref>]; [<xref ref-type="bibr" rid="B17">17</xref>]). The SPI was calculated from slope and flow accumulation raster datasets obtained from terrain analysis of digital elevation models. The calculation was carried out in the ArcGIS raster calculator using the percent rise slope (Equation (5)).</p>
        <disp-formula id="FD5">
          <label>(5)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mtext>SPI</mml:mtext>
              <mml:mo>=</mml:mo>
              <mml:mtext>ln</mml:mtext>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:mtext>Flow accumulation raster</mml:mtext>
                  <mml:mo>+</mml:mo>
                  <mml:mn>0.001</mml:mn>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
              <mml:mo>∗</mml:mo>
              <mml:mrow>
                <mml:mo>(</mml:mo>
                <mml:mrow>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:mrow>
                        <mml:mrow>
                          <mml:mtext>Slope raster</mml:mtext>
                        </mml:mrow>
                        <mml:mo>/</mml:mo>
                        <mml:mrow>
                          <mml:mn>100</mml:mn>
                        </mml:mrow>
                      </mml:mrow>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                  <mml:mo>+</mml:mo>
                  <mml:mn>0.001</mml:mn>
                </mml:mrow>
                <mml:mo>)</mml:mo>
              </mml:mrow>
            </mml:mrow>
          </mml:math>
        </disp-formula>
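        <p>For a single cell, Equation (5) as written above can be evaluated as in the following sketch. The input values and the function name are hypothetical; the constant 0.001 keeps the logarithm defined where flow accumulation is zero.</p>
        <code language="python"><![CDATA[
```python
import math

def stream_power_index(flow_accumulation, slope_percent):
    """Equation (5) as written: ln(FA + 0.001) * (slope / 100 + 0.001)."""
    return math.log(flow_accumulation + 0.001) * (slope_percent / 100.0 + 0.001)

# Hypothetical cells: (flow accumulation in cells, slope in percent rise).
cells = [(0, 0.0), (50, 2.0), (5000, 5.0)]
for fa, slope in cells:
    print(fa, slope, round(stream_power_index(fa, slope), 4))
```
]]></code>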
        <p>5) Normalized Difference Vegetation Index (NDVI)</p>
        <p>Normalized Difference Vegetation Index (NDVI) measures the normalized difference between near-infrared (NIR) and red reflectance. NDVI values range from −1 to 1. Healthy vegetation has the highest NDVI values, i.e., close to 1, whereas water tends towards −1. Other land cover values fall between these two extremes depending on the type, growth, soil moisture, and presence or absence of vegetation, snow, and soil roughness ([<xref ref-type="bibr" rid="B27">27</xref>]). NDVI of the area of interest was computed from the Sentinel-2 imagery bands, Band 4 (Red) and Band 8 (NIR), using Equation (6). The NDVI input is shown in <xref ref-type="fig" rid="fig5">Figure 5</xref>(c).</p>
        <disp-formula id="FD6">
          <label>(6)</label>
          <mml:math display="inline">
            <mml:mrow>
              <mml:mtext>NDVI</mml:mtext>
              <mml:mo>=</mml:mo>
              <mml:mfrac>
                <mml:mrow>
                  <mml:mtext>NIR</mml:mtext>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:mtext>Band 8</mml:mtext>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                  <mml:mo>−</mml:mo>
                  <mml:mtext>Red</mml:mtext>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:mtext>Band 4</mml:mtext>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
                <mml:mrow>
                  <mml:mtext>NIR</mml:mtext>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:mtext>Band 8</mml:mtext>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                  <mml:mo>+</mml:mo>
                  <mml:mtext>Red</mml:mtext>
                  <mml:mrow>
                    <mml:mo>(</mml:mo>
                    <mml:mrow>
                      <mml:mtext>Band 4</mml:mtext>
                    </mml:mrow>
                    <mml:mo>)</mml:mo>
                  </mml:mrow>
                </mml:mrow>
              </mml:mfrac>
            </mml:mrow>
          </mml:math>
        </disp-formula>
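        <p>A per-pixel sketch of Equation (6) is given below. The reflectance values are hypothetical, and returning no value where both bands are zero is an assumption of this example rather than a documented step of the study.</p>
        <code language="python"><![CDATA[
```python
def ndvi(nir, red):
    """Equation (6): (NIR - Red) / (NIR + Red); None where the sum is zero."""
    total = nir + red
    if total == 0:
        return None          # undefined for a pixel with no signal in either band
    return (nir - red) / total

# Hypothetical surface reflectance pairs (Band 8, Band 4).
pixels = {
    "dense vegetation": (0.50, 0.05),
    "bare soil": (0.30, 0.25),
    "water": (0.02, 0.06),
}
for name, (nir, red) in pixels.items():
    print(name, round(ndvi(nir, red), 2))
```
]]></code>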
        <p>6) Major roads</p>
        <p>Major roads play a critical role before, during, and after natural disasters from the perspective of evacuation, rescue, and recovery needs. The more numerous and the wider the roads near a settlement, the easier it becomes to evacuate and to provide post-event assistance. As a result, communities could become safer from the impacts of hurricanes and floods. Thus, road features can be considered useful explanatory variables for vulnerability prediction. Road features were obtained from the North Carolina Department of Transportation (NCDOT).</p>
        <p>7) Water features</p>
        <p>The NC Center for Geographic Information and Analysis distributes major hydrography data that include major rivers, water bodies (lakes, ponds, dams, etc.), and floodwaters. Rivers and other water features are the areas where floods surge during hurricanes and heavy rainfall, and people near these features could be in danger of being affected by floods. For this reason, water features are an important addition to the list of explanatory variables for vulnerability prediction.</p>
        <p><bold>Demographic variables</bold></p>
        <p>As shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>, poverty, gender, race, ethnicity, age, and disability are demographic indicators of social vulnerability. People living in poverty, women, children, people with disabilities, and older people are vulnerable because they lack access to the resources needed to protect themselves when disaster strikes and to recover in its aftermath ([<xref ref-type="bibr" rid="B24">24</xref>]). American Community Survey (ACS) 2017 data on age, disability, and poverty at the block group level were used as demographic explanatory variables. The 0 - 14 and 65-plus age groups, the population with a disability, and the population living in poverty are considered more vulnerable than the rest of the population in the event of natural disasters such as hurricanes.</p>
        <fig id="fig4">
          <label>Figure 4</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId39.jpeg?20260529044928" />
        </fig>
        <p><bold>Figure 4.</bold> Workflow for hurricane vulnerability mapping and prediction.</p>
        <p><bold>Social media variables</bold></p>
        <p>Social media is a fundamental tool for individuals to access and disseminate real-time information regarding storm intensity, damage, safety, evacuation, and recovery during disaster events ([<xref ref-type="bibr" rid="B10">10</xref>]; [<xref ref-type="bibr" rid="B16">16</xref>]). The “tweets” variable represents the spatial presence of geotagged Twitter posts during Hurricane Florence. Specifically, the predictor was constructed as the count of geotagged tweets within proximity to each location, serving as a proxy for real-time human-reported impact intensity. Tweets were filtered using the keyword “#Florence” and restricted to geolocated posts within New Hanover County during September 14-19, 2018. A total of 65 geotagged tweets were retained after filtering.</p>
        <p>Given the limited number of observations, this variable may be subject to spatial and sampling bias. Therefore, its influence on model performance should be interpreted cautiously. Future work will explore alternative representations such as kernel density estimation, spatial smoothing, or population-normalized tweet intensity to improve robustness.</p>
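        <p>The tweet-count predictor described above can be sketched as a count of geotagged posts within a fixed distance of a location. The search radius, the coordinates, and the function names below are hypothetical; the study does not state the distance used, so this is one plausible construction rather than the exact procedure.</p>
        <code language="python"><![CDATA[
```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def tweet_count(location, tweets, radius_m):
    """Number of geotagged tweets within radius_m of a location."""
    lat, lon = location
    return sum(1 for t_lat, t_lon in tweets
               if haversine_m(lat, lon, t_lat, t_lon) <= radius_m)

# Hypothetical coordinates near Wilmington, NC; the radius is an assumed parameter.
tweets = [(34.2257, -77.9447), (34.2300, -77.9400), (34.3000, -77.8000)]
site = (34.2257, -77.9447)
print(tweet_count(site, tweets, radius_m=1000.0))
```
]]></code>
        <p>With only 65 tweets in the county, counts of this kind are sparse and sensitive to the chosen radius, which is consistent with the caution expressed above.</p>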
      </sec>
    </sec>
    <sec id="sec4">
      <title>4. Results</title>
      <p>A Random Forest (RF) regression model was built and applied. The explanatory distance variables and explanatory raster variables described in the previous section were used to predict hurricane flood vulnerability. The variable to predict was the “vulnerability level,” stored as an attribute of the training features and represented as six ordered levels from 1 to 6 (1 indicating the most vulnerable and 6 the least vulnerable to hurricane hazards). A combination of thirteen vector and raster geospatial datasets that could plausibly explain the vulnerability of communities in New Hanover County, North Carolina, was used as input variables in the analysis.</p>
      <p>Distance-based explanatory variables were calculated as the distance from each training feature to the closest segment of the lines or polygons of the corresponding explanatory dataset. Raster-based explanatory variables were extracted from the raster input datasets at each point location. The explanatory variables were then used to construct a model and predict the vulnerability of communities using census blocks as prediction areas. The input variables for training include LULC, elevation, NDVI, SPI (<xref ref-type="fig" rid="fig5">Figure 5</xref>), slope, major roads, major rivers, water bodies (<xref ref-type="fig" rid="fig6">Figure 6</xref>), poverty, disability, children, and aged population (<xref ref-type="fig" rid="fig7">Figure 7</xref>), and tweets (<xref ref-type="fig" rid="fig8">Figure 8</xref>).</p>
      <sec id="sec4dot1">
        <title>4.1. Regression Results and Analysis</title>
        <p>A forest of 2000 decision trees was found to be optimal during model construction. Predictions from the regression model were made for census blocks to produce predicted vulnerability values corresponding to the vulnerability levels in the input training features. Explanatory variables were calculated from the distance features and raster datasets.</p>
        <p>Thirty percent of the training data was withheld from model training and used for validation. After the model was trained, it was used to predict values for the withheld data; the predicted values were then compared to the observed values to provide a measure of prediction accuracy based on data that were not included in the training process.</p>
        <p>The validation strategy employed a random 70/30 split, which may lead to optimistic performance estimates in spatial datasets due to spatial autocorrelation. In geographically structured data, nearby observations tend to share similar characteristics, potentially inflating predictive accuracy ([<xref ref-type="bibr" rid="B23">23</xref>]).</p>
        <fig id="fig5">
          <label>Figure 5</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId40.jpeg?20260529044931" />
        </fig>
        <p><bold>Figure 5.</bold> Explanatory geophysical variables: (a) land use/land cover, (b) elevation, (c) NDVI, and (d) SPI.</p>
        <fig id="fig6">
          <label>Figure 6</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId41.jpeg?20260529044930" />
        </fig>
        <p><bold>Figure 6.</bold> Explanatory variables: (a) slope, (b) major roads, (c) major rivers, and (d) water bodies.</p>
        <fig id="fig7">
          <label>Figure 7</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId42.jpeg?20260529044930" />
        </fig>
        <p><bold>Figure 7.</bold> Explanatory demographic variables: (a) poverty, (b) disability, (c) children, and (d) aged population.</p>
        <fig id="fig8">
          <label>Figure 8</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId43.jpeg?20260529044929" />
        </fig>
        <p><bold>Figure 8.</bold> Explanatory variables: tweets.</p>
        <p>A more robust approach would involve spatial cross-validation techniques, such as spatial blocking or leave-area-out validation, to better assess generalization across geographically distinct regions ([<xref ref-type="bibr" rid="B26">26</xref>]). Due to data and computational constraints, this was not implemented in the current study; however, future work will incorporate spatial validation frameworks to provide more conservative and realistic performance estimates.</p>
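        <p>One simple form of the spatial blocking mentioned above is sketched below: points are assigned to square grid blocks, and whole blocks are held out for testing. This was not implemented in the study; the block size, the coordinates, and the function names are hypothetical, and the test fraction applies to blocks rather than to individual points.</p>
        <code language="python"><![CDATA[
```python
import random

def block_id(x, y, block_size):
    """Grid cell (column, row) that contains a projected coordinate."""
    return (int(x // block_size), int(y // block_size))

def spatial_block_split(points, block_size, test_fraction=0.3, seed=0):
    """Hold out whole grid blocks so test points are spatially separated."""
    blocks = sorted({block_id(x, y, block_size) for x, y in points})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    n_test = max(1, round(test_fraction * len(blocks)))  # fraction of blocks
    test_blocks = set(blocks[:n_test])
    train, test = [], []
    for i, (x, y) in enumerate(points):
        (test if block_id(x, y, block_size) in test_blocks else train).append(i)
    return train, test

# Hypothetical projected coordinates (metres); block size is an assumed parameter.
rng = random.Random(42)
points = [(rng.uniform(0, 20000), rng.uniform(0, 20000)) for _ in range(273)]
train_idx, test_idx = spatial_block_split(points, block_size=5000.0)
print(len(train_idx), len(test_idx))
```
]]></code>
        <p>Because no block contributes points to both sets, nearby observations cannot appear on both sides of the split, which tends to give more conservative accuracy estimates than a random 70/30 split.</p>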
        <p>The leaf size parameter is the minimum number of observations required to keep a terminal node without further splitting. The minimum leaf size set for this regression model was 5, i.e., a tree stopped growing once a terminal node held the minimum of 5 observations. Tree depth refers to the depth of each tree in the forest. The tree depths in the forest ranged from 0 to 18 (this is data-driven), with a mean tree depth of five. The percentage of training data available to each tree was set to 100%, and the number of randomly sampled variables for each tree was 3 (the square root of the total number of variables). Thirty percent of the data was excluded for validation of the regression model.</p>
        <p>Variable importance is a measure of how important a variable is in prediction. Because Random Forests capture complex interactions among the variables, importance is determined by looking at how much the prediction error increases when data for that variable is permuted while all others are left unchanged. The calculations are carried out for each tree.</p>
        <p>Mean decrease in accuracy measures the extent to which a variable contributes to prediction accuracy, as assessed during the OOB error calculation. The more the accuracy of the random forest decreases when a single variable is permuted, the more important that variable is considered; variables with a large mean decrease in accuracy are therefore more important for classification. The mean decrease in Gini measures how each variable contributes to the homogeneity of the nodes. Each time a particular variable is used to split a node, the Gini for the child nodes is calculated and compared to that of the original node. Variables that result in nodes with higher purity have a higher decrease in Gini.</p>
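        <p>The permutation idea described above can be illustrated with a minimal sketch. The stand-in model, the predictor values, and the function names are hypothetical; a fitted random forest would take the place of the simple prediction function used here, and the error would normally be computed on OOB or validation data.</p>
        <code language="python"><![CDATA[
```python
import random

def mse(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, y, column, seed=0):
    """Increase in MSE after shuffling one column, all others unchanged."""
    baseline = mse(y, [predict(r) for r in rows])
    shuffled = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted_rows = [r[:column] + [v] + r[column + 1:]
                     for r, v in zip(rows, shuffled)]
    return mse(y, [predict(r) for r in permuted_rows]) - baseline

# Stand-in model (an assumption): uses column 0 only and ignores column 1.
def predict(row):
    return 6.0 - 0.5 * row[0]

rows = [[float(i % 10), float((7 * i) % 5)] for i in range(40)]  # hypothetical predictors
y = [predict(r) for r in rows]
print(permutation_importance(predict, rows, y, column=0))
print(permutation_importance(predict, rows, y, column=1))
```
]]></code>
        <p>Shuffling the column that the model relies on raises the error, whereas shuffling the ignored column leaves it unchanged, which is the behavior that the importance ranking is built on.</p>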
        <p>Mean squared error (MSE) is the average squared difference between the predicted values and the actual values. It is a measure of the quality of the model, and values closer to zero are better. In this model, the MSE values for 1000 and 2000 trees were 1.728 and 1.729, respectively, so doubling the number of trees did not change the error appreciably for the regression model.</p>
        <p>The percent of variation explained describes how well the variation in the predictions accounts for the variation in the observed values. The coefficient of determination, R<sup>2</sup>, is a measure of the variation explained. The higher the value of R<sup>2</sup>, the higher the predictive value of the regression. In this study, R<sup>2</sup> is 0.931 for the training data and only 0.610 for the validation data. The percent of variation explained may vary as the number of trees parameter is changed. In this analysis, the values obtained for 1000 and 2000 trees were 1.728 and 1.719, respectively, a marginal change when the number of trees is doubled, indicating that increasing the number of trees from 1000 to 2000 did not make a remarkable difference in the ability of the model to predict. The difference between training (R<sup>2</sup> = 0.931) and validation performance (R<sup>2</sup> = 0.610) suggests moderate overfitting, which is common in spatial machine learning models with complex interactions. Despite this, the model retains acceptable predictive performance on unseen data, indicating reasonable generalization ability.</p>
        <p>Since the outcome variable represents ordered categories, additional evaluation metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) would provide more interpretable measures of prediction error compared to R<sup>2</sup> alone ([<xref ref-type="bibr" rid="B11">11</xref>]). Furthermore, if formulated as a classification problem, performance could be evaluated using per-class accuracy and confusion matrices to distinguish between minor and major misclassification errors ([<xref ref-type="bibr" rid="B21">21</xref>]).</p>
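        <p>The metrics named above can be computed as in the following sketch. The observed and predicted values are hypothetical and are not results from this study; rounding continuous predictions to the nearest level to build the confusion matrix is an assumption of the example.</p>
        <code language="python"><![CDATA[
```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"mae": mae, "rmse": math.sqrt(mse), "r2": r2}

def confusion_matrix(y_true, y_pred, levels=(1, 2, 3, 4, 5, 6)):
    """Rows are observed levels, columns are predictions rounded to a level."""
    index = {level: i for i, level in enumerate(levels)}
    matrix = [[0] * len(levels) for _ in levels]
    for t, p in zip(y_true, y_pred):
        nearest = min(max(round(p), levels[0]), levels[-1])  # clip to valid range
        matrix[index[t]][index[nearest]] += 1
    return matrix

# Hypothetical observed levels and continuous regression outputs.
observed = [1, 2, 3, 4, 5, 6]
predicted = [1.4, 2.0, 2.6, 4.9, 5.0, 4.2]
print(regression_metrics(observed, predicted))
print(confusion_matrix(observed, predicted))
```
]]></code>
        <p>Off-diagonal cells far from the diagonal correspond to major misclassifications (for example, a very low vulnerability location predicted as low), whereas adjacent cells indicate minor ones.</p>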
        <p>In this study, R<sup>2</sup> and MSE are reported to maintain consistency with the regression framework; however, future extensions will incorporate these additional metrics to better align evaluation with the ordinal nature of vulnerability. <italic>P</italic>-value in regression analysis measures the relationship between the change in the predictor and response variables. Higher <italic>P</italic>-values mean the response variable is insignificant for prediction, and a value as low as (or lower than) &lt;0.05 indicates a significant relationship with the predicted outcome. <italic>P</italic>-value in this analysis is zero (0), which means that the variables used are statistically significant, having a decent relationship with the predicted outcome.</p>
        <p>The variable importance ranking in <bold>Table 2</bold> and <xref ref-type="fig" rid="fig9">Figure 9</xref> shows the contribution of each explanatory variable to predicting vulnerability to Hurricane Florence in the study area with the regression model. Tweets, roads, elevation, and NDVI contribute most to predicting the vulnerable communities, whereas water bodies, rivers, land use/land cover, slope, the demographic variables, and Stream Power Index (SPI) make moderate contributions.</p>
        <p>Table 2. Variable importance output from the RF regression.</p>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <table>
            <tbody>
              <tr>
                <td>Variable</td>
                <td>Importance</td>
                <td>Share of total importance (%)</td>
              </tr>
              <tr>
                <td>TWEETS</td>
                <td>92.30</td>
                <td>18</td>
              </tr>
              <tr>
                <td>ROADS</td>
                <td>58.34</td>
                <td>11</td>
              </tr>
              <tr>
                <td>ELEV</td>
                <td>53.66</td>
                <td>10</td>
              </tr>
              <tr>
                <td>NDVI</td>
                <td>51.53</td>
                <td>10</td>
              </tr>
              <tr>
                <td>WTRBODY</td>
                <td>44.64</td>
                <td>9</td>
              </tr>
              <tr>
                <td>RIVERS</td>
                <td>43.56</td>
                <td>9</td>
              </tr>
              <tr>
                <td>LULC</td>
                <td>34.23</td>
                <td>7</td>
              </tr>
              <tr>
                <td>SLOPE30</td>
                <td>31.51</td>
                <td>6</td>
              </tr>
              <tr>
                <td>AGE</td>
                <td>28.74</td>
                <td>6</td>
              </tr>
              <tr>
                <td>DISABILITY</td>
                <td>27.26</td>
                <td>5</td>
              </tr>
              <tr>
                <td>POVERTY</td>
                <td>23.72</td>
                <td>5</td>
              </tr>
              <tr>
                <td>SPI_INDX</td>
                <td>21.71</td>
                <td>4</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
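        <p>The percentage column of Table 2 appears to be each variable's importance divided by the sum of all importance values. That reading is an assumption, although it reproduces the column after rounding, as the short Python sketch below shows. The grouping into geophysical, demographic, and social media-generated variables follows the variable types used in this study; the dictionary and variable names in the code are illustrative.</p>
        <code language="python"><![CDATA[
```python
# Importance values copied from Table 2.
importance = {
    "TWEETS": 92.30, "ROADS": 58.34, "ELEV": 53.66, "NDVI": 51.53,
    "WTRBODY": 44.64, "RIVERS": 43.56, "LULC": 34.23, "SLOPE30": 31.51,
    "AGE": 28.74, "DISABILITY": 27.26, "POVERTY": 23.72, "SPI_INDX": 21.71,
}

total = sum(importance.values())

# Assumed definition of the percentage column: share of summed importance.
share = {name: round(100 * value / total) for name, value in importance.items()}

# Variable groups as described in the study.
groups = {
    "social media": ["TWEETS"],
    "geophysical": ["ROADS", "ELEV", "NDVI", "WTRBODY", "RIVERS",
                    "LULC", "SLOPE30", "SPI_INDX"],
    "demographic": ["AGE", "DISABILITY", "POVERTY"],
}
group_share = {
    group: 100 * sum(importance[name] for name in names) / total
    for group, names in groups.items()
}

for name, pct in share.items():
    print(f"{name:<10} {importance[name]:6.2f} {pct:3d}%")
for group, pct in group_share.items():
    print(f"{group:<12} {pct:5.1f}%")
```
]]></code>
        <p>Summing by group in this way gives a quick check on how much of the total importance comes from each type of variable, bearing in mind that the groups contain different numbers of variables.</p>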
        <fig id="fig9">
          <label>Figure 9</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId44.jpeg?20260529044929" />
        </fig>
        <p><bold>Figure 9.</bold> Summary of variable importance from the regression model.</p>
        <p>The regression analysis predicted that approximately 10 percent of census blocks (482) would be classified as vulnerability level 1, approximately 47 percent (2311) as vulnerability level 2, and approximately 31 percent (1538) as vulnerability level 3. The remaining blocks, nearly 12 percent, would be categorized as level 4 or 5 (<xref ref-type="fig" rid="fig10">Figure 10</xref>).</p>
        <fig id="fig10">
          <label>Figure 10</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId45.jpeg?20260529044930" />
        </fig>
        <p><bold>Figure 10.</bold> Predicted categories for different explanatory variables on census blocks from the RF regression model.</p>
        <p><xref ref-type="fig" rid="fig10">Figure 10</xref> shows the predicted vulnerability categories by explanatory variable. About 10 percent of the communities corresponding to the census blocks had the highest level of vulnerability, nearly 47 percent had a high level of vulnerability, about 31 percent were moderately vulnerable, and the rest, nearly 12 percent, had a low level of vulnerability to the risk associated with Hurricane Florence. The predicted map indicates that areas around water bodies, including the coast, rivers, and floodwaters, and lowland areas generally have higher vulnerability than areas farther from them (<xref ref-type="fig" rid="fig11">Figure 11</xref>).</p>
        <fig id="fig11">
          <label>Figure 11</label>
          <graphic xlink:href="https://html.scirp.org/file/2173728-rId46.jpeg?20260529044930" />
        </fig>
        <p><bold>Figure 11.</bold> Predicted categories on census blocks from the RF regression model.</p>
      </sec>
    </sec>
    <sec id="sec5">
      <title>5. Limitations and Future Work</title>
      <p>This study has several limitations. First, the training data were derived from crowdsourced and interpreted imagery, which introduces potential subjectivity in labeling. Second, the relatively small number of geotagged social media observations may limit the robustness of the “tweets” variable. Third, the use of a random validation split may overestimate predictive performance due to spatial autocorrelation.</p>
      <p>Additionally, the regression framework approximates an ordinal outcome, which may not fully capture discrete category boundaries.</p>
      <p>Future work will address these limitations by incorporating larger datasets, spatial cross-validation techniques, and alternative modeling approaches such as ordinal classification and deep learning frameworks.</p>
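      <p>One simple form of spatial cross-validation is sketched below: training points are grouped into square blocks, and whole blocks are assigned to folds so that nearby points never appear on both sides of a split. This is a minimal illustration under stated assumptions, not the procedure used in this study; the function name <monospace>spatial_block_folds</monospace>, the block size, and the coordinates are all hypothetical.</p>
      <code language="python"><![CDATA[
```python
import math
from collections import defaultdict


def spatial_block_folds(points, block_size, n_folds):
    """Assign point indices to folds so that every point falling in the
    same square block (side length block_size) lands in the same fold."""
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Hand out blocks largest-first to whichever fold is currently smallest,
    # which keeps fold sizes roughly balanced.
    for key in sorted(blocks, key=lambda k: (-len(blocks[k]), k)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(blocks[key])
    return folds


# Hypothetical projected coordinates (e.g., km); NOT data from this study.
points = [(0.5, 0.5), (0.7, 0.2), (3.1, 0.4), (3.3, 0.9),
          (0.2, 3.8), (3.9, 3.5), (3.6, 3.7)]

folds = spatial_block_folds(points, block_size=2.0, n_folds=2)
for i, held_out in enumerate(folds):
    train = [j for f, fold in enumerate(folds) if f != i for j in fold]
    print(f"fold {i}: validate on {sorted(held_out)}, train on {sorted(train)}")
```
]]></code>
      <p>Because validation points are spatially separated from training points, error estimates from such folds are less likely to be inflated by spatial autocorrelation than those from a random split.</p>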
    </sec>
    <sec id="sec6">
      <title>6. Summary</title>
      <p>Extreme hurricane events and the associated floods are increasing along the Atlantic Coast, making coastal communities more vulnerable. The growing population in the coastal areas of the United States raises the risk of further loss of life and property damage. Hurricane Florence made landfall in New Hanover County, North Carolina, as a Category 1 storm, causing at least $24 billion in property damage and claiming dozens of lives. The damage to property and loss of life were caused primarily by record-breaking rainfall and flooding. The study area is a coastal county of which approximately 42% is water, which is one reason why New Hanover County experienced such dangerous inundation and was isolated from the rest of the world for several days.</p>
      <p>Random Forest regression modeling was performed for geospatial predictive analysis of vulnerability to hurricane hazards. Geophysical attributes have generally been preferred over socio-demographic variables in machine learning-based hurricane vulnerability modeling. Given the lack of research on combinations of variables that could explain vulnerability, this study demonstrates the value of integrating demographic and social media-generated variables alongside geophysical data to improve hurricane vulnerability prediction. The objectives were to make categorical predictions and map vulnerable communities using the Random Forest (RF) machine learning algorithm.</p>
      <p>Disaster impacts, and thus the vulnerability levels of communities, differ with the demographic, socio-economic, and physical-environmental conditions of a place, i.e., its exposure and coping ability. It is therefore necessary to consider the coupled human-environment system when mapping vulnerability to natural hazards.</p>
      <p>Among the statistical, physical, and data-driven models used to predict natural hazards, data-driven methods have proven particularly useful. A combination of geophysical, demographic, and social media-generated data was used as explanatory variables for predicting vulnerability at the level of census blocks. Land use/land cover, water bodies, elevation, NDVI, Stream Power Index (SPI), slope, major roads, and major rivers are the geophysical variables; poverty, disability, and age are the demographic variables; and tweets posted during the hurricane event are the social media-generated variable. All were fed into the random forest regression model. Training data were collected from three different sources: 1) crowdsourced location features with photos from Instagram, Twitter, Facebook, and online news media during Hurricane Florence, 2) county-designated safe emergency shelter locations, and 3) imagery captured during the hurricane event. A total of 273 point locations were used as labelled feature data for model training. Census blocks were used as prediction polygon features since they represent areas with geophysical and demographic similarities.</p>
      <p>RF is an extensively used data modeling algorithm in natural hazard risk prediction, for example for landslides and floods. However, it is used very infrequently in hurricane vulnerability prediction. RF is a supervised learning method that grows an ensemble of decision trees; in classification the trees vote for the most frequent class, and in regression, as used here, their predictions are averaged. Each tree is grown on a bootstrap sample, and the “out-of-bag” error is calculated from the observations left out of that bootstrap sample. Variable importance is a key output of RF and can be used to judge which variables are more useful than others in describing the impact of the disaster event.</p>
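      <p>The bootstrap and out-of-bag idea can be illustrated with a short Python sketch. It draws one bootstrap sample per tree and records which observations are left out; no trees are actually fitted. The sample size of 273 is the number of labelled point locations reported above, while the number of draws (200), the random seed, and the function name <monospace>bootstrap_split</monospace> are illustrative assumptions.</p>
      <code language="python"><![CDATA[
```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement.

    Returns (in_bag, out_of_bag): the drawn indices, and the indices
    that were never drawn and can therefore be used for error checking.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_samples = 273          # labelled point locations reported in the text
n_trees = 200            # illustrative; far fewer than the study's 2000 trees

oob_fractions = []
for _ in range(n_trees):
    in_bag, out_of_bag = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(out_of_bag) / n_samples)

mean_oob = sum(oob_fractions) / n_trees
print(f"mean out-of-bag fraction over {n_trees} bootstrap samples: {mean_oob:.3f}")
```
]]></code>
      <p>On average a little over one-third of the observations are left out of each bootstrap sample, so every tree has a set of unseen observations against which its predictions can be checked.</p>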
      <p>The RF regression was run with two thousand decision trees and three randomly sampled variables for constructing each tree, and 30 percent of the data were excluded from training for model validation. The MSEs for 1000 and 2000 trees are 1.728 and 1.719, respectively. Doubling the number of trees thus decreased the error, but not significantly. Therefore, 2000 trees can be considered a suitable number, although the predictive ability does not appear to have increased remarkably with the larger number of trees. For the training data, an R<sup>2</sup> value of 0.931, a <italic>P</italic>-value of 0.000, and a standard error of 0.014 indicate that the variables used are statistically significant and have good relationships with the predicted outcome. For the test data (excluded from model training), however, the R<sup>2</sup> value (0.610) is lower than expected and the standard error (0.048) is higher than for the predictions on the data used to train the model. Tweets, roads, elevation, and NDVI appear to have high importance for predicting hurricane vulnerability with a random forest regression model.</p>
      <p>The RF model results show that geophysical and social media-generated variables carry more weight in terms of importance than demographic variables. Communities in more than half of the census blocks (about 57 percent) have the highest or a high level of vulnerability to Hurricane Florence in New Hanover County, North Carolina, whereas only about 12 percent have a low level of vulnerability.</p>
      <p>Future work is encouraged to apply the RF algorithm, or other data-driven methods, to predict the location of communities vulnerable to hurricane risks. Such predictions of community vulnerability could likely be performed at the building level.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="B1">
        <label>1.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Ai, F. F., Bin, J., Zhang, Z. M., Huang, J. H. et al. (2014). Application of Random Forests to Select Premium Quality Vegetable Oils by Their Fatty Acid Composition. <italic>Food Chemistry, 143,</italic> 472-478. <pub-id pub-id-type="doi">10.1016/j.foodchem.2013.08.013</pub-id><pub-id pub-id-type="pmid">24054269</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.foodchem.2013.08.013">https://doi.org/10.1016/j.foodchem.2013.08.013</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Ai, F.</string-name>
              <string-name>Bin, J.</string-name>
              <string-name>Zhang, Z.</string-name>
              <string-name>Huang, J.</string-name>
            </person-group>
            <year>2014</year>
            <pub-id pub-id-type="doi">10.1016/j.foodchem.2013.08.013</pub-id>
            <pub-id pub-id-type="pmid">24054269</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B2">
        <label>2.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Aubrecht, C., Özceylan, D., Steinnocher, K., &amp; Freire, S. (2013). Multi-Level Geospatial Modeling of Human Exposure Patterns and Vulnerability Indicators. <italic>Natural Hazards, 68,</italic> 147-163. <pub-id pub-id-type="doi">10.1007/s11069-012-0389-9</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s11069-012-0389-9">https://doi.org/10.1007/s11069-012-0389-9</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Aubrecht, C.</string-name>
              <string-name>Özceylan, D.</string-name>
              <string-name>Steinnocher, K.</string-name>
              <string-name>Freire, S.</string-name>
            </person-group>
            <year>2013</year>
            <pub-id pub-id-type="doi">10.1007/s11069-012-0389-9</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B3">
        <label>3.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Bathi, J. R., &amp; Das, H. S. (2016). Vulnerability of Coastal Communities from Storm Surge and Flood Disasters. <italic>International Journal of Environmental Research and Public Health, 13,</italic> Article 239. <pub-id pub-id-type="doi">10.3390/ijerph13020239</pub-id><pub-id pub-id-type="pmid">26907313</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/ijerph13020239">https://doi.org/10.3390/ijerph13020239</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Bathi, J.</string-name>
              <string-name>Das, H.</string-name>
            </person-group>
            <year>2016</year>
            <elocation-id>239</elocation-id>
            <pub-id pub-id-type="doi">10.3390/ijerph13020239</pub-id>
            <pub-id pub-id-type="pmid">26907313</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B4">
        <label>4.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Bouwer, L. M. (2018). Observed and Projected Impacts from Extreme Weather Events: Implications for Loss and Damage. In <italic>Loss and Damage from Climate Change: Concepts, Methods and Policy Options</italic> (pp. 63-82). Springer.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Bouwer, L.</string-name>
            </person-group>
            <year>2018</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B5">
        <label>5.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Breiman, L. (2001). Random Forests. <italic>Machine Learning, 45,</italic> 5-32. <pub-id pub-id-type="doi">10.1023/a:1010933404324</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1023/a:1010933404324">https://doi.org/10.1023/a:1010933404324</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Breiman, L.</string-name>
            </person-group>
            <year>2001</year>
            <fpage>5</fpage>
            <lpage>32</lpage>
            <pub-id pub-id-type="doi">10.1023/a:1010933404324</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B6">
        <label>6.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Chen, C., Liaw, A., &amp; Breiman, L. (2004). <italic>Using Random Forest to Learn Imbalanced Data</italic>. University of California.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Chen, C.</string-name>
              <string-name>Liaw, A.</string-name>
              <string-name>Breiman, L.</string-name>
            </person-group>
            <year>2004</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B7">
        <label>7.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Cohen, D. (2019). <italic>About 60.2 Million Live in Areas Most Vulnerable to Hurricanes</italic>. U.S. Census Bureau.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Cohen, D.</string-name>
            </person-group>
            <year>2019</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B8">
        <label>8.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Cutter, S. L., Barnes, L., Berry, M., Burton, C., Evans, E., Tate, E. et al. (2008). A Place-Based Model for Understanding Community Resilience to Natural Disasters. <italic>Global Environmental Change, 18,</italic> 598-606. <pub-id pub-id-type="doi">10.1016/j.gloenvcha.2008.07.013</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.gloenvcha.2008.07.013">https://doi.org/10.1016/j.gloenvcha.2008.07.013</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Cutter, S.</string-name>
              <string-name>Barnes, L.</string-name>
              <string-name>Berry, M.</string-name>
              <string-name>Burton, C.</string-name>
              <string-name>Evans, E.</string-name>
              <string-name>Tate, E.</string-name>
            </person-group>
            <year>2008</year>
            <pub-id pub-id-type="doi">10.1016/j.gloenvcha.2008.07.013</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B9">
        <label>9.</label>
        <citation-alternatives>
          <mixed-citation publication-type="report">Feaster, T. D., Weaver, J. C., Gotvald, A. J., &amp; Kolb, K. R. (2018). <italic>Preliminary Peak Stage and Streamflow Data at Selected Streamgaging Stations in North Carolina and South Carolina for Flooding Following Hurricane Florence, September 2018</italic>. US Geological Survey Open-File Report.</mixed-citation>
          <element-citation publication-type="report">
            <person-group person-group-type="author">
              <string-name>Feaster, T.</string-name>
              <string-name>Weaver, J.</string-name>
              <string-name>Gotvald, A.</string-name>
              <string-name>Kolb, K.</string-name>
            </person-group>
            <year>2018</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B10">
        <label>10.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Goodchild, M. F., &amp; Glennon, J. A. (2010). Crowdsourcing Geographic Information for Disaster Response: A Research Frontier. <italic>International Journal of Digital Earth, 3,</italic> 231-241. <pub-id pub-id-type="doi">10.1080/17538941003759255</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/17538941003759255">https://doi.org/10.1080/17538941003759255</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Goodchild, M.</string-name>
              <string-name>Glennon, J.</string-name>
            </person-group>
            <year>2010</year>
            <pub-id pub-id-type="doi">10.1080/17538941003759255</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B11">
        <label>11.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Hastie, T., Tibshirani, R., &amp; Friedman, J. H. (2009). <italic>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</italic>. Springer.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Hastie, T.</string-name>
              <string-name>Tibshirani, R.</string-name>
              <string-name>Friedman, J.</string-name>
            </person-group>
            <year>2009</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B12">
        <label>12.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Helderop, E., &amp; Grubesic, T. H. (2019). Hurricane Storm Surge in Volusia County, Florida: Evidence of a Tipping Point for Infrastructure Damage. <italic>Disasters, 43,</italic> 157-180. <pub-id pub-id-type="doi">10.1111/disa.12296</pub-id><pub-id pub-id-type="pmid">29968929</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/disa.12296">https://doi.org/10.1111/disa.12296</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Helderop, E.</string-name>
              <string-name>Grubesic, T.</string-name>
            </person-group>
            <year>2019</year>
            <pub-id pub-id-type="doi">10.1111/disa.12296</pub-id>
            <pub-id pub-id-type="pmid">29968929</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B13">
        <label>13.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Hoque, M. A. A., Phinn, S., Roelfsema, C., &amp; Childs, I. (2017a). Tropical Cyclone Disaster Management Using Remote Sensing and Spatial Analysis: A Review. <italic>International Journal of Disaster Risk Reduction, 22,</italic> 345-354. <pub-id pub-id-type="doi">10.1016/j.ijdrr.2017.02.008</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ijdrr.2017.02.008">https://doi.org/10.1016/j.ijdrr.2017.02.008</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Hoque, M.</string-name>
              <string-name>Phinn, S.</string-name>
              <string-name>Roelfsema, C.</string-name>
              <string-name>Childs, I.</string-name>
            </person-group>
            <year>2017</year>
            <pub-id pub-id-type="doi">10.1016/j.ijdrr.2017.02.008</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B14">
        <label>14.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Hoque, M. A. A., Phinn, S., &amp; Roelfsema, C. (2017b). A Systematic Review of Tropical Cyclone Disaster Management Research Using Remote Sensing and Spatial Analysis. <italic>Ocean &amp; Coastal Management, 146,</italic> 109-120. <pub-id pub-id-type="doi">10.1016/j.ocecoaman.2017.07.001</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ocecoaman.2017.07.001">https://doi.org/10.1016/j.ocecoaman.2017.07.001</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Hoque, M.</string-name>
              <string-name>Phinn, S.</string-name>
              <string-name>Roelfsema, C.</string-name>
            </person-group>
            <year>2017</year>
            <pub-id pub-id-type="doi">10.1016/j.ocecoaman.2017.07.001</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B15">
        <label>15.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Janitza, S., &amp; Hornung, R. (2018). On the Overestimation of Random Forest’s Out-of-Bag Error. <italic>PLOS ONE, 13,</italic> e0201904. <pub-id pub-id-type="doi">10.1371/journal.pone.0201904</pub-id><pub-id pub-id-type="pmid">30080866</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pone.0201904">https://doi.org/10.1371/journal.pone.0201904</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Janitza, S.</string-name>
              <string-name>Hornung, R.</string-name>
            </person-group>
            <year>2018</year>
            <pub-id pub-id-type="doi">10.1371/journal.pone.0201904</pub-id>
            <pub-id pub-id-type="pmid">30080866</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B16">
        <label>16.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Klonner, C., Marx, S., Usón, T., Porto de Albuquerque, J., &amp; Höfle, B. (2016). Volunteered Geographic Information in Natural Hazard Analysis: A Systematic Literature Review of Current Approaches with a Focus on Preparedness and Mitigation. <italic>ISPRS International Journal of Geo-Information, 5,</italic> Article 103. <pub-id pub-id-type="doi">10.3390/ijgi5070103</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/ijgi5070103">https://doi.org/10.3390/ijgi5070103</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Klonner, C.</string-name>
              <string-name>Marx, S.</string-name>
              <string-name>Usón, T.</string-name>
              <string-name>Porto de Albuquerque, J.</string-name>
              <string-name>Höfle, B.</string-name>
            </person-group>
            <year>2016</year>
            <elocation-id>103</elocation-id>
            <pub-id pub-id-type="doi">10.3390/ijgi5070103</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B17">
        <label>17.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Lee, K., Lee, H., Lee, K., &amp; Shin, J. (2017). Training Confidence-Calibrated Classifiers for Detecting Out-of-Distribution Samples. arXiv preprint arXiv:1711.09325.</mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Lee, K.</string-name>
              <string-name>Lee, H.</string-name>
              <string-name>Lee, K.</string-name>
              <string-name>Shin, J.</string-name>
            </person-group>
            <year>2017</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B18">
        <label>18.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Mosavi, A., Ozturk, P., &amp; Chau, K. (2018). Flood Prediction Using Machine Learning Models: Literature Review. <italic>Water, 10,</italic> Article 1536. <pub-id pub-id-type="doi">10.3390/w10111536</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/w10111536">https://doi.org/10.3390/w10111536</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Mosavi, A.</string-name>
              <string-name>Ozturk, P.</string-name>
              <string-name>Chau, K.</string-name>
            </person-group>
            <year>2018</year>
            <elocation-id>1536</elocation-id>
            <pub-id pub-id-type="doi">10.3390/w10111536</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B19">
        <label>19.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Park, S., &amp; Kim, J. (2019). Landslide Susceptibility Mapping Based on Random Forest and Boosted Regression Tree Models, and a Comparison of Their Performance. <italic>Applied Sciences, 9,</italic> Article 942. <pub-id pub-id-type="doi">10.3390/app9050942</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/app9050942">https://doi.org/10.3390/app9050942</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Park, S.</string-name>
              <string-name>Kim, J.</string-name>
            </person-group>
            <year>2019</year>
            <elocation-id>942</elocation-id>
            <pub-id pub-id-type="doi">10.3390/app9050942</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B20">
        <label>20.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Pourghasemi, H. R., &amp; Kerle, N. (2016). Random Forests and Evidential Belief Function-based Landslide Susceptibility Assessment in Western Mazandaran Province, Iran. <italic>Environmental Earth Sciences, 75,</italic> Article 185. <pub-id pub-id-type="doi">10.1007/s12665-015-4950-1</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s12665-015-4950-1">https://doi.org/10.1007/s12665-015-4950-1</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Pourghasemi, H.</string-name>
              <string-name>Kerle, N.</string-name>
            </person-group>
            <year>2016</year>
            <elocation-id>185</elocation-id>
            <pub-id pub-id-type="doi">10.1007/s12665-015-4950-1</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B21">
        <label>21.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Powers, D. M. (2020). <italic>Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation</italic>.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Powers, D.</string-name>
            </person-group>
            <year>2020</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B22">
        <label>22.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Richards, J. A., &amp; Jia, X. (2006). <italic>Remote Sensing Digital Image Analysis: An Introduction</italic>. Springer.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Richards, J.</string-name>
              <string-name>Jia, X.</string-name>
            </person-group>
            <year>2006</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B23">
        <label>23.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera‐Arroita, G. et al. (2017). Cross‐validation Strategies for Data with Temporal, Spatial, Hierarchical, or Phylogenetic Structure. <italic>Ecography, 40,</italic> 913-929. https://doi.org/10.1111/ecog.02881 <pub-id pub-id-type="doi">10.1111/ecog.02881</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/ecog.02881">https://doi.org/10.1111/ecog.02881</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Roberts, D.</string-name>
              <string-name>Bahn, V.</string-name>
              <string-name>Ciuti, S.</string-name>
              <string-name>Boyce, M.</string-name>
              <string-name>Elith, J.</string-name>
              <string-name>Guillera‐Arroita, G.</string-name>
            </person-group>
            <year>2017</year>
            <pub-id pub-id-type="doi">10.1111/ecog.02881</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B24">
        <label>24.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Rygel, L., O’Sullivan, D., &amp; Yarnal, B. (2006). A Method for Constructing a Social Vulnerability Index: An Application to Hurricane Storm Surges in a Developed Country. <italic>Mitigation and Adaptation Strategies for Global Change, 11,</italic> 741-764. https://doi.org/10.1007/s11027-006-0265-6 <pub-id pub-id-type="doi">10.1007/s11027-006-0265-6</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s11027-006-0265-6">https://doi.org/10.1007/s11027-006-0265-6</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Rygel, L.</string-name>
              <string-name>O’Sullivan, D.</string-name>
              <string-name>Yarnal, B.</string-name>
            </person-group>
            <year>2006</year>
            <pub-id pub-id-type="doi">10.1007/s11027-006-0265-6</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B25">
        <label>25.</label>
        <citation-alternatives>
          <mixed-citation publication-type="report">Stewart, S. R., &amp; Berg, R. (2019). <italic>Hurricane Florence (AL062018): 31 August-17 September 2018</italic>. National Hurricane Center Tropical Cyclone Report, National Oceanic and Atmospheric Administration (NOAA).</mixed-citation>
          <element-citation publication-type="report">
            <person-group person-group-type="author">
              <string-name>Stewart, S.</string-name>
              <string-name>Berg, R.</string-name>
            </person-group>
            <year>2019</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B26">
        <label>26.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Valavi, R., Elith, J., Lahoz-Monfort, J. J., &amp; Guillera-Arroita, G. (2018). <italic>BlockCV: An R Package for Generating Spatially or Environmentally Separated Folds for K-Fold Cross-Validation of Species Distribution Models</italic>.</mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Valavi, R.</string-name>
              <string-name>Elith, J.</string-name>
              <string-name>Lahoz-Monfort, J.</string-name>
              <string-name>Guillera-Arroita, G.</string-name>
            </person-group>
            <year>2018</year>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B27">
        <label>27.</label>
        <citation-alternatives>
          <mixed-citation publication-type="journal">Wang, Z., Lai, C., Chen, X., Yang, B., Zhao, S., &amp; Bai, X. (2015). Flood Hazard Risk Assessment Model Based on Random Forest. <italic>Journal of Hydrology, 527,</italic> 1130-1141. https://doi.org/10.1016/j.jhydrol.2015.06.008 <pub-id pub-id-type="doi">10.1016/j.jhydrol.2015.06.008</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.jhydrol.2015.06.008">https://doi.org/10.1016/j.jhydrol.2015.06.008</ext-link></mixed-citation>
          <element-citation publication-type="journal">
            <person-group person-group-type="author">
              <string-name>Wang, Z.</string-name>
              <string-name>Lai, C.</string-name>
              <string-name>Chen, X.</string-name>
              <string-name>Yang, B.</string-name>
              <string-name>Zhao, S.</string-name>
              <string-name>Bai, X.</string-name>
            </person-group>
            <year>2015</year>
            <pub-id pub-id-type="doi">10.1016/j.jhydrol.2015.06.008</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B28">
        <label>28.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Wu, S., Yarnal, B., &amp; Fisher, A. (2002). Vulnerability of Coastal Communities to Sea-Level Rise: A Case Study of Cape May County, New Jersey, USA. <italic>Climate Research, 22,</italic> 255-270. https://doi.org/10.3354/cr022255 <pub-id pub-id-type="doi">10.3354/cr022255</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3354/cr022255">https://doi.org/10.3354/cr022255</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Wu, S.</string-name>
              <string-name>Yarnal, B.</string-name>
              <string-name>Fisher, A.</string-name>
            </person-group>
            <year>2002</year>
            <pub-id pub-id-type="doi">10.3354/cr022255</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
      <ref id="B29">
        <label>29.</label>
        <citation-alternatives>
          <mixed-citation publication-type="other">Zhou, Z., Gong, J., &amp; Hu, X. (2019). Community-Scale Multi-Level Post-Hurricane Damage Assessment of Residential Buildings Using Multi-Temporal Airborne Lidar Data. <italic>Automation in Construction, 98,</italic> 30-45. https://doi.org/10.1016/j.autcon.2018.10.018 <pub-id pub-id-type="doi">10.1016/j.autcon.2018.10.018</pub-id><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.autcon.2018.10.018">https://doi.org/10.1016/j.autcon.2018.10.018</ext-link></mixed-citation>
          <element-citation publication-type="other">
            <person-group person-group-type="author">
              <string-name>Zhou, Z.</string-name>
              <string-name>Gong, J.</string-name>
              <string-name>Hu, X.</string-name>
            </person-group>
            <year>2019</year>
            <pub-id pub-id-type="doi">10.1016/j.autcon.2018.10.018</pub-id>
          </element-citation>
        </citation-alternatives>
      </ref>
    </ref-list>
  </back>
</article>