<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">OJBM</journal-id><journal-title-group><journal-title>Open Journal of Business and Management</journal-title></journal-title-group><issn pub-type="epub">2329-3284</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ojbm.2019.74107</article-id><article-id pub-id-type="publisher-id">OJBM-94171</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Business&amp;Economics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Research on P2P Credit Risk Assessment Model Based on RBM Feature Extraction—Take SME Customers as an Example
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Jianhui</surname><given-names>Yang</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Qiman</surname><given-names>Li</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Dongsheng</surname><given-names>Luo</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>School of Business Administration, South China University of Technology, Guangzhou, China</addr-line></aff><pub-date pub-type="epub"><day>06</day><month>08</month><year>2019</year></pub-date><volume>07</volume><issue>04</issue><fpage>1553</fpage><lpage>1563</lpage><history><date date-type="received"><day>10,</day>	<month>July</month>	<year>2019</year></date><date date-type="rev-recd"><day>4,</day>	<month>August</month>	<year>2019</year>	</date><date date-type="accepted"><day>7,</day>	<month>August</month>	<year>2019</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  This paper applies a nonlinear dimensionality reduction method, the restricted Boltzmann machine (RBM), to assess the credit risk of P2P borrowers. After screening and processing a large set of big data indicators, the most representative indicators are selected to build the P2P customer credit risk assessment model. After comparing the advantages and disadvantages of linear and nonlinear dimensionality reduction algorithms, this paper establishes a P2P enterprise customer credit risk assessment model based on RBM feature extraction trained with contrastive divergence. The results show that RBM feature extraction outperforms PCA when the same classification model is used, and that the Logistic model performs best of the three models when the same feature extraction method is used.
 
</p></abstract><kwd-group><kwd>P2P</kwd><kwd>Credit Evaluation Model</kwd><kwd>RBM Algorithm</kwd><kwd>Credit Risk</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>With the current boom in e-commerce, social networking, Internet finance, P2P consumer credit, consumer finance and other Internet platforms, the shortcomings of the central bank’s credit reporting in the timeliness, comprehensiveness and hierarchy of its data have become increasingly prominent. How to mine the massive information flow of the Internet, develop a big data risk control model based on a large number of indicators, comprehensively assess the credit risk of enterprise customers, and provide a basis for credit approval on P2P lending platforms has become the core of building a credit risk model system.</p><p>Credit risk is a difficult problem in the current P2P industry. From a macro perspective, the low barriers to entry of P2P make macro-level risk increasingly hard to control. From a micro perspective, most P2P platform businesses are still in their infancy: the operating experience and risk management capabilities of platform operators are generally insufficient, and their development is extremely unstable. Credit issues are therefore likely to remain a cause of large-scale risk in the P2P industry. Although China’s P2P industry has developed rapidly in recent years, theoretical research on P2P network lending by domestic and foreign scholars still centers on the development of Internet platform operations. There is little academic discussion of risk management, security, and industry regulation, and the quantitative assessment of the credit risk of P2P enterprise customers remains almost unexplored. In view of this, this paper draws on existing research on credit risk assessment (such as the credit models of traditional commercial banks). 
After the credit risk characteristics of the P2P industry are analyzed, the credit risk of P2P borrowers is evaluated with artificial intelligence methods. In essence, a machine learning method learns from the borrower’s historical data to assess its future repayment ability and default risk, yielding a P2P enterprise credit risk assessment model suited to China’s current national conditions.</p></sec><sec id="s2"><title>2. Literature Review</title><p>Many scholars have studied credit risk measurement and evaluation models using a variety of methods. Building on the traditional credit risk measurement model and the Ronalce model, Rosenberg &amp; Gleit [<xref ref-type="bibr" rid="scirp.94171-ref1">1</xref>] constructed a new P2P credit risk measurement model and showed through simulation that a neural network model can obtain better results. Huang [<xref ref-type="bibr" rid="scirp.94171-ref2">2</xref>] combined the traditional credit risk measurement model with the support vector machine in an empirical study of loan default, which showed that the model combined with the support vector machine obtains better results than the model combined with a neural network. Puro et al. [<xref ref-type="bibr" rid="scirp.94171-ref3">3</xref>] took multiple factors as independent variables, including the borrower’s loan amount, credit rating, current overdue loan amount, debt yield, and loan interest rate, constructed a logistic regression model for testing, and obtained good results. Jiang Wei [<xref ref-type="bibr" rid="scirp.94171-ref4">4</xref>] replaced the training algorithm of the BP neural network with an improved particle swarm optimization (PSO) algorithm and, combined with a credit evaluation index system, built a personal credit evaluation model based on the improved PSO-BP neural network. The model quantitatively evaluates the credit of the lender and improves the automation of personal credit evaluation. Liu Chang and Xu Zhuoting [<xref ref-type="bibr" rid="scirp.94171-ref5">5</xref>] analyzed the causes of P2P online loan risk, established a risk prediction model with loan data from Lending Club, the world’s largest P2P company, and reported its prediction accuracy, in order to provide a credit risk management method for domestic P2P companies.</p><p>The nonlinear dimensionality reduction method used in this paper, the restricted Boltzmann machine (RBM), comes from the field of unsupervised learning. Professor Hinton [<xref ref-type="bibr" rid="scirp.94171-ref6">6</xref>] proposed the deep belief network (DBN), a model composed of multiple layers of restricted Boltzmann machines. A DBN first learns through layer-by-layer unsupervised training, then fine-tunes its parameters with supervised learning, and finally trains the discriminant classifier. Its advantage over a traditional neural network lies in the initialization of the parameters, which greatly improves the fitting speed of the multi-layer neural network and thus strengthens its modeling ability. At present, DBNs are mainly applied to handwriting recognition, information retrieval, text mining, object recognition, machine learning and machine translation.</p><p>Among studies applying DBN and RBM to credit risk assessment, Chen Yanwu [<xref ref-type="bibr" rid="scirp.94171-ref7">7</xref>] proposed a feature engineering algorithm that performs feature extraction with restricted Boltzmann machines. 
It can rely on expert experience to reduce the dimensionality of the information in the database and extract features, improving the accuracy of the application credit scoring model. The empirical results show that RBM is an efficient method for feature extraction and dimensionality reduction, and that applying it to the personal credit application scoring model can significantly improve accuracy. Zhang Yanxia [<xref ref-type="bibr" rid="scirp.94171-ref8">8</xref>] further optimized the multi-step iterative operation for two sparse self-encoding RBM algorithms, Sp-RBM and Log-Sum-RBM, using the idea of Polyak averaging. The empirical results show that the accuracy and precision of the sparse RBM models are higher than those of the original RBM model, and that Log-Sum-RBM has better representational ability than Sp-RBM. The author of [<xref ref-type="bibr" rid="scirp.94171-ref9">9</xref>] also analyzes the application of different RBM models in the field of credit risk assessment.</p></sec><sec id="s3"><title>3. Research Method</title><p>In neural network theory, Professor Hinton treats the restricted Boltzmann machine (RBM) as a typical undirected graph, as shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>. The visible layer, denoted v, represents the input data set in the P2P customer credit risk assessment study. The hidden layer, denoted h, acts as a feature extractor in our credit evaluation research; in other words, it performs the dimension reduction. Between the visible and hidden layers, W denotes the connection weights between the layers. 
For the most classic RBM models, all visible and hidden neurons are binary variables, that is, <inline-formula><tex-math>\forall i, j:\ v_i \in \{0, 1\},\ h_j \in \{0, 1\}</tex-math></inline-formula> [<xref ref-type="bibr" rid="scirp.94171-ref10">10</xref>].</p>
<p>In practical applications, the quantity of most interest is the distribution of the visible neurons v defined by the RBM parameters θ, <inline-formula><tex-math>P(v \mid \theta)</tex-math></inline-formula>:</p>
<p><disp-formula><label>(1)</label><tex-math>P_\theta(v) = \sum_{h} P_\theta(v, h) = \frac{1}{Z_\theta} \sum_{h} e^{-E_\theta(v, h)}</tex-math></disp-formula></p>
<p>Similarly, for the hidden neurons we have:</p>
<p><disp-formula><label>(2)</label><tex-math>P_\theta(h) = \sum_{v} P_\theta(v, h) = \frac{1}{Z_\theta} \sum_{v} e^{-E_\theta(v, h)}</tex-math></disp-formula></p>
<p>To evaluate the distribution <inline-formula><tex-math>P(v \mid \theta)</tex-math></inline-formula>, we need the normalization factor <inline-formula><tex-math>Z_\theta</tex-math></inline-formula>, whose computation requires roughly <inline-formula><tex-math>2^{n_v + n_h}</tex-math></inline-formula> evaluations, where <inline-formula><tex-math>n_v</tex-math></inline-formula> and <inline-formula><tex-math>n_h</tex-math></inline-formula> are the numbers of visible and hidden neurons. Therefore, even if the parameters <inline-formula><tex-math>w_{j,i}</tex-math></inline-formula>, <inline-formula><tex-math>a_i</tex-math></inline-formula> and <inline-formula><tex-math>b_j</tex-math></inline-formula> are obtained by training the model, the distribution they determine still cannot be computed exactly.</p>
<p>It is worth noting that, because of the special structure of the RBM, once the states of the visible neurons are given, the activation states of the hidden neurons are conditionally independent [<xref ref-type="bibr" rid="scirp.94171-ref11">11</xref>].</p>
<p>Let <inline-formula><tex-math>h_{-k} = (h_1, h_2, \cdots, h_{k-1}, h_{k+1}, \cdots, h_{n_h})^T</tex-math></inline-formula> denote the vector obtained by removing the component <inline-formula><tex-math>h_k</tex-math></inline-formula> from h, and define formulas (3) and (4):</p>
<p><disp-formula><label>(3)</label><tex-math>\alpha_k(v) = b_k + \sum_{i=1}^{n_v} w_{k,i} v_i</tex-math></disp-formula></p>
<p><disp-formula><label>(4)</label><tex-math>\beta(v, h_{-k}) = \sum_{i=1}^{n_v} a_i v_i + \sum_{\substack{j=1 \\ j \neq k}}^{n_h} b_j h_j + \sum_{i=1}^{n_v} \sum_{\substack{j=1 \\ j \neq k}}^{n_h} h_j w_{j,i} v_i</tex-math></disp-formula></p>
<p>Then</p>
<p><disp-formula><label>(5)</label><tex-math>E(v, h) = -\beta(v, h_{-k}) - h_k \alpha_k(v)</tex-math></disp-formula></p>
<p>Here <inline-formula><tex-math>-h_k \alpha_k(v)</tex-math></inline-formula> collects the terms of <inline-formula><tex-math>E(v, h)</tex-math></inline-formula> that involve <inline-formula><tex-math>h_k</tex-math></inline-formula>, and <inline-formula><tex-math>-\beta(v, h_{-k})</tex-math></inline-formula> collects the terms that do not.</p>
<p>For the k-th hidden neuron, the activation probability is derived in (6) below.</p>
<p><disp-formula><label>(6)</label><tex-math><![CDATA[
\begin{aligned}
P(h_k = 1 \mid v) &= P(h_k = 1 \mid h_{-k}, v) = \frac{P(h_k = 1, h_{-k}, v)}{P(h_{-k}, v)} \\
&= \frac{P(h_k = 1, h_{-k}, v)}{P(h_k = 1, h_{-k}, v) + P(h_k = 0, h_{-k}, v)} \\
&= \frac{\frac{1}{Z} e^{-E(h_k = 1, h_{-k}, v)}}{\frac{1}{Z} e^{-E(h_k = 1, h_{-k}, v)} + \frac{1}{Z} e^{-E(h_k = 0, h_{-k}, v)}} \\
&= \frac{e^{-E(h_k = 1, h_{-k}, v)}}{e^{-E(h_k = 1, h_{-k}, v)} + e^{-E(h_k = 0, h_{-k}, v)}} \\
&= \frac{1}{1 + e^{-E(h_k = 0, h_{-k}, v) + E(h_k = 1, h_{-k}, v)}} \\
&= \frac{1}{1 + e^{[\beta(v, h_{-k}) + 0 \cdot \alpha_k(v)] + [-\beta(v, h_{-k}) - 1 \cdot \alpha_k(v)]}} \\
&= \frac{1}{1 + e^{-\alpha_k(v)}} = \mathrm{sigmoid}(\alpha_k(v)) \\
&= \mathrm{sigmoid}\Big(b_k + \sum_{i=1}^{n_v} w_{k,i} v_i\Big)
\end{aligned}
]]></tex-math></disp-formula></p>
<p>This derivation yields formula (7):</p>
<p><disp-formula><label>(7)</label><tex-math>P(h_k = 1 \mid v) = \mathrm{sigmoid}\Big(b_k + \sum_{i=1}^{n_v} w_{k,i} v_i\Big)</tex-math></disp-formula></p>
<p>Because the RBM structure is symmetric, once the states of the hidden neurons are fixed, the activation states of the visible neurons are likewise conditionally independent [<xref ref-type="bibr" rid="scirp.94171-ref12">12</xref>]. By the same derivation, the activation probability of the k-th visible neuron is given by (8):</p>
<p><disp-formula><label>(8)</label><tex-math>P(v_k = 1 \mid h) = \mathrm{sigmoid}\Big(a_k + \sum_{j=1}^{n_h} w_{j,k} h_j\Big)</tex-math></disp-formula></p>
<p>Finally, the conditional distributions factorize over the individual neurons:</p>
<p><disp-formula><label>(9)</label><tex-math>P(h \mid v) = \prod_{j=1}^{n_h} P(h_j \mid v)</tex-math></disp-formula></p>
<p><disp-formula><label>(10)</label><tex-math>P(v \mid h) = \prod_{i=1}^{n_v} P(v_i \mid h)</tex-math></disp-formula></p></sec><sec id="s4"><title>4. 
Empirical Study</title><sec id="s4_1"><title>4.1. Data Description</title><p>An important criterion for whether a credit risk assessment model is effective is whether it can accurately identify the potential financial problems of SMEs that borrow through P2P platforms. The ideal sample for this section would therefore be SMEs that have borrowed through a P2P platform. However, P2P platforms do not disclose borrowers’ specific information, and most companies that raise funds through P2P platforms are not listed and thus have no obligation to publish financial statements. It is therefore difficult to collect enough sample data to support this empirical study. To make the modeling feasible, this research uses potential and representative P2P borrowing companies (companies similar to those that raise funds through P2P platforms) for the empirical study. In the end, we chose SMEs listed on the GEM.</p><p>At the end of 2017, there were 722 SMEs listed on the GEM; after excluding samples with missing values, 599 companies with complete financial data remained. The paper uses each enterprise’s 2016 annual report data, and the net profit of the following year serves as the label of the economic strength of the borrowing enterprise. If the following year’s annual report shows a loss, the enterprise’s risk of default is considered high and it is treated as a default sample, marked as 0; otherwise it is marked as 1. The enterprise data comes from the WIND database, and the executive data comes from web crawlers.</p><p>Data is collected from the WIND database and the WDZJ-OFFICIAL website.</p></sec><sec id="s4_2"><title>4.2. 
Indicator Selection</title><p>This article establishes individual user portraits through six dimensions: identity verification labels, stability labels, financial application labels, important asset labels, commodity consumption labels, and media viewing labels. Enterprise executive information is also considered. Combined with the empirical data, these dimensions are turned into concrete indicators, and the P2P enterprise customer credit index system is established.</p><p>After fully considering how difficult each indicator is to obtain, the corporate customer credit pre-selection indicators are organized along three dimensions: the company’s own labels, the labels of the company’s main executives, and external evaluation labels.</p><p>Among the nearly 200 pre-selected indicator variables, those with significant effects need to be screened out. In the credit risk assessment of P2P enterprise customers, the first step is to discretize the continuous variables, which facilitates the subsequent data grouping, WOE coding, and calculation of the IV value.</p><p>In traditional machine learning models, improper discretization of the data set greatly reduces the classification accuracy of the trained model. After considering the alternatives, we chose the entropy-based discretization method to discretize the continuous variables.</p><p>To compute the IV indicator, we first need to calculate the WOE (Weight of Evidence) value [<xref ref-type="bibr" rid="scirp.94171-ref13">13</xref>]. For the P2P enterprise customer credit risk assessment model to be established in this paper, the dependent variable indicates whether the enterprise loan is overdue or repaid normally. WOE measures the relative proportion of defaults among the samples whose independent variable takes a value in a particular group. 
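The larger the WOE value of a group, the more heavily defaults are concentrated in that group.</p><p><disp-formula><label>(11)</label><tex-math>\mathrm{woe}_i = \ln\left(\frac{b_i / b_T}{g_i / g_T}\right) \times 100</tex-math></disp-formula></p><p><disp-formula><label>(12)</label><tex-math>\mathrm{IV} = \sum_i \left(\frac{b_i}{b_T} - \frac{g_i}{g_T}\right) \ln\left(\frac{b_i / b_T}{g_i / g_T}\right)</tex-math></disp-formula></p><p>Here <inline-formula><tex-math>b_i</tex-math></inline-formula> and <inline-formula><tex-math>g_i</tex-math></inline-formula> are the numbers of default and normal samples in group i, and <inline-formula><tex-math>b_T</tex-math></inline-formula> and <inline-formula><tex-math>g_T</tex-math></inline-formula> are the total numbers of default and normal samples. As formula (12) shows, the IV value is a weighted sum of the WOE values and measures how much information the variable carries about the outcome. In other words, IV is a measure of association between the independent variable and the dependent variable. The structure of Equation (12) shows that every term is non-negative, because its two factors always have the same sign, so the IV values of all groups can be summed to obtain the overall IV value of the variable (<xref ref-type="table" rid="table1">Table 1</xref>).</p><p>Taking the operating income indicator as an example, the calculation of WOE and IV for each bin (attribute of the variable) is shown in <xref ref-type="table" rid="table2">Table 2</xref>. The WOE values in the table are reported without the factor of 100 in Equation (11).</p><p>The IV value of the operating income indicator is 0.36 &gt; 0.3, which makes it an indicator with strong predictive ability. In actual data analysis, a variable with an IV value between 0.01 and 0.02 is sometimes still significant in Logistic regression. 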
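</p><p>The WOE and IV calculation in Equations (11) and (12) can be illustrated with a minimal Python sketch that uses the default and normal sample counts of the four operating income bins in Table 2. The function name <monospace>woe_iv</monospace> and the variable names are assumptions introduced for illustration and are not part of the original study. The WOE values are computed without the factor of 100, and the sketch assumes that every bin contains at least one default and one normal sample.</p>
<preformat><![CDATA[
```python
import math


def woe_iv(bad_counts, good_counts):
    """Return per-bin WOE, per-bin IV contributions, and the total IV.

    bad_counts[i]  : number of default samples in bin i (b_i)
    good_counts[i] : number of normal samples in bin i (g_i)
    Assumes every bin has at least one default and one normal sample.
    """
    bad_total = sum(bad_counts)    # b_T
    good_total = sum(good_counts)  # g_T
    woe, iv = [], []
    for b, g in zip(bad_counts, good_counts):
        bad_share = b / bad_total
        good_share = g / good_total
        w = math.log(bad_share / good_share)     # Equation (11), without the factor 100
        woe.append(w)
        iv.append((bad_share - good_share) * w)  # one term of Equation (12)
    return woe, iv, sum(iv)


# Default and normal sample counts of the four operating income bins (Table 2).
defaults = [12, 13, 6, 2]
normals = [156, 154, 108, 148]

woe, iv, total_iv = woe_iv(defaults, normals)
for i, (w, v) in enumerate(zip(woe, iv), start=1):
    print(f"bin {i}: WOE = {w:.7f}, IV = {v:.7f}")
print(f"total IV = {total_iv:.7f}")
```
]]></preformat>
<p>Under these assumptions, the sketch gives a total IV of about 0.36 for operating income, in line with Table 2.</p><p>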
This paper therefore takes a conservative approach and excludes only variables with IV less than 0.01 (<xref ref-type="table" rid="table3">Table 3</xref>).</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Predictive ability by range of IV value</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Range of IV</th><th align="center" valign="middle" >Predictive power</th></tr></thead><tr><td align="center" valign="middle" >&lt;0.02</td><td align="center" valign="middle" >None</td></tr><tr><td align="center" valign="middle" >0.02 - 0.10</td><td align="center" valign="middle" >Weak</td></tr><tr><td align="center" valign="middle" >0.1 - 0.3</td><td align="center" valign="middle" >Medium</td></tr><tr><td align="center" valign="middle" >&gt;0.3</td><td align="center" valign="middle" >Strong</td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> WOE and IV calculations after the optimal segmentation of the operating income indicator</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Binning range</th><th align="center" valign="middle" >Number of default samples</th><th align="center" valign="middle" >Number of normal samples</th><th align="center" valign="middle" >WOE</th><th align="center" valign="middle" >IV</th></tr></thead><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" >[14747804.5, 12933539309]</td><td align="center" valign="middle" >12</td><td align="center" valign="middle" >156</td><td align="center" valign="middle" >0.2771371</td><td align="center" valign="middle" >0.0243930</td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" >[12933539309, 25852330812]</td><td align="center" valign="middle" >13</td><td align="center" valign="middle" >154</td><td align="center" 
valign="middle" >0.3700832</td><td align="center" valign="middle" >0.0450963</td></tr><tr><td align="center" valign="middle" >3</td><td align="center" valign="middle" >[25852330812, 38771122316]</td><td align="center" valign="middle" >6</td><td align="center" valign="middle" >108</td><td align="center" valign="middle" >−0.0482852</td><td align="center" valign="middle" >0.0004343</td></tr><tr><td align="center" valign="middle" >4</td><td align="center" valign="middle" >[38771122316, 51689913820]</td><td align="center" valign="middle" >2</td><td align="center" valign="middle" >148</td><td align="center" valign="middle" >−1.4619785</td><td align="center" valign="middle" >0.2936793</td></tr><tr><td align="center" valign="middle" >total</td><td align="center" valign="middle" >-</td><td align="center" valign="middle" >33</td><td align="center" valign="middle" >566</td><td align="center" valign="middle" >−0.8630433</td><td align="center" valign="middle" >0.3636032</td></tr></tbody></table></table-wrap><table-wrap-group id="3"><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> P2P corporate customer credit indicators screened according to IV values</title></caption><table-wrap id="3_1"><table><tbody><thead><tr><th align="center" valign="middle" >The registered capital</th><th align="center" valign="middle" >Assets balance</th><th align="center" valign="middle" >Balance of current liabilities</th><th align="center" valign="middle" >Operating income</th></tr></thead><tr><td align="center" valign="middle" >The total number of employees</td><td align="center" valign="middle" >Notes payable</td><td align="center" valign="middle" >Total current liabilities</td><td align="center" valign="middle" >Operating cost</td></tr><tr><td align="center" valign="middle" >Major shareholders holdings</td><td align="center" valign="middle" >Accounts payable</td><td align="center" valign="middle" >Total illiquid liabilities</td><td align="center" valign="middle" 
>Operating profit</td></tr><tr><td align="center" valign="middle" >The top 10 shareholders holding together</td><td align="center" valign="middle" >Deferred revenue</td><td align="center" valign="middle" >Long-term accounts receivable</td><td align="center" valign="middle" >Non-operating income</td></tr><tr><td align="center" valign="middle" >Other liquid assets</td><td align="center" valign="middle" >Remuneration payable</td><td align="center" valign="middle" >The total amount of comprehensive income</td><td align="center" valign="middle" >Non-business expenses</td></tr><tr><td align="center" valign="middle" >Net profit/business revenue</td><td align="center" valign="middle" >Current assets/total assets</td><td align="center" valign="middle" >The total profit year-on-year growth rate</td><td align="center" valign="middle" >Total liabilities year-on-year growth rate</td></tr><tr><td align="center" valign="middle" >Operating profit/business revenue</td><td align="center" valign="middle" >Non-current assets/total assets</td><td align="center" valign="middle" >Net profit growth rate</td><td align="center" valign="middle" >Total assets year-on-year growth rate</td></tr></tbody></table></table-wrap><table-wrap id="3_2"><table><tbody><thead><tr><th align="center" valign="middle" >Total operating costs/business revenue</th><th align="center" valign="middle" >Operating cycle</th><th align="center" valign="middle" >Year-on-year growth rate of net assets</th><th align="center" valign="middle" >Cash flows to year-on-year growth rate</th></tr></thead><tr><td align="center" valign="middle" >Return on equity</td><td align="center" valign="middle" >Long-term capital debt ratio</td><td align="center" valign="middle" >Inventory turnover days</td><td align="center" valign="middle" >Inventory turnover</td></tr><tr><td align="center" valign="middle" >Return on total assets</td><td align="center" valign="middle" >Long-term capital fit ratio</td><td align="center" valign="middle" 
>Annualized return on equity</td><td align="center" valign="middle" >Earnings per share</td></tr><tr><td align="center" valign="middle" >Return on invested capital</td><td align="center" valign="middle" >Sales net interest rates</td><td align="center" valign="middle" >Annualized rate of return on total assets</td><td align="center" valign="middle" >Net assets per share</td></tr><tr><td align="center" valign="middle" >Rate of return on labor input</td><td align="center" valign="middle" >Sales gross profit margin</td><td align="center" valign="middle" >Annual net interest rate of the total assets</td><td align="center" valign="middle" >EBITDA per share</td></tr><tr><td align="center" valign="middle" >Profit total</td><td align="center" valign="middle" >The cost of sales ratio</td><td align="center" valign="middle" >Taobao purchase index</td><td align="center" valign="middle" >1688 industry index</td></tr><tr><td align="center" valign="middle" >Executive age</td><td align="center" valign="middle" >Executive gender</td><td align="center" valign="middle" >Executive level of education</td><td align="center" valign="middle" >Executive marital status</td></tr><tr><td align="center" valign="middle" >Years the executive’s phone number has been in use</td><td align="center" valign="middle" >Number of active followers on social networking sites</td><td align="center" valign="middle" >Senior management years in the industry</td><td align="center" valign="middle" >Senior management years in this profession</td></tr></tbody></table></table-wrap></table-wrap-group></sec><sec id="s4_3"><title>4.3. RBM Feature Extraction</title><p>Analysis of the sample data shows that only 33 GEM companies had a net profit below 0 in 2017. The ratio of normal samples to default samples is 566:33, approximately 17:1. Learning from such an imbalanced data set is easily biased toward the majority class. 
Therefore, we change the sampling method to obtain a more balanced distribution of data samples.</p><p>To address the sample imbalance in the P2P enterprise customer credit risk assessment, we use undersampling: the normal samples are divided into 17 groups, each group is combined with the default sample set to train a classifier, and the group classifiers are finally combined into the overall classifier model.</p><p>Before using the RBM algorithm for feature extraction, the indicator values must be standardized with the Z-score, <inline-formula><tex-math>(x - \bar{x}) / \sigma</tex-math></inline-formula> [<xref ref-type="bibr" rid="scirp.94171-ref14">14</xref>] [<xref ref-type="bibr" rid="scirp.94171-ref15">15</xref>].</p><p>In addition, we must also determine the number of neurons in the hidden layer. Since there is no universal standard in the academic community for choosing the number of hidden nodes, we follow the suggestion in Professor Hinton’s 2012 paper: depending on the data, the number of hidden nodes should be no smaller than the number of bits needed to index the sample set. The sample set used in this paper contains 599 records, and <inline-formula><tex-math>\log_2(599) = 9.226</tex-math></inline-formula>, so at least 10 hidden nodes should be selected. Based on the reconstruction error after a single-step iteration, we tried the settings listed in <xref ref-type="table" rid="table4">Table 4</xref> and finally selected 40 hidden nodes, which gave the smallest single-layer reconstruction error, to train the RBM model.</p></sec><sec id="s4_4"><title>4.4. Model Comparison</title><p>After the above data preprocessing, we feed the P2P enterprise data into machine learning models, such as SVM, Logistic regression and KNN, for classification and prediction. 
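</p><p>The RBM feature extraction step described in Section 4.3 can be sketched in plain Python as follows. This is a minimal illustration, assuming binary inputs, one-step contrastive divergence (CD-1), and a small made-up data set; the function names, learning rate, number of epochs and toy data are all assumptions and do not reproduce the configuration used in this paper. The conditional probabilities follow formulas (7) and (8).</p>
<preformat><![CDATA[
```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v, W, b):
    """P(h_k = 1 | v) = sigmoid(b_k + sum_i W[k][i] * v[i]), formula (7)."""
    return [sigmoid(b[k] + sum(w * x for w, x in zip(W[k], v)))
            for k in range(len(b))]


def visible_probs(h, W, a):
    """P(v_i = 1 | h) = sigmoid(a_i + sum_j W[j][i] * h[j]), formula (8)."""
    return [sigmoid(a[i] + sum(W[j][i] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def reconstruction_error(data, W, a, b):
    """Mean squared error per sample after one deterministic up-down pass."""
    total = 0.0
    for v in data:
        recon = visible_probs(hidden_probs(v, W, b), W, a)
        total += sum((x - r) ** 2 for x, r in zip(v, recon))
    return total / len(data)


def train_rbm(data, n_hidden, epochs=300, lr=0.1, seed=0):
    """Train a Bernoulli RBM with one-step contrastive divergence (CD-1)."""
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_visible)]
         for _ in range(n_hidden)]
    a = [0.0] * n_visible  # visible biases
    b = [0.0] * n_hidden   # hidden biases
    for _ in range(epochs):
        for v0 in data:
            ph0 = hidden_probs(v0, W, b)   # positive phase
            h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]  # sampled hidden states
            v1 = visible_probs(h0, W, a)   # reconstruction (probabilities)
            ph1 = hidden_probs(v1, W, b)   # negative phase
            for k in range(n_hidden):
                for i in range(n_visible):
                    W[k][i] += lr * (ph0[k] * v0[i] - ph1[k] * v1[i])
                b[k] += lr * (ph0[k] - ph1[k])
            for i in range(n_visible):
                a[i] += lr * (v0[i] - v1[i])
    return W, a, b


# Toy binary data set (hypothetical): two repeated patterns over six visible units.
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 4

# Baseline: an RBM with all-zero parameters reconstructs every unit as 0.5.
zero_W = [[0.0] * 6 for _ in range(2)]
baseline_error = reconstruction_error(data, zero_W, [0.0] * 6, [0.0] * 2)

W, a, b = train_rbm(data, n_hidden=2)
trained_error = reconstruction_error(data, W, a, b)
features = [hidden_probs(v, W, b) for v in data]  # extracted low-dimensional features
print(f"baseline error = {baseline_error:.4f}, trained error = {trained_error:.4f}")
```
]]></preformat>
<p>In a setup like this, the hidden activation probabilities returned by <monospace>hidden_probs</monospace> would serve as the extracted features passed to a downstream classifier, and comparing <monospace>reconstruction_error</monospace> across several hidden layer sizes mirrors the selection procedure summarized in Table 4.</p><p>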
The comparison shows the following:</p><p>1) RBM feature extraction performs better than PCA when the same model is used.</p><p>2) The Logistic model performs best of the three models when the same feature extraction method is used. Overall, the RBM-Logistic model classifies best, with an accuracy of 74.87% (<xref ref-type="fig" rid="fig2">Figure 2</xref>).</p></sec></sec><sec id="s5"><title>5. Result</title><p>The P2P SME customer credit index system screened by IV value shows that corporate financial information (such as operating income, net asset growth, and the ratio of total current liabilities) plays a very important part in P2P SME credit risk assessment. This indicates that fully understanding the financial situation of SMEs, with a focus on their debt, is of great benefit to the construction of the credit evaluation index system.</p><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Reconstruction errors of different hidden neuron nodes under single-step iteration</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Hidden node</th><th align="center" valign="middle" >Single layer reconstruction error</th></tr></thead><tr><td align="center" valign="middle" >10</td><td align="center" valign="middle" >30.4052</td></tr><tr><td align="center" valign="middle" >15</td><td align="center" valign="middle" >29.8549</td></tr><tr><td align="center" valign="middle" >20</td><td align="center" valign="middle" >29.548</td></tr><tr><td align="center" valign="middle" >25</td><td align="center" valign="middle" >29.5532</td></tr><tr><td align="center" valign="middle" >30</td><td align="center" valign="middle" >29.6088</td></tr><tr><td align="center" valign="middle" >40</td><td align="center" valign="middle" >29.0978</td></tr><tr><td align="center" valign="middle" >45</td><td align="center" valign="middle" >29.3006</td></tr><tr><td 
align="center" valign="middle" >50</td><td align="center" valign="middle" >29.5142</td></tr><tr><td align="center" valign="middle" >55</td><td align="center" valign="middle" >29.4811</td></tr><tr><td align="center" valign="middle" >60</td><td align="center" valign="middle" >34.5148</td></tr></tbody></table></table-wrap><p>A good credit risk model not only reduces the P2P credit platform’s burden of reviewing the qualifications of SMEs and lowers the risk of lending, but also speeds up the financing process for SMEs, which benefits both parties. Constructing a scientific and reasonable credit evaluation model is therefore very important for P2P online lending platforms.</p><p>After comparing the advantages and disadvantages of linear and nonlinear dimensionality reduction algorithms, this paper establishes a P2P enterprise customer credit risk assessment model based on RBM feature extraction trained with contrastive divergence. It concludes that RBM feature extraction outperforms PCA when the same model is used, and that the Logistic model performs best of the three models when the same feature extraction method is used. 
Therefore, P2P online lending platforms can consider adopting the RBM-Logistic model, which achieved the highest accuracy, when conducting credit risk assessment for SMEs.</p></sec><sec id="s6"><title>Acknowledgements</title><p>The authors wish to acknowledge the valuable support of everyone who helped in the actual administration of the questionnaire and all the other participants of the study.</p></sec><sec id="s7"><title>Fund</title><p>This work was supported by the Ministry of Education Humanities and Social Sciences Research Project: Research on Credit Risk Evaluation and Supervision of P2P (Y9150040).</p></sec><sec id="s8"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s9"><title>Cite this paper</title><p>Yang, J.H., Li, Q.M. and Luo, D.S. (2019) Research on P2P Credit Risk Assessment Model Based on RBM Feature Extraction—Take SME Customers as an Example. Open Journal of Business and Management, 7, 1553-1563. https://doi.org/10.4236/ojbm.2019.74107</p></sec></body><back><ref-list><title>References</title><ref id="scirp.94171-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Gleit, R.A. (1994) Quantitative Methods in Credit Management: A Survey. Operations Research, 42, 589-613. https://doi.org/10.1287/opre.42.4.589</mixed-citation></ref><ref id="scirp.94171-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Huang, Z., Chen, H., Hsu, C.-J., Chen, W.-H. and Wu, S. (2004) Credit Rating Analysis with Support Vector Machines and Neural Networks: A Market Comparative Study. Decision Support Systems, 37, 543-558.
https://doi.org/10.1016/S0167-9236(03)00086-1</mixed-citation></ref><ref id="scirp.94171-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Puro, L., Teich, J.E., Wallenius, H. and Wallenius, J. (2010) Borrower Decision Aid for People-to-People Lending. Decision Support Systems, 49, 52-60.
https://doi.org/10.1016/j.dss.2009.12.009</mixed-citation></ref><ref id="scirp.94171-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Jiang, W. (2018) Research on Personal Credit Evaluation Model and Algorithm Based on Improved PSO-BP Neural Network. Master’s Thesis, University of Electronic Science and Technology, Chengdu.</mixed-citation></ref><ref id="scirp.94171-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Liu, C. and Xu, Z.T. (2018) Research on Credit Risk of P2P Network Loan—An Empirical Analysis of the Lending Club Platform. Rural Economy and Technology, 29, 102-103.</mixed-citation></ref><ref id="scirp.94171-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Hinton, G.E. (2002) Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 14, 1771-1800.  
https://doi.org/10.1162/089976602760128018</mixed-citation></ref><ref id="scirp.94171-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Chen, Y.H. (2016) Internet Loan Application Scoring Model Based on Feature Extraction of Restricted Boltzmann Machine. Master’s Thesis, Shanghai Normal University, Shanghai.</mixed-citation></ref><ref id="scirp.94171-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, Y.X. (2016) Deep Learning Model Based on Restricted Boltzmann Machine and Its Application. Master’s Thesis, University of Electronic Science and Technology, Chengdu.</mixed-citation></ref><ref id="scirp.94171-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Freund, Y. and Haussler, D. (1994) Unsupervised Learning of Distributions of Binary Vectors Using Two Layer Networks.</mixed-citation></ref><ref id="scirp.94171-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Li, X., Pang, J., Mo, B., et al. (2016) Deep Neural Network for Short-Text Sentiment Classification. In: International Conference on Database Systems for Advanced Applications, Springer International Publishing, Berlin, 168-175.
https://doi.org/10.1007/978-3-319-32055-7_15</mixed-citation></ref><ref id="scirp.94171-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Le Roux, N. and Bengio, Y. (2008) Representational Power of Restricted Boltzmann Machines and Deep Belief Networks. Neural Computation, 20, 1631-1649.
https://doi.org/10.1162/neco.2008.04-07-510</mixed-citation></ref><ref id="scirp.94171-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Tenenbaum, J.B., de Silva, V. and Langford, J.C. (2000) A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290, 2319-2323.
https://doi.org/10.1126/science.290.5500.2319</mixed-citation></ref><ref id="scirp.94171-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Pope, D.G. and Sydnor, J.R. (2011) What’s in a Picture? Evidence of Discrimination from Prosper.com. Journal of Human Resources, 46, 53-92.  
https://doi.org/10.1353/jhr.2011.0025</mixed-citation></ref><ref id="scirp.94171-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Klafft, M. (2009) Online Peer-to-Peer Lending: A Lenders’ Perspective. SSRN Electronic Journal, 2, 81-87. https://doi.org/10.2139/ssrn.1352352</mixed-citation></ref><ref id="scirp.94171-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Larrimore, L., Li, J., Larrimore, J., et al. (2011) Peer to Peer Lending: The Relationship between Language Features, Trustworthiness, and Persuasion Success. Journal of Applied Communication Research, 39, 19-37.  
https://doi.org/10.1080/00909882.2010.536844</mixed-citation></ref></ref-list></back></article>