<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">ME</journal-id><journal-title-group><journal-title>Modern Economy</journal-title></journal-title-group><issn pub-type="epub">2152-7245</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/me.2018.911115</article-id><article-id pub-id-type="publisher-id">ME-88577</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Business&amp;Economics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Machine Learning Approaches to Predict Default of Credit Card Clients
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Ruilin</surname><given-names>Liu</given-names></name><xref ref-type="aff" rid="aff1"><sub>1</sub></xref></contrib></contrib-group><aff id="aff1"><label>1</label><addr-line>University of Southern California, Los Angeles, CA, USA</addr-line></aff><pub-date pub-type="epub"><day>12</day><month>11</month><year>2018</year></pub-date><volume>09</volume><issue>11</issue><fpage>1828</fpage><lpage>1838</lpage><history><date date-type="received"><day>7</day><month>September</month><year>2018</year></date><date date-type="rev-recd"><day>16</day><month>November</month><year>2018</year></date><date date-type="accepted"><day>19</day><month>November</month><year>2018</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  This paper compares traditional machine learning models, namely Support Vector Machine, k-Nearest Neighbors, Decision Tree and Random Forest, with two neural networks: Feedforward Neural Network and Long Short-Term Memory (LSTM). We observe that the two neural networks achieve higher accuracies than the traditional models. This paper also examines whether dropout improves the accuracy of neural networks. For Feedforward Neural Network, applying dropout leads to better performance in some cases but worse performance in others. The influence of dropout on LSTM models is small. Therefore, using dropout does not guarantee higher accuracy.
 
</p></abstract><kwd-group><kwd>Machine Learning</kwd><kwd>Feedforward Neural Network</kwd><kwd>Long Short-Term Memory</kwd><kwd>Dropout</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Neural networks can model complex relationships between input features and their corresponding labels, so they are suitable for complex machine learning problems. On the other hand, other machine learning models such as linear regression or Support Vector Machine (SVM) [<xref ref-type="bibr" rid="scirp.88577-ref1">1</xref>] can solve simpler problems more efficiently. Therefore, after analyzing a specific problem, one should ask: “is a neural network really necessary in this case?”</p><p>Moreover, there are different models within the category of “neural network”. A Feedforward Neural Network uses all neurons of one layer at the same time to compute the neurons of the next layer. Apart from their differences in weights, the neurons are “parallel” in this process. In contrast, a Recurrent Neural Network is designed for sequential data: it allows previous inputs to influence the processing of future inputs. The difference in accuracy between these two kinds of networks is therefore worth comparing [<xref ref-type="bibr" rid="scirp.88577-ref2">2</xref>]-[<xref ref-type="bibr" rid="scirp.88577-ref8">8</xref>].</p><p>We discuss previous research in Section 2, describe the models in Section 3, present the dataset and experimental results in Section 4, and give conclusions in Section 5 and potential future work in Section 6.</p></sec><sec id="s2"><title>2. Related Works</title><p>Yeh [<xref ref-type="bibr" rid="scirp.88577-ref2">2</xref>] randomly divided 25,000 payment records into a training set and a testing set, and then compared six data mining methods: Logistic Regression, Discriminant Analysis (Fisher’s rule), Na&#239;ve Bayes, kNN, Decision Tree and (Feedforward) Neural Network. The error rate of each model on the testing set was recorded. 
Accuracies were obtained as 1 − (error rate). kNN returned the highest accuracy, 0.84. Feedforward Neural Network and Decision Tree both returned the second highest accuracy, 0.83. Discriminant Analysis returned the lowest accuracy, 0.74. That study shows that a neural network is not guaranteed to perform better than simpler models: one of the traditional models, kNN, achieved higher accuracy than the neural network. However, the study did not apply dropout to its neural network model. Also, Long Short-Term Memory [<xref ref-type="bibr" rid="scirp.88577-ref3">3</xref>], a model that is now widely applied, was not considered. In neural networks, there is an “epochs” parameter that determines how many times each sample is fed into the model, but this parameter was not reported in Yeh [<xref ref-type="bibr" rid="scirp.88577-ref2">2</xref>]. To compare neural networks and traditional models more clearly, a study that includes these factors is needed.</p></sec><sec id="s3"><title>3. Model Description</title><sec id="s3_1"><title>3.1. k-Nearest Neighbors</title><p>k-Nearest Neighbors (kNN) [<xref ref-type="bibr" rid="scirp.88577-ref4">4</xref>] stores all training samples (including their features and labels) in a space equipped with a distance metric, without any further processing or calculation. When the model receives an object to be predicted, it places the new object in the same space (under the same metric). The model then makes its prediction by looking at the k nearest neighbors of the new object. Usually, the prediction is the label that occurs most often among those k samples.</p><p>This model determines a sample’s label from nearby samples with known labels, so it does not “get trained” but only memorizes. 
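</p><p>As a minimal sketch of the prediction step described above, the following Python function classifies one query point by majority vote among its k nearest training samples. It is an illustration only, not the implementation used in this paper: the Euclidean metric, the function name knn_predict and the toy data are assumptions introduced here.</p><preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training samples."""
    # Euclidean distance is an assumption; the paper does not state which metric it uses.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_x, train_y)
    ]
    # Sort the stored samples by their distance to the query point.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k nearest samples is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data (not taken from the credit card dataset):
# label 1 stands for "default", label 0 for "no default".
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
train_y = [0, 0, 0, 1, 1]

prediction = knn_predict(train_x, train_y, query=(0.95, 1.0), k=3)
print(prediction)
```
]]></preformat><p>With this toy data the query point lies next to the two samples labelled 1, so the vote with k = 3 returns 1. No parameters are fitted before the query arrives, which is the sense in which the model memorizes rather than trains.</p><p>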
To actually train a boundary that separates two categories and can be reused for future predictions, Support Vector Machine is a classical choice [<xref ref-type="bibr" rid="scirp.88577-ref5">5</xref>].</p></sec><sec id="s3_2"><title>3.2. Support Vector Machine</title><p>Support Vector Machine (SVM) first places all samples in a space and represents the two categories as 1 and −1. It then finds two parallel hyperplanes, each of which bounds one category. With a normalized dataset, these two hyperplanes are expressed as <inline-formula><tex-math>\langle w, x \rangle + b = +1</tex-math></inline-formula> and <inline-formula><tex-math>\langle w, x \rangle + b = -1</tex-math></inline-formula>, where <italic>w</italic> is the normal vector of the hyperplanes, <italic>x</italic> is the input vector, and <italic>b</italic> is a constant offset. Training finds <italic>w</italic> and <italic>b</italic> so that the distance between these two hyperplanes, <inline-formula><tex-math>2/\lVert w \rVert</tex-math></inline-formula>, is maximized. When the model receives a new input vector <italic>x</italic><sub><italic>i</italic></sub>, it labels it 1 if <inline-formula><tex-math>\langle w, x_i \rangle + b \geq 1</tex-math></inline-formula>, or −1 if <inline-formula><tex-math>\langle w, x_i \rangle + b \leq -1</tex-math></inline-formula>.</p><p>If the dataset cannot be perfectly separated in its original space, the model maps it into a higher-dimensional space, in practice through a kernel function such as the “Poly” and “RBF” kernels used in Section 4. SVM looks for a dimension high enough that the dataset can be separated; a linear SVM is then applied in that space.</p><p>SVM draws a boundary in the sample space. A different approach to classification is to make a series of comparisons between trained threshold values and the values of the explanatory variables. This is what Decision Tree does [<xref ref-type="bibr" rid="scirp.88577-ref6">6</xref>].</p></sec><sec id="s3_3"><title>3.3. Decision Tree and Random Forest</title><p>Decision Tree is a tree graph of “if” statements. Each if-statement divides the samples into two branches: samples that satisfy the statement go in one direction, and samples that do not go in the other. Training finds the if-statements that produce the largest divisions between categories.</p><p>Still, this model needs to be restricted. 
If the tree is deep, the if-statements gradually become very narrow, because they have to separate increasingly similar samples into different branches. This amounts to over-training and leads to overfitting [<xref ref-type="bibr" rid="scirp.88577-ref6">6</xref>]. Therefore, the depth of the tree is often limited, either by setting the maximum depth directly or by setting the minimum number of samples required in a leaf node (for example, if node A contains only 20 samples and the required minimum is 30, then node A should not exist, and its parent becomes the leaf node).</p><p>To further reduce overfitting, Random Forest repeatedly draws random samples with replacement and trains a separate Decision Tree on each draw. When it receives a new sample, all trees make predictions, and a majority vote determines the label of the new sample.</p></sec><sec id="s3_4"><title>3.4. Neural Network &amp; Dropout</title><p>A Neural Network consists of an input layer, some hidden layers and an output layer. As shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>, the input layer takes the features in the dataset as input. The neurons of one layer are then used together to compute each neuron in the next layer according to the weights of their connections (each connection between two neurons has its own weight). Each layer also has an activation function, which determines the value a neuron passes to the next layer from the value it receives from the previous layer. The final layer is the output layer.</p><p>However, one problem of the Neural Network is that when the number of layers and neurons is large, there can be many connections between neurons. If a model uses all connections and trains all the weights at every step, it may become too complex and overfit the training data. Dropout is a useful way to reduce overfitting. 
During training, when a layer passes values to the next layer, dropout randomly ignores a fraction of its neurons together with all their connections to the next layer (the fraction is set by the dropout rate parameter). This reduces the number of active neurons in each training step, which limits over-training and reduces overfitting.</p><p>The Neural Network described above ignores the sequential relationships among inputs. In real-world problems, previous events can affect future events, so it is valuable to take the time sequence into account. Recurrent Neural Network is a suitable model for this.</p></sec><sec id="s3_5"><title>3.5. Recurrent Neural Network &amp; Long Short-Term Memory</title><p>Recurrent Neural Network (RNN) can reflect the sequential relationship among inputs. The hidden state computed while processing previous inputs is passed on to the hidden layers that process future inputs. Therefore, by training the hidden layers, previous inputs can affect how the model processes future inputs (<xref ref-type="fig" rid="fig2">Figure 2</xref>).</p><p>Long Short-Term Memory (LSTM) is an advanced RNN architecture. A plain RNN may suffer from vanishing or exploding gradients when the number of timesteps is large. LSTM has a cell state that processes the inputs and is also passed to the next timestep. The forget gate uses a sigmoid function to determine what proportion of the cell state is kept. This gating helps keep the gradient in a workable range over long sequences.</p></sec></sec><sec id="s4"><title>4. Dataset &amp; Experiments</title><p>This dataset is provided by I-Cheng Yeh [<xref ref-type="bibr" rid="scirp.88577-ref2">2</xref>] of the Department of Information Management, Chung Hua University, Taiwan, and was accessed from the UCI Machine Learning Repository. It contains 30,000 samples of credit information, each labelled with whether default occurs. 
The explanatory variables include “the amount of credit, gender (1 = male; 2 = female), education (1 = graduate school; 2 = university; 3 = high school; 4 = others), marital status (1 = married; 2 = single; 3 = others), age, history of delayed payment from April to September, 2005, amount of bill statement from April to September, 2005, and amount paid from April to September, 2005”. Of the 30,000 samples, 6636 have default payments and 23,364 do not. The 30,000 samples are randomly shuffled, and the first 10,000 samples after shuffling are chosen. Of these, the first 8500 samples are used as the training set and the remaining 1500 samples as the testing set. The data has been normalized to a mean of 0 and a variance of 1.</p><sec id="s4_1"><title>4.1. F1-Score</title><p>Accuracy is defined as <inline-formula><tex-math>\text{accuracy} = \frac{\text{number of correct predictions}}{\text{number of samples}}</tex-math></inline-formula>. When the dataset is imbalanced, accuracy may not be sufficient, because simply predicting every sample to be the majority class can still yield high accuracy.</p><p>In such a situation, a good metric to use is the f1-score, calculated as</p><disp-formula><tex-math>F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, \qquad \text{precision} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalsePositives}}, \qquad \text{recall} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}}.</tex-math></disp-formula><p>Precision measures the proportion of samples predicted as positive that are truly positive, and recall measures the proportion of positive samples that are identified. The f1-score ranges from 0, when the model makes no true positive prediction, to 1, when all predictions are correct.</p><p>In this dataset, 77.88% of samples are negative. While this paper still focuses on accuracies, f1-scores are also measured as references to check that models are not blindly guessing samples to be negative. If a model has a strong tendency to make negative predictions, its recall will be low, so it will return a low f1-score (Tables 1-3).</p><p>When the kernel is “RBF” and C = 1, the accuracy, 0.8040, is the highest among all SVM results. The corresponding f1-score is 0.4520. 
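</p><p>The metric definitions in Section 4.1 can be checked with a short Python sketch. It is illustrative only and not the evaluation code used for the experiments: the function name classification_scores and the rule that a ratio with a zero denominator is reported as 0.0 are assumptions made here. The example uses the class balance stated in the text (6636 defaults, 23,364 non-defaults) and a hypothetical model that never predicts default.</p><preformat preformat-type="code"><![CDATA[
```python
def classification_scores(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Class balance of the full dataset as stated in the text.
y_true = [1] * 6636 + [0] * 23364
# Hypothetical model that always predicts "no default".
all_negative = [0] * len(y_true)

accuracy, precision, recall, f1 = classification_scores(y_true, all_negative)
print(round(accuracy, 4), f1)
```
]]></preformat><p>For this always-negative predictor the accuracy equals the share of negative samples, 0.7788, while recall and f1-score are both 0, which is why the f1-score is reported next to accuracy in the tables.</p><p>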
The f1-scores of the “RBF” kernel are generally higher than those of the “Poly” kernel.</p><p>Random Forest is better than Decision Tree since it reduces overfitting, and both accuracies and f1-scores reflect this. In this experiment, when MSL is 10 or 20, the accuracy is 0.8000, slightly higher than the accuracy of the previous Decision Tree, which has MSL set to 20. When MSL &gt; 5, there is no significant difference in accuracies. The f1-score is 0.4544 when MSL = 10 and 0.4425 when MSL = 20, so there is little difference between setting MSL to 10 or 20.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>SVM</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >C = 0.01</th><th align="center" valign="middle" >C = 0.1</th><th align="center" valign="middle" >C = 1</th><th align="center" valign="middle" >C = 10</th><th align="center" valign="middle" >C = 50</th><th align="center" valign="middle" >C = 100</th></tr></thead><tr><td align="center" valign="middle" >Poly acc:</td><td align="center" valign="middle" >0.7700</td><td align="center" valign="middle" >0.7813</td><td align="center" valign="middle" >0.7913</td><td align="center" valign="middle" >0.7900</td><td align="center" valign="middle" >0.7893</td><td align="center" valign="middle" >0.7846</td></tr><tr><td align="center" valign="middle" >Poly f1:</td><td align="center" valign="middle" >0.0247</td><td align="center" valign="middle" >0.1676</td><td align="center" valign="middle" >0.3372</td><td align="center" valign="middle" >0.3939</td><td align="center" valign="middle" >0.3984</td><td align="center" valign="middle" >0.3984</td></tr><tr><td align="center" valign="middle" >RBF acc:</td><td align="center" valign="middle" >0.7626</td><td align="center" valign="middle" >0.7940</td><td align="center" valign="middle" >0.8040</td><td align="center" valign="middle" >0.8013</td><td 
align="center" valign="middle" >0.7973</td><td align="center" valign="middle" >0.7806</td></tr><tr><td align="center" valign="middle" >RBF f1:</td><td align="center" valign="middle" >0.0000</td><td align="center" valign="middle" >0.4054</td><td align="center" valign="middle" >0.4520</td><td align="center" valign="middle" >0.4638</td><td align="center" valign="middle" >0.4658</td><td align="center" valign="middle" >0.4525</td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> KNN</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >k = 3</th><th align="center" valign="middle" >k = 5</th><th align="center" valign="middle" >k = 10</th><th align="center" valign="middle" >k = 20</th><th align="center" valign="middle" >k = 50</th><th align="center" valign="middle" >k = 100</th></tr></thead><tr><td align="center" valign="middle" >Accuracy</td><td align="center" valign="middle" >0.7760</td><td align="center" valign="middle" >0.7826</td><td align="center" valign="middle" >0.7933</td><td align="center" valign="middle" >0.7980</td><td align="center" valign="middle" >0.7966</td><td align="center" valign="middle" >0.7973</td></tr><tr><td align="center" valign="middle" >f1-score</td><td align="center" valign="middle" >0.4384</td><td align="center" valign="middle" >0.4349</td><td align="center" valign="middle" >0.4077</td><td align="center" valign="middle" >0.4077</td><td align="center" valign="middle" >0.3920</td><td align="center" valign="middle" >0.3704</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Decision Tree: (min_samples_leaf is denoted by “MSL”)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >MSL = 3</th><th align="center" valign="middle" >MSL = 
5</th><th align="center" valign="middle" >MSL = 10</th><th align="center" valign="middle" >MSL = 20</th><th align="center" valign="middle" >MSL = 50</th><th align="center" valign="middle" >MSL = 100</th></tr></thead><tr><td align="center" valign="middle" >Accuracy</td><td align="center" valign="middle" >0.7760</td><td align="center" valign="middle" >0.7826</td><td align="center" valign="middle" >0.7933</td><td align="center" valign="middle" >0.7980</td><td align="center" valign="middle" >0.7966</td><td align="center" valign="middle" >0.7973</td></tr><tr><td align="center" valign="middle" >f1-score</td><td align="center" valign="middle" >0.3697</td><td align="center" valign="middle" >0.3906</td><td align="center" valign="middle" >0.4046</td><td align="center" valign="middle" >0.4465</td><td align="center" valign="middle" >0.4547</td><td align="center" valign="middle" >0.4912</td></tr></tbody></table></table-wrap></sec><sec id="s4_2"><title>4.2. Feedforward Neural Network</title><p>In this paper, the “relu”, “softmax” and “sigmoid” activation functions are compared. Each model has two hidden Dense layers that share the same activation function. The output layer has 2 neurons and uses “softmax” as its activation function, so that the output is a probability distribution.</p></sec><sec id="s4_3"><title>4.3. Feedforward Neural Network without Dropout</title><p>In <xref ref-type="table" rid="table4">Table 4</xref>, the entries in the leftmost column give the number of neurons in the corresponding layers (for example, “8→8” means a Dense layer with 8 neurons followed by another Dense layer with 8 neurons). The “Epochs” parameter is varied from 1 to 400 for each model, and the reported value is the number of epochs that gives the highest accuracy for that model. Table 4 lists these highest accuracies together with their corresponding f1-scores and “Epochs”.</p><p>According to the accuracies, the “sigmoid” activation function outperforms “softmax” and “relu”. 
In 4 out of 5 cases, “sigmoid” has the highest accuracy. The highest accuracy overall, 0.8227, also occurs with “sigmoid”, when there are 32 neurons in both layers and the training samples are fed into the model 117 times.</p><p>However, the f1-scores of the “sigmoid” models are lower than those of “softmax” and “relu” in 4 of the 5 configurations, so for a heavily imbalanced dataset, the other two activation functions may be better choices.</p></sec><sec id="s4_4"><title>4.4. Feedforward Neural Network with Dropout (Using Sigmoid)</title><p>In this experiment, a dropout layer is placed between the second-to-last layer and the output layer. Accuracies and f1-scores with dropout are compared with those without dropout (Tables 5-7).</p><p>Table 5 reports accuracy and f1-score for Dense(8)→Dense(8)→Dense(2), Table 6 for Dense(16)→Dense(16)→Dense(2), and Table 7 for Dense(32)→Dense(32)→Dense(2); in each model the first two layers use “sigmoid” activation.</p><p>When each layer has only 8 neurons, dropout at first lowers the accuracy and raises the f1-score, but at a dropout rate of 0.3 the f1-score decreases too. Dense(16)→Dense(16)→Dense(2) performs better after applying dropout: the accuracy dips slightly at a rate of 0.1 but exceeds the no-dropout accuracy at rates of 0.2 and 0.3, and the f1-score rises with the dropout rate. When each layer has 32 neurons, the model with dropout rates of 0.1 and 0.2 still reaches high accuracies (higher than 0.82) and high f1-scores (higher than 0.45), both better than those of the other two models.</p><p>Therefore, dropout in Feedforward Neural Network appears useful only when there are larger numbers of neurons in each layer. The reason might be that, if the number of neurons is already small, such as 8 neurons per layer, dropping neurons and connections leaves the model without necessary information.</p></sec><sec id="s4_5"><title>4.5. 
LSTM without Dropout</title><p>In the following models, all layers use “sigmoid” as the activation function, since it returns high accuracies in Feedforward Neural Network. A feedforward Dense layer with 2 neurons is added after the last LSTM layer. The “Adam” optimization algorithm is used during training. “Epochs” again ranges from 1 to 400, and the reported value is the one at which each model returns its highest accuracy (<xref ref-type="table" rid="table8">Table 8</xref>).</p><p>According to the results, LSTM models have lower accuracies than Feedforward Neural Network. With the “sigmoid” activation function, 3 Feedforward Neural Network models have accuracies higher than 0.82, but none of the five LSTM models does. Also, the LSTM models show no observable improvement in f1-scores.</p></sec><sec id="s4_6"><title>4.6. LSTM with Dropout</title><p>Three models, LSTM(8)→LSTM(8)→Dense(2), LSTM(16)→LSTM(16)→Dense(2) and LSTM(32)→LSTM(32)→Dense(2), are chosen. Their accuracies and f1-scores with and without dropout are compared. A dropout layer is placed between the second-to-last layer and the output layer. 
Other settings are the same as in the previous LSTM models (Tables 9-11).</p><p>Table 9 reports accuracy and f1-score for LSTM(8)→LSTM(8)→Dense(2), Table 10 for LSTM(16)→LSTM(16)→Dense(2), and Table 11 for LSTM(32)→LSTM(32)→Dense(2); in each model the first two layers use “sigmoid” activation.</p><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title>Highest accuracies with corresponding f1-scores and “Epochs”</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >softmax</th><th align="center" valign="middle" >relu</th><th align="center" valign="middle" >sigmoid</th></tr></thead><tr><td align="center" valign="middle" >8→8</td><td align="center" valign="middle" >Acc: 0.8160 F1: 0.4588 Epochs: 133</td><td align="center" valign="middle" >Acc: 0.8153 F1: 0.4358 Epochs: 43</td><td align="center" valign="middle" >Acc: 0.8207 F1: 0.4289 Epochs: 104</td></tr><tr><td align="center" valign="middle" >8→16</td><td align="center" valign="middle" >Acc: 0.8193 F1: 0.4525 Epochs: 31</td><td align="center" valign="middle" >Acc: 0.8193 F1: 0.4591 Epochs: 117</td><td align="center" valign="middle" >Acc: 0.8173 F1: 0.4339 Epochs: 34</td></tr><tr><td align="center" valign="middle" >16→16</td><td align="center" valign="middle" >Acc: 0.8147 F1: 0.4484 Epochs: 67</td><td align="center" valign="middle" >Acc: 0.8173 F1: 0.4690 Epochs: 176</td><td align="center" valign="middle" >Acc: 0.8207 F1: 0.4190 Epochs: 82</td></tr><tr><td align="center" valign="middle" >16→32</td><td align="center" valign="middle" >Acc: 0.8173 F1: 0.4476 Epochs: 53</td><td align="center" valign="middle" >Acc: 0.8147 F1: 0.4303 Epochs: 2</td><td align="center" valign="middle" >Acc: 0.8180 F1: 0.4253 Epochs: 31</td></tr><tr><td align="center" valign="middle" >32→32</td><td align="center" 
valign="middle" >Acc: 0.8147 F1: 0.4549 Epochs: 56</td><td align="center" valign="middle" >Acc: 0.8153 F1: 0.4381 Epochs: 7</td><td align="center" valign="middle" >Acc: 0.8227 F1: 0.4593 Epochs: 117</td></tr></tbody></table></table-wrap><table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 5</xref></label><caption><title> Accuracy &amp; f1-score</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Accuracy</th><th align="center" valign="middle" >F1-score</th><th align="center" valign="middle" >Epochs</th></tr></thead><tr><td align="center" valign="middle" >Without dropout</td><td align="center" valign="middle" >0.8207</td><td align="center" valign="middle" >0.4289</td><td align="center" valign="middle" >104</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.1</td><td align="center" valign="middle" >0.8193</td><td align="center" valign="middle" >0.4366</td><td align="center" valign="middle" >34</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.2</td><td align="center" valign="middle" >0.8193</td><td align="center" valign="middle" >0.4435</td><td align="center" valign="middle" >188</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.3</td><td align="center" valign="middle" >0.8160</td><td align="center" valign="middle" >0.4153</td><td align="center" valign="middle" >44</td></tr></tbody></table></table-wrap><table-wrap id="table6" ><label><xref ref-type="table" rid="table6">Table 6</xref></label><caption><title> Accuracy &amp; f1-score</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Accuracy</th><th align="center" valign="middle" >F1-score</th><th align="center" valign="middle" >Epochs</th></tr></thead><tr><td align="center" valign="middle" >Without dropout</td><td align="center" valign="middle" >0.8207</td><td align="center" valign="middle" >0.4190</td><td 
align="center" valign="middle" >82</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.1</td><td align="center" valign="middle" >0.8173</td><td align="center" valign="middle" >0.4315</td><td align="center" valign="middle" >58</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.2</td><td align="center" valign="middle" >0.8213</td><td align="center" valign="middle" >0.4440</td><td align="center" valign="middle" >96</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.3</td><td align="center" valign="middle" >0.8227</td><td align="center" valign="middle" >0.4637</td><td align="center" valign="middle" >139</td></tr></tbody></table></table-wrap><table-wrap id="table7" ><label><xref ref-type="table" rid="table7">Table 7</xref></label><caption><title> Accuracy &amp; f1-score</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Accuracy</th><th align="center" valign="middle" >F1-score</th><th align="center" valign="middle" >Epochs</th></tr></thead><tr><td align="center" valign="middle" >Without dropout</td><td align="center" valign="middle" >0.8227</td><td align="center" valign="middle" >0.4593</td><td align="center" valign="middle" >117</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.1</td><td align="center" valign="middle" >0.8246</td><td align="center" valign="middle" >0.4509</td><td align="center" valign="middle" >133</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.2</td><td align="center" valign="middle" >0.8220</td><td align="center" valign="middle" >0.4649</td><td align="center" valign="middle" >324</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.3</td><td align="center" valign="middle" >0.8200</td><td align="center" valign="middle" >0.4328</td><td align="center" valign="middle" >76</td></tr></tbody></table></table-wrap><table-wrap id="table8" ><label><xref ref-type="table" rid="table8">Table 
8</xref></label><caption><title> LSTM models</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Accuracy</th><th align="center" valign="middle" >F1-score</th><th align="center" valign="middle" >Epochs</th></tr></thead><tr><td align="center" valign="middle" >LSTM(8)→LSTM(8)→Dense(2)</td><td align="center" valign="middle" >0.8180</td><td align="center" valign="middle" >0.4371</td><td align="center" valign="middle" >85</td></tr><tr><td align="center" valign="middle" >LSTM(8)→LSTM(16)→Dense(2)</td><td align="center" valign="middle" >0.8173</td><td align="center" valign="middle" >0.4244</td><td align="center" valign="middle" >82</td></tr><tr><td align="center" valign="middle" >LSTM(16)→LSTM(16)→Dense(2)</td><td align="center" valign="middle" >0.8180</td><td align="center" valign="middle" >0.4324</td><td align="center" valign="middle" >59</td></tr><tr><td align="center" valign="middle" >LSTM(16)→LSTM(32)→Dense(2)</td><td align="center" valign="middle" >0.8153</td><td align="center" valign="middle" >0.4241</td><td align="center" valign="middle" >70</td></tr><tr><td align="center" valign="middle" >LSTM(32)→LSTM(32)→Dense(2)</td><td align="center" valign="middle" >0.8193</td><td align="center" valign="middle" >0.4503</td><td align="center" valign="middle" >106</td></tr></tbody></table></table-wrap><table-wrap id="table9" ><label><xref ref-type="table" rid="table9">Table 9</xref></label><caption><title> Accuracy &amp; f1-score</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Accuracy</th><th align="center" valign="middle" >F1-score</th><th align="center" valign="middle" >Epochs</th></tr></thead><tr><td align="center" valign="middle" >Without dropout</td><td align="center" valign="middle" >0.8180</td><td align="center" valign="middle" >0.4371</td><td align="center" valign="middle" 
>85</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.1</td><td align="center" valign="middle" >0.8213</td><td align="center" valign="middle" >0.4417</td><td align="center" valign="middle" >129</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.2</td><td align="center" valign="middle" >0.8180</td><td align="center" valign="middle" >0.4485</td><td align="center" valign="middle" >269</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.3</td><td align="center" valign="middle" >0.8233</td><td align="center" valign="middle" >0.4421</td><td align="center" valign="middle" >152</td></tr></tbody></table></table-wrap><table-wrap id="table10" ><label><xref ref-type="table" rid="table10">Table 10</xref></label><caption><title> Accuracy &amp; f1-score</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Accuracy</th><th align="center" valign="middle" >F1-score</th><th align="center" valign="middle" >Epochs</th></tr></thead><tr><td align="center" valign="middle" >Without dropout</td><td align="center" valign="middle" >0.8180</td><td align="center" valign="middle" >0.4324</td><td align="center" valign="middle" >59</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.1</td><td align="center" valign="middle" >0.8193</td><td align="center" valign="middle" >0.4655</td><td align="center" valign="middle" >119</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.2</td><td align="center" valign="middle" >0.8187</td><td align="center" valign="middle" >0.4494</td><td align="center" valign="middle" >123</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.3</td><td align="center" valign="middle" >0.8193</td><td align="center" valign="middle" >0.4547</td><td align="center" valign="middle" >168</td></tr></tbody></table></table-wrap><table-wrap id="table11" ><label><xref ref-type="table" rid="table11">Table 11</xref></label><caption><title> 
Accuracy &amp; f1-score of LSTM(32)→LSTM(32)→Dense(2) under different dropout rates</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Accuracy</th><th align="center" valign="middle" >F1-score</th><th align="center" valign="middle" >Epochs</th></tr></thead><tr><td align="center" valign="middle" >Without dropout</td><td align="center" valign="middle" >0.8193</td><td align="center" valign="middle" >0.4503</td><td align="center" valign="middle" >106</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.1</td><td align="center" valign="middle" >0.8187</td><td align="center" valign="middle" >0.4472</td><td align="center" valign="middle" >107</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.2</td><td align="center" valign="middle" >0.8193</td><td align="center" valign="middle" >0.4435</td><td align="center" valign="middle" >73</td></tr><tr><td align="center" valign="middle" >Dropout rate = 0.3</td><td align="center" valign="middle" >0.8200</td><td align="center" valign="middle" >0.4304</td><td align="center" valign="middle" >151</td></tr></tbody></table></table-wrap><p>Using dropout does not lead to significantly better performance. In most cases, the accuracies of the LSTM models are around 0.82. One advantage over the Feedforward Neural Network is that the f1-scores of the LSTM models are stable: after applying dropout, the Feedforward Neural Network shows some sudden decreases in f1-score, whereas no such drops are observed for the LSTM models. Since both accuracies and f1-scores are stable, it is fair to conclude that dropout does not have a strong influence on LSTM. A possible explanation is that an LSTM model already has complicated connections within and between timesteps, so dropping a few connections has little visible effect on the results.</p></sec></sec><sec id="s5"><title>5. Conclusions</title><p>Traditional machine learning models achieve an accuracy of at most 0.8040, which is obtained by SVM. 
The highest accuracy among the neural networks is 0.8246, obtained with Dense(32)→Dense(32)→Dense(2), dropout rate = 0.1, and “sigmoid” as the activation function. For LSTM, LSTM(8)→LSTM(8)→Dense(2) with dropout rate = 0.3 and “sigmoid” as the activation function achieves an accuracy of 0.8233, which is also better than the best traditional model. Looking at f1-scores, many of the neural networks score around 0.44, while Random Forest and SVM with the “rbf” kernel can reach 0.45. However, the difference in accuracy is more significant.</p><p>Therefore, unlike the research of Yeh [<xref ref-type="bibr" rid="scirp.88577-ref2">2</xref>] discussed in Section 2, neural networks outperform traditional models, except in situations where the research focuses strongly on positive predictions (a False Negative is dangerous and a high f1-score is required).</p><p>For the Feedforward Neural Network, dropout sometimes improves performance. The accuracies and f1-scores for Dense(16)→Dense(16) and Dense(32)→Dense(32) are generally improved. Therefore, when using a feedforward neural network, dropout can be helpful when the number of neurons per layer is not small.</p><p>For LSTM, using dropout does not make a significant difference. The accuracies and f1-scores are all close to the results without dropout. Still, dropout can be applied if one tries to avoid False Negatives and focuses on f1-score. Unlike with the Feedforward Neural Network models, applying dropout to LSTM does not cause sudden decreases in f1-score.</p><p>A notable point is that the LSTM models perform worse than the Feedforward Neural Network. LSTM is generally preferred and regarded as a more advanced neural network architecture, but the experiments in this paper show that LSTM obtains similar f1-scores and even lower accuracies than the Feedforward Neural Network. 
Future work is needed to explain this unexpected result and to give clear criteria for when LSTM should be used.</p></sec><sec id="s6"><title>Conflicts of Interest</title><p>The author declares no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s7"><title>Cite this paper</title><p>Liu, R.L. (2018) Machine Learning Approaches to Predict Default of Credit Card Clients. Modern Economy, 9, 1828-1838. https://doi.org/10.4236/me.2018.911115</p></sec></body><back><ref-list><title>References</title><ref id="scirp.88577-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Weston, J. (n.d.) Support Vector Machine (and Statistical Learning Theory) Tutorial. NEC Labs America.  
http://www.cs.columbia.edu/~kathy/cs4701/documents/jason_svm_tutorial.pdf</mixed-citation></ref><ref id="scirp.88577-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Yeh, I.C. and Lien, C.H. (2009) The Comparisons of Data Mining Techniques for the Predictive Accuracy of Probability of Default of Credit Card Clients. Expert Systems with Applications, 36, 2473-2480.  
https://doi.org/10.1016/j.eswa.2007.12.020</mixed-citation></ref><ref id="scirp.88577-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Hochreiter, S. and Schmidhuber, J. (1997) Long Short-Term Memory. Neural Computation, 9, 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735</mixed-citation></ref><ref id="scirp.88577-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Cover, T.M. and Hart, P.E. (1967) Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory, IT-13, 21-27.</mixed-citation></ref><ref id="scirp.88577-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Cortes, C. and Vapnik, V. (1995) Support-Vector Networks. Machine Learning, 20, 273-297.  
https://doi.org/10.1007/BF00994018</mixed-citation></ref><ref id="scirp.88577-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Quinlan, J.R. (1986) Induction of Decision Trees. Machine Learning, 1, 81-106.  
https://doi.org/10.1007/BF00116251</mixed-citation></ref><ref id="scirp.88577-ref7"><label>7</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Rosenblatt</surname><given-names> F. </given-names></name> (<year>1958</year>) <article-title>The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain</article-title>. <source>Psychological Review</source>, <volume>65</volume>, <fpage>386</fpage>-<lpage>408</lpage>.<pub-id pub-id-type="doi"></pub-id></mixed-citation></ref><ref id="scirp.88577-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Breiman, L. (2001) Random Forests. Machine Learning, 45, 5-32.  
https://doi.org/10.1023/A:1010933404324</mixed-citation></ref></ref-list></back></article>