<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JBiSE</journal-id><journal-title-group><journal-title>Journal of Biomedical Science and Engineering</journal-title></journal-title-group><issn pub-type="epub">1937-6871</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jbise.2024.171001</article-id><article-id pub-id-type="publisher-id">JBiSE-130521</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biomedical&amp;Life Sciences</subject></subj-group></article-categories><title-group><article-title>
 
 
  Using Cross Entropy as a Performance Metric for Quantifying Uncertainty in DNN Image Classifiers: An Application to Classification of Lung Cancer on CT Images
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Eri</surname><given-names>Matsuyama</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Masayuki</surname><given-names>Nishiki</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Noriyuki</surname><given-names>Takahashi</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Haruyuki</surname><given-names>Watanabe</given-names></name><xref ref-type="aff" rid="aff4"><sup>4</sup></xref></contrib></contrib-group><aff id="aff3"><addr-line>School of Health Sciences, Fukushima Medical 
University, Fukushima, Japan</addr-line></aff><aff id="aff4"><addr-line>School of Radiological Technology, Gunma Prefectural College of Health Sciences, Gunma, Japan</addr-line></aff><aff id="aff1"><addr-line>Faculty of Informatics, University of Fukuchiyama, Kyoto, Japan</addr-line></aff><aff id="aff2"><addr-line>Graduate School of Radiological Sciences, 
International University of Health and Welfare, Tochigi, Japan</addr-line></aff><pub-date pub-type="epub"><day>16</day><month>01</month><year>2024</year></pub-date><volume>17</volume><issue>01</issue><fpage>1</fpage><lpage>12</lpage><history><date date-type="received"><day>11,</day>	<month>December</month>	<year>2023</year></date><date date-type="rev-recd"><day>14,</day>	<month>January</month>	<year>2024</year>	</date><date date-type="accepted"><day>17,</day>	<month>January</month>	<year>2024</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
     
   Cross entropy is a measure in machine learning and deep learning that assesses the difference between predicted and actual probability distributions. In this study, we propose cross entropy as a performance evaluation metric for image classifier models and apply it to the CT image classification of lung cancer. A convolutional neural network is employed as the deep neural network (DNN) image classifier, with the residual network (ResNet) 50 chosen as the DNN architecture. The image data used comprise a lung CT image set. Two classification models are built from datasets with varying amounts of data, and the images are categorized into four classes using 10-fold cross-validation. Furthermore, we employ t-distributed stochastic neighbor embedding to visually explain the data distribution after classification. Experimental results demonstrate that cross entropy is a highly useful metric for evaluating the reliability of image classifier models. For a more comprehensive evaluation of model performance, however, combining it with other evaluation metrics is considered essential. 
 
</p></abstract><kwd-group><kwd>Cross Entropy</kwd><kwd>Performance Metrics</kwd><kwd>DNN Image Classifiers</kwd><kwd>Lung Cancer</kwd><kwd>Prediction Uncertainty</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. INTRODUCTION</title><p>For 2023, nearly 2 million new cancer cases and approximately 610,000 cancer-related deaths were estimated in the United States. The leading cause of cancer-related deaths for both men and women is lung cancer [<xref ref-type="bibr" rid="scirp.130521-ref1">1</xref>]. Lung cancer is broadly categorized into small cell lung cancer and non-small cell lung cancer (NSCLC) based on histological type. NSCLC accounts for approximately 80% - 85% of all lung cancer cases [<xref ref-type="bibr" rid="scirp.130521-ref2">2</xref>]. NSCLC is further classified by histology into subtypes such as lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), large cell carcinoma (LULC), and others, each exhibiting unique characteristics. LUAD represents 85% of NSCLC cases, and patients often have poor survival due to drug resistance and recurrence [<xref ref-type="bibr" rid="scirp.130521-ref2">2</xref>]. LUSC constitutes around 30% of all NSCLCs and is strongly linked to smoking; it is characterized by a high overall mutation rate of 8.1 mutations per megabase (1,000,000 base pairs) and significant genomic complexity [<xref ref-type="bibr" rid="scirp.130521-ref3">3</xref>]. LULC has a molecular profile that is more similar to adenocarcinoma than to squamous cell carcinoma [<xref ref-type="bibr" rid="scirp.130521-ref4">4</xref>]. Additionally, its prognosis is worse than that of other types of NSCLC. Even within the broad category of NSCLC, the characteristics vary depending on the subtype. 
Therefore, early identification of the histological type is crucial for selecting treatment strategies and reducing mortality.</p><p>Low-dose computed tomography (LDCT) screening has proven valuable for early lung cancer detection [5 - 9]. Nevertheless, the rapid evolution of computed tomography (CT) equipment has led to the identification of numerous microscopic nodules, intensifying the workload for radiologists. Consequently, computer-aided diagnosis (CAD) systems are expected to ease the burden on radiologists. Broadly, CAD is categorized into two types: computer-aided detection (CADe), focusing on lesion detection (presence diagnosis), and computer-aided diagnosis (CADx), aiming to analyze lesions (definitive diagnosis, such as benign/malignant differentiation). Extensive research in chest CT CAD, dating back to the 1960s, has been conducted for nodules and lung diseases, yielding some positive outcomes [10 - 14]. However, challenges persist, including a higher rate of false positives compared to physicians [<xref ref-type="bibr" rid="scirp.130521-ref11">11</xref>] and limitations in enhancing system accuracy [<xref ref-type="bibr" rid="scirp.130521-ref13">13</xref>]. Meanwhile, image recognition using deep neural networks (DNN) has advanced significantly in the past decade.</p><p>In recent years, it has been reported that remarkable recognition accuracy can be obtained with the attention mechanism developed for natural language processing [<xref ref-type="bibr" rid="scirp.130521-ref15">15</xref>] and with the vision transformer, which applies a transformer-like model to image processing [<xref ref-type="bibr" rid="scirp.130521-ref16">16</xref>]. These advancements have eliminated the need for the traditionally challenging feature extraction process in CAD research, thus enabling the development of highly accurate and robust designs. 
Consequently, research on artificial intelligence-assisted CAD systems targeting pulmonary diseases using DNNs has progressed significantly [17 - 20].</p><p>These papers discuss pattern detection of interstitial lung diseases [17, 18] and histological classification of lung cancer [<xref ref-type="bibr" rid="scirp.130521-ref19">19</xref>] using convolutional neural networks (CNNs). In both cases, to enhance the model’s performance, the DNN model is evaluated from various perspectives, including accuracy, precision, recall, F-measure, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC), all considered gold standards. However, these existing evaluation metrics neither address the lack of transparency in DNN inferences nor estimate the uncertainty of the results. Specific examples include uncertainty arising from out-of-distribution data [<xref ref-type="bibr" rid="scirp.130521-ref21">21</xref>], overconfident predictions, and covariate shift [<xref ref-type="bibr" rid="scirp.130521-ref20">20</xref>].</p><p>Furthermore, in image classification tasks utilizing DNNs, it is common to employ the softmax function to represent the output as a probability value. However, one issue with deep convolutional neural networks (DCNNs) is that calibration is often insufficient, making it difficult to interpret the model’s output directly as a probabilistic measure [<xref ref-type="bibr" rid="scirp.130521-ref22">22</xref>].</p><p>Evaluating model uncertainty is essential for enhancing the transparency and reliability of predictions, improving data quality, and reducing misjudgments. The Bayesian neural network (BNN) and Monte Carlo dropout (MCDO) [<xref ref-type="bibr" rid="scirp.130521-ref23">23</xref>] are recognized methods for estimating uncertainty in neural networks. A BNN expresses the weights of a network model as a probability distribution, making it possible to estimate uncertainty in addition to prediction results. 
MCDO is a type of BNN that approximately models the probability distribution of the weights by representing the weights of a network model using a Bernoulli distribution. However, both methods incur excessive computational costs when applied to DNNs.</p><p>In this study, we propose employing cross entropy as a performance evaluation metric to quantify uncertainty in DNN image classifiers, applying it specifically to the classification of lung cancer in CT images. Cross entropy is typically utilized as a cost function during the training phase of DNN model construction. However, in this study, we use it as one of the performance evaluation metrics for the classification model.</p></sec><sec id="s2"><title>2. MATERIALS AND METHOD</title><p>In this study, we use a CNN as a DNN image classifier and perform fine-tuning. Two classification models are constructed using two data sets with different numbers of images. Each model undergoes a 10-fold cross-validation to perform a four-class classification task. In this experiment, alongside computing the proposed cross-entropy metric, we also calculate existing evaluation metrics for comparison. Additionally, we visualize the data distribution after classification.</p><sec id="s2_1"><title>2.1. Image Data Sets</title><p>The data used comprise a lung CT image set classified into four classes: LUAD, LULC, LUSC, and normal. This dataset is publicly available on the web for non-profit purposes, as provided by the research community [<xref ref-type="bibr" rid="scirp.130521-ref24">24</xref>]. Consequently, ethical concerns do not arise in this study, and obtaining informed consent is not necessary. 
An illustration of the image data is presented in <xref ref-type="fig" rid="fig1">Figure 1</xref>.</p><p>In the experiment, two models were constructed: “Model A”, which was trained on a total of 1000 images with imbalanced image counts across classes, and “Model B”, which was trained on a total of 748 images with balanced image counts across classes. Both models undergo a 10-fold cross-validation, where in each fold 90% of the data is allocated for training and the remaining 10% for validation. The distribution and total numbers of the data are detailed in <xref ref-type="table" rid="table1">Table 1</xref>.</p></sec><sec id="s2_2"><title>2.2. Multioutput Classification Model Used</title><p>In this study, we employ the residual network (ResNet) 50 architecture [<xref ref-type="bibr" rid="scirp.130521-ref25">25</xref>] for the deep convolutional neural network (DCNN) and conduct learning through fine-tuning. Typically, in DCNNs, accuracy does not improve unless the number of stacked layers is sufficiently large. However, when the number of layers surpasses a certain threshold, the vanishing gradient problem arises, leading to a deterioration in accuracy. ResNet introduced a mechanism called the shortcut connection, which addresses the vanishing gradient problem by directly adding the input of a preceding layer to the output of a subsequent layer [<xref ref-type="bibr" rid="scirp.130521-ref25">25</xref>]. This allows for the realization of a deep network, and ResNet50 is considered highly effective for medical imaging applications [<xref ref-type="bibr" rid="scirp.130521-ref26">26</xref>].</p><p>In the fine-tuning process of this experiment, we utilize the ResNet50 model pre-trained on natural images and retrain the entire network using lung CT images. In other words, fine-tuning is executed without freezing any layer (all weights are updated), and a four-class classification is conducted. 
Consequently, the final fully connected layer and the last classification layer are replaced and trained with new configurations tailored to the number of categories. To meet the structural requirements of ResNet50, the input data size needs to be 224 &#215; 224 pixels. Therefore, bicubic interpolation is employed to resize all images to this size. The mini-batch size is set to 10, and the optimizer used is Adam (which combines momentum SGD and RMSprop). In the retraining with CT images, parameters are adjusted so that the learning rate is higher in the newly replaced fully connected layer, lower in the transferred layers, and reduced after every 5 epochs. To prevent overfitting, an L2 regularization term is incorporated into the cost function (loss function). The number of epochs is determined by evaluating the validation accuracy after each iteration. Retraining is halted when the validation accuracy remains below the highest accuracy achieved so far for 5 consecutive validations.</p><p>In this experiment, with a focus on model interpretability, we visualize the distribution of post-classification data. To achieve this, we employ t-distributed stochastic neighbor embedding (t-SNE) [<xref ref-type="bibr" rid="scirp.130521-ref27">27</xref>]. t-SNE is a nonlinear dimension reduction method that maps data into a low-dimensional space while preserving the local neighborhood structure (the similarities between nearby points) of the high-dimensional data. 
In this experiment, the activations of the final softmax layer for all data points are visualized through a two-dimensional mapping.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Breakdown of the image dataset used (number of images per class)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Class</th><th align="center" valign="middle" >Model A</th><th align="center" valign="middle" >Model B</th></tr></thead><tr><td align="center" valign="middle" >LUAD (adenocarcinoma)</td><td align="center" valign="middle" >338</td><td align="center" valign="middle" >187</td></tr><tr><td align="center" valign="middle" >LULC (large cell carcinoma)</td><td align="center" valign="middle" >187</td><td align="center" valign="middle" >187</td></tr><tr><td align="center" valign="middle" >LUSC (squamous cell carcinoma)</td><td align="center" valign="middle" >260</td><td align="center" valign="middle" >187</td></tr><tr><td align="center" valign="middle" >Normal</td><td align="center" valign="middle" >215</td><td align="center" valign="middle" >187</td></tr><tr><td align="center" valign="middle" >Total</td><td align="center" valign="middle" >1000</td><td align="center" valign="middle" >748</td></tr></tbody></table></table-wrap></sec><sec id="s2_3"><title>2.3. Cross Entropy</title><p>Cross entropy serves as a metric for gauging the dissimilarity between two probability distributions [28 - 33]. In the realm of machine learning and deep learning, it is commonly used to assess the gap between the predicted probability distribution produced by a model and the ground-truth probability distribution. 
Fundamentally, cross entropy quantifies the degree of disparity between these two distributions.</p><p>The cross entropy between these two distributions is given by the following formula:</p><p><disp-formula><label>(1)</label><tex-math>H(p, q) = -\sum_{x} p(x) \log_{e} q(x)</tex-math></disp-formula></p><p>where p is the true distribution, q is the predicted distribution, and x ranges over all possible outcomes.</p><p>Cross entropy indicates the amount of information lost when utilizing the predicted distribution to infer the real one [28 - 33]. Essentially, it offers insights into the effectiveness of a classification model that provides probabilities ranging from 0 to 1. Put simply, it reveals the proximity of the predicted distribution to the actual one. When the true distribution is one-hot, as it is for class labels, a perfect match results in zero cross entropy, while significant differences yield a higher value. Consequently, cross entropy serves as a versatile metric for evaluating the performance of classification models.</p><p>The following provides a simplified numerical example of utilizing cross entropy for the quality evaluation of a deep learning classifier in a multi-class classification context [31, 33]. Consider a deep learning classifier trained for a multi-class classification problem with three classes: apple, orange, and pear. 
The model has undergone training, and now our objective is to assess its performance using cross entropy.</p><p>Assume we have a small test dataset with three samples and the true class labels are:</p><p>Sample 1: True label = apple.</p><p>Sample 2: True label = orange.</p><p>Sample 3: True label = pear.</p><p>Now, suppose the model’s predictions for these samples produce the following class probabilities, listed in the order apple, orange, pear:</p><p>Sample 1: Predicted probabilities = [0.7, 0.15, 0.15]</p><p>(70% confidence in apple, 15% in orange, 15% in pear).</p><p>Sample 2: Predicted probabilities = [0.1, 0.8, 0.1]</p><p>(10% confidence in apple, 80% in orange, 10% in pear).</p><p>Sample 3: Predicted probabilities = [0.25, 0.25, 0.5]</p><p>(25% confidence in apple, 25% in orange, 50% in pear).</p><p>Using Equation (1), we calculate the cross entropy for each sample and then the average cross entropy for the entire test dataset. Because each true label is one-hot, only the term for the true class is nonzero:</p><p>Sample 1: True label = [1, 0, 0], Predicted probabilities = [0.7, 0.15, 0.15],</p><p>Cross entropy = −(1) log<sub>e</sub>(0.7) = 0.3567.</p><p>Sample 2: True label = [0, 1, 0], Predicted probabilities = [0.1, 0.8, 0.1],</p><p>Cross entropy = −(1) log<sub>e</sub>(0.8) = 0.2231.</p><p>Sample 3: True label = [0, 0, 1], Predicted probabilities = [0.25, 0.25, 0.5],</p><p>Cross entropy = −(1) log<sub>e</sub>(0.5) = 0.6931.</p><p>Then, we calculate the average cross entropy for the entire test dataset:</p><p>Average cross entropy = (0.3567 + 0.2231 + 0.6931)/3 = 0.4243.</p><p>In this example, the average cross entropy for the test dataset is approximately 0.4243. A lower cross entropy implies that the model’s predicted probabilities are closer to the true class probabilities, indicating better model performance.</p><p>When evaluating the classification performance of two CNN models using cross entropy, the cross entropy values of both models are compared. A lower cross entropy suggests that the model assigns higher probabilities to the correct classes, that is, it is confident and correct, which generally accompanies higher accuracy. 
Conversely, a higher cross entropy indicates more uncertainty and lower accuracy.</p></sec><sec id="s2_4"><title>2.4. Merits of Using Cross Entropy as an Evaluation Metric for Classification Models</title><p>Using cross entropy for quality evaluation of a deep learning classifier provides several advantages [28 - 34]:</p><p>• Cross entropy, rooted in information theory, can be perceived as a measure of information gain or loss. It quantifies the information gained when the true class label is disclosed, taking into account the predicted probabilities.</p><p>• Cross entropy is very sensitive to prediction errors. Incorrect predictions made with confidence are penalized more heavily than predictions closer to the correct answer. This sensitivity makes it a valuable indicator when accurate classification is a priority.</p><p>• Cross entropy takes into account the probability distribution predicted by the classifier. It assesses the dissimilarity between the predicted probabilities and the true class labels. This approach offers a more detailed evaluation of the model’s confidence in its predictions.</p><p>• Cross entropy directly quantifies the negative log-likelihood of the observed data under the predicted probabilities. This measurement evaluates how well the model’s predicted probabilities match the actual class labels and aids in the probabilistic interpretation of the classifier’s output.</p><p>• Cross entropy applies a logarithmic scaling to errors. This means that it penalizes even minor prediction errors, thus motivating the model to have greater confidence in its correct predictions.</p></sec></sec><sec id="s3"><title>3. RESULTS</title><p>The average accuracies of “Model A”, trained on 1000 images with imbalanced image counts across classes, and “Model B”, trained on 748 images with balanced image counts, were 0.974 and 0.954, respectively. The AUC values were 0.996 and 0.988 for “Model A” and “Model B”, respectively. 
The confusion matrices for both models are presented in <xref ref-type="fig" rid="fig2">Figure 2</xref>. The confusion matrices in the figure are cross tables that count the results of 10 subsets of the 10-fold cross validation. <xref ref-type="table" rid="table2">Table 2</xref> and <xref ref-type="table" rid="table3">Table 3</xref> show the results of cross entropy and the existing evaluation metrics (precision, recall, F1, and specificity) when each lesion is considered positive for Models A and B, respectively. The values in the last row of the respective table represent the average values for each metric.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Cross entropy and existing evaluation metrics for Model A</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Cross Entropy</th><th align="center" valign="middle" >Precision</th><th align="center" valign="middle" >Recall</th><th align="center" valign="middle" >F1</th><th align="center" valign="middle" >Specificity</th></tr></thead><tr><td align="center" valign="middle" >LUAD</td><td align="center" valign="middle" >0.136</td><td align="center" valign="middle" >0.962</td><td align="center" valign="middle" >0.976</td><td align="center" valign="middle" >0.969</td><td align="center" valign="middle" >0.980</td></tr><tr><td align="center" valign="middle" >LULC</td><td align="center" valign="middle" >0.204</td><td align="center" valign="middle" >0.972</td><td align="center" valign="middle" >0.941</td><td align="center" valign="middle" >0.957</td><td align="center" valign="middle" >0.994</td></tr><tr><td align="center" valign="middle" >Normal</td><td align="center" valign="middle" >0.004</td><td align="center" valign="middle" >0.995</td><td align="center" valign="middle" >0.995</td><td align="center" valign="middle" >0.995</td><td align="center" valign="middle" >0.991</td></tr><tr><td 
align="center" valign="middle" >LUSC</td><td align="center" valign="middle" >0.122</td><td align="center" valign="middle" >0.973</td><td align="center" valign="middle" >0.977</td><td align="center" valign="middle" >0.975</td><td align="center" valign="middle" >0.991</td></tr><tr><td align="center" valign="middle" >Average</td><td align="center" valign="middle" >0.117</td><td align="center" valign="middle" >0.976</td><td align="center" valign="middle" >0.972</td><td align="center" valign="middle" >0.974</td><td align="center" valign="middle" >0.989</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Cross entropy and existing evaluation metrics for Model B</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Cross Entropy</th><th align="center" valign="middle" >Precision</th><th align="center" valign="middle" >Recall</th><th align="center" valign="middle" >F1</th><th align="center" valign="middle" >Specificity</th></tr></thead><tr><td align="center" valign="middle" >LUAD</td><td align="center" valign="middle" >0.313</td><td align="center" valign="middle" >0.921</td><td align="center" valign="middle" >0.930</td><td align="center" valign="middle" >0.926</td><td align="center" valign="middle" >0.921</td></tr><tr><td align="center" valign="middle" >LULC</td><td align="center" valign="middle" >0.270</td><td align="center" valign="middle" >0.962</td><td align="center" valign="middle" >0.952</td><td align="center" valign="middle" >0.957</td><td align="center" valign="middle" >0.988</td></tr><tr><td align="center" valign="middle" >Normal</td><td align="center" valign="middle" >0.139</td><td align="center" valign="middle" >0.989</td><td align="center" valign="middle" >0.989</td><td align="center" valign="middle" >0.989</td><td align="center" valign="middle" >0.996</td></tr><tr><td align="center" valign="middle" 
>LUSC</td><td align="center" valign="middle" >0.268</td><td align="center" valign="middle" >0.947</td><td align="center" valign="middle" >0.947</td><td align="center" valign="middle" >0.947</td><td align="center" valign="middle" >0.982</td></tr><tr><td align="center" valign="middle" >Average</td><td align="center" valign="middle" >0.247</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.972</td></tr></tbody></table></table-wrap><p><xref ref-type="table" rid="table4">Table 4</xref> displays the accuracy and cross entropy for each of the 10 subsets in Model A. The “Average” in the last row represents the average cross entropy in each subset, while the “Average” in the rightmost column signifies the average cross entropy for each lesion prediction. Additionally, as an illustration of the existing evaluation metrics, <xref ref-type="table" rid="table5">Table 5</xref> and <xref ref-type="table" rid="table6">Table 6</xref> respectively present the calculated values for subsets No. 2 and No. 8, where accuracy was equivalent among the 10 subsets in <xref ref-type="table" rid="table4">Table 4</xref>. <xref ref-type="fig" rid="fig3">Figure 3</xref> illustrates the dimensionality reduction data distribution map after class classification for subsets No. 2 and No. 8, respectively.</p></sec><sec id="s4"><title>4. DISCUSSION</title><p>Generally, classification results obtained using the DCNN model are computed by tallying the number of correct answers/incorrect answers and summarizing them into a confusion matrix. Subsequently, various evaluation metrics are calculated. In this experiment, we created a confusion matrix (<xref ref-type="fig" rid="fig2">Figure 2</xref>) and computed various existing metrics (columns 3 to 6 of <xref ref-type="table" rid="table2">Table 2</xref> and <xref ref-type="table" rid="table3">Table 3</xref>). 
As is evident from these tables, all the average values of the existing metrics, like accuracy and AUC, are higher for Model A than for Model B. When all metrics are higher for one model, it is generally straightforward to conclude that this model, here Model A, is more accurate. However, it is crucial to selectively use evaluation metrics based on the specific purpose of the classification task. For instance, in the case of a 4-class classification of lung cancer with LULC as the positive class, as shown in the LULC row of the two tables, the recall for Model A is 0.941, while for Model B, it is 0.952, indicating a higher value for Model B. Conversely, precision (0.972 versus 0.962) and specificity (0.994 versus 0.988) are higher for Model A, and the F1 score is the same (0.957). In such cases, it becomes challenging to determine which model can more accurately distinguish LULC.</p><p>Furthermore, when considering “Normal” as the positive class, as indicated in the Normal row of <xref ref-type="table" rid="table2">Table 2</xref> and <xref ref-type="table" rid="table3">Table 3</xref>, Model B exhibits higher specificity (0.996 versus 0.991 for Model A). In this case, Model B can be considered more accurate at identifying lung cancer images as lung cancer, that is, fewer lung cancer images are misclassified as Normal. Consequently, relying solely on existing evaluation metrics derived from the confusion matrix has its limitations when assessing the performance of a model. Moreover, existing evaluation metrics do not furnish information about the distribution of probabilities, which represent the model outputs. 
Consequently, it is not possible to evaluate reliability based on confidence.</p><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Accuracy and cross entropy for Model A</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Subset No.</th><th align="center" valign="middle" >1</th><th align="center" valign="middle" >2</th><th align="center" valign="middle" >3</th><th align="center" valign="middle" >4</th><th align="center" valign="middle" >5</th><th align="center" valign="middle" >6</th><th align="center" valign="middle" >7</th><th align="center" valign="middle" >8</th><th align="center" valign="middle" >9</th><th align="center" valign="middle" >10</th><th align="center" valign="middle" >Average</th></tr></thead><tr><td align="center" valign="middle" >Accuracy</td><td align="center" valign="middle" >0.970</td><td align="center" valign="middle" >0.980</td><td align="center" valign="middle" >0.960</td><td align="center" valign="middle" >0.940</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >0.990</td><td align="center" valign="middle" >0.980</td><td align="center" valign="middle" >0.990</td><td align="center" valign="middle" >0.930</td><td align="center" valign="middle" >0.974</td></tr><tr><td align="center" valign="middle" >LUAD</td><td align="center" valign="middle" >0.013</td><td align="center" valign="middle" >0.160</td><td align="center" valign="middle" >0.032</td><td align="center" valign="middle" >0.233</td><td align="center" valign="middle" >0.009</td><td align="center" valign="middle" >0.005</td><td align="center" valign="middle" >0.015</td><td align="center" valign="middle" >0.210</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.688</td><td align="center" valign="middle" >0.136</td></tr><tr><td align="center" valign="middle" >LULC</td><td align="center" 
valign="middle" >0.492</td><td align="center" valign="middle" >0.002</td><td align="center" valign="middle" >0.641</td><td align="center" valign="middle" >0.237</td><td align="center" valign="middle" >0.005</td><td align="center" valign="middle" >0.001</td><td align="center" valign="middle" >0.015</td><td align="center" valign="middle" >0.032</td><td align="center" valign="middle" >0.146</td><td align="center" valign="middle" >0.467</td><td align="center" valign="middle" >0.204</td></tr><tr><td align="center" valign="middle" >Normal</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.041</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.004</td></tr><tr><td align="center" valign="middle" >LUSC</td><td align="center" valign="middle" >0.603</td><td align="center" valign="middle" >0.108</td><td align="center" valign="middle" >0.026</td><td align="center" valign="middle" >0.185</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.012</td><td align="center" valign="middle" >0.028</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >0.256</td><td align="center" valign="middle" >0.122</td></tr><tr><td align="center" valign="middle" >Average</td><td align="center" valign="middle" >0.277</td><td align="center" valign="middle" >0.067</td><td align="center" valign="middle" >0.175</td><td align="center" valign="middle" >0.164</td><td align="center" valign="middle" >0.004</td><td align="center" valign="middle" >0.002</td><td align="center" valign="middle" >0.020</td><td 
align="center" valign="middle" >0.068</td><td align="center" valign="middle" >0.037</td><td align="center" valign="middle" >0.353</td><td align="center" valign="middle" >0.117</td></tr></tbody></table></table-wrap><table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 5</xref></label><caption><title>Calculated values for subset No. 2 of the 10 subsets from <xref ref-type="table" rid="table4">Table 4</xref>, where the accuracy of the subset is 0.98</title></caption><table><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Cross Entropy</th><th align="center" valign="middle" >Precision</th><th align="center" valign="middle" >Recall</th><th align="center" valign="middle" >F1</th><th align="center" valign="middle" >Specificity</th></tr></thead><tbody><tr><td align="center" valign="middle" >LUAD</td><td align="center" valign="middle" >0.160</td><td align="center" valign="middle" >0.971</td><td align="center" valign="middle" >0.971</td><td align="center" valign="middle" >0.971</td><td align="center" valign="middle" >0.985</td></tr><tr><td align="center" valign="middle" >LULC</td><td align="center" valign="middle" >0.002</td><td align="center" valign="middle" >0.950</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >0.974</td><td align="center" valign="middle" >0.988</td></tr><tr><td align="center" valign="middle" >LUSC</td><td align="center" valign="middle" >0.108</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >0.962</td><td align="center" valign="middle" >0.980</td><td align="center" valign="middle" >1.000</td></tr><tr><td align="center" valign="middle" >Normal</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >1.000</td></tr><tr><td align="center" valign="middle" 
>Average</td><td align="center" valign="middle" >0.067</td><td align="center" valign="middle" >0.980</td><td align="center" valign="middle" >0.983</td><td align="center" valign="middle" >0.982</td><td align="center" valign="middle" >0.993</td></tr></tbody></table></table-wrap><table-wrap id="table6" ><label><xref ref-type="table" rid="table6">Table 6</xref></label><caption><title>Calculated values for subset No. 8 of the 10 subsets from <xref ref-type="table" rid="table4">Table 4</xref>, where the accuracy of the subset is 0.98</title></caption><table><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Cross Entropy</th><th align="center" valign="middle" >Precision</th><th align="center" valign="middle" >Recall</th><th align="center" valign="middle" >F1</th><th align="center" valign="middle" >Specificity</th></tr></thead><tbody><tr><td align="center" valign="middle" >LUAD</td><td align="center" valign="middle" >0.210</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >0.941</td><td align="center" valign="middle" >0.970</td><td align="center" valign="middle" >1.000</td></tr><tr><td align="center" valign="middle" >LULC</td><td align="center" valign="middle" >0.032</td><td align="center" valign="middle" >0.947</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >0.973</td><td align="center" valign="middle" >0.988</td></tr><tr><td align="center" valign="middle" >LUSC</td><td align="center" valign="middle" >0.028</td><td align="center" valign="middle" >0.963</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >0.981</td><td align="center" valign="middle" >0.986</td></tr><tr><td align="center" valign="middle" >Normal</td><td align="center" valign="middle" >0.000</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" >1.000</td><td align="center" valign="middle" 
>1.000</td></tr><tr><td align="center" valign="middle" >Average</td><td align="center" valign="middle" >0.068</td><td align="center" valign="middle" >0.978</td><td align="center" valign="middle" >0.985</td><td align="center" valign="middle" >0.981</td><td align="center" valign="middle" >0.994</td></tr></tbody></table></table-wrap><p>The second column in <xref ref-type="table" rid="table2">Table 2</xref> and <xref ref-type="table" rid="table3">Table 3</xref> gives the cross entropy for each class, calculated from the probability distribution output by the model. Cross entropy indicates how closely the predicted probability distribution matches the true distribution: lower values indicate closeness to the true distribution, and higher values indicate deviation from it. In essence, it quantifies the reliability of the model’s predictions. From the two tables, the average cross-entropy values are 0.117 for Model A and 0.247 for Model B. This result suggests that Model A is closer to the true probability distribution, signifying lower uncertainty. Similarly, whichever lesion is treated as the positive class, Model A exhibits lower uncertainty. Using cross entropy as an evaluation metric in this way makes it possible to compare the uncertainty of multiple models and assess their reliability.</p><p>In DCNN classification, the results may be influenced by imbalance in the training data. <xref ref-type="table" rid="table4">Table 4</xref> lists the cross entropy for each subset in the 10-fold cross-validation of Model A. In subset No. 1, the cross-entropy value for LUSC (0.603) is markedly higher than that for LUSC in the other subsets. For LUAD, the value in subset No. 10 (0.688) is similarly high. These findings suggest that predictions for these lesions are uncertain (ambiguous), indicating a bias in the data. 
This outcome implies that using cross entropy as a metric can prompt a reevaluation of the data, leading to an improvement in data quality. Moreover, the accuracy of both subsets No. 5 and No. 6 is 1.0, but their average cross-entropy values differ (0.004 and 0.002, respectively). This result shows that even when every sample is classified correctly, the level of uncertainty can vary. <xref ref-type="table" rid="table5">Table 5</xref> and <xref ref-type="table" rid="table6">Table 6</xref> compare subsets No. 2 and No. 8, both of which have an accuracy of 0.98. With the existing evaluation metrics, it is difficult to decide which metric should be used to compare performance.</p><p>In contrast, cross entropy allows a straightforward comparative evaluation. In subsets No. 2 and No. 8, the average cross entropy for the 4-class classification is 0.067 and 0.068, respectively, indicating nearly equivalent performance. However, the cross-entropy values for the individual lesion classes differ, signifying distinct predictive uncertainties. For instance, in the prediction of LUAD, the cross-entropy value in subset No. 8 (0.210, <xref ref-type="table" rid="table6">Table 6</xref>) is higher than that in subset No. 2 (0.160, <xref ref-type="table" rid="table5">Table 5</xref>). A similar pattern is observed for LULC, where the cross-entropy value is also higher in subset No. 8 (0.032 versus 0.002). This implies that the predictions for LUAD and LULC are more ambiguous (with higher uncertainty) in subset No. 8 than in subset No. 2. For LUSC, the cross-entropy value is higher in subset No. 2 (0.108 versus 0.028), indicating greater uncertainty in the predictions for this subset.</p><p>The visualization of these data distributions is presented in <xref ref-type="fig" rid="fig3">Figure 3</xref>. In subset No. 8 (<xref ref-type="fig" rid="fig3">Figure 3</xref>(b)), two LUAD data points (blue) are intertwined with the clusters of LULC (red) and LUSC (purple). 
The cross entropy for LUAD in this subset is 0.210, the highest among the classes, indicating the greatest uncertainty in the predictions. Because of this mixing, it can be inferred that the uncertainty of LULC (red) and LUSC (purple) has also increased. In subset No. 2 (<xref ref-type="fig" rid="fig3">Figure 3</xref>(a)), there are isolated points in both LUAD (blue) and LUSC (purple). The distribution of the LUAD (blue) data points shows little influence on the other clusters, although some uncertainty remains in the predictions. On the other hand, the isolated points in LUSC (purple) indicate uncertainty in the predictions and may also affect the prediction of LUAD (blue). Thus, cross entropy can capture the uncertainty (ambiguity) in class-specific predictions, which cannot be determined from the existing evaluation metrics. Based on these results, we believe that cross entropy is a highly useful metric for evaluating model reliability.</p><p>The accuracy of cross-entropy-based evaluation for lung cancer classification in CT images depends on various factors, including the quality of the dataset, the complexity of the model architecture, and the overall experimental setup. However, the main purpose of our paper is to use cross entropy as a performance metric for quantifying uncertainty in DNN image classifiers. We have therefore refrained from a detailed discussion of this matter, as it lies beyond the scope of this work. Prediction uncertainty in medical image classification tasks can arise from various factors, for example, limited data availability, data quality and variability, class imbalance, the presence of artifacts, and model complexity. Because our paper focuses mainly on the application of cross entropy to the classification of lung cancer, we did not discuss in detail which factors contribute to prediction uncertainty.</p><p>Cross entropy is considered a useful evaluation metric for the performance of a multi-class classifier. 
However, it does have some limitations. First, it is somewhat sensitive to class imbalance. If there is a significant imbalance in the distribution of classes in the dataset, the model may be biased towards the majority class. Second, while cross entropy provides a measure of how well the predicted probabilities match the true distribution of classes, it does not offer direct interpretability. Third, cross entropy assumes that the predictions for each class are independent of each other. In some real-world scenarios, classes may be correlated, and this assumption may not hold. In spite of these limitations, cross entropy is considered a valuable model evaluation metric due to its simplicity and effectiveness. However, it is essential to complement its use with other evaluation metrics for a more comprehensive assessment.</p></sec><sec id="s5"><title>5. CONCLUSION</title><p>In this study, we proposed the utilization of cross entropy, known as the loss function for DNN models, as one of the performance evaluation metrics for the models. We applied this metric to the classification of lung cancer in CT images. As a result, we demonstrated that it is possible to quantitatively depict the uncertainty of predictions based on the differences in probability distributions in the model’s output. Particularly in multi-class classification tasks, it was possible to demonstrate uncertainty for each class. Furthermore, by mapping the class classification results into two-dimensional data, we were able to visually interpret the prediction uncertainty indicated by cross-entropy values. Based on these results, cross entropy is considered a very useful metric for evaluating model reliability. 
However, for a more comprehensive evaluation of DNN model performance, it is essential to use cross entropy in conjunction with other evaluation metrics.</p></sec><sec id="s6"><title>ACKNOWLEDGEMENTS</title><p>This work was supported in part by JSPS KAKENHI (Grant-in-Aid for Scientific Research) Grant Number 18K15641.</p></sec><sec id="s7"><title>CONFLICTS OF INTEREST</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s8"><title>REFERENCES</title></sec></body><back><ref-list><title>References</title><ref id="scirp.130521-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Siegel, R.L., Miller, K.D., Wagle, N.S. and Jemal, A. (2023) Cancer Statistics, 2023. CA: A Cancer Journal for Clinicians, 73, 17-48. https://doi.org/10.3322/caac.21763</mixed-citation></ref><ref id="scirp.130521-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Xu, R., Lu, T., Wang, C., Li, Q., Peng, B., Zhao, J., et al. (2023) Single-Cell Data Analysis of Malignant Epithelial Cell Heterogeneity in Lung Adenocarcinoma for Patient Classification and Prognosis Prediction. Heliyon, 9, e20164. https://doi.org/10.1016/j.heliyon.2023.e20164</mixed-citation></ref><ref id="scirp.130521-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Niu, Z., Jin, R., Zhang, Y. and Li, H. (2022) Signaling Pathways and Targeted Therapies in Lung Squamous Cell Carcinoma: Mechanisms and Clinical Trials. Signal Transduction and Targeted Therapy, 7, 353. https://doi.org/10.1038/s41392-022-01200-x</mixed-citation></ref><ref id="scirp.130521-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Copin, M.-C. (2016) Carcinome à Grandes Cellules, Carcinome Lymphoepithelioma-Like, Carcinome NUT Large Cell Carcinoma, Lymphoepithelioma-Like Carcinoma, NUT Carcinoma. Annales de Pathologie, 36, 24-33. 
https://doi.org/10.1016/j.annpat.2015.11.006</mixed-citation></ref><ref id="scirp.130521-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">The National Lung Screening Trial Research Team (2011) Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening. The New England Journal of Medicine, 365, 395-409. https://doi.org/10.1056/NEJMoa1102873</mixed-citation></ref><ref id="scirp.130521-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Aberle, D.R., DeMello, S., Berg, C.D., Black, W.C., Brewer, B., Church, T.R., et al. (2013) Results of the Two Incidence Screenings in the National Lung Screening Trial. The New England Journal of Medicine, 369, 920-931. https://doi.org/10.1056/NEJMoa1208962</mixed-citation></ref><ref id="scirp.130521-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">The National Lung Screening Trial Research Team (2013) Results of Initial Low-Dose Computed Tomographic Screening for Lung Cancer. The New England Journal of Medicine, 368, 1980-1991. https://doi.org/10.1056/NEJMoa1209120</mixed-citation></ref><ref id="scirp.130521-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Kramer, B.S., Berg, C.D., Aberle, D.R. and Prorok, P.C. (2011) Lung Cancer Screening with Low-Dose Helical CT: Results from the National Lung Screening Trial (NLST). Journal of Medical Screening, 18, 109-111. https://doi.org/10.1258/jms.2011.011055</mixed-citation></ref><ref id="scirp.130521-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Midthun, D.E. (2011) Screening for Lung Cancer. Clinics in Chest Medicine, 32, 659-668. https://doi.org/10.1016/j.ccm.2011.08.014</mixed-citation></ref><ref id="scirp.130521-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Goo, J.M. 
(2011) A Computer-Aided Diagnosis for Evaluating Lung Nodules on Chest CT: The Current Status and Perspective. Korean Journal of Radiology, 12, 145-155. https://doi.org/10.3348/kjr.2011.12.2.145</mixed-citation></ref><ref id="scirp.130521-ref11"><label>11</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Suzuki</surname><given-names> K. </given-names></name>,<etal>et al</etal>. (<year>2012</year>)<article-title>A Review of Computer-Aided Diagnosis in Thoracic and Colonic Imaging</article-title><source> Quantitative Imaging in Medicine and Surgery</source><volume> 2</volume>,<fpage> 163</fpage>-<lpage>176</lpage>.<pub-id pub-id-type="doi"></pub-id></mixed-citation></ref><ref id="scirp.130521-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">El-Baz, A., Beache, G.M., Gimel’farb, G., Suzuki, K., Okada, K., et al. (2013) Computer-Aided Diagnosis Systems for Lung Cancer: Challenges and Methodologies. International Journal of Biomedical Imaging, 2013, Article ID: 942353. https://doi.org/10.1155/2013/942353</mixed-citation></ref><ref id="scirp.130521-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Retico, A. (2013) Computer-Aided Detection for Pulmonary Nodule Identification: Improving the Radiologist’s Performance? Imaging in Medicine, 5, 249-263. https://doi.org/10.2217/iim.13.24</mixed-citation></ref><ref id="scirp.130521-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Firmino, M., Morais, A.H., Mendoca, R.M., Dantas, M.R., Hekis, H.R. and Valentim, R. (2014) Computer-Aided Detection System for Lung Cancer in Computed Tomography Scans: Review and Future Prospective. Biomedical Engineering Online, 13, 1-16. 
https://doi.org/10.1186/1475-925X-13-41</mixed-citation></ref><ref id="scirp.130521-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., et al. (2017) Attention Is All You Need.</mixed-citation></ref><ref id="scirp.130521-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2020) An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale.</mixed-citation></ref><ref id="scirp.130521-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A. and Mougiakakou, S. (2016) Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network. IEEE Transactions on Medical Imaging, 35, 1207-1216. https://doi.org/10.1109/TMI.2016.2535865</mixed-citation></ref><ref id="scirp.130521-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Gao, M., Bagci, U., Lu, L., Wu, A., Buty, M., Shin, H.C., et al. (2018) Holistic Classification of CT Attenuation Patterns for Interstitial Lung Diseases via Deep Convolutional Neural Networks. Computer Methods in Biomechanics and Biomedical Engineering: Imaging &amp; Visualization, 6, 1-6. https://doi.org/10.1080/21681163.2015.1124249</mixed-citation></ref><ref id="scirp.130521-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Matsuyama, E., Lee, Y., Takahashi, N. and Tsai, D.Y. (2019) A Wavelet Coefficient-Based Convolutional Neural Network for Histological Classification of Lung Cancer in CT Images. 
Japanese Journal of Imaging and Information Sciences in Medicine (In Japanese), 36, 64-71.</mixed-citation></ref><ref id="scirp.130521-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Zech, J.R., Badgeley, M.A., Liu, M., Costa, A.B., Titano, J. and Oermann, E.K. (2018) Variable Generalization Performance of a Deep Learning Model to Detect Pneumonia in Chest Radiographs: A Cross-Sectional Study. PLOS Medicine, 15, e1002683. https://doi.org/10.1371/journal.pmed.1002683</mixed-citation></ref><ref id="scirp.130521-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., et al. (2019) Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty under Dataset Shift. 33rd International Conference on Neural Information Processing Systems, Vancouver, 8-14 December 2019, 13969-13980.</mixed-citation></ref><ref id="scirp.130521-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Guo, C., Pleiss, G., Sun, Y. and Weinberger, K.Q. (2017) On Calibration of Modern Neural Networks. Proceedings of the 34th International Conference on Machine Learning, Sydney, Vol. 70, 1321-1330. https://proceedings.mlr.press/v70/guo17a.html</mixed-citation></ref><ref id="scirp.130521-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Gal, Y. and Ghahramani, Z. (2016) Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Proceedings of the 33rd International Conference on Machine Learning, Vol. 48, 1050-1059.</mixed-citation></ref><ref id="scirp.130521-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Chest CT-Scan Images Dataset. 
https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images</mixed-citation></ref><ref id="scirp.130521-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">He, K., Zhang, X., Ren, S. and Sun, J. (2015) Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 27-30 June 2016, 770-778. https://doi.org/10.1109/CVPR.2016.90</mixed-citation></ref><ref id="scirp.130521-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Narayanan, B.N., De Silva, M.S., Hardie, R.C., Kueterman, N.K. and Ali, R. (2019) Understanding Deep Neural Network Predictions for Medical Imaging Applications.</mixed-citation></ref><ref id="scirp.130521-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">van der Maaten, L.J.P. and Hinton, G.E. (2008) Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research, 9, 2579-2605.</mixed-citation></ref><ref id="scirp.130521-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Shan, B. and Fang, Y. (2020) A Cross Entropy Based Deep Neural Network Model for Road Extraction from Satellite Images. Entropy, 22, Article No. 535. https://doi.org/10.3390/e22050535</mixed-citation></ref><ref id="scirp.130521-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Kurian, N.C., Meshram, P.S., Patil, A., Patel, S. and Sethi, A. (2021) Sample Specific Generalized Cross Entropy for Robust Histology Image Classification. 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), Nice, 13-16 April 2021, 1934-1938. https://doi.org/10.1109/ISBI48211.2021.9434169</mixed-citation></ref><ref id="scirp.130521-ref30"><label>30</label><mixed-citation publication-type="other" xlink:type="simple">Mannor, S., Peleg, D. and Rubinstein, R. (2005) The Cross Entropy Method for Classification. 
Proceedings of the 22nd International Conference on Machine Learning, Bonn, 7-11 August 2005, 561-568. https://doi.org/10.1145/1102351.1102422</mixed-citation></ref><ref id="scirp.130521-ref31"><label>31</label><mixed-citation publication-type="other" xlink:type="simple">Brownlee, J. (2020) A Gentle Introduction to Cross-Entropy for Machine Learning. https://machinelearningmastery.com/cross-entropy-for-machine-learning/</mixed-citation></ref><ref id="scirp.130521-ref32"><label>32</label><mixed-citation publication-type="other" xlink:type="simple">Mao, A., Mohri, M. and Zhong, Y. (2023) Cross-Entropy Loss Functions: Theoretical Analysis and Applications. Proceedings of the 40th International Conference on Machine Learning, Honolulu, Vol. 202, 23803-23828. https://proceedings.mlr.press/v202/mao23b/mao23b.pdf</mixed-citation></ref><ref id="scirp.130521-ref33"><label>33</label><mixed-citation publication-type="other" xlink:type="simple">Nova (2023) A Comprehensive Guide to Cross Entropy in Machine Learning. https://aitechtrend.com/a-comprehensive-guide-to-cross-entropy-in-machine-learning/</mixed-citation></ref><ref id="scirp.130521-ref34"><label>34</label><mixed-citation publication-type="other" xlink:type="simple">Sheikh, I. (2023) Understanding Cross-Entropy Loss and Its Role in Classification Problems. https://medium.com/@l228104/understanding-cross-entropy-loss-and-its-role-in-classification-problems-d2550f2caad5</mixed-citation></ref></ref-list></back></article>