<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">ABCR</journal-id><journal-title-group><journal-title>Advances in Breast Cancer Research</journal-title></journal-title-group><issn pub-type="epub">2168-1589</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/abcr.2015.41001</article-id><article-id pub-id-type="publisher-id">ABCR-53047</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Medicine&amp;Healthcare</subject></subj-group></article-categories><title-group><article-title>
 
 
  Observer Variability in BI-RADS Ultrasound Features and Its Influence on Computer-Aided Diagnosis of Breast Masses
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Sultan</surname><given-names>Laith R.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Bouzghar</surname><given-names>Ghizlane</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Levenback</surname><given-names>Benjamin J.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Faizi</surname><given-names>Nauroze A.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Venkatesh</surname><given-names>Santosh S.</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Conant</surname><given-names>Emily F.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Sehgal</surname><given-names>Chandra M.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group>
<aff id="aff1"><addr-line>Department of Radiology, University of Pennsylvania, Philadelphia, USA</addr-line></aff><aff id="aff2"><addr-line>Department of Electrical Engineering, University of Pennsylvania, Philadelphia, USA</addr-line></aff><author-notes><corresp id="cor1">* E-mail:<email>lsultan@mail.med.upenn.edu(ARS)</email>;</corresp></author-notes><pub-date pub-type="epub"><day>09</day><month>01</month><year>2015</year></pub-date><volume>04</volume><issue>01</issue><fpage>1</fpage><lpage>8</lpage><history><date date-type="received"><day>25</day><month>November</month><year>2014</year></date><date date-type="rev-recd"><day>20</day><month>December</month><year>2014</year></date><date date-type="accepted"><day>31</day><month>December</month><year>2014</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Objective: Computer classification of sonographic BI-RADS features can aid differentiation of malignant and benign masses. However, the variability in diagnosis caused by differences in the features selected between observations is not known. The goal of this study was to measure the variation in sonographic features between multiple observations and to determine the effect of feature variation on computer-aided diagnosis of breast masses. Materials and Methods: Ultrasound images of biopsy-proven solid breast masses were analyzed in three independent observations for BI-RADS sonographic features. The BI-RADS features from each observation were used with a Bayes classifier to determine the probability of malignancy. Observer agreement in the sonographic features was measured by the kappa coefficient, and the difference in diagnostic performance between observations was determined by the area under the ROC curve, Az, and the interclass correlation coefficient. Results: While some features were observed consistently (κ = 0.95), others showed significant variation (κ = 0.16). For all features combined, intra-observer agreement was substantial, κ = 0.77. The agreement, however, decreased steadily to 0.66 and 0.56 as the time between observations increased from 1 month to 2 and 3 months, respectively. Despite the variation in features between observations, the probability-of-malignancy estimates from the Bayes classifier were robust and consistently yielded the same level of diagnostic performance: Az was 0.772-0.817 for sonographic features alone and 0.828-0.849 for sonographic features and age combined. The difference in performance, ΔAz, between the observations for the two groups was small (0.003-0.044) and was not statistically significant (p &gt; 0.05). The interclass correlation coefficient for the observations was 0.822 (CI: 0.787-0.853) for BI-RADS sonographic features alone and 0.833 (CI: 0.800-0.862) for those combined with age. 
Conclusion: Despite the differences in the BI-RADS sonographic features between observations, the diagnostic performance of computer-aided analysis for differentiating breast masses did not change. Through continual retraining, the computer-aided analysis provides consistent diagnostic performance independent of the variations in the observed sonographic features.
 
</p></abstract><kwd-group><kwd>Breast Imaging</kwd><kwd>Breast Cancer</kwd><kwd>Observer Variability</kwd><kwd>Computer-Aided Diagnosis</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Despite major advances in diagnostic breast cancer imaging, the yield of biopsying a breast lesion is still low, and up to 85% of biopsies are found to be benign [<xref ref-type="bibr" rid="scirp.53047-ref1">1</xref>]. There continues to be a need for further innovations to improve the confidence and reliability of breast imaging. In this context, several studies have proposed the use of computer algorithms and machine learning methods to improve the diagnostic value of breast ultrasound [<xref ref-type="bibr" rid="scirp.53047-ref2">2</xref>] - [<xref ref-type="bibr" rid="scirp.53047-ref7">7</xref>]. These computer-based systems can serve as a second reader to decrease the false positive rates of breast imaging [<xref ref-type="bibr" rid="scirp.53047-ref2">2</xref>]. In our earlier study, we introduced an approach that combines individual sonographic features quantitatively by machine learning to determine the probability of malignancy of solid breast masses [<xref ref-type="bibr" rid="scirp.53047-ref7">7</xref>]. The results showed that the Bayesian method of weighting provides a systematic approach for combining ultrasound BI-RADS features, yielding a high level of diagnostic performance, with an A<sub>z</sub> of approximately 0.884. While these results are encouraging, the variability in diagnostic performance on repeated assessments is not known. The goal of this study was to determine the extent of variation in computer-aided diagnosis between repeated interpretations of breast ultrasound images. In brief, variability in the diagnosis can result from two factors: 1) differences in feature selection and 2) differences in the weighting of the individual features contributing to the overall estimate of the probability of malignancy. 
In this study, we investigated the role of both factors. First, the observer variability in feature selection across three observations of the ultrasound images was measured by inter-rater kappa statistics. Second, the sonographic features from each observation were combined using a Bayes model to determine the probability of malignancy. The diagnostic performances of the probability estimates from the three observations were compared to determine diagnostic variability. Since the predictive values of the sonographic features are influenced by the age of the patients [<xref ref-type="bibr" rid="scirp.53047-ref7">7</xref>], we also evaluated the diagnostic performance of the sonographic features in conjunction with patient age.</p></sec><sec id="s2"><title>2. Materials and Methods</title><sec id="s2_1"><title>2.1. Image Acquisition and Analysis</title><p>This retrospective study was approved by the Institutional Review Board. A total of 264 masses were obtained from 248 female patients with biopsy-proven solid masses and known mammographic BI-RADS. Sonographic images were acquired using a broadband 12 - 5 MHz transducer and a Philips ATL 5000 scanner. Five to seven B-scan ultrasound images, including color Doppler, were acquired per patient in radial and anti-radial planes.</p><p>Images were analyzed using the ACR BI-RADS ultrasound lexicon [<xref ref-type="bibr" rid="scirp.53047-ref8">8</xref>]. According to this lexicon, sonographic features of a solid breast mass [<xref ref-type="bibr" rid="scirp.53047-ref9">9</xref>] are grouped into shape, orientation, margin, lesion boundary, echo pattern, and posterior acoustic features. The observer, who had three years of prior training in general radiology, completed a self-study session covering the BI-RADS lexicon descriptors and training cases of breast images with known BI-RADS and pathology. 
The observer was blinded to patient age, race, physical examination, family history, mammographic report, and histological diagnosis during analysis.</p><p>The BI-RADS feature assessment was repeated twice after the initial assessment. The second observation (observation 2) was made one month after the initial observation (observation 1), and the third observation (observation 3) was made three months after the initial observation. In all three observations, the same image data were analyzed, with the cases presented to the observer in random order.</p><p>Agreement in the BI-RADS features was determined by kappa statistics, which assess agreement beyond that expected by chance [<xref ref-type="bibr" rid="scirp.53047-ref10">10</xref>]. According to this approach, κ = 1 corresponds to complete agreement, whereas κ = 0 represents agreement no better than chance. Intermediate values between 0 and 1 represent the degree of agreement. On the five-level scale described by Landis and Koch [<xref ref-type="bibr" rid="scirp.53047-ref11">11</xref>], kappa values of 0.01 - 0.20, 0.21 - 0.40, 0.41 - 0.60, 0.61 - 0.80 and 0.81 - 1.00 indicate slight, fair, moderate, substantial, and almost perfect agreement, respectively. Agreement values were calculated both for individual features and for all features combined (overall).</p></sec><sec id="s2_2"><title>2.2. Computer-Aided Analysis</title><p>The sonographic BI-RADS features were used with a machine learning algorithm to determine the probability of malignancy. This involved training the algorithm using cases with known features and diagnosis. Following training, the algorithm was tested on unknown cases to predict the probability of malignancy. The predicted values were compared with the biopsy results. Training and testing were performed using leave-one-sample-out cross-validation. 
This involved training the algorithm on all cases in the database except one and predicting the outcome of the remaining case. The process of training and testing was repeated until the entire dataset had been analyzed. Training and testing were performed using a Bayes model in which the probability of an event (malignancy) is revised based on the accumulation of new evidence (detection of sonographic features). The Bayes probability of malignancy in the presence of sonographic features <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-2470086x6.png" xlink:type="simple"/></inline-formula> was determined by the approach described earlier [<xref ref-type="bibr" rid="scirp.53047-ref12">12</xref>]. In short, it was determined by multiplying the initial estimate of probability <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-2470086x7.png" xlink:type="simple"/></inline-formula> by the probabilities that feature F<sub>i</sub> is present in a malignant mass <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-2470086x8.png" xlink:type="simple"/></inline-formula>. <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-2470086x9.png" xlink:type="simple"/></inline-formula> was determined as the ratio of the number of malignant cases with feature F<sub>i</sub> to the total number of malignant cases. <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-2470086x10.png" xlink:type="simple"/></inline-formula> was determined as the ratio of the number of malignant cases to the total number of cases studied. 
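</p><p>The training and scoring steps described above can be sketched in code. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes a naive (conditionally independent) Bayes model, normalization against the benign class, and add-one smoothing. The function names, the data layout, and the toy cases are hypothetical and are not taken from the study.</p>
<preformat>
```python
def posterior_malignant(train, features, smoothing=1.0):
    """Naive Bayes estimate of P(malignant | features).

    train    -- list of (feature_set, is_malignant) pairs (hypothetical layout)
    features -- set of feature labels observed for the case being scored
    """
    n_total = len(train)
    n_mal = sum(1 for _, malignant in train if malignant)
    n_ben = n_total - n_mal
    # Prior: malignant (or benign) cases over all training cases.
    score_mal = n_mal / n_total
    score_ben = n_ben / n_total
    for f in features:
        mal_with_f = sum(1 for fs, malignant in train if malignant and f in fs)
        ben_with_f = sum(1 for fs, malignant in train if not malignant and f in fs)
        # Likelihood: cases of the class showing feature f over all cases of
        # that class; add-one smoothing (an assumption) avoids zero products.
        score_mal *= (mal_with_f + smoothing) / (n_mal + 2 * smoothing)
        score_ben *= (ben_with_f + smoothing) / (n_ben + 2 * smoothing)
    # Normalize so the malignant and benign posteriors sum to one.
    return score_mal / (score_mal + score_ben)


def leave_one_out(cases):
    """Score each case with a model trained on all the other cases."""
    return [
        posterior_malignant(cases[:i] + cases[i + 1:], features)
        for i, (features, _) in enumerate(cases)
    ]


# Made-up toy data for illustration only; not taken from the study.
toy_cases = [
    ({"irregular", "not_parallel"}, True),
    ({"irregular", "parallel"}, True),
    ({"oval", "not_parallel"}, True),
    ({"oval", "parallel"}, False),
    ({"oval", "parallel"}, False),
    ({"round", "parallel"}, False),
]
loo_probabilities = leave_one_out(toy_cases)
```
</preformat>
<p>In such a sketch, each feature set would hold one descriptor per BI-RADS category for a mass, and the leave-one-out probabilities would then be passed to the ROC analysis.</p><p>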
The diagnostic performance of the Bayes probabilities <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/1-2470086x11.png" xlink:type="simple"/></inline-formula> was measured by calculating the area under the ROC curve (A<sub>z</sub>), the standard error, and the 95% confidence intervals (MedCalc Software, Ostend, Belgium).</p><p>The statistical difference between the diagnostic performances of the three observations was determined based on p-values [<xref ref-type="bibr" rid="scirp.53047-ref13">13</xref>]. A p-value less than 0.05 was considered statistically significant. Additionally, interclass correlation coefficients of the probability estimates were calculated as a measure of the consistency of the diagnostic performance across the three observations.</p></sec></sec><sec id="s3"><title>3. Results</title><sec id="s3_1"><title>3.1. General Characteristics</title><p>Of the 264 lesions, 85 (32%) were malignant and 179 (68%) were benign. Among the malignant lesions, invasive ductal carcinoma was the most common, 65 (76%). Other diagnoses included invasive lobular carcinoma 7 (8%), ductal carcinoma in situ 7 (8%) including one papillary carcinoma in situ case, adenocarcinoma 3 (3%), two poorly differentiated carcinomas, and one remaining case diagnosed as mucinous mammary carcinoma (a rare form of invasive ductal carcinoma). Of the benign masses, 44% were found to be fibroadenomas, 33% were identified as miscellaneous fibrocystic changes, 6% were sclerosing adenosis, and the remaining 17% were identified as benign lesions without atypia in the histopathology report. The mean (&#177;standard deviation) age of the patient population was 51.5 &#177; 14.7 years. The mean age of patients with malignant masses was 58.8 &#177; 12.1 years compared to 48.0 &#177; 14.5 years for benign cases. The difference in the mean age of the two groups was statistically significant (p = 0.0001).</p></sec><sec id="s3_2"><title>3.2.
Agreement in BI-RADS Feature Selection</title><p><xref ref-type="fig" rid="fig1">Figure 1</xref> shows examples of two breast lesions with high and low agreement in feature selection between the three observations. Features such as oval shape, microlobulation and hypoechogenicity were consistently observed in all three readings of the image shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>(a). On the other hand, considerable variation in lesion orientation and margin features was observed between observations of the image shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>(b). The agreement results for each BI-RADS feature across all cases are summarized in <xref ref-type="table" rid="table1">Table 1</xref>. κ for the individual features ranged from 0.16 to 0.95. The highest intra-observer agreement was found for the lesion echo pattern, with κ between 0.69 and 0.98 for the three observations. The feature with the lowest agreement was lesion boundary, with κ between 0.15 and 0.53.</p><p>When all the features were investigated collectively, the overall intra-observer agreement between observations 1 and 2, made at an interval of 1 month, was 0.77. κ for the agreement between observations 2 and 3, made at an interval of 2 months, was 0.66. For the interval of 3 months between observation 1 and observation 3, the agreement decreased to 0.56. Thus, there was a progressive decrease in agreement (κ) as the time interval between the observations increased from 1 month to 3 months (<xref ref-type="table" rid="table1">Table 1</xref>).</p><fig-group id="fig1"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> (a) Example of a breast mass that showed high agreement in sonographic features selected between the three observations; (b) Example of a breast lesion that showed lowest agreement in features selected over the three observations.</title></caption><fig id="fig1_1"><label> (b)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-2470086x12.png"/></fig><fig id="fig1_2"><label></label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-2470086x13.png"/></fig></fig-group><table-wrap id="table1"><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Intra-observer agreement values for BI-RADS US descriptors. The term “overall” represents agreement in all the features together. O1, O2, and O3 refer to the first, second and third observations, respectively</title></caption><table><thead><tr><th align="center" valign="middle">Feature</th><th align="center" valign="middle">O1 vs. O2 (1-month interval) (κ)</th><th align="center" valign="middle">O2 vs. O3 (2-month interval) (κ)</th><th align="center" valign="middle">O3 vs. O1 (3-month interval) (κ)</th><th align="center" valign="middle">Intra-observer (κ) [<xref ref-type="bibr" rid="scirp.53047-ref15">15</xref>]</th><th align="center" valign="middle">Intra-observer (κ) [<xref ref-type="bibr" rid="scirp.53047-ref16">16</xref>]</th></tr></thead><tbody>
<tr><td align="center" valign="middle">Shape</td><td align="center" valign="middle">0.51</td><td align="center" valign="middle">0.75</td><td align="center" valign="middle">0.46</td><td align="center" valign="middle">0.71</td><td align="center" valign="middle">0.73</td></tr>
<tr><td align="center" valign="middle">Orientation</td><td align="center" valign="middle">0.65</td><td align="center" valign="middle">0.71</td><td align="center" valign="middle">0.56</td><td align="center" valign="middle">0.83</td><td align="center" valign="middle">0.68</td></tr>
<tr><td align="center" valign="middle">Boundary</td><td align="center" valign="middle">0.16</td><td align="center" valign="middle">0.53</td><td align="center" valign="middle">0.15</td><td align="center" valign="middle">0.85</td><td align="center" valign="middle">0.68</td></tr>
<tr><td align="center" valign="middle">Echo pattern</td><td align="center" valign="middle">0.98</td><td align="center" valign="middle">0.70</td><td align="center" valign="middle">0.69</td><td align="center" valign="middle">0.67</td><td align="center" valign="middle">0.65</td></tr>
<tr><td align="center" valign="middle">Posterior acoustic features</td><td align="center" valign="middle">0.98</td><td align="center" valign="middle">0.69</td><td align="center" valign="middle">0.67</td><td align="center" valign="middle">0.82</td><td align="center" valign="middle">0.64</td></tr>
<tr><td align="center" valign="middle">Margin</td><td align="center" valign="middle">0.95</td><td align="center" valign="middle">0.56</td><td align="center" valign="middle">0.56</td><td align="center" valign="middle">0.59</td><td align="center" valign="middle">0.64</td></tr>
<tr><td align="center" valign="middle">Overall</td><td align="center" valign="middle">0.77 (Substantial)</td><td align="center" valign="middle">0.66 (Substantial)</td><td align="center" valign="middle">0.56 (Moderate)</td><td align="center" valign="middle">0.77 (Substantial)</td><td align="center" valign="middle">0.74 (Substantial)</td></tr>
</tbody></table></table-wrap></sec><sec id="s3_3"><title>3.3. Diagnostic Performance Analysis</title><p>The area under the ROC curve for the ultrasound features alone ranged from 0.772 to 0.817 for the three observations (<xref ref-type="table" rid="table2">Table 2</xref> and <xref ref-type="fig" rid="fig2">Figure 2</xref>). The difference in performance (ΔA<sub>z</sub>) between the observations was small (0.013 to 0.044) and not statistically significant (p &gt; 0.05, <xref ref-type="table" rid="table2">Table 2</xref>). 
The diagnostic performance increased markedly (range: 0.828 - 0.849, <xref ref-type="table" rid="table3">Table 3</xref> and <xref ref-type="fig" rid="fig3">Figure 3</xref>) when age was included as a risk factor in estimating the probability of malignancy. As with sonographic features alone, ΔA<sub>z</sub> for sonographic features plus age was small (0.003 - 0.021, <xref ref-type="table" rid="table3">Table 3</xref>) and not statistically significant. The interclass correlation coefficient for the three observations was 0.822 (95% CI 0.787 - 0.853) for features alone and 0.833 (95% CI 0.800 - 0.862) for BI-RADS features combined with age.</p><table-wrap id="table2"><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Area under the ROC curve (A<sub>z</sub>), the standard error (SE), 95% confidence interval (95% CI) and the p-value for Bayesian estimated probabilities in the three observations. Observation 1 represents the initial observation. Observations 2 and 3 were made 1 and 3 months after Observation 1</title></caption><table><thead><tr><th align="center" valign="middle"></th><th align="center" valign="middle">A<sub>z</sub> &#177; SE</th><th align="center" valign="middle">95% CI</th><th align="center" valign="middle" colspan="3">ΔA<sub>z</sub> and p-value</th></tr></thead><tbody>
<tr><td align="center" valign="middle">Observation 1</td><td align="center" valign="middle">0.772 &#177; 0.35</td><td align="center" valign="middle">0.717 - 0.822</td><td align="center" valign="middle" rowspan="2">p = 0.49 ΔA<sub>z</sub> = 0.013</td><td align="center" valign="middle"></td><td align="center" valign="middle" rowspan="3">p = 0.09 ΔA<sub>z</sub> = 0.031</td></tr>
<tr><td align="center" valign="middle">Observation 2</td><td align="center" valign="middle">0.786 &#177; 0.32</td><td align="center" valign="middle">0.731 - 0.834</td><td align="center" valign="middle" rowspan="2">p = 0.08 ΔA<sub>z</sub> = 0.044</td></tr>
<tr><td align="center" valign="middle">Observation 3</td><td align="center" valign="middle">0.817 &#177; 0.029</td><td align="center" valign="middle">0.765 - 0.862</td><td align="center" valign="middle"></td></tr>
</tbody></table></table-wrap><table-wrap id="table3"><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Area under the ROC curve (A<sub>z</sub>), the standard error (SE), 95% confidence interval (95% CI) and the p-value for Bayesian estimated probabilities combined with patient age in the three observations. Observation 1 represents the initial observation. Observations 2 and 3 were made 1 and 3 months after Observation 1</title></caption><table><thead><tr><th align="center" valign="middle"></th><th align="center" valign="middle">A<sub>z</sub> &#177; SE</th><th align="center" valign="middle">95% CI</th><th align="center" valign="middle" colspan="3">ΔA<sub>z</sub> and p-value</th></tr></thead><tbody>
<tr><td align="center" valign="middle">Observation 1</td><td align="center" valign="middle">0.828 &#177; 0.0258</td><td align="center" valign="middle">0.777 - 0.872</td><td align="center" valign="middle" rowspan="3">p = 0.87 ΔA<sub>z</sub> = 0.003</td><td align="center" valign="middle" rowspan="3">p = 0.39 ΔA<sub>z</sub> = 0.012</td><td align="center" valign="middle" rowspan="3">p = 0.17 ΔA<sub>z</sub> = 0.021</td></tr>
<tr><td align="center" valign="middle">Observation 2</td><td align="center" valign="middle">0.831 &#177; 0.027</td><td align="center" valign="middle">0.780 - 0.874</td></tr>
<tr><td align="center" valign="middle">Observation 3</td><td align="center" valign="middle">0.849 &#177; 0.0248</td><td align="center" valign="middle">0.800 - 0.890</td></tr>
</tbody></table></table-wrap><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> The diagnostic performance of the Bayes probability estimates from the three observations. 
O1, O2 and O3 refer to the first, second and third observations, respectively</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-2470086x14.png"/></fig></sec></sec><sec id="s4"><title>4. Discussion</title><p>Previous studies evaluating observer variability in the interpretation of BI-RADS sonographic features have shown that the agreement between observers can be fair to substantial [<xref ref-type="bibr" rid="scirp.53047-ref14">14</xref>] - [<xref ref-type="bibr" rid="scirp.53047-ref17">17</xref>]. Abdullah et al. [<xref ref-type="bibr" rid="scirp.53047-ref14">14</xref>], for instance, demonstrated that inter-observer agreement as measured by kappa statistics (κ) for individual features ranged from fair (κ = 0.36) to substantial (κ = 0.70). Similarly, Calas et al. [<xref ref-type="bibr" rid="scirp.53047-ref15">15</xref>] demonstrated that intra-observer agreement for individual features ranged from moderate (κ = 0.59) to almost perfect (κ = 0.85), with substantial overall agreement and kappa values ranging from 0.72 to 0.79. In general, the variation in features observed in this study is comparable to the previously reported values, although the range of κ for individual features in the present study is wider (0.16 - 0.95).</p><fig id="fig3"  position="float"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> The diagnostic performance of the Bayes probability estimates from the three observations combined with patients’ age. O1, O2 and O3 refer to the first, second and third observations, respectively</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/1-2470086x15.png"/></fig><p>The results of this study also show that the time interval between observations influences observer agreement, with a steady decrease in κ as the time between observations increases. 
The reason for the steady decrease is not completely understood but could be due to the “recall effect” described by Ryan et al. for repeated review of the same chest X-ray image [<xref ref-type="bibr" rid="scirp.53047-ref18">18</xref>]. When the observations are made close together in time, the observer is influenced by the memory of the earlier observation, creating an unconscious recall bias. As the time between the observations increases, the influence of the earlier observations becomes less pronounced, reducing agreement. A change in agreement with time has not been previously reported, and it suggests that the time interval between observations must be controlled when designing observer agreement studies.</p><p>Prior studies evaluating the variability in breast cancer diagnosis with ultrasound have primarily focused on the variability caused by feature selection. While useful, this assessment alone is incomplete because diagnostic assessment of a breast lesion is a two-step process in which feature selection is followed by weighting of the features to determine the combined probability of malignancy. The previous approaches did not consider how the second step, weighting the individual features, contributes to observer variability in diagnostic performance. The results of this study show that despite the variability in the individual features between the three observations, the final diagnostic performances are comparable. These results are further supported by a strong interclass correlation of approximately 0.83 between the probability estimates. Although there was notable variation in individual sonographic features between observations, the diagnostic performance did not change. 
This seeming discrepancy is not surprising because the computer system is trained on the observed features and can therefore discount the differences in feature selection by weighting the features differently when assessing the probability of malignancy. In essence, the continuous retraining of the computer system on the observed features compensates for the variation in feature selection. Although this study used Bayesian classifiers for computer-aided diagnosis, it is reasonable to anticipate that similar patterns should hold for other learning algorithms. It is also conceivable that individual observers may compensate for variations in feature detection by weighting the features differently towards the final diagnosis between observations. Thus, future studies evaluating diagnostic variation between observations should go beyond studying variations in individual BI-RADS features; they should also include assessment of diagnostic performance. Although the results presented in this study are encouraging and demonstrate the efficacy of BI-RADS, further studies with multiple readers are needed for a comprehensive understanding of observer variability in breast ultrasound.</p><p>In conclusion, ultrasound images of breast masses were analyzed repeatedly using the BI-RADS lexicon. When the features were considered together as a group, the observer agreement was moderate to substantial. However, there were notable differences when features were compared individually. Despite differences in the individual sonographic features between readings, the diagnostic performance of computer-aided analysis of malignant and benign breast masses did not change. 
Through a built-in learning process in the algorithm, the computer-based analysis was able to account for feature variations and thus provided an effective method to differentiate malignant and benign breast masses.</p></sec></body><back><ref-list><title>References</title><ref id="scirp.53047-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Kopans, D.B. (1992) The Positive Predictive Value of Mammography. American Journal of Roentgenology, 158, 521-526. http://dx.doi.org/10.2214/ajr.158.3.1310825</mixed-citation></ref><ref id="scirp.53047-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Jiang, Y.L., Nishikawa, R.M., Schmidt, R.A., Metz, C.E., Giger, M.L. and Doi, K. (1999) Improving Breast Cancer Diagnosis with Computer-Aided Diagnosis. Academic Radiology, 6, 22-33. 
http://dx.doi.org/10.1016/S1076-6332(99)80058-0</mixed-citation></ref><ref id="scirp.53047-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Shen, W.C., Chang, R.F., Moon, W.K., Chou, Y.H. and Huang, C.S. (2007) Breast Ultrasound Computer-Aided Diagnosis Using BI-RADS Features. Academic Radiology, 14, 928-939. http://dx.doi.org/10.1016/j.acra.2007.04.016</mixed-citation></ref><ref id="scirp.53047-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Shen, W.C., Chang, R.F. and Moon, W.K. (2007) Computer Aided Classification System for Breast Ultrasound Based on Breast Imaging Reporting and Data System (BI-RADS). Ultrasound in Medicine &amp; Biology, 33, 1688-1698. 
http://dx.doi.org/10.1016/j.ultrasmedbio.2007.05.016</mixed-citation></ref><ref id="scirp.53047-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Moon, W.K., Lo, C.M., Chang, J.M., Huang, C.S., Chen, J.H. and Chang, R.F. (2012) Computer-Aided Classification of Breast Masses Using Speckle Features of Automated Breast Ultrasound Images. Medical Physics, 39, 6465-6473. 
http://dx.doi.org/10.1118/1.4754801</mixed-citation></ref><ref id="scirp.53047-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Moon, W.K., Lo, C.-M., Chang, J.M., Huang, C.-S., Chen, J.-H. and Chang, R.-F. (2013) Quantitative Ultrasound Analysis for Classification of BI-RADS Category 3 Breast Masses. Journal of Digital Imaging, 26, 1091-1098. 
http://dx.doi.org/10.1007/s10278-013-9593-8</mixed-citation></ref><ref id="scirp.53047-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Bouzghar, G., Levenback, B.J., Sultan, L.R., Venkatesh, S.S., Cwanger, A., Conant, E.F. and Sehgal, C.M. (2014) Bayesian Probability of Malignancy with Breast Ultrasound BI-RADS Features. Journal of Ultrasound in Medicine, 33, 641-648. http://dx.doi.org/10.7863/ultra.33.4.641</mixed-citation></ref><ref id="scirp.53047-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">American College of Radiology (2013) Breast Imaging Reporting and Data System: BI-RADS Atlas. 5th Edition, American College of Radiology, Reston.</mixed-citation></ref><ref id="scirp.53047-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Stavros, A.T., Thickman, D., Rapp, C.L., Dennis, M.A., Parker, S.H. and Sisney, G.A. (1995) Solid Breast Nodules: Use of Sonography to Distinguish between Benign and Malignant Lesions. Radiology, 196, 123-134. 
http://dx.doi.org/10.1148/radiology.196.1.7784555</mixed-citation></ref><ref id="scirp.53047-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Cohen, J. (1960) A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20, 37-46. http://dx.doi.org/10.1177/001316446002000104</mixed-citation></ref><ref id="scirp.53047-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Landis, J.R. and Koch, G.G. (1977) The Measurement of Observer Agreement for Categorical Data. Biometrics, 33, 159-174. http://dx.doi.org/10.2307/2529310</mixed-citation></ref><ref id="scirp.53047-ref12"><label>12</label><mixed-citation publication-type="book" xlink:type="simple">Cary, T.W., Cwanger, A., Venkatesh, S.S., Conant, E.F. and Sehgal, C.M. (2012) Comparison of Naive Bayes and Logistic Regression for Computer-Aided Diagnosis of Breast Masses Using Ultrasound Imaging. In: Bosch, J.G. and Doyley, M.M., Eds., Medical Imaging: Ultrasonic Imaging, Tomography, and Therapy, SPIE, Bellingham.</mixed-citation></ref><ref id="scirp.53047-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">DeLong, E.R., DeLong, D.M. and Clarke-Pearson, D.L. (1988) Comparing the Areas under Two or More Correlated ROC Curves: A Nonparametric Approach. Biometrics, 44, 837-845. http://dx.doi.org/10.2307/2531595</mixed-citation></ref><ref id="scirp.53047-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Abdullah, N., Mesurolle, B., El-Khoury, M. and Kao, E. (2009) Breast Imaging Reporting and Data System Lexicon for US: Interobserver Agreement for Assessment of Breast Masses. Radiology, 252, 665-672.</mixed-citation></ref><ref id="scirp.53047-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Calas, M.J., Almeida, R.M., Gutfilen, B. and Pereira, W.C. 
(2009) Intra-Observer Interpretation of Breast Ultrasonography Following the BI-RADS Classification. European Journal of Radiology, 74, 525-528. 
http://dx.doi.org/10.1016/j.ejrad.2009.04.015</mixed-citation></ref><ref id="scirp.53047-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Park, C.S., Lee, J.H., Yim, H.W., Kang, B.J., Kim, H.S., Jung, J.I., Jung, N.Y. and Kim, S.H. (2007) Observer Agreement Using the ACR Breast Imaging Reporting and Data System (BI-RADS)-Ultrasound. Korean Journal of Radiology, 8, 397-402.</mixed-citation></ref><ref id="scirp.53047-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Lee, H.J., Kim, E.K., Kim, M.J., Youk, J.H., Lee, J.Y., Kang, D.R. and Oh, K.K. (2008) Observer Variability of Breast Imaging Reporting and Data System (BI-RADS) for Breast Ultrasound. European Journal of Radiology, 65, 293-298. http://dx.doi.org/10.1016/j.ejrad.2007.04.008</mixed-citation></ref><ref id="scirp.53047-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Ryan, J.T., Haygood, T.M., Yamal, J.M., Evanoff, M., O’Sullivan, P., McEntee, M. and Brennan, P.C. (2011) The “Memory Effect” for Repeated Radiologic Observations. American Journal of Roentgenology, 197, W985-W991. 
http://dx.doi.org/10.2214/AJR.10.5859</mixed-citation></ref></ref-list></back></article>