<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">AM</journal-id><journal-title-group><journal-title>Applied Mathematics</journal-title></journal-title-group><issn pub-type="epub">2152-7385</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/am.2021.123014</article-id><article-id pub-id-type="publisher-id">AM-108088</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  An Oracle Bone Inscription Detector Based on Multi-Scale Gaussian Kernels
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Guoying</surname><given-names>Liu</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Shuanghao</surname><given-names>Chen</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Jing</surname><given-names>Xiong</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Qingju</surname><given-names>Jiao</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff2"><addr-line>School of Computer and Engineering, Zhengzhou University, Zhengzhou, China</addr-line></aff><aff id="aff1"><addr-line>School of Computer and Information Engineering, Anyang Normal University, Anyang, China</addr-line></aff><pub-date pub-type="epub"><day>11</day><month>03</month><year>2021</year></pub-date><volume>12</volume><issue>03</issue><fpage>224</fpage><lpage>239</lpage><history><date date-type="received"><day>22,</day>	<month>February</month>	<year>2021</year></date><date date-type="rev-recd"><day>27,</day>	<month>March</month>	<year>2021</year>	</date><date date-type="accepted"><day>30,</day>	<month>March</month>	<year>2021</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). 
http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  The detection of Oracle Bone Inscriptions (OBIs), which aims to locate OBIs on rubbing images, is one of the most fundamental tasks in oracle bone studies. Existing methods rely on anchor boxes, which require complex network design and a large number of anchors. To overcome this problem, this paper proposes a simpler but more effective anchor-free OBIs detector, in which shape-adaptive Gaussian kernels represent the spatial regions of different OBIs. More specifically, to address the misdetection caused by regional overlap between tightly distributed OBIs, each character region is represented simultaneously by multi-scale Gaussian kernels so that regions with sharp edges can be obtained. In addition, based on the kernel predictions at different scales, a novel post-processing pipeline produces accurate bounding boxes. Experiments on the OBIs dataset show that our detector outperforms several mainstream object detectors in both accuracy and efficiency. The dataset is available at http://jgw.aynu.edu.cn.
 
</p></abstract><kwd-group><kwd>Oracle Bone Inscriptions</kwd><kwd> Deep Learning</kwd><kwd> Object Detection</kwd><kwd> Hourglass Network</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Oracle Bone Inscriptions (OBIs) are among the oldest and most mysterious ancient characters in China; they record the lives, thoughts, and social conditions of a large number of unknown ancestors about 3600 years ago. They are very important historical materials for understanding the emergence and development of ancient China. The locations of OBIs provide valuable cues for interpreting these ancient characters. Therefore, the detection of OBIs, which tries to locate OBIs on rubbing images, is one of the most fundamental tasks in the field of oracle bone studies. At present, little attention has been paid to the automatic detection of OBIs, and OBI experts have to locate the OBIs relying only on their knowledge and experience, which is tedious and time-consuming. In this paper, we focus on the automatic detection of OBIs and explore a simple but efficient method for finding the precise positions of OBIs on rubbing images.</p><p>Currently, there are only a few methods for the OBIs detection task in the field of image processing. For example, Meng [<xref ref-type="bibr" rid="scirp.108088-ref1">1</xref>] built a single-stage OBIs detector by extending SSD300 to SSD1024. Wang [<xref ref-type="bibr" rid="scirp.108088-ref2">2</xref>] introduced a region-based fully convolutional network and proposed a novel auxiliary detection algorithm based on character recognition, which helps the detection model reduce false positives caused by cracks. In our earlier works [<xref ref-type="bibr" rid="scirp.108088-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.108088-ref4">4</xref>], we also carried out some simple explorations of OBIs detection. 
We applied several state-of-the-art object detection models to the OBIs dataset and compared and analyzed their detection results. Later, based on the statistical characteristics of character sizes, we redesigned the sizes and aspect ratios of the anchors and proposed a Spatial Block to stabilize the features and alleviate noise interference during training.</p><p>Although these methods have achieved good detection results on the OBIs dataset, they still have limitations in accuracy and efficiency. First, because the OBIs dataset lacks character-level class labels, the semantic information of a character is not easily captured through position regression. As a result, some special characters may be mis-detected; for example, compound characters composed of multiple parts are easily detected as multiple characters, as shown in <xref ref-type="fig" rid="fig1">Figure 1</xref> (Left). Similarly, multiple characters are easily detected as one compound character, as shown in <xref ref-type="fig" rid="fig1">Figure 1</xref> (Right). Second, most algorithms are based on anchor boxes, which involve complex network design and require a large number of anchor boxes; for example, the number of anchor boxes exceeds 40k in DSSD [<xref ref-type="bibr" rid="scirp.108088-ref6">6</xref>] and 100k in RetinaNet [<xref ref-type="bibr" rid="scirp.108088-ref7">7</xref>]. To some extent, this reduces the efficiency of the detection model. In this work, our main goal is to explore a simpler OBIs detector and improve the detection accuracy.</p><p>We are motivated by the recently proposed CRAFT (Character Region Awareness for Text Detection) [<xref ref-type="bibr" rid="scirp.108088-ref8">8</xref>]. That work uses an adaptively shaped Gaussian kernel to represent each character region, so that the detection of text instances is converted into the prediction of the corresponding Gaussian map. 
Thus, it not only bypasses the need for anchor boxes but also enables the detection model to learn character spatial regions. In our work, we follow this formulation: we represent each oracle bone character region by an adaptively shaped Gaussian kernel and directly output the Gaussian prediction of the character region, as shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>. However, experiments show that the Gaussian kernel representation performs well only for character regions that are not rigidly bounded, and that it is prone to regional overlap for tightly distributed oracle characters, as shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>. To overcome this problem, we represent a single character using Gaussian kernels of multiple scales simultaneously, where the smaller the scale, the larger the margin between character regions; based on these kernel predictions, a progressive scale expansion strategy is then used to obtain accurate character bounding boxes. Experimental results show that, compared with several state-of-the-art object detectors, our character detector based on multi-scale Gaussian kernels achieves more accurate results on the OBIs dataset. The main contributions of this work are summarized as follows:</p><p>&#183; We propose, for the first time, an anchor-free detector for OBIs detection. 
The detector uses Gaussian kernels to represent character spatial regions, which not only bypasses the need for anchor boxes but also enables the detection model to learn character spatial regions.</p><p>&#183; To overcome the misdetection caused by regional overlap between tightly distributed oracle characters, we represent each character region using Gaussian kernels of multiple scales simultaneously; based on these kernel predictions, character regions with sharp edges are then obtained through progressive scale expansion.</p><p>&#183; Experiments show that, compared with several state-of-the-art object detectors, our character detector based on the multi-scale Gaussian kernel representation achieves excellent detection results in both accuracy and efficiency on the OBIs dataset.</p></sec><sec id="s2"><title>2. Related Work</title><sec id="s2_1"><title>2.1. Traditional Object Detection Methods</title><p>In the early days, most object detection methods [<xref ref-type="bibr" rid="scirp.108088-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.108088-ref10">10</xref>] followed either the Sliding Window or the Connected Components Analysis route. Sliding Window methods densely slide windows of different scales over the input image and classify the content of each window with a classifier or hand-crafted rules. Methods based on Connected Components Analysis usually first obtain candidate connected regions in a variety of ways (e.g., color clustering or extremal region extraction) and then filter out non-object regions among the candidates using hand-designed rules. 
As one of the most successful detection methods, [<xref ref-type="bibr" rid="scirp.108088-ref11">11</xref>] uses Haar features and AdaBoost [<xref ref-type="bibr" rid="scirp.108088-ref12">12</xref>] to train a series of cascaded classifiers for face detection, achieving high efficiency and satisfactory accuracy. DPM [<xref ref-type="bibr" rid="scirp.108088-ref13">13</xref>] is another popular method that maintained the best results on PASCAL VOC [<xref ref-type="bibr" rid="scirp.108088-ref14">14</xref>] for many years. It uses a mixture of multi-scale deformable part models to represent highly variable object classes. Later, some methods further improved the accuracy of object detection by exploiting morphological operations [<xref ref-type="bibr" rid="scirp.108088-ref15">15</xref>], conditional random fields [<xref ref-type="bibr" rid="scirp.108088-ref16">16</xref>] and graphs [<xref ref-type="bibr" rid="scirp.108088-ref17">17</xref>].</p></sec><sec id="s2_2"><title>2.2. Object Detection in Deep Learning</title><p>Motivated by the success of deep learning-based object [<xref ref-type="bibr" rid="scirp.108088-ref18">18</xref>] and text [<xref ref-type="bibr" rid="scirp.108088-ref19">19</xref>] detection architectures, we expect that the detection of oracle characters, as a particular kind of object, can benefit from these fields. There are two main trends in the field of object detection: two-stage and one-stage.</p><p>Two-stage approaches divide the object detection task into two stages: generating ROIs (Regions of Interest), and then classifying and regressing the ROIs.</p><p>The two-stage approach was introduced and popularized by R-CNN [<xref ref-type="bibr" rid="scirp.108088-ref20">20</xref>]. It generates ROIs using a low-level vision algorithm and then uses a DCN-based region-wise classifier to classify the ROIs independently. 
Later, SPP-Net [<xref ref-type="bibr" rid="scirp.108088-ref21">21</xref>] and Fast-RCNN [<xref ref-type="bibr" rid="scirp.108088-ref22">22</xref>] improved R-CNN by extracting ROIs from the feature maps. However, both still rely on separate proposal algorithms and cannot be trained end-to-end. Faster-RCNN [<xref ref-type="bibr" rid="scirp.108088-ref5">5</xref>] can be trained end-to-end because it introduces an RPN (region proposal network). The RPN generates proposals from a set of pre-determined candidate boxes, usually known as anchor boxes, which not only makes the detector more efficient but also allows it to be trained end-to-end. Mask-RCNN [<xref ref-type="bibr" rid="scirp.108088-ref23">23</xref>] further improves Faster-RCNN by adding a mask prediction branch and can thereby detect objects and predict their masks at the same time. Other works focus on architecture design, contextual relationships, and improving speed.</p><p>One-stage approaches remove the ROI extraction process and directly classify and regress the candidate anchor boxes.</p><p>YoLo [<xref ref-type="bibr" rid="scirp.108088-ref24">24</xref>] uses a single feed-forward convolutional network to directly predict object classes and locations, which is extremely fast. After that, YoLov2 [<xref ref-type="bibr" rid="scirp.108088-ref25">25</xref>] further improves YoLo by using more anchor boxes and a new bounding box regression method. DSSD [<xref ref-type="bibr" rid="scirp.108088-ref6">6</xref>] and RON [<xref ref-type="bibr" rid="scirp.108088-ref2">2</xref>] adopt networks similar to the Hourglass Network [<xref ref-type="bibr" rid="scirp.108088-ref26">26</xref>], enabling them to combine low-level and high-level features via skip connections to predict bounding boxes more accurately. 
RefineDet [<xref ref-type="bibr" rid="scirp.108088-ref27">27</xref>] refines the locations and sizes of the anchor boxes twice, exploiting the merits of both one-stage and two-stage approaches. CornerNet [<xref ref-type="bibr" rid="scirp.108088-ref28">28</xref>] and CenterNet [<xref ref-type="bibr" rid="scirp.108088-ref29">29</xref>] are keypoint-based approaches that detect an object directly from keypoints such as a pair of corners. Although these methods achieve high performance, they still have room for improvement.</p></sec><sec id="s2_3"><title>2.3. Related Works of OBIs Detection</title><p>Up to now, there are only a few methods for the OBIs detection task in the field of image processing. Meng [<xref ref-type="bibr" rid="scirp.108088-ref1">1</xref>] built a single-stage OBIs detector by extending SSD300 to SSD1024. Wang [<xref ref-type="bibr" rid="scirp.108088-ref2">2</xref>] introduced a region-based fully convolutional network and proposed a novel auxiliary detection algorithm based on character recognition, which helps the detection model reduce false positives caused by cracks. In our earlier works [<xref ref-type="bibr" rid="scirp.108088-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.108088-ref4">4</xref>], we also carried out some simple explorations of OBIs detection. We applied several state-of-the-art object detection models to the OBIs dataset and compared and analyzed their detection results. Later, based on the statistical characteristics of character sizes, we redesigned the sizes and aspect ratios of the anchors and proposed the Spatial Block to stabilize the features and alleviate noise interference during training.</p><p>However, most of these methods are only simple explorations that migrate slightly modified classic object detection models to the OBIs dataset. Thus, they still have certain limitations in accuracy and efficiency. 
As mentioned above, most algorithms are based on anchor boxes, which involve complex network design and require a large number of anchor boxes. In addition, some special characters (such as compound characters) may be mis-detected by the detection model. In this work, our main goal is to explore a simpler OBIs detector and improve the detection accuracy.</p></sec></sec><sec id="s3"><title>3. Methodology</title><sec id="s3_1"><title>3.1. The Pipeline of Our Character Detection Model</title><p>Our character detection model regards oracle bone characters as special key points and aims to predict complete and well-separated character regions. The overall data flow of the model is shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>. First, the rubbing input I<sub>O</sub> passes through a convolutional neural network to produce a feature map I<sub>F</sub> ∈ R<sup>H &#215; W &#215; C</sup> that incorporates multi-layer context information. The feature map I<sub>F</sub> is then mapped to n branches by the region prediction module, whose outputs are used to generate n region score maps S<sub>1</sub>, S<sub>2</sub>, ⋯, S<sub>n</sub>, where each S<sub>i</sub> is the character region score map at scale i: S<sub>1</sub> is the character region prediction at the minimal scale, and S<sub>n</sub> is the prediction at the maximal scale. Finally, based on these multi-scale Gaussian region predictions, the final character bounding boxes are obtained after a series of simple post-processing operations.</p></sec><sec id="s3_2"><title>3.2. Architecture of Detection Network</title><p>The OBIs detector uses the Hourglass Network [<xref ref-type="bibr" rid="scirp.108088-ref26">26</xref>] as its backbone. The Hourglass Network is a fully convolutional neural network with a cascade structure, composed of one or more Hourglass modules. 
The Hourglass module resembles a lightweight encoder-decoder network: it down-samples the input features through a series of convolution and max-pooling layers, and then restores the original resolution through a series of up-sampling and convolutional layers. To reduce the loss of detail caused by max-pooling, skip connections bring the details back to the up-sampled features. In this way, a single Hourglass module can capture global and local features in a unified structure. When multiple Hourglass modules are stacked, the network can reprocess features to obtain higher-level information.</p><p>In our character detector, we stack two Hourglass modules and make a few modifications to the overall Hourglass network. Specifically, before the features are input to the Hourglass module, we use a convolutional layer with stride 2 and a 3 &#215; 3 convolution to replace the 7 &#215; 7 convolution in the original network, which scales the input image to 1/2 of its size. Similarly, inside the Hourglass module, a 3 &#215; 3 convolution with stride 2 replaces the max-pooling of the original module to down-sample the input features. At the end of the Hourglass module, we add an up-sampling layer to restore the output to the original input resolution.</p></sec><sec id="s3_3"><title>3.3. Loss Functions</title><p>The overall loss function of the OBIs detection model is expressed as follows:</p><p><disp-formula><tex-math>L = \lambda L_{\mathrm{FullMap}} + (1 - \lambda) L_{\mathrm{ZoomMap}}</tex-math><label>(1)</label></disp-formula></p><p>where L<sub>FullMap</sub> and L<sub>ZoomMap</sub> denote the loss for the character region instances with complete shape and the loss for the multiple shrunken character region instances, respectively, and λ balances the weights of L<sub>FullMap</sub> and L<sub>ZoomMap</sub>.</p><p><disp-formula><tex-math>L_{\mathrm{FullMap}} = L_{\mathrm{Pix}}(S(p), S^{*}(p))</tex-math><label>(2)</label></disp-formula></p><p>where p represents the coordinate position of a pixel. 
S(p) represents the predicted score of the character region with complete shape, and S<sup>*</sup>(p) represents the corresponding ground-truth score.</p><p><disp-formula><tex-math>L_{\mathrm{Pix}}(T(p), T^{*}(p)) = \sum_{p} \left\| T(p) - T^{*}(p) \right\|_{2}^{2}</tex-math><label>(3)</label></disp-formula></p><p><disp-formula><tex-math>L_{\mathrm{ZoomMap}} = \sum_{i=1}^{N-1} L_{\mathrm{Pix}}(Z_{i}(p), Z_{i}^{*}(p))</tex-math><label>(4)</label></disp-formula></p><p>where N represents the number of scales, Z<sub>i</sub>(p) represents the predicted character region score at scale i, and Z<sub>i</sub><sup>*</sup>(p) represents the ground-truth score at scale i.</p><p>In addition to characters, rubbing images contain many disturbances that closely resemble character features, such as background noise and cracks. To enable the detection model to distinguish these patterns, Online Hard Example Mining (OHEM) [<xref ref-type="bibr" rid="scirp.108088-ref30">30</xref>] is applied to enforce a 1:3 ratio of positive to negative pixels in the detection loss L<sub>FullMap</sub>.</p></sec><sec id="s3_4"><title>3.4. Ground Truth Label Generation</title><p>For each training image, we generate the ground-truth labels of the complete-shape region score and of the n shrunken region scores from the character-level bounding boxes provided by the OBIs dataset, as shown in <xref ref-type="fig" rid="fig5">Figure 5</xref>. The detailed steps are as follows: 1) Given the character-level bounding boxes provided by the OBIs dataset and following the shrinking principle in [<xref ref-type="bibr" rid="scirp.108088-ref8">8</xref>], set up n shrinking pixel spacings D = {d<sub>1</sub>, d<sub>2</sub>, ⋯, d<sub>n</sub>}. 2) Based on the shrinking spacings D, shrink the original bounding boxes inward to obtain n bounding box sets of different scales. 3) Prepare a 2D isotropic Gaussian kernel. 4) Calculate the perspective transformation matrix M between the Gaussian kernel and each character box. 5) Using the perspective transformation matrix M, warp the Gaussian map to the box area.</p></sec><sec id="s3_5"><title>3.5. 
Inference</title><p>During inference, the detection model outputs n character region maps of different scales. In this section, we briefly describe how accurate character-level bounding boxes are predicted from these region score maps.</p><p>The key component of the post-processing pipeline is the scale expansion algorithm from [<xref ref-type="bibr" rid="scirp.108088-ref31">31</xref>], which adopts a progressive expansion strategy to detect dense scene text. It uses the adjacency between Gaussian heatmaps of different scales to expand gradually from the text region of the minimal kernel to the maximal kernel with complete shape. On this basis, we add some extra steps and a few modifications to suit our character detection task. First, we pre-process the raw multi-scale Gaussian map predictions and reduce their noise through morphological operations (opening, distanceTransform). Second, for the separated character regions K obtained by the scale expansion algorithm, we compute their connected components C and assign them different labels, Label. Finally, based on the assigned Label, the minimum enclosing rectangle of each connected component is calculated to obtain the final bounding box. OpenCV functions such as connectedComponents, morphologyEx, and minAreaRect can be used for this purpose. The details are shown in Algorithm 1.</p><p>Algorithm 1. Post-processing pipeline of detection model.</p></sec></sec><sec id="s4"><title>4. Experiments</title><sec id="s4_1"><title>4.1. Oracle Bone Inscriptions Dataset</title><p>In this paper, all experiments are based on the OBIs dataset provided by the Key Laboratory of the Ministry of Education for Oracle Information Processing, Anyang Normal University. 
The dataset targets the task of OBIs detection and consists of two parts: up to 9500 oracle bone rubbing images, collected from the OBIs literature with a high-resolution scanner, and manually annotated character-level bounding boxes. Different from general natural scene images, the rubbing images have the following characteristics:</p><p>High noise: Oracle bones, the main carrier of OBIs, were buried in the ruins of Anyang for a long time and were not discovered until 120 years ago. Therefore, the rubbings inevitably show a certain degradation in appearance, most notably a large amount of noise. This noise follows different patterns and is densely distributed over the rubbing image, which poses great challenges to the task of OBIs detection.</p><p>Cracks: Due to the burial environment and private excavations, many of the unearthed oracle bones have been broken, and various cracks appear on the surface of the rubbings. These cracks are very similar to characters in texture and are easily mistaken for oracle bone characters.</p><p>Distribution: The characters on the same rubbing image vary in size and direction and are randomly distributed. In addition, the 56,743 oracle bone rubbings contain 1425 different characters, of which 366 are common, 500 are not usually used, and 559 are rare.</p><p>There are up to 9500 oracle rubbing records in the OBIs dataset. In this experiment, the training set, validation set, and test set contain 8287, 436, and 411 data records, respectively.</p></sec><sec id="s4_2"><title>4.2. Experimental Environment</title><p>In this experiment, all models are implemented with the PyTorch deep learning framework and trained on four NVIDIA TITAN X GPUs. In particular, because the OBIs dataset lacks character category information, a class-agnostic strategy is adopted. 
By default, all characters are treated as a single category and assigned the same category label. During training, the rubbing image is scaled to a resolution of 512 &#215; 512, and the Adam optimizer is used to update the parameters. We start Adam with a learning rate of 0.0001 and empirically use a momentum of 0.9 and a weight decay of 0.0001.</p></sec><sec id="s4_3"><title>4.3. Evaluation Indicators</title><p>We evaluate the overall performance of the character detection model in terms of efficiency and accuracy. Three indicators, namely the number of network weight parameters, the number of floating-point operations, and the inference speed, are used to evaluate detection efficiency. Precision (P), Recall (R), and F-Measure (F), which are commonly used in mainstream object detection methods, are used to measure detection accuracy. These indicators are calculated as follows:</p><p><disp-formula><tex-math>P = \frac{TP}{TP + FP}</tex-math><label>(5)</label></disp-formula></p><p><disp-formula><tex-math>R = \frac{TP}{TP + FN}</tex-math><label>(6)</label></disp-formula></p><p><disp-formula><tex-math>F = \frac{2 \times P \times R}{P + R}</tex-math><label>(7)</label></disp-formula></p><p>where TP, FP and FN represent True Positive, False Positive, and False Negative, respectively.</p></sec><sec id="s4_4"><title>4.4. Ablation Experiments</title><p>The validity of Gaussian kernel representation: Besides Gaussian kernels, a binary mask is another option for representing character regions. To compare the two representations, we compare our character detection model (using only a single-scale Gaussian kernel) with the state-of-the-art semantic segmentation model DeepLabv3 [<xref ref-type="bibr" rid="scirp.108088-ref32">32</xref>]. Specifically, we divide the rubbing image into foreground and background regions according to whether each pixel lies inside a character-level box annotation provided by the OBIs dataset, and then use the trained segmentation model to directly predict the foreground character regions. 
The outputs of these models are visualized in <xref ref-type="fig" rid="fig6">Figure 6</xref>. The binary mask represents the character regions with undifferentiated discrete values, and the resulting predictions show more regional overlap. In contrast, the Gaussian kernel encodes the character region according to the distance to the center pixel, and the resulting character regions have clearer boundaries.</p><p>After obtaining these binary and Gaussian region predictions, we apply some simple post-processing operations (including connectedComponents and minAreaRect) to get the character bounding boxes and then calculate their P, R, and F indicators. The quantitative results are shown in <xref ref-type="table" rid="table1">Table 1</xref>. The Gaussian kernel representation scores higher than the binary mask representation on all three indicators, most notably in precision (0.776 versus 0.626). This again suggests that the Gaussian kernel representation is better suited to expressing tightly distributed character regions.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Quantitative results of the binary mask and Gaussian kernel representations</title></caption><table><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >Precision (P)</th><th align="center" valign="middle" >Recall (R)</th><th align="center" valign="middle" >F-Measure (F)</th></tr></thead><tbody><tr><td align="center" valign="middle" >DeepLabv3 [<xref ref-type="bibr" rid="scirp.108088-ref32">32</xref>]</td><td align="center" valign="middle" >0.626</td><td align="center" valign="middle" >0.638</td><td align="center" valign="middle" >0.632</td></tr><tr><td align="center" valign="middle" >Gaussian (ours)</td><td align="center" valign="middle" >0.776</td><td align="center" valign="middle" >0.646</td><td align="center" valign="middle" 
>0.705</td></tr></tbody></table></table-wrap><p>Is the multi-scale Gaussian kernel necessary? To answer this question, we retrain the detection model with different numbers of scales. The results are shown in <xref ref-type="fig" rid="fig7">Figure 7</xref>: as n grows, the F-measure keeps rising until n = 6 and begins to decline when n &gt; 6. This suggests that more scales are not always better. The detection model achieves the highest F-measure at n = 6, so six scales are the most beneficial setting for the task of OBIs detection. Moreover, although the F-measure declines somewhat for larger n, it remains significantly higher for n &gt; 1 than with a single-scale Gaussian kernel. This shows, to some extent, that the multi-scale kernel design is essential and effective.</p></sec><sec id="s4_5"><title>4.5. Accuracy Comparison</title><p>To better evaluate the detection performance of our character detection model, we compare it with several mainstream object detection models, including two-stage object detectors such as Faster RCNN [<xref ref-type="bibr" rid="scirp.108088-ref5">5</xref>] and single-stage object detectors such as YoLov3 [<xref ref-type="bibr" rid="scirp.108088-ref35">35</xref>] and RBFNet [<xref ref-type="bibr" rid="scirp.108088-ref34">34</xref>].</p><p><xref ref-type="table" rid="table2">Table 2</xref> shows the quantitative comparison with these state-of-the-art detection models. In terms of precision, our detector achieves the highest score of 89.7%, about 12 percentage points above the second-best model (YoLov3, 77.6%). However, in terms of recall, our model performs relatively weakly, ranking near the bottom of all the models. 
We believe a possible reason is that, in detection methods based on anchor boxes, the non-maximum suppression (NMS) operation relies on a manually set threshold to filter out invalid candidate boxes; some candidate boxes may escape this filtering, which results in a higher recall rate. To evaluate the detection performance more fairly, we also compare the F-measure, which balances precision and recall. Here our model again achieves the best result, 0.832, about 5 percentage points above the second-place YOLOv3 (0.780). This reflects, to some degree, the accuracy advantage of our model. A plausible explanation is that, by directly using Gaussian kernels to represent the character regions, our model captures more semantic information about the characters and gains character-region awareness, and therefore produces more accurate detections.</p></sec><sec id="s4_6"><title>4.6. Efficiency Comparison</title><p>We evaluate the detection efficiency of our character detector by measuring its inference speed, number of weight parameters, and floating-point operations, and then compare these with several state-of-the-art detectors.</p><p><xref ref-type="table" rid="table3">Table 3</xref> shows the efficiency comparison with these models. Our model achieves the fastest inference speed, 23 FPS, which is 6 FPS higher than the second-place YOLOv3 [<xref ref-type="bibr" rid="scirp.108088-ref35">35</xref>] (17 FPS). It also requires the fewest parameters, only 12.73 M, which is much lower than the 26.29 M of the next-best model, SSD [<xref ref-type="bibr" rid="scirp.108088-ref19">19</xref>]. In terms of floating-point operations, our model ranks second, behind only YOLOv3 [<xref ref-type="bibr" rid="scirp.108088-ref35">35</xref>] (50.06 GMac). Nevertheless, its 57.34 GMac is far lower than that of the remaining state-of-the-art detection models. 
Taken together, these results show that our model achieves faster inference with a lighter computational burden.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Quantitative accuracy comparison with state-of-the-art detection models</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >Precision (P)</th><th align="center" valign="middle" >Recall (R)</th><th align="center" valign="middle" >F-Measure (F)</th></tr></thead><tr><td align="center" valign="middle" >Faster RCNN [<xref ref-type="bibr" rid="scirp.108088-ref5">5</xref>]</td><td align="center" valign="middle" >0.754</td><td align="center" valign="middle" >0.778</td><td align="center" valign="middle" >0.766</td></tr><tr><td align="center" valign="middle" >SSD [<xref ref-type="bibr" rid="scirp.108088-ref19">19</xref>]</td><td align="center" valign="middle" >0.748</td><td align="center" valign="middle" >0.758</td><td align="center" valign="middle" >0.753</td></tr><tr><td align="center" valign="middle" >RefineDet [<xref ref-type="bibr" rid="scirp.108088-ref33">33</xref>]</td><td align="center" valign="middle" >0.752</td><td align="center" valign="middle" >0.805</td><td align="center" valign="middle" >0.778</td></tr><tr><td align="center" valign="middle" >RBFNet [<xref ref-type="bibr" rid="scirp.108088-ref34">34</xref>]</td><td align="center" valign="middle" >0.761</td><td align="center" valign="middle" >0.789</td><td align="center" valign="middle" >0.775</td></tr><tr><td align="center" valign="middle" >YOLOv3 [<xref ref-type="bibr" rid="scirp.108088-ref35">35</xref>]</td><td align="center" valign="middle" >0.776</td><td align="center" valign="middle" >0.784</td><td align="center" valign="middle" >0.780</td></tr><tr><td align="center" valign="middle" >Ours</td><td align="center" valign="middle" >0.897</td><td align="center" valign="middle" >0.775</td><td align="center" valign="middle" 
>0.832</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Comparison of detection efficiency with state-of-the-art detection models</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Methods</th><th align="center" valign="middle" >Speed (FPS)</th><th align="center" valign="middle" >Parameters (M)</th><th align="center" valign="middle" >FLOPs (GMac)</th></tr></thead><tr><td align="center" valign="middle" >Faster RCNN [<xref ref-type="bibr" rid="scirp.108088-ref5">5</xref>]</td><td align="center" valign="middle" >3</td><td align="center" valign="middle" >41.37</td><td align="center" valign="middle" >129.27</td></tr><tr><td align="center" valign="middle" >SSD [<xref ref-type="bibr" rid="scirp.108088-ref19">19</xref>]</td><td align="center" valign="middle" >9</td><td align="center" valign="middle" >26.29</td><td align="center" valign="middle" >90.4</td></tr><tr><td align="center" valign="middle" >RefineDet [<xref ref-type="bibr" rid="scirp.108088-ref33">33</xref>]</td><td align="center" valign="middle" >14</td><td align="center" valign="middle" >34.44</td><td align="center" valign="middle" >97.94</td></tr><tr><td align="center" valign="middle" >RBFNet [<xref ref-type="bibr" rid="scirp.108088-ref34">34</xref>]</td><td align="center" valign="middle" >15</td><td align="center" valign="middle" >36.64</td><td align="center" valign="middle" >103.65</td></tr><tr><td align="center" valign="middle" >YOLOv3 [<xref ref-type="bibr" rid="scirp.108088-ref35">35</xref>]</td><td align="center" valign="middle" >17</td><td align="center" valign="middle" >61.92</td><td align="center" valign="middle" >50.06</td></tr><tr><td align="center" valign="middle" >Ours</td><td align="center" valign="middle" >23</td><td align="center" valign="middle" >12.73</td><td align="center" valign="middle" >57.34</td></tr></tbody></table></table-wrap></sec></sec><sec id="s5"><title>5. 
Conclusion</title><p>In this paper, we propose an anchor-free detector for oracle bone inscriptions (OBIs). The detector uses adaptively shaped Gaussian kernels to represent the spatial regions of characters, which not only removes the need for anchor boxes but also enables the detection model to learn character spatial regions. Furthermore, to address misdetections caused by regional overlap between tightly distributed characters, each character region is simultaneously represented by multi-scale Gaussian kernels, which yields character regions with sharp edges. Finally, based on these kernel predictions at different scales, a novel post-processing pipeline produces accurate bounding box predictions. The experimental results show that our detector achieves the highest precision (0.897) and F-measure (0.832) among the compared models on the OBIs dataset.</p></sec><sec id="s6"><title>Funding</title><p>This work is supported by the joint fund of the National Natural Science Foundation of China (NSFC) and Henan Province of China under Grant U1804153, and partly supported by the Scientific and Technological Research Projects in Henan Province under Grants 212102310545 and 212102210502 and the Anyang Science and Technology Plan Project under Grant 2021C01GX020.</p></sec><sec id="s7"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s8"><title>Cite this paper</title><p>Liu, G.Y., Chen, S.H., Xiong, J. and Jiao, Q.J. (2021) An Oracle Bone Inscription Detector Based on Multi-Scale Gaussian Kernels. Applied Mathematics, 12, 224-239. https://doi.org/10.4236/am.2021.123014</p></sec></body><back><ref-list><title>References</title><ref id="scirp.108088-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Meng, L. (2017) Two-Stage Recognition for Oracle Bone Inscriptions. Lecture Notes in Computer Science, 10485, 672-682. 
https://doi.org/10.1007/978-3-319-68548-9_61</mixed-citation></ref><ref id="scirp.108088-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Hao, W. (2019) Research on Oracle Detection and Recognition Based on Deep Learning. South China University of Technology, Guangzhou.</mixed-citation></ref><ref id="scirp.108088-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Xing, J., Liu, G. and Xiong, J. (2019) Oracle Bone Inscription Detection: A Survey of Oracle Bone Inscription Detection Based on Deep Learning Algorithm. Proceedings of the International Conference on Artificial Intelligence, Information Processing and Cloud Computing, Sanya, December 2019, Article No. 39. https://doi.org/10.1145/3371425.3371434</mixed-citation></ref><ref id="scirp.108088-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Liu, G., Xing, J. and Xiong, J. (2020) Spatial Pyramid Block for Oracle Bone Inscription Detection. ICSCA 2020: Proceedings of the 2020 9th International Conference on Software and Computer Applications, February 2020, 133-140. https://doi.org/10.1145/3384544.3384561</mixed-citation></ref><ref id="scirp.108088-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Ren, S., He, K. and Girshick, R. (2016) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031</mixed-citation></ref><ref id="scirp.108088-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Fu, C.Y., Liu, W., Ranga, A., Tyagi, A. and Berg, A.C. (2017) DSSD: Deconvolutional Single Shot Detector.</mixed-citation></ref><ref id="scirp.108088-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Lin, T.Y., Goyal, P., Girshick, R., He, K. and Dollar, P. 
(2020) Focal Loss for Dense Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 318-327. https://doi.org/10.1109/TPAMI.2018.2858826</mixed-citation></ref><ref id="scirp.108088-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Baek, Y., Lee, B., Han, D., Yun, S. and Lee, H. (2019) Character Region Awareness for Text Detection. IEEE/CVF Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, 15-20 June 2019, 9357-9366. https://doi.org/10.1109/CVPR.2019.00959</mixed-citation></ref><ref id="scirp.108088-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Epshtein, B., Ofek, E. and Wexler, Y. (2010) Detecting Text in Natural Scenes with Stroke Width Transform. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, 13-18 June 2010, 2963-2970. https://doi.org/10.1109/CVPR.2010.5540041</mixed-citation></ref><ref id="scirp.108088-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Huang, W., Lin, Z., Yang, J. and Wang, J. (2013) Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors. IEEE International Conference on Computer Vision, Sydney, 1-8 December 2013, 1241-1248. https://doi.org/10.1109/ICCV.2013.157</mixed-citation></ref><ref id="scirp.108088-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Papageorgiou, C.P., Oren, M. and Poggio, T. (1998) A General Framework for Object Detection. Sixth International Conference on Computer Vision, Bombay, 7 January 1998, 555-562. https://doi.org/10.1109/ICCV.1998.710772</mixed-citation></ref><ref id="scirp.108088-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Schapire, R.E. (2013) Explaining AdaBoost: Empirical Inference. 
Springer, Berlin, Heidelberg.</mixed-citation></ref><ref id="scirp.108088-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Felzenszwalb, P.F., Girshick, R.B., McAllester, D. and Ramanan, D. (2010) Object Detection with Discriminatively Trained Part Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1627-1645. https://doi.org/10.1109/TPAMI.2009.167</mixed-citation></ref><ref id="scirp.108088-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Everingham, M., Eslami, S.A., Van Gool, L., Williams, C.K., Winn, J. and Zisserman, A. (2015) The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, 111, 98-136. https://doi.org/10.1007/s11263-014-0733-5</mixed-citation></ref><ref id="scirp.108088-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Lee, J.J., Lee, P.H., Lee, S.W., Yuille, A. and Koch, C. (2011) AdaBoost for Text Detection in Natural Scene. IEEE International Conference on Document Analysis and Recognition, Beijing, 18-21 September 2011, 429-434. https://doi.org/10.1109/ICDAR.2011.93</mixed-citation></ref><ref id="scirp.108088-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Wang, K., Babenko, B. and Belongie, S. (2011) End-to-End Scene Text Recognition. IEEE International Conference on Computer Vision, Barcelona, 1457-1464.</mixed-citation></ref><ref id="scirp.108088-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Wang, T., Wu, D.J., Coates, A. and Ng, A.Y. (2012) End-to-End Text Recognition with Convolutional Neural Networks. Proceedings of the 21st International Conference on Pattern Recognition, Tsukuba, Japan, 11-15 November 2012, 3304-3308.</mixed-citation></ref><ref id="scirp.108088-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Li, Y., He, K., Sun, J., et al. 
(2016) R-FCN: Object Detection via Region-Based Fully Convolutional Networks. Proceedings of the 30th International Conference on Neural Information Processing, Morehouse Lane, Red Hook, December 2016, 379-387.</mixed-citation></ref><ref id="scirp.108088-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Wang, Y., Xie, H., Zha, Z.-J., Xing, M., Fu, Z. and Zhang, Y. (2020) ContourNet: Taking a Further Step toward Accurate Arbitrary-Shaped Scene Text Detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, 13-19 June 2020, 11753-11762. https://doi.org/10.1109/CVPR42600.2020.01177</mixed-citation></ref><ref id="scirp.108088-ref20"><label>20</label><mixed-citation publication-type="book" xlink:type="simple">Liu, W., Fu, C.H., Reed, S., et al. (2016) SSD: Single Shot Multi-Box Detector. In: Leibe, B., Matas, J., Sebe, N. and Welling, M., Eds., Computer Vision – ECCV 2016, Springer, Cham, 21-37. https://doi.org/10.1007/978-3-319-46448-0_2</mixed-citation></ref><ref id="scirp.108088-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">He, K., Zhang, X., Ren, S. and Sun, J. (2016) Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 1904-1916. https://doi.org/10.1109/TPAMI.2015.2389824</mixed-citation></ref><ref id="scirp.108088-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Girshick, R. (2015) Fast R-CNN. 2015 IEEE International Conference on Computer Vision, Santiago, 7-13 December 2015, 1440-1448. https://doi.org/10.1109/ICCV.2015.169</mixed-citation></ref><ref id="scirp.108088-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">He, K., Gkioxari, G., Dollár, P. and Girshick, R. (2020) Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 386-397. 
https://doi.org/10.1109/TPAMI.2018.2844175</mixed-citation></ref><ref id="scirp.108088-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Redmon, J., Divvala, S., Girshick, R. and Farhadi, A. (2016) You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 779-788. https://doi.org/10.1109/CVPR.2016.91</mixed-citation></ref><ref id="scirp.108088-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Redmon, J. and Farhadi, A. (2017) YOLO9000: Better, Faster, Stronger. 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 21-26 July 2017, 6517-6525. https://doi.org/10.1109/CVPR.2017.690</mixed-citation></ref><ref id="scirp.108088-ref26"><label>26</label><mixed-citation publication-type="book" xlink:type="simple">Newell, A., Yang, K. and Deng, J. (2016) Stacked Hourglass Networks for Human Pose Estimation. In: Leibe, B., Matas, J., Sebe, N. and Welling, M., Eds., Computer Vision—ECCV 2016, Springer, Cham, 483-499. https://doi.org/10.1007/978-3-319-46484-8_29</mixed-citation></ref><ref id="scirp.108088-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Lin, G., Milan, A., Shen, C. and Reid, I. (2017) RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 1228-1242. https://doi.org/10.1109/CVPR.2017.549</mixed-citation></ref><ref id="scirp.108088-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Law, H. and Deng, J. (2020) CornerNet: Detecting Objects as Paired Keypoints. International Journal of Computer Vision, 128, 642-656. 
https://doi.org/10.1007/s11263-019-01204-1</mixed-citation></ref><ref id="scirp.108088-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q. and Tian, Q. (2019) CenterNet: Keypoint Triplets for Object Detection. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 27 October-2 November 2019, 6569-6578. https://doi.org/10.1109/ICCV.2019.00667</mixed-citation></ref><ref id="scirp.108088-ref30"><label>30</label><mixed-citation publication-type="other" xlink:type="simple">Shrivastava, A., Gupta, A. and Girshick, R.B. (2016) Training Region-Based Object Detectors with Online Hard Example Mining. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 761-769. https://doi.org/10.1109/CVPR.2016.89</mixed-citation></ref><ref id="scirp.108088-ref31"><label>31</label><mixed-citation publication-type="other" xlink:type="simple">Wang, W., Li, X. and Liu, T. (2019) Shape Robust Text Detection with Progressive Scale Expansion Network. IEEE/CVF Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, 15-20 June 2019, 9336-9345. https://doi.org/10.1109/CVPR.2019.00956</mixed-citation></ref><ref id="scirp.108088-ref32"><label>32</label><mixed-citation publication-type="other" xlink:type="simple">Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K. and Yuille, A.L. (2018) DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834-848. https://doi.org/10.1109/TPAMI.2017.2699184</mixed-citation></ref><ref id="scirp.108088-ref33"><label>33</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, S., Wen, L., Bian, X., Lei, Z. and Li, S.Z. (2018) Single-Shot Refinement Neural Network for Object Detection. 
IEEE Transactions on Circuits and Systems for Video Technology, 31, 674-687. https://doi.org/10.1109/TCSVT.2020.2986402</mixed-citation></ref><ref id="scirp.108088-ref34"><label>34</label><mixed-citation publication-type="book" xlink:type="simple">Liu, S., Huang, D. and Wang, Y. (2018) Receptive Field Block Net for Accurate and Fast Object Detection. In: Ferrari, V., Hebert, M., Sminchisescu, C. and Weiss, Y., Eds., Computer Vision – ECCV 2018, Springer, Cham, 404-419. https://doi.org/10.1007/978-3-030-01252-6_24</mixed-citation></ref><ref id="scirp.108088-ref35"><label>35</label><mixed-citation publication-type="other" xlink:type="simple">Redmon, J. and Farhadi, A. (2018) YOLOv3: An Incremental Improvement.</mixed-citation></ref></ref-list></back></article>