<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JSIP</journal-id><journal-title-group><journal-title>Journal of Signal and Information Processing</journal-title></journal-title-group><issn pub-type="epub">2159-4465</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jsip.2019.104010</article-id><article-id pub-id-type="publisher-id">JSIP-96711</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Deep Learning Based Target Tracking and Classification for Infrared Videos Using Compressive Measurements
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Kwan</surname><given-names>Chiman</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Chou</surname><given-names>Bryan</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yang</surname><given-names>Jonathan</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Tran</surname><given-names>Trac</given-names></name><xref ref-type="aff" rid="aff3"><sup>3</sup></xref></contrib></contrib-group><aff id="aff2"><addr-line>Google, Inc., Mountain View, CA, USA</addr-line></aff><aff id="aff3"><addr-line>Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA</addr-line></aff><aff id="aff1"><addr-line>Applied Research LLC, Rockville, MD, USA</addr-line></aff><pub-date pub-type="epub"><day>13</day><month>11</month><year>2019</year></pub-date><volume>10</volume><issue>04</issue><fpage>167</fpage><lpage>199</lpage><history><date date-type="received"><day>10</day><month>October</month><year>2019</year></date><date date-type="rev-recd"><day>26</day><month>November</month><year>2019</year></date><date date-type="accepted"><day>29</day><month>November</month><year>2019</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). 
http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Although compressive measurements save data storage and bandwidth, they are difficult to use directly for target tracking and classification without pixel reconstruction. This is because the Gaussian random matrix destroys the target location information in the original video frames. This paper summarizes our research effort on target tracking and classification directly in the compressive measurement domain. We focus on one particular type of compressive measurement that uses pixel subsampling; that is, the original pixels in video frames are randomly subsampled. Even in such a special compressive sensing setting, conventional trackers do not perform satisfactorily. We propose a deep learning approach that integrates YOLO (You Only Look Once) and ResNet (residual network) for multiple target tracking and classification: YOLO is used for multiple target tracking and ResNet for target classification. Extensive experiments using short-wave infrared (SWIR), mid-wave infrared (MWIR), and long-wave infrared (LWIR) videos demonstrated the efficacy of the proposed approach even though the training data are very scarce.
 
</p></abstract><kwd-group><kwd>Target Tracking</kwd><kwd>Classification</kwd><kwd>Compressive Sensing</kwd><kwd>SWIR</kwd><kwd>MWIR</kwd><kwd>LWIR</kwd><kwd>YOLO</kwd><kwd>ResNet</kwd><kwd>Infrared Videos</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Many applications, such as traffic monitoring, surveillance, and security monitoring, use optical and infrared videos [<xref ref-type="bibr" rid="scirp.96711-ref1">1</xref>] - [<xref ref-type="bibr" rid="scirp.96711-ref6">6</xref>]. Object features can be seen clearly in optical and infrared videos, in contrast to radar based trackers [<xref ref-type="bibr" rid="scirp.96711-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.96711-ref8">8</xref>].</p><p>Compressive measurements [<xref ref-type="bibr" rid="scirp.96711-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.96711-ref10">10</xref>] are normally collected by multiplying the original vectorized image with a Gaussian random matrix. Each measurement is a scalar value, and the measurement is repeated M times, where M is much smaller than N (the number of pixels). Tracking a target using compressive measurements is normally done by first reconstructing the image scene and then applying conventional trackers. This conventional approach has two drawbacks. First, the reconstruction process using L<sub>0</sub> [<xref ref-type="bibr" rid="scirp.96711-ref11">11</xref>] or L<sub>1</sub> [<xref ref-type="bibr" rid="scirp.96711-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.96711-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.96711-ref14">14</xref>] based methods is time consuming, which makes real-time tracking and classification impossible. 
Second, there may be information loss in the reconstruction process [<xref ref-type="bibr" rid="scirp.96711-ref15">15</xref>].</p><p>In the literature, some trackers such as [<xref ref-type="bibr" rid="scirp.96711-ref23">23</xref>] use the term compressive tracking. However, those trackers do not use compressive measurements directly. There are several advantages if one can perform target tracking and classification directly using compressive measurements. First, because reconstruction of video frames from compressive measurements using Orthogonal Matching Pursuit (OMP) or Augmented Lagrangian Method with L1 (ALM-L1) is time consuming, direct tracking and classification in the compressive measurement domain will enable near real-time processing. Second, it is well known that reconstruction tends to lose information [<xref ref-type="bibr" rid="scirp.96711-ref15">15</xref>]. Working directly with compressive measurements will generate more accurate tracking and classification results [<xref ref-type="bibr" rid="scirp.96711-ref15">15</xref>] - [<xref ref-type="bibr" rid="scirp.96711-ref22">22</xref>].</p><p>Recently, we developed a residual network (ResNet) [<xref ref-type="bibr" rid="scirp.96711-ref24">24</xref>] based tracking and classification framework using compressive measurements [<xref ref-type="bibr" rid="scirp.96711-ref10">10</xref>]. The compressive measurements are obtained by pixel subsampling, which can be considered a special case of compressive sensing. ResNet was used in both target detection and classification, and the tracking is done by detection. Although the performance in [<xref ref-type="bibr" rid="scirp.96711-ref10">10</xref>] is much better than that of conventional trackers, there is still room for further improvement. The key area for improvement is the tracking part, which has a significant impact on the classification performance. 
That is, if the target area is not correctly located, the classification performance will degrade.</p><p>In this paper, we propose an alternative approach, which aims to improve the tracking performance. The idea is to deploy a high performance detector known as YOLO [<xref ref-type="bibr" rid="scirp.96711-ref25">25</xref>] for target tracking. YOLO is fast and accurate, and its performance is comparable to that of other detectors such as Faster R-CNN [<xref ref-type="bibr" rid="scirp.96711-ref26">26</xref>]. It should be noted that YOLO is an object detector, not an object tracker; tracking with YOLO is done by object detection. That is, we custom train YOLO to detect certain vehicles, and the detection results (target location information) from each frame are recorded and then tracked. This is known as tracking by detection. The detection results (bounding boxes of objects) are fed into a classifier. We use ResNet for classification because it classifies more accurately than the default classifier in YOLO.</p><p>A preliminary version of this paper, which focused only on SWIR videos, was presented at an SPIE conference [<xref ref-type="bibr" rid="scirp.96711-ref27">27</xref>]. Here, we have significantly expanded the earlier paper to include additional experiments using MWIR and LWIR videos. The experiments clearly demonstrated that the proposed approach is accurate and applicable to different types of infrared videos. Moreover, another contribution of this paper is that it is the first comprehensive study of vehicle tracking and classification for several types of infrared videos directly in the compressive measurement domain (subsampling).</p><p>This paper is organized as follows. Section 2 describes the idea of compressive sensing via subsampling, the YOLO detector, and ResNet. Section 3 presents the tracking and classification results directly in the compressive measurement domain using SWIR videos. 
Section 4 focuses on tracking and classification of vehicles in MWIR videos. Section 5 repeats the studies for LWIR videos. In all cases, a comparative study of YOLO and ResNet for classification is also presented. Finally, some concluding remarks and future research directions are included in Section 6.</p></sec><sec id="s2"><title>2. Background</title><sec id="s2_1"><title>2.1. Compressive Sensing via Subsampling</title><p>Using a Gaussian random matrix to generate compressive measurements makes target tracking very difficult. This is because the targets can be anywhere in a frame and the target location information is lost in the compressive measurements. To resolve this issue, we propose a new approach in which, instead of using a Gaussian random sensing matrix, we use a random subsampling operator (i.e., keeping only a certain percentage of pixels at random from the original data) to perform compressive sensing. This is similar to using a sensing matrix obtained by randomly zeroing out certain elements on the diagonal of an identity matrix. <xref ref-type="fig" rid="fig1">Figure 1</xref> displays two examples of random subsampling sensing matrices. <xref ref-type="fig" rid="fig1">Figure 1</xref>(a) shows a subsampling operator which randomly selects 50% of the pixels in a vectorized image. <xref ref-type="fig" rid="fig1">Figure 1</xref>(b) shows the equivalent case of randomly selecting 50% of the pixels in a 2-D image.</p></sec><sec id="s2_2"><title>2.2. YOLO</title><p>We used the so-called tracking by detection approach. In the target tracking literature, there are several ways to carry out tracking. Some trackers such as STAPLE [<xref ref-type="bibr" rid="scirp.96711-ref28">28</xref>] or GMM [<xref ref-type="bibr" rid="scirp.96711-ref29">29</xref>] require an operator to put a bounding box on a specific target, and the trackers then try to track this initial target in subsequent frames. 
The limitation of this type of tracker is that it can track only one target at a time; that is, it cannot track multiple targets simultaneously. Other trackers such as YOLO and Faster R-CNN do not require initial bounding boxes and can detect multiple objects simultaneously. We call this second type of tracker tracking by detection. That is, based on detection results, we determine the vehicle locations in all the frames.</p><p>The YOLO tracker [<xref ref-type="bibr" rid="scirp.96711-ref25">25</xref>] is fast and performs similarly to Faster R-CNN [<xref ref-type="bibr" rid="scirp.96711-ref26">26</xref>]. We picked YOLO because it is easy to install and is compatible with our hardware, on which we had difficulty installing and running Faster R-CNN. The input image is resized to 448 &#215; 448. There are 24 convolutional layers and 2 fully connected layers. The output is 7 &#215; 7 &#215; 30. We used YOLOv2 because it is more accurate than YOLO version 1. The training of YOLO is quite simple. Images with ground truth target locations are needed. The bounding box for each vehicle was manually determined using tools in MATLAB. For YOLO, the last layer of the deep learning model was re-trained. We did not change any of the activation functions. YOLO took approximately 2000 epochs to train.</p><p>YOLO also comes with a built-in classification module. However, based on our evaluations, the classification accuracy of YOLO is not good, as can be seen in Sections 3 - 5. This is perhaps due to a lack of training data.</p></sec><sec id="s2_3"><title>2.3. ResNet Classifier</title><p>The ResNet-18 model is an 18-layer convolutional neural network (CNN) that has the advantage of avoiding performance saturation and/or degradation when training deeper layers, which is a common problem among other CNN architectures. 
The ResNet-18 model avoids performance saturation by implementing an identity shortcut connection, which skips one or more layers and learns the residual mapping of the layer rather than the original mapping.</p><p>Training of ResNet requires target patches. The targets are cropped from training videos. Mirror images are then created. We then perform data augmentation using scaling (larger and smaller), rotation (every 45 degrees), and illumination (brighter and dimmer) to create more training data. For each cropped target, we are able to create a data set with 64 more images.</p></sec></sec><sec id="s3"><title>3. Tracking and Classification Results Using SWIR Videos</title><p>Our research objective is to perform tracking and classification of three trucks using the sponsor-provided SWIR videos. One video (Video 4) starts with the vehicles (Ram, Frontier, and Silverado) leaving a parking lot and moving on to a remote location. Another video (Video 5) is just the opposite. These videos are challenging for several reasons. First, the target sizes vary a lot from near field to far field. Second, the target orientations also change drastically from top view to side view. Third, the illumination differs from video to video. Here, the compressive measurements are collected via direct sub-sampling. That is, 50% or 75% of the pixels are thrown away during the data collection process.</p><p>In our earlier paper [<xref ref-type="bibr" rid="scirp.96711-ref10">10</xref>], we included some tracking results where conventional trackers such as GMM [<xref ref-type="bibr" rid="scirp.96711-ref29">29</xref>] and STAPLE [<xref ref-type="bibr" rid="scirp.96711-ref28">28</xref>] were used. The tracking performance was poor when there were missing data.</p><sec id="s3_1"><title>3.1. 
Tracking Results</title><p>We experimented with a YOLO tracker, which we found to track better than our earlier ResNet based tracker [<xref ref-type="bibr" rid="scirp.96711-ref10">10</xref>]. We used the following metrics for evaluating the tracker performance:</p><p>&#183; Center Location Error (CLE): It is the error (distance) between the center of the detected bounding box and the center of the ground-truth bounding box.</p><p>&#183; Distance Precision (DP): It is the percentage of frames where the centroids of detected bounding boxes are within 20 pixels of the centroids of ground-truth bounding boxes.</p><p>&#183; EinGT: It is the percentage of frames where the centroids of the detected bounding boxes are inside the ground-truth bounding boxes.</p><p>&#183; Number of frames with detection: This is the total number of frames that have a detection.</p><p>Conventional Tracker Results</p><p>We applied the GMM tracker to one of our videos. From the results shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>, it can be seen that the tracking results are not satisfactory even when there are no missing pixels. In some frames, the GMM tracker simply lost the targets.</p><p>STAPLE [<xref ref-type="bibr" rid="scirp.96711-ref28">28</xref>] is one of the high performing trackers of recent years. In this algorithm, histogram of oriented gradients (HOG) features are extracted from the most recent estimated target location and used to update the models of the tracker. Then a template response is calculated using the updated models and the features extracted from the next frame. To estimate the location of the target, the histogram response is needed along with the template response. The histogram response is calculated by updating the weights in the current frame. Then the per-pixel score is computed using the next frame. This score and the previously calculated weights are used to determine the integral image and, ultimately, the histogram response. 
Together, the template and histogram responses allow the tracker to estimate the location of the target.</p><p><xref ref-type="fig" rid="fig3">Figure 3</xref> shows good tracking results when there are no missing data. The green boxes show the target locations. However, when 50% of the pixels are missing, the tracking performance deteriorates significantly as shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>.</p><p>Tracking Results: Train using Video 4 and Test using Video 5</p><p>We have two SWIR videos from the AF. Here, we used Video 4 for training and Video 5 for testing. Tables 1-3 show the performance metrics for different missing pixel cases. Our first observation is that the number of frames with detection decreases when we have more missing pixels. This is reasonable. For those frames with detection, it can be seen that the CLE values increase when we have more missing pixels. This is also reasonable. The DP and EinGT values are all close to 100% if we have detection. Figures 5-7 show the detection/tracking results in some selected frames. It can be seen that there are more missed detections in the cases with high missing rates.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Tracking metrics for 0% missing case. 
Train using Video 4 and test using Video 5</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >4.96</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >2623/2678</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >4.52</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >2422/2678</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >4.81</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >2202/2678</td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Tracking metrics for 50% missing case. 
Train using Video 4 and test using Video 5</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >5.75</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >2532/2678</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >5.71</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >2371/2678</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >5.24</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1892/2678</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Tracking metrics for 75% missing case. 
Train using Video 4 and test using Video 5</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >6.3</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1897/2678</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >6.32</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1933/2678</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >5.28</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >831/2678</td></tr></tbody></table></table-wrap><p>Tracking Results: Train using Video 5 and Test using Video 4</p><p>Tables 4-6 show the metrics when we used Video 5 for training and Video 4 for testing. We can see that the numbers of frames with detection are high for low missing rates. For frames with detection, the CLE values generally increase with the missing rate, whereas the DP and EinGT values are relatively stable.</p><p>Figures 8-10 show the tracking results visually. It can be seen that we have some false detections in the parking lot area. However, when the targets are far away, the tracking appears to be good.</p></sec><sec id="s3_2"><title>3.2. Classification Results</title><p>To illustrate the difficulty of classifying the three trucks, we include pictures of them in <xref ref-type="fig" rid="fig11">Figure 11</xref>. It can be seen that all of them have four doors and open trunks. From a distance, it will be quite difficult to recognize them correctly.</p>
<table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Tracking metrics for 0% missing case. Train using Video 5 and test using Video 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >4.99</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >3282/3327</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >4.09</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >3339/3327</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >3.94</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >2012/3327</td></tr></tbody></table></table-wrap>
<table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 5</xref></label><caption><title> Tracking metrics for 50% missing case. Train using Video 5 and test using Video 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >5.57</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >3247/3327</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >4.2</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >3334/3327</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >4.19</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >2002/3327</td></tr></tbody></table></table-wrap>
<table-wrap id="table6" ><label><xref ref-type="table" rid="table6">Table 6</xref></label><caption><title> Tracking metrics for 75% missing case. Train using Video 5 and test using Video 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >7</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >3075/3327</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >4.62</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >3248/3327</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >4.89</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >1864/3327</td></tr></tbody></table></table-wrap>
<p>For vehicle classification, we deployed two approaches: YOLO and ResNet. YOLO comes with a default classifier. For the ResNet classifier, we performed customized training where the training data are augmented with rotation, scaling, and illumination variations.</p><p>Classification Results Using Video 4 for Training and Video 5 for Testing</p><p>Classification is only applied to frames in which the tracker detected targets. Tables 7-9 summarize the comparison between YOLO and ResNet classifiers for 0%, 50%, and 75% missing cases, respectively. We have two observations.</p><p>First, the YOLO classifier outputs are worse than those of the ResNet. 
Second, when missing rates increase, the classification accuracy drops.</p><p>Classification Results Using Video 5 for Training and Video 4 for Testing</p><p>As shown in Tables 10-12, the ResNet classifier performs much better than YOLO. Moreover, the classification results using ResNet are still quite good for the 75% missing case.</p><table-wrap-group id="7"><label><xref ref-type="table" rid="table7">Table 7</xref></label><caption><title> Classification results for 0% missing case. Video 4 for training and Video 5 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results. (b) ResNet classifier outputs. Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="7_1"><caption><title> (a)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >381</td><td align="center" valign="middle" >265</td><td align="center" valign="middle" >1953</td><td align="center" valign="middle" >0.1466</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >202</td><td align="center" valign="middle" >2196</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0.9158</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >2132</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >65</td><td align="center" valign="middle" >0.0296</td></tr></tbody></table></table-wrap><table-wrap id="7_2"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >2220</td><td align="center" valign="middle" >32</td><td align="center" valign="middle" >371</td><td align="center" valign="middle" >0.8464</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >161</td><td align="center" valign="middle" >2223</td><td align="center" valign="middle" >38</td><td align="center" valign="middle" >0.9178</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >987</td><td align="center" valign="middle" >41</td><td align="center" valign="middle" >1174</td><td align="center" valign="middle" >0.5332</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="8"><label><xref ref-type="table" rid="table8">Table 8</xref></label><caption><title> Classification results for 50% missing case. Video 4 for training and Video 5 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="8_1"><caption><title> (a)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >299</td><td align="center" valign="middle" >247</td><td align="center" valign="middle" >1986</td><td align="center" valign="middle" >0.1181</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >334</td><td align="center" valign="middle" >1998</td><td align="center" valign="middle" >16</td><td align="center" valign="middle" >0.8509</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1823</td><td align="center" valign="middle" >8</td><td align="center" valign="middle" >51</td><td align="center" valign="middle" >0.0271</td></tr></tbody></table></table-wrap><table-wrap id="8_2"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1703</td><td align="center" valign="middle" >56</td><td align="center" valign="middle" >773</td><td align="center" valign="middle" >0.6726</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >158</td><td align="center" valign="middle" >2021</td><td align="center" valign="middle" >192</td><td align="center" valign="middle" >0.8524</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >543</td><td align="center" valign="middle" >43</td><td align="center" valign="middle" >1306</td><td align="center" valign="middle" >0.6903</td></tr></tbody></table></table-wrap></table-wrap-group></sec><sec id="s3_3"><title>3.3. Discussions</title><p>We are particularly interested in the tracking and classification performance in the 75% missing data case, because only 25% of the pixels need to be stored and transmitted. At this missing rate, the numbers in <xref ref-type="table" rid="table13">Table 13</xref> show that the percentage of frames with detection, averaged over the three vehicles, is 58% when testing on Video 5 and 82% when testing on Video 4. From <xref ref-type="table" rid="table14">Table 14</xref>, the average ResNet classification accuracy is 60% when testing on Video 5 and 78% when testing on Video 4.</p><table-wrap-group id="9"><label><xref ref-type="table" rid="table9">Table 9</xref></label><caption><title> Classification results for 75% missing case. Video 4 for training and Video 5 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="9_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >183</td><td align="center" valign="middle" >234</td><td align="center" valign="middle" >1479</td><td align="center" valign="middle" >0.0965</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >459</td><td align="center" valign="middle" >1360</td><td align="center" valign="middle" >106</td><td align="center" valign="middle" >0.7065</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >710</td><td align="center" valign="middle" >91</td><td align="center" valign="middle" >28</td><td align="center" valign="middle" >0.0338</td></tr></tbody></table></table-wrap><table-wrap id="9_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1032</td><td align="center" valign="middle" >590</td><td align="center" valign="middle" >275</td><td align="center" valign="middle" >0.5440</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >132</td><td align="center" valign="middle" >1722</td><td align="center" valign="middle" >79</td><td align="center" valign="middle" >0.8908</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >331</td><td align="center" valign="middle" >190</td><td align="center" valign="middle" >310</td><td align="center" valign="middle" >0.3730</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="10"><label><xref ref-type="table" rid="table1">Table 1</xref>0</label><caption><title> Classification results for 0% missing case. Video 5 for training and Video 4 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="10_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >601</td><td align="center" valign="middle" >1480</td><td align="center" valign="middle" >1157</td><td align="center" valign="middle" >0.1856</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >83</td><td align="center" valign="middle" >3151</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0.9743</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1496</td><td align="center" valign="middle" >44</td><td align="center" valign="middle" >435</td><td align="center" valign="middle" >0.2203</td></tr></tbody></table></table-wrap><table-wrap id="10_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >2837</td><td align="center" valign="middle" >72</td><td align="center" valign="middle" >373</td><td align="center" valign="middle" >0.8644</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >670</td><td align="center" valign="middle" >2514</td><td align="center" valign="middle" >155</td><td align="center" valign="middle" >0.7529</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >415</td><td align="center" valign="middle" >18</td><td align="center" valign="middle" >1579</td><td align="center" valign="middle" >0.7848</td></tr></tbody></table></table-wrap></table-wrap-group></sec></sec><sec id="s4"><title>4. Tracking and Classification Results Using MWIR Videos</title><p>As with the SWIR videos, we also have two MWIR videos from our sponsor. Section 4.1 presents the tracking results of the conventional tracker and of our proposed approach. Section 4.2 presents the classification results.</p><sec id="s4_1"><title>4.1. Tracking Results</title><p>Conventional Tracking Results</p><p>Here, we only include the STAPLE results because the GMM tracker did not work</p><table-wrap-group id="11"><label><xref ref-type="table" rid="table11">Table 11</xref></label><caption><title> Classification results for 50% missing case. Video 5 for training and Video 4 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="11_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >596</td><td align="center" valign="middle" >1376</td><td align="center" valign="middle" >1221</td><td align="center" valign="middle" >0.1867</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >191</td><td align="center" valign="middle" >2998</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0.9401</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1484</td><td align="center" valign="middle" >30</td><td align="center" valign="middle" >464</td><td align="center" valign="middle" >0.2346</td></tr></tbody></table></table-wrap><table-wrap id="11_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >2062</td><td align="center" valign="middle" >227</td><td align="center" valign="middle" >958</td><td align="center" valign="middle" >0.6350</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >212</td><td align="center" valign="middle" >2989</td><td align="center" valign="middle" >133</td><td align="center" valign="middle" >0.8965</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >26</td><td align="center" valign="middle" >10</td><td align="center" valign="middle" >1966</td><td align="center" valign="middle" >0.9820</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="12"><label><xref ref-type="table" rid="table1">Table 1</xref>2</label><caption><title> Classification results for 75% missing case. Video 5 for training and Video 4 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="12_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >506</td><td align="center" valign="middle" >1222</td><td align="center" valign="middle" >1316</td><td align="center" valign="middle" >0.1662</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >334</td><td align="center" valign="middle" >2804</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0.8936</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1352</td><td align="center" valign="middle" >13</td><td align="center" valign="middle" >490</td><td align="center" valign="middle" >0.2642</td></tr></tbody></table></table-wrap><table-wrap id="12_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >2300</td><td align="center" valign="middle" >120</td><td align="center" valign="middle" >655</td><td align="center" valign="middle" >0.7480</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >740</td><td align="center" valign="middle" >2392</td><td align="center" valign="middle" >116</td><td align="center" valign="middle" >0.7365</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >250</td><td align="center" valign="middle" >6</td><td align="center" valign="middle" >1608</td><td align="center" valign="middle" >0.8627</td></tr></tbody></table></table-wrap></table-wrap-group><p>at all. STAPLE appears to work reasonably well for the 0% and 50% missing rate cases (<xref ref-type="fig" rid="fig12">Figure 12</xref> and <xref ref-type="fig" rid="fig13">Figure 13</xref>). When the missing rate increases to 75%, the STAPLE tracker fails completely, as shown in <xref ref-type="fig" rid="fig14">Figure 14</xref>. We also observed that STAPLE has difficulty tracking multiple vehicles simultaneously.</p><p>MWIR Results: Train using Video 4 and Test using Video 5</p><p>Here, we used Video 4 for training and Video 5 for testing. Tables 15-17</p><table-wrap-group id="13"><label><xref ref-type="table" rid="table13">Table 13</xref></label><caption><title> Tracking metrics for 75% missing case. 
(a) Train using Video 4 and test using Video 5; (b) Train using Video 5 and test using Video 4</title></caption><table-wrap id="13_1"><caption><title> (a)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1897/2678</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >1933/2678</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >831/2678</td></tr></tbody></table></table-wrap><table-wrap id="13_2"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >3075/3327</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >3248/3327</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1864/3327</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="14"><label><xref ref-type="table" rid="table14">Table 14</xref></label><caption><title> ResNet classification at 75% missing rate. 
(a) Train using Video 4 and test using Video 5; (b) Train using Video 5 and test using Video 4</title></caption><table-wrap id="14_1"><caption><title> (a)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Classification accuracy</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >0.5440</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >0.8908</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >0.3730</td></tr></tbody></table></table-wrap><table-wrap id="14_2"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Classification accuracy</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >0.7480</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >0.7365</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >0.8627</td></tr></tbody></table></table-wrap></table-wrap-group><p>show the performance metrics. Our first observation is that the number of frames with detection decreases as the missing rate increases. This is reasonable, because less pixel information is available for detection.</p><table-wrap id="table15" ><label><xref ref-type="table" rid="table15">Table 15</xref></label><caption><title> MWIR tracking metrics for 0% missing case. 
Train using Video 4 and test using Video 5</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >3.14</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >2568/2677</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >3.02</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >2671/2677</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >4.82</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.84</td><td align="center" valign="middle" >2461/2677</td></tr></tbody></table></table-wrap><table-wrap id="table16" ><label><xref ref-type="table" rid="table1">Table 1</xref>6</label><caption><title> MWIR tracking metrics for 50% missing case. 
Train using Video 4 and test using Video 5</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >4.88</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >2465/2677</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >4.69</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >2650/2677</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >4.7</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >2124/2677</td></tr></tbody></table></table-wrap><p>For frames with detection, the CLE values generally increase with the missing rate, which is also reasonable. The DP values are all 1 and the EinGT values range from 0.84 to 1 whenever a detection is available. Figures 15-18 show the tracking results in some selected frames. More detections are missed at high missing rates. The labels come from the YOLO tracker outputs and contain more errors when the missing rate is high.</p><table-wrap id="table17" ><label><xref ref-type="table" rid="table17">Table 17</xref></label><caption><title> MWIR tracking metrics for 75% missing case. 
Train using Video 4 and test using Video 5</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >6.29</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >1917/2677</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >5.8</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >1705/2677</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >7.13</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >1453/2677</td></tr></tbody></table></table-wrap><p>MWIR Results: Train using Video 5 and Test using Video 4</p><p>Tables 18-20 show the metrics when Video 5 is used for training and Video 4 for testing. The number of frames with detection is highest at low missing rates and decreases as the missing rate increases. For frames with detection, the CLE values generally increase with the missing rate, whereas the DP and EinGT values are relatively stable. Figures 18-20 show the tracking results visually. There are some false detections in the parking lot area. However, when the targets are far away, the tracking appears to</p><table-wrap id="table18" ><label><xref ref-type="table" rid="table18">Table 18</xref></label><caption><title> MWIR tracking metrics for 0% missing case. 
Train using Video 5 and test using Video 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >4.18</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >2858/3324</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >4.05</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >3234/3324</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >5.2</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >2027/3324</td></tr></tbody></table></table-wrap><table-wrap id="table19" ><label><xref ref-type="table" rid="table1">Table 1</xref>9</label><caption><title> MWIR tracking metrics for 50% missing case. 
Train using Video 5 and test using Video 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >4.8</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >2755/3324</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >5.36</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.97</td><td align="center" valign="middle" >3134/3324</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >5.56</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >1860/3324</td></tr></tbody></table></table-wrap><table-wrap id="table20" ><label><xref ref-type="table" rid="table2">Table 2</xref>0</label><caption><title> MWIR tracking metrics for 75% missing case. 
Train using Video 5 and test using Video 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >6.09</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >2295/3324</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >6.63</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >2108/3324</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >6.28</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >1615/3324</td></tr></tbody></table></table-wrap><p>be good. The labels come from the YOLO tracker. We will see in the next section that the ResNet classifier performs better than YOLO.</p></sec><sec id="s4_2"><title>4.2. Classification Results</title><p>MWIR Classification Results Using Video 4 for Training and Video 5 for Testing</p><p>Classification is applied only to frames in which the tracker detects a target. Tables 21-23 compare the YOLO and ResNet classifiers for the 0%, 50%, and 75% missing cases, respectively. We have two observations. First, the YOLO classifier outputs are worse than those of ResNet.</p><table-wrap-group id="21"><label><xref ref-type="table" rid="table21">Table 21</xref></label><caption><title> Classification results for 0% missing case. Video 4 for training and Video 5 for testing. (a) YOLO classifier outputs. 
Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="21_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >188</td><td align="center" valign="middle" >732</td><td align="center" valign="middle" >1648</td><td align="center" valign="middle" >0.0732</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >25</td><td align="center" valign="middle" >2404</td><td align="center" valign="middle" >237</td><td align="center" valign="middle" >0.9017</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >2201</td><td align="center" valign="middle" >7</td><td align="center" valign="middle" >160</td><td align="center" valign="middle" >0.0676</td></tr></tbody></table></table-wrap><table-wrap id="21_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" 
valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >2367</td><td align="center" valign="middle" >87</td><td align="center" valign="middle" >114</td><td align="center" valign="middle" >0.9217</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >275</td><td align="center" valign="middle" >2371</td><td align="center" valign="middle" >25</td><td align="center" valign="middle" >0.8877</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1444</td><td align="center" valign="middle" >274</td><td align="center" valign="middle" >743</td><td align="center" valign="middle" >0.3019</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="22"><label><xref ref-type="table" rid="table2">Table 2</xref>2</label><caption><title> Classification results for 50% missing case. Video 4 for training and Video 5 for testing. (a) YOLO classifier output. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="22_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >424</td><td align="center" valign="middle" >877</td><td align="center" valign="middle" >872</td><td align="center" valign="middle" >0.1951</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >705</td><td align="center" valign="middle" >1404</td><td align="center" valign="middle" >91</td><td align="center" valign="middle" >0.6382</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1660</td><td align="center" valign="middle" >70</td><td align="center" valign="middle" >39</td><td align="center" valign="middle" >0.0220</td></tr></tbody></table></table-wrap><table-wrap id="22_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1630</td><td align="center" valign="middle" >44</td><td align="center" valign="middle" >791</td><td align="center" valign="middle" >0.6613</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >829</td><td align="center" valign="middle" >1595</td><td align="center" valign="middle" >226</td><td align="center" valign="middle" >0.6019</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1245</td><td align="center" valign="middle" >82</td><td align="center" valign="middle" >797</td><td align="center" valign="middle" >0.3752</td></tr></tbody></table></table-wrap></table-wrap-group><p>Second, as the missing rate increases, the classification accuracy drops.</p><p>MWIR Classification Results Using Video 5 for Training and Video 4 for Testing</p><p>As shown in Tables 24-26, the ResNet classifier performs much better than YOLO. Moreover, the ResNet classification results remain reasonably good in the 75% missing case, with per-class accuracies between 0.54 and 0.89 in Table 26.</p><table-wrap-group id="23"><label><xref ref-type="table" rid="table2">Table 2</xref>3</label><caption><title> Classification results for 75% missing case. Video 4 for training and Video 5 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="23_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >276</td><td align="center" valign="middle" >1078</td><td align="center" valign="middle" >549</td><td align="center" valign="middle" >0.1450</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >559</td><td align="center" valign="middle" >1121</td><td align="center" valign="middle" >8</td><td align="center" valign="middle" >0.6641</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1361</td><td align="center" valign="middle" >66</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0.0000</td></tr></tbody></table></table-wrap><table-wrap id="23_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1267</td><td align="center" valign="middle" >269</td><td align="center" valign="middle" >381</td><td align="center" valign="middle" >0.6609</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >572</td><td align="center" valign="middle" >1103</td><td align="center" valign="middle" >30</td><td align="center" valign="middle" >0.6469</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >945</td><td align="center" valign="middle" >224</td><td align="center" valign="middle" >284</td><td align="center" valign="middle" >0.1955</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="24"><label><xref ref-type="table" rid="table2">Table 2</xref>4</label><caption><title> Classification results for 0% missing case. Video 5 for training and Video 4 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="24_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >528</td><td align="center" valign="middle" >735</td><td align="center" valign="middle" >1544</td><td align="center" valign="middle" >0.1884</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >1006</td><td align="center" valign="middle" >2093</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0.6754</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1429</td><td align="center" valign="middle" >61</td><td align="center" valign="middle" >532</td><td align="center" valign="middle" >0.2631</td></tr></tbody></table></table-wrap><table-wrap id="24_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1934</td><td align="center" valign="middle" >65</td><td align="center" valign="middle" >859</td><td align="center" valign="middle" >0.6767</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >114</td><td align="center" valign="middle" >3041</td><td align="center" valign="middle" >79</td><td align="center" valign="middle" >0.9403</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >535</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >1487</td><td align="center" valign="middle" >0.7336</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="25"><label><xref ref-type="table" rid="table2">Table 2</xref>5</label><caption><title> Classification results for 50% missing case. Video 5 for training and Video 4 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="25_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >444</td><td align="center" valign="middle" >933</td><td align="center" valign="middle" >1329</td><td align="center" valign="middle" >0.1641</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >958</td><td align="center" valign="middle" >1993</td><td align="center" valign="middle" >103</td><td align="center" valign="middle" >0.6526</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1319</td><td align="center" valign="middle" >21</td><td align="center" valign="middle" >518</td><td align="center" valign="middle" >0.2788</td></tr></tbody></table></table-wrap><table-wrap id="25_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1993</td><td align="center" valign="middle" >85</td><td align="center" valign="middle" >677</td><td align="center" valign="middle" >0.7234</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >112</td><td align="center" valign="middle" >2898</td><td align="center" valign="middle" >124</td><td align="center" valign="middle" >0.9247</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >839</td><td align="center" valign="middle" >20</td><td align="center" valign="middle" >1001</td><td align="center" valign="middle" >0.5382</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="26"><label><xref ref-type="table" rid="table2">Table 2</xref>6</label><caption><title> Classification results for 75% missing case. Video 5 for training and Video 4 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results; (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="26_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >270</td><td align="center" valign="middle" >691</td><td align="center" valign="middle" >1303</td><td align="center" valign="middle" >0.1193</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >622</td><td align="center" valign="middle" >1318</td><td align="center" valign="middle" >144</td><td align="center" valign="middle" >0.6324</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1030</td><td align="center" valign="middle" >43</td><td align="center" valign="middle" >540</td><td align="center" valign="middle" >0.3348</td></tr></tbody></table></table-wrap><table-wrap id="26_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1325</td><td align="center" valign="middle" >205</td><td align="center" valign="middle" >765</td><td align="center" valign="middle" >0.5773</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >154</td><td align="center" valign="middle" >1867</td><td align="center" valign="middle" >87</td><td align="center" valign="middle" >0.8857</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >652</td><td align="center" valign="middle" >98</td><td align="center" valign="middle" >865</td><td align="center" valign="middle" >0.5356</td></tr></tbody></table></table-wrap></table-wrap-group></sec><sec id="s4_3"><title>4.3. Discussions</title><p>Similar to the SWIR study, we are interested in the tracking and classification performance in the 75% missing data case, where fewer pixels need to be saved and transmitted. At this missing rate, using the numbers shown in <xref ref-type="table" rid="table2">Table 2</xref>7, the average percentages of frames with detections are 63% when testing with Video 5 and 60% when testing with Video 4. From <xref ref-type="table" rid="table2">Table 2</xref>8, the average classification accuracies are 50% when testing with Video 5 and 66% when testing with Video 4.</p><table-wrap-group id="27"><label><xref ref-type="table" rid="table2">Table 2</xref>7</label><caption><title> Tracking metrics for 75% missing case. (a) Train using Video 4 and test using Video 5. 
(b) Train using Video 5 and test using Video 4</title></caption><table-wrap id="27_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1917/2677</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >1705/2677</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1453/2677</td></tr></tbody></table></table-wrap><table-wrap id="27_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >2295/3324</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >2108/3324</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >1615/3324</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="28"><label><xref ref-type="table" rid="table2">Table 2</xref>8</label><caption><title> ResNet classification at 75% missing rate. (a) Train using Video 4 and test using Video 5. 
(b) Train using Video 5 and test using Video 4</title></caption><table-wrap id="28_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Classification accuracy</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >0.6609</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >0.6469</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >0.1955</td></tr></tbody></table></table-wrap><table-wrap id="28_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Classification accuracy</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >0.5773</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >0.8857</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >0.5356</td></tr></tbody></table></table-wrap></table-wrap-group></sec></sec><sec id="s5"><title>5. Tracking and Classification Results Using LWIR Videos</title><p>In this section, we summarize the tracking and classification results using LWIR videos.</p><sec id="s5_1"><title>5.1. Tracking Results</title><p>Conventional Tracker Results</p><p>We first present tracking results using STAPLE. Similar to the SWIR and MWIR cases, STAPLE did not perform well in the various cases, as shown in Figures 21-23.</p><p>LWIR Results: Train using Video 4 and Test using Video 5</p><p>Tables 29-31 show the tracking results for the different missing rates. The missed detection rate increases as more pixels are missing. 
From Figures 24-26, the</p><table-wrap id="table29" ><label><xref ref-type="table" rid="table2">Table 2</xref>9</label><caption><title> LWIR tracking metrics for 0% missing case. Train using Video 4 and test using Video 5</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >4.08</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1606/2677</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >4.51</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >2136/2677</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >5.14</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >735/2677</td></tr></tbody></table></table-wrap><table-wrap id="table30" ><label><xref ref-type="table" rid="table3">Table 3</xref>0</label><caption><title> LWIR tracking metrics for 50% missing case. 
Train using Video 4 and test using Video 5</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >4.65</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1602/2677</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >5.04</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >2084/2677</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >5.38</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >648/2677</td></tr></tbody></table></table-wrap><p>tracking results are quite good, except that some of the labels from the YOLO tracker are wrong.</p><p>LWIR Results: Train using Video 5 and Test using Video 4</p><p>From Tables 32-34 and Figures 27-29, we make the same observations as in the earlier sections: as the missing rate increases, the tracking performance drops. For example, between the 0% and 75% missing cases, the number of frames with a Ram detection falls from 1635 to 557 out of 3303, and the CLE rises from 6.37 to 9.75.</p><table-wrap id="table31" ><label><xref ref-type="table" rid="table3">Table 3</xref>1</label><caption><title> LWIR tracking metrics for 75% missing case. 
Train using Video 4 and test using Video 5</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >5.48</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1427/2677</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >6.47</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >1235/2677</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >4.85</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.99</td><td align="center" valign="middle" >489/2677</td></tr></tbody></table></table-wrap><table-wrap id="table32" ><label><xref ref-type="table" rid="table3">Table 3</xref>2</label><caption><title> LWIR tracking metrics for 0% missing case. 
Train using Video 5 and test using Video 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >6.37</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >1635/3303</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >4.22</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >1902/3303</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >4.17</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >745/3303</td></tr></tbody></table></table-wrap><table-wrap id="table33" ><label><xref ref-type="table" rid="table3">Table 3</xref>3</label><caption><title> LWIR tracking metrics for 50% missing case. 
Train using Video 5 and test using Video 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >7.56</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >1373/3303</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >5.52</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >1774/3303</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >6.62</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >599/3303</td></tr></tbody></table></table-wrap><table-wrap id="table34" ><label><xref ref-type="table" rid="table3">Table 3</xref>4</label><caption><title> LWIR tracking metrics for 75% missing case. 
Train using Video 5 and test using Video 4</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >CLE</th><th align="center" valign="middle" >DP</th><th align="center" valign="middle" >EinGT</th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >9.75</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >557/3303</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >6.88</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >805/3303</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >7.54</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >238/3303</td></tr></tbody></table></table-wrap></sec><sec id="s5_2"><title>5.2. Classification Results</title><p>LWIR Classification Results Using Video 4 for Training and Video 5 for Testing</p><p>From Tables 35-37, we observe that the ResNet results are better than those of YOLO. Even at high missing rates, ResNet performs reasonably well.</p><table-wrap-group id="35"><label><xref ref-type="table" rid="table3">Table 3</xref>5</label><caption><title> Classification results for 0% missing case. Video 4 for training and Video 5 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results. (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="35_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >302</td><td align="center" valign="middle" >841</td><td align="center" valign="middle" >370</td><td align="center" valign="middle" >0.1996</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >362</td><td align="center" valign="middle" >1634</td><td align="center" valign="middle" >49</td><td align="center" valign="middle" >0.7990</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >230</td><td align="center" valign="middle" >459</td><td align="center" valign="middle" >24</td><td align="center" valign="middle" >0.0337</td></tr></tbody></table></table-wrap><table-wrap id="35_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1606</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >1.0000</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >766</td><td align="center" valign="middle" >1369</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.6409</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >285</td><td align="center" valign="middle" >32</td><td align="center" valign="middle" >418</td><td align="center" valign="middle" >0.5687</td></tr></tbody></table></table-wrap></table-wrap-group><p>LWIR Classification Results Using Video 5 for Training and Video 4 for Testing</p><p>From Tables 38-40 below, we make observations similar to those in the earlier section. ResNet performs quite well in the LWIR case.</p></sec><sec id="s5_3"><title>5.3. Discussions</title><p>Similar to the SWIR study, we are interested in the tracking and classification performance in the 75% missing data case, where fewer pixels need to be saved and transmitted. At this missing rate, using the numbers shown in <xref ref-type="table" rid="table4">Table 4</xref>1, the average percentages of frames with detections are 43% when testing with Video 5 and 16% when testing with Video 4. The detection percentages appear</p><table-wrap-group id="36"><label><xref ref-type="table" rid="table3">Table 3</xref>6</label><caption><title> Classification results for 50% missing case. Video 4 for training and Video 5 for testing. (a) YOLO classifier output. Left is the confusion matrix; right is the classification results. (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="36_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >442</td><td align="center" valign="middle" >687</td><td align="center" valign="middle" >425</td><td align="center" valign="middle" >0.2844</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >417</td><td align="center" valign="middle" >1525</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >0.7621</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >146</td><td align="center" valign="middle" >437</td><td align="center" valign="middle" >48</td><td align="center" valign="middle" >0.0761</td></tr></tbody></table></table-wrap><table-wrap id="36_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1598</td><td align="center" valign="middle" >3</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.9975</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >789</td><td align="center" valign="middle" >1291</td><td align="center" valign="middle" >4</td><td align="center" valign="middle" >0.6195</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >249</td><td align="center" valign="middle" >29</td><td align="center" valign="middle" >370</td><td align="center" valign="middle" >0.5710</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="37"><label><xref ref-type="table" rid="table3">Table 3</xref>7</label><caption><title> Classification results for 75% missing case. Video 4 for training and Video 5 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results. (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="37_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >446</td><td align="center" valign="middle" >670</td><td align="center" valign="middle" >292</td><td align="center" valign="middle" >0.3168</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >425</td><td align="center" valign="middle" >1273</td><td align="center" valign="middle" >20</td><td align="center" valign="middle" >0.7410</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >35</td><td align="center" valign="middle" >443</td><td align="center" valign="middle" >8</td><td align="center" valign="middle" >0.0165</td></tr></tbody></table></table-wrap><table-wrap id="37_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1414</td><td align="center" valign="middle" >13</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0.9909</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" >1174</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.9506</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >179</td><td align="center" valign="middle" >65</td><td align="center" valign="middle" >245</td><td align="center" valign="middle" >0.5010</td></tr></tbody></table></table-wrap></table-wrap-group><p>to be low. This is mainly because each frame of the LWIR videos contains only about one or two vehicles, whereas each frame of the SWIR and MWIR videos contains multiple vehicles. From <xref ref-type="table" rid="table4">Table 4</xref>2, the average ResNet classification accuracies are 81% when testing on Video 5 and 79% when testing on Video 4.</p></sec></sec><sec id="s6"><title>6. Conclusions</title><p>We present a deep learning approach for multiple target tracking and classification using infrared videos (SWIR, MWIR, and LWIR) directly in the compressive measurement domain. Key advantages include fast processing without time</p><table-wrap-group id="38"><label><xref ref-type="table" rid="table3">Table 3</xref>8</label><caption><title> Classification results for 0% missing case. Video 5 for training and Video 4 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results. (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="38_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >550</td><td align="center" valign="middle" >620</td><td align="center" valign="middle" >329</td><td align="center" valign="middle" >0.3669</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >104</td><td align="center" valign="middle" >1773</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0.9446</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >506</td><td align="center" valign="middle" >106</td><td align="center" valign="middle" >62</td><td align="center" valign="middle" >0.0920</td></tr></tbody></table></table-wrap><table-wrap id="38_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle"  rowspan="2"  >Classification Accuracy</th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td></tr><tr><td 
align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1605</td><td align="center" valign="middle" >24</td><td align="center" valign="middle" >6</td><td align="center" valign="middle" >0.9817</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >37</td><td align="center" valign="middle" >1790</td><td align="center" valign="middle" >75</td><td align="center" valign="middle" >0.9411</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >138</td><td align="center" valign="middle" >29</td><td align="center" valign="middle" >578</td><td align="center" valign="middle" >0.7758</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="39"><label><xref ref-type="table" rid="table3">Table 3</xref>9</label><caption><title> Classification results for 50% missing case. Video 5 for training and Video 4 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results. (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="39_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle" ></th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >Classification Accuracy</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >599</td><td align="center" valign="middle" >472</td><td align="center" valign="middle" >211</td><td align="center" valign="middle" >0.4672</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >258</td><td align="center" valign="middle" >1479</td><td align="center" valign="middle" >2</td><td align="center" valign="middle" >0.8505</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >531</td><td align="center" valign="middle" >9</td><td align="center" valign="middle" >53</td><td align="center" valign="middle" >0.0894</td></tr></tbody></table></table-wrap><table-wrap id="39_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle" ></th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td><td align="center" 
valign="middle" >Classification Accuracy</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >957</td><td align="center" valign="middle" >249</td><td align="center" valign="middle" >167</td><td align="center" valign="middle" >0.6970</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >24</td><td align="center" valign="middle" >1742</td><td align="center" valign="middle" >8</td><td align="center" valign="middle" >0.9820</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >217</td><td align="center" valign="middle" >111</td><td align="center" valign="middle" >271</td><td align="center" valign="middle" >0.4524</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="40"><label><xref ref-type="table" rid="table4">Table 4</xref>0</label><caption><title> Classification results for 75% missing case. Video 5 for training and Video 4 for testing. (a) YOLO classifier outputs. Left is the confusion matrix; right is the classification results. (b) ResNet classifier outputs. 
Left is the confusion matrix; right is the classification results</title></caption><table-wrap id="40_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle" ></th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >Classification Accuracy</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >223</td><td align="center" valign="middle" >203</td><td align="center" valign="middle" >118</td><td align="center" valign="middle" >0.4099</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >58</td><td align="center" valign="middle" >747</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0.9280</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >192</td><td align="center" valign="middle" >2</td><td align="center" valign="middle" >40</td><td align="center" valign="middle" >0.1709</td></tr></tbody></table></table-wrap><table-wrap id="40_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="3"  >Actual</th><th align="center" valign="middle" ></th></tr></thead><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" 
>Classification Accuracy</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Predicted</td><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >360</td><td align="center" valign="middle" >134</td><td align="center" valign="middle" >63</td><td align="center" valign="middle" >0.6463</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >9</td><td align="center" valign="middle" >794</td><td align="center" valign="middle" >2</td><td align="center" valign="middle" >0.9863</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >48</td><td align="center" valign="middle" >15</td><td align="center" valign="middle" >175</td><td align="center" valign="middle" >0.7353</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="41"><label><xref ref-type="table" rid="table4">Table 4</xref>1</label><caption><title> Tracking metrics for 75% missing case. (a) Train using Video 4 and test using Video 5; (b) Train using Video 5 and test using Video 4</title></caption><table-wrap id="41_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >1635/3303</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >1902/3303</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >745/3303</td></tr></tbody></table></table-wrap><table-wrap id="41_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Number of frames with detection</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >557/3303</td></tr><tr><td align="center" 
valign="middle" >Frontier</td><td align="center" valign="middle" >805/3303</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >238/3303</td></tr></tbody></table></table-wrap></table-wrap-group><table-wrap-group id="42"><label><xref ref-type="table" rid="table4">Table 4</xref>2</label><caption><title> ResNet classification at 75% missing rate. (a) Train using Video 4 and test using Video 5; (b) Train using Video 5 and test using Video 4</title></caption><table-wrap id="42_1"><caption><title> (b)</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Classification accuracy</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >0.9909</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >0.9506</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >0.5010</td></tr></tbody></table></table-wrap><table-wrap id="42_2"><caption><title></title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Classification accuracy</th></tr></thead><tr><td align="center" valign="middle" >Ram</td><td align="center" valign="middle" >0.6463</td></tr><tr><td align="center" valign="middle" >Frontier</td><td align="center" valign="middle" >0.9863</td></tr><tr><td align="center" valign="middle" >Silverado</td><td align="center" valign="middle" >0.7353</td></tr></tbody></table></table-wrap></table-wrap-group><p>consuming image reconstruction. Experiments using various types of infrared videos clearly demonstrated the performance of the proposed approach under different conditions even when the training data are limited. 
Moreover, comparison with conventional trackers showed that the deep learning based approach is much more accurate, especially when the missing rate is high.</p><p>One future direction is to integrate the proposed approach with video cameras and perform real-time tracking and classification.</p></sec><sec id="s7"><title>Acknowledgements</title><p>This research was supported by the US Air Force under contract FA8651-17-C-0017. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the US Government.</p></sec><sec id="s8"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s9"><title>Cite this paper</title><p>Kwan, C., Chou, B., Yang, J. and Tran, T. (2019) Deep Learning Based Target Tracking and Classification for Infrared Videos Using Compressive Measurements. Journal of Signal and Information Processing, 10, 167-199. https://doi.org/10.4236/jsip.2019.104010</p></sec></body><back><ref-list><title>References</title><ref id="scirp.96711-ref1"><label>1</label><mixed-citation publication-type="book" xlink:type="simple">Li, X., Kwan, C., Mei, G. and Li, B. (2006) A Generic Approach to Object Matching and Tracking. In: Campilho, A. and Kamel, M.S., Eds., Image Analysis and Recognition. ICIAR 2006. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 839-849. https://doi.org/10.1007/11867586_76</mixed-citation></ref><ref id="scirp.96711-ref2"><label>2</label><mixed-citation publication-type="book" xlink:type="simple">Zhou, J. and Kwan, C. (2018) Tracking of Multiple Pixel Targets Using Multiple Cameras. In: Huang, T., Lv, J., Sun, C. and Tuzikov, A., Eds., Advances in Neural Networks. Lecture Notes in Computer Science, Springer, Cham, 484-493. 
https://doi.org/10.1007/978-3-319-92537-0_56</mixed-citation></ref><ref id="scirp.96711-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Zhou, J. and Kwan, C. (2018) Anomaly Detection in Low Quality Traffic Monitoring Videos Using Optical Flow. Proceedings of SPIE 10649, Pattern Recognition and Tracking XXIX, 106490F.</mixed-citation></ref><ref id="scirp.96711-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Zhou, J., Wang, Z. and Li, B. (2018) Efficient Anomaly Detection Algorithms for Summarizing Low Quality Videos. Proceedings of SPIE 10649, Pattern Recognition and Tracking XXIX, 1064906. https://doi.org/10.1117/12.2303764</mixed-citation></ref><ref id="scirp.96711-ref5"><label>5</label><mixed-citation publication-type="book" xlink:type="simple">Kwan, C., Chou, B. and Kwan, L. M. (2018) A Comparative Study of Conventional and Deep Learning Target Tracking Algorithms for Low Quality Videos. In: Huang, T., Lv, J., Sun, C. and Tuzikov, A., Eds., Advances in Neural Networks. Lecture Notes in Computer Science, Springer, Cham, 521-531. 
https://doi.org/10.1007/978-3-319-92537-0_60</mixed-citation></ref><ref id="scirp.96711-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Yin, J. and Zhou, J. (2018) The Development of a Video Browsing and Video Summary Review Tool. Proceedings of SPIE 10649, Pattern Recognition and Tracking XXIX, 1064907. https://doi.org/10.1117/12.2303654</mixed-citation></ref><ref id="scirp.96711-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Zhao, Z., Chen, H., Chen, G., Kwan, C. and Li, X.R. (2006) IMM-LMMSE Filtering Algorithm for Ballistic Target Tracking with Unknown Ballistic Coefficient. Proceedings of SPIE, Volume 6236, Signal and Data Processing of Small Targets. 
https://doi.org/10.1117/12.665760</mixed-citation></ref><ref id="scirp.96711-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Zhao, Z., Chen, H., Chen, G., Kwan, C. and Li, X.R. (2006) Comparison of Several Ballistic Target Tracking Filters. Proceedings of American Control Conference, Minneapolis, MN, 14-16 June 2006, 2197-2202.</mixed-citation></ref><ref id="scirp.96711-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Candes, E.J. and Wakin, M.B. (2008) An Introduction to Compressive Sampling. IEEE Signal Processing Magazine, 25, 21-30. 
https://doi.org/10.1109/MSP.2007.914731</mixed-citation></ref><ref id="scirp.96711-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Chou, B., Echavarren, A., Budavari, B., Li, J. and Tran, T. (2018) Compressive Vehicle Tracking Using Deep Learning. IEEE Ubiquitous Computing, Electronics &amp; Mobile Communication Conference, New York City, 8-10 November 2018, 51-56. https://doi.org/10.1109/UEMCON.2018.8796778</mixed-citation></ref><ref id="scirp.96711-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Tropp, J.A. (2004) Greed Is Good: Algorithmic Results for Sparse Approximation. IEEE Transactions on Information Theory, 50, 2231-2242. 
https://doi.org/10.1109/TIT.2004.834793</mixed-citation></ref><ref id="scirp.96711-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Yang, J. and Zhang, Y. (2011) Alternating Direction Algorithms for L1-Problems in Compressive Sensing. SIAM Journal on Scientific Computing, 33, 250-278. 
https://doi.org/10.1137/090777761</mixed-citation></ref><ref id="scirp.96711-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Dao, M., Kwan, C., Koperski, K. and Marchisio, G. (2017) A Joint Sparsity Approach to Tunnel Activity Monitoring Using High Resolution Satellite Images. 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, New York, 19-21 October 2017, 322-328.  
https://doi.org/10.1109/UEMCON.2017.8249061</mixed-citation></ref><ref id="scirp.96711-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Zhou, J., Ayhan, B., Kwan, C. and Tran, T. (2018) ATR Performance Improvement Using Images with Corrupted or Missing Pixels. Proceedings of SPIE 10649, Pattern Recognition and Tracking XXIX, 106490E.</mixed-citation></ref><ref id="scirp.96711-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Applied Research LLC (2017) Phase 1 Final Report.</mixed-citation></ref><ref id="scirp.96711-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Chou, B., Yang, J. and Tran, T. (2019) Target Tracking and Classification Directly in Compressive Measurement for Low Quality Videos. Pattern Recognition and Tracking XXX (Conference SI120). https://doi.org/10.1117/12.2518496</mixed-citation></ref><ref id="scirp.96711-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Gribben, D. and Tran, T. (2019) Multiple Human Objects Tracking and Classification Directly in Compressive Measurement Domain for Long Range Infrared Videos. IEEE Ubiquitous Computing, Electronics &amp; Mobile Communication Conference, New York City.</mixed-citation></ref><ref id="scirp.96711-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Chou, B., Yang, J. and Tran, T. (2019) Deep Learning Based Target Tracking and Classification Directly in Compressive Measurement for Low Quality Videos. Signal &amp; Image Processing: An International Journal (SIPIJ).</mixed-citation></ref><ref id="scirp.96711-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Gribben, D. and Tran, T. (2019) Tracking and Classification of Multiple Human Objects Directly in Compressive Measurement Domain for Low Quality Optical Videos. 
IEEE Ubiquitous Computing, Electronics &amp; Mobile Communication Conference, New York City.</mixed-citation></ref><ref id="scirp.96711-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Chou, B., Yang, J., Rangamani, A., Tran, T., Zhang, J. and Etienne-Cummings, R. (2019) Target Tracking and Classification Directly Using Compressive Sensing Camera for SWIR Videos. Signal, Image, and Video Processing, 13, 1629-1637. https://doi.org/10.1007/s11760-019-01506-4</mixed-citation></ref><ref id="scirp.96711-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Chou, B., Yang, J., Rangamani, A., Tran, T., Zhang, J. and Etienne-Cummings, R. (2019) Target Tracking and Classification Using Compressive Measurements of MWIR and LWIR Coded Aperture Cameras. Journal of Signal and Information Processing, 10, 73-95. https://doi.org/10.4236/jsip.2019.103006</mixed-citation></ref><ref id="scirp.96711-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Chou, B., Yang, J., Rangamani, A., Tran, T., Zhang, J. and Etienne-Cummings, R. (2019) Deep Learning Based Target Tracking and Classification for Low Quality Videos Using Coded Aperture Camera. Sensors, 19, 3702. 
https://doi.org/10.3390/s19173702</mixed-citation></ref><ref id="scirp.96711-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Yang, M.H., Zhang, K. and Zhang, L. (2012) Real-Time Compressive Tracking. In European Conference on Computer Vision.</mixed-citation></ref><ref id="scirp.96711-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep Residual Learning for Image Recognition. 2016 Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 27-30 June 2016, 770-778. https://doi.org/10.1109/CVPR.2016.90</mixed-citation></ref><ref id="scirp.96711-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Redmon, J. and Farhadi, A. (2018) YOLOv3: An Incremental Improvement.</mixed-citation></ref><ref id="scirp.96711-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Ren, S., He, K., Girshick, R. and Sun, J. (2015) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In: Advances in Neural Information Processing Systems, 1-9.</mixed-citation></ref><ref id="scirp.96711-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Kwan, C., Chou, B., Yang, J., Budavari, B. and Tran, T. (2019) Compressive Object Tracking and Classification Using Deep Learning for Infrared Videos. Pattern Recognition and Tracking XXX (Conference SI120).  
https://doi.org/10.1117/12.2518490</mixed-citation></ref><ref id="scirp.96711-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O. and Torr, P. (2016) Staple: Complementary Learners for Real-Time Tracking. 2016 Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 27-30 June 2016, 1401-1409.  
https://doi.org/10.1109/CVPR.2016.156</mixed-citation></ref><ref id="scirp.96711-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Stauffer, C. and Grimson, W.E.L. (1999) Adaptive Background Mixture Models for Real-Time Tracking. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2, 2246-2252.</mixed-citation></ref></ref-list></back></article>