<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JIS</journal-id><journal-title-group><journal-title>Journal of Information Security</journal-title></journal-title-group><issn pub-type="epub">2153-1234</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jis.2016.75024</article-id><article-id pub-id-type="publisher-id">JIS-70564</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Data Compression for Next Generation Phasor Data Concentrators (PDCs) in a Smart Grid
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Erwan</surname><given-names>Olivo</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Mitch</surname><given-names>Campion</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Prakash</surname><given-names>Ranganathan</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Department of Electrical Engineering, University of North Dakota, Grand Forks, ND, USA</addr-line></aff><pub-date pub-type="epub"><day>30</day><month>08</month><year>2016</year></pub-date><volume>07</volume><issue>05</issue><fpage>291</fpage><lpage>296</lpage><history><date date-type="received"><day>April</day>	<month>16,</month>	<year>2016</year></date><date date-type="rev-recd"><day>Accepted:</day>	<month>September</month>	<year>11,</year>	</date><date date-type="accepted"><day>September</day>	<month>14,</month>	<year>2016</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  The storage space and cost required for Smart Grid datasets have been growing exponentially because of the high data rates of sensor readings from Advanced Metering Infrastructure (AMI) and Phasor Measurement Units (PMUs). This paper focuses on Phasor Data Concentrators (PDCs), which aggregate data from PMUs. PMUs measure real-time voltage, current and frequency parameters across the electrical grid. A typical PDC can process data from anywhere between ten and forty PMUs. The paper addresses the security and data compression challenges simultaneously. An optimal compression method, ER1c, is investigated for efficient storage of IRIG and C37.118 timestamped PDC data sets. We expect that our approach can reduce the storage cost requirements of commercially available PDCs (SEL 3373, GE Multilin P30) by 80%. For example, 2 years of PDC data storage space can be replaced with only 10 days of storage space. In addition, our approach, in combination with AES 256 encryption, can protect PDC data to a larger degree as per National Institute of Standards and Technology (NIST) standards.
 
</p></abstract><kwd-group><kwd>Compression</kwd><kwd>PDCs</kwd><kwd>Data Security</kwd><kwd>Smart Metering</kwd><kwd>Smart Grid</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>The number of PMU deployments is growing exponentially in North America, and hence the amount of data to be stored, even for a short period, is large [<xref ref-type="bibr" rid="scirp.70564-ref1">1</xref>]. Phasor Data Concentrators (PDCs) have limited storage space (an estimated 100 GB), and the capacity varies by vendor. As a result, data have to be moved to alternative media such as DVDs or hard drives. The main idea of this paper is to optimize the storage of such data sets, which have high data rates (~30 samples/second).</p><p>Reducing the file or storage size of PDC data can help lower storage cost, secure the data efficiently, and organize SQL queries and retrieval with faster download times. The parameters (compression rates, data retrieval time) are considered as benchmark performance metrics at the super PDC level.</p><p>This paper specifically uses two levels of compression for PDC data together with AES 256-bit encryption (see <xref ref-type="fig" rid="fig1">Figure 1</xref>). The literature on PMU data compression is very limited. For example, M. H. H. Wen and O. K. Li in [<xref ref-type="bibr" rid="scirp.70564-ref2">2</xref>] discuss the installation of a simple compression unit in the grid between the PMUs and a PDC, with no security guaranteed. In [<xref ref-type="bibr" rid="scirp.70564-ref4">4</xref>], F. Zhang, L. Cheng, X. Li, Y. Sun, W. Gao and W. Zhao proposed a compression approach for real-time PMU data in Wide-Area Measurement Systems (WAMS). They used a swinging door trending compression method at the PMU level to reduce the amount of data in the network. The security of PMU data has not been well studied, and the literature on it is limited. 
In fact, the IEEE C37.118 standard does not give clear directions on how PMU data can be secured. In [<xref ref-type="bibr" rid="scirp.70564-ref3">3</xref>], the authors discuss how a decision-tree approach based on synchronized phasor measurements can improve the security of the grid against voltage collapse. This paper discusses a novel and optimal compression algorithm, the Erwan-Ranganathan compression method, version 1 (ER1c.v.1), to process streaming PDC data sets. We define the compression ratio (CR) by Equation (1):</p><disp-formula id="scirp.70564-formula58"><label>(1)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/2-7800381x2.png"  xlink:type="simple"/></disp-formula><p>For example, a file with an initial size of 1200 KB that is reduced to 300 KB through compression has a compression ratio of 4. The higher the ratio, the better the compression method and the more efficient the storage. PMU data are measurements of voltage, current, frequency and phase angle (of voltage and current), each marked with a timestamp. The timestamp follows the rules of the IEEE C37.118 or IRIG 200-04 standards. The IRIG-B timestamp format is a standard for encoding timestamp information for PMU data. The specific standard followed for the data is IRIG Standard 200-04 [<xref ref-type="bibr" rid="scirp.70564-ref4">4</xref>]. An IRIG-B frame uses 74 bits to encode information in Binary Coded Decimal (BCD). The frame is divided into 5 parts for the date (39 bits). In addition, there are 35 more bits for time quality, leap second, leap year and local offset. 
This data frame is longer than the C37.118 frame because of the index position markers, which also use some bits. The IRIG-based PMU timestamp is better suited to human-readable applications, because the encoded time corresponds directly to real time.</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title>Secure data storage architecture for streaming PMU data in PDCs</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-7800381x3.png"/></fig></sec><sec id="s2"><title>2. Optimal Encoded PMU Measurements</title><p>All PMU measurements are encoded as ASCII characters. For the most part, the parameters in PMU measurements (f, v, and i) and some fields of the time-stamp (date, month, or day) remain the same and are repeated redundantly, so the encoded values do not have to change frequently. For example, the frequency f should be at 60 Hz, the voltage v at 300 kV and the current i at 500 A. Only when an anomaly or a change in the measurements occurs does a value that departs from these nominal values need to be encoded. Any such sudden change in the data pattern can easily be tracked in our approach. Because the time-stamp has a limited number of encoding fields (FRACSEC, SOC), the proposed compression technique is able to reduce the original file sizes of PDC data. The results of our compression are discussed in the next section. The PMUs considered for this investigation have a data rate of 30 samples per second, so the number of samples remains constant for every second. 
These data-rate patterns are not detected by classical compression methods (Huffman coding, dictionary coders or prediction by partial matching [<xref ref-type="bibr" rid="scirp.70564-ref5">5</xref>] - [<xref ref-type="bibr" rid="scirp.70564-ref7">7</xref>]), and thus a compression method tailored to PMU data can yield better lossless compression ratios than classical compression software.</p><p><xref ref-type="fig" rid="fig1">Figure 1</xref> shows the proposed architecture for efficient storage of PMU data at the PDC level. The compression and encryption discussed here apply to the PDC data sets. ER1c is a software program that compresses PDC data held in an unencrypted (human-readable) format. It combines the strengths of Huffman coding, dictionary coders, and the Prediction by Partial Matching (PPM) algorithm [<xref ref-type="bibr" rid="scirp.70564-ref7">7</xref>]. ER1c is an optimized program that yields savings in storage cost and better compression efficiency compared to commercially available PDCs. The program optimizes redundant data fields and automates sections of code for a better compression ratio. The ER1c program reads PMU data frame by frame rather than character by character. A frame is composed of a timestamp and a measurement, as seen below. The measurement shown here is the grid frequency, as an example.</p><p>Frame 1: 05-Dec-2015 17:41:36.666, 59.10 Hz.</p><p>Frame 2: 05-Dec-2015 17:41:36.700, 60.01 Hz.</p><p>The focus here is restricted to redundant time-stamp information in the PMU data. Classical compression methods such as dictionary coders would compress all fields, such as the date, hour, minute, second and millisecond. There is therefore an opportunity to compress the redundant fields (date, day, hour, minute and second), which stay the same for a certain period of time. This repeated information is nevertheless encoded again line by line. 
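The extent of this repetition can be checked on the two example frames above. The following Python sketch is only an illustration and is not part of ER1c; the variable names are hypothetical, and the trailing sentence period of each printed frame is assumed not to belong to the data.
<preformat>
```python
import os

# The two example frames from this section (trailing sentence period dropped).
frames = [
    "05-Dec-2015 17:41:36.666, 59.10 Hz",
    "05-Dec-2015 17:41:36.700, 60.01 Hz",
]

# os.path.commonprefix compares character by character, so it works on
# arbitrary strings and not only on file-system paths.
shared = os.path.commonprefix(frames)

print(repr(shared))
print(f"{len(shared)} of {len(frames[0])} characters are repeated")
```
</preformat>
For these two frames, 21 of the 34 characters (the date, hour, minute and whole second) are identical, so a line-by-line encoder spends most of each line re-encoding characters it has already seen. 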
This is unnecessary and wastes CPU time and storage. To avoid this problem, the ER1c program captures the initial time-stamp information (date, day, hour, minute, and second) only once. Repeated information is not encoded or compressed again, which saves storage cost. The only varying, non-repeated field is the millisecond (ms) information, which is encoded continuously. See the pseudocode shown in Example 1.</p><p>The ER1c program can also check whether the interval between successive measurements is consistent. In other words, it checks the number of samples per second, which depends on the PMU type, for data validation. We assumed that the PMU used in the PDC data set has a data rate of 30 samples per second. ER1c is thus a simple and optimal program for detecting time-stamp errors. For example, if a second time-stamp (frame 2) following a first time-stamp (frame 1) does not respect the expected interval between measurements, the ER1c program will catch this duration or sampling error.</p></sec><sec id="s3"><title>3. Comparison of Compression Ratios (CR)</title><p><xref ref-type="table" rid="table1">Table 1</xref> and <xref ref-type="table" rid="table2">Table 2</xref> show the compression ratios obtained with various methods for frequency (f), current (c), voltage (v) and phase angle (ph) data, and <xref ref-type="fig" rid="fig2">Figure 2</xref> shows compression speeds as the number of PMUs scales. The results obtained using our approach are 10&#215; better than those of other conventional compression techniques. The more data there is to compress, the greater the benefit of ER1c compression. 
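The two-level rows of the tables (for example "ER1c + 7z") can be read as a time-stamp-aware pre-pass followed by a general-purpose compressor. The following Python sketch illustrates that idea under stated assumptions and is not the ER1c implementation: the frame layout, the helper names (encode_frames, decode_frames, compression_ratio), the synthetic constant 60 Hz readings, and the use of Python's standard lzma module as a stand-in for an archiver are all assumptions made for illustration.
<preformat>
```python
import lzma


def encode_frames(frames):
    """Emit the date-to-second prefix once per second, then only ms and value."""
    out = []
    current_prefix = None
    for frame in frames:
        stamp, value = frame.split(", ")
        prefix, millis = stamp.rsplit(".", 1)
        if prefix != current_prefix:
            out.append("@" + prefix)  # new second: store the full prefix once
            current_prefix = prefix
        out.append(millis + "," + value)
    return "\n".join(out)


def decode_frames(encoded):
    """Rebuild the original frames from the output of encode_frames."""
    frames = []
    prefix = None
    for line in encoded.split("\n"):
        if line.startswith("@"):
            prefix = line[1:]
        else:
            millis, value = line.split(",", 1)
            frames.append(prefix + "." + millis + ", " + value)
    return frames


def compression_ratio(original_size, compressed_size):
    """Original size divided by compressed size, as in the 1200 KB / 300 KB example."""
    return original_size / compressed_size


# Synthetic data (an assumption): 30 samples per second for 10 seconds.
frames = []
for second in range(10):
    for k in range(30):
        millis = round(k * 1000 / 30)
        frames.append("05-Dec-2015 17:41:%02d.%03d, 60.00 Hz" % (36 + second, millis))

raw = "\n".join(frames).encode("ascii")
pre = encode_frames(frames).encode("ascii")

# The pre-pass must be lossless before any ratio is worth reporting.
assert decode_frames(pre.decode("ascii")) == frames

cr_direct = compression_ratio(len(raw), len(lzma.compress(raw)))
cr_two_level = compression_ratio(len(raw), len(lzma.compress(pre)))

print(f"pre-pass only:   {compression_ratio(len(raw), len(pre)):.2f}")
print(f"lzma only:       {cr_direct:.2f}")
print(f"pre-pass + lzma: {cr_two_level:.2f}")
```
</preformat>
Real PDC data vary from frame to frame, whereas this synthetic input does not, so the ratios printed by the sketch should not be compared with the values in the tables. 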
It is observed that, for frequency data, ER1c alone offers a better compression ratio (up to about 3&#215; better) than the 7z, rar, zip, zipx, and uha methods.</p><p>The decompression process is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p><disp-formula id="scirp.70564-formula59"><graphic  xlink:href="http://html.scirp.org/file/2-7800381x4.png"  xlink:type="simple"/></disp-formula><p>Example 1. Pseudocode for handling redundant time-stamps.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Compression ratio for 5 minutes dataset</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Compression Type</th><th align="center" valign="middle" >CR<sub>f</sub></th><th align="center" valign="middle" >CR<sub>C</sub></th><th align="center" valign="middle" >CR<sub>v</sub></th><th align="center" valign="middle" >CR<sub>ph</sub></th></tr></thead><tr><td align="center" valign="middle" >7z</td><td align="center" valign="middle" >17.15</td><td align="center" valign="middle" >20.04</td><td align="center" valign="middle" >19.88</td><td align="center" valign="middle" >19.88</td></tr><tr><td align="center" valign="middle" >rar</td><td align="center" valign="middle" >10.11</td><td align="center" valign="middle" >10.72</td><td align="center" valign="middle" >10.65</td><td align="center" valign="middle" >11.79</td></tr><tr><td align="center" valign="middle" >zip</td><td align="center" valign="middle" >10.16</td><td align="center" valign="middle" >6.25</td><td align="center" valign="middle" >7.13</td><td align="center" valign="middle" >5.83</td></tr><tr><td align="center" valign="middle" >zipx</td><td align="center" valign="middle" >18.66</td><td align="center" valign="middle" >11.72</td><td align="center" valign="middle" >12.84</td><td align="center" valign="middle" >11.43</td></tr><tr><td align="center" valign="middle" >uha</td><td align="center" valign="middle" >11.92</td><td align="center" valign="middle" 
>20.83</td><td align="center" valign="middle" >22.22</td><td align="center" valign="middle" >20.88</td></tr><tr><td align="center" valign="middle" >ER1c</td><td align="center" valign="middle" >30.69</td><td align="center" valign="middle" >11.40</td><td align="center" valign="middle" >10.97</td><td align="center" valign="middle" >11.69</td></tr><tr><td align="center" valign="middle" >ER1c + 7z</td><td align="center" valign="middle" >160.08</td><td align="center" valign="middle" >81.74</td><td align="center" valign="middle" >96.97</td><td align="center" valign="middle" >86.66</td></tr><tr><td align="center" valign="middle" >ER1c + rar</td><td align="center" valign="middle" >173.76</td><td align="center" valign="middle" >80.26</td><td align="center" valign="middle" >93.43</td><td align="center" valign="middle" >85.18</td></tr><tr><td align="center" valign="middle" >ER1c + zip</td><td align="center" valign="middle" >167.89</td><td align="center" valign="middle" >78.73</td><td align="center" valign="middle" >94.98</td><td align="center" valign="middle" >84.61</td></tr><tr><td align="center" valign="middle" >ER1c + zipx</td><td align="center" valign="middle" >184.49</td><td align="center" valign="middle" >81.33</td><td align="center" valign="middle" >100.89</td><td align="center" valign="middle" >90.43</td></tr><tr><td align="center" valign="middle" >ER1c + uha</td><td align="center" valign="middle" >180.63</td><td align="center" valign="middle" >83.43</td><td align="center" valign="middle" >109.48</td><td align="center" valign="middle" >90.93</td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Compression ratio for 20 minutes dataset</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Compression Type</th><th align="center" valign="middle" >CR<sub>f</sub></th><th align="center" valign="middle" >CR<sub>C</sub></th><th align="center" valign="middle" 
>CR<sub>v</sub></th><th align="center" valign="middle" >CR<sub>ph</sub></th></tr></thead><tr><td align="center" valign="middle" >7z</td><td align="center" valign="middle" >18.05</td><td align="center" valign="middle" >24.45</td><td align="center" valign="middle" >23.87</td><td align="center" valign="middle" >24.94</td></tr><tr><td align="center" valign="middle" >rar</td><td align="center" valign="middle" >9.99</td><td align="center" valign="middle" >13.53</td><td align="center" valign="middle" >12.18</td><td align="center" valign="middle" >15.02</td></tr><tr><td align="center" valign="middle" >zip</td><td align="center" valign="middle" >10.21</td><td align="center" valign="middle" >6.27</td><td align="center" valign="middle" >7.14</td><td align="center" valign="middle" >5.84</td></tr><tr><td align="center" valign="middle" >zipx</td><td align="center" valign="middle" >19.12</td><td align="center" valign="middle" >13.30</td><td align="center" valign="middle" >13.89</td><td align="center" valign="middle" >13.62</td></tr><tr><td align="center" valign="middle" >uha</td><td align="center" valign="middle" >12.02</td><td align="center" valign="middle" >26.18</td><td align="center" valign="middle" >27.03</td><td align="center" valign="middle" >29.24</td></tr><tr><td align="center" valign="middle" >ER1c</td><td align="center" valign="middle" >29.11</td><td align="center" valign="middle" >11.38</td><td align="center" valign="middle" >10.95</td><td align="center" valign="middle" >11.65</td></tr><tr><td align="center" valign="middle" >ER1c + 7z</td><td align="center" valign="middle" >271.50</td><td align="center" valign="middle" >192.74</td><td align="center" valign="middle" >223.13</td><td align="center" valign="middle" >199.50</td></tr><tr><td align="center" valign="middle" >ER1c + rar</td><td align="center" valign="middle" >270.49</td><td align="center" valign="middle" >181.61</td><td align="center" valign="middle" >197.09</td><td align="center" valign="middle" 
>190.01</td></tr><tr><td align="center" valign="middle" >ER1c + zip</td><td align="center" valign="middle" >241.18</td><td align="center" valign="middle" >176.51</td><td align="center" valign="middle" >193.00</td><td align="center" valign="middle" >182.05</td></tr><tr><td align="center" valign="middle" >ER1c + zipx</td><td align="center" valign="middle" >274.62</td><td align="center" valign="middle" >211.95</td><td align="center" valign="middle" >243.84</td><td align="center" valign="middle" >229.28</td></tr><tr><td align="center" valign="middle" >ER1c + uha</td><td align="center" valign="middle" >298.62</td><td align="center" valign="middle" >206.44</td><td align="center" valign="middle" >250.60</td><td align="center" valign="middle" >220.22</td></tr></tbody></table></table-wrap><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> Compression speed with ER1c</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-7800381x5.png"/></fig><fig id="fig3"  position="float"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> Decoding process of PMU data using ER1c</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/2-7800381x6.png"/></fig></sec><sec id="s4"><title>4. Conclusion</title><p>An optimal compression method for streaming time-stamped PDC data sets is presented. The proposed approach is suitable at the PDC level for efficient data storage, retrieval and post-event analysis. The preliminary results indicate that ER1c in combination with existing compression techniques can yield a better compression ratio. We expect that our approach can reduce the storage cost requirements of commercially available PDCs by 80%. For example, 2 years of PDC data storage capacity can be replaced by only 10 days of capacity. 
In addition, our approach, in combination with AES 256 encryption, can protect PDC data with greater confidence and thus increase the security of the growing big data sets in the smart grid network.</p></sec><sec id="s5"><title>Acknowledgements</title><p>This work is made possible through UND’s RD &amp; C (21418-4010-02294).</p></sec><sec id="s6"><title>Cite this paper</title><p>Olivo, E., Campion, M. and Ranganathan, P. (2016) Data Compression for Next Generation Phasor Data Concentrators (PDCs) in a Smart Grid. Journal of Information Security, 7, 291-296. http://dx.doi.org/10.4236/jis.2016.75024</p></sec></body><back><ref-list><title>References</title><ref id="scirp.70564-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Cleary, J.G. and Witten, I.H. (1984) Data Compression Using Adaptive Coding and Partial String Matching. IEEE Transactions on Communications, 32, 396-402.  
http://dx.doi.org/10.1109/TCOM.1984.1096090</mixed-citation></ref><ref id="scirp.70564-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Bhadade, U.S. and Trivedi, A.I. (2011) Lossless Text Compression using Dictionaries. International Journal of Computer Applications, 13.</mixed-citation></ref><ref id="scirp.70564-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Kavousianos, X., Kalligeros, E. and Nikolos, D. (2007) Optimal Selective Huffman Coding for Test-Data Compression. IEEE Transactions on Computers, 56, 1146-1152.</mixed-citation></ref><ref id="scirp.70564-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">IRIG Standard 200-04 2004. http://www.irigb.com/pdf/wp-irig-200-04.pdf</mixed-citation></ref><ref id="scirp.70564-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Khatib, A.R., Nuqui, R.F., Ingram, M.R. and Phadke, A.G. (2004) Real-Time Estimation of Security from Voltage Collapse Using Synchronized Phasor Measurements. IEEE Power Engineering Society General Meeting, 1, 582-588.</mixed-citation></ref><ref id="scirp.70564-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Wen, M.F. and Li, V.O.K. (2015) Optimal Phasor Data Compression Unit Installation for Wide-Area Measurement Systems—An Integer Linear Programming Approach. IEEE Transactions on Smart Grid, 1949-3053. http://dx.doi.org/10.1109/tsg.2015.2503425</mixed-citation></ref><ref id="scirp.70564-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Bose (2010) Smart Transmission Grid Applications and Their Supporting Infrastructure. IEEE Transactions on Power Systems, 1, 1949-3053.</mixed-citation></ref></ref-list></back></article>