<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JIS</journal-id><journal-title-group><journal-title>Journal of Information Security</journal-title></journal-title-group><issn pub-type="epub">2153-1234</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jis.2014.52006</article-id><article-id pub-id-type="publisher-id">JIS-44440</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Malware Analysis and Classification: A Survey
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Gandotra</surname><given-names>Ekta</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Bansal</surname><given-names>Divya</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Sofat</surname><given-names>Sanjeev</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Department of Computer Science and Engineering, PEC University of Technology, Chandigarh, India</addr-line></aff><author-notes><corresp id="cor1">* E-mail:<email>ekta.gandotra@gmail.com(KG)</email>;</corresp></author-notes><pub-date pub-type="epub"><day>20</day><month>02</month><year>2014</year></pub-date><volume>05</volume><issue>02</issue><fpage>56</fpage><lpage>64</lpage><history><date date-type="received"><day>21</day>	<month>February</month>	<year>2014</year></date><date date-type="rev-recd"><day>21</day>	<month>March</month>	<year>2014</year>	</date><date date-type="accepted"><day>28</day>	<month>March</month>	<year>2014</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
   Malicious software, commonly called malware, is one of the most serious threats on the Internet today. Modern malware is often polymorphic or metamorphic: it changes its own code as it propagates. Moreover, the diversity and volume of malware variants severely undermine traditional defenses, which typically rely on signature-based techniques and cannot detect previously unknown malicious executables. Variants of a malware family nevertheless share typical behavioral patterns that reflect their origin and purpose. These patterns, obtained either statically or dynamically, can be exploited with machine learning techniques to detect unknown malware and classify it into known families. This survey provides an overview of techniques for analyzing and classifying malware.
     
 
</p></abstract><kwd-group><kwd>Malware; Static Analysis; Dynamic Analysis; Machine Learning; Classification; Clustering</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Software that “deliberately fulfills the harmful intent of an attacker” is referred to as malicious software or malware [<xref ref-type="bibr" rid="scirp.44440-ref1">1</xref>]. Such programs are designed to gain access to computer systems and network resources, disrupt computer operations, and gather personal information without the consent of the system’s owner, thereby threatening the availability of the Internet, the integrity of its hosts, and the privacy of its users. Malware comes in a wide range of variations, such as viruses, worms, Trojan horses, rootkits, backdoors, botnets, spyware and adware. These classes are not mutually exclusive: a particular malware sample may exhibit the characteristics of multiple classes at the same time.</p><p>Malware is one of the most serious security threats facing the Internet today. According to a survey [<xref ref-type="bibr" rid="scirp.44440-ref2">2</xref>] conducted by FireEye in June 2013, 47% of the organizations surveyed had experienced malware security incidents or network breaches in the preceding year. Malware is continuously growing in volume (growing threat landscape), variety (innovative malicious methods) and velocity (fluidity of threats) [<xref ref-type="bibr" rid="scirp.44440-ref3">3</xref>]. It is evolving, becoming more sophisticated and using new ways to target computers and mobile devices. McAfee [<xref ref-type="bibr" rid="scirp.44440-ref4">4</xref>] catalogs over 100,000 new malware samples every day, which amounts to about 69 new threats every minute, or roughly one per second. With the increase in readily available and sophisticated tools, new-generation cyber threats and attacks are becoming more targeted, persistent and unknown. 
<xref ref-type="fig" rid="fig1">Figure 1</xref> compares traditional and advanced malware. Advanced malware is targeted, unknown, stealthy, personalized and zero-day, whereas traditional malware was broad, known, open and one-time. Once inside a host, advanced malware hides, replicates and disables host protections. After installation, it contacts its command and control servers for further instructions, which may be to steal data, infect other machines, or allow reconnaissance [<xref ref-type="bibr" rid="scirp.44440-ref5">5</xref>].</p><p>To spread malware, attackers exploit vulnerabilities in web services, browsers and operating systems, or use social engineering techniques to trick users into running malicious code. Malware authors use obfuscation techniques [<xref ref-type="bibr" rid="scirp.44440-ref6">6</xref>] such as dead code insertion, register reassignment, subroutine reordering, instruction substitution, code transposition and code integration to evade detection by traditional defenses such as firewalls, antivirus software and gateways, which typically use signature-based techniques and cannot detect previously unseen malicious executables. Commercial antivirus vendors cannot offer immediate protection against zero-day malware because they must first analyze a sample to create its signature.</p><p>To overcome the limitations of signature-based methods, malware analysis techniques, which can be either static or dynamic, are used. These techniques help analysts understand the risks and intentions associated with a malicious code sample. The insight so obtained can be used to react to new trends in malware development or to take preventive measures against future threats. Features derived from the analysis can be used to group unknown malware samples and classify them into existing families. 
This paper reviews techniques and approaches for analyzing and classifying malware executables.</p></sec><sec id="s2"><title>2. Malware Analysis</title><p>Before signatures can be created for newly arrived malware, the samples must be analyzed to understand the associated risks and intentions. A malicious program and its capabilities can be observed either by examining its code or by executing it in a safe environment.</p><sec id="s2_1"><title>2.1. Static Analysis</title><p>Analyzing malicious software without executing it is called static analysis. The detection patterns used in static analysis include string signatures, byte-sequence n-grams, syntactic library calls, control flow graphs and opcode (operation code) frequency distributions. If the executable is packed or encrypted, it has to be unpacked and decrypted before static analysis. The disassembler/debugger and memory dumper tools can be used to reverse com-</p></sec></sec></body><back><ref-list><title>References</title><ref id="scirp.44440-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Bayer, U., Moser, A., Kruegel, C. and Kirda, E. (2006) Dynamic Analysis of Malicious Code. Journal in Computer Virology, 2, 67-77. http://dx.doi.org/10.1007/s11416-006-0012-2</mixed-citation></ref><ref id="scirp.44440-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">(2013) The Need for Speed: 2013 Incident Response Survey, FireEye. 
http://www.inforisktoday.in/surveys/2013-incident-response-survey-s-18</mixed-citation></ref><ref id="scirp.44440-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">(2012) Addressing Big Data Security Challenges: The Right Tools for Smart Protection. http://www.trendmicro.com/cloud-content/us/pdfs/business/white-papers/wp_addressing-big-data-security-challenges.pdf</mixed-citation></ref><ref id="scirp.44440-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">(2013) Infographic: The State of Malware. http://www.mcafee.com/in/security-awareness/articles/state-of-malware-2013.aspx</mixed-citation></ref><ref id="scirp.44440-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">(2013) Next Generation Threats. http://www.fireeye.com/threat-protection/</mixed-citation></ref><ref id="scirp.44440-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">You, I. and Yim, K. (2010) Malware Obfuscation Techniques: A Brief Survey. Proceedings of International Conference on Broadband, Wireless Computing, Communication and Applications, Fukuoka, 4-6 November 2010, 297-300. http://dx.doi.org/10.1109/BWCCA.2010.85</mixed-citation></ref><ref id="scirp.44440-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">IDA Pro. https://www.hex-rays.com/products/ida/support/download_freeware.shtml</mixed-citation></ref><ref id="scirp.44440-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">OllyDbg. http://www.ollydbg.de/</mixed-citation></ref><ref id="scirp.44440-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">LordPE. http://www.woodmann.com/collaborative/tools/index.php/LordPE</mixed-citation></ref><ref id="scirp.44440-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">OllyDump. 
http://www.woodmann.com/collaborative/tools/index.php/OllyDump</mixed-citation></ref><ref id="scirp.44440-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Egele, M., Scholte, T., Kirda, E. and Kruegel, C. (2012) A Survey on Automated Dynamic Malware-Analysis Techniques and Tools. ACM Computing Surveys, 44, Article No. 6.</mixed-citation></ref><ref id="scirp.44440-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Moser, A., Kruegel, C. and Kirda, E. (2007) Limits of Static Analysis for Malware Detection. 23rd Annual Computer Security Applications Conference, Miami Beach, 421-430.</mixed-citation></ref><ref id="scirp.44440-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">(2014) Process Monitor. http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx</mixed-citation></ref><ref id="scirp.44440-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Capture BAT. https://www.honeynet.org/node/315</mixed-citation></ref><ref id="scirp.44440-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">(2014) Process Explorer. http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx</mixed-citation></ref><ref id="scirp.44440-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Process Hacker. http://processhacker.sourceforge.net/</mixed-citation></ref><ref id="scirp.44440-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Wireshark. http://www.wireshark.org/</mixed-citation></ref><ref id="scirp.44440-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Regshot. http://sourceforge.net/projects/regshot/</mixed-citation></ref><ref id="scirp.44440-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Norman Sandbox. 
http://sandbox.norman.no</mixed-citation></ref><ref id="scirp.44440-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Willems, C., Holz, T. and Freiling, F. (2007) Toward Automated Dynamic Malware Analysis Using CWSandbox. IEEE Security &amp; Privacy, 5, 32-39. http://dx.doi.org/10.1109/MSP.2007.45</mixed-citation></ref><ref id="scirp.44440-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Anubis. http://anubis.iseclab.org/</mixed-citation></ref><ref id="scirp.44440-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Bayer, U., Kruegel, C. and Kirda, E. (2006) TTAnalyze: A Tool for Analyzing Malware. Proceedings of the 15th European Institute for Computer Antivirus Research Annual Conference.</mixed-citation></ref><ref id="scirp.44440-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Dinaburg, A., Royal, P., Sharif, M. and Lee, W. (2008) Ether: Malware Analysis via Hardware Virtualization Extensions. Proceedings of the 15th ACM Conference on Computer and Communications Security, CCS’08, Alexandria, 27-31 October 2008, 51-62.</mixed-citation></ref><ref id="scirp.44440-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">ThreatExpert. http://www.threatexpert.com/submit.aspx</mixed-citation></ref><ref id="scirp.44440-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Schultz, M., Eskin, E., Zadok, E. and Stolfo, S. (2001) Data Mining Methods for Detection of New Malicious Executables. Proceedings of 2001 IEEE Symposium on Security and Privacy, Oakland, 14-16 May 2001, 38-49.</mixed-citation></ref><ref id="scirp.44440-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Cohen, W. (1995) Fast Effective Rule Induction. 
Proceedings of 12th International Conference on Machine Learning, San Francisco, 115-123.</mixed-citation></ref><ref id="scirp.44440-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Kolter, J. and Maloof, M. (2004) Learning to Detect Malicious Executables in the Wild. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 470-478.</mixed-citation></ref><ref id="scirp.44440-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Nataraj, L., Karthikeyan, S., Jacob, G. and Manjunath, B. (2011) Malware Images: Visualization and Automatic Classification. Proceedings of the 8th International Symposium on Visualization for Cyber Security, Article No. 4.</mixed-citation></ref><ref id="scirp.44440-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Nataraj, L., Yegneswaran, V., Porras, P. and Zhang, J. (2011) A Comparative Assessment of Malware Classification Using Binary Texture Analysis and Dynamic Analysis. Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, 21-30.</mixed-citation></ref><ref id="scirp.44440-ref30"><label>30</label><mixed-citation publication-type="other" xlink:type="simple">Kong, D. and Yan, G. (2013) Discriminant Malware Distance Learning on Structural Information for Automated Malware Classification. Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, 347-348.</mixed-citation></ref><ref id="scirp.44440-ref31"><label>31</label><mixed-citation publication-type="other" xlink:type="simple">Tian, R., Batten, L. and Versteeg, S. (2008) Function Length as a Tool for Malware Classification. 
Proceedings of the 3rd International Conference on Malicious and Unwanted Software, Fairfax, 7-8 October 2008, 57-64.</mixed-citation></ref><ref id="scirp.44440-ref32"><label>32</label><mixed-citation publication-type="other" xlink:type="simple">Tian, R., Batten, L., Islam, R. and Versteeg, S. (2009) An Automated Classification System Based on the Strings of Trojan and Virus Families. Proceedings of the 4th International Conference on Malicious and Unwanted Software, Montréal, 13-14 October 2009, 23-30.</mixed-citation></ref><ref id="scirp.44440-ref33"><label>33</label><mixed-citation publication-type="other" xlink:type="simple">Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I. (2009) The WEKA Data Mining Software: An Update. ACM SIGKDD Explorations Newsletter, 11, 10-18.</mixed-citation></ref><ref id="scirp.44440-ref34"><label>34</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Santos</surname><given-names>I.</given-names></name>, <name name-style="western"><surname>Nieves</surname><given-names>J.</given-names></name> and <name name-style="western"><surname>Bringas</surname><given-names>P.G.</given-names></name> (<year>2011</year>) <article-title>Semi-Supervised Learning for Unknown Malware Detection</article-title>. <source>International Symposium on Distributed Computing and Artificial Intelligence, Advances in Intelligent and Soft Computing</source>, <volume>91</volume>, <fpage>415</fpage>-<lpage>422</lpage>.</mixed-citation></ref><ref id="scirp.44440-ref35"><label>35</label><mixed-citation publication-type="other" xlink:type="simple">Moskovitch, R., Stopel, D., Feher, C., Nissim, N. and Elovici, Y. (2008) Unknown Malcode Detection via Text Categorization and the Imbalance Problem. 
Proceedings of the 6th IEEE International Conference on Intelligence and Security Informatics, Taipei, 17-20 June 2008, 156-161.</mixed-citation></ref><ref id="scirp.44440-ref36"><label>36</label><mixed-citation publication-type="other" xlink:type="simple">Santos, I., Nieves, J. and Bringas, P.G. (2011) Collective Classification for Unknown Malware Detection. Proceedings of the International Conference on Security and Cryptography, Seville, 18-21 July 2011, 251-256.</mixed-citation></ref><ref id="scirp.44440-ref37"><label>37</label><mixed-citation publication-type="other" xlink:type="simple">Siddiqui, M., Wang, M.C. and Lee, J. (2009) Detecting Internet Worms Using Data Mining Techniques. Journal of Systemics, Cybernetics and Informatics, 6, 48-53.</mixed-citation></ref><ref id="scirp.44440-ref38"><label>38</label><mixed-citation publication-type="other" xlink:type="simple">Zolkipli, M.F. and Jantan, A. (2011) An Approach for Malware Behavior Identification and Classification. Proceedings of the 3rd International Conference on Computer Research and Development, Shanghai, 11-13 March 2011, 191-194.</mixed-citation></ref><ref id="scirp.44440-ref39"><label>39</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Rieck</surname><given-names>K.</given-names></name>, <name name-style="western"><surname>Trinius</surname><given-names>P.</given-names></name>, <name name-style="western"><surname>Willems</surname><given-names>C.</given-names></name> and <name name-style="western"><surname>Holz</surname><given-names>T.</given-names></name> 
(<year>2011</year>) <article-title>Automatic Analysis of Malware Behavior Using Machine Learning</article-title>. <source>Journal of Computer Security</source>, <volume>19</volume>, <fpage>639</fpage>-<lpage>668</lpage>.</mixed-citation></ref><ref id="scirp.44440-ref40"><label>40</label><mixed-citation publication-type="other" xlink:type="simple">Anderson, B., Quist, D., Neil, J., Storlie, C. and Lane, T. (2011) Graph Based Malware Detection Using Dynamic Analysis. Journal in Computer Virology, 7, 247-258. http://dx.doi.org/10.1007/s11416-011-0152-x</mixed-citation></ref><ref id="scirp.44440-ref41"><label>41</label><mixed-citation publication-type="other" xlink:type="simple">Bayer, U., Comparetti, P.M., Hlauschek, C. and Kruegel, C. (2009) Scalable, Behavior-Based Malware Clustering. Proceedings of the 16th Annual Network and Distributed System Security Symposium.</mixed-citation></ref><ref id="scirp.44440-ref42"><label>42</label><mixed-citation publication-type="other" xlink:type="simple">Indyk, P. and Motwani, R. (1998) Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality. Proceedings of 30th Annual ACM Symposium on Theory of Computing, Dallas, 24-26 May 1998, 604-613.</mixed-citation></ref><ref id="scirp.44440-ref43"><label>43</label><mixed-citation publication-type="other" xlink:type="simple">Tian, R., Islam, M.R., Batten, L. and Versteeg, S. (2010) Differentiating Malware from Cleanwares Using Behavioral Analysis. Proceedings of 5th International Conference on Malicious and Unwanted Software (Malware), Nancy, 19-20 October 2010, 23-30.</mixed-citation></ref><ref id="scirp.44440-ref44"><label>44</label><mixed-citation publication-type="other" xlink:type="simple">Bailey, M., Oberheide, J., Andersen, J., Mao, Z.M., Jahanian, F. and Nazario, J. (2007) Automated Classification and Analysis of Internet Malware. 
Proceedings of the 10th International Conference on Recent Advances in Intrusion Detection, 4637, 178-197. http://dx.doi.org/10.1007/978-3-540-74320-0_10</mixed-citation></ref><ref id="scirp.44440-ref45"><label>45</label><mixed-citation publication-type="other" xlink:type="simple">Park, Y., Reeves, D., Mulukutla, V. and Sundaravel, B. (2010) Fast Malware Classification by Automated Behavioral Graph Matching. Proceedings of the 6th Annual Workshop on Cyber Security and Information Intelligence Research, Article No. 45.</mixed-citation></ref><ref id="scirp.44440-ref46"><label>46</label><mixed-citation publication-type="other" xlink:type="simple">Firdausi, I., Lim, C. and Erwin, A. (2010) Analysis of Machine Learning Techniques Used in Behavior Based Malware Detection. Proceedings of 2nd International Conference on Advances in Computing, Control and Telecommunication Technologies (ACT), Jakarta, 2-3 December 2010, 201-203.</mixed-citation></ref><ref id="scirp.44440-ref47"><label>47</label><mixed-citation publication-type="other" xlink:type="simple">Nari, S. and Ghorbani, A. (2013) Automated Malware Classification Based on Network Behavior. Proceedings of International Conference on Computing, Networking and Communications (ICNC), San Diego, 28-31 January 2013, 642-647.</mixed-citation></ref><ref id="scirp.44440-ref48"><label>48</label><mixed-citation publication-type="other" xlink:type="simple">Lee, T. and Mody, J.J. (2006) Behavioral Classification. Proceedings of the European Institute for Computer Antivirus Research Conference (EICAR’06).</mixed-citation></ref><ref id="scirp.44440-ref49"><label>49</label><mixed-citation publication-type="other" xlink:type="simple">Santos, I., Devesa, J., Brezo, F., Nieves, J. and Bringas, P.G. (2013) OPEM: A Static-Dynamic Approach for Machine Learning Based Malware Detection. 
Proceedings of International Conference CISIS’12-ICEUTE’12, Special Sessions Advances in Intelligent Systems and Computing, 189, 271-280.</mixed-citation></ref><ref id="scirp.44440-ref50"><label>50</label><mixed-citation publication-type="other" xlink:type="simple">Islam, R., Tian, R., Batten, L. and Versteeg, S. (2013) Classification of Malware Based on Integrated Static and Dynamic Features. Journal of Network and Computer Applications, 36, 646-656. http://dx.doi.org/10.1016/j.jnca.2012.10.004</mixed-citation></ref><ref id="scirp.44440-ref51"><label>51</label><mixed-citation publication-type="other" xlink:type="simple">Anderson, B., Storlie, C. and Lane, T. (2012) Improving Malware Classification: Bridging the Static/Dynamic Gap. Proceedings of 5th ACM Workshop on Security and Artificial Intelligence (AISec), 3-14.</mixed-citation></ref></ref-list></back></article>