<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JCC</journal-id><journal-title-group><journal-title>Journal of Computer and Communications</journal-title></journal-title-group><issn pub-type="epub">2327-5219</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jcc.2022.104002</article-id><article-id pub-id-type="publisher-id">JCC-116438</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  A New Tag Index Scheme Enables Fast Peptide Retrieval for Protein Identification
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Zhou</surname><given-names>Piyu</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Hou</surname><given-names>Xinhang</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wang</surname><given-names>Haipeng</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>School of Computer Science and Technology, Shandong University of Technology, Zibo, China</addr-line></aff><pub-date pub-type="epub"><day>06</day><month>04</month><year>2022</year></pub-date><volume>10</volume><issue>04</issue><fpage>14</fpage><lpage>23</lpage><history><date date-type="received"><day>11</day><month>03</month><year>2022</year></date><date date-type="rev-recd"><day>08</day><month>04</month><year>2022</year></date><date date-type="accepted"><day>11</day><month>04</month><year>2022</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Sequence tag indexes are used in computational proteomics to accelerate open-search-based identification of modified peptides and in-depth analysis of mass spectrometry data. In protein identification search engines, sequence tag indexes have played a prominent role over the past ten years owing to their fast search speed. However, in pursuit of lower index space consumption, some protein search engines adopt excessively concise index schemes, which leads to a higher computational burden. We propose a new tag index scheme named TIIP that strikes a better balance between space and time complexity. TIIP has a two-level hierarchical index structure that allows rapid retrieval of all peptide sequences and their corresponding masses. Theoretically, the index space consumption of TIIP is not much higher than that of typical tag index schemes, while the time complexity of sequence retrieval is reduced to O(1). In practice, TIIP achieves an improvement in searching speed on the order of one million fold over the brute force approach.
 
</p></abstract><kwd-group><kwd>Proteomics</kwd><kwd>Mass Spectrometry</kwd><kwd>Sequence Tag</kwd><kwd>Inverted Index</kwd><kwd>Tag Index</kwd><kwd>Open Search</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><sec id="s1_1"><title>1.1. Sequence Tag</title><p>The concept of the sequence tag was introduced by Mann and Wilm in 1994 [<xref ref-type="bibr" rid="scirp.116438-ref1">1</xref>]. It refers to a partial sequence of amino acids derived from a series of consecutive fragment ions, as shown in <xref ref-type="fig" rid="fig1">Figure 1</xref> [<xref ref-type="bibr" rid="scirp.116438-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref5">5</xref>].</p><p>As an analysis strategy for mass spectrometry (MS) data in proteomics, it is an intermediate method between database search [<xref ref-type="bibr" rid="scirp.116438-ref6">6</xref>] - [<xref ref-type="bibr" rid="scirp.116438-ref19">19</xref>] and de novo sequencing [<xref ref-type="bibr" rid="scirp.116438-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref10">10</xref>]. The sequence tag method can be used to prune the exponentially large search space in discovery proteomics and thereby filter candidate peptide sequences effectively. As a result, search engines can perform open search (expanding the search scope) without taking too much time [<xref ref-type="bibr" rid="scirp.116438-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref10">10</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref18">18</xref>]. 
In recent years, the sequence tag method has been adopted by many modern protein search engines, such as Open-pFind, MODplus, and TagGraph, as an essential speed-up technique and has become a fundamental peptide identification algorithm [<xref ref-type="bibr" rid="scirp.116438-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref18">18</xref>].</p></sec><sec id="s1_2"><title>1.2. Tag Index Design</title><p>A sequence tag index, or tag index for short, is a hash table that can quickly locate the peptide sequences related to a tag [<xref ref-type="bibr" rid="scirp.116438-ref20">20</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref21">21</xref>]. It serves as the inverted index that lets a search engine accomplish rapid retrieval. In essence, the tag index is a set of key-value pairs with a substring as the key and the positions of this substring in the original string as the value. Querying with an inverted index is a commonly used approach in search engine techniques. Whereas brute force matching has at least O(N) time complexity, the inverted index method transforms a sequence tag into a hash code that addresses an index entry, so tag-containing sequences can be retrieved in O(1) time. 
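</p><p>As a minimal sketch of the lookup described above (not the implementation of any particular search engine), the following Python code builds a k-length tag index as a hash table and contrasts it with a brute force scan. The function names, the toy protein sequences and the choice to store (protein ID, start position) pairs as values are illustrative assumptions.</p><preformat>
```python
from collections import defaultdict


def build_tag_index(proteins, k=5):
    """Map every k-length substring (tag) to the places where it occurs."""
    index = defaultdict(list)
    for protein_id, sequence in enumerate(proteins):
        for start in range(len(sequence) - k + 1):
            index[sequence[start:start + k]].append((protein_id, start))
    return index


def brute_force_search(proteins, tag):
    """Scan every protein for the tag; the cost grows with the database size."""
    hits = []
    for protein_id, sequence in enumerate(proteins):
        start = sequence.find(tag)
        while start != -1:
            hits.append((protein_id, start))
            start = sequence.find(tag, start + 1)
    return hits


# Toy database (hypothetical sequences, for illustration only).
proteins = ["MKWVTFISLLK", "AAWVTFIRGG"]
index = build_tag_index(proteins, k=5)

# One hash lookup returns the same hits as the full scan.
print(index.get("WVTFI", []))
print(brute_force_search(proteins, "WVTFI"))
```
</preformat><p>In this sketch the lookup cost depends on the number of hits returned rather than on the size of the database, which is the property the O(1) estimate refers to.</p><p>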
As shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>, exhaustive sequential substring matching can be avoided by using a tag index, and the search time of a sequence tag can be reduced [<xref ref-type="bibr" rid="scirp.116438-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref10">10</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.116438-ref18">18</xref>].</p><p>To facilitate the implementation of its string matching algorithm, TagGraph constructs a database index structure with a suffix array, in which the structured protein sequence information is called the FM-indexed protein [<xref ref-type="bibr" rid="scirp.116438-ref18">18</xref>]. The other two search engines, Open-pFind and MODplus, adopt a k-length tag index, so they can find the corresponding peptide sequence information from k-length tags. To save memory space, the index entries of Open-pFind record only the protein ID and the starting position of the tag in the protein sequence [<xref ref-type="bibr" rid="scirp.116438-ref7">7</xref>]. MODplus goes further in this direction. In MODplus, all protein sequences are concatenated into one linearized sequence delimited with the character “$”. Therefore, the tag index of MODplus only needs to store the start position of a tag in the linearized sequence string, which further compresses the memory space occupied by the inverted index [<xref ref-type="bibr" rid="scirp.116438-ref13">13</xref>]. The offset addresses of tags in Open-pFind and MODplus are encoded by dictionary order.</p><p>Both Open-pFind and MODplus compress the index information as much as possible in order to save memory space while still searching tags rapidly. 
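</p><p>The linearized layout attributed to MODplus above can be illustrated with a short sketch. This is a hypothetical reconstruction from the description in the text, not code taken from MODplus; the function names and toy sequences are assumptions, and the dictionary-order encoding of offsets is not modeled.</p><preformat>
```python
from bisect import bisect_right
from collections import defaultdict


def linearize(proteins, delimiter="$"):
    """Concatenate proteins into one string and record where each one starts."""
    starts, offset = [], 0
    for sequence in proteins:
        starts.append(offset)
        offset += len(sequence) + len(delimiter)
    return delimiter.join(proteins), starts


def build_linear_tag_index(linear, k=5, delimiter="$"):
    """Store a single integer (position in the linearized string) per tag hit."""
    index = defaultdict(list)
    for pos in range(len(linear) - k + 1):
        tag = linear[pos:pos + k]
        if delimiter not in tag:  # skip windows that span two proteins
            index[tag].append(pos)
    return index


def locate(pos, starts):
    """Recover (protein ID, offset inside the protein) from a linear position."""
    protein_id = bisect_right(starts, pos) - 1
    return protein_id, pos - starts[protein_id]


# Toy database (hypothetical sequences, for illustration only).
proteins = ["MKWVTFISLLK", "AAWVTFIRGG"]
linear, starts = linearize(proteins)
index = build_linear_tag_index(linear)
hits = [locate(pos, starts) for pos in index.get("WVTFI", [])]
print(linear, starts, hits)
```
</preformat><p>Storing a single integer per occurrence keeps the index small; in this sketch the protein ID is recovered by binary search over the protein start offsets.</p><p>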
However, when retrieving sequences that contain any of the extracted tags, both engines perform unnecessary amino acid traversal and mass calculation because of the way their tag indexes are designed, which increases the time cost. With the advances in computer hardware, memory usage in ordinary mass spectrometry data analysis is no longer a problem even on a personal computer. Therefore, we should focus on how to retrieve sequences more quickly.</p><p>Here, we introduce TIIP (Tag Index of Intact Proteins), a novel tag index design scheme. TIIP has a two-level index structure in which the tag index and the protein database cooperate effectively during search. With TIIP, we can rapidly retrieve all peptide sequences that satisfy user-specified parameters and quickly obtain their theoretical masses.</p></sec></sec><sec id="s2"><title>2. Design of TIIP</title><sec id="s2_1"><title>2.1. Analysis of Sequence Retrieval Approach</title><p>Index design in a search engine affects both the workflow design and the efficiency of peptide sequence retrieval. The open search strategy in Open-pFind retrieves all peptide sequences containing at least one extracted tag by extending the tag sequence from both ends. This strategy is compatible with semi-specific and non-specific searches.</p><p>The search strategy of MODplus changes with user-specified parameters. When searching for non-specific peptide sequences with MODplus, the sequence search space is similar to the one considered by Open-pFind.</p><p>However, as shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>, it is very uneconomical to directly retrieve a large number of non-specific peptide sequences at the beginning of the search, because a large amount of time is wasted on enumerating incorrect peptide sequences and iterating over amino acids. 
Theoretically, assuming that the average sequence length is L, the time complexity of retrieving all sequences is O(L<sup>2</sup>) for non-specific searches, while the time complexity of retrieving only fully specific sequences is O(1). In practice, however, the index structures in Open-pFind and MODplus do not use cleavage information. Therefore, because the cleavage sites still have to be located under such index design schemes, even retrieving only fully specific sequences takes as much time as enumerating non-specific sequences.</p><p>The design and implementation of the index structure and the search flow affect each other, so considering the needs of the search helps index design. If all peptide sequences within the mass tolerance are retrieved, the many non-specific peptide sequences reduce the efficiency of identification. From another point of view, a fully specific sequence and a non-specific sequence from the same protein differ by only several terminal amino acids, so we can regard terminal mass shifts as special-mass modifications whose masses equal the cumulative masses of the terminal amino acids. On this basis, we can retrieve only fully specific sequences and divide the search flow into two stages. The first stage uses the tag index to obtain a candidate set of fully specific peptide sequences and filters it by score, while the second stage performs precursor mass difference calculation, semi-specific/non-specific sequence detection and other operations on the candidate peptide sequences. In view of this analysis, TIIP is designed as a scheme for fast retrieval of fully specific peptide sequences; it defers semi-specific and non-specific detection, transferring that identification burden from the sequence retrieval stage to the subsequent stage.</p></sec><sec id="s2_2"><title>2.2. 
Tag Index of Intact Proteins</title><p>In short, the tag index and protein database structure we designed can rapidly retrieve peptide sequences and quickly calculate the difference between the monoisotopic masses of precursor ions and the theoretical masses of peptide sequences.</p><p>To reduce index space consumption, we use a strategy that does not need to explicitly generate peptide sequences for database searching, as in the designs of Open-pFind and MODplus. The corresponding cleavage sites are recorded as part of the tag index structure, so fully specific peptide sequences can be retrieved immediately. Additionally, so that the theoretical masses of peptide sequences can be calculated quickly, the theoretical sequence masses are recorded in the tag index entries. Each index entry formed in this way therefore needs three integers and one floating-point number: the protein ID, the left cleavage site, the right cleavage site and the theoretical mass of the related peptide sequence. As shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>, each tag stores, in turn, the protein and peptide sequence information of every related index entry that matches the configured number of missed cleavage sites.</p></sec><sec id="s2_3"><title>2.3. Refinement of TIIP</title><p>After the initial design of the tag index, further refinement is required. As shown in <xref ref-type="fig" rid="fig5">Figure 5</xref>, we refine the format of the sequence records, including cleavage sites and theoretical masses, as follows.</p><p>Firstly, the cleavage site information is stored in the protein database, and each protein sequence corresponds to a list of all its cleavage sites. In order to acquire peptide sequences, the sites are stored as offset addresses, and each list also includes zero and the length of the protein sequence. 
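</p><p>A cleavage site list of this kind might be built as in the following sketch. The cleavage rule (cut after K or R, a simplified trypsin-like rule), the function names and the toy sequence are assumptions made for illustration; the text above does not fix a particular enzyme.</p><preformat>
```python
def cleavage_sites(sequence, residues="KR"):
    """Return cut offsets for one protein, including 0 and len(sequence).

    Assumption: a simplified trypsin-like rule that cuts after K or R.
    """
    sites = [0]
    for offset, amino_acid in enumerate(sequence, start=1):
        if amino_acid in residues and offset != len(sequence):
            sites.append(offset)
    sites.append(len(sequence))
    return sites


def peptides_between_sites(sequence, sites, max_missed=3):
    """Yield (left, right, peptide) for site pairs within the missed-cleavage limit.

    The number of missed cleavage sites of a peptide is the distance between
    its two positions in the site list minus one.
    """
    for i in range(len(sites) - 1):
        last = min(i + 1 + max_missed, len(sites) - 1)
        for j in range(i + 1, last + 1):
            yield sites[i], sites[j], sequence[sites[i]:sites[j]]


# Toy protein (hypothetical sequence, for illustration only).
protein = "MKWVTFISLLKAAR"
sites = cleavage_sites(protein)
print(sites)
for left, right, peptide in peptides_between_sites(protein, sites, max_missed=1):
    print(left, right, peptide)
```
</preformat><p>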
This refinement makes it convenient to check the number of missed cleavage sites, because any two neighboring sites of a protein sequence are adjacent in the list, so their offset addresses are also adjacent.</p><p>Secondly, according to the list of cleavage sites belonging to each protein sequence, we calculate the cumulative mass of the amino acid residues between each pair of adjacent cleavage sites and store these masses as a list as well, so that theoretical sequence masses can be calculated quickly. The length of the cumulative mass list is equal to the length of the related cleavage site list minus one.</p><p>Finally, each index entry records only the protein ID and the offset addresses of the cleavage sites at the N-terminus and C-terminus of a peptide, which are the two sites closest to the tag in the protein sequence. Each index entry points to the offset addresses of a pair of cleavage sites in the list, so the tag index and the protein database together form a two-level index.</p></sec><sec id="s2_4"><title>2.4. Sequence Retrieval of TIIP</title><p>As shown in <xref ref-type="fig" rid="fig6">Figure 6</xref>, when retrieving all possible peptide sequences, a sequence tag can be used to immediately acquire the shortest fully specific sequence. We then extend the cleavage sites of this shortest sequence and enumerate all possible sequences within the allowed number of missed cleavage sites. At the same time, we calculate the theoretical masses of the generated sequences and check whether they are within the specified mass tolerance.</p><p>Theoretically, compared with the tag index schemes in Open-pFind and MODplus, the TIIP scheme avoids unnecessary enumeration of amino acids when retrieving sequences. If the number of missed cleavage sites is fixed, the time complexity of retrieving sequences is reduced to O(1), and the corresponding memory space consumption is acceptable.</p></sec></sec><sec id="s3"><title>3. Experiment and Result</title><sec id="s3_1"><title>3.1. 
Baseline, Dataset and Parameters</title><p>Because neither Open-pFind nor MODplus provides open source code or an independently executable index component, it is difficult for us to conduct a direct comparison test. Here, we only test and report the performance of the TIIP scheme and the brute force approach. The TIIP tested in this paper is implemented in Python.</p><p>The test database, containing 20,350 human proteins, was downloaded from UniProt on March 29, 2020. From the downloaded database, we randomly chose 10,000 non-redundant peptide sequences and generated the corresponding simulated mass spectra as the test dataset. <xref ref-type="fig" rid="fig7">Figure 7</xref> shows an example of a simulated spectrum.</p><p>The parameter settings and the software and hardware environment used during the test are shown in <xref ref-type="table" rid="table1">Table 1</xref> and <xref ref-type="table" rid="table2">Table 2</xref>.</p></sec><sec id="s3_2"><title>3.2. Memory Space and Time Cost of Tag Index</title><p>Under the environment and parameter settings mentioned above, we tested the space consumption of the TIIP index design scheme and the time cost of index generation. The memory space consumption of the 5-tag index (an index with tags of length 5) is acceptable, and index generation is fast. 
The details are shown in <xref ref-type="fig" rid="fig8">Figure 8</xref> and <xref ref-type="fig" rid="fig9">Figure 9</xref>.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Key parameters of sequence retrieval</title></caption><table><thead><tr><th align="center" valign="middle" >Property</th><th align="center" valign="middle" >Value</th></tr></thead><tbody><tr><td align="center" valign="middle" >Tag Length</td><td align="center" valign="middle" >5</td></tr><tr><td align="center" valign="middle" >Precursor Mass Window</td><td align="center" valign="middle" >[−500, 500]</td></tr><tr><td align="center" valign="middle" >Max Missed Cleavage Number</td><td align="center" valign="middle" >3</td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>Hardware and software environment</title></caption><table><thead><tr><th align="center" valign="middle" >Property</th><th align="center" valign="middle" >Value</th></tr></thead><tbody><tr><td align="center" valign="middle" >CPU</td><td align="center" valign="middle" >Intel(R) Core(TM) i5-4210M CPU @ 2.60 GHz</td></tr><tr><td align="center" valign="middle" >Memory (RAM)</td><td align="center" valign="middle" >16.0 GB DDR3L 1600 MHz</td></tr><tr><td align="center" valign="middle" >Operating System</td><td align="center" valign="middle" >Windows 10 64 bit Professional</td></tr></tbody></table></table-wrap></sec><sec id="s3_3"><title>3.3. Time Cost of Peptide Sequences Retrieval</title><p>This test uses full enumeration, meaning that all fully specific sequences are retrieved. We compared the time consumption of full traversal (the brute force method) with that of the TIIP scheme (searching with the inverted index) using sequence tags of length 5. 
The detailed results are shown in <xref ref-type="fig" rid="fig10">Figure 10</xref>.</p><p>With the brute force method, the average time cost for a single spectrum is 2156.59 s. With the TIIP scheme, the total time consumption for 10,000 spectra is only 60.73 s, which includes the 31.09 s consumed by database and index generation. The average time cost for a single spectrum is therefore only about 0.006 s. For larger sets of spectra, the database building time is amortized further, so the average time cost per spectrum becomes even shorter.</p><p>The search based on TIIP is thus very fast, thanks to the enumeration of only fully specific peptide sequences. Compared with the brute force method (2156.59 s versus about 0.006 s per spectrum), TIIP improves searching speed by a factor of several hundred thousand, on the order of one million fold.</p></sec></sec><sec id="s4"><title>4. Conclusion</title><p>We designed the TIIP scheme, which uses a two-level index structure consisting of a tag hash table and a pre-digested protein database. When searching with TIIP, users can quickly acquire candidate peptide sequences restricted only by the number of missed cleavage sites and the open search mass window. Compared with the index designs of Open-pFind and MODplus, the TIIP scheme avoids a large amount of unnecessary enumeration of non-specifically digested peptide sequences and thus saves search time. The TIIP scheme is best suited to the identification of fully specific peptides, while semi-specifically and non-specifically digested peptides are supported as well. With TIIP, the number of candidate peptide sequences is considerably reduced after scoring and pruning. Thus, semi-specific and non-specific peptide sequences only need to be determined from the flanking masses of the matched sequence tags. 
Therefore, the computational costs of finding semi-specific and non-specific peptides are reduced.</p><p>Building on the current research progress, we will continue to develop the tag index design to improve the performance of protein search engines.</p></sec><sec id="s5"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s6"><title>Cite this paper</title><p>Zhou, P., Hou, X. and Wang, H. (2022) A New Tag Index Scheme Enables Fast Peptide Retrieval for Protein Identification. Journal of Computer and Communications, 10, 14-23. https://doi.org/10.4236/jcc.2022.104002</p></sec></body><back><ref-list><title>References</title><ref id="scirp.116438-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Mann, M. and Wilm, M. (1994) Error-Tolerant Identification of Peptides in Sequence Databases by Peptide Sequence Tags. Analytical Chemistry, 66, 4390-4399. https://doi.org/10.1021/ac00096a002</mixed-citation></ref><ref id="scirp.116438-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Tabb, D.L., Saraf, A. and Yates, J.R. (2003) GutenTag: High-Throughput Sequence Tagging via an Empirically Derived Fragmentation Model. Analytical Chemistry, 75, 6415-6421. https://doi.org/10.1021/ac0347462</mixed-citation></ref><ref id="scirp.116438-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Aebersold, R. and Mann, M. (2003) Mass Spectrometry-Based Proteomics. Nature, 422, 198-207. https://doi.org/10.1038/nature01511</mixed-citation></ref><ref id="scirp.116438-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Sunyaev, S., Liska, A.J., Golod, A., Shevchenko, A. and Shevchenko, A. (2003) MultiTag: Multiple Error-Tolerant Sequence Tag Search for the Sequence-Similarity Identification of Proteins by Mass Spectrometry. Analytical Chemistry, 75, 1307-1315. 
https://doi.org/10.1021/ac026199a</mixed-citation></ref><ref id="scirp.116438-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Patterson, S.D. and Aebersold, R.H. (2003) Proteomics: The First Decade and Beyond. Nature Genetics, 33, 311-323. https://doi.org/10.1038/ng1106</mixed-citation></ref><ref id="scirp.116438-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Bern, M., Kil, Y.J. and Becker, C. (2012) Byonic: Advanced Peptide and Protein Identification Software. Current Protocols in Bioinformatics, 40, 13.20.1-13.20.14. https://doi.org/10.1002/0471250953.bi1320s40</mixed-citation></ref><ref id="scirp.116438-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Chi, H., et al. (2018) Comprehensive Identification of Peptides in Tandem Mass Spectra Using an Efficient Open Search Engine. Nature Biotechnology, 36, 1059-1061.</mixed-citation></ref><ref id="scirp.116438-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Fu, Y., et al. (2004) Exploiting the Kernel Trick to Correlate Fragment Ions for Peptide Identification via Tandem Mass Spectrometry. Bioinformatics, 20, 1948-1954.</mixed-citation></ref><ref id="scirp.116438-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Na, S., Bandeira, N. and Paek, E. (2012) Fast Multi-Blind Modification Search through Tandem Mass Spectrometry. Molecular &amp; Cellular Proteomics, 11, M111.010199. https://doi.org/10.1074/mcp.M111.010199</mixed-citation></ref><ref id="scirp.116438-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Tanner, S., et al. (2005) InsPecT: Identification of Posttranslationally Modified Peptides from Tandem Mass Spectra. 
Analytical Chemistry, 77, 4626-4639. https://doi.org/10.1021/ac050102d</mixed-citation></ref><ref id="scirp.116438-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Wang, X., Li, Y., Wu, Z., Wang, H., Tan, H. and Peng, J. (2014) JUMP: A Tag-Based Database Search Tool for Peptide Identification with High Sensitivity and Accuracy. Molecular &amp; Cellular Proteomics, 13, 3663-3673. https://doi.org/10.1074/mcp.O114.039586</mixed-citation></ref><ref id="scirp.116438-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Yates, J.R., Eng, J.K., McCormack, A.L. and Schieltz, D. (1995) Method to Correlate Tandem Mass Spectra of Modified Peptides to Amino Acid Sequences in the Protein Database. Analytical Chemistry, 67, 1426-1436. https://doi.org/10.1021/ac00104a020</mixed-citation></ref><ref id="scirp.116438-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Na, S., Kim, J. and Paek, E. (2019) MODplus: Robust and Unrestrictive Identification of Post-Translational Modifications Using Mass Spectrometry. Analytical Chemistry, 91, 11324-11333. https://doi.org/10.1021/acs.analchem.9b02445</mixed-citation></ref><ref id="scirp.116438-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Kong, A.T., Leprevost, F.V., Avtonomov, D.M., Mellacheruvu, D. and Nesvizhskii, A.I. (2017) MSFragger: Ultrafast and Comprehensive Peptide Identification in Mass Spectrometry-Based Proteomics. Nature Methods, 14, 513-520. https://doi.org/10.1038/nmeth.4256</mixed-citation></ref><ref id="scirp.116438-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Wang, L.H., et al. (2007) pFind 2.0: A Software Package for Peptide and Protein Identification via Tandem Mass Spectrometry. Rapid Communications in Mass Spectrometry, 21, 2985-2991. 
https://doi.org/10.1002/rcm.3173</mixed-citation></ref><ref id="scirp.116438-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Li, D., et al. (2005) pFind: A Novel Database-Searching Software System for Automated Peptide and Protein Identification via Tandem Mass Spectrometry. Bioinformatics, 21, 3049-3050. https://doi.org/10.1093/bioinformatics/bti439</mixed-citation></ref><ref id="scirp.116438-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Chi, H., et al. (2015) pFind-Alioth: A Novel Unrestricted Database Search Algorithm to Improve the Interpretation of High-Resolution MS/MS Data. Journal of Proteomics, 125, 89-97. https://doi.org/10.1016/j.jprot.2015.05.009</mixed-citation></ref><ref id="scirp.116438-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Devabhaktuni, A., et al. (2019) TagGraph Reveals Vast Protein Modification Landscapes from Large Tandem Mass Spectrometry Datasets. Nature Biotechnology, 37, 469-479. https://doi.org/10.1038/s41587-019-0067-5</mixed-citation></ref><ref id="scirp.116438-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Craig, R. and Beavis, R.C. (2004) TANDEM: Matching Proteins with Tandem Mass Spectra. Bioinformatics, 20, 1466-1467. https://doi.org/10.1093/bioinformatics/bth092</mixed-citation></ref><ref id="scirp.116438-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Zhou, C., et al. (2010) Speeding up Tandem Mass Spectrometry-Based Database Searching by Longest Common Prefix. BMC Bioinformatics, 11, Article No. 577. https://doi.org/10.1186/1471-2105-11-577</mixed-citation></ref><ref id="scirp.116438-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Lu, B. and Chen, T. 
(2003) A Suffix Tree Approach to the Interpretation of Tandem Mass Spectra: Applications to Peptides of Non-Specific Digestion and Post-Translational Modifications. Bioinformatics, 19, ii113-ii121. https://doi.org/10.1093/bioinformatics/btg1068</mixed-citation></ref></ref-list></back></article>