Bioinformatics: Piercing and Pinpointing Moonlighting Proteins and Beyond

William (Woosung) Jung, Grade 11, ISEC 2015 Second Place Winner


Biologists build upon prior knowledge in order to achieve great scientific discoveries and developments. For generations, the laboratory was the crucible of scientific inquiry, an arena where hypotheses and theories were tested. However, since the arrival of the computer, another field has evolved that bolsters the abilities of scientists in both scope and speed: bioinformatics. According to Luscombe, Greenbaum, and Gerstein (2001), “bioinformatics [conceptualizes] biology in terms of macromolecules…and then applies ‘informatics’ techniques (derived from disciplines such as applied math, computer science, and statistics) to understand and organize the information associated with these molecules on a large scale.” What once took scientists grueling hours and massive funding may now be achieved in a matter of minutes through programs written in languages such as Perl and C++. However, computer programming is only a tool for scientists to wield. Tools may till the earth or chart the stars, but they require human minds to guide them: to maneuver around rocks and to look beyond the sky. This paper examines the individual strengths and limitations of biology and computer science as they aid research into protein multitasking, also known as moonlighting. It then discusses why bioinformatics, as an interdisciplinary approach, offers one of the greatest opportunities for piercing the moonlighting mystery. First, however, we shall discuss the meaning and significance of moonlighting proteins.

As discussed by Jeffery (2015), moonlighting proteins are a class of multifunctional proteins: a single protein is “capable of performing multiple physiologically relevant biochemical or biophysical functions that are not due to gene fusions, multiple RNA splice variants, or pleiotropic effects.” One way in which moonlighting proteins affect how our cells function can be observed when an enzyme moonlights in response to changes in cellular conditions, prompting changes in the regulation of a biochemical pathway such as glycolysis or, in plants, photosynthesis. By changing the biochemical pathway, moonlighting alters how substrates are catalyzed and which products they form. But the roles of these proteins are not limited to regulating the transcription and translation of genetic material; in fact, moonlighting proteins have far-reaching effects across the biological world. Watanabe et al. (1991) observed that an enzyme in glycolysis, phosphoglucose isomerase, also serves as a cytokine that is involved in breast-cancer tumor metastasis. Examining these moonlighting proteins may thus open the way into poorly understood diseases as well as corresponding vaccines and remedies (Rasch et al., 2015). By understanding the various phases of moonlighting that an enzyme undergoes, doctors might one day intervene in the metastasis of cancerous growths and lower mortality rates considerably. Because the field remains relatively obscure, there is still a vast reservoir of potential medical and biological knowledge yet to be tapped.

To sift through the nebulous clouds surrounding moonlighting proteins, however, biologists face the daunting task of parsing massive quantities of data. As Hernández et al. (2014) note, “usually, [they] are revealed experimentally by serendipity, and the proteins described probably represent just the tip of the iceberg.” But luck cannot be deemed a solid foundation for scientific inquiry. Instead, as we learn more about moonlighting proteins, scientists are able to identify their characteristics and patterns with growing certainty, aiding further discoveries. While identifying the unknown attributes of moonlighting is a major challenge for the scientific and technological community, as discussed later, the whole process begins with biology.

Biologists around the world are akin to librarians in the study of our living world. Their chief responsibility, in the scope of this essay, falls under ontology, the specification and description of a natural object’s existence. In this way, Felton (2002) points out, biologists are responsible for cataloguing and partitioning their observations for further research and for others’ reference:

“In the past, biologists and chemists made improvements in their laboratory methods, making dramatic leaps in understanding, but also enabling other scientists to discover new information based on the improved techniques. Take, for example, the polymerase chain reaction. This discovery cobbled together existing information, but it became a tool that thousands of scientists have used to further their own study.” (p. 29)

Unfortunately, this responsibility carries a heavy burden. One of the most significant hindrances faced by biologists and chemists who seek to build upon the findings of others is semantic variance, the unavoidable fact that language and descriptions are subjective and therefore not always perfectly accurate (Sánchez, Batet, Martínez, & Domingo-Ferrer, 2015). This inaccuracy compounds the difficulty of researching a subject as massive and nebulous as moonlighting proteins because it adds uncertainty to every ontological definition of a protein. Even so, it is that uncertainty that compels biologists to review and discover previously unknown attributes of cells and their respective functions (Hernández et al., 2014). Especially since the publication of the Human Genome Project’s draft sequence in 2001, the next leap for biologists has been either to develop “a novel program” or to find “a unique way to couple existing program[s]” instead of holding on to the increasingly inefficient wet-laboratory approach. In this vein, it is certain that “a significant part of the future of biology lies with computer methods” (Felton, 2002, p. 29).

Through observations made by biologists who seek moonlighting proteins, hints and clues can be discovered that may never have been identified precisely because no other scientist was specifically looking for them. By broadening the focus, biologists sacrifice time in favor of scope; but now there is a tool that can help them save that most precious resource.

Computer programming is not restricted to contributing to any particular field; it is a powerful multipurpose tool for the processing and analysis of vast quantities of data. One tool used by programmers that is particularly relevant to the detection of moonlighting proteins is called “data mining.” Data mining is the identification of statistical patterns, predictive models, and hidden relationships, usually among massive amounts of data. This tool was primarily used for economic analysis, but many have posited that it could be applied to the field of biology to great effect. Indeed, the successes of this union between computational data analysis and biological study created the field of bioinformatics (Luscombe et al., 2001). The advantage of applying computer programming to biomedical research through data mining is not only that it saves immense time but also that it may help standardize the ontological descriptions of the human genome, further advancing the identification and understanding of moonlighting proteins. Yet bioinformatics is inherently dependent upon the expertise of those who analyze the data it generates, which necessitates the presence of one or more individuals who are well versed in programming as well as biological research (Raza, 2010). Indeed, one of the core reasons it is so important to unify biological and programming knowledge is that there is a finite amount of data that biologists themselves can appraise. As a result, some processed data sets may need to be discarded in order to allow memory and attention to be allocated to new, more promising inquiries. Unfortunately, computer programmers alone do not have the expertise and scientific knowledge required to make such decisions independently, which necessitates the active engagement of biologists in the interdisciplinary field of bioinformatics.
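
As a rough illustration of the kind of pattern-finding described above, the following Python sketch groups annotation records by protein and flags any protein associated with more than one distinct function. The records, the placeholder protein names, and the helper flag_multifunction_candidates are hypothetical and were invented for this example; they are not data or methods from the cited studies, and real data mining would operate on far larger and messier data.

```python
from collections import defaultdict

# Hypothetical toy records: (protein identifier, annotated function).
# These placeholders are invented for illustration only.
RECORDS = [
    ("protein_A", "glycolysis enzyme"),
    ("protein_A", "cytokine"),
    ("protein_B", "chaperone"),
    ("protein_C", "glycolysis enzyme"),
    ("protein_C", "glycolysis enzyme"),  # repeated annotation, same function
]


def flag_multifunction_candidates(records):
    """Return proteins annotated with two or more distinct functions."""
    functions_by_protein = defaultdict(set)
    for protein, function in records:
        functions_by_protein[protein].add(function)
    # Keep only proteins whose set of distinct functions has at least two members.
    return {
        protein: sorted(functions)
        for protein, functions in functions_by_protein.items()
        if len(functions) >= 2
    }


if __name__ == "__main__":
    for protein, functions in flag_multifunction_candidates(RECORDS).items():
        print(protein, "->", ", ".join(functions))
```

A flagged protein is only a candidate: a biologist would still have to judge whether the extra function is genuine moonlighting or the result of a gene fusion or splice variant, which is the division of labor this paragraph describes.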

By combining the field of biology with information technology, research groups have already begun compiling databases of moonlighting proteins with standardized descriptions and identifying attributes for the biomedical community to appraise (Hernández, Ferragut, et al., 2014). The biologists “encountered the difficulty of collecting examples of [multitasking] proteins because of the lack of a broad database, so the effort to gather the examples was often one of the main challenges.” As a result of their compilation, the research group began to realize that moonlighting proteins may be even more significant than previously speculated. What if, for instance, the multitasking was not restricted to a pair of functions? Hernández and his team posited that a protein may have three or more moonlighting functions which, without their database or one like it, would require excessive time and no small luck to discover. Called “MultitaskProtDB,” their database lists over 288 moonlighting proteins, their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, and bibliographic references so that others are able to follow up on the research that supports the database. The system stores its data in MySQL, and code written in PHP helps researchers refine their searches for particular moonlighting proteins within the database. At the same time, Mani et al. (2014) created another database, called “MoonProt,” that also lists the known moonlighting proteins. At first the creation of two databases may seem redundant, but there is certainly a benefit to this overlap. By creating these interrelated databases, Mani’s and Hernández’s projects offer a chance to compare and contrast the ontological accuracy of their entries to help form standardized profiles for each of more than 200 proteins. The overlap also provides constant comparison between the two databases, allowing bioinformaticians to see which technologies, functions, and biological queries are most promising or, in some cases, need to be improved for the advancement of the field.
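
A minimal sketch of how a database of this kind might be organized, searched, and compared against a second list is given below. It uses Python's built-in sqlite3 module as a stand-in for the MySQL and PHP stack described above; the table name, column names, accession strings, and sample rows are assumptions made for illustration and do not reproduce the actual MultitaskProtDB or MoonProt schemas or contents.

```python
import sqlite3

# Hypothetical schema loosely based on the fields named in the text;
# this is not the real MultitaskProtDB layout.
SCHEMA = """
CREATE TABLE proteins (
    uniprot_accession   TEXT PRIMARY KEY,
    canonical_function  TEXT NOT NULL,
    additional_function TEXT NOT NULL,
    oligomeric_state    TEXT,
    reference           TEXT
)
"""

# Placeholder rows; the accession strings are invented, not real UniProt IDs.
SAMPLE_ROWS = [
    ("ACC001", "glycolytic enzyme", "cytokine", "dimer", "ref 1"),
    ("ACC002", "chaperone", "adhesin", "monomer", "ref 2"),
    ("ACC003", "glycolytic enzyme", "transcription regulator", "tetramer", "ref 3"),
]


def build_database(rows):
    """Create an in-memory database and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO proteins VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def search_by_function(conn, keyword):
    """Return accessions whose canonical or additional function contains keyword."""
    pattern = f"%{keyword}%"
    cursor = conn.execute(
        "SELECT uniprot_accession FROM proteins "
        "WHERE canonical_function LIKE ? OR additional_function LIKE ? "
        "ORDER BY uniprot_accession",
        (pattern, pattern),
    )
    return [row[0] for row in cursor]


def compare_accessions(first, second):
    """Report which accessions two databases share and which are unique to each."""
    first, second = set(first), set(second)
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


if __name__ == "__main__":
    conn = build_database(SAMPLE_ROWS)
    print(search_by_function(conn, "glycolytic"))
    # A second, equally hypothetical accession list standing in for another database.
    other_database = ["ACC002", "ACC003", "ACC004"]
    print(compare_accessions(search_by_function(conn, ""), other_database))
```

Keying each record on a shared accession number is what makes the comparison possible: entries found in only one list point to proteins whose descriptions a curator may want to check in the other.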

The value of accuracy in both medical and scientific communities cannot be overstated, yet there are still challenges that must be faced by each field, whether alone or together. For biologists, the time-consuming and challenging process of identifying moonlighting indicators will be substantially eased by the raw processing power of modern computing. However, even with state-of-the-art technology, databases cannot brute-force their way through millions of lines of inconsequential data because of physical memory and storage constraints. To improve the quality of the data and thereby increase efficiency, the cooperation of biologists is crucial in identifying new moonlighting indicators and proteins. Bioinformatics is not only one of the most groundbreaking interdisciplinary approaches to decrypting the human genome; it also empowers scientists, researchers, and doctors with one of the most important abilities in their professions: being able to inquire and seek knowledge. Through these databases, hypotheses can be tested and evaluated at vast scale and with unprecedented speed. Representing the union of scientific minds with the reliable retention and lightning-fast accessibility of digital storage, bioinformatics is a wonder to behold and a key to the future of scientific and medical progress.

Works Referenced

Luscombe, N. M., Greenbaum, D., & Gerstein, M. (2001). What is bioinformatics? A proposed definition and overview of the field [Abstract]. Methods of Information in Medicine, 40(4), 346-358. Retrieved September 23, 2015, from http://www.ncbi.nlm.nih.gov/pubmed/11552348

Felton, M. J. (2002). Biologists: Get with the program! Modern Drug Discovery, 5(3), 28-30, 32. Retrieved September 22, 2015, from http://pubs.acs.org/subscribe/archive/mdd/v05/i03/html/03felton.html

Hernández, S., Calvo, A., Ferragut, G., Franco, L., Hermoso, A., Amela, I., … Cedano, J. (2014). Can bioinformatics help in the identification of moonlighting proteins? Biochemical Society Transactions, 42(6), 1692-1697. doi:10.1042/BST20140241

Hernández, S., Ferragut, G., Amela, I., Perez-Pons, J., Piñol, J., Mozo-Villarias, A., … Querol, E. (2014). MultitaskProtDB: a database of multitasking proteins. Nucleic Acids Research, 42(Database issue), D517–D520. http://doi.org/10.1093/nar/gkt1153

Rasch, J., Theuerkorn, M., Unal, C., Heinsohn, N., Tran, S., Fischer, G., et al. (2015). Novel cycloheximide derivatives targeting the moonlighting protein Mip exhibit specific antimicrobial activity against Legionella pneumophila. Frontiers in Bioengineering and Biotechnology, 3, 41. doi:10.3389/fbioe.2015.00041

Raza, K. (2010). Application of data mining in bioinformatics. Indian Journal of Computer Science and Engineering, 1(2), 114-118. Retrieved September 22, 2015, from http://www.ijcse.com/docs/IJCSE10-01-02-18.pdf

Sánchez, D., Batet, M., Martínez, S., & Domingo-Ferrer, J. (2015). Semantic variance: An intuitive measure for ontology accuracy evaluation. Engineering Applications of Artificial Intelligence, 39, 89-99. doi:10.1016/j.engappai.2014.11.012

Watanabe, H., Carmi, P., Hogan, V., Raz, T., Silletti, S., Nabi, I. R., et al. (1991). Purification of human tumor cell autocrine motility factor and molecular cloning of its receptor. Journal of Biological Chemistry, 266, 13442–13448.

Dartmouth Undergraduate Journal of Science
