Language and Nature: Using Linguistic Analysis to Design New Antimicrobial Peptides

Bacterial resistance to antibiotics is a growing problem in hospitals today. Scientists are continually searching for ways to win this race against mutating bacteria, and a surprising solution may lie in a seemingly unrelated field: linguistics. Researchers at the Massachusetts Institute of Technology created a linguistic model of existing antimicrobial peptides, treating the amino acid sequences as a formal language of their own, to design new proteins that nature has never seen. Many of these new synthetic peptides have shown antibacterial activity against bioterrorism agents and microorganisms notorious for their innate resistance to drugs (1). This new arsenal of antimicrobial peptides may be the key to overcoming bacterial resistance.

Antimicrobial peptides (AmPs) are small proteins used by the non-specific, innate immune system to protect against bacteria, fungi, and viruses. They recognize and target features on the microbial cellular membrane that distinguish microbial cells from eukaryotic cells. Their net positive charge and amphipathic alpha-helical structure result in electrostatic attraction to the outer layer of the microbial membrane (2). The peptides can then cause microbial death by binding to and disrupting the membrane or by binding to anionic cytoplasmic targets, a “multihit mechanism” (3). Recent studies have shown that these peptides are less prone to causing bacterial resistance, and they are active against a variety of fungi and pathogenic and opportunistic bacteria, including those resistant to several antibiotics (4, 5). Additionally, other research has found that AmPs can stimulate stronger adaptive immune responses by binding to Toll-like receptor 4, activating immature dendritic cells and causing them to produce more costimulatory molecules (6), and that they can induce anti-cancer activity in mice (7).

The amphipathic structure of AmPs is derived from the organization of amino acids into a general pattern. This led researchers to the conjecture that the AmP sequences could be modeled as a formal language whose vocabulary consisted of the set of twenty naturally occurring amino acids, represented by their one-letter abbreviations (1).

Every language has a grammar, or underlying set of rules and categories, that describes the permitted arrangement of words (1). English, for example, consists of a set of distinct sentence structures that are used in writing and speech. Grammatical sentences are created by substituting “allowable” words into set formats. Words generally can be grouped by their function in the language, so the set of allowable words can be more easily described by what parts of speech may be used. Depending on the construction, the pool of possibilities may be limited to one part of speech or it may encompass several syntactic categories. An important concept to understand is that whether something “makes sense”—a colloquial description rather than a linguistic one—does not factor into the decision of whether something is “grammatical.” Not all of the words of a certain part of speech will “make sense” in a particular construction, but those words are still part of the allowable set; a nonsensical sentence may still be grammatical, as long as it follows the rules. Essentially, all that is necessary to create a “grammatical” sentence is to choose a base construction and then select words from the pool of possibilities for that particular construction to fill in the blanks.

An example may better illustrate this. One acceptable sentence format in the English language is “x went to y,” where x and y are placeholders for a number of different phrases. The sentence “John went to the library” is said to be “grammatical” because “John” is an allowable substitute for x and “the library” is acceptable for y in this particular construction. If the phrases filling x and y are replaced by other allowable phrases, a different but still grammatically correct sentence is created, such as “Kate went to the bookstore.”

Depending on what words are used, the sentences that are produced can be similar or vastly different. The two previous examples are fairly similar because they both describe a named human subject going to a location. However, this sentence format is not limited to that purpose alone, as is demonstrated in these three sentences: “The young children went to play”; “Mom and Dad went to eat dinner”; and “Stoic garlic knots went to choke the fire alarm.” The first two “make sense” while the third does not, but they are all grammatically acceptable. The set of allowable words after the phrase “went to” encompasses several different parts of speech, including nouns, intransitive verbs, and transitive verbs; in contrast, only nouns can appear at the beginning of the construction, but they can be singular or plural, as well as proper or common. Another way to modify sentences is to use synonyms with different connotations. For example, using “the kids” in place of “the young children” changes the style, or level of formality, of the sentence. As demonstrated, sentences can vary in structure, complexity, and meaning when different words are used.

In the language of antimicrobial peptides, protein sequences are analogous to sentences and the individual amino acids to the words in those sentences. The equivalents of set sentence formats in English are the conserved sequences of amino acids that occur in many natural AmPs. For example, the pattern QxEAGxLxKxxK, where ‘x’ is any amino acid, is found in more than 90% of the insect AmPs known as cecropins. The MIT researchers used a pattern discovery algorithm called Teiresias to examine 526 eukaryotic AmPs from the Antimicrobial Peptide Database (APD) (8), and they discovered 684 commonly occurring regular constructions, or common arrangements of amino acids. Each construction was designed to be ten amino acids long and specific to AmPs (1): at least 80% of the matches for each pattern in Swiss-Prot/TrEMBL, a protein sequence database of which the APD is a subset (9), were found in peptides annotated as antimicrobial (1).

Here is an example of a pattern they discovered: P[KAYS][ILN][FGI]C[KPSA][IV][TS][RKC][KR]. The bracketed expression [KAYS] indicates that, at the second position in the construction, only the four amino acids specified are acceptable: lysine (abbreviated K), alanine (A), tyrosine (Y) or serine (S). This is analogous to how “John,” “Kate,” “the young children,” and “Mom and Dad” are equally acceptable substitutes for x in the sentence format “x went to y,” but an adverb is not. The frog AmP brevinin-1E contains the amino-acid sequence fragment PKIFCKITRK, which matches this pattern and is thus considered “grammatical” (1).
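Because the bracket notation is the same one used by regular expressions, the pattern above can be checked directly in code. The following is a minimal sketch, not the researchers' own software; the function name matches_pattern and the second test fragment are made up for illustration.

```python
import re

# Pattern quoted in the text. Each bracket lists the residues allowed at that
# position; a bare letter (P, C) means only that residue is allowed.
PATTERN = "P[KAYS][ILN][FGI]C[KPSA][IV][TS][RKC][KR]"


def matches_pattern(fragment: str, pattern: str = PATTERN) -> bool:
    """Return True if the entire fragment fits the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, fragment) is not None


# The brevinin-1E fragment from the text fits the pattern.
print(matches_pattern("PKIFCKITRK"))
# A made-up variant with tryptophan (W) at position 2 does not, because
# W is not one of K, A, Y, or S.
print(matches_pattern("PWIFCKITRK"))
```

Using re.fullmatch rather than re.search requires the whole ten-residue fragment to fit, which mirrors the idea that every position in the construction is constrained.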

The researchers chose to synthesize unnatural AmPs twenty amino acids in length because 20-mers were easy to synthesize and 20 was close to the median length of AmPs in the APD. The peptide sequences were formed by overlapping patterns so that all windows of size ten in the sequence were grammatical. From this set, all peptides that had six or more amino acids in a row in common with a natural AmP were removed. Finally, the remaining peptides were grouped by similarity, and a representative set of forty-two unnatural AmPs was chosen to be synthesized and then tested against different microbes (1).
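The two filters described above, that every window must be grammatical and that no long run may be shared with a natural AmP, can be sketched as follows. This is an illustration under stated assumptions rather than the published method: the function names are hypothetical, and the example uses a toy grammar with windows of size three so that it stays readable, whereas the study used ten-residue patterns.

```python
import re


def is_grammatical(seq, patterns, size=10):
    """True if every window of `size` residues fully matches at least one pattern."""
    if len(seq) < size:
        return False  # no complete window to check
    compiled = [re.compile(p) for p in patterns]
    windows = [seq[i:i + size] for i in range(len(seq) - size + 1)]
    return all(any(c.fullmatch(w) for c in compiled) for w in windows)


def shares_run(candidate, natural, run=6):
    """True if the candidate has `run` or more consecutive residues in common with `natural`."""
    return any(candidate[i:i + run] in natural
               for i in range(len(candidate) - run + 1))


# Toy grammar (an assumption for illustration): charged and hydrophobic
# residues must alternate.
TOY_PATTERNS = ["[KR][AL][KR]", "[AL][KR][AL]"]

print(is_grammatical("KARLKAR", TOY_PATTERNS, size=3))  # every window fits
print(is_grammatical("KAARLKA", TOY_PATTERNS, size=3))  # window "KAA" fits nothing
print(shares_run("KARLKAR", "GGKARLKAGG"))              # shares "KARLKA"
```

A candidate would be kept only if is_grammatical returns True and shares_run returns False for every natural AmP it is compared against.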

To test the importance of being “grammatical,” counterparts of these forty-two sequences were then designed for comparison by shuffling the order of the amino acids such that the sequences did not match any of the constructions. These new peptides had the same amino-acid composition as their grammatical counterparts and thus the same molecular weight, charge, and isoelectric point, which are factors often correlated with antimicrobial activity. The researchers hypothesized that because the shuffled sequences were “ungrammatical” they would have no antimicrobial activity, despite having the same bulk physicochemical characteristics. Additionally, eight AmPs from the APD were selected as positive controls and six peptides of length twenty were randomly selected from the middle of non-antimicrobial peptides as negative controls (1).
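One simple way to build such a shuffled control is to reshuffle the residues until no window of the result fits any pattern. The sketch below assumes this rejection approach, which the source does not spell out; the function name, the seed, and the twenty-residue example sequence are all hypothetical.

```python
import random
import re
from collections import Counter

PATTERN = "P[KAYS][ILN][FGI]C[KPSA][IV][TS][RKC][KR]"  # pattern quoted in the article


def shuffled_counterpart(seq, patterns, size=10, seed=0, max_tries=1000):
    """Return a reordering of `seq` in which no window of `size` matches any pattern."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    compiled = [re.compile(p) for p in patterns]
    residues = list(seq)
    for _ in range(max_tries):
        rng.shuffle(residues)
        candidate = "".join(residues)
        windows = [candidate[i:i + size] for i in range(len(candidate) - size + 1)]
        if not any(c.fullmatch(w) for c in compiled for w in windows):
            return candidate
    raise ValueError("no ungrammatical shuffle found")


# Hypothetical 20-mer that begins with the grammatical fragment PKIFCKITRK.
original = "PKIFCKITRKGLLKKAAGSA"
shuffled = shuffled_counterpart(original, [PATTERN])

# Same residues, different order: composition (and so bulk properties such as
# molecular weight and net charge) is unchanged.
print(Counter(shuffled) == Counter(original))
```

Comparing the two Counter objects confirms that only the order of the residues differs, which is the point of the control.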

The antimicrobial activity of each synthetic peptide was determined using a broth microdilution assay, which measures the minimum inhibitory concentration (MIC), i.e. the minimum concentration at which the peptide inhibits growth of the target organism. Of the grammatical peptides, eighteen had an MIC of 256 µg/ml or less against one of the bacterial targets, whereas only two of the ungrammatical peptides exhibited antibacterial activity; this suggests that antibacterial activity is not due solely to molecular weight, charge, or isoelectric point. In the controls, six out of eight of the natural AmPs showed antibacterial activity, while none of the six peptides from non-antimicrobial peptides did. The results of the broth microdilution assay were confirmed by plating the samples; bacterial samples were generally not viable when treated with peptides at concentrations above the MIC (1).
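Reading an MIC off a dilution series amounts to finding the lowest concentration at which, and above which, no growth is seen. The following sketch shows that logic on invented readings; the function name and the data are assumptions, and real assays involve replicates and controls that are not modeled here.

```python
def minimum_inhibitory_concentration(growth_by_conc):
    """Return the lowest concentration with no growth at it or at any higher
    tested concentration, or None if the highest concentration still shows growth.

    `growth_by_conc` maps concentration (in µg/ml) to True if growth was observed.
    """
    mic = None
    for conc in sorted(growth_by_conc, reverse=True):
        if growth_by_conc[conc]:
            break  # growth observed: concentrations at or below this do not inhibit
        mic = conc
    return mic


# Invented two-fold dilution series for one peptide against one organism.
readings = {256: False, 128: False, 64: False, 32: False, 16: False,
            8: True, 4: True, 2: True, 1: True}

mic = minimum_inhibitory_concentration(readings)
print(mic)                              # 16
print(mic is not None and mic <= 256)   # counts as active by the 256 µg/ml cutoff
```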

Further research was conducted with the seven most potent synthetic peptides and their shuffled counterparts for comparison. Compounds that are effective against Gram-positive bacteria are of particular interest and significance today because of the threats posed by bioterrorism agents such as Bacillus anthracis, which causes anthrax, and drug-resistant Staphylococcus aureus, which commonly causes nosocomial, or hospital-acquired, infections (1).

All seven grammatical peptides were active against strains of S. aureus and B. anthracis at concentrations of 256 µg/ml or less, compared to only one of the ungrammatical peptides. Furthermore, two highly active peptides, D28 and D51, had MICs of 16 µg/ml against B. anthracis, which is equivalent to the activity of the cecropin-melittin hybrid, a strong natural AmP. D28 also had an MIC of 8 µg/ml against S. aureus (1).

Since D28 seemed to be the most potent and promising of the synthetic AmPs, the researchers tried to further optimize its antibacterial properties. Mutations were introduced into D28 to try to increase positive charge, increase hydrophobicity, remove interior proline residues (as proline is a helix-breaker), or improve separation of positive and hydrophobic residues based on the helical structure. Forty-four selected variants were tested against E. coli, S. aureus, and B. cereus, and eighteen showed improved activity. Replacement of an internal proline with lysine or glycine was a common theme in those with improved activity against B. cereus. One particular variant, R8, had MICs of 16 µg/ml against E. coli, 8 µg/ml against B. cereus and 4 µg/ml against S. aureus, relative to 64, 16, and 8 µg/ml, respectively, for D28 (1).
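The size of the improvement is easier to see as a ratio of MICs, since a lower MIC means a more potent peptide. The short calculation below uses only the values quoted above; the variable names are illustrative.

```python
# MICs in µg/ml, as quoted in the text.
D28 = {"E. coli": 64, "B. cereus": 16, "S. aureus": 8}
R8 = {"E. coli": 16, "B. cereus": 8, "S. aureus": 4}

# Fold improvement: how many times lower the MIC of R8 is than that of D28.
fold_improvement = {organism: D28[organism] / R8[organism] for organism in D28}

for organism, fold in fold_improvement.items():
    print(f"{organism}: {fold:.0f}-fold lower MIC")
```

By this measure R8 is four times as potent as D28 against E. coli and twice as potent against B. cereus and S. aureus.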

The researchers at MIT hypothesized that this linguistic approach was so successful because of the modular nature of natural AmP amino acid sequences. This approach is promising because new peptides can now be rationally synthesized without using structure-activity information or complex simulations of protein-membrane interactions. Also, these new proteins bear limited homology to known proteins (1), which is advantageous because clinical use of AmPs that are similar to human AmPs will most likely elicit bacterial resistance to natural human defenses (10). Although they still maintain the properties required to be peptides, they populate a previously unexplored region of protein sequence space.

However, the researchers do acknowledge certain limitations of this new system. Some sequence families are not well conserved at the amino acid level, and since this linguistic approach is designed to search for patterns among amino acids, these potentially useful constructions would escape notice. Additionally, this approach is better suited to forming small proteins, simply due to the nature of regular grammars, so it would not be as useful for designing larger proteins, especially those with complex tertiary or quaternary structures (1).

The MIT group is one of several currently pioneering the field of synthetic antimicrobial peptides. The concept of synthesizing novel antimicrobial compounds has been around for several years. In 2005, Robert E. W. Hancock and his colleagues at the University of British Columbia designed optimized AmPs based on a linearized variant of the bovine peptide bactenecin (11). Scientists are no strangers to the search for new ways to combat bacterial resistance using peptides; however, including linguists in this fight may lead to surprising results in the near future.

References
1. C. Loose, K. Jensen, I. Rigoutsos, G. Stephanopoulos, Nature 443, 867 (2006).
2. A. Giangaspero, L. Sandri, A. Tossi, Eur. J. Biochem. 268, 5589 (2001).
3. Y. Shai, Biopolymers 66, 236 (2002).
4. R.E. Hancock, A. Patrzykat, Curr. Drug Targets Infect. Disord. 2, 79 (2002).
5. E. Tiozzo, G. Rocco, A. Tossi, D. Romeo, Biochem. Biophys. Res. Commun. 249, 202 (1998).
6. A. Biragyn et al., Science 298, 1025 (2002).
7. H. M. Ellerby et al., Nature Med. 5, 1032 (1999).
8. Z. Wang, G. Wang, Nucleic Acids Res. 32, D590 (2004).
9. A. Bairoch, R. Apweiler, Nucleic Acids Res. 28, 45 (2000).
10. G. Bell, P.-H. Gouyon, Microbiology 149, 1367 (2003).
11. K. Hilpert, R. Volkmer-Engert, T. Walter, R. E. W. Hancock, Nat. Biotech. 23, 1008 (2005).
12. The author would like to thank Professor Thomas Ernst for his consideration and guidance with the linguistic discussion in this manuscript.
