Lucas James – Speech Surrogates

Enphrasing isn’t just about disambiguation

In this post, I want to talk about enphrasing, a phenomenon of abridging surrogate systems that Stern (1957) defines simply as: ‘the lexical unit is replaced with a phrase’. This phenomenon is attested in systems across Central and West Africa (Finnegan 2012), in the Amazon (Seifart, Meyer et al. 2018), and Southeast Asia (e.g. Bradley 1979). I have not found any evidence of enphrasing in North America.

Enphrasing substitutes longer expressions for shorter ones, sometimes directly elaborating on the original word or phrase. Here’s an example from Kele, which I’ll continue to draw from below: in drummed Kele, the word songe ‘moon’ is replaced with a sequence meaning ‘the moon looks down on the earth’ (Carrington 1949:33). There’s a classical analogue in the Old Norse kenning, which replaced words like vargr ‘wolf’ with elaborations like svalg áttbogi ylgjar ‘the evil off-spring of the she-wolf’ for aesthetic purposes.

Enphrasing is often explained functionally as a way of disambiguating surrogate speech. Because abridging systems encode a limited amount of phonological information, utterances with similar properties (e.g. the same tone pattern) can often be homophonous. Enphrasing is said to help with this problem: as explained in Seifart et al. (2018), “short words that would come out as homophones in drumming are replaced by longer, less ambiguous expressions, often with poetic creativity.” This explanation dates back to the coining of the term itself in Stern (1957).

But I think there’s another dimension to enphrasing that needs to be mentioned alongside disambiguation: roteness. It’s no earth-shattering observation, but I think it’s useful to include as part of the framing: enphrased sequences can’t just be longer phrases, they also have to be rote expressions in common use.

Here’s a neat example of what I mean using the same Kele word as I mentioned before, songe ‘moon’. According to Carrington, both songe and koko ‘fowl’ would be drummed on their own as two high-toned strokes. He reports an enphrased sequence for koko that corresponds to the phrase ‘the fowl, the little one which says kiokio’. With enphrasing, the homophony between songe and koko is eliminated.

But hold on—couldn’t ‘the moon looks down at the earth’ just as easily be ‘the fowl looks down at the earth’? On the drum, the phrases should be homophonous, beginning with two high-toned strokes. Semantically speaking, a flighted bird is no less likely a candidate to look down at the earth than the moon. Why are these enphrased sequences not just as ambiguous as the original word pair?

The answer isn’t a big revelation, but I think it’s important. These are rote phrases, understood by a community of practice to be part of their existing repertoire. Enphrasing doesn’t just elaborate single words into longer phrases. It also pares down the universe of possible correspondences to a cognitively manageable level, ensuring practitioners and their audiences are drawing from a shared reservoir of mutually understood, appropriate language.

Why is this framing important? For one thing, it’s useful to be reminded that speech surrogacy is as much a cultural phenomenon as it is a linguistic one. The practitioners of a given system share a language, a traditional music, a regional or ethnic identity, perhaps even a familial tie. The same goes for their non-practicing audience. The lexicon of a system—the words that are common enough to have an enphrased sequence, and the content of enphrased sequences themselves—is shaped by this shared language and culture. Notably, Central and West African systems are not only rich with proverbial expressions drawn from pre-existing oral literature but are often seen to form part of that oral literature itself (Finnegan 2012). It’s worth it to consider the cultural role that enphrasing plays, not only its functionality.

More expansively, I also think it helps guide our research into broader questions about roteness in surrogate speech. One of the fundamental typological questions that I am most interested in is: what circumstances produce novel utterances in surrogate speech? It’s clear that surrogate systems have different levels of productivity. Differences in the use of enphrasing may help explain these disparities.

Take the Dagaare gyil tradition. Musically, it is a living tradition, and individual performers certainly compose new pieces and embellish the existing repertoire. But linguistically, it is a good example of a system that is currently pretty much unproductive. Its traditional performance is restricted to an existing repertoire of texts; only the “progression of variations [is] left to the performer’s discretion”, and even that is somewhat restrained to a traditional ordering (Campbell 2005:48).

Michael Vercelli illustrates this in the gyil’s role in Dagaare funerals:

“[The] gyil player will directly address the participants through the use of understood phrases spoken on the instrument … Just as a contemporary wedding DJ would choose songs appropriate for the bride and groom’s first dance, the gyil player must select songs appropriate for the specific funeral ceremony.”

—(Vercelli 2012:3)

These “understood phrases” aren’t examples of enphrasing. They don’t replace short words with disambiguated rote expressions. They’re more like fossils, a repertoire of previously generated phrases encased in musical amber. This intrigues me, because I have worked with two Dagaare surrogate practitioners—a gyil player and a whistler—who effortlessly produced novel utterances under elicitation conditions. It’s not impossible to come up with a phrase like “the good red guinea fowl” or “I was walking” and produce it within the Dagaare surrogate system. However, my consultants have always caveated these attempts with words like “artificial”, “experiment”, and “unnatural”. It seems that the surrogate system is intact, but the act of generating novel phrases has been pushed out of the scope of the tradition as practiced.

So while it isn’t enphrasing by definition, I want to argue that this ‘fossilization’ is part of the same process that creates enphrasing. There is a set of functional and cultural pressures to rely on a small repertoire of disambiguated rote expressions: they are not only easier to process and understand, but can also serve to strengthen shared community ties. Dagaare traditional music seems to have responded to those pressures, allowing a rich tradition of recognizable rote expressions to take root.

What’s on the other end of the spectrum? As I mentioned before, I haven’t found any North American whistling systems that rely on elaborated stock phrases. The documentation I have read suggests that these systems are frequently used for short, spontaneous conversations. Familiarity is essential for comprehension; but, rather than being built in with a common repertoire of elaborated rote sequences, these systems arrive at familiarity primarily through everyday speech and context clues.

Here’s an example from a Tlaxcalan Spanish whistling system:

Tlaxcalans, like Gomeros, boast that they can whistle anything they can say. Generally, however, conversations are between family members, friends, or neighbors, and are restricted to short which tends to put exchanges in familiar contexts. Such whistled requests or advice as “Bring me a shovel”, or “I’m going to the pueblo” are common.

A typical exchange between a farmer in his field and a friend on the road might consist of the following: “Pedro, ¿a donde vas?” (“Pedro, where are you going?”). “A Tlaxcala” (“To Tlaxcala”, the nearby capital city of the state). “¿Porque?” (“Why?”). “A vender mis cebollas” (“To sell my onions”); and end with a whistled “Adios”.

—Wilken (1979:883)

Mazatec whistling, similarly, is “frequently (though not necessarily) concerned with topics immediately obvious to both parties … and used in situations where cultural context plays a much greater part” (Cowan 1948:283-4).

Neither Tlaxcalan Spanish nor Mazatec whistling allows for complex, sprawling dialogue, and familiarity is still essential to comprehension. But it’s obvious that these short, everyday phrases constitute an opposite strategy to a system like Dagaare’s. These systems seem to be unaffected by the pressures towards disambiguated rote expressions, instead relying informally on cultural and situational context.

Many surrogate systems probably fall in between these extremes, employing a mixture of spontaneity and roteness. This middle ground is (finally) where enphrasing appears. At the risk of running long on this post, I should give a few examples, because it’s important to describe how enphrasing actually works: creating stock phrases that can be spontaneously combined in novel sequences. I’ll leave it to the reader to evaluate where their own speech surrogates of study fall on this continuum.

My first example is Bora manguaré drumming. Briefly, this Amazonian surrogate system has a ‘singing mode’, where a set of rote rhythmic phrases, associated with sung lyrics, is played as a form of musical performance. That’s just like the Dagaare gyil tradition. But manguaré also has a “talking mode … used to transmit relatively informal messages and public announcements” (Seifart et al. 2018:6). These messages take the form of enphrased sequences:

“nouns and verbs are marked with special disyllabic markers…On nouns, the marker –úβù is used…For verbs, the marker – is used… In drummed messages, these markers do not carry any semantic value, but function purely to identify the preceding sequences of beats as representing nouns or verbs…there are conventional long forms for words that occur frequently in manguaré messages to render them less ambiguous… for instance, the Bora noun referring to a commonly hunted deer species nììβúgwà is replaced in manguaré messages with ìámé-tùùtáβààbè néébá-nììβúgwà-úβù, literarily ‘deceased annatto deer, damaged animal’.”

—(ibid:9)

Even more interestingly, these enphrased sequences are embedded into a standard ‘frame’, helping to reduce the cognitive load of comprehension further (ibid:6).

Manguaré shows how enphrasing can balance the need for flexible, spontaneous communication with the advantages of roteness. By elaborating words in identifiable units, situated clearly within a frame, enphrasing allows manguaré a lot of intelligibility without an extremely restrictive lexicon of immediate topics. Given this system, it’s easy to imagine, as Seifart and colleagues suggest, that “manguarés were in daily use [in the 20th century], including for conversations about almost any subject and that messages were relayed from one roundhouse to another to reach further distances” (ibid:5).

Returning to Kele, there are many similarities here with Bora drumming, though the details differ. As with Bora drumming, Kele drumming employs a variety of communication strategies. There are documented cases of fully fixed paragraph-length sequences in this system. For instance:

“Another stock communication is the announcement of a dance, again with the drum speaking in standardized and repetitive phrases:

All of you, all of you,

come, come, come

let us dance

in the evening

when the sky has gone down river

down to the ground. [Carrington 1949: 61-2]”

—(Finnegan 2012:472)

It’s not apparent that these announcements are embedded in a ‘musical mode’ the way that rote expressions are in the Dagaare and Bora systems. However, Carrington certainly describes them as ‘standardized’, and associates them with cultural practices (like dances and funerals) in a similar way. This portion of the surrogate practice shows a strict reliance on elaborated rote phrases.

In addition, though, we see the famous Kele enphrased sequences in action. Of course, an enphrased sequence corresponding to a single word like ‘moon’ or ‘fowl’ is pragmatically useless without a larger context. And as in Bora, that larger context is provided within a frame where multiple sequences can be combined to produce a novel utterance. Take this example (lightly edited from Carrington for ease of comparison), which is not associated with a traditional social function, instead announcing a news event and providing listeners with instructions:

English: ‘The missionary is coming up-river to our village tomorrow. Bring water and firewood to his house.’

Kele (spoken): Bosongo atoya ko nda bokenge wasu lɛlɛngo. eʃaka balia la toala ko nda ndakɔ yande.

Kele (drummed, with spoken equivalent):

bosongo olimo ko nda lokonda	‘white man spirit from the forest
wa lokasa lwa lonjwa	of the leaf used for roofs
atoya likolo atoya likolo	comes up-river, comes up-river
ko lɛlɛngɔ ekaliekele	when tomorrow has risen
likolo ko nda use	on high in the sky
ko nda likelenge liboki	to the town and the village
liaaka la iso	of us
yaku yaku yaku yaku	come, come, come, come
yatikeke balia ba lɔkɔila	bring water of lɔkɔila vine
yatikeke tokolokolo twa toala	bring sticks of firewood
ko nda ndakɔ ya tumbe elundu likolo	to the house with shingles high up above
ya bosongo olimo ko nda lokonda	of the white man spirit from the forest
wa lokasa lwa lonjwa	of the leaf used for roofs’

—Carrington (1949:54)

There are a few interesting points here. First is its spontaneity: given the context, it seems likely that this really is a fairly novel message. It’s not clear why the Kele tradition would have developed such a specific stock phrase out of whole cloth, though of course it’s possible. In any case there are multiple similar cases in Carrington’s documentation, showing that these phrases must have been generated quite readily and productively. This seems like a spontaneous message aided by the availability of relevant enphrased sequences.

The other interesting part is the evidence of a frame. While it’s not as clear cut as in the Bora drumming system, there are differences in the syntax of the drummed message compared to the spoken one, suggesting that the surrogate system has a preferred ordering.

Here’s an informal gloss to highlight the difference:

Spoken Kele: Missionary comes [up-river] to village our tomorrow. Bring water and firewood to house his.

Drummed Kele: Missionary comes up-river tomorrow to village our. Bring water bring firewood to house of missionary.

The main variation is with the verb yatikeke, which appears before both the ‘water’ and ‘firewood’ sequences only in the drummed phrase. If enphrasing simply replaced these two words with their elaborated equivalents, the conjunction could have remained intact. But it seems that the bring [noun] frame is preferred in this instance, just as it is in Bora drumming.

Without going further into the other differences—namely the restatement of the possessive pronoun’s referent and the change of position of the temporal phrase—I will say that enphrasing systems need to be subject to a lot more detailed analysis, syntactic and otherwise. I would bet that many more systems employ these tools, including the framing devices on display in Bora and Kele, than we’re aware of.

So while much has been written on the basic idea and content of enphrasing, I would like to see a much more detailed discussion around its actual mechanics and properties. I’m encouraged by the rigorous approach taken in papers like Seifart, Meyer, Grawunder, and Dentel’s work on Bora drumming. I hope this post may help inspire a more complex discussion in that vein.

—Lucas James

References

Bradley, D. “Speech through Music: the Sino-Tibetan Gourd Reed-Organ.” Bulletin of the School of Oriental and African Studies, vol. 42, no. 3, 1979, pp. 535–540, doi:10.1017/S0041977X00135773.

Campbell, Corinna Siobhan. Gyil music of the Dagarti people: Learning, performing, and representing a musical culture. Diss. Bowling Green State University, 2005.

Carrington, John F. “A comparative study of some Central African gong-languages”. Vol. 13. Académie royale des sciences d’outre-mer. Classe des sciences morales et politiques, 1949.

Cowan, George M. “Mazateco Whistle Speech.” Language, vol. 24, no. 3, 1948, pp. 280–286. JSTOR, www.jstor.org/stable/410362.

Finnegan, Ruth. Oral Literature in Africa, Open Book Publishers, 2012. ProQuest Ebook Central.

Godsey, Larry Dennis. “The Use of the Xylophone in the Funeral Ceremony of the Birifor of Northwest Ghana.” Diss. University of California, Los Angeles, 1980.

Lewis, T. Becoming a garamut player in Baluan, Papua New Guinea: Musical analysis as a pathway to learning, Taylor and Francis, 2018. doi:10.4324/9781315406503.

Seifart, Meyer et al. “Reducing language to rhythm: Amazonian Bora drummed language exploits speech rhythm for long-distance communication.” Royal Society open science vol. 5,4 170354. 25 Apr. 2018, doi:10.1098/rsos.170354

Stern, Theodore. “Drum and whistle languages: An analysis of speech surrogates.” American Anthropologist 59.3 (1957): 487-506.

Vercelli, Michael. “Ritual Communication Through Percussion: Identity and Grief Governed by Birifor Gyil Music.” DMA diss. University of Arizona, 2006.

Wilken, Gene C. “Whistle speech in Tlaxcala (Mexico).” Anthropos H. 5./6 (1979): 881-888.

Field Notes from Ghana, Part 1: Audio recording

Last December, I took a somewhat unusual data-gathering trip to Accra, Ghana. Rather than targeting a single language, the trip focused on a few of Ghana’s extant surrogate systems. Over the course of two weeks, I made field recordings that incorporated several languages, instruments, modalities, and methodologies.

This post is the first in an ongoing series of observations and discussion about that trip. Over this series I intend to describe some of my methodology and goals—and ultimately, some conclusions—in the hope that it helps others plan similar efforts. Today, I’ll give a basic overview and then dig into one of my current technical preoccupations: controlling dynamic range in surrogate language recordings.

Overview

In my limited time, I wanted to scratch the surface of as many available surrogate languages as possible. The intention was a sort of “surrogate language speed dating”, in which I would work with several practitioners and speakers for a few sessions at a time, using existing materials as an aid.

On this trip, I spent most of my time with Benjamin N., a practitioner of the Birifor and Dagara variants of the gyil surrogate tradition. Birifor and (northern) Dagara are typically considered distinct variants within the Dagaare dialect continuum, both associated with northern Ghana and southern Burkina Faso. My work with Benjamin expanded across both language varieties and several modalities: gyil or resonator xylophone, gaŋgaa or double-headed cylindrical drum, and whistling. I dedicated one or more elicitation sessions to each language-modality pairing, during which I would elicit both existing (proverbial) and novel (productive) material. I also worked with speakers of Eʋe, drawing on existing materials, and briefly with a Twi surrogate language practitioner. All told, these language-modality pairings add up to eight surrogate varieties, six from the Birifor/Dagara continuum.

Crucially, the goal was not to gain a full, nuanced picture of any system as a whole. Instead, I wanted a sketch: a small but usable list of existing surrogate words and phrases and their spoken equivalents (in the West African tradition, the target is mostly proverbial expressions), a fairly robust surrogate phonology, and a preliminary sense of the system’s productivity and flexibility.

There’s an obvious disadvantage to this approach: a sketch this thin can’t support much confidence in any larger scientific conclusion.

But I think there’s an advantage worth considering, given the topic at hand: just fifty or a hundred years ago, there’s reason to think these systems were widespread in many parts of the world. Now, they’re increasingly difficult to track down, and quickly losing vitality. This kind of quick-and-dirty approach helps cover a lot of ground, which is necessary if we want the bigger picture of how speech surrogacy works, before the population and genetic diversity of these systems is greatly diminished. I see this work as hands-on typology, and we need more of it before it’s too late.

Now, I want to discuss just one of the many methodological considerations I faced in Accra: how to produce listenable field recordings.

The dynamic range problem

I’m not a professional audio engineer, and my approach here is probably very different from a savvy field engineer’s. Still, I think I got decent recordings in a low-profile setup realistic for linguistic research.

Eliciting linguistic and musical surrogate data simultaneously requires a lot of equipment flexibility. An intimate interview setting conditions a consultant to speak quietly; surrogate languages tend to be loud and carry long distances. A consultant may switch from a low speaking voice to a loud drumbeat, yell or whistle many times within the course of a recording.

This can be a strain on equipment that needs a “sweet spot” of sensitivity to produce a faithful recording without too-quiet spots or clipping. This problem is exacerbated by the equipment a field linguist tends to have access to: typically no more than a few pieces of consumer or “pro-sumer” gear. The usual tools of an audio engineer—multiple high-quality microphones, close monitoring and level adjustment—just aren’t realistic for a researcher focused on data gathering.

So, what is the best option for a surrogate language researcher with only, say, a handheld recorder and a lavalier mic?

Familiarity with basic digital audio processing goes a long way here. Compressor/limiters are the basic audio dynamic controls that allow a recording to be made at a safe sensitivity level, then adjusted so that quieter parts are still audible. This approach was sufficient for my Twi recordings, where I used a single lavalier mic to record quiet speech, louder sung vocals and a plucked string instrument at once. Moderate compression was sufficient to keep everything within a listenable dynamic range.
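To make the idea concrete, here is a minimal Python sketch of what a compressor does to levels. It illustrates the general technique and is not a record of the processing applied to these recordings: the threshold, ratio, makeup gain, and the two sample amplitudes are all assumed values, and the gain is computed sample by sample, whereas real compressors smooth it with attack and release times.

```python
import math


def compress(samples, threshold_db=-30.0, ratio=4.0, makeup_db=0.0):
    """Apply a static compression curve to float samples in the range [-1, 1].

    Simplification: the gain is computed per sample, with no attack or
    release smoothing, so this only illustrates the level mapping.
    """
    makeup = 10 ** (makeup_db / 20)
    out = []
    for x in samples:
        level = abs(x)
        if level == 0.0:
            out.append(0.0)
            continue
        level_db = 20 * math.log10(level)
        if level_db > threshold_db:
            # Above the threshold, `ratio` dB of input becomes 1 dB of output.
            level_db = threshold_db + (level_db - threshold_db) / ratio
        out.append(math.copysign(10 ** (level_db / 20) * makeup, x))
    return out


def level_gap_db(quiet, loud):
    """Difference in level between two amplitudes, in decibels."""
    return 20 * math.log10(abs(loud) / abs(quiet))


# Hypothetical amplitudes: quiet speech and a loud drum stroke.
speech, drum = 0.01, 0.5
squeezed = compress([speech, drum], threshold_db=-30.0, ratio=4.0, makeup_db=12.0)
print(round(level_gap_db(speech, drum), 1))              # gap before compression
print(round(level_gap_db(squeezed[0], squeezed[1]), 1))  # gap after compression
```

The quiet material below the threshold is left alone, the loud material is pulled down toward it, and the makeup gain then lifts everything, which is why a recording made at a cautious level can still end up with audible speech.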

Compression is going to be unsatisfactory in more demanding settings, however. Noisier environments and wider dynamic ranges require more severe compression. At a certain point, this begins to affect both the sound quality and the faithfulness of the data. I reached this point with recordings featuring the gyil and gaŋgaa, both percussion instruments with huge dynamic ranges.

One solution is the “safety track”. You may use two recording devices, or just one, if your recorder can record the same signal to two tracks at different levels. When one track is set at a significantly lower level than the other, a dynamically variable recording will produce two complementary tracks: a less sensitive one (the “safety”) with passages that are too quiet, and a more sensitive one with passages that are loud enough to clip. Then you may manually cut between the two tracks as appropriate (a “gate” on the safety will silence the quieter parts of that track automatically if you wish). This method is technically appropriate but becomes extremely time intensive if there’s any amount of quick transition between speaking and playing.
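The cutting step can in principle be automated. The sketch below is one hedged way to do it in Python, assuming both tracks are sample-aligned lists of floats and that the level difference between them is known; the function name, the 12 dB offset, the clipping threshold, and the window size are all hypothetical choices for illustration.

```python
def merge_with_safety(main, safety, safety_offset_db=12.0,
                      clip_level=0.99, window=4):
    """Use the sensitive track, switching to the safety wherever it clips.

    main   -- the more sensitive track (may contain clipped samples)
    safety -- the same performance recorded `safety_offset_db` lower
    Both are assumed to be sample-aligned lists of floats.
    """
    if len(main) != len(safety):
        raise ValueError("tracks must be the same length")
    # Linear gain that brings the safety back up to the main track's level.
    gain = 10 ** (safety_offset_db / 20)
    merged = []
    for start in range(0, len(main), window):
        block = main[start:start + window]
        if any(abs(x) >= clip_level for x in block):
            # This window clipped on the sensitive track: take the safety.
            merged.extend(s * gain for s in safety[start:start + window])
        else:
            merged.extend(block)
    return merged


# Toy data: the second half of the first window clips on the main track.
main = [0.1, 0.2, 1.0, 1.0, 0.1, 0.1, 0.1, 0.1]
safety = [0.025, 0.05, 0.4, 0.5, 0.025, 0.025, 0.025, 0.025]
print(merge_with_safety(main, safety))
```

Because the safety is boosted to match the main track, the merged signal can exceed full scale in the clipped windows, so it would still need attenuation or compression before export. A real session would also want much longer windows and short crossfades at each switch to avoid audible clicks.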

One alternate method edges out a little on the branch of “practices a pro engineer may disapprove of”, but it is ultimately what I settled on. Essentially, the approach is to use our two recording devices—the handheld and the lavalier—to divide the frequency space in half, reducing the dynamic load on each one.

I’ll give an example: recording a gyil player. A lavalier mic attached to the player’s collar as normal will hang almost directly above the keys of the gyil, so it will pick up the voice and the mid-high range of the instrument; it is likely to peak and distort in the low-mid frequencies during the louder gyil sections.

The handheld recorder may be positioned pointing at the bass-register keys of the instrument to pick up the “low end”; it won’t be close enough to pick up the voice in detail.

Set up correctly, these two tracks can be mixed together to produce a full-bodied recording. All this requires is some simple but drastic equalizing, “chopping” the high-mid end off one track and the low from the other. Here’s an example from one of my gyil sessions:

Hi-pass filter — The lavalier track (I used an Audio-Technica AT803), with a high-pass filter removing the low end, and some minor adjustments for a clearer sound in the high-mid range.

Low-pass filter — The handheld track (I used a Zoom H4n field recorder), with a low-pass filter removing the high-mid range and some pretty serious gain in the low end.

You can see from these figures how each pattern directly complements the other, with the low- and high-pass filters meeting directly in the low-mid range. I followed up with some compression on the lavalier, which boosted the vocals without overly compromising the sound quality of the gyil. Played together, the two tracks maintain crispness in the high end and a substantial presence in the low end, and any clipping from the low-mid region of the lavalier mic is excised completely. These specific EQ patterns will look different for different equipment and different surrogate systems; the point is to have the option of a drastic EQ intervention on a distorted track without harming its overall sound quality.
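For readers who would rather script this than draw EQ curves by hand, here is a rough Python sketch of the same split-band idea. It is a simplification under stated assumptions: it uses gentle first-order filters rather than the steep EQ described above, and the 300 Hz crossover, the sample rate, and the function names are hypothetical values chosen only for illustration.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low-pass filter (a simple exponential smoother)."""
    alpha = 1 - math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """Complementary high-pass: the input minus its low-passed version."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(samples, low)]


def mix_split_band(lavalier, handheld, crossover_hz=300.0,
                   sample_rate=44100, low_gain=1.0):
    """Keep the highs from the lavalier and the lows from the handheld."""
    highs = one_pole_highpass(lavalier, crossover_hz, sample_rate)
    lows = one_pole_lowpass(handheld, crossover_hz, sample_rate)
    return [h + low_gain * l for h, l in zip(highs, lows)]


# Toy check: a 50 Hz tone should survive the low-pass far better than 5 kHz.
sr = 44100
low_tone = [math.sin(2 * math.pi * 50 * i / sr) for i in range(4410)]
high_tone = [math.sin(2 * math.pi * 5000 * i / sr) for i in range(4410)]
print(max(abs(v) for v in one_pole_lowpass(low_tone, 300.0, sr)))
print(max(abs(v) for v in one_pole_lowpass(high_tone, 300.0, sr)))
```

Defining the high-pass as the input minus the low-pass means the two bands sum back to the original when both inputs are the same signal, which is a convenient sanity check before feeding in two different microphones. The `low_gain` parameter stands in for the low-end boost on the handheld track.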

There are still tradeoffs to this approach. In my recordings, there is a tendency to lose brilliance and presence in the high end of the gyil, and the gaŋgaa is probably inescapably boomy. To my ear, the results are nevertheless more than acceptable, though not at the level of high-quality field recording equipment tended by an able audio engineer. Some of these recordings will likely be available on this blog in the future for listeners to make their own judgements.

That is just one of the technical considerations I think is fairly particular to surrogate language recordings. In future installments, I’ll write about more of the linguistic methodology I tried and some of the results of the trip, but hopefully these technological points are worth experimenting with as well.

-Lucas James