From the Vault: Transcription Tools 2016

Note from Arielle: When I began my work on the transcription project, I came across unpublished work from a former research assistant, which I used to guide my research. Here is that work, now published for your reference.

Note from Michelle: This post, and Arielle’s encounter with it, are great illustrations of the Remix method of modular research. The research and writing represented below were created during the 2015-16 academic year by Jennifer Zhong ’18 and Qingyu Wang GR ’16.

By Qingyu Wang, MA in Comparative Literature, Dartmouth College, with research by Jennifer Zhong (2015-16)

Introduction and Methodology

Over the last three months, under the guidance of Professor Michelle Warren and with inspiration from the Remix the Manuscript group, I have been exploring the features of two transcription tools, T-Pen and Diplomat. How would these tools change the process of transcribing the Brut manuscript? To answer this central question, I first compare transcription with the help of each tool to the traditional method, that is, with pen and paper, taking down verbatim what one reads in the manuscript. Of course, it is hard to give a single definition of traditional transcription, for different transcribers have different habits; some tend to transcribe in one go, while others prefer to go back and forth several times to work out the less recognizable parts and expand the abbreviations. In my comparison, I try to account for the variance in personal transcribing styles as much as possible, but I also recognize that it is impossible to enumerate every situation. I pick only those cases that stand out most in the comparison.

By answering the central question, I hope to contribute to future improvements of transcription software. During this process, I also reflected on the nature of transcription. How does the use of software change the way we look at transcription? Does this change of perspective contribute to our knowledge of the manuscript? Can we dispense with the traditional process completely? What hinders the use of the software? For example, are high-resolution photos of manuscripts readily accessible? Can large photos be uploaded to the software? I bear all these questions in mind during my comparisons.

I chose folio 10v of the Brut manuscript for these comparisons for two major reasons:

This page has lots of anomalous marginalia, which can test how well the software handles a complicated layout.
This page also includes various abbreviations and intelligible texts.

We went through a list of tools. Jennifer Zhong researched 17 tools, categorizing them mainly according to stage of development and accessibility. We finally picked T-Pen and Diplomat because both are free, mature transcription tools open to public use. T-Pen is a web-based tool for working with images of manuscripts. Users can either upload manuscript images to the repository or transcribe from manuscript images stored in the communal pool. The tool has a dedicated interface that helps avoid eye-skip errors and makes it easier to save and retrieve transcription progress. One distinctive feature of T-Pen is that it makes transcriptions interoperable. The “Collaboration” tab enables users to invite other transcribers to a team to facilitate group transcription work on the selected project. Diplomat is a transcription editor that helps avoid eye-skip errors by allowing the user to type directly beneath each line in the manuscript image. Transcribed text may be exported as plain text or marked-up XML.

Diplomat

Diplomat is an editor for transcribing and marking up manuscript images.

Diplomat has three separate windows: one for viewing manuscript images, one for typing in the transcription, and a third for producing the transcription synchronously.

When you click the Open button, two windows will open on your screen: the image window on the left and the transcription window on the right.

The software allows a quasi-diplomatic transcription: it preserves the same line breaks as the original manuscript. It also has an insertion function for rare characters and abbreviations, and it automatically produces marked-up characters. For example, if “that” is abbreviated in the manuscript, you can insert the abbreviation by choosing it from the drop-down list; in the transcribed text in the third window, the abbreviation is automatically expanded. Scribal abbreviations can therefore be transcribed and displayed uniformly through user-configurable menus. You can also insert annotations that are displayed graphically in the transcription and exported as XML markup.

A frame highlights the line that is being transcribed. This greatly reduces the possibility of transcription errors caused by eye-skip, that is, resuming writing but skipping ahead because the endings of two lines are similar, thus leaving out a passage, or copying once what appeared twice in the exemplar (“pewterer” reduced to “pewter,” or “that that” reduced to “that”).

Procedures:

You can click anywhere on the image window or its title bar to select it. One characteristic of the software is that you can transcribe in an arbitrary order. This can change the habits of people who are accustomed to transcribing with pencil and paper or in a Word document: worried about skipping lines, they often transcribe in the order in which the lines appear in the manuscript. With the software, you can easily move from one line to another to make changes. If you click on a line in the transcription window, the image window shows the corresponding line of the manuscript. If you then press Return, a typing window opens, already filled with the line as previously typed. You can change the line in any way and press Return to put the changed line into the transcription, or press Esc to discard the changes and keep the original line. You can also erase the contents of the typing window and press Return; this removes the line from the transcription and renumbers the other lines accordingly.
Choose the first line you want to transcribe. (It can be any line; it needn’t be the first line of the passage.) If the whole line isn’t visible in the image window, zoom in or out using the + and – keys respectively, and use the scroll bars to bring the whole line into view. Click and drag from one corner of the line to the opposite corner; when you release, you will have drawn a box around the line.
Once you have drawn a box tightly around the line, press Return. A typing window will appear directly under the selected line, into which you can type your transcription of the line.
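
The editing behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, not Diplomat's actual implementation: the `Transcription` class, its `commit` method, and the use of a box's top pixel coordinate as the key are all hypothetical, introduced here for illustration.

```python
class Transcription:
    """Toy model of a line-by-line transcription (hypothetical, not Diplomat's code)."""

    def __init__(self):
        # Maps the top pixel coordinate of a drawn box to the text typed for it.
        self.lines = {}

    def commit(self, top, text):
        """Simulate pressing Return in the typing window for the box at `top`."""
        if text.strip():
            self.lines[top] = text      # add a new line or overwrite an edited one
        else:
            self.lines.pop(top, None)   # an emptied typing window removes the line

    def numbered(self):
        """Number lines by vertical position, whatever order they were typed in."""
        ordered = sorted(self.lines.items())
        return [(n, text) for n, (_, text) in enumerate(ordered, start=1)]


t = Transcription()
t.commit(300, "third line")    # lines may be transcribed in any order
t.commit(100, "first line")
t.commit(200, "secnd line")
t.commit(200, "second line")   # reopen the line and correct it
t.commit(100, "")              # erase the contents: the line is removed
print(t.numbered())            # [(1, 'second line'), (2, 'third line')]
```

Pressing Esc corresponds to simply not calling `commit`, so the stored line stays as it was.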

Saving and Exporting:

If I invoke File→Save, Diplomat saves the text I have written and the links between the transcribed text and the manuscript lines. This lets me carry on working in a subsequent Diplomat session from where I left off. But the saved work is in a format that only Diplomat can use; to produce a text file of the transcription that can be used outside of Diplomat, one needs to invoke File→Export plain text. This writes the text to a .txt file with the same name as the image file, in the same folder. For example, if I am transcribing from an image file called Brut10v.png, the exported text will be saved in the file Brut10v.txt in the same folder.
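
The naming rule for the exported file can be expressed as a short Python sketch. The function name `export_path` is hypothetical, and the code only mirrors the rule as described here (same folder, same name, .txt extension); it does not call Diplomat.

```python
from pathlib import Path


def export_path(image_file):
    """Return the .txt path that sits beside the image and shares its name."""
    return Path(image_file).with_suffix(".txt")


print(export_path("Brut10v.png"))   # Brut10v.txt
```

Because only the extension changes, a transcriber can always find the plain-text export next to the image it was made from.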

Other Features:

Users can add non-keyboard characters. In this sense, by populating the character pool, users are in effect also programming the application. This is done in the configuration window.

By clicking the “+” button, users can add a new entry. Users need to input the character itself, type its Unicode code point, or give its HTML code in the HTML entity field. The other two fields are then filled in automatically.
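
To illustrate how one field can determine the other two, here is a minimal Python sketch that derives the code point and an HTML entity from a character. It uses Python's standard `html.entities` table; the function name `character_fields` and the dictionary keys are hypothetical, and Diplomat may well derive its fields differently.

```python
from html.entities import codepoint2name


def character_fields(char):
    """Derive the Unicode code point and an HTML entity for a single character."""
    codepoint = ord(char)
    name = codepoint2name.get(codepoint)   # named entity, if HTML 4 defines one
    # Fall back to a numeric character reference when there is no named entity.
    entity = f"&{name};" if name else f"&#{codepoint};"
    return {"character": char, "unicode": f"U+{codepoint:04X}", "html": entity}


for ch in "Þþȝ":
    print(character_fields(ch))
```

Thorn has a named entity (`&thorn;`), while yogh does not, so the sketch falls back to the numeric form `&#541;` for it.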

More characters can be added in the same way. Users can later delete, edit, or rearrange any characters already in the pool. After populating the character menu with useful characters, one can start inserting these characters during transcription.

Characters, typically abbreviations, that do not have a Unicode representation can be put into the Abbreviations menu, which resides under the Insert button. This menu is also empty by default. One can cut the abbreviation out of the manuscript image and put it into the menu. One needs to type a description, a transcription of the character, and the XML that should be generated for the abbreviation when the text is exported as an XML file.

Then one can insert the figure into the text, so that similar abbreviations are transcribed uniformly and easily. The abbreviation appears in the transcription window as the transcription the user gave. However, to distinguish it from characters that appear as they were in the original manuscript, a box around the transcribed text indicates that this is an expansion of an abbreviation. When users click and hold over the box, they can see the original figure.
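
The three fields of an abbreviation entry (description, transcription, XML) can be pictured as a small record that renders differently for plain-text and XML export. The following Python sketch is an assumption-laden illustration: the `Abbreviation` class, the `render` function, the TEI-style markup, and the sample line are all invented here and are not taken from Diplomat or from the Brut manuscript.

```python
from dataclasses import dataclass


@dataclass
class Abbreviation:
    description: str    # what the scribe's sign looks like
    transcription: str  # expansion shown in the transcription window
    xml: str            # markup to emit when exporting as XML


# Hypothetical entry; the TEI-style markup is an assumption, not Diplomat's default.
THAT = Abbreviation(
    description="thorn with superscript t",
    transcription="that",
    xml='<choice><abbr>þ<hi rend="sup">t</hi></abbr><expan>that</expan></choice>',
)


def render(tokens, as_xml=False):
    """Join plain words and Abbreviation entries into one line of output."""
    parts = []
    for token in tokens:
        if isinstance(token, Abbreviation):
            parts.append(token.xml if as_xml else token.transcription)
        else:
            parts.append(token)
    return " ".join(parts)


line = ["he", "seide", THAT, "he", "wolde"]   # made-up line for illustration
print(render(line))                # he seide that he wolde
print(render(line, as_xml=True))
```

Because every occurrence of the sign points at the same entry, the expansion and its markup stay uniform across the whole transcription.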

Users can also add editorial additions. There are four entries under the Annotation button: Addition, Supply Text, Milestone, and Note. The Supply Text entry, for example, lets the editor supply some missing text. One can click on the arrow on the Reason field to find possible reasons, and then select one of these or enter one of their own. Similarly, the Source field has built-in entries to select, or one can type their own sources. After the insertion has been finished, it appears as a green mark in the typed text. If the user moves the mouse over the green mark, the mark becomes red. At this point, if the user clicks and holds the mark, s/he can see the details of the supplied text, the reason, and the source of the supplied text.
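
As a rough picture of what a supplied-text annotation might look like once exported, the sketch below builds an XML element carrying the text, the reason, and the source. The element and attribute names follow the TEI `supplied` element, which is an assumption on my part; the post does not show Diplomat's actual XML output, and the sample values are invented.

```python
import xml.etree.ElementTree as ET


def supplied(text, reason, source):
    """Build an element recording supplied text, why it was needed, and its source."""
    element = ET.Element("supplied", {"reason": reason, "source": source})
    element.text = text
    return ET.tostring(element, encoding="unicode")


# Hypothetical values for the Reason and Source fields.
print(supplied("kyng", reason="illegible", source="editor"))
```

Keeping the reason and source as attributes means a reader of the exported file can see not only what the editor added but also why and on what authority.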

Problems with manuscript pages with more than one column:

Diplomat produces line numbers automatically based on how high or low the selected region of the image is. This works well for text in a single column, but if there are two or more columns (or two or more pages) on one image, it won’t work properly. The simplest way to deal with this is to make a duplicate copy of each image file for each column it contains; for example, if Brut10v.png shows a page with two columns, make a copy of the file and name the two files Brut10va.png and Brut10vb.png respectively. Then use the former file when transcribing the first column, and the latter (identical) file when transcribing the second column.

This means that one cannot transcribe and display the texts as they are in the manuscript.
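
A small Python sketch shows why numbering by height alone breaks on a two-column page, and why handling each column separately fixes it. The box coordinates, labels, and function names are invented for illustration, and the sketch assumes that numbering depends only on the top coordinate of each box, as described above.

```python
# Each box is (left, top) in pixels; the values are invented for illustration.
boxes = {
    "col a, line 1": (50, 100),
    "col a, line 2": (50, 140),
    "col b, line 1": (400, 105),
    "col b, line 2": (400, 145),
}


def number_by_height(boxes):
    """Order lines purely by their top coordinate, ignoring the column."""
    return sorted(boxes, key=lambda label: boxes[label][1])


def number_per_column(boxes, split_x):
    """Handle each column on its own, as the duplicate-file workaround does."""
    left = {k: v for k, v in boxes.items() if v[0] < split_x}
    right = {k: v for k, v in boxes.items() if v[0] >= split_x}
    return number_by_height(left) + number_by_height(right)


print(number_by_height(boxes))                 # columns interleave: a1, b1, a2, b2
print(number_per_column(boxes, split_x=200))   # reading order: a1, a2, b1, b2
```

Duplicating the image file has the same effect as `number_per_column`: each copy only ever contains the boxes of one column, so sorting by height gives the correct reading order within it.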

Diplomat Versus Traditional Transcription

Diplomat is an editor for transcribing and marking up manuscript images. In this journal entry, I first introduce the basic features of Diplomat. Then, I compare transcription with the help of Diplomat with the traditional transcription.

Diplomat has three separate windows: one for viewing manuscript images, one for typing in the transcription, and a third for producing the transcription synchronously. The software allows a quasi-diplomatic transcription (copying everything one sees): it preserves the same line breaks as the original manuscript. It also has an insertion function for rare characters and abbreviations. For example, if “that” is abbreviated in the manuscript, you can insert the abbreviation by choosing it from the drop-down list; in the transcribed text in the third window, the abbreviation is automatically expanded. Scribal abbreviations can therefore be transcribed and displayed uniformly through user-configurable menus. Transcribers can also insert their own annotations, which are displayed graphically in the transcription and exported as XML markup.

Comparison:

Aspect	Transcription with the Help of Diplomat	Traditional Transcription with Paper and Pencil
Eye-skip (where the transcriber’s eye jumps from a word to its next appearance, omitting the intervening text or letters)	A frame highlights the line that is being transcribed, greatly reducing the possibility of transcription errors caused by eye-skip.	Mistakes caused by eye-skip, or by resuming writing but skipping ahead, are very common.
Order of transcription	One characteristic of the software is that you can transcribe in an arbitrary order. With the software, you can easily move from one line to another to make changes. If one clicks on a line in the transcription window, the image window shows the corresponding line of the manuscript.	Worried about skipping lines, transcribers often transcribe in the order in which the lines appear in the manuscript.
Locating	If one invokes File→Save, Diplomat saves the text that one has written and the links between the transcribed text and the manuscript lines. This lets one carry on working in a subsequent Diplomat session from where one left off.	One needs to find ways to remember where one stopped every time one resumes the project, e.g. counting line numbers or remembering the idiosyncrasies of a line. Some people even choose to stop only after transcribing a full page to avoid the trouble of re-locating.
Interoperability	The work can be saved in a format that only Diplomat can use or exported as a .txt file, which is easily accessible to people other than the transcriber.	The transcriber needs to type the transcription into a word processor to make it accessible to other transcribers. Also, there are very few public platforms for such interoperability.
Special characters	Users customize the program’s character menu. They can add non-keyboard characters, for example thorn (Þ, þ) and yogh (Ȝ, ȝ). By populating this character pool, the user is in effect also programming the application. Users can later delete, edit, or rearrange any characters already in the pool.	One needs to write down the different forms of non-keyboard characters every time one encounters them. This lack of standardization becomes inconvenient when the transcriber types the transcription into a word processor.
Abbreviations	Characters, typically abbreviations, that do not have a Unicode representation can be put into the Abbreviations menu, which resides under the Insert button. This menu is also empty by default. One can cut the abbreviation out of the manuscript image and put it into the menu. One needs to type a description, a transcription of the character, and the XML that should be generated for the abbreviation when the text is exported as an XML file. Then one can insert the figure into the text, so that similar abbreviations are transcribed uniformly and easily. The abbreviation appears in the transcription window as the transcription the user gave.	One needs to expand each abbreviation every time one encounters it. But one can expand the form differently according to the immediate context.
Additions	Users can also add editorial additions. There are four entries under the Annotation button: Addition, Supply Text, Milestone, and Note. The user can supply some missing text. One can click on the arrow on the Reason field to find possible reasons, and then select one of these or enter one of their own. Similarly, the Source field has built-in entries to select, or one can type their own sources. After the insertion has been finished, it appears as a green mark in the typed text. If the user moves the mouse over the green mark, the mark becomes red. At this point, if the user clicks and holds the mark, s/he can see the details of the supplied text, the reason, and the source of the supplied text.	There are some conventions for adding text. [xxx] or <xxx>: text supplied by the editor (letters omitted without a mark of contraction, e.g. ‘-con’ for ‘-cion’; letters omitted by mistake; and punctuation supplied by the transcriber if absolutely necessary). xxx or \xxx/: text inserted by the scribe, either between the lines or in the margins. <xxx>: deleted text. These conventions are universal, but they require certain technical knowledge on the part of both the transcriber and the reader of the transcription.
Working with manuscript pages of more than one column	Diplomat produces line numbers automatically based on how high or low the selected region of the image is. This works well for text in a single column, but if there are two or more columns (or two or more pages) on one image, it won’t work properly. The simplest way to deal with this is to make a duplicate copy of each image file for each column it contains; for example, if Brut10v.png shows a page with two columns, make a copy of the file and name the two files Brut10va.png and Brut10vb.png respectively. Then use the former file when transcribing the first column, and the latter (identical) file when transcribing the second column. This means that one cannot transcribe and display the texts as they are in the manuscript.	The transcriber has more freedom to decide how to look at the manuscript while transcribing. The transcriber can keep a more complete view of the whole manuscript even when s/he is focusing on only one column.
