Fan Fiction by AI

On September 16, 2020 by Michelle Warren

Over the summer, Remix bid farewell to two team members, Digital Library Fellows Madeline Miller and Victoria Corwin. They were an incredible team unto themselves!

Back in January, they sent me a message about a “silliness” project they’d cooked up–producing some “fan fiction” texts based on the Brut edition to explore how algorithms produce meaning out of data that is, computationally speaking, meaningless. Encouraged by Dot Porter, who is a great fan of fan fic, we’re releasing these texts into the world!

Here are three samples for a laugh!

And in the spirit of remix, here is Madeline’s mostly-but-not-completely finished account of making Brut fan fic with AI. Read on to find out how AI turns commonplaces of medieval chronicle style into queer genealogies!

Background

“I want to train an AI with the Brut manuscript and have it write fanfiction”

The goal: create something whimsical using an AI trained with Brut text

For the text-generating AI, I used a simplified version of GPT-2 that was already pre-trained. Because I didn’t have to start from scratch, I was able to do the AI work within a couple of hours and focus on fine-tuning the pre-trained model with data from the Brut (downloaded from an Archive.org copy of Brie’s edition).

https://colab.research.google.com/github/ilopezfr/gpt-2/blob/master/gpt-2-playground_.ipynb

*Also comes with handy instructions for people like me who are total AI novices

An AI will reflect the errors of the data it is trained with, and our raw input text contained many errors. We didn’t fully or carefully clean the text (this would have been a herculean endeavor, essentially equal to transcribing the entire Brut from scratch), but Victoria and I developed a quick and sloppy way to clean the data. We removed the unrelated text at the start and end, and applied some common fixes, such as replacing every run of five spaces with a single space.
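A quick clean of this kind could be sketched roughly as follows. This is an illustration only, not the script we actually ran: the function name `quick_clean`, the two marker strings, and the sample text are all invented, and the sketch collapses any run of two or more spaces rather than exactly five.

```python
import re


def quick_clean(raw: str, start_marker: str, end_marker: str) -> str:
    """Trim unrelated front and end matter, then collapse runs of spaces.

    start_marker and end_marker are hypothetical strings chosen by hand
    after looking at the downloaded file; they are not from the edition.
    """
    # Find where the text we care about begins and ends.
    start = raw.find(start_marker)
    end = raw.rfind(end_marker)
    if start == -1:
        start = 0  # marker missing: keep everything from the top
    if end == -1 or end < start:
        end = len(raw)  # marker missing: keep everything to the bottom
    body = raw[start:end]
    # Collapse any run of two or more spaces or tabs into a single space.
    # Line breaks are left alone.
    body = re.sub(r"[ \t]{2,}", " ", body)
    return body.strip()


# Invented sample, not real Brut text.
sample = "SCAN HEADER\nBODY starts     here and     goes on.\nSCAN FOOTER\n"
cleaned = quick_clean(sample, "BODY", "SCAN FOOTER")
print(cleaned)
```

Anything the markers and the space rule don't catch, such as stray symbols or words in all caps, passes straight through to the training data.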

      • Example: text we don’t care about, weird spaces
      • How we want to deal with medieval text symbology
    • Words in all caps can be challenging (treated as unrelated to lowercase)
    • With our quick and sloppy data clean, there were still a lot of messy components that carried through.
      • EXAMPLE
    • Is there enough? How much data did we feed it?: 1.43 MB
  • The results: And in that same yere…

The AI’s products definitely felt like the Brut. We felt that it did a particularly good job mimicking the transition phrases commonly used in the text.

      • Ex: ‘And in that same yere’

A common issue with AI-generated text is continuity. The AI is not processing the text in terms of understanding what it means, so it won’t usually ‘remember’ what has happened already. GPT-2 is actually better at ‘remembering’ than other text generators might be. That said, it will get caught up in repetitive loops:

EXAMPLE
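One rough way to spot such loops is to count the word sequences that recur within a generated passage. The sketch below is only an illustration and was not part of the workflow described here; the function name `repeated_ngrams`, the window size of four words, and the sample sentence are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict


def repeated_ngrams(text: str, n: int = 4, min_count: int = 2) -> Dict[str, int]:
    """Return every n-word sequence that occurs at least min_count times."""
    words = text.split()
    # Slide a window of n words across the text and tally each sequence.
    grams = Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return {gram: count for gram, count in grams.items() if count >= min_count}


# Invented sample in the style of the generated text, not actual output.
sample = "and in that same yere the kyng and in that same yere the quene"
loops = repeated_ngrams(sample)
print(loops)
```

A passage with many repeated sequences is a likely candidate for one of these loops, though a chronicle that really does reuse its transition phrases will also score high.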

According to Victoria, this sometimes feels, in a comedic sense, like reading the actual Brut. The AI creates long lists of people and their titles/retinue, which is also a frequent occurrence in the actual text. So while the AI often repeated the same name, we felt we had to give it credit for capturing the spirit of the text.

EXAMPLE

In addition to long lists of people, the training data included many lists of location names and movements. When introduced to the AI, this resulted in long, incoherent road trips through England and France.

Example: the captives tour de England

Medieval Texts, AI, and Marriage

The AI makes a stand for queer representation in medieval texts! Or, more accurately, the AI doesn’t understand social constructs and the training data didn’t completely bias it towards heteronormativity.

AIs are known to reflect the gender or racial bias of the materials they were trained on. However, in the case of generating text for the Brut, the ‘bias’ emerged in an interesting way. When ‘marriage’ cropped up in the text that was generated from the Brut, it typically appeared to be between two masculine identities.

After noting the first couple of examples, we hypothesized that this might be due to the frequency of male versus female names and identifiers within the text. More common male identifiers would mean that relationships, including marriage, were likely to be between two presumably male characters.

Some initial searches indicated that male identifiers were more common in the text:

    • ‘ hir ’: 186 hits; ‘ hym ’: 628 hits (spaces around to avoid words like ‘chirch’)
    • ‘ Ham ’: 484
    • ‘Hyr’: 5
    • ‘ Quene ’: 151
    • ‘ Kyng ’: 1420
    • Wordclouds.com: 5/10 utilized words relate to male identifier
      • fat (2444), pat (1655), Kyng (1554), come (1289), men (884), hym (864), King (822), made (723), ham (664)
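The space-padded searches above can be reproduced with a few lines of Python. This is a sketch rather than the search we actually ran: `count_padded` and `share_same_pair` are names invented for the example, and the sample sentence is made up. The second function is a deliberately crude back-of-envelope model, assuming that two partners are picked independently in proportion to how often each identifier appears. GPT-2 does not work this way; it only gives a feel for how lopsided the counts are.

```python
import re


def count_padded(text: str, word: str) -> int:
    """Count case-sensitive occurrences of word with a space on each side.

    Lookarounds are used so that two neighbouring matches can share the
    space between them.
    """
    pattern = rf"(?<= ){re.escape(word)}(?= )"
    return len(re.findall(pattern, text))


def share_same_pair(count_a: int, count_b: int) -> float:
    """Toy estimate: chance that two independent draws both land on group A."""
    p_a = count_a / (count_a + count_b)
    return p_a * p_a


# Invented sample, not real Brut text.
sample = " the kyng gaf hym hym a chirch and hir londe "
print(count_padded(sample, "hym"))  # 'hym' standing alone
print(count_padded(sample, "hir"))  # skips the 'hir' inside 'chirch'

# Using the ' Kyng ' and ' Quene ' counts reported above.
print(round(share_same_pair(1420, 151), 2))
```

With the ‘ Kyng ’ and ‘ Quene ’ counts, the toy estimate comes out at roughly 0.82, which is at least consistent with how often the generated weddings paired two kings or lords.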

Intrigued and wanting to test this, I entered a couple prompts to encourage the AI to produce wedding and marriage related texts. I was rewarded with a plethora of weddings between kings and lords, and one example which began with a king and queen but (thanks to the tendency to create long lists of names) continued on to become a large, polyamorous celebration of love.

Examples

One passage actually married someone off to God.

Ex: the wedding to Sir Oswyn, marriage to god

        • Transition: Another example of when the AI’s lack of actual understanding led to hilarity…
      • Ex. likes to draw and dampyn everyone for absurdly long periods of time
      • Would sometimes draw and dampyn someone repeatedly or resurrect them from the dead…. Leading into
    • The AI knows about Quidditch and Death Eaters
      • This is a pretrained
    • The Mock song of the Battle of Verneuil: where it picked up the word ‘weeny’, we do not know… but thank you to whatever training data supplied this word
    • Remix the manuscript?
  • Thoughts for the future: Comedy as a form of engagement and problem solving
    • A fun way to engage with the text
