The Text in the Eyes of an Encoder

Why Do We Encode Literary Texts?

What is encoding and why is it important? First, it is helpful to understand that encoding is a part of digital humanities. David Birnbaum, a professor at the University of Pittsburgh, defines digital humanities as “the use of computer technology to conduct primary humanities research” (Birnbaum, 2015). The important distinction is that the computer is not simply being used to write reports or find resources, the way a student would use a laptop; in this field, it is used to process data and add to one’s understanding of a variety of humanities-related artifacts. In Doing Digital Humanities: Practice, Training, Research, Julia Flanders explains that “text encoding is a process of creating a digital model of a textual source using markup” (Flanders, 2016). Encoding a text such as The Complete Fortune Teller & Dream Book entails converting its content into markup so that it can be processed in a digital format and patterns within the work can be found more easily. The Women Writers Project (WWP), a group that encodes texts produced by women before the Victorian era, explains that text encoding is “essentially a translation of the traditional reading experience into digital form” (Connell, 2017).

We encode literary texts not only as a way of sharing them with a larger audience but also to understand them better. The act of encoding forces one to break down the text in an intimate and detailed way. One has to deconstruct the work’s content in a way that allows both the encoder and the computer to recognize its patterns. This project will explore ways of doing digital humanities research as well as how this field can add to the understanding of Chloe Russel’s text, The Complete Fortune Teller & Dream Book.

Just like a language, encoding has a set of rules that must be followed to ensure understandability; the computer needs to be able to read what is being entered. Two major components of encoding are XML and TEI. XML supplies the rules for how markup is written, while TEI supplies the specific elements and attributes used to describe particular features of a text. When I trained with the WWP, it was explained to me that XML is the syntax whereas TEI is the language: TEI is made up of vocabulary and grammar and is more specific than XML. Flanders describes XML as “a formal model designed to represent an ordered hierarchy, and to the extent that human documents are logically ordered and hierarchical, they can be formalized and represented easily as XML documents” (Flanders, 2016). XML has a few very important rules that one needs to follow to make sure a document is not ill-formed. Sarah Connell eloquently describes the relationship between XML and TEI in the article Learning from the Past: The Women Writers Project and Thirty Years of Humanities Text Encoding, writing:

“Through XML, it is possible to define specific markup languages that describe different kinds of data, such as historical documents, or chemical formulae, or web pages, or financial transactions. The TEI is one such language, designed to describe and encode humanities research materials such as primary source documents, oral histories, linguistic data, scholarly editions, manuscripts, and many others. Although the TEI can be used to represent very simple data, it excels at providing a detailed account of editorial, interpretive, semantic, literary, and historical features of texts, which can be used to support nuanced scholarly analysis.” (Connell, 2017)

One important feature of TEI is that it is based on the standards of the digital humanities community. James Cummings, in Opening the Book: Data Models and Distractions in Digital Scholarly Editing, states that it is “a community-developed open international standard which provides a set of recommendations for the encoding of digital texts” (Cummings, 2019). TEI can also be customized, which is significant because it means that different projects can adapt it to fit their needs and explore the areas of a given text that matter to them.

Breaking Down the Code

Now that you understand a few important definitions of the tools used when encoding, we can look at the actual code. Below is an example of how one could encode Chloe Russel’s name.
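
A minimal sketch of such markup follows. The attribute value slant(italic) is an assumption, modeled on the keyword(value) pattern of the rendition values quoted later in this essay (such as align(center)); the essay does not spell out the exact value used for italics.

```xml
<!-- start tag with attribute name="attribute value", then content, then end tag -->
<persName rend="slant(italic)">Chloe Russel</persName>
```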

An element is used when the encoder wants to draw attention to an area of the text, whether because it is structurally significant or because its content is worth noting. Each element requires both a start tag and an end tag; without one or the other the code would be ill-formed, and Oxygen (the XML editor that many use to encode texts) would not be able to read what has been typed. Angle brackets are needed to indicate that persName is not just another word in the text but is in fact an element declaring that “Chloe Russel” is a person’s name. Angle brackets, as well as quotation marks, are delimiters: they mark the boundaries between markup and plain text. Just as every start tag needs an end tag, delimiters must come in matched pairs so that the code can be read by the computer. It is important to note that the name “Chloe Russel” is the element’s content; <persName> and </persName> are the start and end tags that enclose it, and the tags and content together make up the element. Attributes are like adjectives and are used to record additional information about the content. Here the attribute name rend, short for rendition, signals that “Chloe Russel” is distinctive in its appearance. The attribute value is more specific still, expressing not only that “Chloe Russel” looks different but that, in this case, it is italicized.

An important rule of XML is that the code has to follow a hierarchical structure. It is almost like a family tree: there is one root, and from that root there are descendants. Information is layered and nested so that it can be read by the computer. Below is an example from Doing Digital Humanities: Practice, Training, Research showing how this type of structure looks when drawn out like a family tree.

Figure 1: Julia Flanders, et al. The same XML citation data visualized in three ways: as a set of nested boxes, as a tree, and as XML encoding. “Text Encoding.” Doing Digital Humanities: Practice, Training, Research, by Constance Alison-Crompton et al., Routledge, 2016, p. 107.

Above you can see that there is one element that branches off into smaller and more specific elements so that the citation can be fully described. In the XML encoding on the right, one can also note that each element is nested inside another. It is vital that an element opened inside another element be closed before its parent is closed; tags may nest but must never overlap, or else the document would be ill-formed and unreadable by the computer. For example, title has a start and end tag before the imprint tag is used, and the tags <publisher>, <place>, and <date> are all opened and closed before the imprint end tag, allowing imprint to encompass all three elements.
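
The nesting described above can be sketched as follows. This is not a reproduction of the figure: the root element <citation> is a hypothetical name chosen for illustration, the title, publisher, and date are taken from this essay's own Works Cited, and the place is left as a placeholder because the essay does not give one.

```xml
<!-- <citation> is a hypothetical root element used only for this sketch -->
<citation>
  <title>Doing Digital Humanities: Practice, Training, Research</title>
  <imprint>
    <publisher>Routledge</publisher>
    <place><!-- place of publication, not given in this essay --></place>
    <date>2016</date>
  </imprint>
</citation>
```

Every element here closes before its parent does. If the imprint end tag were moved above the date end tag, the two elements would overlap and the document would no longer be well-formed.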

Encoding Decisions

There are multiple levels of complexity one can apply to an encoding. The level of detail can change based on what you want to highlight in your project: you can pick out what you deem important and decide whether or not to include it in your markup. Let’s look at a page from The Complete Fortune Teller & Dream Book as an example.

Page 1 of The Complete Fortune Teller & Dream Book by Chloe Russel

This is one way that you can encode this text…
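
A schematic sketch of a basic encoding of this kind, under stated assumptions: only the words quoted in this essay appear as text, the rest of the page is stood in for by comments, and the placement of the line breaks is illustrative rather than a transcription of the page.

```xml
<ab>
  <persName>Chloe Russel</persName>
  <!-- words not quoted in this essay -->
  <lb/><placeName>Massachusetts</placeName>
  <!-- remaining lines of the page, each new line introduced by a line break element -->
  <lb/>
</ab>
```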

One feature we can see here is that the words are surrounded by an <ab> element (short for “anonymous block”), which is often used for a section of text that is not specific enough to be labeled a paragraph. Another element used to mark important structural components of this page is <lb/>, which stands for line break; it is small but important, as without it one would not know when words start on a new line. Other features that are marked are names, such as the author’s name (<persName>) and place names (<placeName>) such as Massachusetts. While this markup points out some of the main structural features of this page, it leaves much to be desired. Here is a more detailed and complex encoding of the text...
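
A schematic sketch of what such a fuller encoding might look like. The element names and the values align(center), case(allcaps), caption, and frontispiece come from this essay; the value slant(italic), the wording of the figure description, and the order and line placement of the names are assumptions, and text not quoted in the essay is stood in for by comments.

```xml
<div type="frontispiece">
  <figure>
    <!-- illustrative wording; the encoder would describe the portrait here -->
    <figDesc>Portrait of the author.</figDesc>
    <ab type="caption" rend="align(center)">
      <persName rend="case(allcaps)">Chloe Russel</persName>
      <!-- words not quoted in this essay -->
      <lb/><placeName>Massachusetts</placeName>
      <lb/><name>Old Witch</name>
      <!-- connecting words not quoted in this essay -->
      <name>Black Interpreter</name>
      <lb/><mcr rend="slant(italic)"><!-- the last two lines, printed in italics --></mcr>
    </ab>
  </figure>
</div>
```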

 

What has changed? One of the main differences between this encoding and the more basic version is the inclusion and specificity of the text’s rendition. Here the rend attribute is added and given specific values to express how the text appears on the page. First, rend="align(center)" is added to <ab> to show that all of the text on this page is centered. The element <mcr> (short for “meaningful change in rendition”) was added around the last two lines. Why? While they contain no names of people or places, these last lines are set apart from the surrounding text by their italicization, a feature that this version wants to highlight. To indicate that the author’s name is in all capital letters, rend="case(allcaps)" has been added to <persName>. The most important visual aspect noted in this version is the illustration of the author. The <figure> element was added to group the illustration with the text below it, which describes and explains the drawing, while <figDesc> is a chance for the encoder to explain what the illustration looks like, as readers of the encoded version will not be able to see it for themselves. The <ab> element was also given type="caption" to further express that the purpose of the text below the portrait is to describe the illustration. To signify that this page represents a significant division of the text, <div> was added with type="frontispiece", a term for an image (often of the author) that appears at the front of a work. The last element added to this version of the encoding is <name>, which was placed around the terms “Old Witch” and “Black Interpreter”. The <name> element is used to mark proper nouns, among other things, and the encoder of this text decided that “Old Witch” and “Black Interpreter” fit this description.

Even after adding all of these tags, there are still many features unmarked such as the handwritten numbers on the top left of the page, the size difference in letters, the gender of the author, and an extremely detailed description of the picture. This is why a project must have a set of specific goals. While the size change of letters may not be helpful for someone who wants to read the text to understand its content, it could be vital for someone wanting to research typography during this time period. It is also important to remember that the decisions an encoder makes do not always represent all the encoding possibilities; for example, some may have chosen not to tag “Old Witch” and “Black Interpreter” with <name>.

Just like the expanded example of the text above, this explanation does not include all of the features of text encoding. There are many other aspects that a person needs to learn to become an expert, but knowing the basics of markup should help one understand what text encoding entails. Doing this type of analysis with a text like Chloe Russel’s The Complete Fortune Teller & Dream Book gives one more insight into the text than reading the work without thinking about how all of these smaller components add up to create the piece we see today.

The Benefits of Encoding

Encoding texts not only helps one better understand the structure of a work; when an encoded text is compared with other encoded pieces, patterns can be found and explored more easily. Connell writes that encoding a text allows for the “opportunity to use the structural markup within the texts to create more intelligently focused searches, and it presented the search results in a way that could be used to read patterns across the entire collection” (Connell, 2017). The act of encoding causes one to break down the features of a work, and with the use of computer programs like Oxygen the common traits of a piece can be better understood, helping one comprehend the unique aspects of a text such as Chloe Russel’s The Complete Fortune Teller & Dream Book.

One important thing to remember when looking at the encoded versions of documents is that there is no one way to encode a text. Cummings reminds his readers that “Not all documents are faithful copies of the work, nor do they represent all possible ways of understanding the text in question” (Cummings, 2019). Depending on the goal or intention of a project, different elements may be tagged or ignored; Russel’s work could be encoded in multiple ways depending on what the encoder is trying to achieve with their markup. Cummings also notes that although one encoded document may not represent all of the ways a text could be read, “It is precisely the potential of a digital edition to be near-infinitely refactorable and dynamically to provide different views depending on external interactions that is one of its greatest strengths” (Cummings, 2019). It is also important to consider whether these documents are being offered free to the public or to a group of subscribers. Offering them to anyone who asks means that the text will likely reach a wider audience; however, encoding a variety of texts requires resources, which sometimes makes a subscription model necessary. Text encoding is beneficial because an encoded document can be put to many uses. The work can be understood through the process of breaking down all of its structural components, and the relationships within the text, as well as its connections to other texts, can be explored.

The Women Writers Project (WWP) is a group that has been encoding texts for decades. The goal of the project is to share the works of women written before the Victorian era so that scholars are not only aware of their existence but can use them for research. The organization explains that they “produce a transcription of each text that preserves the original spellings, typographical errors, lineation, hyphenation, and other details of the text” (Women Writers Project, 2019). Looking at the aims of a project like the WWP is helpful because the group is currently encoding Russel’s work, so that The Complete Fortune Teller & Dream Book can eventually be added to the list of works the Women Writers Project provides to its subscribers. This is also of relevance to me, as I work with the WWP and was able to proof Russel’s text. In Doing Digital Humanities: Practice, Training, Research, Flanders writes that when encoding there is “a connection between an observing individual and a source object” (Flanders, 2016); this statement will resonate with encoders, as they are the ones deciding what the author is conveying with their words. While encoders are assisted and guided by the views of others, encoding really does come down to “an observing individual and a source object”. This makes encoding a document feel personal, as you are attempting to picture what the author envisioned for their piece.

As I mentioned before, I took part in the encoding of The Complete Fortune Teller & Dream Book, and I would like to point out a few interesting features of how the WWP has used markup with this work. I proofed this document, meaning I went through the text line by line and compared the book with the code written by the previous encoder. While this process involves a little less engagement than encoding the document from scratch, proofing requires paying close attention to both the work and the markup, as the goal of this role is to spot any errors or possible alterations that could be made.

Page 15 of The Complete Fortune Teller & Dream Book by Chloe Russel

One of the few questions I presented for group discussion was whether the word “trifling” should be marked up with the elements <choice>, <sic>, and <corr>. Above you can see an image of page 15 of Russel’s work; under the description of “Nuts”, the word “trifling” is split across two lines. Words split across two lines are quite common, but what is interesting here is that the hyphen connecting the two parts is absent. I proposed that we use <choice>, <sic>, and <corr> so that Oxygen would read it as one word rather than two distinct words. The markup was then changed from tri <lb/>fling to <choice><sic>tri <lb/>fling</sic><corr>tri-<lb/>fling</corr></choice> to indicate that the WWP made a conscious decision to add the hyphen despite its absence from the original copy of the work. This example shows how those encoding a document sometimes choose to add or remove aspects of the text to make it more accessible. While this change aligns with the aims of the WWP, it may not suit everyone researching this document, which shows how the decisions one makes in encoding affect the final product.

Another interesting aspect of this text’s encoding is that it includes a large quantity of <item> and <label> elements. Part of the goal of this text is to explain what different themes and objects in a dream may represent, and this section includes a long list of items and labels as a part of that explanation. Using these elements to break down this unique structural feature of the piece allows one to better understand how this part of the work functions; acknowledging that Face, Fall, and Feast are labels, and that those words together with their descriptions are items, forces one to look at the text a little differently than if one had read the work without seeing the markup. Below I have included an example of how this part of the text would be encoded.

Page 12 of The Complete Fortune Teller & Dream Book by Chloe Russel
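
A minimal sketch of the item and label structure described above. The three labels are the ones named in this essay; the <list> wrapper is an assumption, since the essay names only <item> and <label>, and the interpretations are stood in for by comments rather than transcribed.

```xml
<!-- <list> is an assumed wrapper element for this sketch -->
<list>
  <item><label>Face</label> <!-- what this dream is said to signify --></item>
  <item><label>Fall</label> <!-- what this dream is said to signify --></item>
  <item><label>Feast</label> <!-- what this dream is said to signify --></item>
</list>
```

Each label sits inside its item, so the word being interpreted and its description travel together as one unit.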

Another compelling aspect of using markup on this document is examining what it can tell us about the table on page 19.

Page 19 of The Complete Fortune Teller & Dream Book by Chloe Russel

An obvious way to encode this table is by using the <table> element, but comparing how the table is printed in the work with what the markup looks like affects how one understands what one is looking at.
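
A schematic sketch of table markup, assuming the TEI elements <row> and <cell> inside <table>, and assuming that the name “Fortune Table” serves as the table's heading. The number of rows and columns is a placeholder, and the cell contents are stood in for by comments rather than transcribed from page 19.

```xml
<table>
  <head>Fortune Table</head>
  <!-- a row of column headings, if the printed table has one -->
  <row role="label">
    <cell><!-- column heading --></cell>
    <cell><!-- column heading --></cell>
  </row>
  <row>
    <cell><!-- entry --></cell>
    <cell><!-- entry --></cell>
  </row>
</table>
```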

Looking at how the table from page 19 appears in the physical work and comparing it with how it is encoded shows just how different the two can be despite representing the same information. The markup expresses the same thing that the image does, yet when confronted with the two one cannot help but process the information differently. Encoding the table is beneficial because it helps one better see what the “Fortune Table” is structurally, allowing one to break down what it is actually trying to say without an overload of visual information.

One challenging aspect of the encoding process was the fact that the physical document is missing information and as a result so is the markup.

 

Page 22 of The Complete Fortune Teller & Dream Book by Chloe Russel

Above you can see that part of page 22 has been lost; there is a rip in the January astrology section, and the reader can see the next page rather than the complete 22nd page. This is a loss both for the reader of the physical text and for the reader of the encoded version: when information is missing from the original copy, encoders must use the <gap> element to express that a certain extent of the text is not available. Luckily, there are not many sections in The Complete Fortune Teller & Dream Book that are either too damaged to understand or missing a large portion of the text, but it is important to note that having another edition to supply what is lost would benefit readers of both the physical work and the encoded version.
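
A minimal sketch of how such a loss might be marked. The reason and extent attributes are standard TEI attributes on <gap>, but the values shown here and the surrounding <ab> are assumptions for illustration, and the surviving text is stood in for by comments.

```xml
<ab>
  <!-- surviving text before the tear -->
  <gap reason="damage" extent="unknown"/>
  <!-- surviving text after the tear -->
</ab>
```

Because <gap> is an empty element, it records that something is missing, and why, without inventing the lost words.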

Implications for Chloe Russel’s Text

One idea mentioned earlier is the customization granted through the use of TEI. TEI opens up a multitude of opportunities for anyone wanting to encode Russel’s text on their own. There are many features in The Complete Fortune Teller & Dream Book that make it unique, and these could be expressed through encoding if a researcher decided to make them the focus. There is a wide range of possibilities: one could look at gendered language, the role and representation of race, the connection to witchcraft and nature-based religion, and more. One could also use encoding to compare the different versions of this text, exploring not only the structural discrepancies but also the variations down to the word level. Encoding opens up a realm of possibilities and can be used to study many of the nuances of this work.

Encoding this piece of literature at the WWP is valuable not only because it gives insight into this specific text but also because it places the text alongside the large variety of works already included in Women Writers Online (WWO). Having this work in conversation with others allows for “an opportunity to use the structural markup within the texts to create more intelligently focused searches, and it presented the search results in a way that could be used to read patterns across the entire collection” (Connell, 2017). Using markup to explore this piece also allows one to better understand the structure of this text. As we have seen, transitioning the physical work into XML can make a reader see the text differently and gain a greater understanding of how all of its components create the final product. It also allows one to explore distinct features of this work; by searching for specific elements or changes in rendition such as capitalization and italicization, one can consider whether these renditionally distinct sections are more important than the surrounding text. It is easier to ask why a part is different or more significant when one can see which parts are in fact different. Encoding The Complete Fortune Teller & Dream Book also provides quantitative information about a document that is normally understood through qualitative observations (which is true of most literary texts). Knowing the number of labels and being able to count all the times a heading is introduced is quantitative knowledge that can aid one’s comprehension of Russel’s work.

Another benefit of having this text in the WWO is that it will be available to a wider audience than it is currently. Digitizing The Complete Fortune Teller & Dream Book not only gets the work out to more people; it also means that Chloe Russel will be one of the authors scholars can use in their research when searching the WWP’s collection. Adding this text to online databases such as the WWO shares the story of Chloe Russel in ways that the few remaining physical copies alone could not. Encoding this piece raises awareness of its role as a work produced by an African American woman during this time period. Making Russel’s work known promotes questions, encourages exploration, and increases interest; the more scholars who are aware of this piece, the more likely it is that someone will research the origin and production of this text, giving everyone more information about the creation of the book. Encoding Russel’s text makes it more accessible and raises awareness of her existence, adding to her story much as this exhibit does.

Works Cited

Birnbaum, David J. “Digital Humanities.” What Is XML and Why Should Humanists Care? An Even Gentler Introduction to XML, 28 Aug. 2015, dh.obdurodon.org/what-is-xml.xhtml.

Connell, Sarah, Julia Flanders, Nicole Infanta Keller, Elizabeth Polcha, and William Reed Quinn. “Learning from the Past: The Women Writers Project and Thirty Years of Humanities Text Encoding.” Magnificat Cultura i Literatura Medievals, vol. 4, 2017, p. 1.

Cummings, James. “Opening the Book: Data Models and Distractions in Digital Scholarly Editing.” International Journal of Digital Humanities, vol. 1, 2019, pp. 179–193, https://doi.org/10.1007/s42803-019-00016-6.

Flanders, Julia, et al. “Text Encoding.” Doing Digital Humanities: Practice, Training, Research, by Constance Alison-Crompton et al., Routledge, 2016, pp. 104–122.

Women Writers Project. “Methodology for Transcription and Editing.” Statement of WWP Editorial Principles, 2019, https://wwp.northeastern.edu/about/methods/editorial_principles.html.