Process Posts: Week 2

Introduction: The following Process Posts detail the work accomplished thus far for my study on the appearances of the word “worm” within Romantic period literature. This project compiles archived Romantic texts from the database Hathitrust and runs a text analysis of the documents using the software Voyant. I have purposefully explained my work in a blog-like tone and format so as to allow all audiences to comprehend the work plan that will eventually lead to my final conclusions. These posts also serve to outline my work plan for more adept Digital Humanities and Romantic scholars, so that feedback on this project is given with a complete understanding of the project’s methods and journey.


The Reflection section serves to document my later thoughts on my original process.


Update: I will now be using R instead of Voyant to eventually analyze my corpus of texts.

Process Posts


Nov. 7, 2017

 

The work I accomplished today provided some solutions to the setbacks I had been facing, but also presented new challenges. The first task I completed today was adding six more texts to my Hathitrust Collection; the final four texts needed to complete my fifty-text corpus for this project will come from websites hosting Mary Shelley’s Frankenstein, Charlotte Smith’s Elegiac Sonnets, Smith’s Beachy Head, and Olaudah Equiano’s The Interesting Narrative. I will provide the links for these texts alongside the link for the Hathitrust Collection on this website so that all the texts are accessible.

 

This brings me to my next piece of information: I figured out that my Hathitrust Collection can be viewed by the public even if the viewer does not have a personal Hathitrust account. To confirm this, I found a URL on the side of my Collection in Hathitrust, copied and pasted this URL into my personal Google+ account, logged out of Hathitrust (so that the link would not send me to my personal Hathitrust), and then clicked the link from my Google+ page, which led me to a public version of my complete Hathitrust Collection. All this preliminary work took me about a half hour to complete today.

 

Later today, I met with Sarah Stanley for an hour. I explained to Sarah the dilemma I have been facing thus far with Hathitrust; namely, I could not figure out how to view the plain-text versions of the corpus I have gathered. The conclusion Sarah and I reached was that I did not have access to the plain-text versions of the texts on Hathitrust because I have student rather than faculty access to the website. Sarah and I are going to try to have this problem fixed, but this will probably not happen within the confines of the semester. So, for now, Sarah has access to the plain-text versions of my texts through Hathitrust since she has faculty access. Sarah used some basic code and gathered a zip file for me of the ten earliest texts in my Hathitrust Collection. These ten texts will have to serve as my corpus to work with in Voyant for the time being.

 

This leads me to mention that I also ran into a brick wall trying to work with Voyant. Luckily, the ten texts will work in Voyant without issue; however, I cannot download the Voyant server on my laptop, so I am having to use the public, smaller version. For a larger corpus, I will need to figure out how to download the server, but every time I have tried, the download has stalled and then failed. I still do not have a remedy for this issue.

 

Finally, Sarah suggested I learn some basic R and provided me with some links on how to explore it, which I will add to this project’s bibliography. Sarah suggested R because the Voyant server would not load on my laptop, making R the best way to analyze a large corpus of texts. However, I will not be able to get very far with this study and analyze my texts within the confines of the semester, which is frustrating. What Sarah and I decided is that my main focus should be on analyzing the ten texts from Hathitrust in Voyant as a sample set for the purposes of this semester’s project, but that I should also, to a lesser extent, explore R because it is a tool I will need to complete the larger span of this project in the future. The prospect of learning R makes me very nervous because it feels like I have been given the task of learning a foreign language, but I know this task will come with major benefits for me as a scholar.

 

Nov. 9, 2017

 

Today I began to work on the bibliography for this project. The bibliography is split alphabetically into four sections covering what I believe are the major focuses of this project: Archives/ing, Text Analysis, Learning Code (R and Python), and Vermiculture Studies. I have added three texts under Archives/ing and three texts under Learning Code so far. I ended up spending about an hour on this task even though I only gathered six texts, because I spent a lot of time reading Boris Groys’s Under Suspicion: A Phenomenology of Media. I was interested in the ways in which Groys describes how archives shape our history based on what is incorporated into these archival spaces. Groys’s discussion is particularly valuable in relation to my project because I am only considering Romantic texts from one database that does not contain all Romantic works within it; my framework for Romantic texts is a limited scope constrained to what is in Hathitrust.

 

I also met with Dr. Pascoe today at 4:30 pm and had a thirty-minute discussion on the revision process of these Process Posts and what I may want these posts to look like in their finished form. I have decided to use them to simply detail my process and then, next to these posts on the website, create a reflection section that will detail my thoughts on my process. To aid the structure of these posts on the website, I have also decided to have tabs for the posts week by week. This way there will not be too much text on one webpage and these posts will be easier to follow and sort for viewers of the website.

 

This discussion leads me to the last task I accomplished today: I have posted the first week’s Process Posts to the website and designed the webpage to leave space for the post’s reflection(s), which I will add at a later date. I ended up getting caught up in designing the website because this is the aspect of the project I find the most enjoyable; I added images, boxes, pages, lines, and hyperlinks to various pages of the website tonight. I also experimented with font styles, sizes, and bolding and italicizing words. The goal of my website design is to make the website as approachable as possible, so I need every aspect to be inviting to the viewer. Next week I will probably visit the Digital Studio at Florida State University to get suggestions on ways in which I can make the design of the website even more appealing. But for tonight, working on putting the first week’s Process Posts on the website and experimenting with the design of the website probably took me around 30-45 minutes.

 

Nov. 10, 2017

 

Today I am analyzing my data in Voyant! The first thing I did was load my corpus of ten texts into Voyant from the documents on my laptop. My texts are:

 

 

Image 1. My Hathitrust Corpus.


Then I removed words from the dataset that I did not want Voyant to analyze, such as “page,” “like,” and “section.” After I did this, I set every section of Voyant to search for the word “worm.” After further consideration, I also excluded the words “the,” “that,” and “it” from the analysis in relation to “worm” because those three uninteresting words appeared very often near “worm” in the Phrases Frequency section of Voyant. Unfortunately, this action removed those words from the word cloud that analyzed popular words near “worm,” but did not change the Phrases section. I am still new to Voyant, so more issues like this are sure to arise because of my unfamiliarity with the software. As for the Phrases Frequency section, here are the phrases most common in the ten texts that include the word “worm”:

 

Image 2. Phrase Frequencies in Voyant.


To say the least, this information at first glance did not seem the most interesting, but it still provided a foundation of connections I could look for moving forward. The fact that “worm” occurs near words such as “breedeth,” “flies,” and “puny” reminds me of Janelle Schwartz’s focus in Worm Work on the material quality of worms and their taxonomy. Because this data is so preliminary, though, there is not much more I can do with it until I explore the texts further in Voyant.
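The kind of filtering and neighbour-counting described above can be sketched outside Voyant as well. The snippet below is only an illustrative Python approximation, not Voyant’s actual algorithm; the stopword list and the three-word context window are my own choices:

```python
import re
from collections import Counter

# Illustrative stopword list only: these mirror the words I excluded in
# Voyant, but they are not Voyant's internal settings.
STOPWORDS = {"the", "that", "it", "page", "like", "section"}

def worm_neighbours(text, window=3):
    """Count the words appearing within `window` tokens of 'worm',
    after dropping the custom stopword list."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    neighbours = Counter()
    for i, tok in enumerate(tokens):
        if tok == "worm":
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            neighbours.update(w for w in context if w != "worm")
    return neighbours

sample = "The worm breedeth in sugar; the silk worm and the glow worm differ."
print(worm_neighbours(sample).most_common(3))
```

A real run would load the plain-text files of the corpus instead of the invented sample sentence.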

 

Much more immediately useful and interesting were the Bubbleline frequencies of the term “worm” in my ten texts. The Bubbleline chart’s horizontal lines are timelines for the span of each text. On each line, bubbles appear where the word “worm” is referenced. The size of the bubbles indicates the frequency of the word in that portion of the text (the bigger the bubble, the more often “worm” is used there). And at the far right of each horizontal line, the total number of occurrences of the word “worm” is listed for each text.
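The logic behind the chart is easy to approximate: split each text into equal segments and count the hits in each. Here is a minimal Python sketch of that idea; the choice of segment count is arbitrary on my part, not something taken from Voyant:

```python
import re

def bubbleline(text, term="worm", segments=10):
    """Split a text into equal-length segments and count occurrences of
    `term` in each; returns the per-segment counts and the total
    (the number shown at the far right of each Bubbleline)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == term:
            counts[min(i * segments // len(tokens), segments - 1)] += 1
    return counts, sum(counts)

counts, total = bubbleline("worm one two three four worm five six seven worm", segments=5)
print(counts, total)
```

Larger numbers in a segment would correspond to bigger bubbles on that stretch of the timeline.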

 

I find this chart to be immediately useful because it gives me a sense of which texts may be skewing my data by mentioning “worm” much more often than others. As a starting point, I also generally like being able to tell where and how often “worm” is mentioned in these texts. Since I am not reading any of the texts I am analyzing in Voyant, having a sense of the layout of each text in this chart helps ground me just a little in each one:

 

Image 3. Bubbleline Chart in Voyant.


In addition to my Voyant endeavor, I also spent 30 minutes today reading the Introduction to Janelle Schwartz’s Worm Work: Recasting Romanticism. Schwartz’s Introduction spends considerable space explaining the relationship among worms, Romantic literature, and Romantic thoughts on taxonomy. Even though Schwartz does not use large-scale data analysis in her study, my data reinforces Schwartz’s claims, since Francis Bacon’s texts reference worms the most. I assume that Bacon was interested in worms in relation to scientific inquiries. One downside of large-scale data analysis in the Digital Humanities is that I do not know whether this claim about Bacon is true, because I have not read the texts. So, next I will analyze the word “worm” in relation to other words near it in Bacon’s text.

 

Unfortunately, I could not figure out how to do this right away, so I had to completely reload Voyant’s webpage and start my work over from the beginning. This time I loaded only Bacon’s text into Voyant so that I could figure out how to analyze the word “worm” in that text alone. Most of the time when Bacon writes on worms, he writes on specific types of worms: silkworms, earthworms, or glowworms rather than generic worms. Worms are also common in Bacon’s texts in lists with other animals such as flies and vermin. Bacon also often writes about worms “that breedeth.” For some reason, though, in Bacon’s text “worm” comes up most often near “sugar.” I need to do more text analysis before I can make sense of these occurrences.
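The distinction between specific worm compounds and the bare word could also be tallied directly with a simple pattern match. This is a hypothetical sketch of my own; the regular expression is not a Voyant feature, and the compound list only covers the three types mentioned above:

```python
import re
from collections import Counter

def worm_kinds(text):
    """Tally specific worm compounds (silkworm, earthworm, glowworm,
    with or without a hyphen) against the bare word 'worm'."""
    pattern = r"\b(?:(?:silk|earth|glow)-?)?worms?\b"
    return Counter(re.findall(pattern, text.lower()))

print(worm_kinds("Silkworms and the glow-worm; a worm breedeth in sugar."))
```

Run over Bacon’s plain text, a tally like this would show whether the compounds or the generic word dominate.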

 

I am quickly learning that I have to be very careful when drawing conclusions from the data. Tipping Silvester’s Romantic work in my collection of texts seems to reference worms the most when analyzed in Voyant’s Frequency graph:

Image 4. Frequency of "Worm" in Texts in Voyant.


At first glance, this seems to contradict the Bubbleline I created earlier. Rather, the Frequency graph takes into account that Silvester’s text is the shortest, so relative to the text’s length, Silvester uses the term “worm” most frequently, but not most abundantly. As the Bubbleline demonstrated, Bacon mentions “worm” eighty-five times in his text, while Silvester mentions “worm” only three times. Clearly, Bacon mentions “worm” most abundantly, but Bacon’s text is much longer than Silvester’s, and once the lengths of the texts are taken into account, Silvester mentions “worm” at a higher ratio of occurrences to pages of text than Bacon does. My work in Voyant today took me an hour and a half, with only preliminary results. The next time I sit down to work, I plan to continue my analysis in Voyant.
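The difference between raw counts and relative frequency is simple to compute. Below is a sketch using hypothetical word counts, since I have not measured the actual lengths of Bacon’s or Silvester’s texts; only the two hit counts (85 and 3) come from my data:

```python
def relative_frequency(hits, total_words, per=10_000):
    """Occurrences of a term per `per` words, so texts of very
    different lengths can be compared fairly."""
    return hits / total_words * per

# The total word counts below are invented, chosen only to illustrate how
# 3 hits in a short text can outrank 85 hits in a much longer one.
bacon = relative_frequency(85, 200_000)    # 85 raw occurrences of "worm"
silvester = relative_frequency(3, 4_000)   # 3 raw occurrences of "worm"
print(bacon, silvester)
```

With these illustrative lengths, Silvester’s rate per ten thousand words exceeds Bacon’s even though Bacon’s raw count is far higher, which is exactly the effect the Frequency graph captures.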

Reflection


I came into this project expecting to find concrete answers to my research questions, with clear directions and solutions for how to find such answers; instead, this project has provided very few definite answers and solutions so far. However, I am really enjoying that this project has had setbacks and has presented me with more than one path to solve any given problem, because I have learned a little about a lot of topics through my work. Arguably, my initial plan for this project, the work I believed would be accomplished, fell apart on November 7th (which was very early in the process). However, the solutions I have found are more useful to me in the future and better methods for conducting my research than the plans I originally had, and I might not have attempted these new methods had my initial plan not fallen into shambles. In my reflection for week 1, I wrote on working to find my way through the labyrinth, and though in this week the minotaur seemed to almost hunt me down, I have eluded him yet again.

 

I am currently on week 4 of my work for this study, so I am able to reflect on week 2 with less confusion in the process of my work than what I faced in the moment during week 2. Rather than specific research questions, I now feel like I should have been asking research method questions at the beginning of my work.

 

Can I feasibly analyze a corpus of 50 texts in my first semester of attempting a Digital Humanities (DH) project? Is Hathitrust really an appropriate and usable archive for a student to use to gather research? What if I face issues in using Voyant? Then what? Do I really want to learn R? And how will R benefit me in the future?

 

To be honest, when Sarah Stanley suggested on November 7th that I learn R, I was scared. Even though Sarah directed me toward the website Programming Historian and her blog post on R, the thought of learning R felt like taking on the minotaur head-to-head in single combat. My main reason to be scared and hesitant was that I have never been very good at learning foreign languages; not only does R feel like a foreign language, but the fact that it is a technological language made it even more daunting. I also knew at the time that learning R would probably mean the end of working with my corpus of texts for the rest of the semester, because learning R and continuing to update my process on the website will take significant time. But these concerns should not have bothered me as much as they did. Ultimately, I am glad I set them aside and, as of week 4, have begun my adventure into learning R, because this process is actually very enjoyable and exciting and nothing like fighting a minotaur. I also know that R will benefit me more in the long run than Voyant will, because R has a broader range of uses.

 

On a different, final note, when I finally reach a point where I am comfortable and knowledgeable enough to use R to analyze my corpus of texts, I really need to consider not gathering all my texts from Hathitrust, or at least reviewing the texts I have gathered there. In “Using Text Analysis to Discover Work in JSTOR,” Jason B. Jones writes on JSTOR’s algorithm for suggesting texts similar to the words users enter in the site’s search bar. Jones’s example concerns a paper on “the sense of time in psychoanalysis”: he explains that when he copied and pasted his paper into JSTOR’s search bar, he did get some specific suggestions to search philosophers of psychoanalysis, but he also got broader terms about psychoanalysis in general that were only loosely related to his paper’s topic (par. 2-3).

 

I am now wondering how relevant some of the texts I gathered in Hathitrust are. Though I searched for “literature,” “English literature,” and “English poetry” in a Romantic database, I should have been more cautious in my selection. I should have asked more questions about the programming of the database.

 

What does Hathitrust’s algorithm consider literature? What dates does Hathitrust’s algorithm believe are Romantic literature? What does Hathitrust’s algorithm consider to be English? English poetry? Etc.

 

One may initially think, as I did, that these are questions with obvious answers and therefore do not need to be asked. But I am now discovering that I cannot give definite answers to any of these questions and that troubles me as a researcher who wants to be in control of my data and variables.

© 2017 by Emily Scott.
