Considering Worms.
A study of worm imagery in Romantic literature by Emily A. Scott
Process Posts: Week 4

Introduction: The following Process Posts detail the work accomplished thus far for my study of appearances of the word “worm” within Romantic-period literature. This project compiles archived Romantic texts from the HathiTrust database and runs a text analysis of the documents using the software Voyant. I have purposefully explained my work in a blog-like tone and format so as to allow all audiences to follow the work plan that will eventually lead to my final conclusions. These posts also serve to outline my work plan for more adept Digital Humanities and Romantic scholars, so that feedback on this project can be given with a complete understanding of the project’s methods and journey.

The Reflections section documents my later thoughts on my original process.

Update: I will now be using R instead of Voyant to eventually analyze my corpus of texts.
Process Posts

Nov. 22, 2017

The work I did today was surprisingly more exciting than scary. The first task was to look over my Statement of Intent one more time, because Dr. Pascoe had pointed out a paragraph that might need further revision. I did change the wording of that paragraph a little bit, but I like how the statement reads now, so hopefully this will be my statement’s final form (at least for the duration of the semester). This work took about 15 minutes to complete.

Then, I began my first exploration in learning R. My preliminary training started with one of the two lessons in Taryn Dewar’s “R Basics with Tabular Data” from the website Programming Historian, which Sarah Stanley showed me. This first lesson taught me how to run very basic commands, such as changing the font style and background color of R’s console, but it also taught me how to manipulate a data table in a few different ways, such as finding the median and mean of a dataset or computing sums and differences. This lesson took me just over an hour to complete. When I was finished, my screen showed this:
Image 1. My first lesson in R.

The first command, “data()”, opened a list of R’s preset datasets. Then “data(AirPassengers)” let me use R’s preset dataset of monthly air passenger totals (in thousands) from January 1949 to December 1960. In the image, the dataset appears twice because I accidentally typed the code twice, which was one of my many mistakes. Then I found the mean, “mean(AirPassengers),” and median, “median(AirPassengers),” of the entire dataset. After this, Dewar’s post on Programming Historian instructed me to find a summary of the dataset by typing “summary(AirPassengers).” This summary provided a quick way to find the minimum number in the dataset, the 1st quartile, median, mean, 3rd quartile, and maximum number. I am still unsure what the 1st and 3rd quartiles mean, but the rest of the summary was understandable.
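For anyone retracing these steps, the whole sequence looks like this (a sketch of my session, with notes I have added after the fact as comments):
    data()                   # list the datasets built into R
    data(AirPassengers)      # load the built-in monthly air passenger counts
    mean(AirPassengers)      # average monthly passengers, 1949-1960
    median(AirPassengers)    # middle value of the dataset
    summary(AirPassengers)   # minimum, 1st quartile, median, mean, 3rd quartile, maximum
    # (the 1st quartile is the value below which a quarter of the data falls;
    # the 3rd quartile is the value below which three quarters fall)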
Next, I was taught that I can add numbers from the dataset together in R. I can also give labels to numbers in the dataset. For example, the number of passengers in January 1949 was 112,000, so in R I wrote “Jan1949<-112” and then “Jan1949”; after that, every time I wanted to manipulate the number 112,000 in the data, I did not have to type “112” into R and could instead type “Jan1949.” Giving a variable name like this to every piece of data in the dataset would be time-consuming, but I can see how it would make working with a larger dataset less confusing in the long run.
In the next step I made quite a few errors, but luckily R tells the user when they make an error, so the user can immediately fix the mistake instead of having to hunt for every mistake later. The first mistake occurred when I typed “1s()” so that R would show me the labels I had made for numbers in my data so far, a handy tool that keeps the user from labelling a number twice or labelling a number in two different ways. The code, however, was supposed to start with a lowercase L, not the number 1. So, R told me there was an error in my code, and I had to retype “ls()” with a lowercase L. After this, I learned how to remove a label I had given to a number by writing “rm(Jan1949).” This got rid of the label Jan1949, associated with the number 112,000 in the dataset.
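In code form, the labelling workflow from these two steps is simply:
    Jan1949 <- 112    # attach the label Jan1949 to the January 1949 value
    Jan1949 + 100     # the label now stands in for the number 112
    ls()              # list every label (object) created so far
    rm(Jan1949)       # remove the label when it is no longer needed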
Next, I learned how to open R’s help page by typing “help().” I did not explore the help page too much today because I wanted to focus on the task at hand, but I plan to go back and explore it at a later date: I realize I will not always be following instructions, and someday I will be writing my own code. When I am writing my own code, at first, I may be reliant on R’s help page.
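From what I have seen so far, the help system can also be pointed at a single function, which I expect is the form I will use most once I am writing my own code:
    help()      # open R's general help page
    help(mean)  # documentation for one specific function
    ?mean       # shorthand for the same request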
After this, I did a little more labelling of numbers in the dataset and then began working with lines of data from the dataset. I made mistakes in trying to do this because I would type the wrong symbol or accidentally type uppercase letters that were supposed to be lowercase, but again, R informed me of my mistakes and I immediately fixed them on the next line of code. “Air49<-c(112,118,132,129,121,135,148,148,136,119,104,118)” is the code that let me manipulate the number of air passengers in every month of 1949. Typing “y<-1:10” and then “y” created a practice vector of the whole numbers 1 through 10 (the colon builds a sequence), though Dewar’s instructions moved on from this step before I could figure out exactly what this specific code would be used for.
Then I told R to show the complete dataset of air passengers in 1949 by typing “Air49<-AirPassengers[1:12],” and I did the same for the number of passengers in 1950 by typing “Air50<-AirPassengers[13:24].” The bracketed ranges stand for the positions the numbers of passengers occupy in the dataset. For example, the numbers of passengers from January 1949 to December 1949 are the first twelve numbers in the total dataset, and the numbers from January 1950 to December 1950 are the next twelve (numbers 13 to 24). I then found the total sum and length of the number of passengers in both 1949 and 1950. The sum is all the data points in a given year added together; the length is the total number of data points in a given year (there are twelve data points per year in this chart). Finally, I repeated finding the sum for each year from 1949 to 1960.
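A sketch of this year-by-year bookkeeping, using the bracket notation Dewar introduces:
    Air49 <- AirPassengers[1:12]    # positions 1-12: January-December 1949
    Air50 <- AirPassengers[13:24]   # positions 13-24: January-December 1950
    sum(Air49)      # all twelve 1949 data points added together
    length(Air49)   # how many data points the year holds: 12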
Fortunately, I took screenshots of my work in R today, so all is not lost. I have not figured out how to save my work in R yet, and I accidentally cleared all of the code I completed today before I could figure out how to save it. I thought I had saved my code three times in three different ways, but two of those ways ended up saving random symbols rather than my code, and the third time I got my code history to save (that is, what I actually typed into R), but none of R’s responses to my code were saved with the history. Next time I work in R, I will research how to save my work.
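For what it is worth, base R does have commands for each of the things I was trying to save; this is a sketch of what I plan to try next time (the file names here are just placeholders):
    savehistory("worms-session.Rhistory")  # save the commands typed this session
    sink("worms-output.txt")               # start copying console output to a file
    summary(AirPassengers)                 # ...work done here is recorded in the file...
    sink()                                 # stop copying output
    save.image("worms-workspace.RData")    # save every object in the workspace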
Finally, separate from my work in R, I began putting my second week’s process post up on the website for this project. I got the post’s images and text (but not the reflection) onto the site, and I also worked on the basic design outline and introduction section for week two’s process post page. This work took me another hour to complete because Wix acts up over the spotty internet connection I have at home (note: I usually work on this project at school, but I am currently home for Thanksgiving break). The website kept moving lines and boxes I had placed in certain spots, and I had to keep putting them back where I needed them and re-publishing the site. Now I know just how temperamental Wix is without a good internet connection; the next time I work with Wix at home, I plan on driving to the nearest public library so I will not face this same issue.
Nov. 24, 2017

This morning I spent just over an hour putting two more process post pages (week 4 and week 5) up on the website so that I can eventually add text to them. I also created clickable buttons at the bottom of my process post pages that let users get back to the top of each page more easily, since these pages carry a lot of text. Then I edited the introduction to each of my process post pages and wrote a reflection for week 2’s process posts. The reflection took the longest of these four tasks. I have decided that each reflection will tie in at least one source from my bibliography, since my first reflection relied so heavily on the bibliography’s sources about archives and archiving. For week 2, I decided the most appropriate sources to draw from were the ones on text analysis, because week 2 to week 3 is when I decided that Voyant would not be the best tool for analyzing my data and that R would instead be the preferred way to eventually conduct my study.

Later today I continued learning R through the second lesson in Taryn Dewar’s “R Basics with Tabular Data.” This lesson focused on working with larger datasets, which is particularly useful to me since my corpus of Romantic texts, which I will eventually analyze in R, will contain at least 50 texts. The preset data I worked with today was on “Motor Trend Car Road Tests from the 1974 Motor Trend magazine” (Dewar). My work was as follows:
Image 2. My second lesson in R.
The work above was shorter than the work I did in R on 11/22/17; however, it still took me an hour to complete, because the motorcar data was harder to work with than the air passenger data and I was more often confused about what I was doing with it.
The first step I took in my work was to load the data on motorcars by typing “data(mtcars)” and then “mtcars”: “data(mtcars)” loads the dataset into R, and typing “mtcars” then displays it. The blue chart of motorcar trends below “mtcars” in the image is the loaded data.
Then I learned how to select a specific row from the data to work with. I did this by typing “mtcars[1,].” The basic structure of this code is “dataset[x,y]”: according to Dewar, dataset is replaced by the name of the data being worked with, x is the row, and y is the column. Since I wanted a row and not a column, in my code I filled in the first row and left the column position empty. After this, I typed code for a column instead of a row by writing “mtcars[,2].” This code showed all of the values under the cyl section of the chart, which stands for the number of cylinders each motorcar’s engine has. To find a specific point at the intersection of a row and a column, Dewar instructed me to type “mtcars[1,2],” a code with both a row point and a column point. Lastly, to find a summary of one column, I typed “summary(mtcars[,1])”; a summary could also be found for a row by typing a number into the row space in the brackets instead of the column space.
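Put together, the indexing commands from this part of the lesson are:
    data(mtcars)         # load the built-in Motor Trend road test data
    mtcars[1,]           # row 1: every measurement for the first car
    mtcars[,2]           # column 2: the cyl value for every car
    mtcars[1,2]          # the single cell at row 1, column 2
    summary(mtcars[,1])  # summary of column 1 (mpg) across all cars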
All the work I did on the motorcar data I completely understood and feel I could repeat with other datasets. After this work, however, Dewar’s instructions turned to matrices, using a different dataset. Dewar explained that the benefit of knowing how to create matrices is that, for a smaller dataset, building a matrix is less time-consuming than entering the data into a CSV file and uploading that file into R. Dewar has a point, but matrices are definitely something I am going to have to work with more than once before I completely understand the code I write. The matrices Dewar had me create today were from the Old Bailey dataset of criminal cases tried at London’s Central Criminal Court in each decade from 1670 to 1800. The chart from the Old Bailey dataset that I based my matrices on is below:
Image 3. Old Bailey dataset.

To analyze the number of theft and violent theft offences between 1670 and 1710, I created a matrix. Theft and ViolentTheft were my variables. My code was “Theft<-c(2,30,38,13)” and then “ViolentTheft<-c(7,20,36,3)”; these first two pieces of code stored the data under the two variable names. Finally, I used the cbind(), or column-bind, function to make the variables into a matrix by typing “Crime<-cbind(Theft,ViolentTheft).” Then, when I typed the code “Crime,” the bound matrix was presented to me in column format.
Dewar’s instructions also taught me how to make a matrix using rbind() or row bind so that my matrix would present itself in row format after the code was made for the matrix. To make a matrix that would display in row format I typed “Crime2<-rbind(Theft,ViolentTheft)” and then “Crime2.” Then the matrix displayed in row format instead of column format. I personally believe this matrix was easier to read in column format because then the categories Theft and ViolentTheft were at the top of the chart instead of at the side of the chart, but in either case, I do not completely understand what I am looking at because one axis of my chart is only numbers and not words to describe the data that is presented in the matrix. Dewar also mentioned that the matrix “Crime2” could be created by the code “t(Crime),” which would have given me the inverse of the matrix “Crime,” but I did not try this because there were not clear instructions.
Would I have had to write “t(Crime)” in R before creating the matrix “Crime2,” or could I still type “t(Crime)” afterward without messing up the data I had?
In hindsight, I probably could have still written “t(Crime)” even after creating the matrix “Crime2,” but I had gotten so confused by that time, and am so new to R, that I did not want to risk messing anything up needlessly.
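For the record, sketching out the whole sequence suggests the answer is yes: t() only displays the transpose and leaves the existing objects untouched.
    Theft <- c(2,30,38,13)                # theft cases per decade, 1670-1700
    ViolentTheft <- c(7,20,36,3)          # violent theft cases per decade
    Crime <- cbind(Theft, ViolentTheft)   # decades as rows, crime types as columns
    Crime2 <- rbind(Theft, ViolentTheft)  # crime types as rows, decades as columns
    t(Crime)   # prints the transpose (same layout as Crime2); Crime itself is unchanged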
The last part of lesson two taught me how to create my own matrix using the “matrix()” code, which is helpful when I have not created variables for the data points. The first code, “matrix(c(2,30,3,4,7,20,36,3),nrow=2),” put my matrix in a two-row format: the argument “nrow=2” told R how many rows the table should have, and the data points were the numbers in the parentheses. The second code, “matrix(c(2,30,3,4,7,20,36,3),ncol=2),” put my matrix in a two-column format, with “ncol=2” telling R how many columns the table should have and, again, the data points in the parentheses. I was unsure at first how the numbers were arranged from the parentheses into the table; as far as I can tell, matrix() fills the values into the table column by column. I can see how quickly creating matrices will be helpful when I am working with smaller datasets, and once I can read a matrix, the table the data is put into is easy to read because its format is so simple.
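Side by side, the two layouts look like this (the same eight values, two different shapes):
    matrix(c(2,30,3,4,7,20,36,3), nrow=2)  # the eight values arranged in two rows
    matrix(c(2,30,3,4,7,20,36,3), ncol=2)  # the same values arranged in two columns
    # matrix() fills column by column by default; byrow=TRUE would fill row by row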
Unfortunately, at this present moment, matrices are still mostly lost on me. Dewar’s article moved on to teach me how to apply a function across a matrix, and while I wrote the code, I did not understand at the time what it meant. The first step I was instructed to take was to display the data for the variable Crime by typing “Crime.” Then I was told to type “apply(Crime,1,mean).” Looking back on this code and Dewar’s instructions, I now understand it a little more. The middle number tells R which direction to work in: a “1” applies the function across each row of the matrix, while a “2” applies it down each column. So my code “apply(Crime,1,mean)” found the mean of each row of “Crime,” that is, the average of theft and violent theft within each decade.
Keeping the two directions straight took some work: at first I was convinced the row-means code should have been “apply(Crime,2,mean),” until I worked out that “2” actually finds the mean of each column, the average of a single type of crime across the decades. The fact that I was able to reason my way through the difference made me feel accomplished and that I am understanding what I am doing, at least to a limited extent. Right now, even these small victories matter in my exploration of R.
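In code, the two directions are:
    Theft <- c(2,30,38,13)
    ViolentTheft <- c(7,20,36,3)
    Crime <- cbind(Theft, ViolentTheft)   # the matrix from above
    apply(Crime, 1, mean)  # 1 = rows: the mean of each decade's two counts
    apply(Crime, 2, mean)  # 2 = columns: the mean of each crime type over the decades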
Finally, at the end of Dewar’s article there are instructions on how to save work in R. Fortunately, I took screenshots of my work again today, because I tried to save my work as Dewar instructs, this did not work, and I lost everything. Next time I work in R, I will email a Digital Humanities librarian at FSU to ask how to save my work.
Nov. 25, 2017
I spent an hour today completing some website design for my third week’s process post page and researching references to worms in natural history books, articles, and websites. During my research I was primarily looking for images of worms from the Romantic period. I began in WorldCat, as Dr. Pascoe had suggested last week. Then I moved to EBSCOhost and the search page of FSU’s Strozier Library on the FSU library website. The only search results I found in all three of these databases were books by or about Charles Darwin and his natural history studies. FSU has one book by Charles Darwin, The Darwin Reader, in storage, which I requested on the FSU library website and will retrieve from Strozier Library on Monday afternoon. Since every other book in these databases was also on Darwin, I moved my research on to Google with the search terms “natural history” and “worms.”
The first result I explored in Google was a page, which I did not at first realize was an Amazon page, selling Charles Darwin’s Darwin on Humus and the Earthworms: The Formation of Vegetable Mould through the Action of Worms with Observations on their Habits. After this semester this would be a helpful book to explore, but given the time constraint I am facing right now, The Darwin Reader I will retrieve from Strozier should meet my needs. The second source I found during my Google search, “Earthworms: Humble Underground Heroes,” is a video and article on The Natural History Museum’s website. This source is a more modern study of worms, but it may provide good screenshots for my website, and it is an introduction to the importance of earthworms’ function of breaking down soil. In other words, this page from The Natural History Museum ties closely to Janelle Schwartz’s concern with the material nature of worms and her writing on the Romantics’ fascination with taxonomy and worms. Watch the video from The Natural History Museum below:
Video 1. "Earthworms: Humble Underground Heroes"
The last article I found today on “natural history” and “worms” was Georg Zappler’s “Darwin’s Worms” from Natural History magazine’s website. Even though the site’s address ends in “.org,” which suggests it should be reliable, I briefly skimmed the article just to make sure, because I had never heard of Natural History magazine before; this project is my introduction to vermiculture research. My conclusion is that the website is factual and reliable. I am very excited about this find because the site has modern pictures of historical worm studies, the closest thing I have located so far to what I am really after: Romantic-period pictures of Romantic worm studies. So, I plan on putting these found pictures on my own website, at least for the time being, since they are more appropriate than googled images of worms. Next week I plan to work on the Scholarship page of my study’s website and to put one or more of the pictures from Zappler’s article on that page.
The final search I made tonight was to explore the Images tab in Google for my “natural history” and “worms” search. Earlier in the semester I had searched just “worms,” so my search terms now are more specific, and this refined search presents pictures far more relevant than what I had previously found. I plan on eventually using some of these images on my website as well.
Nov. 26, 2017
Tonight, I worked for an hour adding images to the website. I found one image through the Google Images search for “natural history” and “worms,” which was quick because I had already discovered yesterday that this search provides relevant results. Then I spent a considerable amount of my hour re-designing and adding images to the Bibliography page of the website. I added three images to this page: one from the same Google search used for the Introduction page’s new image, and the other two from Natural History magazine’s website. Organizing these images on the Bibliography page took the most time of all my work tonight, because I placed a quote on top of one of the images and I want the page to look professional and neat, not cluttered. But I feel confident that my work made both pages look more inviting for any visitor.
After this task, about ten minutes of my hour’s work tonight consisted of adding backgrounds and introduction sections to week 4 and week 5’s process post pages. I went ahead and accomplished this work so that when I go to post the process posts for these weeks on my website the general outline of each page is already designed.
On a final note, I also noticed a few grammatical errors on the Contact page: one sentence was missing a word, and the same sentence needed parallel structure, so I fixed these errors.