Topic 5: Large Language Models, Distant Reading, and Text Analysis

Between born-digital texts and the digitization of archival texts, researchers in the humanities and social sciences now have digital access to large bodies of written materials. How might this access enable both new questions and new answers to old questions? What tools and techniques can faculty and scholars use to see patterns in collections of digital or digitized texts? And how do we train computers to understand the meanings encoded in these texts?  

Critics like Franco Moretti refer to this kind of analysis, in which we use technology to get a bird’s-eye view of a corpus, as distant reading. If close reading gives careful attention to every word in a text, distant reading assumes that we can gain new insight by thinking more broadly, using computers to take in more texts than would otherwise be possible. Thus, we might have a computer give us schematic representations of thousands or even hundreds of thousands of texts. Computers are especially good at this kind of reading. On our own, we would never be able to read all 19th-century British novels, but computers can help us get at least some sense of this great body of work. Reading at such a scale also offers a chance to chip away at what Margaret Cohen has called the “great unread”: all the writing that has gone unnoticed because it never became part of the literary canon.

In recent years, the emergence of large language models (commonly referred to in popular media as artificial intelligence applications) has extended the methods available to digital humanists. Distant reading methods and large language models are related in a number of ways:

  • Distant reading involves analyzing large amounts of text using computational methods to identify patterns, trends, and connections within the text.
  • Large language models, such as OpenAI’s ChatGPT, are designed to understand and generate human-like text based on the vast amount of data they have been trained on.
  • Both distant reading and large language models rely on the processing of extensive amounts of textual data to derive insights and generate new content.
  • Distant reading can benefit from large language models by utilizing their capabilities to process and analyze vast amounts of text data, leading to more comprehensive and nuanced insights.
  • Large language models can be seen as a practical application of the principles behind distant reading, as they use machine learning techniques to process and understand large volumes of text, thereby demonstrating a form of “close reading” on a massive scale.
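
As a rough illustration of the first point above, counting word frequencies across a corpus is one of the simplest ways a computer can surface patterns in texts. The sketch below is only an illustration and not the method used by any particular tool: the three-text corpus, the stopword list, and the function names are all hypothetical, and a real project would load thousands of files rather than three sentences.

```python
import re
from collections import Counter

# Hypothetical three-text corpus, invented for illustration only.
corpus = {
    "novel_a": "The sea was calm. The ship left the harbour at dawn.",
    "novel_b": "She walked to the harbour and watched the sea.",
    "novel_c": "The city was loud, and the streets were crowded.",
}

# A tiny, hand-picked stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "and", "to", "at", "was", "were", "she"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(texts):
    """Count how often each non-stopword appears across all texts."""
    counts = Counter()
    for text in texts.values():
        counts.update(t for t in tokenize(text) if t not in STOPWORDS)
    return counts


def document_frequencies(texts):
    """Count how many texts each non-stopword appears in."""
    counts = Counter()
    for text in texts.values():
        counts.update(set(tokenize(text)) - STOPWORDS)
    return counts


tf = term_frequencies(corpus)
df = document_frequencies(corpus)

# Print each word that occurs in more than one text, with its counts.
for word, n_texts in df.items():
    if n_texts > 1:
        print(f"{word}: {tf[word]} occurrences in {n_texts} texts")
```

Words that recur across many texts hint at shared themes or settings. Scaled up to a large corpus, the same kind of counting underlies many of the word clouds and trend graphs that text-analysis environments produce.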

Reading & Annotation

What is Distant Reading?

Distant Reading and Recent Intellectual History

How AI Works

What Is a Language Model, and Why Should You Care?

Viewing

Big Data + Old History

Paul Schacht on Digital Humanities and Distant Reading

Large Language Models from scratch

Exercise

This week you have a choice of exercises.

Option 1:

Voyant Tools is a web-based reading and analysis environment for digital texts. Use Voyant to analyze a demo corpus from Alan Liu’s Data Collections and Datasets or a collection/dataset of your choice. After you have selected your corpus and uploaded it into Voyant, use About- Voyant Tools Help to help you interpret the resulting visualizations. Share a link to your visualizations in a post at your website and discuss how you draw meaning from specific visualizations. What do they tell you about your corpus? Could you have arrived at that interpretation via close reading? Why or why not?

Option 2:

Write a poem about UNBC with https://copilot.microsoft.com using prompt engineering techniques. You can access https://copilot.microsoft.com with your UNBC username and password. Follow the prompt engineering instructions and steps found at https://dlinq.middcreate.net/detox-2024/activity/not-magic/. Keep prompting and interacting until you’ve created something that you’re ready to share with others, and then post it at your website. Be sure to share the link to this post with the class in the https://chat.opened.ca Town Hall. In your post, address how you moved through each of the steps outlined in the activity instructions and offer your thoughts on how this poem looks to you. What’s good? What’s missing? What changes could you make to the prompt/interaction to improve it? What would make this output better? How connected is this output to your human experience at UNBC?