An Introduction to Topic Modeling
Friday, April 13, 2018
Time: 12:00-3:00 pm
Location: UC Irvine Humanities Hall 251
Lunch will be provided
Topic modeling is a computational method of exploring what is sometimes described as “hidden” thematic or semantic structures in collections of texts. It is used by Digital Humanities researchers to explore textual data at scale and to offer a different perspective on data from which to gain new insights about the materials they are studying. This workshop will introduce the principles of topic modeling, along with the tools used for topic model construction in the Humanities and Humanities-inflected disciplines. Discussion will include the interpretation of topic models and the tools being created by the Mellon-funded WhatEvery1Says project to make topic modeling workflows more accessible to scholars and students in the Humanities.
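As a rough illustration of what “hidden” thematic structure means in practice, the sketch below fits a toy topic model in plain Python. It is not drawn from the workshop materials or from the WhatEvery1Says tools. It assumes a simplified collapsed Gibbs sampler for Latent Dirichlet Allocation (LDA), and the function names, the four miniature documents, and the settings (`num_topics`, `iterations`, `alpha`, `beta`, `seed`) are all hypothetical choices made for illustration. Tools such as MALLET implement this family of models with far more care.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # all tokens in topic k
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Repeatedly resample each token's topic given all the other tokens.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # A topic is likely if this document already uses it
                # and if it already contains this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words currently assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


def doc_topic_shares(doc_topic, alpha=0.1):
    """Smoothed proportion of each document attributed to each topic."""
    shares = []
    for counts in doc_topic:
        total = sum(counts) + alpha * len(counts)
        shares.append([(c + alpha) / total for c in counts])
    return shares


# Four invented "documents", already tokenized and lowercased.
docs = [
    "whale ship sea captain whale".split(),
    "sea ship sail whale harbor".split(),
    "love letter heart marriage love".split(),
    "heart marriage letter love ball".split(),
]

doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {', '.join(words)}")
for d, shares in enumerate(doc_topic_shares(doc_topic)):
    print(f"document {d}: {[round(s, 2) for s in shares]}")
```

With so little text the result depends heavily on the random seed and the settings, so the printed topics are best treated as something to interpret rather than as a fixed answer.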
The workshop will last two hours, and there will be a third hour for those who wish to gain some hands-on experience in implementing topic models with their data. The workshop is geared for beginners, including students and faculty. No prior knowledge is assumed or required, but it is recommended that those who stay for the third hour bring along a laptop and, optionally, their own digital collections (some test collections will also be available).
Scott Kleinman, one of the WhatEvery1Says project's PIs, will lead the workshop. A professor of English at California State University, Northridge, Kleinman is also project lead for the Lexomics project, which produces the online text-analysis tool Lexos.
The workshop will also help to train some of WE1S's research assistants in methods used by the project.
This workshop is generously sponsored by the UC Irvine Humanities Commons.
Brain image courtesy of Alan Liu. Image source: Medical News Today.
Workshop Links
Tools
- Lexos. General workflow management (pre-processing, analysis, visualisation). The multicloud tool can be used to visualise topic models.
- MALLET
- MALLET GitHub Repository
- GUI Topic Modeling Interface for MALLET
- Topic Modeling Workflow (tmw)
Articles
- Blei, David M. (2012). “Probabilistic topic models”. In: Communications of the ACM, 55(4): 77–84. http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf.
- Matthew Jockers, “The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors”
- Ted Underwood, “Topic modeling made just simple enough”
- Scott Weingart, “Topic Modeling for Humanists: A Guided Tour” (provides a gentle pathway into the statistical intricacies)
- WhatEvery1Says Report on Topic Modeling Interfaces (2016)
- Jockers, Matt (2013). Macroanalysis: Digital Methods and Literary History. Champaign, IL: University of Illinois Press.
- Rhody, Lisa (2012). “Topic Modeling and Figurative Language”. In: Journal of Digital Humanities, 2(1). http://journalofdigitalhumanities.org/2-1/topic-modeling-and-figurative-language-by-lisa-m-rhody/
- Schöch, Christof (2017, to appear). “Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama”. In: Digital Humanities Quarterly. https://zenodo.org/record/166356.
- Underwood, Ted and Andrew Goldstone (2012). “What can topic models of PMLA teach us about the history of literary scholarship?” In: The Stone and the Shell. http://tedunderwood.com/2012/12/14/what-can-topic-models-of-pmla-teach-us-about-the-history-of-literary-scholarship/.
- Christof Schöch, Topic Modeling Workshop: Beyond the Black Box https://christofs.github.io/topic-modeling-edinburgh/#/.
Projects
- Blevins, Cameron (2010). “Topic Modeling Martha Ballard’s Diary”
- Robert Nelson, Mining the Dispatch
- Matthew Jockers, Macroanalysis
- Jonathan Goodwin, HathiTrust Fiction (1920-1922)
Tutorials and Guides
- The Programming Historian’s “Getting Started with Topic Modeling and MALLET” tutorial
- DARIAH-DE’s tutorials on Topic Modelling with MALLET and Topic Modelling in Python
- Beginners Guide to Topic Modeling in Python (uses the Python library gensim)
- A Gentle Introduction to Topic Modeling in R
- David Mimno’s RMallet (MALLET wrapper for R)
- Matthew Jockers, Text Analysis with R for Students of Literature
Videos
- Shawn Graham, “Topic Models”, YouTube.com, 2017. https://www.youtube.com/watch?v=gN2x_KjJI1o
- Jordan Boyd-Graber, “Topic Models”, YouTube.com, 2015. https://www.youtube.com/watch?v=yK7nN3FcgUs
- David Blei, “Topic Models”, Videolectures.net, 2012. http://videolectures.net/mlss09uk_blei_tm/