Text as Data
Participation Prerequisites
Basic knowledge of statistical concepts. Basic familiarity with Python is advantageous but not required.
Course Content
The course is aimed at doctoral students and teaches the current state of the art in drawing inferences from data, specifically text data. A fundamental problem in empirical research is measurement. Important theories we want to test often rest on concepts that are complex and difficult to define. Think of investor sentiment in behavioral trading models, or economic uncertainty and capital constraints in theories of firm investment cycles. In modern research, many such concepts are measured using text data. This course aims to equip you with the tools to leverage text data for your own research.
Intended Learning Outcomes and Competencies
- Develop a Comprehensive Understanding of Inference and Measurement in Empirical Research. Students will learn to critically evaluate the theoretical underpinnings of causal inference and prediction, utilizing Directed Acyclic Graphs (DAGs) to visualize and reason through complex relationships among variables, confounders, moderators, and mediators. Students will demonstrate proficiency in diagnosing measurement errors and their impacts on empirical identification and inference.
- Master Advanced Textual Analysis Techniques for Empirical Measurement. Students will acquire the skills necessary to quantify textual data into numerical representations and effectively map these into meaningful empirical constructs. They will proficiently apply and critically evaluate methodologies such as dictionary-based approaches, document similarity measures, supervised and unsupervised machine learning techniques, and pre-trained language models like BERT to empirically measure complex concepts such as sentiment, economic uncertainty, and competition.
- Critically Assess and Implement LLM Methods in Textual Analysis. Students will gain expertise in leveraging generative AI and other large language models (LLMs) as advanced methodological tools for textual analysis. They will analyze and articulate the methodological strengths, limitations, and potential biases inherent in LLM-based measurement techniques, demonstrating the capability to apply LLMs responsibly and effectively to quantify complex concepts in novel research contexts.
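As a rough illustration of two of the techniques named in the outcomes above, the following minimal Python sketch computes a dictionary-based score and a bag-of-words document similarity. It is not course material: the word list `UNCERTAINTY_TERMS`, the function names, and the two example sentences are hypothetical and chosen only for illustration, and a real application would rely on a validated dictionary and more careful preprocessing.

```python
import math
import re
from collections import Counter

# Hypothetical word list for illustration only; not a validated dictionary.
UNCERTAINTY_TERMS = {"uncertain", "uncertainty", "risk", "volatile", "unclear"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_score(text, terms):
    """Share of tokens that appear in the term list (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in terms for token in tokens) / len(tokens)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example sentences, not taken from any real corpus.
doc_a = "Demand remains uncertain and markets are volatile."
doc_b = "Markets are volatile and the outlook is unclear."

print(dictionary_score(doc_a, UNCERTAINTY_TERMS))
print(cosine_similarity(doc_a, doc_b))
```

Both measures map raw text to a number that can then enter a regression as a proxy for a concept such as economic uncertainty. How well that number tracks the underlying concept is the measurement question raised in the first outcome.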
Instruction Type
On-campus
Form of Examination
The grade will be based on a research proposal, to be submitted after the in-class sessions have ended.
Literature
Gentzkow, Matthew, Bryan T. Kelly, and Matt Taddy (2019) "Text as Data." Journal of Economic Literature 57(3).
Dikolli, Shane S., Thomas Keusch, William J. Mayew, and Thomas D. Steffen (2020) "CEO behavioral integrity, auditor responses, and firm outcomes." The Accounting Review 95(2): 61-88.
Next events
| Session | Type | Date | Start | End | Room |
| --- | --- | --- | --- | --- | --- |
| 1/4 | Lecture | Mo, 27.04.2026 | 09:00 | 18:00 | C-101 Hörsaal / Lecture Hall |
| 2/4 | Lecture | Tu, 28.04.2026 | 09:00 | 18:00 | C-101 Hörsaal / Lecture Hall |
| 3/4 | Lecture | We, 29.04.2026 | 09:00 | 18:00 | C-101 Hörsaal / Lecture Hall |
| 4/4 | Lecture | Th, 30.04.2026 | 09:00 | 18:00 | C-101 Hörsaal / Lecture Hall |
Lecturers
Indicative Student Workload
| Self-Study | 64 h |
| Contact Time | 24 h |
| Examination | 2 h |