Text Mining for Digital Humanities

Course code:
Period 2
Language of tuition:
Faculty of Humanities
dr. A.S. Fokkens
drs. E. Maks
Teaching method(s):
Lecture, Seminar

Course objective

In this course, students are trained in systematic text analysis. In
particular, we explore the process of identifying and annotating
information in historical and contemporary texts such as novels,
lyrics, letters, newspaper articles, movie scripts, blogs and
other social media texts, using manual and automatic methods. Students
will learn what this implies for the theoretical models and concepts
they are familiar with from their own discipline. Students will work on
a research project of their choice and annotate texts in an
interdisciplinary context using different tools and
methods. They will apply expert and crowd annotations, develop
codebooks and compare the results. Finally, they will use a
machine-learning program to analyze text and reflect on the
performance of the automatic annotation. We will focus on high-level
semantic annotations of, for example, (historical) events, entities and
emotions that are of interest to a broad range of humanities,
social science and computer science students. Students present their
findings in a research paper.
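
As an illustration of what reflecting on the performance of an automatic
annotation can involve, the sketch below compares machine-assigned labels
with human reference labels and computes precision, recall and F1 for one
label. This is a minimal sketch and not the tooling used in the course:
the function name, the label set and the example data are all invented
for illustration.

```python
def precision_recall_f1(gold, predicted, label):
    """Score one label of an automatic annotation against human labels.

    gold and predicted are equal-length lists of labels, one per text unit.
    All names and data here are hypothetical, for illustration only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    # True positives: the machine and the human both assigned the label.
    tp = sum(1 for g, p in pairs if g == label and p == label)
    # False positives: the machine assigned the label, the human did not.
    fp = sum(1 for g, p in pairs if g != label and p == label)
    # False negatives: the human assigned the label, the machine did not.
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Invented example: five text units labelled by a human and by a machine.
gold = ["event", "entity", "event", "emotion", "event"]
predicted = ["event", "event", "event", "emotion", "entity"]
precision, recall, f1 = precision_recall_f1(gold, predicted, "event")
```

In this invented example the machine finds two of the three human-labelled
events and assigns "event" once where the human did not, so precision,
recall and F1 are all 2/3.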

Course content

This module addresses the process of systematic text analysis through
human and automatic annotation. Annotations make information that is
implicit in data explicit, allowing researchers to search their data
systematically. This kind of research forces humanities scholars and
social scientists to represent their interpretation of texts in a data
structure. Computer science students will learn how text mining
technologies can be applied in the humanities and social sciences.
Annotation requires some type of interpretation model and
results in an analysis that can be compared across annotators. As such,
annotation can be seen as an important step towards the formalization of
the humanities and social sciences as disciplines. The degree to which
annotators agree or disagree (the so-called inter-annotator agreement)
tells us something about the reproducibility of the interpretation
process, the maturity of theoretical notions and the criteria used to
apply them to real data. Annotators with different backgrounds will
produce different types of annotations. Linguists, (cultural) historians,
social scientists and literary scholars will consider sources and
data differently and consequently arrive at different
annotations of the same source or data. The same holds for experts and
non-experts. The former are traditionally involved in assigning metadata
to sources; the latter do the same in crowdsourcing initiatives.
Finally, annotated data can be used to train machines to do the same.
How does this work? Can a machine do better than humans? How do you
evaluate this?
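
One common way to quantify inter-annotator agreement between two
annotators is Cohen's kappa, which corrects the observed agreement for
the agreement expected by chance. The sketch below is only an
illustration: the course description does not say which agreement
measure is used, and the function name and example labels are invented.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    labels_a and labels_b are equal-length lists, one label per item.
    The names and data are hypothetical, for illustration only.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Chance agreement: for each label, the product of the two
    # annotators' relative frequencies, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented example: an expert and a crowd worker label six text spans.
expert = ["event", "entity", "emotion", "event", "entity", "event"]
crowd = ["event", "entity", "event", "event", "emotion", "event"]
kappa = cohen_kappa(expert, crowd)
```

For these six invented items the two annotators agree on four, yet kappa
is only 3/7 (about 0.43), because a fair share of that agreement would be
expected by chance given how often both use the label "event".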

Form of tuition

Lecture, Seminar (2 hrs a week each)

Type of assessment


Course reading

To be announced

Entry requirements


Recommended background knowledge

Course: From Object to Data

Target audience

Third-year bachelor students, in particular in Humanities, Social
Sciences and Computer Science


This module is taught at the VU. Module registration at the VU is

© Copyright VU University Amsterdam