New article: Social Network Analysis of Middle High German Romances

In the special issue “Digitale Mediävistik” of the journal “Das Mittelalter. Perspektiven mediävistischer Forschung”, an article on Social Network Analysis of Middle High German Arthurian romances will soon be published. The article takes up the question of how fairy tales and Arthurian romances relate to each other and aims to redefine that relationship systematically and methodically. To this end, we first identify properties of the European folktale, which we then operationalize for computational analysis and apply to a text corpus consisting of classic Arthurian romances (Hartmann von Aue’s ‘Erec’ and ‘Iwein’, Wolfram von Eschenbach’s ‘Parzival’).
The investigation uses data-driven methods, primarily Social Network Analysis, and focuses on various aspects of the characters. In this way, we gain a differentiated understanding of the relation between Arthurian romances and the ‘simple form’ of fairy tales on the one hand, and of the differences within the selected texts on the other. We show that the complex results of the statistical analyses resist clear-cut interpretations and thus provide new insights into these well-known objects of study.
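As a rough illustration of what a network analysis of characters can look like in code, the following Python sketch builds a co-occurrence network from scene data and computes degree centrality. The scene lists are invented toy data, and using scene co-occurrence as the edge criterion is an assumption made for this example; neither is taken from the article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each inner list names the characters present in one scene.
scenes = [
    ["Parzival", "Herzeloyde"],
    ["Parzival", "Artus", "Gawan"],
    ["Parzival", "Condwiramurs"],
    ["Gawan", "Artus"],
]


def cooccurrence_edges(scenes):
    """Count how often each unordered pair of characters shares a scene."""
    edges = Counter()
    for scene in scenes:
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for a, b in combinations(sorted(set(scene)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Number of distinct neighbours of each node, divided by (n - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


edges = cooccurrence_edges(scenes)
centrality = degree_centrality(edges)

# List characters from most to least connected.
for name, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

In this toy network the protagonist is connected to every other character, while a character who appears in a single scene has only one neighbour. Measures of this kind are one possible way to operationalize properties of characters for comparison across texts.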

Example: Network of ‘Parzival’ (Parzival plot)

The special issue “Digitale Mediävistik”, including this article by Manuel Braun and Nora Ketschik, will presumably be published in June 2019.

(Canceled:) Learning Machine Learning: A CRETA-Hackatorial on Reflected Text Analysis

… canceled due to organisational difficulties on the part of the conference host.

This tutorial will be given by Nils Reiter, Gerhard Kremer, and Sarah Schulz;
it takes place at EADH in Galway, Ireland on December 7 (9:00-13:00).

Registration: Please register by Thursday, November 22nd, high noon (11:59 AM Berlin time) using our organiser e-mail address <>, letting us know which operating system you will work on (macOS / Windows / Linux).

Short Overview:

The aim of this tutorial is to give the participants concrete and practical insights into a standard case of automatic text analysis. Using the example of automatic recognition of entity references, we will discuss general assumptions, procedures, and methodological standards in machine learning. Participants can explore and test the scope of such procedures by editing executable program code.

There is no reason to blindly trust the results of machine learning tools in general and NLP tools in particular. The concrete insights into the “engine room” of machine learning methods allow participants to assess the potential and limitations of supervised text analysis tools more realistically. In the long run, we hope to reduce the recurrent frustration caused by automatic text analysis techniques and their sometimes less than satisfactory results, and thus to promote the use and interpretation of the results of machine learning models. For their adequate use in hermeneutic interpretation steps, insight into influential technical details is indispensable. In particular, the type and origin of the training data are important for the quality of the machine-annotated data, as we will make clear in the tutorial.

In addition to a Python program for automatic annotation of entity references, with which we will work during the tutorial, we provide a heterogeneous, manually annotated corpus as well as the routines for evaluating and comparing annotations.
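
The tutorial's own evaluation routines are not reproduced here. As a minimal sketch of what comparing two annotations can involve, the following Python code computes precision, recall and F1 under exact-match comparison. The representation of annotations as (start, end, label) tuples, the label names and the function name `evaluate` are assumptions made for this example.

```python
def evaluate(gold, predicted):
    """Compare two sets of annotated spans by exact match.

    Each annotation is assumed to be a (start, end, label) tuple.
    Returns precision, recall and F1 of `predicted` measured against `gold`.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical manual (gold) and automatic (predicted) entity reference annotations.
gold = [(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "PER"), (14, 15, "ORG")]
predicted = [(0, 2, "PER"), (5, 6, "PER"), (9, 11, "PER"), (20, 21, "LOC")]

precision, recall, f1 = evaluate(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is a strict criterion: a span with correct boundaries but the wrong label counts as an error, as does a span that is off by one token. Other matching schemes, such as partial overlap, would yield different scores for the same annotations.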

New article: The analysis of “soft” concepts with “hard” corpus-analytical methods

The Social Science working unit recently published a new methodological article on the analysis of complex theoretical concepts using corpus-analytic and computational-linguistic methods.
We identify three fundamental challenges impeding the methodological quality and long-term reputation of these promising new technologies. First, generating and pre-processing very large text corpora is still a laborious and costly enterprise. Second, social scientists want to learn from text about societal context and reconstruct meaning along the lines of complex theoretical concepts. The semantically valid operationalization of complex social-scientific concepts, however, remains a problem. Third, scholars need flexible data output and visualization options to connect the data generated by corpus-linguistic methods with the discipline’s existing research. Many tools designed for linguistic research questions do not provide options suitable for social-scientific research. We conclude that it is possible to solve these problems; however, hermeneutically sensitive uses of computational-linguistic methods will take much more time, work and creativity than is often assumed. Moreover, there can be no one-size-fits-all solutions to these problems. Social scientists need and want to decide on methodological questions in the light of their often highly specific research questions. The process of reflectively appropriating big-data methods in the social sciences has only just begun.

This research article by Cathleen Kantner and Maximilian Overbeck is part of the volume ‘Computational Social Science’, edited by Andreas Blätte, Joachim Behnke, Kai-Uwe Schnapp and Claudius Wagemann.

Invited Talk: Emotion Analysis

On October 11, CRETA member Roman Klinger gave a keynote talk titled “Emotion Analysis: between Academia, Industry, Linguistics, Humanities, and Computer Science” at the AI2Future unconference in Zagreb, Croatia.


Emotion analysis is an obvious extension of the popular task of sentiment analysis; however, methods and applications differ. In the first part of this talk, I provide a brief overview of the psychological background of emotions and their purpose, and of how this leads to challenges when they are estimated from text. I highlight open research questions and possible research directions. The second part is concerned with applications in a variety of areas: What can emotion analysis contribute to the humanities, for instance to literary studies? Finally, I will briefly report on use cases for emotion analysis in a text analysis platform for direct use by customers.

The slides can be downloaded here.

Class: Reflected Text Analysis in the Digital Humanities

During the European Summer University in Digital Humanities, Sarah Schulz and Nils Reiter will give an introduction to reflected text analysis:

The workshop introduces the concept of reflected text analytics and covers various relevant topics to that end. The core idea is to “lift the veil”: The participants learn both theoretical concepts and their practical implementation in real code, so that they are able to apply the learned concepts to their own research questions. Topics of the workshop will be: annotation and concept development through annotation, programming with Python for text processing and machine learning, and machine learning in theory and application. Participants work on their own programs and data, write code and train models by themselves (under guidance). Previous knowledge is not required, but a laptop and an internet connection are.

Workshop on network analysis

CRETA invites you to a public workshop with external guest speakers (talks will be in English).

This is the agenda for Wednesday, March 14th, 2018:

• 09:00 Nils Reiter, CRETA
Welcome & Introduction

• 09:15 Nora Ketschik, Evgeny Kim & Florian Barth, CRETA
Extracting Character Networks from Arthurian Romances and Werther

• 10:30 Coffee break

• 11:00 Yannick Rochat, Université de Lausanne
Character Network Analysis: A Review

• 12:30 Lunch break

• 14:00 Andreas Kuczera, Akademie der Wissenschaften und der Literatur, Mainz
Regesta Imperii as a Network of Entities

• 15:30 Coffee break

• 16:00 Frederik Elwert, Ruhr-Universität Bochum
Adding Meaning to Literary Networks. A Networked Topic Model of the Mahābhārata

• 17:30 Closing discussion

• 18:00 End (ca.)

Program for download as PDF: Agenda-WS5.PDF

CRETA at DH 2017

CRETA will be present at the DH conference 2017 with a number of activities:

We’re looking forward to discussing our research in Canada!