Computational stylometry for source detection of ancient Greek texts

Séminaire Données et Aléatoire Théorie & Applications

11/04/2024 - 14:00 Sophie Robert Room 106
Recent advances in computational science and statistical learning have transformed many established fields, including the social sciences and the humanities. In this presentation, we focus on applying new statistical methods to better understand the redactional processes behind ancient manuscripts. Our case study is the synoptic Gospels, three first-century biographies of Jesus written in Koine Greek and among the most influential books in Western history. Their redactional process has eluded scholars for centuries, and no explanation is universally accepted to this day. Drawing on advances in statistics, data analysis, and computational linguistics, we propose treating this as a statistical problem and quantitatively assessing the likelihood of different theories based on stylometric features.
We will show how we used recent developments in statistical learning to evaluate the likelihood of two scenarios proposed in New Testament scholarship. To identify discrepancies within the analyzed text, we rely on classifier two-sample tests (C2ST), a recently proposed method that uses the success rate of a binary classifier to determine whether two samples are drawn from the same distribution. The findings indicate significant stylistic differences, suggesting that the Gospel of Luke likely draws on two different sources. This provides substantial evidence in support of current research trends in the study of the synoptic Gospels, and specifically the two-source hypothesis.
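For readers unfamiliar with C2ST, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the talk: the feature vectors are synthetic Gaussian draws standing in for stylometric features (for example, function-word frequencies per passage), the classifier is a simple nearest-centroid rule chosen to keep the example dependency-free, and the names `c2st` and `gaussian_sample` are hypothetical. The test trains the classifier on half of the pooled, labeled data and checks whether its held-out accuracy exceeds chance (0.5), using the normal approximation to the null distribution of the accuracy.

```python
import math
import random


def c2st(sample_p, sample_q, seed=0):
    """Classifier two-sample test (sketch).

    sample_p, sample_q: lists of equal-length feature vectors.
    Returns (held-out accuracy, one-sided p-value).
    The nearest-centroid classifier is an illustrative assumption.
    """
    rng = random.Random(seed)
    # Pool both samples, labelling each vector by its sample of origin.
    data = [(x, 0) for x in sample_p] + [(x, 1) for x in sample_q]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]

    def centroid(label):
        rows = [x for x, y in train if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def predict(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 0 if d0 <= d1 else 1

    # Success rate of the classifier on data it has not seen.
    acc = sum(predict(x) == y for x, y in test) / len(test)
    # If both samples share one distribution, acc is approximately
    # Normal(0.5, 1 / (4 * n_test)); large acc is evidence against that.
    z = (acc - 0.5) * 2.0 * math.sqrt(len(test))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return acc, p_value


def gaussian_sample(n, dim, mean, rng):
    """Synthetic stand-in for stylometric feature vectors (hypothetical)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(42)
# Two samples from the same distribution: accuracy should stay near 0.5.
acc_same, p_same = c2st(gaussian_sample(200, 5, 0.0, rng),
                        gaussian_sample(200, 5, 0.0, rng))
# Two samples with shifted means: accuracy should be well above 0.5.
acc_diff, p_diff = c2st(gaussian_sample(200, 5, 0.0, rng),
                        gaussian_sample(200, 5, 1.0, rng))
```

In a stylometric setting, the two samples would be feature vectors extracted from two groups of passages, and a small p-value would indicate that a classifier can tell the groups apart better than chance. Any reasonable binary classifier could replace the nearest-centroid rule used here.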