What Would Cicero Write?

Examining Critical Textual Decisions with a Language Model

  • Todd G. Cook, TGC, Classical Language Toolkit (CLTK.org)

Abstract

Recent developments in Transformer language models now allow users to predict the probability of different sentences and to predict missing words more accurately than before. This new information and perspective can be used to form judgments on novel textual emendations and to further quantify existing historical editorial judgments. We examine the importance of analyzing an author’s corpus, and the impact of the Good-Turing theory of frequency estimation when predicting missing words. We will also outline some of the limits of what Transformer language models can do, and how to practically evaluate them.

Author Biography

Todd G. Cook, TGC, Classical Language Toolkit (CLTK.org)

Todd G. Cook is a core contributor to the Classical Language Toolkit (CLTK.org). He studied Classics at California State University, Chico and California State University, Long Beach. He works as a data scientist and software engineer and has years of experience writing educational software.

Published
2021-12-31
How to Cite
Cook, T. (2021). What Would Cicero Write? Ciceroniana on Line, 5(2), 285-296. https://doi.org/10.13135/2532-5353/6523