Text as Data

Workshop companion:


Digital humanists have spent decades developing and using a wide range of digital methods and software tools for text analysis. In recent years these tools have tended to become more user-friendly and less demanding in terms of cost and computing resources. At the same time, mass digitization projects and new web-based publishing platforms and environments have opened up new possibilities for analyzing and exploring text as data in different historical, cultural, and linguistic contexts. These trends seem to remove many of the technological entry barriers for scholars interested in the analysis of literary and historical texts, text archives, and related resources, but at the same time they pose new problems and challenges in terms of the biases and assumptions, ethical issues, and socio-political implications of compiling, quantifying, analyzing, and visualizing texts.

This series of workshops will introduce participants to some basic computational tools and methods for text analysis and exploration, such as Voyant Tools, AntConc, JSTOR’s Text Analyzer, Recogito, and the Natural Language Toolkit. It will cover different stages in the lifecycle of text analysis projects, from compiling corpora and cleaning data sources to statistical analysis, visualization, and publishing, with sustainability in mind. To test digital tools and methods, we will use examples of different texts (by genre or subject) and datasets, including some multilingual resources. No coding skills or previous experience with computational text analysis is required.