To analyse the large number of handwritten newsletters that circulated in early modernity and have been preserved up to the present day, the Euronews Project has combined several Digital Humanities techniques.
The first step in studying the handwritten newsletters of the Medici Archive is digitizing and archiving the images taken in the Florence State Archives. The Euronews Project’s team has so far digitized more than 150 volumes of handwritten newsletters itself and uploaded them to the Medici Interactive Archive (MIA), a community-sourcing research portal created and maintained by the Medici Archive Project with the aim of reconstructing and preserving the Medici Archive. After uploading, the images are numbered, and the documents that are the objects of our research are created. Each MIA document corresponds to an actual archival document and is enriched with metadata, such as a unique identifier, the shelfmark, the type of document, the date, and the people, places, and topics mentioned. Most of the newsletters entered in MIA have also been transcribed, so they are searchable within the portal by both metadata and text. MIA enables the Euronews Project to preserve the images in a maintained environment, contribute to the MIA research community, and make its sources available to other scholars.
The Euronews Project developed an XML schema tailored to the research interests of the team members and the requirements of the type of document we use. The levels of the XML represent the layers found in the archive and in the manuscript newsletters. For each of these levels, we record different kinds of information. Transcription is only one part of this: a large amount of metadata is also interpreted from the documents and encoded. In this way, we collect information about the dates of events, the dispatch of letters and more.
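As a rough illustration of how such layered encoding can be read back out, the sketch below parses a small XML fragment with Python's standard library. Every element and attribute name in it (newsDocument, newsletter, newsItem, hub, dispatchDate, eventDate, and so on), as well as the sample values, is an assumption invented for this example and not the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: all element names, attribute names, and values are
# invented for illustration; they are NOT the Euronews Project's real schema.
SAMPLE = """
<newsDocument id="doc-1" shelfmark="example-shelfmark">
  <newsletter hub="Venice" dispatchDate="1600-01-15">
    <newsItem origin="Vienna" eventDate="1600-01-02">
      <transcription>Placeholder text of the first news item.</transcription>
    </newsItem>
    <newsItem origin="Prague" eventDate="1600-01-05">
      <transcription>Placeholder text of the second news item.</transcription>
    </newsItem>
  </newsletter>
</newsDocument>
"""


def extract_items(xml_text):
    """Flatten the nested levels into one record per news item."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for newsletter in root.iter("newsletter"):
        for item in newsletter.iter("newsItem"):
            rows.append({
                # Document level (archive layer)
                "document": root.get("id"),
                # Newsletter level (where and when it was dispatched)
                "hub": newsletter.get("hub"),
                "dispatch_date": newsletter.get("dispatchDate"),
                # News item level (where and when the event took place)
                "origin": item.get("origin"),
                "event_date": item.get("eventDate"),
                "text": item.findtext("transcription"),
            })
    return rows


items = extract_items(SAMPLE)
for row in items:
    print(row["hub"], row["origin"], row["event_date"])
```

Flattening the levels into one record per news item keeps the document and newsletter metadata attached to each item, which is convenient when the items are later counted or compared.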
Data science methods are used to analyze the news corpus. Specifically, we apply time series analysis and simulation techniques. Our corpus currently consists of approximately 11,000 news items, a large body of data from which to gain insight into how news was produced and spread in early modern Europe. With the help of time series analysis, we identify publication patterns and answer questions such as in which part of the year most news items were produced. At the same time, these 11,000 news items do not cover the entire early modern period, and we still have to deal with the unevenness of the surviving information. To address this problem, we apply simulation, which aims to show how news culture worked in general.
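The question of which part of the year produced the most news items can be sketched with a simple count per calendar month. The example below is only a minimal illustration using Python's standard library; the dates are invented stand-ins for the dates of news items, and the function names are hypothetical rather than part of the project's tooling.

```python
from collections import Counter
from datetime import date

# Invented dates standing in for news items; not taken from the real corpus.
SAMPLE_DATES = [
    date(1600, 1, 15),
    date(1600, 1, 22),
    date(1600, 3, 4),
    date(1601, 1, 9),
    date(1601, 7, 30),
]


def items_per_month(dates):
    """Return a list of 12 counts, January first, aggregated over all years."""
    counts = Counter(d.month for d in dates)
    return [counts.get(month, 0) for month in range(1, 13)]


def busiest_month(dates):
    """Return the month number (1-12) with the most items; ties go to the earliest month."""
    per_month = items_per_month(dates)
    return per_month.index(max(per_month)) + 1


monthly = items_per_month(SAMPLE_DATES)
peak = busiest_month(SAMPLE_DATES)
print(monthly, peak)
```

Aggregating over all years reveals seasonal patterns, but it also hides gaps in the surviving material, so a count of this kind is best read alongside the number of items preserved for each year.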
Handwritten Text Recognition (HTR) enables computers to transform images of handwritten documents into text. In recent years, it has become a well-known and widely used technology among digital humanists, historians, and archivists because of its potential to transform scholarship and answer new research questions. Automatically transcribing digitized archival sources makes the documents searchable, and the resulting transcriptions can be analysed with text analysis methods. Within the Euronews Project, we are experimenting with HTR on a corpus of ten volumes of letters and newsletters written in Italian by three different hands. The variety of hands and languages is one of the main challenges in training a satisfactory HTR model. After several months of training, we have achieved a model with 94% accuracy, which means that 94 out of 100 automatically transcribed characters are correct. We will continue our research in this field and train a model capable of transcribing more hands with increasing accuracy, in order to apply HTR to all newsletters that have been digitized but not yet transcribed due to time constraints.
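A character-level accuracy figure of this kind is commonly derived from the character error rate: the edit distance between a manual reference transcription and the automatic one, divided by the length of the reference. The sketch below shows that calculation in plain Python. It assumes this common definition, since the exact way the 94% figure was computed is not stated here, and the sample strings are invented.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions needed to turn hypothesis into reference."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """Return 1 minus the character error rate, measured against the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    error_rate = edit_distance(reference, hypothesis) / len(reference)
    return 1.0 - error_rate


# Invented example: one wrong character in a ten-character reference.
manual = "abcdefghij"
automatic = "abcdefghix"
print(character_accuracy(manual, automatic))
```

Because insertions count as errors, a transcription with many superfluous characters can score below zero under this definition, so the value is usually reported together with the character error rate itself.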