CLS INFRA D5.1. Review of the Data Landscape

by Michał Mrugalski, Carolin Odebrecht, Vera Charvat, Ingo Börner, Matej Durco

Published by Zenodo.

2022  

Abstract

This landscape review, for which Work Package 5 is responsible, focuses on intellectual access, i.e. on providing guidance for finding and sharing literary data. Work Package 6 approaches the task from a more technological side, collecting and analyzing literary corpora, available formats, tools, and metadata in order to create an exploratory catalogue or inventory of literary corpora and to provide a transformation matrix or toolbox for solving common issues. The review's point of departure is the abundance of existing data and their diversity or heterogeneity as regards corpus design and underlying concepts: the definitions of text (is it a source, an edition, a data set? see chapter 3), the purpose of a corpus (e.g. general, reference, or monitoring corpora, special purpose corpora; see chapter 4), and central considerations or criteria regarding the construction of a corpus (sampling, balancing, representativeness, annotation model(s), data format(s); see likewise chapter 4). We ask: How can we assist literary scholars in searching for and finding existing data that are relevant to their own research questions? How can scholars obtain data without transgressing ethical or legal boundaries (see chapter 5)? And what kinds of research questions are relevant to the present-day state of the data landscape and to literariness and textuality?

Archived Files and Locations

application/pdf  216.9 kB
file_ajfyh3zdrva53oa3wwjrnq25ae
zenodo.org (repository)
web.archive.org (webarchive)
Type: article-journal
Stage: published
Date: 2022-07-18