CLS INFRA D5.1. Review of the Data Landscape
by
Michał Mrugalski, Carolin Odebrecht, Vera Charvat, Ingo Börner, Matej Durco
2022
Abstract
This landscape review, for which Work Package 5 is responsible, focuses on intellectual access, i.e. on providing guidance for finding and sharing literary data. Work Package 6 approaches the task from a more technological side: it collects and analyzes literary corpora, available formats, tools, and metadata in order to create an exploratory catalogue or inventory of literary corpora and to provide a transformation matrix or toolbox for solving common issues. The review's point of departure is the abundance of existing data and their diversity and heterogeneity as regards corpus design and underlying concepts, for example the definition of text (is it a source, an edition, a data set? see chapter 3), the purpose of a corpus (e.g. general, reference, or monitoring corpora, or special-purpose corpora; see chapter 4), and central criteria for the construction of a corpus (sampling, balancing, representativeness, annotation model(s), data format(s); see likewise chapter 4). We ask: How can we assist literary scholars in searching for and finding existing data that are relevant to their own research questions? How can they obtain data without transgressing ethical or legal boundaries (see chapter 5)? And what kinds of research questions are relevant given the present-day state of the data landscape, literariness, and textuality?
Archived Files and Locations
application/pdf 216.9 kB
zenodo.org (repository), web.archive.org (webarchive)
article-journal
Stage
published
Date 2022-07-18