On the (non)utility of Juilland's D to measure lexical dispersion in large corpora

by Douglas Biber, Randi Reppen, Erin Schnur, Romy Ghanem

Published in International Journal of Corpus Linguistics by John Benjamins Publishing Company.

2016, Volume 21, pp. 439-464

Abstract

This paper explores the effectiveness of Juilland's D as a measure of vocabulary dispersion in large corpora. Through a series of experiments using the BNC, we explored the influence of three variables: the number of corpus-parts used for the computation of D, the frequency of the target word, and the distributions of those words. The experiments demonstrate that the effective range for D is greatly reduced when computations are based on a large number of corpus-parts: even words with highly skewed distributions have D values indicating a relatively uniform distribution. We also briefly explore an alternative measure, Gries' DP (Gries 2008), showing that it is a more reliable and effective measure of dispersion in a large corpus divided into many parts. In conclusion, we discuss the implications of these findings for quantitative methods applied to the creation of vocabulary lists as well as research questions in other areas of corpus linguistics.
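The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the BNC. It assumes the commonly cited formulation of Juilland's D for equally sized corpus-parts, D = 1 - V / sqrt(n - 1), where V is the population standard deviation of the per-part frequencies divided by their mean, and the definition of DP as half the sum of absolute differences between each part's share of the word's occurrences and its share of the corpus. The function names and the toy word (present in one tenth of the parts, absent elsewhere) are hypothetical.

```python
import math


def juilland_d(freqs):
    """Juilland's D, assuming all corpus-parts are the same size."""
    n = len(freqs)
    mean = sum(freqs) / n
    if n < 2 or mean == 0:
        raise ValueError("need at least two parts and one occurrence")
    # Population standard deviation of the per-part frequencies.
    sd = math.sqrt(sum((f - mean) ** 2 for f in freqs) / n)
    variation = sd / mean  # coefficient of variation, V
    return 1 - variation / math.sqrt(n - 1)


def gries_dp(freqs, part_sizes=None):
    """Gries' DP: 0 means perfectly even, values near 1 mean very uneven."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("word does not occur")
    if part_sizes is None:
        part_sizes = [1] * len(freqs)  # assumption: equally sized parts
    corpus_size = sum(part_sizes)
    return 0.5 * sum(
        abs(f / total - s / corpus_size) for f, s in zip(freqs, part_sizes)
    )


def skewed_word(n_parts, one_in=10, per_part=10):
    """Toy distribution: the word occurs in 1 of every `one_in` parts only."""
    k = max(1, n_parts // one_in)
    return [per_part] * k + [0] * (n_parts - k)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        freqs = skewed_word(n)
        print(n, round(juilland_d(freqs), 3), round(gries_dp(freqs), 3))
```

Under these assumptions the word's skew is identical in every case, yet D climbs from 0 with 10 parts to roughly 0.7 with 100 parts and roughly 0.9 with 1,000 parts, while DP stays at 0.9 throughout. This mirrors, on invented data, the reduced effective range of D that the abstract reports.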

Type  article-journal
Stage   published
Year   2016
Language   en
Journal Metadata
Not in DOAJ
In Keepers Registry
ISSN-L:  1384-6655