IsoScore: Measuring the Uniformity of Embedding Space Utilization
by
William Rudman, Nate Gillman, Taylor Rayne, Carsten Eickhoff
2022
Abstract
The recent success of distributed word representations has led to an
increased interest in analyzing the properties of their spatial distribution.
Several studies have suggested that contextualized word embedding models do not
isotropically project tokens into vector space. However, current methods
designed to measure isotropy, such as average random cosine similarity and the
partition score, have not been thoroughly analyzed and are not appropriate for
measuring isotropy. We propose IsoScore: a novel tool that quantifies the
degree to which a point cloud uniformly utilizes the ambient vector space.
Using rigorously designed tests, we demonstrate that IsoScore is the only tool
available in the literature that accurately measures how uniformly distributed
variance is across dimensions in vector space. Additionally, we use IsoScore to
challenge a number of recent conclusions in the NLP literature that have been
derived using brittle metrics of isotropy. We caution future studies against
using existing tools to measure isotropy in contextualized embedding space, as
the resulting conclusions will be misleading or altogether inaccurate.
Archived Files and Locations
application/pdf 1.3 MB
arxiv.org (repository), web.archive.org (webarchive)
2108.07344v2