IsoScore: Measuring the Uniformity of Embedding Space Utilization
release_l3h2cgmrpvaifota7htc5zsno4

by William Rudman, Nate Gillman, Taylor Rayne, Carsten Eickhoff

Released as an article.

2022  

Abstract

The recent success of distributed word representations has led to an increased interest in analyzing the properties of their spatial distribution. Several studies have suggested that contextualized word embedding models do not isotropically project tokens into vector space. However, current methods designed to measure isotropy, such as average random cosine similarity and the partition score, have not been thoroughly analyzed and are not appropriate for measuring isotropy. We propose IsoScore: a novel tool that quantifies the degree to which a point cloud uniformly utilizes the ambient vector space. Using rigorously designed tests, we demonstrate that IsoScore is the only tool available in the literature that accurately measures how uniformly distributed variance is across dimensions in vector space. Additionally, we use IsoScore to challenge a number of recent conclusions in the NLP literature that have been derived using brittle metrics of isotropy. We caution future studies against using existing tools to measure isotropy in contextualized embedding space, as the resulting conclusions will be misleading or altogether inaccurate.
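
As a rough illustration of the distinction the abstract draws, the sketch below compares average random cosine similarity with a simple look at how variance is spread across principal directions. It is not IsoScore itself: the function names, the sample sizes, and the three synthetic point clouds are all assumptions made for this example, and the variance-share computation is only a stand-in for "how uniformly distributed variance is across dimensions".

```python
import numpy as np


def average_random_cosine_similarity(points, num_pairs=10_000, seed=0):
    """Mean cosine similarity over randomly sampled pairs of distinct rows."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    i = rng.integers(0, n, size=num_pairs)
    j = rng.integers(0, n, size=num_pairs)
    keep = i != j  # drop pairs that compare a point with itself
    a, b = points[i[keep]], points[j[keep]]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(cos.mean())


def principal_variance_shares(points):
    """Fraction of total variance along each principal direction, largest first.

    Hypothetical helper for illustration only; this is not the IsoScore formula.
    """
    cov = np.cov(points, rowvar=False)        # np.cov mean-centers the data
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigvalsh returns ascending order
    eigvals = np.clip(eigvals, 0.0, None)     # guard against tiny negative values
    return eigvals / eigvals.sum()


# Three synthetic point clouds (assumed data, chosen only to make the contrast visible).
rng = np.random.default_rng(0)
isotropic = rng.standard_normal((2000, 3))
shifted = isotropic + np.array([10.0, 0.0, 0.0])    # same covariance, nonzero mean
stretched = isotropic * np.array([10.0, 1.0, 0.1])  # zero mean, very uneven variance

for name, cloud in [("isotropic", isotropic), ("shifted", shifted), ("stretched", stretched)]:
    print(name,
          "avg cos sim:", round(average_random_cosine_similarity(cloud), 3),
          "variance shares:", np.round(principal_variance_shares(cloud), 3))
```

In this toy setup the shifted cloud has the same variance shares as the isotropic one but a very different average cosine similarity, while the stretched cloud concentrates almost all variance in one direction yet keeps an average cosine similarity near zero. That is one way a cosine-based score and a variance-based view can disagree.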

Archived Files and Locations

application/pdf  1.3 MB
file_xhbcrhx5brf7zmfmbeiieau63y
arxiv.org (repository)
web.archive.org (webarchive)
Preserved and Accessible
Type  article
Stage   submitted
Date   2022-04-18
Version   v2
Language   en
arXiv  2108.07344v2
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 57b3e0e2-2eeb-4726-acd8-be46d35407ec