Statistical embedding: Beyond principal components
release_4vwj5epnkfapxhgeybhkzxv47a

by Dag Tjøstheim, Martin Jullum and Anders Løland

Released as an article.

2021  

Abstract

There has been intense recent activity in the embedding of very high-dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph-based methods and kernel-based methods. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams. Another type of data set with tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension, making the data amenable to traditional techniques such as clustering and classification. The final part of the survey deals with embedding in ℝ^2, that is, visualization. Three methods are presented: t-SNE, UMAP and LargeVis, based on methods in parts one, two and three, respectively. The methods are illustrated and compared on two simulated data sets: one consisting of a triple of noisy Ranunculoid curves, and one consisting of networks of increasing complexity with two types of nodes.

Archived Files and Locations

application/pdf  1.6 MB
file_todfb5x3frc4rp3kyptwabc3qe
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2021-06-03
Version   v1
Language   en
arXiv  2106.01858v1
Catalog Record
Revision: 4d1d0eba-c199-410d-817a-3b45817f061a