Table2Vec: Automated Universal Representation Learning to Encode All-round Data DNA for Benchmarkable and Explainable Enterprise Data Science
by Longbing Cao, Chengzhang Zhu
2021
Abstract
Enterprise data typically involves multiple heterogeneous data sources and
external data that record business activities, transactions, customer
demographics, status, behaviors, interactions and communications with the
enterprise, and the consumption of and feedback on its products, services,
production, marketing, operations, and management. A critical challenge in
enterprise data science is to enable an effective whole-of-enterprise data
understanding and data-driven discovery and decision-making on all-round
enterprise DNA. We introduce a neural encoder Table2Vec for automated universal
representation learning of entities such as customers from all-round enterprise
DNA with automated data characteristics analysis and data quality augmentation.
The learned universal representations serve as representative and benchmarkable
enterprise data genomes and can be used for enterprise-wide and domain-specific
learning tasks. Table2Vec integrates automated universal representation
learning on low-quality enterprise data and downstream learning tasks. We
illustrate Table2Vec in characterizing all-round customer data DNA in an
enterprise on complex heterogeneous multi-relational big tables to build
universal customer vector representations. The learned universal representation
of each customer is all-round, representative and benchmarkable to support both
enterprise-wide and domain-specific learning goals and tasks in enterprise data
science. Table2Vec significantly outperforms the existing shallow, boosting and
deep learning methods typically used for enterprise analytics. We further
discuss the research opportunities, directions and applications of automated
universal enterprise representation and learning and the learned enterprise
data DNA for automated, all-purpose, whole-of-enterprise and ethical machine
learning and data science.
Archived Files and Locations
application/pdf 1.8 MB
arxiv.org (repository), web.archive.org (webarchive)
2112.01830v1