Big Data Science Over the Past Web
by Miguel Costa, Julien Masanès
2021
Abstract
Web archives preserve unique and historically valuable information. They hold
a record of past events and memories published by all kinds of people, such as
journalists, politicians and ordinary citizens who have shared their testimonies
and opinions on multiple subjects. As a result, since the early days of the
World Wide Web, researchers such as historians and sociologists have used web
archives as a source of information to understand the recent past. The
typical way to extract knowledge from a web archive is by using its search
functionalities to find and analyse historical content. This can be a slow and
superficial process when analysing complex topics, due to the huge amount of
data that web archives have preserved over time. Big data science tools
can cope with data at this scale, enabling researchers to automatically
extract meaningful knowledge from the archived data. This knowledge helps not
only to explain the past but also to predict the future through the
computational modelling of events and behaviours. Currently, there is an
immense landscape of big data tools, machine learning frameworks and deep
learning algorithms that significantly increase the scalability and performance
of several computational tasks, especially over text, image and audio. Web
archives have been taking advantage of this panoply of technologies to provide
their users with more powerful tools to explore and exploit historical data.
This chapter presents several examples of these tools and gives an overview of
their application to support longitudinal studies over web archive collections.
arXiv: 2108.01605v1 (archived at arxiv.org and web.archive.org)