A Machine Learning Pipeline to Examine Political Bias with Congressional Speeches
release_e6vwafinfrel7ellyygla7qfr4
by
Prasad Hajare, Sadia Kamal, Siddharth Krishnan, Arunkumar Bagavathi
2021
Abstract
Computational methods to model political bias in social media involve several
challenges due to the heterogeneity, high dimensionality, multiple modalities,
and scale of the data. Political bias in social media has been studied from
multiple viewpoints, such as media bias, political ideology, echo chambers, and
controversies, using machine learning pipelines. Most of the current methods
rely heavily on manually labeled ground-truth data for the underlying
political bias prediction tasks. Limitations of such methods include
human-intensive labeling, labels that apply only to a specific problem, and the
inability to determine the near-future bias state of a social media
conversation. In this work, we address these problems and present machine
learning approaches to study political bias in two ideologically diverse social
media forums, Gab and Twitter, without human-annotated data. Our proposed
methods use transcripts of political speeches in the US Congress to label the
data, and achieve highest accuracies of 70.5% and 65.1% on Twitter and Gab
data, respectively, in predicting political bias. We also present a machine
learning approach that combines features from cascades and text to forecast a
cascade's political bias with an accuracy of about 85%.
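
The labeling idea in the abstract (train on congressional speeches, then label social media posts without human annotation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say which classifier or features the authors use, so a small multinomial Naive Bayes model stands in, and the speech snippets, the labels "R" and "D", and the function names are hypothetical toy values rather than data or code from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do more."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> token counts
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for `text`."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_tokens = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # skip tokens never seen in the speeches
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_tokens + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical stand-ins for speech transcripts labeled by speaker party.
speeches = [
    ("we must cut taxes and secure the border", "R"),
    ("lower taxes help small business owners", "R"),
    ("we must expand healthcare and protect the climate", "D"),
    ("climate action and healthcare access for all", "D"),
]

# Hypothetical unlabeled social media posts.
posts = ["cut taxes now", "protect the climate"]

model = train_naive_bayes(speeches)
weak_labels = [predict(model, post) for post in posts]
print(list(zip(posts, weak_labels)))
```

The labels produced this way are weak labels: they inherit whatever mismatch exists between formal speech and social media language, which is consistent with the moderate accuracies reported in the abstract.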
Archived Files and Locations
arXiv: 2109.09014v1 (application/pdf, 446.6 kB)