Topic Identification Of Noisy Texts: Statistical Approaches

by K. Abainia

Published by Zenodo.

2015  

Abstract

This paper addresses the problem of automatic topic identification in noisy Arabic texts. Several works in this field apply statistical and machine learning approaches to different text categories. Unfortunately, most of the proposed approaches are suited to clean, long texts. In this investigation, we carried out a comparative study of two statistical approaches based on tf-idf, using several configurations of each to broaden the comparison. To evaluate the approaches, we created an in-house corpus called ANTSIX, which contains discussion forum texts on 6 different topics. Experimental results show that both statistical approaches are suitable for topic identification of noisy Arabic texts, but each has its own advantages and drawbacks.

Type: article-journal
Stage: published
Date: 2015-06-01