Óscar Muñoz-García Blog: Exploiting web-based collective knowledge for micropost normalisation

Microposts published on social media are characterised by informality, brevity, frequent grammatical errors and misspellings, and by the use of abbreviations, acronyms, and emoticons. These features make text mining harder, because text mining processes frequently rely on tools designed for texts that conform to the canons of standard grammar and spelling.

The micropost normalisation task enhances the accuracy of NLP tools when they are applied to short fragments of text published on social media; e.g., the syntactic normalisation of tweets may improve the accuracy of existing part-of-speech taggers.

The collective knowledge freely available on the Web, and particularly Wikipedia, has been used in different NLP tasks, such as text categorization, topic identification, measuring the semantic similarity between texts, and word sense disambiguation, among others.

The paper "Exploiting web-based collective knowledge for micropost normalisation" presents a technique for the morphological normalisation of microposts using two open data sources, namely Wikipedia and the SMS dictionary of the Spanish Association of Internet Users (AUI).
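To give a rough idea of how two such resources could be combined, here is a minimal sketch. It is not the algorithm from the paper: the `normalise` function, the toy `WIKI_VOCABULARY` set and the toy `SMS_DICTIONARY` mapping are all hypothetical stand-ins, assuming a word list extracted from Wikipedia and an abbreviation table in the spirit of the AUI SMS dictionary.

```python
# Hypothetical toy resources (assumptions for illustration only).
# A real system would build the vocabulary from Wikipedia text and
# load the abbreviations from the AUI SMS dictionary.
WIKI_VOCABULARY = {"hola", "que", "tal", "por", "favor"}
SMS_DICTIONARY = {"q": "que", "xfa": "por favor", "tb": "también"}


def normalise(micropost, vocabulary=WIKI_VOCABULARY, sms=SMS_DICTIONARY):
    """Replace out-of-vocabulary tokens with their SMS-dictionary expansion."""
    normalised = []
    # Whitespace tokenisation and lowercasing are simplifications.
    for token in micropost.lower().split():
        if token in vocabulary:
            # Known word: keep it unchanged.
            normalised.append(token)
        elif token in sms:
            # Known abbreviation: expand it.
            normalised.append(sms[token])
        else:
            # Unknown token: leave it as it is.
            normalised.append(token)
    return " ".join(normalised)


print(normalise("hola q tal xfa"))  # prints "hola que tal por favor"
```

Checking the vocabulary first means that a valid word is never rewritten just because it also appears as an abbreviation; whether that ordering is appropriate depends on the actual resources.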

Related paper:

Exploiting web-based collective knowledge for micropost normalisation

Normalisation process:

Exploiting web-based collective knowledge for micropost normalisation from omunozgarcia

Tuesday, 17 September 2013