The micropost normalisation task improves the accuracy of NLP tools when they are applied to short text fragments published on social media. For example, the syntactic normalisation of tweets may improve the accuracy of existing part-of-speech taggers.
The collective knowledge freely available on the Web, and particularly Wikipedia, has been used in different NLP tasks, such as text categorization, topic identification, measuring the semantic similarity between texts, and word sense disambiguation.
The paper "Exploiting web-based collective knowledge for micropost normalisation" presents a technique for the morphological normalisation of microposts using two open data sources: Wikipedia and the SMS dictionary of the Spanish Association of Internet Users (AUI).
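
To give a rough idea of what dictionary-backed normalisation can look like, here is a minimal Python sketch. It is not the paper's pipeline: the `SMS_DICTIONARY` and `VOCABULARY` contents, the function names, and the repeated-character heuristic are all assumptions made for illustration. A real system would presumably load the abbreviation list from the AUI SMS dictionary and derive the vocabulary from Wikipedia.

```python
import re

# Hypothetical stand-ins for the two open data sources (illustrative only):
# a few SMS-style abbreviations and a tiny vocabulary of known word forms.
SMS_DICTIONARY = {"q": "que", "xq": "porque", "tb": "también", "bss": "besos"}
VOCABULARY = {"hola", "que", "porque", "también", "besos", "no", "vienes"}


def collapse_repeats(token):
    """Reduce every run of a repeated character to a single character."""
    return re.sub(r"(.)\1+", r"\1", token)


def normalise_token(token):
    """Return a normalised form of the token, or the token itself if unknown."""
    lower = token.lower()
    if lower in VOCABULARY:
        # Already a known word form: leave it untouched.
        return token
    if lower in SMS_DICTIONARY:
        # Known abbreviation: expand it.
        return SMS_DICTIONARY[lower]
    collapsed = collapse_repeats(lower)
    if collapsed in VOCABULARY:
        # Expressive lengthening such as "holaaa" resolved to a known word.
        return collapsed
    # Out-of-vocabulary and not resolvable with these simple rules.
    return token


def normalise(text):
    """Normalise a whitespace-tokenised micropost."""
    return " ".join(normalise_token(t) for t in text.split())


print(normalise("holaaa xq no vienes tb"))
```

Tokens that neither resource recognises are passed through unchanged, which keeps the sketch conservative: it never rewrites a word it cannot map to a known form.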
Related paper:
Normalisation process: