with short texts, misspellings, and set phrases, among other characteristics that challenge content analysis.
The paper "Comparing user generated content published in different social media sources" shows how the language used differs across heterogeneous social media sources. It does so by analysing the distribution of part-of-speech categories obtained from a morphological analysis of a sample of texts published in those sources. In addition, we evaluate the performance of three natural language processing techniques (i.e., language identification, sentiment analysis, and topic identification), showing the differences in accuracy when these techniques are applied to different types of user generated content.
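To make the part-of-speech comparison more concrete, here is a minimal Python sketch of comparing tag distributions between two sources. It is not the method from the paper: it assumes the texts have already been run through some part-of-speech tagger, and the tag names (Universal Dependencies style), the sample data, and the helper functions `pos_distribution` and `compare_distributions` are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def pos_distribution(tags):
    """Return the relative frequency of each part-of-speech tag in a sample."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no distribution to report.
        return {}
    return {tag: n / total for tag, n in counts.items()}


def compare_distributions(dist_a, dist_b):
    """Per-tag difference (a - b); a tag absent from one source counts as 0."""
    tags = sorted(set(dist_a) | set(dist_b))
    return {tag: dist_a.get(tag, 0.0) - dist_b.get(tag, 0.0) for tag in tags}


# Hypothetical, already-tagged samples from two kinds of sources.
tweets = ["PRON", "VERB", "ADJ", "INTJ", "NOUN", "INTJ"]
reviews = ["DET", "NOUN", "VERB", "ADJ", "NOUN", "ADP", "DET", "NOUN"]

tweet_dist = pos_distribution(tweets)
review_dist = pos_distribution(reviews)
diff = compare_distributions(tweet_dist, review_dist)

# Positive values: the tag is relatively more frequent in the tweet sample.
for tag, delta in diff.items():
    print(f"{tag:5s} {delta:+.3f}")
```

Working with relative frequencies rather than raw counts keeps samples of different sizes comparable. In this toy data, interjections appear only in the tweet-like sample, while determiners appear only in the review-like sample.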
Related paper:
Related slides: