Tuesday, 17 September 2013

Characterising social media users by gender and place of residence


Social media has revolutionized the way in which organizations and consumers interact. Users have adopted these channels on a massive scale to engage in conversations about content, products, and brands, while organizations are striving to adapt proactively to the threats and opportunities that this new dynamic environment poses. Social media is a mine of knowledge about users, communities, preferences and opinions, which has the potential to benefit marketing and product development activities.

Social media monitoring tools are being used successfully in a range of domains (including market research and online publishing). Most of these tools generate their reports from metrics based on the volume of posts and on opinion polarity about the subject being studied. Although such metrics are good indicators of a subject's popularity and reputation, they are often inadequate for capturing the complex, multi-modal dimensions of the subject that are relevant to business, and must be complemented with ad-hoc studies such as opinion polls.

The validity of these social metrics depends to a large extent on the population over which they are applied. However, social media users cannot be considered a representative sample until the vast majority of people regularly use social media. Until then, it is necessary to identify the different strata of users in terms of socio-demographic attributes (e.g., gender, age or geographical origin), in order to weight their opinions according to the proportion of each stratum in the population. Author and content metadata are not enough to capture such attributes. For example, not all social media channels record their users' gender or geographical location. Some channels, such as Twitter, allow their authors to specify their geographical location via a free text field. However, this text field is often left empty, filled with ambiguous information (e.g., Paris - France vs. Paris - Texas), or filled with other data that is useless for obtaining real geographical information (e.g., “Neverland”). In these cases, the friendship networks of social media users and the content they share and produce can be used to estimate their socio-demographic attributes, applying techniques such as geographical entity recognition.
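The weighting idea mentioned above can be illustrated with a small post-stratification sketch. This is only an illustration under stated assumptions, not the method used in the paper: the function name, the strata labels and the population shares are all hypothetical, and opinion scores are assumed to be plain numbers.

```python
from collections import defaultdict


def poststratified_mean(samples, population_share):
    """Average opinion score, reweighted by each stratum's population share.

    samples: iterable of (stratum, score) pairs, e.g. ("female", 1.0).
    population_share: dict mapping stratum -> proportion in the population.
    Both the strata and the shares are illustrative assumptions.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stratum, score in samples:
        totals[stratum] += score
        counts[stratum] += 1

    # Only strata that are both observed and known in the population can be used.
    usable = [s for s in counts if s in population_share]
    share_sum = sum(population_share[s] for s in usable)
    if share_sum == 0:
        raise ValueError("no usable strata with a positive population share")

    # Renormalise the shares over the usable strata, then weight each stratum mean.
    return sum(
        (population_share[s] / share_sum) * (totals[s] / counts[s])
        for s in usable
    )


# Made-up example: one positive opinion from a woman, three negative from men.
samples = [("female", 1.0), ("male", 0.0), ("male", 0.0), ("male", 0.0)]
shares = {"female": 0.5, "male": 0.5}  # hypothetical population proportions
weighted = poststratified_mean(samples, shares)
unweighted = sum(score for _, score in samples) / len(samples)
```

In this made-up sample men are over-represented, so the unweighted average understates the score that the weighted estimate assigns to a population split evenly by gender. The sketch only works once each user's stratum is known, which is why the attributes have to be estimated first.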

The paper "Characterising social media users by gender and place of residence" explores different techniques for obtaining the place of residence and gender attributes. Such techniques exploit social media users’ metadata, the content published and shared by the users to be categorised, and their friendship networks.

Related paper:
Related slides:

Exploiting web-based collective knowledge for micropost normalisation

Microposts published on social media are characterised by informality, brevity, frequent grammatical errors and misspellings, and by the use of abbreviations, acronyms, and emoticons. These features create additional difficulties for text mining processes, which frequently make use of tools designed for texts that conform to the canons of standard grammar and spelling.

The micropost normalisation task enhances the accuracy of NLP tools when they are applied to short fragments of text published on social media; for example, the syntactic normalisation of tweets may improve the accuracy of existing part-of-speech taggers.

The collective knowledge freely available on the Web, and particularly Wikipedia, has been used in different NLP tasks, such as text categorization, topic identification, measuring the semantic similarity between texts, and word sense disambiguation, among others.

The paper "Exploiting web-based collective knowledge for micropost normalisation" presents a technique for the morphological normalisation of microposts using two open data sources, namely Wikipedia and the SMS dictionary of the Spanish Association of Internet Users (AUI).
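To give a rough feel for what dictionary-based normalisation can look like, here is a minimal sketch. It is not the technique from the paper: the tiny `SMS_DICTIONARY` and `KNOWN_WORDS` sets are hand-written stand-ins (assumptions) for the AUI SMS dictionary and a Wikipedia-derived vocabulary, and the repeated-letter rule is an extra heuristic added for illustration.

```python
import re

# Hypothetical stand-in for an SMS abbreviation dictionary.
SMS_DICTIONARY = {"xq": "porque", "tb": "también", "q": "que"}

# Hypothetical stand-in for a vocabulary harvested from a large corpus.
KNOWN_WORDS = {"hola", "porque", "también", "que", "no", "vienes"}


def normalise_token(token):
    """Return a normalised form of one token, or the token itself if unsure."""
    lower = token.lower()
    if lower in KNOWN_WORDS:
        return lower
    if lower in SMS_DICTIONARY:
        return SMS_DICTIONARY[lower]
    # Collapse runs of a repeated character ("holaaa" -> "hola") and accept
    # the result only if it turns out to be a known word.
    collapsed = re.sub(r"(.)\1+", r"\1", lower)
    if collapsed in KNOWN_WORDS:
        return collapsed
    # Unknown tokens are left untouched rather than guessed at.
    return token


def normalise(text):
    """Normalise a whitespace-tokenised micropost."""
    return " ".join(normalise_token(t) for t in text.split())


example = normalise("holaaa xq no vienes tb")
```

Leaving unknown tokens unchanged is a deliberate choice in this sketch: names, hashtags and mentions are common in microposts and should not be rewritten by a spelling heuristic.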

Related paper:
Normalisation process:




Monday, 16 September 2013

Towards Concept Identification using a Knowledge-Intensive Approach

The paper "Towards Concept Identification using a Knowledge-Intensive Approach" presents an approach to identifying concepts and their types in microposts, relying on the DBpedia knowledge base and ontology. Our approach first carries out a preprocessing task in which messages are normalised. Then we attempt to identify candidate concepts by leveraging part-of-speech tags and Wikipedia article titles. Next we associate the candidate concepts with DBpedia resources and tap into the ontology's class hierarchy and the resource properties to classify each resource as one of the following types: Person, Organization, Location, or Miscellaneous, which covers films, sport events, software, awards and television shows.
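The last step, mapping a resource to one of the four types through the class hierarchy, can be sketched as a walk up a parent map. This is a simplified illustration and not the paper's implementation: the `PARENT` map is a small hand-written approximation of a class hierarchy (an assumption, not a query against DBpedia), and the sketch ignores resource properties entirely.

```python
# Hypothetical, hand-written fragment of a class hierarchy (child -> parent).
PARENT = {
    "SoccerPlayer": "Athlete",
    "Athlete": "Person",
    "Person": "Agent",
    "Company": "Organisation",
    "Organisation": "Agent",
    "City": "Settlement",
    "Settlement": "PopulatedPlace",
    "PopulatedPlace": "Place",
    "Film": "Work",
}

# Ontology classes that anchor the target types.
TARGET_TYPES = {
    "Person": "Person",
    "Organisation": "Organization",
    "Place": "Location",
}


def classify(ontology_class):
    """Walk up the hierarchy until a class that anchors a target type is found."""
    seen = set()
    current = ontology_class
    while current is not None and current not in seen:
        if current in TARGET_TYPES:
            return TARGET_TYPES[current]
        seen.add(current)  # guards against accidental cycles in the map
        current = PARENT.get(current)
    # Anything that is not a person, organisation or place falls through.
    return "Miscellaneous"


types = {c: classify(c) for c in ["SoccerPlayer", "Company", "City", "Film"]}
```

With this toy map, a film ends up as Miscellaneous because none of its ancestors anchors one of the three named types.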

Related paper: