Sunday, 13 December 2015

Methods and Techniques for Segmentation of Consumers in Social Media


Social media has revolutionised the way consumers relate to each other and to brands. Opinions published on social media can influence purchase decisions as strongly as advertising campaigns. Consequently, marketers are increasing their efforts and investments to obtain indicators of brand health from the digital content generated by consumers.

Given the unstructured nature of social media content, the technology used to process it often relies on Artificial Intelligence techniques such as natural language processing, machine learning and semantic analysis algorithms.

This thesis contributes to the state of the art with a model for structuring and integrating the information posted on social media, and with a number of techniques aimed at identifying consumers and at segmenting them socio-demographically and psychographically. The consumer identification technique is based on the fingerprint of the devices consumers use to browse the Web, and it tolerates the frequent changes that occur in that fingerprint. The psychographic profiling techniques infer the consumer's position in the purchase funnel and classify opinions according to a series of marketing attributes. Finally, the socio-demographic profiling techniques obtain the place of residence and gender of consumers.

    Tuesday, 2 September 2014

    Content Analytics for Media Agencies

    This presentation shows different practices carried out at Havas Media, and its needs regarding content analytics.

    We summarise several innovation projects we are working on at Havas Media, aimed at understanding consumer behaviour by analysing content extracted from social media, and at activating brands' communication strategies in real time in an omnichannel environment that takes into account the different touch points between brands and consumers.

    Friday, 18 July 2014

    A classification of user-generated content into consumer decision journey stages

    In the last decades, the availability of digital user-generated documents from social media has dramatically increased. This massive growth of user-generated content has also affected traditional shopping behaviour.

    Customers have embraced new communication channels such as microblogs and social networks that enable them not only to talk with friends and acquaintances about their shopping experience, but also to search for opinions expressed by complete strangers as part of their decision-making processes. Uncovering how customers feel about specific products or brands and detecting purchase habits and preferences has traditionally been a costly and highly time-consuming task involving methods such as focus groups and surveys. However, the new scenario calls for a thorough assessment of current market research techniques in order to better interpret and profit from this ever-growing stream of attitudinal data.

    With this purpose, we present a novel analysis and classification of user-generated content according to which of the four stages of the Consumer Decision Journey it belongs to (i.e., the purchase process from the moment a customer becomes aware of the existence of the product to the moment he or she buys, experiences and talks about it). Using a corpus of short texts written in English and Spanish and extracted from different social media, we identify a set of linguistic patterns for each purchase stage, which are then used in a rule-based classifier.
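    The following is a minimal sketch of how a rule-based stage classifier of this kind could be wired up in Python. The stage names (awareness, evaluation, purchase, post-purchase) and every regular expression below are hypothetical placeholders chosen for illustration; they are not the linguistic patterns identified in the paper. Each text is assigned to the first stage whose patterns match, or to no stage at all.

```python
import re
from typing import Dict, List, Optional

# Hypothetical cue patterns for illustration only; they are NOT the patterns
# identified in the paper. The stage names are an assumed labelling of the
# four Consumer Decision Journey stages. English and Spanish cues are mixed.
STAGE_PATTERNS: Dict[str, List[str]] = {
    "awareness": [r"\bjust saw\b", r"\bhave you heard of\b"],
    "evaluation": [r"\bshould i (buy|get)\b", r"\bwhich is better\b", r"\bme recomendáis\b"],
    "purchase": [r"\bjust (bought|ordered)\b", r"\bacabo de comprar\b"],
    "post-purchase": [r"\bloving my\b", r"\bbroke after\b", r"\breturned my\b"],
}

# Compile once; IGNORECASE lets "Should I buy" match "should i buy".
COMPILED = {
    stage: [re.compile(p, re.IGNORECASE) for p in patterns]
    for stage, patterns in STAGE_PATTERNS.items()
}


def classify_stage(text: str) -> Optional[str]:
    """Return the first stage (in dictionary order) with a matching pattern, else None."""
    for stage, patterns in COMPILED.items():
        if any(p.search(text) for p in patterns):
            return stage
    return None


print(classify_stage("Should I buy the new phone or wait?"))  # evaluation
print(classify_stage("Acabo de comprar unas zapatillas"))     # purchase
```

    A real rule set would also need a policy for texts that match several stages (here, dictionary order simply decides) and, since the corpus mixes English and Spanish, a full set of patterns for each language.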

    Additionally, we use machine learning algorithms to automatically identify business indicators such as the Marketing Mix elements. The classification of purchase stages achieves an average precision of 74%, and the classification of texts according to the Marketing Mix elements they express achieves an average precision of 75% across all the elements analysed.
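    Precision figures like these are often reported as a macro average, i.e., the unweighted mean of the per-class precisions. Whether the paper uses exactly this averaging is an assumption here; the sketch below only shows how such a figure can be computed from gold and predicted labels. The labels in the example (price, product, place) are illustrative Marketing Mix elements, and the data are made up.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_precision(gold: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Precision per predicted label: correct predictions / all predictions of that label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_positives: Counter = Counter()
    predicted_counts: Counter = Counter()
    for g, p in zip(gold, predicted):
        predicted_counts[p] += 1
        if g == p:
            true_positives[p] += 1
    return {label: true_positives[label] / n for label, n in predicted_counts.items()}


def macro_precision(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Unweighted mean of the per-class precisions (macro-averaging)."""
    scores = per_class_precision(gold, predicted)
    if not scores:
        return 0.0
    return sum(scores.values()) / len(scores)


# Made-up example data.
gold = ["price", "price", "product", "place"]
pred = ["price", "product", "product", "place"]
print(per_class_precision(gold, pred))        # {'price': 1.0, 'product': 0.5, 'place': 1.0}
print(round(macro_precision(gold, pred), 3))  # 0.833
```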


    Tuesday, 24 June 2014

    How can Big Data help steer communication campaigns?

    A presentation on several innovation projects we are working on at Havas Media, aimed at understanding consumer behaviour by analysing Big Data from social media, and at activating brand communication strategy in real time in an omnichannel environment that takes into account all the touch points between brands and consumers.

    Tuesday, 17 September 2013

    Characterising social media users by gender and place of residence


    Social media has revolutionized the way in which organizations and consumers interact. Users have massively adopted these channels to engage in conversations about content, products and brands, while organizations strive to adapt proactively to the threats and opportunities posed by this new, dynamic environment. Social media is a mine of knowledge about users, communities, preferences and opinions, with the potential to have a positive impact on marketing and product development activities.

    Social media monitoring tools are being used successfully in a range of domains (including market research and online publishing). Most of these tools generate their reports from metrics based on the volume of posts and on the polarity of opinions about the subject being studied. Although such metrics are good indicators of a subject's popularity and reputation, they are often inadequate for capturing complex, multi-modal dimensions of the measured subjects that are relevant to business, and must be complemented with ad hoc studies such as opinion polls.

    The validity of these social metrics depends to a large extent on the population to which they are applied. However, social media users cannot be considered a representative sample until the vast majority of people use social media regularly. Until then, it is necessary to identify the different strata of users in terms of socio-demographic attributes (e.g., gender, age or geographical provenance) in order to weight their opinions according to the proportion of each stratum in the population. Author and content metadata are not enough to capture such attributes. For example, not all social media channels record their users' gender or geographical location. Some channels, such as Twitter, allow authors to specify their location in a free-text field. However, this field is often left empty, filled with ambiguous information (e.g., Paris - France vs. Paris - Texas), or filled with data that is useless for obtaining real geographical information (e.g., “Neverland”). In these cases, the friendship networks of social media users and the content they share and produce can be used to estimate their socio-demographic attributes, applying techniques such as geographical entity recognition.
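    The weighting step can be illustrated with a minimal post-stratification sketch: each stratum receives a weight equal to its share of the population divided by its share of the sample, so over-represented groups count for less. The strata, population shares and opinions in the example are invented, and post-stratification is offered here as one standard way of applying such weights, not necessarily the exact procedure used in the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def stratum_weights(sample_strata: List[str], population_shares: Dict[str, float]) -> Dict[str, float]:
    """Weight = share of the stratum in the population / share of the stratum in the sample."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    return {s: population_shares[s] / (c / n) for s, c in counts.items()}


def weighted_positive_share(opinions: List[Tuple[str, bool]],
                            population_shares: Dict[str, float]) -> float:
    """Share of positive opinions after re-weighting each author by their stratum."""
    weights = stratum_weights([s for s, _ in opinions], population_shares)
    total = sum(weights[s] for s, _ in opinions)
    positive = sum(weights[s] for s, is_positive in opinions if is_positive)
    return positive / total


# Invented data: 3 male and 1 female authors; the population is assumed to be 50/50.
opinions = [("male", True), ("male", True), ("male", False), ("female", False)]
print(weighted_positive_share(opinions, {"male": 0.5, "female": 0.5}))  # ≈ 0.333 (unweighted: 0.5)
```

    The weighted result equals 0.5 × (2/3) + 0.5 × 0, i.e., the positive rate of each stratum combined in population proportions rather than sample proportions.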

    The paper "Characterising social media users by gender and place of residence" explores different techniques for obtaining the place of residence and gender attributes. These techniques exploit the metadata of social media users, the content published and shared by the users to be categorised, and their friendship networks.
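    As a rough illustration of how the free-text location field and the friendship network can be combined, the sketch below first tries to resolve a user's own location against a gazetteer and, if that fails or is ambiguous (as with "Paris"), falls back to a majority vote over the friends' resolvable locations. The tiny `GAZETTEER` dictionary and the majority-vote fallback are assumptions made for this example; they are not the techniques evaluated in the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical toy gazetteer: normalised location string -> candidate places.
GAZETTEER: Dict[str, List[str]] = {
    "madrid": ["Madrid, Spain"],
    "paris": ["Paris, France", "Paris, Texas, USA"],
    "paris, france": ["Paris, France"],
}


def resolve(location_field: str) -> Optional[str]:
    """Return a place only if the free-text field maps to exactly one candidate."""
    candidates = GAZETTEER.get(location_field.strip().lower(), [])
    return candidates[0] if len(candidates) == 1 else None


def infer_residence(profile_location: Optional[str],
                    friend_locations: Iterable[str]) -> Optional[str]:
    """Use the user's own field if unambiguous, otherwise a majority vote over friends."""
    own = resolve(profile_location or "")
    if own is not None:
        return own
    votes = Counter(place for place in map(resolve, friend_locations) if place is not None)
    return votes.most_common(1)[0][0] if votes else None


print(infer_residence("Paris", ["Paris, France", "Madrid", "paris, france", "Neverland"]))
# Paris, France
```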


    Exploiting web-based collective knowledge for micropost normalisation

    Microposts published on social media are characterised by informality, brevity, frequent grammatical errors and misspellings, and the use of abbreviations, acronyms and emoticons. These features pose additional difficulties for text mining processes, which frequently rely on tools designed for texts that conform to the canons of standard grammar and spelling.

    Micropost normalisation improves the accuracy of NLP tools applied to short fragments of text published on social media; for example, the syntactic normalisation of tweets may improve the accuracy of existing part-of-speech taggers.

    The collective knowledge freely available on the Web, and particularly Wikipedia, has been used in different NLP tasks such as text categorisation, topic identification, measuring the semantic similarity between texts, and word sense disambiguation, among others.

    The paper "Exploiting web-based collective knowledge for micropost normalisation" presents a technique for the morphological normalisation of microposts using two open data sources, namely Wikipedia and the SMS dictionary of the Spanish Association of Internet Users (AUI).
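    A minimal sketch of dictionary-driven normalisation in this spirit is shown below. `SMS_DICTIONARY` and `WIKIPEDIA_TITLES` are small hypothetical stand-ins for the AUI SMS dictionary and for a list of Wikipedia article titles; the lookup order (two-word Wikipedia titles first, then SMS abbreviations, then single-word titles) is an assumption, not the paper's algorithm.

```python
import re
from typing import List

# Hypothetical excerpts; the real AUI SMS dictionary and Wikipedia title list are not reproduced.
SMS_DICTIONARY = {"q": "que", "xq": "porque", "tb": "también", "bss": "besos"}
WIKIPEDIA_TITLES = {"madrid": "Madrid", "real madrid": "Real Madrid", "iphone": "iPhone"}

# Words (Unicode-aware by default for str patterns) or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")


def normalise(post: str) -> str:
    tokens: List[str] = TOKEN.findall(post)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Prefer a two-word Wikipedia title so "real madrid" is not split.
        if i + 1 < len(tokens):
            bigram = f"{tokens[i]} {tokens[i + 1]}".lower()
            if bigram in WIKIPEDIA_TITLES:
                out.append(WIKIPEDIA_TITLES[bigram])
                i += 2
                continue
        low = tokens[i].lower()
        if low in SMS_DICTIONARY:        # expand SMS-style abbreviations
            out.append(SMS_DICTIONARY[low])
        elif low in WIKIPEDIA_TITLES:    # restore canonical spelling and casing
            out.append(WIKIPEDIA_TITLES[low])
        else:
            out.append(tokens[i])
        i += 1
    return " ".join(out)


print(normalise("xq el real madrid gana tb"))  # porque el Real Madrid gana también
```

    Tokens are re-joined with single spaces, so the original spacing around punctuation is not preserved; a fuller normaliser would also need to handle misspellings that appear in neither source.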

    Monday, 16 September 2013

    Towards Concept Identification using a Knowledge-Intensive Approach

    The paper "Towards Concept Identification using a Knowledge-Intensive Approach" presents an approach to identifying concepts and their types in microposts, relying on the DBpedia knowledge base and ontology. Our approach first carries out a preprocessing task in which messages are normalised. We then identify candidate concepts by leveraging part-of-speech tags and Wikipedia article titles. Next, we associate the candidate concepts with DBpedia resources and tap into the ontology's class hierarchy and resource properties to classify each resource into one of the following types: Person, Organization, Location and Miscellaneous, the last of which covers films, sports events, software, awards and television shows.
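    The final typing step can be sketched as a walk up the class hierarchy until one of the target classes is reached. The `PARENT` map below is a simplified, hypothetical excerpt loosely modelled on DBpedia ontology class names (DBpedia spells the organisation class `Organisation`, hence the mapping to Organization); it is not a faithful copy of the ontology, and a real implementation would look up each resource's `rdf:type` values in DBpedia instead.

```python
from typing import Dict, Optional

# Simplified, hypothetical child -> parent excerpt; not a faithful copy of the DBpedia ontology.
PARENT: Dict[str, Optional[str]] = {
    "SoccerPlayer": "Athlete", "Athlete": "Person", "Person": "Agent",
    "SoccerClub": "SportsClub", "SportsClub": "Organisation", "Organisation": "Agent",
    "City": "Settlement", "Settlement": "PopulatedPlace", "PopulatedPlace": "Place",
    "Film": "Work", "Agent": None, "Place": None, "Work": None,
}

# Ontology classes that map directly onto the four target types.
TARGET = {"Person": "Person", "Organisation": "Organization", "Place": "Location"}


def coarse_type(ontology_class: str) -> str:
    """Climb the hierarchy; anything that never reaches a target class is Miscellaneous."""
    cls: Optional[str] = ontology_class
    seen = set()  # guards against cycles in a malformed hierarchy
    while cls is not None and cls not in seen:
        if cls in TARGET:
            return TARGET[cls]
        seen.add(cls)
        cls = PARENT.get(cls)
    return "Miscellaneous"


for c in ("SoccerPlayer", "SoccerClub", "City", "Film"):
    print(c, "->", coarse_type(c))
```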


    Friday, 7 June 2013

    Big Data and Marketing Technology. Challenges and Case Studies for Media Agencies

    Big Data represents an opportunity for organisations with data analysis needs. To address the Big Data challenge, companies need to prepare a number of functions.

    The following presentation describes the Big Data landscape for marketing technology, introducing several applications, and describing the three key aspects a media agency must focus on when dealing with Big Data analysis applications.

    Tuesday, 23 April 2013

    Emotion analysis in social media

    Sentiment analysis of texts on social networks has become an increasingly relevant research area because of the influence that the opinions expressed there have on social media users.

    Sentiment analysis usually characterises user-generated content according to opinion polarity (for example, whether a comment is positive or negative towards a brand). This approach is insufficient for capturing finer nuances of the user's opinion: for example, given a positive opinion about a product, whether that opinion expresses satisfaction, happiness or trust in the product.

    Based on a conceptual classification of emotions and on a corpus covering several commercial domains, we have worked on building linguistic processing rules that classify these texts according to the sentiment expressed towards a brand, company or product.
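    A minimal sketch of brand-targeted emotion rules is shown below, using the three emotions mentioned above (satisfaction, happiness and trust). The cue lexicon is hypothetical, and restricting matches to sentences that mention the brand is a simplifying assumption; the rules built for this work are not reproduced here.

```python
import re
from typing import Dict, Set

# Hypothetical cue lexicon; only the emotion labels come from the text above.
EMOTION_CUES: Dict[str, Set[str]] = {
    "satisfaction": {"satisfied", "meets expectations", "worth it"},
    "happiness": {"happy", "love", "delighted"},
    "trust": {"trust", "reliable", "never fails"},
}


def emotions_towards(text: str, brand: str) -> Set[str]:
    """Collect emotions whose cues occur in a sentence that also mentions the brand."""
    found: Set[str] = set()
    for sentence in re.split(r"[.!?]+", text.lower()):
        if brand.lower() not in sentence:
            continue  # only sentences about the target brand count
        for emotion, cues in EMOTION_CUES.items():
            if any(re.search(r"\b" + re.escape(cue) + r"\b", sentence) for cue in cues):
                found.add(emotion)
    return found


print(emotions_towards("I love my Acme phone. The battery never fails.", "Acme"))
# {'happiness'}: "never fails" is in a sentence that does not name the brand
```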


    Tuesday, 11 December 2012

    Detecting browser fingerprint evolution for identifying unique users

    User tracking consists of recording the activity of users as they interact with one or more websites, so that this activity can be related to specific, uniquely identified users. Counting the unique visitors of websites is essential for web analytics, since many web analytics metrics depend on each individual being counted only once (e.g., new visitors, returning visitors).

    Web analytics provides the measurement model for digital marketing, making it possible to analyse and measure the effectiveness of advertising campaigns. Data gathered through web analytics (e.g., the number of people who have viewed a banner) is typically compared against key performance indicators (KPIs) such as the reach of a campaign, and used to improve the audience response to marketing campaigns (e.g., by moving the banner to a site with a larger audience). The most significant KPIs depend on counting unique visitors.

    Uniquely identifying users is also needed for behavioural targeting, which involves tracking users' online activities in order to deliver tailored ads to them.

    The most widely used technique for uniquely identifying users from captured web activity combines cookies and web bugs. This technique is affected by several factors, such as strict privacy restrictions implemented by web browsers, or the use of new web-browsing devices that do not support cookies (e.g., many set-top boxes and certain video game consoles). Furthermore, several security programmes, such as anti-spyware tools, remove cookies periodically, making it difficult to trace recurring visits to websites. Thus, these security measures, enabled to protect users' privacy, affect basic aggregate metrics obtained through web analytics from which valuable business insights can be derived, such as the number of unique visitors of a website or the bounce rate.

    An alternative to cookies for uniquely identifying users consists of capturing distinctive technical attributes of the system they use to navigate the web (i.e., their browser fingerprint). While the effectiveness of this technique has been demonstrated, it is not entirely accurate, because a browser fingerprint is built from attributes that evolve over time. Consequently, changes in the values of fingerprint attributes lead to returning users being incorrectly counted as new users.

    In summary, existing techniques for counting unique visitors are losing effectiveness because of privacy restrictions and new web-browsing devices. The fingerprinting technique copes with such restrictions and devices, but is quite sensitive to changes in the attributes of the web browser, which leads to imprecise counts of unique visitors. The paper "Detecting browser fingerprint evolution for identifying unique users" describes an algorithm, based on the fingerprinting technique, that identifies unique visitors accurately regardless of changes in browser attributes. To do so, the algorithm detects the evolution of fingerprints, thereby grouping distinct fingerprints that correspond to the same user.
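    The idea of grouping evolving fingerprints can be illustrated with a simple greedy sketch: a new fingerprint is attributed to an existing user when a set of attributes assumed to be stable is unchanged and at most `MAX_CHANGED` other attributes differ from that user's latest fingerprint. The attribute names, the stable set and the threshold are hypothetical choices for this example and do not describe the algorithm in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Fingerprint = Dict[str, str]

# Hypothetical choices, for illustration only.
STABLE_ATTRIBUTES = ("platform", "screen_resolution", "timezone")
MAX_CHANGED = 2


@dataclass
class User:
    user_id: int
    latest: Fingerprint  # most recent state of this user's evolving fingerprint


def changed_attributes(a: Fingerprint, b: Fingerprint) -> int:
    """Number of attributes whose values differ (missing counts as different)."""
    return sum(1 for k in set(a) | set(b) if a.get(k) != b.get(k))


def is_evolution(old: Fingerprint, new: Fingerprint) -> bool:
    if any(old.get(k) != new.get(k) for k in STABLE_ATTRIBUTES):
        return False
    return changed_attributes(old, new) <= MAX_CHANGED


def assign_users(fingerprints: List[Fingerprint]) -> List[int]:
    """Greedily link each fingerprint to the first user it could have evolved from."""
    users: List[User] = []
    assigned: List[int] = []
    for fp in fingerprints:
        match: Optional[User] = next((u for u in users if is_evolution(u.latest, fp)), None)
        if match is None:
            match = User(user_id=len(users), latest=fp)
            users.append(match)
        else:
            match.latest = fp  # follow the fingerprint as it keeps changing
        assigned.append(match.user_id)
    return assigned


base = {"platform": "Win32", "screen_resolution": "1920x1080", "timezone": "UTC+1",
        "user_agent": "Firefox/40", "plugins": "flash"}
visits = [
    base,
    {**base, "user_agent": "Firefox/41"},                    # browser update
    {**base, "user_agent": "Firefox/42", "plugins": "none"},  # update + plugin removed
    {**base, "platform": "MacIntel"},                         # a different device
]
print(assign_users(visits))  # [0, 0, 0, 1]
```

    In the example, counting distinct raw fingerprints would report four visitors, whereas linking the successive browser updates yields two.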
