Author: Janna Lipenkova

Market research surveys typically consist of two types of questions, namely “closed” and “open” questions. Closed questions limit the possible range of responses and result in structured data. By contrast, “open” questions allow the respondent to reply with free text, expressing their full-fledged, authentic opinion. This flexibility makes open questions attractive in market research: given the right wording, an open question can trigger responses that are broader, deeper and provide more authentic insight than the rigid, black-and-white multiple-choice question. 

Challenges with the analysis of open-ended questions

Why, then, are open-ended questions not widely adopted in market research? One of the reasons is that they are difficult to analyze due to their unstructured character. Most researchers use manual coding, where the responses to open questions are manually sorted into a classification scheme (the “coding frame”) according to the topics they represent. The following table shows some examples:

Manual coding comes with several issues:

  • High cost: manual coding is labor-intensive and thus expensive.
  • Errors: depending on their mental and physical shape, human coders make mistakes or provide inconsistent judgments at different points in time.  
  • Subjectivity: due to the inherent ambiguity and the subtleties involved in human language, different people might code the same response in different ways. 

Last but not least, coding hundreds or even thousands of responses to open-ended questions can be a frustrating endeavor. In today’s world, where AI is used to automate and optimize virtually every repetitive task, it seems natural to turn to automated processing to eliminate the monotonous parts of coding. Beyond the several benefits of automation, this also creates time for more involved challenges that require the creativity, experience and intellectual versatility of the human brain.

Using Natural Language Processing as a solution

Natural Language Processing (NLP) automates the manual work researchers do when they code open-ended questions. It structures a text according to the discussed topics and concepts as well as other relevant metrics, such as the frequency, relevance and sentiment. Beyond speeding up the coding process, NLP can be used to discover additional insights in the data and enrich the end result. The capacity of a machine to look at a large dataset as a whole and discover associations, regularities and outliers is larger than that of the human brain. 

Three algorithms – topic modeling and classification, concept extraction and sentiment analysis – are particularly useful in the coding process. 

Topic modeling and classification

Topic modeling detects abstract topics in the text. Topic modeling is an unsupervised learning method similar to clustering, and learns lexical similarities between texts without a predefined set of classes. Thus, it is particularly useful in the initial stage of the construction of a coding frame. The following word cloud shows words that are frequently mentioned in texts about comfort:

Topic classification is similar to topic modeling. However, it works with a given coding frame and classifies each text into one of the predefined classes. This means that it can be used for coding after the coding frame has been constructed.
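As a rough illustration of topic classification against a fixed coding frame, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. The coded examples, the two class names and the helper names (`train`, `classify`) are made up for illustration; the choice of Naive Bayes is an assumption, and a real project would rely on far more coded responses and an established library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, manually coded responses: (text, class in the coding frame).
CODED = [
    ("the seats are soft and cozy", "Comfort"),
    ("very quiet and smooth ride", "Comfort"),
    ("too expensive for what you get", "Price"),
    ("good value and fair price", "Price"),
]

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count words per class and responses per class."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(CODED)
print(classify("cozy seats and a smooth ride", model))
print(classify("the price is too expensive", model))
```

The classifier can only assign classes that already exist in the coding frame, which is why it fits the stage after the frame has been constructed.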

Concept extraction

Concept extraction matches concrete strings in the text. Whereas topic modeling and classification work with – often implicit – lexical information distributed everywhere in the text, concept extraction matches the exact words and phrases that occur in the text. On a more advanced level, concept extraction also uses the structure of the lexicon and can deal with lexical relations, such as:

  • Synonymy: EQUALS relationship, e.g. VW EQUALS Volkswagen
  • Hypernymy: IS-A relationship, e.g. Sedan IS-A Vehicle
  • Meronymy: PART-OF relationship, e.g. Engine PART-OF Car

Concept extraction usually focuses on nouns and noun phrases (engine, Volkswagen). In the context of evaluations (open-ended questions), it is also useful to extract concepts that are “hidden” in adjectives (fast ➤ Speed, cozy ➤ Comfort) and verbs (overpay ➤ Price, fail ➤ Reliability).

In terms of implementation, there are two main approaches to concept extraction: the dictionary-based approach and the machine-learning approach. The dictionary-based approach works with predefined lists of terms for each category (also called “gazetteers”). The machine-learning approach, on the other hand, learns concepts of specific types from large quantities of annotated data. As a rule of thumb, the smaller and more specific the available dataset, the more efficient the use of predefined lists of concepts and linguistic expressions.
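The dictionary-based approach can be sketched in a few lines of Python. The gazetteer entries below simply reuse the examples from this section; the dictionary names and the functions `extract_concepts` and `related_concepts` are hypothetical, and the sketch matches single tokens only, whereas a realistic gazetteer would also cover multi-word phrases and inflected forms.

```python
import re

# Hypothetical gazetteer: surface form -> canonical concept (synonymy).
SYNONYMS = {
    "vw": "Volkswagen",
    "volkswagen": "Volkswagen",
    "sedan": "Sedan",
    "engine": "Engine",
}
# Concepts "hidden" in adjectives and verbs.
IMPLICIT = {
    "fast": "Speed",
    "cozy": "Comfort",
    "overpay": "Price",
    "fail": "Reliability",
    "failed": "Reliability",
}
HYPERNYMS = {"Sedan": "Vehicle"}  # IS-A
PART_OF = {"Engine": "Car"}       # PART-OF

def extract_concepts(text):
    """Return the concepts found in the text, in order of first mention."""
    found = []
    for token in re.findall(r"[a-z]+", text.lower()):
        concept = SYNONYMS.get(token) or IMPLICIT.get(token)
        if concept and concept not in found:
            found.append(concept)
    return found

def related_concepts(concept):
    """Look up the IS-A and PART-OF relations of a concept, if any."""
    relations = {}
    if concept in HYPERNYMS:
        relations["is_a"] = HYPERNYMS[concept]
    if concept in PART_OF:
        relations["part_of"] = PART_OF[concept]
    return relations

print(extract_concepts("My VW sedan is fast, but the engine failed twice."))
print(related_concepts("Engine"))
```

With the lexical relations in place, a mention of "sedan" can also be counted towards the broader concept Vehicle when results are aggregated.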

Sentiment analysis

Sentiment analysis detects whether a given text has a positive or a negative connotation. Sentiment analysis can be further detailed to the level of individual aspects mentioned in a text, which makes it possible to detect mixed sentiments on the phrase level:

“Classy and reliable, but expensive.”

Sentiment analysis operates on an emotional, subjective and often implicit linguistic level. This subtlety raises several challenges for automation. For example, sentiment analysis is highly context-dependent: a vacuum cleaner that sucks would probably get a neutral-to-positive sentiment; by contrast, the internet connection in a car normally shouldn’t “suck”. Another complication is irony and sarcasm: on the lexical level, ironic statements often use vocabulary with a clear polarity orientation. However, when put into the surrounding context, this polarity is inverted:

“Really great engineering… the engine broke after only three weeks!”

Irony is mostly detected from anomalies in the polarity contrasts between neighboring text segments. For instance, in the example above, “really great engineering” gets a strong positive sentiment which radically clashes with the negative feel of “the engine broke after only three weeks”. Since the two phrases are directly juxtaposed without a conjunction such as “but” or “although”, the machine is able to recognize the overall negative connotation. 
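The polarity-contrast idea can be illustrated with a small lexicon-based sketch. The polarity lexicon, the list of contrast markers and the function names are assumptions made for this example; it is a toy heuristic that scores each segment and flags irony when a positive segment is directly followed by a negative one without a conjunction, not a description of a production system.

```python
import re

# Hypothetical polarity lexicon: word -> +1 (positive) or -1 (negative).
POLARITY = {"classy": 1, "reliable": 1, "great": 1, "expensive": -1, "broke": -1}
CONTRAST_MARKERS = {"but", "although"}

def split_segments(text):
    """Split the text into segments at punctuation marks."""
    parts = re.split(r"[,.;!?…]+", text.lower())
    return [part.strip() for part in parts if part.strip()]

def score(segment):
    """Sum the polarity of all lexicon words in a segment."""
    return sum(POLARITY.get(token, 0) for token in re.findall(r"[a-z]+", segment))

def analyze(text):
    """Score each segment and flag irony on an unmarked positive-to-negative clash."""
    segments = split_segments(text)
    scores = [score(segment) for segment in segments]
    ironic = False
    for previous, current, segment in zip(scores, scores[1:], segments[1:]):
        first_word = segment.split()[0]
        if previous > 0 and current < 0 and first_word not in CONTRAST_MARKERS:
            ironic = True
    # For ironic texts, the negative segment determines the overall reading.
    overall = min(scores) if ironic else sum(scores)
    return {"segments": list(zip(segments, scores)), "ironic": ironic, "overall": overall}

print(analyze("Classy and reliable, but expensive."))
print(analyze("Really great engineering… the engine broke after only three weeks!"))
```

In the first example the conjunction "but" marks an honest mixed opinion, so both segment scores are kept; in the second, the unmarked clash is treated as irony and the overall score turns negative.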

Combining Human and Artificial Intelligence

Summing up, using NLP for the coding of open-ended questions leverages several benefits of automation: it speeds up the process and saves human labor on the “frustrating” side of things. It achieves better consistency and objectivity, mitigating the effects of human factors such as fatigue and inconsistent judgment. Finally, the ability of the machine to process large data quantities at a high level of detail allows a level of granularity that might be inaccessible to the human brain. 

While it is beyond question that NLP automation increases the efficiency of verbatim coding, keep in mind that current AI technology is not perfect and should always have a human in the driving seat. Methods such as NLP can process large quantities of data in no time, but they do not yet capture the full complexity of language. A combination of high-quality NLP with a carefully engineered process for continuous optimization will ensure a rewarding journey towards in-depth understanding of the opinions, needs and wants of end consumers.

Authors: Xiaoqiao Yu, Daryna Konstantinova, Sonja Anton

Comparing Alessandro Michele and Virgil Abloh

Louis Vuitton and Gucci are quickly climbing the ranks of the most valuable international fashion brands of 2018. We compared the public image in China of Gucci’s design director Alessandro Michele with that of Louis Vuitton’s menswear designer Virgil Abloh:

So what?

As the founder of the successful urban lifestyle brand Off-White, Virgil Abloh is a breath of fresh air in high fashion. His designs are perceived as fresh and original against Louis Vuitton’s luxurious backdrop. Alessandro Michele incorporates Renaissance elements in his designs, giving them a romantic, vintage feel.

Heated discussion online

We are excited to start our cooperation with GIM Gesellschaft für innovative Marktforschung mbH, which gives us the opportunity to benefit from the long-standing expertise of GIM in the area of market research. Together, we are going to work on innovative approaches to consumer insight and produce the best blend of “traditional” and technology-based methods.

In the current flood of Business Intelligence and insight tools, there is a phrase causing users to abandon the fanciest tools and leading to serious self-doubt for the provider – the “so what?” question. Indeed, your high-quality analytics application might spit out accurate, statistically valid data and package them into intuitive visualisations – but if you stop there, your data has not yet become a basis for decision and action. Most users will be lost or depend on the help and expertise of a business translator, thus creating additional bumps on their journey to data-driven action.

In this article, we focus on applications of Web-based Text Analytics – not “under-the-hood” technological details, but the practical use of Text Analytics and Natural Language Processing (NLP) to answer central business questions. Equipped with this knowledge, you will be able to tap into the full power of Text Analytics and fully benefit from large-scale data coverage and machine intelligence. A real-time mastery of the oceans of data floating on the Web will allow you to make your market decisions and moves with ease and confidence.


1. The basics

Before diving into details, let’s first get an understanding of how Text Analytics works. Text Analytics starts out with raw, semi-structured data – text combined with some metadata. The metadata have a custom format, although some fields, such as dates and authors, are pretty consistent across different data sources. The first step is a one-by-one analysis of these datapoints, resulting in a structured data basis with a unified schema. Even more important than the structuring is the transformation of the data from qualitative to quantitative. This transformation enables the second step – aggregation, which condenses a huge number of structured representations into a small number of consolidated and meaningful analyses, ready for visualization and interpretation by the end user.
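The two steps can be sketched as follows, purely as an illustration. The raw posts, the field names, the tiny brand and polarity dictionaries and the functions `structure` and `aggregate` are all invented for this example; real pipelines use much richer analysis in the first step.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical raw datapoints: text plus some metadata.
RAW_POSTS = [
    {"text": "I love my Audi", "date": "2018-02-01", "author": "user1"},
    {"text": "The BMW is great", "date": "2018-02-03", "author": "user2"},
    {"text": "Audi service was bad", "date": "2018-02-05", "author": "user3"},
]
BRANDS = {"audi": "Audi", "bmw": "BMW"}
POLARITY = {"love": 1, "great": 1, "bad": -1}

@dataclass
class Record:
    """Unified schema produced by the one-by-one analysis."""
    date: str
    author: str
    brands: list
    sentiment: int

def structure(raw):
    """Step 1: turn one raw datapoint into a structured, quantitative record."""
    tokens = re.findall(r"[a-z]+", raw["text"].lower())
    brands = sorted({BRANDS[token] for token in tokens if token in BRANDS})
    sentiment = sum(POLARITY.get(token, 0) for token in tokens)
    return Record(raw.get("date", ""), raw.get("author", ""), brands, sentiment)

def aggregate(records):
    """Step 2: condense many records into a few figures per brand."""
    mentions = Counter()
    sentiment_sum = Counter()
    for record in records:
        for brand in record.brands:
            mentions[brand] += 1
            sentiment_sum[brand] += record.sentiment
    return {
        brand: {
            "mentions": mentions[brand],
            "avg_sentiment": sentiment_sum[brand] / mentions[brand],
        }
        for brand in mentions
    }

records = [structure(post) for post in RAW_POSTS]
print(aggregate(records))
```

Once every datapoint follows the same schema and carries numbers instead of free text, the aggregation step reduces to simple counting and averaging.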

2. Answering questions with Text Analytics

A number of questions can be answered with Text Analytics and NLP. Let’s start with the basics – what do users talk about and how do they talk about it? We’ll be providing examples from the Chinese social media landscape on the way.

First, the what – what is relevant, popular or even hot? This question can be answered with two algorithms:

  • Text categorisation classifies a text into one or multiple predefined categories. The category doesn’t need to be named explicitly in the text – instead, the algorithm takes words and their combinations as cues (so-called features) to recognise the category of the text. Text categorisation is a coarse-grained algorithm and thus well-suited for initial filtering or getting an overview of the dataset. For example, the following chart shows the categorisation of blog articles around the topic of automotive connectivity:
  • Concept extraction digs more into depth and identifies concepts such as brands, companies, locations and people that are directly mentioned in the text. Thus, it can identify multiple concepts of different types, and each concept can occur multiple times in the text. For example, the following chart shows mention frequencies for the most common automotive brands in the Chinese social web in February 2018:

Using time series analysis in the aggregation, text categorisation and concept extraction can be used to identify upcoming trends and topics. Let’s look into the time development for Volkswagen, the most frequent auto brand:
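A minimal sketch of such a time-based aggregation is shown below. It assumes that concept extraction has already produced a list of (date, brands) pairs; the sample records are invented for illustration and do not reflect the actual data behind the chart, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical output of concept extraction: (ISO date, brands mentioned).
RECORDS = [
    ("2018-01-15", ["Volkswagen"]),
    ("2018-02-02", ["Volkswagen", "Audi"]),
    ("2018-02-20", ["Volkswagen"]),
    ("2018-02-21", ["Audi"]),
    ("2018-03-01", ["Volkswagen"]),
    ("2018-03-09", ["Volkswagen"]),
    ("2018-03-30", ["Volkswagen"]),
]

def monthly_mentions(records, brand):
    """Count mentions of a brand per month (YYYY-MM), sorted by time."""
    counts = Counter()
    for date, brands in records:
        if brand in brands:
            counts[date[:7]] += 1
    # Months without any mention do not appear in the result.
    return sorted(counts.items())

def month_over_month_change(series):
    """Difference in mentions between each month and the one before it."""
    return [
        (month, count - previous_count)
        for (_, previous_count), (month, count) in zip(series, series[1:])
    ]

series = monthly_mentions(RECORDS, "Volkswagen")
print(series)
print(month_over_month_change(series))
```

A run of positive month-over-month changes is a simple signal of a topic or brand gaining attention; the same aggregation works for categories produced by text categorisation.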

Once we have identified what people talk about, it is time to dig deeper and understand how they talk about it. Sentiment analysis allows you to analyse how the topics and concepts are perceived by customers and other stakeholders. Again, sentiment analysis can be applied at different levels: whole texts can be analysed for an initial overview. At an advanced stage, sentiment analysis can be applied to specific concepts to answer more detailed questions. Thus, competitor brands can be analysed for sentiment to determine the current rank of one’s own brand. Products can be analysed to find out where to focus improvement efforts. And finally, product features can be analysed for sentiment to understand how to actually make improvements. As an example, the following chart shows the most positively perceived models for Audi in the Chinese web:

3. From insights to actions

Insights from Web-based Text Analytics can be directly integrated into marketing activities, product development and competitive strategy.

Marketing intelligence

By analysing the contexts in which your products are discussed, you learn the “soft” facts which are central for marketing success, such as less tangible connotations of your offering – these can be used as hints to optimise your communication. You can also understand the interest profile of your target crowd and use it to improve your story and wording. Finally, Text Analytics allows you to monitor the response to your marketing efforts in terms of awareness, attitude and sentiment.

Product intelligence

With Text Analytics, you can zoom in on awareness and attitudes about your own products and find out their most relevant aspects with concept extraction. Using sentiment analysis, you can compare the perception of different products amongst each other and focus on their fine-grained features. Once you place your products and features on a positive-negative scale, you know where to focus your efforts to maximise your strengths and neutralise your weaknesses.

Competitive intelligence

Your brand doesn’t exist in a vacuum – let’s broaden our research scope. Text Analytics allows you to answer the above questions not only for your own brand, but also for your competitors. Thus, you will learn about the marketing, positioning and branding of your competitors to better differentiate yourself and present your USPs in a sharp and convincing manner. You can also analyse competitor products to learn what they did right – especially on those features where your own company went wrong. And, from a more strategic perspective, Text Analytics allows you to monitor technological trends to respond early to market developments.

So what?

How do you show that your findings are not only accurate and correct, but also relevant to business success? Using continuous real-time monitoring, you can track your online KPIs and validate your actions based on the response of the market. Concept extraction can be used to measure the changes in brand awareness and relevance, whereas sentiment analysis shows how brand, product and product feature perceptions have improved based on your efforts.

With the right tools, Text Analytics can be efficiently used in stand-alone mode or as a complement to traditional field research. In the age of digitalisation, it allows you to listen to the voice of your market on the Web and turns your insight journey into an engaging path to actionable, transparent market insights.


Get in touch with Anacode’s specialists to learn how your business challenges can turn into opportunities with Text Analytics.

Just like the rest of China’s financial system, the Chinese stock market is subject to rather strict government regulations. However, in recent years, it has offered more and more opportunities to risk-tolerant foreign investors.

This report sample provides an overview of the Chinese stock market based on data from the Chinese finance portal 金融界 (Finance World).


Download the report sample here.

This report provides a descriptive overview of the Chinese Web 2.0 landscape for automotive feedback, focussing on BMW 7 Series and comparing it with Audi A8 and Mercedes-Benz S-Class. The feedback is analysed both from a qualitative and a quantitative perspective. The main observations and findings are as follows:

  • Popular topics and concepts: We find that users are most concerned about the price and optical aspects (design, visual appearance) of the three considered series. Competitor brands that are discussed in a comparative perspective are mostly high-end or consumer-oriented foreign brands from Germany, the US and Japan, whereas native Chinese brands are much less frequent. Geographically, users are concentrated in the big cities and more affluent regions along the East coast.
  • Temporal evolutions: The quantity of buzz grows relatively evenly for all three series before 2015, with BMW 7 and S-Class leading. In 2015 – 2016, there is a burst in the quantity of data for BMW 7, which correlates with the introduction of the new generation of the series.
  • User satisfaction and sentiment: Users are generally satisfied with the frequently mentioned major product features of BMW 7. There are, however, some categories that are perceived negatively – specifically, components related to the front part of the car, the fuel consumption and aspects related to acoustic quality and insulation.
  • Social influencers: Among the key influencers on WeChat, China’s leading social network, we mostly find media accounts posting on general automotive topics. There are no accounts with a wide social reach that specialize in the BMW brand. Thus, influencer marketing is an opportunity yet to be explored by BMW’s marketing and branding strategy.

Download the social report.