Author: Adrian De Riz

In China, you are what you drive

It is not unusual for Chinese grooms and brides to be chauffeured in a Bentley, Maybach or Rolls-Royce, while their entourage follows in a uniform fleet of upper-class vehicles. The same holds true for Chinese business executives, who expect and are expected to be driven in higher-class cars. In many aspects of Chinese life, the car reflects a person’s “face”. This cultural importance of cars, together with the growth of the Chinese economy, creates strong demand in the Chinese luxury car market.

The Chinese car market: different and too big to miss

Historically, the Chinese luxury car segment has been served by non-Chinese players from Europe, the US and Japan. Built in and for Western markets, these luxury cars were often not designed with Chinese customers in mind. However, for the past 10 years China has been the biggest car market, and will remain so for at least another decade to come. The Chinese car market has become a crucial battleground that these brands are not willing to give up.

Catering to the needs of Chinese customers means winning the market

For global car brands, product localisation can decide between success and failure in China. Audi serves as a great example of a brand that recognized this opportunity and acted on it. In Europe, the majority of executives drive themselves to work. Therefore, the driving experience behind the wheel often dictates the purchasing decision. Chinese executives, on the other hand, are driven to work by their chauffeurs. For them, how the car drives is secondary to the perceived comfort in the back of the car. The following chart shows the relevance of various interior components in executive cars and sedans as distilled from discussions in Chinese social media:

The back seat is clearly the most relevant component for executive cars, while front seats are more relevant for sedans. Additionally, maneuvering constraints and parking problems caused by excessive car length are less of a concern than they would be in Western markets.

Understanding these differences in customer needs, Audi focused its product development on the customer experience in the back of the car. In 2005, it introduced its first products designed exclusively for the Chinese market: the Audi A6L and A8L. The two models explicitly target the Chinese executive segment with wheelbases extended by up to 30 cm. This additional length is applied in the back of the car, allowing for more legroom and room for movement. Additionally, Audi took the finest materials and accessories, normally found in the front row, and moved them to the back.

The result: a Chinese champion in the executive car segment was born. A6L sales compared to the base model soared by 27% in the first quarter. It took competitors half a decade to close this product development gap. The following chart shows sentiments for A6L seats and overall model perception, compared to sentiments for the competing products by BMW and Mercedes Benz:

The A6L shows the best sentiment both for the back seat and for the overall perception of the product.

In the end, customer centricity wins

By recognizing the cultural context and tailoring its product accordingly, Audi was able to design a car that closely addresses the requirements of the Chinese executive car market. This insight into the relative importance of front and back seats made Audi the trendsetter. By listening closely to the needs and wants of the local target group, the brand became the #1 choice among Chinese executives and gained a competitive advantage of several years.

Author: Janna Lipenkova

Market research surveys typically consist of two types of questions, namely “closed” and “open” questions. Closed questions limit the possible range of responses and result in structured data. By contrast, “open” questions allow the respondent to reply with free text, expressing their full-fledged, authentic opinion. This flexibility makes open questions attractive in market research: given the right wording, an open question can trigger responses that are broader, deeper and provide more authentic insight than the rigid, black-and-white multiple-choice question. 

Challenges with the analysis of open-ended questions

Why, then, are open-ended questions not widely adopted in market research? One of the reasons is that they are difficult to analyze due to their unstructured character. Most researchers use manual coding, where open questions are manually structured into a classification scheme (the “coding frame”) according to the topics they represent. The following table shows some examples:

Manual coding comes with several issues:

  • High cost: manual coding is labor-intensive and thus expensive.
  • Errors: depending on their mental and physical shape, human coders make mistakes or provide inconsistent judgments at different points in time.  
  • Subjectivity: due to the inherent ambiguity and the subtleties involved in human language, different people might code the same response in different ways. 

Last but not least, coding hundreds or even thousands of open-ended questions can be a frustrating endeavor. In today’s world, where AI is used to automate and optimize virtually every repetitive task, it seems natural to turn to automated processing to eliminate the monotonous parts of coding. Beyond the general benefits of automation, this also frees up time for more involved challenges that require the creativity, experience and intellectual versatility of the human brain.

Using Natural Language Processing as a solution

Natural Language Processing (NLP) automates the manual work researchers do when they code open-ended questions. It structures a text according to the discussed topics and concepts as well as other relevant metrics, such as the frequency, relevance and sentiment. Beyond speeding up the coding process, NLP can be used to discover additional insights in the data and enrich the end result. The capacity of a machine to look at a large dataset as a whole and discover associations, regularities and outliers is larger than that of the human brain. 

Three algorithms – topic modeling and classification, concept extraction and sentiment analysis – are particularly useful in the coding process. 

Topic modeling and classification

Topic modeling detects abstract topics in the text. Topic modeling is an unsupervised learning method similar to clustering, and learns lexical similarities between texts without a predefined set of classes. Thus, it is particularly useful in the initial stage of the construction of a coding frame. The following word cloud shows words that are frequently mentioned in texts about comfort:

Topic classification is similar to topic modeling. However, it works with a given coding frame and classifies each text into one of the predefined classes. This means that it can be used for coding after the coding frame has been constructed.
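To make the contrast with topic modeling concrete, here is a minimal sketch of topic classification against a fixed coding frame. It uses a small multinomial Naive Bayes classifier written in plain Python. The three classes, the training responses and all function names are illustrative assumptions rather than data from a real study, and a realistic setup would need far more labelled responses.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical coding frame with a few manually coded responses per class.
LABELLED = [
    ("The seats are soft and the ride is quiet", "Comfort"),
    ("Plenty of leg room in the back seat", "Comfort"),
    ("Too expensive for what you get", "Price"),
    ("Good value and a fair price", "Price"),
    ("The engine broke down twice", "Reliability"),
    ("Never had a breakdown in five years", "Reliability"),
]


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labelled:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab


def classify(text, model):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(LABELLED)
print(classify("The back seat is very soft", model))
print(classify("Way too expensive", model))
```

The classifier never needs the class name to appear in the response; it relies on word cues learned from the coded examples, which is why the quality of the coding frame and of the training examples matters more than the algorithm itself.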

Concept extraction

Concept extraction matches concrete strings in the text. Whereas topic modeling and classification work with – often implicit – lexical information distributed everywhere in the text, concept extraction matches the exact words and phrases that occur in the text. On a more advanced level, concept extraction also uses the structure of the lexicon and can deal with lexical relations, such as:

  • Synonymy: EQUALS relationship, e.g. VW EQUALS Volkswagen
  • Hypernymy: IS-A relationship, e.g. Sedan IS-A Vehicle
  • Meronymy: PART-OF relationship, e.g. Engine PART-OF Car

Concept extraction usually focuses on nouns and noun phrases (engine, Volkswagen). In the context of evaluations (open-ended questions), it is also useful to extract concepts that are “hidden” in adjectives (fast ➤ Speed, cozy ➤ Comfort) and verbs (overpay ➤ Price, fail ➤ Reliability).

In terms of implementation, there are two main approaches to concept extraction: the dictionary-based approach and the machine-learning approach. The dictionary-based approach works with predefined lists of terms for each category (also called “gazetteers”). The machine-learning approach, on the other hand, learns concepts of specific types from large quantities of annotated data. As a rule of thumb, the smaller and more specific the available dataset, the more efficient the use of predefined lists of concepts and linguistic expressions.
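The dictionary-based approach can be sketched in a few lines. The gazetteer below is a hypothetical example: it maps surface forms to concepts, which covers both synonymy (VW and Volkswagen) and concepts hidden in adjectives and verbs. It matches exact tokens only, so inflected forms have to be listed explicitly; a real system would add lemmatisation and multi-word matching.

```python
import re
from collections import Counter

# Hypothetical gazetteer: surface form -> concept.
GAZETTEER = {
    "vw": "Volkswagen",          # synonymy: VW EQUALS Volkswagen
    "volkswagen": "Volkswagen",
    "engine": "Engine",
    "fast": "Speed",             # concept hidden in an adjective
    "cozy": "Comfort",
    "overpay": "Price",          # concept hidden in a verb
    "overpaid": "Price",
    "fail": "Reliability",
    "failed": "Reliability",
}


def extract_concepts(text):
    """Count how often each known concept is mentioned in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(GAZETTEER[token] for token in tokens if token in GAZETTEER)


response = "My VW is fast and cozy, but I overpaid. The Volkswagen engine never failed."
print(extract_concepts(response))
```

Note that this sketch only detects that Reliability is being discussed in “never failed”; whether the statement is positive or negative is a question for sentiment analysis.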

Sentiment analysis

Sentiment analysis detects whether a given text has a positive or a negative connotation. It can be refined to the level of individual aspects mentioned in a text, which makes it possible to detect mixed sentiments at the phrase level:

“Classy and reliable, but expensive.”

Sentiment analysis operates on an emotional, subjective and often implicit linguistic level. This subtlety raises several challenges for automation. For example, sentiment analysis is highly context-dependent: a vacuum cleaner that sucks would probably get a neutral-to-positive sentiment; by contrast, the internet connection in a car normally shouldn’t “suck”. Another complication is irony and sarcasm: on the lexical level, ironic statements often use vocabulary with a clear polarity orientation. However, when put into the surrounding context, this polarity is inverted:

“Really great engineering… the engine broke after only three weeks!”

Irony is mostly detected from anomalies in the polarity contrasts between neighboring text segments. For instance, in the example above, “really great engineering” gets a strong positive sentiment which radically clashes with the negative feel of “the engine broke after only three weeks”. Since the two phrases are directly juxtaposed without a conjunction such as “but” or “although”, the machine is able to recognize the overall negative connotation. 
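The polarity-contrast heuristic described above can be illustrated with a small lexicon-based sketch. The polarity lexicon, the threshold of ±2 and the function names are assumptions chosen for the two example sentences in this section; they are not a general-purpose irony detector.

```python
import re

# Hypothetical polarity lexicon (positive > 0, negative < 0).
POLARITY = {"great": 2, "classy": 1, "reliable": 1, "expensive": -1, "broke": -2}
# Conjunctions that signal an intended, non-ironic contrast.
CONTRAST = {"but", "although"}


def segment_scores(text):
    """Split the text at punctuation and score each segment separately."""
    segments = [s for s in re.split(r"[.,;!?…]+", text.lower()) if s.strip()]
    scored = []
    for segment in segments:
        tokens = re.findall(r"[a-z]+", segment)
        score = sum(POLARITY.get(token, 0) for token in tokens)
        starts_with_contrast = bool(tokens) and tokens[0] in CONTRAST
        scored.append((segment.strip(), score, starts_with_contrast))
    return scored


def looks_ironic(text):
    """Flag a strongly positive segment directly followed by a strongly
    negative one when no contrast conjunction links the two."""
    scored = segment_scores(text)
    for (_, prev, _), (_, curr, contrast) in zip(scored, scored[1:]):
        if prev >= 2 and curr <= -2 and not contrast:
            return True
    return False


print(segment_scores("Classy and reliable, but expensive."))
print(looks_ironic("Really great engineering… the engine broke after only three weeks!"))
```

For the mixed review, the two segments receive different scores and the “but” marks the contrast as intended. For the second sentence, the clash without a conjunction is what triggers the irony flag.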

Combining Human and Artificial Intelligence

Summing up, using NLP for the coding of open-ended questions leverages several benefits of automation: it speeds up the process and saves human labor on the “frustrating” side of things. It achieves better consistency and objectivity, mitigating the effects of human factors such as fatigue and inconsistent judgment. Finally, the ability of the machine to process large data quantities at a high level of detail allows a level of granularity that might be inaccessible to the human brain. 

While there is no question that NLP automation increases the efficiency of verbatim coding, keep in mind that current AI technology is not perfect and should always have a human in the driving seat. Methods such as NLP can process large quantities of data in no time, but they do not yet capture the full complexity of language. A combination of high-quality NLP with a carefully engineered process for continuous optimization will ensure a rewarding journey towards in-depth understanding of the opinions, needs and wants of end consumers.

Authors: Xiaoqiao Yu, Daryna Konstantinova, Sonja Anton

Comparing Alessandro Michele and Virgil Abloh

Louis Vuitton and Gucci are quickly climbing the ranks of the most valuable international fashion brands of 2018. We compared the public image in China of Gucci’s design director Alessandro Michele with that of Louis Vuitton’s menswear designer Virgil Abloh:

So what?

As the founder of the successful urban lifestyle brand Off-White, Virgil Abloh is a breath of fresh air in high fashion. His designs are perceived as fresh and original against Louis Vuitton’s luxurious backdrop. Alessandro Michele incorporated Renaissance elements in his designs, giving them a romantic, vintage feel.

Heated discussion online

Author: Janna Lipenkova

In recent years, the tech world has seen a surge of Natural Language Processing (NLP) applications in various areas, including adtech, publishing, customer service and market intelligence. According to Gartner’s hype cycle, NLP reached the peak of inflated expectations in 2018. Many businesses see it as a “go-to” solution to generate value from the 80% of business-relevant data that comes in unstructured form. To put it simply – NLP is widely adopted with wildly variable success.

In this article, I share some practical advice for the smooth integration of NLP into your tech stack. The advice summarizes the experience I have accumulated on my journey with NLP — through academia, a number of industry projects, and my own company which develops NLP-driven applications for international market intelligence. The article does not provide technical details but focusses on organisational factors including hiring, communication and expectation management.

Before starting out on NLP, you should meditate on two questions:

1. Is a unique NLP component critical for the core business of our company?

Example: Imagine you are a hosting company. You want to optimise your customer service by analysing incoming customer requests with NLP. Most likely, this enhancement will not be part of your critical path activities. By contrast, a business in targeted advertising should try to make sure it does not fall behind on NLP — this could significantly weaken its competitive position.

2. Do we have the internal competence to develop IP-relevant NLP technology?

Example: You hired and successfully integrated a PhD in Computational Linguistics with the freedom to design new solutions. She will likely be motivated to enrich the IP portfolio of your company. However, if you are hiring mid-level data scientists without a clear focus on language who need to split their time between data science and engineering tasks, don’t expect a unique IP contribution. Most likely, they will fall back on ready-made algorithms due to lack of time and mastery of the underlying details.

Hint 1: if your answers are “yes” and “no” — you are in trouble! You’d better identify technological differentiators that do match your core competence.

Hint 2: if your answers are “yes” and “yes”, stop reading and get to work. Your NLP roadmap should already be defined by your specialists to achieve the business-specific objectives.

If you are still here, don’t worry – the rest will soon fall into place. There are three levels at which you can “do NLP”:

  1. Black belt level, reaching deep into mathematical and linguistic subtleties
  2. Training & tuning level, mostly plugging in existing NLP/ML libraries
  3. Blackbox level, relying on “buying” third-party NLP

The black belt level

Let’s elaborate: the first, fundamental level is our “black belt”. This level comes close to computational linguistics, the academic counterpart of NLP. The folks here often split into two camps: the mathematicians and the linguists. The camps might well befriend each other, but the mindsets and the way of doing things will still differ.

The math guys are not afraid of things like matrix calculus and will thrive on the details of the newest optimisation and evaluation methods. At the risk of leaving out linguistic details, they will generally take the lead on improving the recall of your algorithms. The linguists were raised either on highly complex generative or constraint-based grammar formalisms, or on alternative frameworks such as cognitive grammar. These give more room to imagination but also allow for formal vagueness. The linguists will gravitate towards writing syntactic and semantic rules and compiling lexica, often needing their own sandbox and taking care of the precision part. Depending on how you handle communication and integration between the two camps, their collaboration can either block productivity or open up exciting opportunities.

In general, if you can inject a dose of pragmatism into the academic perfectionism you can create a unique competitive advantage. If you can efficiently combine mathematicians and linguists on your team — even better! But be aware that you have to sell them on an honest vision — and then, follow through. Doing hard fundamental work without seeing its impact on the business would be a frustrating and demotivating experience for your team.

The training & tuning level

The second level involves the training and tuning of models using existing algorithms. In practice, most of the time will be spent on data preparation, training data creation and feature engineering. The core tasks — training and tuning — do not require much effort. At this level, your people will be data scientists pushing the boundaries of open-source packages, such as nltk, scikit-learn, spacy and tensorflow, for NLP and/or machine learning. They will invent new and not always academically justified ways of extending training data, engineering features and applying their intuition for surface-side tweaking. The goal is to train well-understood algorithms such as NER, categorisation and sentiment analysis, customized to the specific data at your company.

The good thing here is that there are plenty of great open-source packages out there. Most of them will still leave you with enough flexibility to optimize them to your specific use case. The risk is on the side of HR: many roads lead to data science. Data scientists are often self-taught and have a rather interdisciplinary background. Thus, they will not always have the ingrained academic rigour of level 1 scientists. As deadlines or budgets tighten, your team might become lax about training and evaluation methods, thus accumulating significant technical debt.

The blackbox level

On the third level is a “blackbox” where you buy NLP. Your developers will mostly consume paid APIs that provide the standard algorithm outputs out-of-the-box, such as Rosette, Semantria and Bitext (cf. this post for an extensive review of existing APIs). Ideally, your data scientists will be working alongside business analysts or subject matter experts. For example, if you are doing competitive intelligence, your business analysts will be the ones to design a model which contains your competitors, their technologies and products.

At the blackbox level, make sure you buy NLP only from black belts! With this secured, one of the obvious advantages of outsourcing NLP is that you avoid the risk of diluting your technological focus. The risk is a lack of flexibility: with time, your requirements will get more and more specific. The better your integration policy, the higher the risk that your API will stop satisfying your requirements. It is also advisable to invest in manual quality assurance to make sure the API outputs are of high quality.

Final Thoughts

So, where do you start? Of course, it depends — some practical advice:

  • Talk to your tech folks about your business objectives. Let them research and prototype and start out on level 2 or 3.
  • Make sure your team doesn’t get stuck in low-level details of level 1 too early. This might lead to significant slips in time and budget since a huge amount of knowledge and training is required.
  • Don’t hesitate — you can always consider a transition between 2 and 3 further down the path (by the way, this works in any direction). The transition can be efficiently combined with the generally unavoidable refactoring of your system.
  • If you manage to build up a compelling business case with NLP — welcome to the club, you can use it to attract first-class specialists and add to your uniqueness by working on level 1!

About the author: Janna Lipenkova holds a PhD in Computational Linguistics and is the CEO of Anacode, a provider of tech-based solutions for international market intelligence.

We are excited to start our cooperation with GIM Gesellschaft für innovative Marktforschung mbH, which gives us the opportunity to benefit from the long-standing expertise of GIM in the area of market research. Together, we are going to work on innovative approaches to consumer insight and produce the best blend of “traditional” and technology-based methods.

In the current flood of Business Intelligence and insight tools, there is a phrase causing users to abandon the fanciest tools and leading to serious self-doubt for the provider – the “so what?” question. Indeed, your high-quality analytics application might spit out accurate, statistically valid data and package them into intuitive visualisations – but if you stop there, your data has not yet become a basis for decision and action. Most users will be lost or depend on the help and expertise of a business translator, thus creating additional bumps on their journey to data-driven action.

In this article, we focus on applications of Web-based Text Analytics – not “under-the-hood” technological details, but the practical use of Text Analytics and Natural Language Processing (NLP) to answer central business questions. Equipped with this knowledge, you will be able to tap into the full power of Text Analytics and fully benefit from large-scale data coverage and machine intelligence. A real-time mastery of the oceans of data floating on the Web will allow you to make your market decisions and moves with ease and confidence.


1. The basics

Before diving into details, let’s first get an understanding of how Text Analytics works. Text Analytics starts out with raw, semi-structured data – text combined with some metadata. The metadata have a custom format, although some fields, such as dates and authors, are pretty consistent across different data sources. The first step is a one-by-one analysis of these datapoints, resulting in a structured data basis with a unified schema. Even more important than the structuring is the transformation of the data from qualitative to quantitative. This transformation enables the second step – aggregation, which condenses a huge number of structured representations into a small number of consolidated and meaningful analyses, ready for visualization and interpretation by the end user.
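The two steps can be sketched as follows: a per-post analysis that turns semi-structured input into records with a unified schema, followed by an aggregation over those records. The brand list, the polarity lexicon, the `Record` fields and the sample posts are hypothetical and only serve to show the shape of the data at each stage.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Hypothetical resources for the one-by-one analysis step.
BRANDS = {"audi", "bmw"}
POLARITY = {"love": 1, "comfortable": 1, "noisy": -1, "expensive": -1}


@dataclass
class Record:
    """Unified, quantitative schema produced for every brand mention."""
    date: str
    brand: str
    sentiment: int


def analyse(post):
    """Step 1: turn one semi-structured post into structured records."""
    tokens = re.findall(r"[a-z]+", post["text"].lower())
    sentiment = sum(POLARITY.get(token, 0) for token in tokens)
    return [Record(post["date"], token, sentiment) for token in tokens if token in BRANDS]


def aggregate(records):
    """Step 2: condense many records into a few figures per brand."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.brand].append(record.sentiment)
    return {
        brand: {"mentions": len(scores), "avg_sentiment": mean(scores)}
        for brand, scores in grouped.items()
    }


posts = [
    {"date": "2018-02-01", "author": "user_a", "text": "I love my Audi, so comfortable."},
    {"date": "2018-02-03", "author": "user_b", "text": "The BMW is noisy and expensive."},
    {"date": "2018-02-05", "author": "user_c", "text": "Audi is expensive."},
]
records = [record for post in posts for record in analyse(post)]
summary = aggregate(records)
print(summary)
```

Once every post has been reduced to numbers in a common schema, the aggregation step is ordinary data analysis: counting, averaging and grouping by brand, date or any other metadata field.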

2. Answering questions with Text Analytics

A number of questions can be answered with Text Analytics and NLP. Let’s start with the basics – what do users talk about and how do they talk about it? We’ll be providing examples from the Chinese social media landscape on the way.

First, the what – what is relevant, popular or even hot? This question can be answered with two algorithms:

  • Text categorisation classifies a text into one or multiple predefined categories. The category doesn’t need to be named explicitly in the text – instead, the algorithm takes words and their combinations as cues (so-called features) to recognise the category of the text. Text categorisation is a coarse-grained algorithm and thus well-suited for initial filtering or getting an overview of the dataset. For example, the following chart shows the categorisation of blog articles around the topic of automotive connectivity:
  • Concept extraction digs more into depth and identifies concepts such as brands, companies, locations and people that are directly mentioned in the text. Thus, it can identify multiple concepts of different types, and each concept can occur multiple times in the text. For example, the following chart shows mention frequencies for the most common automotive brands in the Chinese social web in February 2018:

Using time series analysis in the aggregation, text categorisation and concept extraction can be used to identify upcoming trends and topics. Let’s look into the time development for Volkswagen, the most frequent auto brand:

Once we have identified what people talk about, it is time to dig deeper and understand how they talk about it. Sentiment analysis makes it possible to analyse how the topics and concepts are perceived by customers and other stakeholders. Again, sentiment analysis can be applied at different levels: whole texts can be analysed for an initial overview. At an advanced stage, sentiment analysis can be applied to specific concepts to answer more detailed questions. Thus, competitor brands can be analysed for sentiment to determine the current rank of one’s own brand. Products can be analysed to find out where to focus improvement efforts. And finally, product features can be analysed for sentiment to understand how to actually make improvements. As an example, the following chart shows the most positively perceived models for Audi in the Chinese web:

3. From insights to actions

Insights from Web-based Text Analytics can be directly integrated into marketing activities, product development and competitive strategy.

Marketing intelligence

By analysing the contexts in which your products are discussed, you learn the “soft” facts which are central for marketing success, such as less tangible connotations of your offering – these can be used as hints to optimise your communication. You can also understand the interest profile of your target crowd and use it to improve your story and wording. Finally, Text Analytics allows you to monitor the response to your marketing efforts in terms of awareness, attitude and sentiment.

Product intelligence

With Text Analytics, you can zoom in on awareness and attitudes about your own products and find out their most relevant aspects with concept extraction. Using sentiment analysis, you can compare the perception of different products amongst each other and focus on their fine-grained features. Once you place your products and features on a positive-negative scale, you know where to focus your efforts to maximise your strengths and neutralise your weaknesses.

Competitive intelligence

Your brand doesn’t exist in a vacuum – let’s broaden our research scope. Text Analytics allows you to answer the above questions not only for your own brand, but also for your competitors. Thus, you will learn about the marketing, positioning and branding of your competitors to better differentiate yourself and present your USPs in a sharp and convincing manner. You can also analyse competitor products to learn what they did right – especially on those features where your own company went wrong. And, from a more strategic perspective, Text Analytics allows you to monitor technological trends to respond early to market developments.

So what?

How do you show that your findings are not only accurate and correct, but also relevant to business success? Using continuous real-time monitoring, you can track your online KPIs and validate your actions based on the response of the market. Concept extraction can be used to measure the changes in brand awareness and relevance, whereas sentiment analysis shows how brand, product and product feature perceptions have improved based on your efforts.
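As a rough illustration of such KPI tracking, the sketch below computes monthly mention counts (as a proxy for awareness) and average sentiment for one brand, then reports the month-over-month change. The record format `(date, brand, sentiment)` and the sample values are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical structured records: (ISO date, brand, sentiment score).
RECORDS = [
    ("2018-01-10", "audi", -1),
    ("2018-01-22", "audi", 0),
    ("2018-02-03", "audi", 1),
    ("2018-02-14", "audi", 1),
    ("2018-02-20", "audi", 0),
    ("2018-02-25", "audi", 2),
    ("2018-02-21", "bmw", 1),
]


def monthly_kpis(records, brand):
    """Return {month: (mention count, average sentiment)} for one brand."""
    by_month = defaultdict(list)
    for date, record_brand, sentiment in records:
        if record_brand == brand:
            by_month[date[:7]].append(sentiment)  # "YYYY-MM"
    return {month: (len(s), mean(s)) for month, s in sorted(by_month.items())}


def deltas(kpis):
    """Change in mentions and average sentiment versus the previous month."""
    months = list(kpis)
    return {
        curr: (kpis[curr][0] - kpis[prev][0], kpis[curr][1] - kpis[prev][1])
        for prev, curr in zip(months, months[1:])
    }


kpis = monthly_kpis(RECORDS, "audi")
print(kpis)
print(deltas(kpis))
```

Comparing the deltas before and after a campaign or product change gives a first indication of whether the market responded, although such a comparison alone does not prove that the action caused the change.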

With the right tools, Text Analytics can be efficiently used in stand-alone mode or as a complement to traditional field research. In the age of digitalisation, it allows you to listen to the voice of your market on the Web and turns your insight journey into an engaging path to actionable, transparent market insights.


Get in touch with Anacode’s specialists to learn how your business challenges can turn into opportunities with Text Analytics.

Just like the rest of China’s financial system, the Chinese stock market is subject to rather strict government regulations. However, in recent years, it has offered more and more opportunities to risk-tolerant foreign investors.

This report sample provides an overview of the Chinese stock market based on data from the Chinese finance portal 金融界 (Finance World).


Download the report sample here.