Author: Janna Lipenkova

In recent years, the tech world has seen a surge of Natural Language Processing (NLP) applications in various areas, including adtech, publishing, customer service and market intelligence. According to Gartner’s hype cycle, NLP reached the peak of inflated expectations in 2018. Many businesses see it as a “go-to” solution to generate value from the 80% of business-relevant data that comes in unstructured form. To put it simply – NLP is widely adopted, with wildly variable success.

In this article, I share some practical advice for the smooth integration of NLP into your tech stack. The advice summarizes the experience I have accumulated on my journey with NLP – through academia, a number of industry projects, and my own company, which develops NLP-driven applications for international market intelligence. The article does not provide technical details but focusses on organisational factors including hiring, communication and expectation management.

Before starting out on NLP, you should meditate on two questions:

1. Is a unique NLP component critical for the core business of our company?

Example: Imagine you are a hosting company. You want to optimise your customer service by analysing incoming customer requests with NLP. Most likely, this enhancement will not be part of your critical path activities. By contrast, a business in targeted advertising should try to make sure it does not fall behind on NLP — this could significantly weaken its competitive position.

2. Do we have the internal competence to develop IP-relevant NLP technology?

Example: You hired and successfully integrated a PhD in Computational Linguistics with the freedom to design new solutions. She will likely be motivated to enrich the IP portfolio of your company. However, if you are hiring mid-level data scientists without a clear focus on language who need to split their time between data science and engineering tasks, don’t expect a unique IP contribution. Most likely, they will fall back on ready-made algorithms due to a lack of time and of mastery of the underlying details.

Hint 1: if your answers are “yes” and “no” – you are in trouble! You’d better identify technological differentiators that do match your core competence.

Hint 2: if your answers are “yes” and “yes” – stop reading and get to work. Your NLP roadmap should already be defined by your specialists to achieve the business-specific objectives.

If you are still here, don’t worry – the rest will soon fall into place. There are three levels at which you can “do NLP”:

  1. Black belt level, reaching deep into mathematical and linguistic subtleties
  2. Training & tuning level, mostly plugging in existing NLP/ML libraries
  3. Blackbox level, relying on “buying” third-party NLP

The black belt level

Let’s elaborate: the first, fundamental level is our “black belt”.  This level comes close to computational linguistics, the academic counterpart of NLP. The folks here often split into two camps — the mathematicians and the linguists. The camps might well befriend each other, but the mindsets and the way of doing things will still differ.

The math guys are not afraid of things like matrix calculus and will thrive on the details of the newest optimisation and evaluation methods. At the risk of leaving out linguistic details, they will generally take the lead on improving the recall of your algorithms. The linguists were raised either on highly complex generative or constraint-based grammar formalisms, or on alternative frameworks such as cognitive grammar. These give more room to imagination but also allow for formal vagueness. The linguists will gravitate towards writing syntactic and semantic rules and compiling lexica, often needing their own sandbox and taking care of the precision part. Depending on how you handle communication and integration between the two camps, their collaboration can either block productivity or open up exciting opportunities.

In general, if you can inject a dose of pragmatism into the academic perfectionism you can create a unique competitive advantage. If you can efficiently combine mathematicians and linguists on your team — even better! But be aware that you have to sell them on an honest vision — and then, follow through. Doing hard fundamental work without seeing its impact on the business would be a frustrating and demotivating experience for your team.

The training & tuning level

The second level involves the training and tuning of models using existing algorithms. In practice, most of the time will be spent on data preparation, training data creation and feature engineering. The core tasks – training and tuning – do not require much effort. At this level, your people will be data scientists pushing the boundaries of open-source packages for NLP and/or machine learning, such as NLTK, scikit-learn, spaCy and TensorFlow. They will invent new and not always academically justified ways of extending training data, engineering features and applying their intuition for surface-side tweaking. The goal is to train models for well-understood tasks such as NER, categorisation and sentiment analysis, customized to the specific data at your company.

The good thing here is that there are plenty of great open-source packages out there. Most of them will still leave you with enough flexibility to optimize them to your specific use case. The risk is on the side of HR – many roads lead to data science. Data scientists are often self-taught and have a rather interdisciplinary background. Thus, they will not always have the innate academic rigour of level 1 scientists. As deadlines or budgets tighten, your team might grow lax about training and evaluation methods, thus accumulating significant technical debt.
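One lightweight guard against this kind of debt is to score every model on held-out examples that it never saw during training. The following is a minimal, library-independent sketch in Python of the precision and recall measures mentioned above; the label names and the toy predictions are hypothetical and merely stand in for the output of whatever model your team trains.

```python
def precision_recall(gold, predicted, target):
    """Return (precision, recall) for one target label."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == target and p == target)
    pred_pos = sum(1 for p in predicted if p == target)
    gold_pos = sum(1 for g in gold if g == target)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall


# Hypothetical held-out labels (gold) and model predictions
gold = ["complaint", "praise", "complaint", "question", "complaint", "complaint"]
predicted = ["complaint", "complaint", "complaint", "question", "praise", "question"]

# Precision: share of predicted complaints that really are complaints.
# Recall: share of real complaints that the model found.
precision, recall = precision_recall(gold, predicted, "complaint")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tracking both numbers over time, on a test set that stays fixed, makes it visible when a quick tweak improves one measure at the expense of the other.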

The blackbox level

The third level is a “blackbox” where you buy NLP. Your developers will mostly consume paid APIs that provide the standard algorithm outputs out of the box, such as Rosette, Semantria and Bitext (cf. this post for an extensive review of existing APIs). Ideally, your data scientists will be working alongside business analysts or subject matter experts. For example, if you are doing competitive intelligence, your business analysts will be the ones to design a model which contains your competitors, their technologies and products.

At the blackbox level, make sure you buy NLP only from black belts! With this secured, one of the obvious advantages of outsourcing NLP is that you avoid the risk of diluting your technological focus. The risk is a lack of flexibility – with time, your requirements will get more and more specific. The better your integration policy, the higher the risk that your API will stop satisfying your requirements. It is also advisable to invest in manual quality assurance to make sure the API outputs deliver high quality.

Final Thoughts

So, where do you start? Of course, it depends — some practical advice:

  • Talk to your tech folks about your business objectives. Let them research and prototype and start out on level 2 or 3.
  • Make sure your team doesn’t get stuck in low-level details of level 1 too early. This might lead to significant slips in time and budget since a huge amount of knowledge and training is required.
  • Don’t hesitate – you can always consider a transition between levels 2 and 3 further down the path (by the way, this works in either direction). The transition can be efficiently combined with the generally unavoidable refactoring of your system.
  • If you manage to build up a compelling business case with NLP — welcome to the club, you can use it to attract first-class specialists and add to your uniqueness by working on level 1!

About the author: Janna Lipenkova holds a PhD in Computational Linguistics and is the CEO of Anacode, a provider of tech-based solutions for international market intelligence. Find out more about our solution here

We are excited to start our cooperation with GIM Gesellschaft für innovative Marktforschung mbH, which gives us the opportunity to benefit from the long-standing expertise of GIM in the area of market research. Together, we are going to work on innovative approaches to consumer insight and produce the best blend of “traditional” and technology-based methods.

In the current flood of Business Intelligence and insight tools, there is a phrase causing users to abandon the fanciest tools and leading to serious self-doubt for the provider – the “so what?” question. Indeed, your high-quality analytics application might spit out accurate, statistically valid data and package them into intuitive visualisations – but if you stop there, your data has not yet become a basis for decision and action. Most users will be lost or depend on the help and expertise of a business translator, thus creating additional bumps on their journey to data-driven action.

In this article, we focus on applications of Web-based Text Analytics – not “under-the-hood” technological details, but the practical use of Text Analytics and Natural Language Processing (NLP) to answer central business questions. Equipped with this knowledge, you will be able to tap into the full power of Text Analytics and fully benefit from large-scale data coverage and machine intelligence. A real-time mastery of the oceans of data floating on the Web will allow you to make your market decisions and moves with ease and confidence.

1. The basics

Before diving into details, let’s first get an understanding of how Text Analytics works. Text Analytics starts out with raw, semi-structured data – text combined with some metadata. The metadata have a custom format, although some fields, such as dates and authors, are pretty consistent across different data sources. The first step is a one-by-one analysis of these datapoints, resulting in a structured data basis with a unified schema. Even more important than the structuring is the transformation of the data from qualitative to quantitative. This transformation enables the second step – aggregation, which condenses a huge number of structured representations into a small number of consolidated and meaningful analyses, ready for visualization and interpretation by the end user.
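The two steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the field names, the keyword cues and the sample posts are hypothetical, and a real system would use a trained model rather than keyword matching for the first step.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class StructuredPost:
    """Unified schema shared by all data sources."""
    published: date
    author: str
    category: str


# Hypothetical keyword cues per category (naive substring matching)
CATEGORY_CUES = {
    "price": ["price", "expensive", "cheap"],
    "design": ["design", "look"],
}


def structure(raw):
    """Step 1: turn one raw, semi-structured datapoint into a structured record."""
    text = raw["text"].lower()
    category = next(
        (name for name, cues in CATEGORY_CUES.items() if any(c in text for c in cues)),
        "other",
    )
    return StructuredPost(date.fromisoformat(raw["date"]), raw["author"], category)


def aggregate(posts):
    """Step 2: condense many structured records into one quantitative summary."""
    return Counter(post.category for post in posts)


raw_posts = [
    {"date": "2018-02-01", "author": "user_a", "text": "The price is too high."},
    {"date": "2018-02-02", "author": "user_b", "text": "I love the design of the new model."},
    {"date": "2018-02-03", "author": "user_c", "text": "Way too expensive for what you get."},
    {"date": "2018-02-03", "author": "user_d", "text": "Delivery took three weeks."},
]

structured = [structure(raw) for raw in raw_posts]
category_counts = aggregate(structured)
print(category_counts)
```

Once every datapoint follows the same schema, the aggregation step no longer needs to know which source a post came from.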

2. Answering questions with Text Analytics

A number of questions can be answered with Text Analytics and NLP. Let’s start with the basics – what do users talk about and how do they talk about it? We’ll be providing examples from the Chinese social media landscape on the way.

First, the what – what is relevant, popular or even hot? This question can be answered with two algorithms:

  • Text categorisation classifies a text into one or multiple predefined categories. The category doesn’t need to be explicitly named in the text – instead, the algorithm takes words and their combinations as cues (so-called features) to recognise the category of the text. Text categorisation is a coarse-grained algorithm and thus well-suited for initial filtering or getting an overview of the dataset. For example, the following chart shows the categorisation of blog articles around the topic of automotive connectivity:
  • Concept extraction digs deeper and identifies concepts such as brands, companies, locations and people that are directly mentioned in the text. Thus, it can identify multiple concepts of different types, and each concept can occur multiple times in the text. For example, the following chart shows mention frequencies for the most common automotive brands in the Chinese social web in February 2018:
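To make the idea of concept extraction more tangible, here is a minimal Python sketch that counts mentions with a simple dictionary lookup. The dictionary, its aliases and the sample posts are hypothetical; production systems typically rely on trained named-entity recognition instead, and Chinese text would additionally need a word segmentation step before any lookup.

```python
import re
from collections import Counter

# Hypothetical concept dictionary: alias -> (canonical name, concept type)
CONCEPTS = {
    "volkswagen": ("Volkswagen", "brand"),
    "vw": ("Volkswagen", "brand"),
    "audi": ("Audi", "brand"),
    "shanghai": ("Shanghai", "location"),
}


def extract_concepts(text):
    """Return every concept mention in the text, repeated mentions included."""
    tokens = re.findall(r"\w+", text.lower())
    return [CONCEPTS[token] for token in tokens if token in CONCEPTS]


posts = [
    "My VW broke down in Shanghai, the Volkswagen dealer was helpful.",
    "Audi or VW? I can't decide.",
]

# Aggregate the mentions over all posts
mention_counts = Counter(c for post in posts for c in extract_concepts(post))
print(mention_counts.most_common())
```

Mapping aliases such as “VW” to one canonical name is what keeps the resulting frequencies comparable across brands.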

Using time series analysis in the aggregation step, text categorisation and concept extraction can be used to identify upcoming trends and topics. Let’s look at the development over time for Volkswagen, the most frequent auto brand:
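In code, a time-based aggregation of this kind might look like the following Python sketch. The dated mentions are invented for illustration and do not reproduce the real figures; the growth calculation is just one simple, assumed way of flagging a rising topic.

```python
from collections import Counter
from datetime import date

# Hypothetical (date, brand) mentions, e.g. the output of concept extraction
mentions = [
    (date(2018, 1, 5), "Volkswagen"),
    (date(2018, 1, 20), "Volkswagen"),
    (date(2018, 2, 3), "Volkswagen"),
    (date(2018, 2, 11), "Volkswagen"),
    (date(2018, 2, 27), "Volkswagen"),
    (date(2018, 2, 14), "Audi"),
]


def monthly_counts(mentions, brand):
    """Count mentions of one brand per (year, month), in chronological order."""
    counts = Counter((d.year, d.month) for d, b in mentions if b == brand)
    return dict(sorted(counts.items()))


vw_series = monthly_counts(mentions, "Volkswagen")

# Relative change between the last two months as a crude trend signal
months = list(vw_series)
growth = vw_series[months[-1]] / vw_series[months[-2]] - 1
print(vw_series, f"growth={growth:.0%}")
```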

Once we have identified what people talk about, it is time to dig deeper and understand how they talk about it. Sentiment analysis allows you to analyse how the topics and concepts are perceived by customers and other stakeholders. Again, sentiment analysis can be applied at different levels: whole texts can be analysed for an initial overview. At an advanced stage, sentiment analysis can be applied to specific concepts to answer more detailed questions. Thus, competitor brands can be analysed for sentiment to determine the current rank of one’s own brand. Products can be analysed to find out where to focus improvement efforts. And finally, product features can be analysed for sentiment to understand how to actually make improvements. As an example, the following chart shows the most positively perceived models for Audi in the Chinese web:
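Concept-level sentiment can be approximated by scoring each sentence and attributing the score to the concepts mentioned in it. The Python sketch below does this with a tiny word list; the lexicon, the model list and the reviews are hypothetical, and a word-counting approach like this is far cruder than the sentiment models used in practice (it ignores negation, for instance).

```python
import re
from collections import defaultdict

# Hypothetical sentiment lexicon and list of product models
POSITIVE = {"great", "comfortable", "love"}
NEGATIVE = {"noisy", "expensive", "poor"}
MODELS = {"a4", "a6", "q5"}


def sentence_score(tokens):
    """Positive minus negative word count for one sentence."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def sentiment_by_concept(texts):
    """Average the sentence scores over all sentences mentioning each model."""
    scores = defaultdict(list)
    for text in texts:
        for sentence in re.split(r"[.!?]+", text):
            tokens = re.findall(r"\w+", sentence.lower())
            score = sentence_score(tokens)
            for token in tokens:
                if token in MODELS:
                    scores[token.upper()].append(score)
    return {model: sum(s) / len(s) for model, s in scores.items()}


reviews = [
    "The A6 is great and comfortable. The A4 is noisy.",
    "I love my Q5! The A4 feels expensive.",
    "The A6 is great.",
]

average_sentiment = sentiment_by_concept(reviews)
# Models ordered from most positively to most negatively perceived
ranking = sorted(average_sentiment, key=average_sentiment.get, reverse=True)
print(ranking)
```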

3. From insights to actions

Insights from Web-based Text Analytics can be directly integrated into marketing activities, product development and competitive strategy.

Marketing intelligence

By analysing the contexts in which your products are discussed, you learn the “soft” facts which are central for marketing success, such as less tangible connotations of your offering – these can be used as hints to optimise your communication. You can also understand the interest profile of your target crowd and use it to improve your story and wording. Finally, Text Analytics allows you to monitor the response to your marketing efforts in terms of awareness, attitude and sentiment.

Product intelligence

With Text Analytics, you can zoom in on awareness of and attitudes towards your own products and find out their most relevant aspects with concept extraction. Using sentiment analysis, you can compare how different products are perceived and focus on their fine-grained features. Once you place your products and features on a positive-negative scale, you know where to focus your efforts to maximise your strengths and neutralise your weaknesses.

Competitive intelligence

Your brand doesn’t exist in a vacuum – let’s broaden our research scope. Text Analytics allows you to answer the above questions not only for your own brand, but also for your competitors. Thus, you will learn about the marketing, positioning and branding of your competitors to better differentiate yourself and present your USPs in a sharp and convincing manner. You can also analyse competitor products to learn what they did right – especially on those features where your own company went wrong. And, from a more strategic perspective, Text Analytics allows you to monitor technological trends to respond early to market developments.

So what?

How do you show that your findings are not only accurate and correct, but also relevant to business success? Using continuous real-time monitoring, you can track your online KPIs and validate your actions based on the response of the market. Concept extraction can be used to measure the changes in brand awareness and relevance, whereas sentiment analysis shows how brand, product and product feature perceptions have improved based on your efforts.

With the right tools, Text Analytics can be efficiently used in stand-alone mode or as a complement to traditional field research. In the age of digitalisation, it allows you to listen to the voice of your market on the Web and turns your insight journey into an engaging path to actionable, transparent market insights.

Get in touch with Anacode’s specialists to learn how your business challenges can turn into opportunities with Text Analytics.

Just like the rest of China’s financial system, the Chinese stock market is subject to rather strict government regulations. However, in recent years, it has offered more and more opportunities to risk-tolerant foreign investors.

This report sample provides an overview of the Chinese stock market based on data from the Chinese finance portal 金融界 (http://finance.jrj.com.cn; Finance World).

Download the report sample here.

In this white paper, you will learn how we use text and data analytics to extract actionable, statistically relevant insights from Web data. The paper shows how AI and Machine Learning technology can be used to build competitive advantage with a crystal-clear, up-to-date understanding of customer needs.

Please download the white paper here.

On September 20th, we open the event series “Doing Business in China”, a cooperation between Anacode, TechCode, kleef&co and Portus Corporate Finance GmbH. The series will provide talks, workshops and case studies on different aspects of China market entry, incl. market research, local business development, funding and legal topics.

Please download the program here.

In the ideal business world, market and consumer research precedes any marketing activity. The world is not ideal, but when it comes to capricious emerging markets such as China, the need for solid research turns into an acute necessity: sound and specific knowledge of these markets allows companies to minimize the business risks which go hand in hand with their complexity and volatility. By contrast, an insufficient understanding of market context, customers and competitors can lead to failure, as we have seen on a large scale in examples such as Barbie, eBay and BestBuy.

Not surprisingly, market research in China presents a challenge in itself. Multiple factors come into play. First, the Chinese market is inherently difficult to structure and systematize due to its heterogeneity and rapid change. Its developments are conditioned by a unique mix of social, political, ethnic and cultural variables. Therefore, they cannot be anticipated by analogy with the familiar context of developed Western markets.

Second, China’s market research industry is relatively young and thus immature: whereas the discipline of market research was introduced in the West at the beginning of the 20th century, it was not until the 1980s that the first market research unit, a subsidiary of Procter & Gamble, was established in China. Since then, Chinese marketers have come a long way in mastering Western methods of market research and adapting them to the Chinese reality. However, as of now, the industry is still fragmented and lacks a unified quality standard.

Finally, as a foreign company, you will not only witness the “inherent” challenges of China, but also bump into linguistic, cultural and legal access barriers. The installation of additional intermediaries with the intent of overcoming these – be they consultants, local providers or native employees recruited for that purpose – often does not lead to the expected results. Instead, it further complicates the information flow and pulls the company into a vicious circle of dropping quality at an increased cost.

The potential of social media for market research

Where there is a problem, there is a solution – in the case of China market research, one solution, intriguing and challenging at the same time, is to step out of the comfort zone of familiar methods such as surveys and interviews, and “ride the wave” with social media and advanced analytics technology. More than in any other region of the world, social media in China have developed into a powerful and ubiquitous digital infrastructure. WeChat, the uncontested leader among Chinese social networks, counts 1.1 billion accounts and 517 million daily users; other national platforms such as Weibo and Zhihu, as well as endless topic-specific or regional resources, complete the picture and cover almost all conceivable communication topics – thus contributing to a self-sufficient ecosystem which flourishes hand in hand with the informational liberation of the country after decades of strict censorship.

Chinese social media contains a wealth of information about consumers and markets. This is due, on the one hand, to the strong orientation towards consumption of the Chinese society, and especially of the younger, online-savvy generations. On the other hand, digital channels for sales and service are gaining in popularity, which also contributes to the creation of market-relevant data. Beyond the availability of the relevant data, social media has some additional advantages when compared to traditional, “old-school” research:

  • It is big – millions of posts and comments are posted daily. By contrast, field data projects normally range in the thousands of samples.
  • It is up-to-date – with the appropriate technologies, social media data can be harvested and analysed in near real-time. Traditional field research produces static data for one point in time, with a high cost for subsequent updates and follow-ups.
  • It is to the point – users talk about what is directly relevant to them and invite the researcher to discover and explore. By contrast, market research surveys and interviews prime respondents to specific topics, thus preshaping and limiting the information they provide.
  • It is authentic – the lack of personal contact often neutralises culture-specific communication barriers. For example, whereas Chinese respondents normally remain polite in face-to-face communication, the Web 2.0 encourages uninhibited, authentic self-expression, often leading to frank negative statements which uncover important opportunities for improvement.
  • Last but not least, it is free – as opposed to data solicitation which comes with a high price per sample and creates a trade-off between cost and data quantity.

Integrating social insights in the organization

Obviously, these advantages come at a price – social media data is not available in the familiar, structured and well-focussed format of market research data. It is online and cannot be directly “imported” into common analytics programs. Besides, most of the data is unstructured and has a high degree of noise. Inside a company, three ingredients should be mingled to successfully generate insights from social media:

  1. Technology and tools
  2. Data science expertise
  3. Mastery of the business context

Tools ensure the feasibility of the research – the right technology will allow you to collect data that contains the relevant information and to actually extract this information. In most cases, there will be no single “one-stop shop” that can do the job. Instead, multiple tools are combined into a pipeline that produces detailed findings and is customized to the specific business circumstances. Special attention should be paid to the technical details behind unstructured data analytics. While applications in this domain are often marketed based on alluring concepts such as Artificial Intelligence, Machine Learning and Natural Language Processing, the algorithms don’t always produce high-accuracy results and thus can devalue even the smartest data strategy.

Data science expertise is needed to pick and mix the right tools so as to produce relevant and correct output. The data scientist makes sure that the right tools are correctly integrated into an analysis pipeline. The main requirement is that the pipeline produces results that are maximally close to concrete actions and decisions. A point which is often neglected here is the cleaning and preprocessing of the data: as noted above, social media data comes with high levels of noise in the form of spam, advertising etc. The “garbage in, garbage out” principle applies at full scale – thus, before going into analysis, the data should undergo a carefully designed cleaning and filtering process.
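As an illustration of what such a cleaning and filtering step can involve, the following Python sketch drops duplicates, very short posts and posts matching a few spam cues. The patterns and the minimum length are assumptions chosen for the example; real filters are tuned to the specific platform and are usually far more elaborate (treating every post with a link as spam, as done here, would be too aggressive for most sources).

```python
import re

# Hypothetical spam and advertising cues
SPAM_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"https?://", r"\bbuy now\b", r"\bdiscount code\b")
]
MIN_TOKENS = 3  # assumed threshold below which a post carries too little information


def normalise(text):
    """Collapse whitespace and lowercase so that near-identical posts compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean(posts):
    """Keep each informative, non-spam post once, preserving the original order."""
    seen = set()
    kept = []
    for post in posts:
        key = normalise(post)
        if key in seen:
            continue  # duplicate, e.g. a repost
        seen.add(key)
        if len(key.split()) < MIN_TOKENS:
            continue  # too short to analyse
        if any(pattern.search(post) for pattern in SPAM_PATTERNS):
            continue  # looks like spam or advertising
        kept.append(post)
    return kept


posts = [
    "The new 7 Series is very quiet on the highway.",
    "the new 7 series is   very quiet on the highway.",
    "BUY NOW!!! Best car deals at http://example.com",
    "Nice.",
    "Fuel consumption is higher than I expected.",
]

cleaned = clean(posts)
print(cleaned)
```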

Finally, mastery of the business context is required to use the social media tool set with maximum benefit. On the input side, this means translating business issues and reframing information needs into the query framework of the data and applications in use. On the output side, the analysis results are fed back into the real-world business context and translated into concrete and actionable insights.

Adopting social media for market research is a challenge that requires the right tools, skills and judgment. However, efforts put into designing a customized social insight strategy will pay off and solve many a productivity issue associated with traditional market research. Especially in a market as volatile and diverse as China, leapfrogging over familiar research methods to leverage advanced analytics and the wealth of available online data appears to be a promising, future-proof strategy for an up-to-date and actionable understanding of the market.

Janna Lipenkova, CEO Anacode GmbH

This report provides a descriptive overview of the Chinese Web 2.0 landscape for automotive feedback, focussing on BMW 7 Series and comparing it with Audi A8 and Mercedes-Benz S-Class. The feedback is analysed from both a qualitative and a quantitative perspective. The main observations and findings are as follows:

  • Popular topics and concepts: We find that users are most concerned about the price and visual aspects (design, appearance) of the three considered series. Competitor brands that are discussed in a comparative perspective are mostly high-end or consumer-oriented foreign brands from Germany, the US and Japan, whereas native Chinese brands are much less frequent. Geographically, users concentrate in the big cities and more affluent regions along the East coast.
  • Temporal evolutions: The quantity of buzz grows relatively evenly for all three series before 2015, with BMW 7 and S-Class leading. In 2015 – 2016, there is a burst in the quantity of data for BMW 7, which correlates with the introduction of the new generation of the series.
  • User satisfaction and sentiment: Users are generally satisfied with the frequently mentioned major product features of BMW 7. There are, however, some categories that are perceived negatively – specifically, components related to the front part of the car, the fuel consumption and aspects related to acoustic quality and insulation.
  • Social influencers: Among the key influencers on WeChat, China’s leading social network, we mostly find media accounts posting on general automotive topics. There are no accounts with a wide social reach that specialize in the BMW brand. Thus, influencer marketing is an opportunity yet to be explored by BMW’s marketing and branding strategy.

Download the social report.