Author: Janna Lipenkova

In recent years, the tech world has seen a surge of Natural Language Processing (NLP) applications in various areas, including adtech, publishing, customer service and market intelligence. According to Gartner’s hype cycle, NLP reached the peak of inflated expectations in 2018. Many businesses see it as a “go-to” solution to generate value from the 80% of business-relevant data that comes in unstructured form. To put it simply: NLP is widely adopted with wildly variable success.

In this article, I share some practical advice for the smooth integration of NLP into your tech stack. The advice summarizes the experience I have accumulated on my journey with NLP — through academia, a number of industry projects, and my own company which develops NLP-driven applications for international market intelligence. The article does not provide technical details but focusses on organisational factors including hiring, communication and expectation management.

Before starting out on NLP, you should meditate on two questions:

1. Is a unique NLP component critical for the core business of our company?

Example: Imagine you are a hosting company. You want to optimise your customer service by analysing incoming customer requests with NLP. Most likely, this enhancement will not be part of your critical path activities. By contrast, a business in targeted advertising should try to make sure it does not fall behind on NLP — this could significantly weaken its competitive position.

2. Do we have the internal competence to develop IP-relevant NLP technology?

Example: You hired and successfully integrated a PhD in Computational Linguistics with the freedom to design new solutions. She will likely be motivated to enrich the IP portfolio of your company. However, if you are hiring middle-level data scientists without a clear focus on language that need to split their time between data science and engineering tasks, don’t expect a unique IP contribution. Most likely, they will fall back on ready-made algorithms due to lack of time and mastery of the underlying details.

Hint 1: if your answers are “yes” and “no” — you are in trouble! You’d better identify technological differentiators that do match your core competence.

Hint 2: if your answers are “yes” and “yes”, stop reading and get to work. Your NLP roadmap should already be defined by your specialists to achieve the business-specific objectives.

If you are still here, don’t worry: the rest will soon fall into place. There are three levels at which you can “do NLP”:

  1. Black belt level, reaching deep into mathematical and linguistic subtleties
  2. Training & tuning level, mostly plugging in existing NLP/ML libraries
  3. Blackbox level, relying on “buying” third-party NLP

The black belt level

Let’s elaborate: the first, fundamental level is our “black belt”. This level comes close to computational linguistics, the academic counterpart of NLP. The folks here often split into two camps: the mathematicians and the linguists. The camps might well befriend each other, but the mindsets and the way of doing things will still differ.

The math guys are not afraid of things like matrix calculus and will thrive on the details of the newest optimisation and evaluation methods. At the risk of leaving out linguistic details, they will generally take the lead on improving the recall of your algorithms, that is, the share of relevant items that actually get found. The linguists were raised either on highly complex generative or constraint-based grammar formalisms, or on alternative frameworks such as cognitive grammar. These give more room to imagination but also allow for formal vagueness. They will gravitate towards writing syntactic and semantic rules and compiling lexica, often needing their own sandbox and taking care of the precision part, that is, the share of found items that are actually correct. Depending on how you handle communication and integration between the two camps, their collaboration can either block productivity or open up exciting opportunities.
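
To make the two quality measures concrete, here is a minimal sketch in Python. The brand names and the function name `precision_recall` are purely illustrative assumptions; the point is only how the two numbers are derived from the same comparison against a manual annotation.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of extracted items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: how many of the items we returned are correct?
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: how many of the correct items did we return?
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: brands found by a system vs. a manual annotation.
gold = {"Audi", "BMW", "Volkswagen", "Toyota"}
predicted = {"Audi", "BMW", "Golf"}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Adding more rules or lexicon entries typically moves one number at the expense of the other, which is why the two camps benefit from a shared evaluation set.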

In general, if you can inject a dose of pragmatism into the academic perfectionism, you can create a unique competitive advantage. If you can efficiently combine mathematicians and linguists on your team, even better! But be aware that you have to sell them on an honest vision and then follow through. Doing hard fundamental work without seeing its impact on the business would be a frustrating and demotivating experience for your team.

The training & tuning level

The second level involves the training and tuning of models using existing algorithms. In practice, most of the time will be spent on data preparation, training data creation and feature engineering. The core tasks, training and tuning, do not require much effort. At this level, your people will be data scientists pushing the boundaries of open-source packages for NLP and/or machine learning, such as NLTK, scikit-learn, spaCy and TensorFlow. They will invent new and not always academically justified ways of extending training data, engineering features and applying their intuition for surface-side tweaking. The goal is to train models for well-understood tasks such as named entity recognition (NER), categorisation and sentiment analysis, customized to the specific data at your company.

The good thing here is that there are plenty of great open-source packages out there. Most of them will still leave you with enough flexibility to optimize them to your specific use case. The risk is on the side of HR: many roads lead to data science. Data scientists are often self-taught and have a rather interdisciplinary background. Thus, they will not always have the innate academic rigour of level 1 scientists. As deadlines or budgets tighten, your team might cut corners on training and evaluation methods, thus accumulating significant technical debt.

The blackbox level

The third level is a “blackbox” where you buy NLP. Your developers will mostly consume paid APIs such as Rosette, Semantria and Bitext, which provide the standard algorithm outputs out of the box (cf. this post for an extensive review of existing APIs). Ideally, your data scientists will be working alongside business analysts or subject matter experts. For example, if you are doing competitive intelligence, your business analysts will be the ones to design a model which contains your competitors, their technologies and products.

At the blackbox level, make sure you buy NLP only from black belts! With this secured, one of the obvious advantages of outsourcing NLP is that you avoid the risk of diluting your technological focus. The risk is a lack of flexibility: with time, your requirements will get more and more specific. The better your integration policy, the higher the risk that your API will stop satisfying your requirements. It is also advisable to invest in manual quality assurance to make sure the API outputs deliver high quality.

Final Thoughts

So, where do you start? Of course, it depends — some practical advice:

  • Talk to your tech folks about your business objectives. Let them research and prototype and start out on level 2 or 3.
  • Make sure your team doesn’t get stuck in low-level details of level 1 too early. This might lead to significant slips in time and budget since a huge amount of knowledge and training is required.
  • Don’t hesitate — you can always consider a transition between 2 and 3 further down the path (by the way, this works in any direction). The transition can be efficiently combined with the generally unavoidable refactoring of your system.
  • If you manage to build up a compelling business case with NLP — welcome to the club, you can use it to attract first-class specialists and add to your uniqueness by working on level 1!

About the author: Janna Lipenkova holds a PhD in Computational Linguistics and is the CEO of Anacode, a provider of tech-based solutions for international market intelligence. Find out more about our solution here

We are excited to start our cooperation with GIM Gesellschaft für innovative Marktforschung mbH, which gives us the opportunity to benefit from the long-standing expertise of GIM in the area of market research. Together, we are going to work on innovative approaches to consumer insight and produce the best blend of “traditional” and technology-based methods.

In the current flood of Business Intelligence and insight tools, there is a phrase causing users to abandon the fanciest tools and leading to serious self-doubt for the provider – the “so what?” question. Indeed, your high-quality analytics application might spit out accurate, statistically valid data and package them into intuitive visualisations – but if you stop there, your data has not yet become a basis for decision and action. Most users will be lost or depend on the help and expertise of a business translator, thus creating additional bumps on their journey to data-driven action.

In this article, we focus on applications of Web-based Text Analytics – not “under-the-hood” technological details, but the practical use of Text Analytics and Natural Language Processing (NLP) to answer central business questions. Equipped with this knowledge, you will be able to tap into the full power of Text Analytics and fully benefit from large-scale data coverage and machine intelligence. A real-time mastery of the oceans of data floating on the Web will allow you to make your market decisions and moves with ease and confidence.

 

1. The basics

Before diving into details, let’s first get an understanding of how Text Analytics works. Text Analytics starts out with raw, semi-structured data – text combined with some metadata. The metadata have a custom format, although some fields, such as dates and authors, are pretty consistent across different data sources. The first step is a one-by-one analysis of these datapoints, resulting in a structured data basis with a unified schema. Even more important than the structuring is the transformation of the data from qualitative to quantitative. This transformation enables the second step – aggregation, which condenses a huge number of structured representations into a small number of consolidated and meaningful analyses, ready for visualization and interpretation by the end user.
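
The two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular product: the field names (`date`, `posted_at`, `source`), the `Record` schema and the trivial "mentions the price" signal are all hypothetical stand-ins for a real per-document analysis.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    """Unified schema produced by step 1 (field names are illustrative)."""
    source: str
    published: date
    mentions_price: bool  # qualitative text turned into a countable signal


def analyse(raw):
    """Step 1: turn one semi-structured datapoint into a structured record."""
    # Sources name their date field differently; map both onto one schema.
    raw_date = raw.get("date") or raw.get("posted_at")
    return Record(
        source=raw["source"],
        published=date.fromisoformat(raw_date),
        mentions_price="price" in raw["text"].lower(),
    )


def aggregate(records):
    """Step 2: share of records per source that mention the price."""
    totals, hits = defaultdict(int), defaultdict(int)
    for record in records:
        totals[record.source] += 1
        hits[record.source] += record.mentions_price
    return {source: hits[source] / totals[source] for source in totals}


# Hypothetical raw data with source-specific metadata fields.
raw_data = [
    {"source": "forum", "date": "2018-02-03", "text": "Great car, but the price is steep."},
    {"source": "forum", "date": "2018-02-05", "text": "Love the design."},
    {"source": "blog", "posted_at": "2018-02-10", "text": "Price and fuel consumption compared."},
]

records = [analyse(raw) for raw in raw_data]
price_share = aggregate(records)
print(price_share)
```

In a real system, the keyword check in `analyse` would be replaced by the algorithms discussed in the next section; the overall shape of the pipeline stays the same.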

2. Answering questions with Text Analytics

A number of questions can be answered with Text Analytics and NLP. Let’s start with the basics – what do users talk about and how do they talk about it? We’ll be providing examples from the Chinese social media landscape on the way.

First, the what – what is relevant, popular or even hot? This question can be answered with two algorithms:

  • Text categorisation classifies a text into one or multiple predefined categories. The category doesn’t need to be named explicitly in the text – instead, the algorithm takes words and their combinations as cues (so-called features) to recognise the category of the text. Text categorisation is a coarse-grained algorithm and thus well-suited for initial filtering or getting an overview of the dataset. For example, the following chart shows the categorisation of blog articles around the topic of automotive connectivity:
  • Concept extraction digs deeper and identifies concepts such as brands, companies, locations and people that are directly mentioned in the text. Thus, it can identify multiple concepts of different types, and each concept can occur multiple times in the text. For example, the following chart shows mention frequencies for the most common automotive brands in the Chinese social web in February 2018:
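
The difference between the two algorithms can be illustrated with a deliberately simple Python sketch. The cue words and the concept lexicon below are hypothetical and hand-written; a production categoriser would usually learn its features from training data, and Chinese text would additionally require word segmentation instead of the naive tokeniser used here.

```python
import re
from collections import Counter

# Hypothetical cue words per category (assumption for illustration).
CATEGORY_CUES = {
    "connectivity": {"bluetooth", "app", "navigation", "wifi"},
    "safety": {"airbag", "brake", "crash"},
}

# Hypothetical concept lexicon: surface form -> concept type.
CONCEPTS = {"audi": "brand", "volkswagen": "brand", "shanghai": "location"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorise(text, min_cues=1):
    """Return every category with at least `min_cues` cue words in the text."""
    tokens = set(tokenize(text))
    return sorted(
        category
        for category, cues in CATEGORY_CUES.items()
        if len(tokens & cues) >= min_cues
    )


def extract_concepts(text):
    """Count each directly mentioned concept, keeping its type."""
    return Counter(
        (token, CONCEPTS[token]) for token in tokenize(text) if token in CONCEPTS
    )


post = "The Audi app pairs via Bluetooth. In Shanghai, Audi and Volkswagen both offer it."
categories = categorise(post)
concepts = extract_concepts(post)
print(categories)
print(concepts.most_common())
```

Note that the word "connectivity" never appears in the post, yet the category is assigned from its cues, whereas every extracted concept is literally present in the text and is counted once per mention.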

Using time series analysis in the aggregation, text categorisation and concept extraction can be used to identify upcoming trends and topics. Let’s look into the time development for Volkswagen, the most frequent auto brand:
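
As a rough sketch of the aggregation behind such a time series, the following Python snippet groups dated brand mentions by month and computes the month-over-month change. The mention data is invented for illustration and does not reproduce the figures from the analysis above.

```python
from collections import Counter
from datetime import date

# Hypothetical (date, brand) pairs, e.g. the output of concept extraction.
mentions = [
    (date(2017, 12, 5), "Volkswagen"),
    (date(2018, 1, 9), "Volkswagen"),
    (date(2018, 1, 20), "Volkswagen"),
    (date(2018, 2, 2), "Volkswagen"),
    (date(2018, 2, 14), "Volkswagen"),
    (date(2018, 2, 27), "Volkswagen"),
    (date(2018, 2, 11), "Audi"),
]


def monthly_counts(mentions, brand):
    """Count mentions of `brand` per (year, month), in chronological order.

    Months without any mention do not appear in the result.
    """
    counts = Counter((d.year, d.month) for d, b in mentions if b == brand)
    return sorted(counts.items())


def month_over_month_change(series):
    """Difference between each month's count and the previous listed month."""
    return [
        (month, count - previous_count)
        for (_, previous_count), (month, count) in zip(series, series[1:])
    ]


series = monthly_counts(mentions, "Volkswagen")
changes = month_over_month_change(series)
print(series)
print(changes)
```

A sustained run of positive changes for a topic or concept is one simple signal of an upcoming trend.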

Once we have identified what people talk about, it is time to dig deeper and understand how they talk about it. Sentiment analysis allows you to analyse how the topics and concepts are perceived by customers and other stakeholders. Again, sentiment analysis can be applied at different levels: whole texts can be analysed for an initial overview. At an advanced stage, sentiment analysis can be applied to specific concepts to answer more detailed questions. Thus, competitor brands can be analysed for sentiment to determine the current rank of one’s own brand. Products can be analysed to find out where to focus improvement efforts. And finally, product features are analysed for sentiment to understand how to actually make improvements. As an example, the following chart shows the most positively perceived models for Audi in the Chinese web:
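
Concept-level sentiment can be sketched as follows. This is a naive lexicon-based approach chosen for brevity, not the method behind the chart: the polarity lexicon, the model names and the sentence-level attribution rule are all assumptions, and real systems handle negation, irony and language-specific phenomena far more carefully.

```python
import re
from collections import defaultdict

# Hypothetical polarity lexicon and product list (assumptions).
POLARITY = {"great": 1, "comfortable": 1, "love": 1, "noisy": -1, "expensive": -1}
MODELS = {"a4", "a6", "q5"}


def concept_sentiment(text):
    """Average sentence polarity for each model mentioned in the text."""
    scores = defaultdict(list)
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z0-9]+", sentence)
        # Sentence polarity: sum of the polarities of its known words.
        polarity = sum(POLARITY.get(token, 0) for token in tokens)
        # Attribute the sentence polarity to every model it mentions.
        for model in MODELS.intersection(tokens):
            scores[model].append(polarity)
    return {model: sum(values) / len(values) for model, values in scores.items()}


reviews = "The A6 is great and comfortable. The Q5 is noisy. I love the A6 but it is expensive."
sentiment = concept_sentiment(reviews)
ranking = sorted(sentiment, key=sentiment.get, reverse=True)
print(sentiment)
print(ranking)
```

Sorting the per-concept averages yields a ranking of the kind used to compare models, products or features against each other.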

3. From insights to actions

Insights from Web-based Text Analytics can be directly integrated into marketing activities, product development and competitive strategy.

Marketing intelligence

By analysing the contexts in which your products are discussed, you learn the “soft” facts which are central for marketing success, such as less tangible connotations of your offering – these can be used as hints to optimise your communication. You can also understand the interest profile of your target crowd and use it to improve your story and wording. Finally, Text Analytics allows you to monitor the response to your marketing efforts in terms of awareness, attitude and sentiment.

Product intelligence

With Text Analytics, you can zoom in on awareness and attitudes about your own products and find out their most relevant aspects with concept extraction. Using sentiment analysis, you can compare the perception of different products with each other and focus on their fine-grained features. Once you place your products and features on a positive-negative scale, you know where to focus your efforts to maximise your strengths and neutralise your weaknesses.

Competitive intelligence

Your brand doesn’t exist in a vacuum – let’s broaden our research scope. Text Analytics allows you to answer the above questions not only for your own brand, but also for your competitors. Thus, you will learn about the marketing, positioning and branding of your competitors to better differentiate yourself and present your USPs in a sharp and convincing manner. You can also analyse competitor products to learn what they did right – especially on those features where your own company went wrong. And, from a more strategic perspective, Text Analytics allows you to monitor technological trends to respond early to market developments.

So what?

How do you show that your findings are not only accurate and correct, but also relevant to business success? Using continuous real-time monitoring, you can track your online KPIs and validate your actions based on the response of the market. Concept extraction can be used to measure the changes in brand awareness and relevance, whereas sentiment analysis shows how brand, product and product feature perceptions have improved based on your efforts.

With the right tools, Text Analytics can be efficiently used in stand-alone mode or as a complement to traditional field research. In the age of digitalisation, it allows you to listen to the voice of your market on the Web and turns your insight journey into an engaging path to actionable, transparent market insights.

 

Get in touch with Anacode’s specialists to learn how your business challenges can turn into opportunities with Text Analytics.

Just like the rest of China’s financial system, the Chinese stock market is subject to rather strict government regulations. However, in recent years, it has offered more and more opportunities to risk-tolerant foreign investors.

This report sample provides an overview of the Chinese stock market based on data from the Chinese finance portal 金融界 (http://finance.jrj.com.cn; Finance World).

 

Download the report sample here.

This report provides a descriptive overview of the Chinese Web 2.0 landscape for automotive feedback, focussing on BMW 7 Series and comparing it with Audi A8 and Mercedes-Benz S-Class. The feedback is analysed both from a qualitative and a quantitative perspective. The main observations and findings are as follows:

  • Popular topics and concepts: We find that users are most concerned about the price and visual aspects (design, appearance) of the three considered series. Competitor brands that are discussed in a comparative perspective are mostly high-end or consumer-oriented foreign brands from Germany, the US and Japan, whereas native Chinese brands are much less frequent. Geographically, users concentrate in the big cities and more affluent regions along the East coast.
  • Temporal evolutions: The quantity of buzz grows relatively evenly for all three series before 2015, with BMW 7 and S-Class leading. In 2015 – 2016, there is a burst in the quantity of data for BMW 7, which correlates with the introduction of the new generation of the series.
  • User satisfaction and sentiment: Users are generally satisfied with the frequently mentioned major product features of BMW 7. There are, however, some categories that are perceived negatively – specifically, components related to the front part of the car, the fuel consumption and aspects related to acoustic quality and insulation.
  • Social influencers: Among the key influencers on WeChat, China’s leading social network, we mostly find media accounts posting on general automotive topics. There are no accounts with a wide social reach that specialise in the BMW brand. Thus, influencer marketing is an opportunity yet to be explored by BMW’s marketing and branding strategy.

Download the social report.