In-Language Coding for International Market Research

By Jen Horner

Market researchers have long relied on “open-ended” questions when interviewing subjects about their consumer experiences. The method has obvious benefits over multiple-choice or “closed-ended” formats: if you ask people to explain things in their own words, unexpected concepts and issues can emerge. Some of the most striking “a-ha!” moments come from this type of qualitative research, but it is more costly to analyze than closed-ended survey results (multiple choice, Likert scales). Typically, human “coders” must categorize and interpret qualitative data in order to extract useful information for the client.

Now that practically every major retail website encourages comments and reviews from customers, marketers have access to a gold mine of online commentary. Add to that the tsunami of opinion on social media, and you have a potentially unmanageable volume of information. Brand managers know that “social listening” is essential to understanding how their brand is perceived, but extracting insights from a deluge of raw material is still a challenge.

What about automated processing?

Software solutions can mine text for recurring terms and phrases and automatically report their frequency. Researchers are hard at work developing natural language processing software for analyzing large text datasets, and still more are developing “sentiment analysis” technology to identify the moods and feelings expressed in volumes of English-language text. These tools are already in use for some purposes in English, and inroads have been made in other languages as well, but they are not yet widely available, especially for multilingual data. At this writing, humans retain a striking advantage when it comes to understanding context, tone, and meaning.
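To make the first step concrete, here is a minimal sketch of the kind of term-frequency mining described above, using only the Python standard library. The function name `term_frequencies` and the sample reviews are illustrative assumptions, not any particular commercial product; real tools add stop-word filtering, stemming, and language detection on top of this basic counting.

```python
import re
from collections import Counter

def term_frequencies(comments, top_n=5):
    """Tokenize a batch of free-text comments and return the most
    frequent terms. Python's \\w is Unicode-aware by default, so
    non-Latin scripts are tokenized rather than discarded."""
    tokens = []
    for comment in comments:
        # Lowercase and split on word characters in any script
        tokens.extend(re.findall(r"\w+", comment.lower()))
    return Counter(tokens).most_common(top_n)

# Hypothetical sample data for illustration
reviews = [
    "Great battery life, battery charges fast",
    "Battery died after a week",
]
print(term_frequencies(reviews, top_n=3))
```

Even this toy example shows both the appeal and the limit of frequency mining: it surfaces “battery” as a hot topic instantly, but it cannot tell you whether the sentiment around it is praise or complaint.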

How, then, can one analyze large volumes of feedback from customers who use languages other than English? The most time-consuming, costly, and risky method is to translate the entire corpus of open-ended responses, then code and analyze the English translations. It is time-consuming and costly for obvious reasons, and it is risky because unless you have highly skilled (and highly paid) translators, much of the nuance of the feedback will be lost and the affect stripped away. Machine translation, especially for Asian languages, is still very rough: it is useful for gisting and triage, but it cannot catch subtlety and nuance. Well-translated feedback can be a great resource, but translating all of it can be prohibitively expensive.

One solution: in-language coding

Here is where “in-language coding” comes in. Instead of translating all the data and then analyzing it in English, native-language analysts review and code the untranslated data in its original language. Selected results are then translated into English to illustrate the findings.

It might seem like a leap of faith for researchers to embrace this option. As with all localization projects (and research projects), careful preparation helps ensure optimal results.