When small words mean a lot: Transcripts, black boxes and evaluation

By Gordon Rugg

You can get a fair amount of information out of what people tell you in interviews, questionnaires and focus groups. However, you can’t get at all the information in a person’s head using those methods. As a result, you often have to use different methods, or glean more information out of what you already got from the interviews, questionnaires or focus groups, or both.

One very rich source of information is small, apparently insignificant words that people use; words that often get left out of transcripts because they’re not “real words” or because they’re swearwords or whatever.

This article is about how you can use these words to get an extra dimension of information about real-world problems.

Two of my interests are product requirements/evaluation and human error, particularly in safety-critical systems. In both these areas, careful attention to the apparently insignificant words can be extremely useful.

If you’re trying to find out the requirements for a product, or you’re trying to find out users’ reactions to a product, then a very useful technique is think-aloud. It’s what it sounds like; you ask the user to think aloud as they see the product and try it out. It’s simple to use, and it gives a lot of useful information quickly.

Some of the information that you get with the think-aloud technique is easy to handle – for instance, if the response is along the lines of “I really love this feature” or “I hate that feature”. Because you’re getting this information in real time, without distortions due to memory, you can easily get at issues that would be missed or distorted in interviews, questionnaires and focus groups, which typically happen after the event.

Other important information, though, is easily missed if you don’t know what to look out for. A classic case is when a user hesitates while trying to use the product, because they’re confused. If that’s happening, then the design needs to be improved, so that users won’t be confused in future. So, it’s important to find out where the users are getting confused.

There are similar issues in human error, where accident investigators often need to find out where things started to go wrong. Again, what often happens is that there’s a key point where the participants in the accident suddenly realise that they don’t know what’s happening.

The image below shows an example of how you can spot that key moment by looking at one particular small word. I’ve used the Search Visualizer software to look at some transcripts of cockpit dialogue, from the voice recorders in black boxes after air crashes. In brief, the red dots show where that word appears in each transcript. The word is “um”.


In some of the transcripts, such as the third from the left, and the sixth, seventh and eighth, the word doesn’t appear in the section shown on screen. In the second from the left, it occurs in four places on screen, scattered throughout the transcript. In the fifth, ninth and tenth transcripts, it only occurs once.
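If you want to try the same idea on your own transcripts without the visual display, the positions of a word can be computed directly. The sketch below is only an illustration, not how Search Visualizer itself works: the function name `word_positions` and the simple tokenising rule (runs of letters and apostrophes) are my assumptions. It returns where each whole-word match falls, as a fraction of the way through the transcript, which is roughly what each red dot represents.

```python
import re


def word_positions(transcript: str, word: str) -> list[float]:
    """Relative position of each whole-word match (0.0 = start, 1.0 = end).

    Hypothetical helper for illustration; matching is case-insensitive
    and compares whole tokens, so "um" does not match "umbrella".
    """
    # Split the transcript into lower-case word tokens.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return []
    target = word.lower()
    # Avoid division by zero for a one-word transcript.
    last = max(len(tokens) - 1, 1)
    return [i / last for i, tok in enumerate(tokens) if tok == target]
```

An empty result corresponds to a transcript with no dots; values bunched close together correspond to a cluster.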

The interesting cases are the first and the fourth transcripts. I’ll focus on the fourth, because it’s such a clear example of the principle. The image shows the complete transcript, and it shows that there’s a sudden cluster of occurrences of the hesitation utterance “um” about halfway through. What’s happening there?

The image below shows the area of transcript immediately surrounding those four instances of “um”.


Before this section, the dialogue is very calm and mundane, with a lot of discussion of customs procedures. Then at 10.45 there’s the sudden “Um Um We are now having an unexpected strong tailwind”. For the next few minutes there are a lot of questions and a lot of uncertainty, and then after this chunk of transcript, everything calms down again.
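A sudden cluster like this one can also be looked for automatically. The following is a minimal sketch, assuming that a fixed-size sliding window of words is a reasonable stand-in for "a short stretch of dialogue"; the function name `densest_window` and the default window size of 50 words are hypothetical choices, not something taken from the Search Visualizer software.

```python
import re


def densest_window(transcript: str, word: str, window: int = 50) -> tuple[int, int]:
    """Find the stretch of `window` consecutive words with the most matches.

    Returns (start_word_index, match_count) for the earliest such stretch.
    Hypothetical helper for illustration; matching is whole-word and
    case-insensitive.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    target = word.lower()
    hits = [1 if tok == target else 0 for tok in tokens]
    if not hits:
        return (0, 0)
    # A window longer than the transcript just covers all of it.
    window = min(window, len(hits))
    current = sum(hits[:window])
    best_start, best_count = 0, current
    # Slide the window one word at a time, updating the running count.
    for start in range(1, len(hits) - window + 1):
        current += hits[start + window - 1] - hits[start - 1]
        if current > best_count:
            best_start, best_count = start, current
    return (best_start, best_count)
```

A high count in one window, in a transcript where the word is otherwise rare, is a prompt to go and read that part of the transcript; it doesn’t tell you by itself what went wrong.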

So, in summary, small words can tell you a lot, if you know what to look for, and if you have an exact record of what is being said. This is one reason that professionals in this general area tend to take a hard line about the quality of transcripts. If you’re using a transcript made by someone else, you soon discover that a surprisingly high percentage of would-be transcribers will mangle the transcript by leaving out occurrences of “um” and “er” and similar utterances. Another common mistake is leaving out swearwords. Swearwords are a valuable source of information, particularly when they only occur a few times in a transcript; they tell you that something is triggering a strong reaction, and if there’s a strong reaction, then you need to know what that something is. Yet another common mistake is attempting to tidy up the grammar. You need to know if the speaker is suddenly becoming less coherent, because that can be a key piece of information.

There are other things that you can tell from small words – there are entire disciplines, such as identifying who wrote a particular document, that rely heavily on what those small words can tell you. That, though, is a topic for another article.

Notes and links:

The Search Visualizer software is available free, online, here:


For the example in this article, I used the Search Visualizer “match whole word” option, combined with the “Single site” option, on the tailstrike.com site.


The Hyde & Rugg blog contains more information about product evaluation, and about think-aloud technique:



About searchvisualizer

Gordon Rugg is a former timberyard worker, archaeologist and English lecturer who ended up in computer science via psychology. He’s the same Gordon Rugg who did the Voynich Manuscript work, and the books with Marian Petre about research. He’s co-inventor of the Search Visualizer.