Partial word matching and whole word matching in SV

The choice between partial word matching and whole word matching is located under the “More options” link beneath the green “Search” button.

If you click on that link, you get the option to choose between partial word matching and whole word matching.

Each of these has strengths and weaknesses.

Partial word matching

Partial word matching is useful for catching variants on the same word – for instance, organisation, organised, organiser.

The disadvantage is that it can also catch words that are unrelated, or related but misleading. Suppose, for instance, that you’re looking at how often the words men and women are used in a particular set of documents. If you did this using the “partial word match” setting, then the keyword men would show up not only in men but also in women.
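The men/women problem is easy to reproduce outside SV. The sketch below is not SV's own code; it simply assumes that a partial word match behaves like a plain substring test, and the function name partial_match_hits is made up for illustration.

```python
def partial_match_hits(keyword, text):
    """Return every word in text that contains keyword as a substring.

    Assumption: partial word matching is modelled as a simple,
    case-insensitive substring test on whitespace-separated words.
    """
    needle = keyword.lower()
    return [word for word in text.lower().split() if needle in word]


sample = "the men and the women met other women"
print(partial_match_hits("men", sample))  # ['men', 'women', 'women']
```

Two of the three hits come from women, which is exactly the kind of misleading count described above.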

Whole word searching

Whole word searching does what it says – it only returns a hit if it finds an exact match for the entire word that you’ve used as a keyword.

This can be very useful for avoiding false hits, like the men/women example above. However, it has the disadvantage that it will not show relevant words which contain the keyword (for instance, if you search for sing it won’t show singer or singing).
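As a rough illustration of the same trade-off, here is a sketch that models whole word matching with a regular expression word boundary. This is an assumption about the general technique, not a description of how SV implements it, and whole_word_hits is a hypothetical name.

```python
import re


def whole_word_hits(keyword, text):
    """Return matches of keyword only where it stands as a complete word.

    Assumption: "whole word" is modelled with the regex word boundary \\b,
    ignoring case.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return pattern.findall(text)


print(whole_word_hits("men", "the men and the women"))             # ['men']
print(whole_word_hits("sing", "a singer likes singing, so sing"))  # ['sing']
```

The first call avoids the false hit in women; the second shows the cost, since singer and singing are skipped.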

Hints and tips

A robust way of combining the strengths of both the “whole word” and “partial word” options is to use the SV synonym feature in combination with the “whole word” option. If you were searching for variations on sing, for instance, you could select the “whole word” option and then enter sing,singer,singers,singing,sang,song as synonym keywords of each other. You use the synonym function by simply typing in the words that you want to treat as synonyms, separated by commas, without spaces after the commas (as in the “sing,singer,singers” etc. example). If you want to combine this with other keywords, such as opera, then you simply leave a space after the synonyms and add the word opera to your keyword list.
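To make the keyword format concrete, here is a small sketch of how such a keyword list could be read: spaces separate keywords, and commas separate synonyms within one keyword. The parsing and counting below are assumptions for illustration only (parse_keywords and count_group_hits are hypothetical names), not SV's actual behaviour.

```python
import re


def parse_keywords(query):
    """Split a keyword list into synonym groups.

    Spaces separate keywords; commas (with no following space)
    separate synonyms inside one keyword.
    """
    return [group.split(",") for group in query.split()]


def count_group_hits(groups, text):
    """Count whole word hits for each synonym group."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = []
    for group in groups:
        members = {synonym.lower() for synonym in group}
        counts.append(sum(1 for word in words if word in members))
    return counts


groups = parse_keywords("sing,singer,singers,singing,sang,song opera")
text = "The singer sang a song at the opera; singers sing in operas."
print(groups[1])                       # ['opera']
print(count_group_hits(groups, text))  # [5, 1]
```

The first group picks up singer, sang, song, singers and sing as hits for one keyword, while operas is not counted for opera because it was not listed as a synonym.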

There’s a separate blog article which gives more detail about synonym search.

Sometimes you can work round the limitations of the “partial word match” option by choosing the order of your keywords so that the second keyword overrides any false matches from the first keyword. For example, if you were looking at how often Shakespeare uses the words man and woman, then using those keywords in that order would probably get you fairly accurate results, whereas if you used the same keywords in the opposite order (woman and man) then you’d get a lot of false positives from woman. We don’t recommend this approach, since it can go wrong in various creative ways, but you may find it useful in some circumstances.
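The effect of keyword order can be sketched as follows. This assumes a simplified model in which each word is credited to the last keyword in the list that partially matches it; that rule is a guess made for illustration, not a statement of how SV resolves overlaps, and assign_keywords is a hypothetical name.

```python
def assign_keywords(keywords, text):
    """Credit each word to the last keyword that partially matches it.

    Assumption: later keywords override earlier ones, and a partial
    match is a plain substring test.
    """
    assigned = []
    for word in text.lower().split():
        winner = None
        for keyword in keywords:
            if keyword in word:
                winner = keyword  # a later keyword replaces an earlier one
        if winner is not None:
            assigned.append((word, winner))
    return assigned


text = "a man and a woman"
print(assign_keywords(["man", "woman"], text))
# [('man', 'man'), ('woman', 'woman')]
print(assign_keywords(["woman", "man"], text))
# [('man', 'man'), ('woman', 'man')]
```

Even in the better order, a word such as human would still be credited to man under this model, which is one of the ways the trick can go wrong.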

Gordon Rugg

About searchvisualizer

We welcome debate and disagreement, but not abuse, trolling or thread derailment. We reserve the time-honoured right of blog owners and moderators to be arbitrary, capricious and autocratic in our wielding of the ban hammer. Gordon Rugg is a former timberyard worker, archaeologist and English lecturer who ended up in computer science via psychology. He’s the same Gordon Rugg who did the Voynich Manuscript work, and the books with Marian Petre about research. He’s co-inventor of the Search Visualizer.