“Partial view of Wikipedia’s category system from 2007. Arrows point from category to sub-category.”

I’ve written before about Wikipedia categories, using the example of a brand of toothpaste, and this post is an attempt to understand them a bit better. And to ask the simple question: what are they for? Who (apart from Wikipedia editors) uses them?

Why does Wikipedia have categories? The article for Euthymol, a brand of toothpaste, is placed in the category “brands of toothpaste”. Sounds reasonable. The article for Benjamin Disraeli, British prime minister and novelist, has a lot of categories:

That’s a lot of categories – 45, to be precise. The first thing to note is that a list of 45 categories does not work well for human navigation. Humans, as UX experts will tell you, struggle with lists of more than nine or so items. I am not being facetious: there is a precedent for limiting menu options to what can be shown in a simple list, and 45 is way beyond the optimum number. Even if all 45 categories were valuable, few users would have the patience to pick out the relevant ones from this list.

Secondly, some of these categories seem so broad as to be useless: “English Anglicans”, for example. Disraeli was certainly for some of his life an English Anglican, but so were millions of others. It wouldn’t be difficult to find plenty of English Anglicans who are not similarly categorised in Wikipedia – John Donne, for example.

“English Anglicans” is an example of an incomplete list. I wrote about the idiocies of some categories such as “Scottish dentists” here, because at the time I was writing there were just six names in the category. Today there are 24, but the list is almost certainly still incomplete. You could say some reference works, such as the wonderful Haydn’s Dictionary of Dates, did this kind of thing much better, by being so quirky that the results are entertaining. You don’t mind Haydn’s Dictionary being incomplete; but with Wikipedia, incompleteness makes the attempt trivial and demeans the content.

What is the point of categories? To quote Wikipedia itself, at the Wikipedia page on categories, “categories (along with other features like cross-references, lists, and infoboxes) help to find information, even if you don’t know what exists or what it’s called.” And Wikipedia provides a “Special search box” to locate categories (Special search sounds great, let’s see what it does):

When I search for “Scottish dentists” I am taken to the category of Scottish dentists. When I search for a semantic equivalent, such as “Scottish dental surgeons”, as entered in the search box above, I get “no results found”. So the special search is not so special. Categories clearly do not help me to find information “even if you don’t know … what it’s called”: I didn’t know what Wikipedia called it, and so I found nothing. We have hit the same problem with metadata that many taxonomies have: if you don’t know what the system calls it, you are stuck. Here is one of the fundamental problems of taxonomies: like crosswords, unless you know what the creator was thinking and think in the same way, they are meaningless.
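The failure is easy to reproduce in miniature. The sketch below is a toy illustration only, and makes no claim about how Wikipedia's search actually works: it assumes a hypothetical index keyed on the exact category label, with made-up member names, and shows that a synonym returns nothing.

```python
# Toy category index keyed on the exact category label.
# Labels and member names are illustrative placeholders, not Wikipedia data.
CATEGORY_INDEX = {
    "scottish dentists": ["Example Dentist A", "Example Dentist B"],
    "brands of toothpaste": ["Euthymol"],
}


def find_category(query: str) -> list:
    """Return the members of the category whose label matches the query exactly."""
    # Only case and whitespace are normalised; there is no synonym handling.
    key = " ".join(query.lower().split())
    return CATEGORY_INDEX.get(key, [])


if __name__ == "__main__":
    print(find_category("Scottish dentists"))         # the label the system uses
    print(find_category("Scottish dental surgeons"))  # a semantic equivalent
```

Any lookup of this shape rewards only the user who already knows the system's own vocabulary.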

There is an article in Wikipedia on categories, intended as a primer for Wikipedia editors, all about how to categorize articles, although it does contain another definition and a graphic of a partial view of the Wikipedia category system, dating from 2007:

That diagram seems illogical. If, as the documentation states, “categories are arranged as overlapping trees .. every category apart from the top one must be subcategory of at least one other category”, then is philosophy a subcategory of belief? This would surprise many philosophers.

So a third objection to categories is that the category system is a human construct that, like most human constructs, is not and cannot be logically perfect. Wikipedia has created an elaborate system of links that will always be incomplete, and that does not scale. The number of articles in Wikipedia increases inexorably, but the number of human Wikipedia editors does not. Worse, the creation of categories is governed by a collection of rules which make little sense except to their creators. For example:

Many subcategories have two or more parent categories. For example, Category:British writers should be in both Category:Writers by nationality and Category:British people by occupation. When making one category a subcategory of another, ensure that the members of the subcategory really can be expected (with possibly a few exceptions) to belong to the parent also. Category chains formed by parent–child relationships should never form closed loops; that is, no category should be contained as a subcategory of one of its own subcategories.  If two categories are closely related but are not in a subset relation, then links between them can be included in the text of the category pages.

Did you understand that? Do you need to understand it to use Wikipedia?
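One part of the quoted rule, at least, can be checked mechanically: that parent–child chains “should never form closed loops”. The following is a minimal sketch, assuming the category system is available as a plain mapping from each category to its direct subcategories. The function name `find_loop` and the data layout are hypothetical, chosen for illustration; this is not Wikipedia's own tooling.

```python
from typing import Dict, List, Optional


def find_loop(subcategories: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one closed loop of categories if any exists, otherwise None."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []  # chain of categories currently being explored

    def visit(category: str) -> Optional[List[str]]:
        state[category] = IN_PROGRESS
        path.append(category)
        for child in subcategories.get(category, []):
            child_state = state.get(child, UNSEEN)
            if child_state == IN_PROGRESS:
                # The child is already on the current chain: a closed loop.
                return path[path.index(child):] + [child]
            if child_state == UNSEEN:
                loop = visit(child)
                if loop:
                    return loop
        path.pop()
        state[category] = DONE
        return None

    for category in list(subcategories):
        if state.get(category, UNSEEN) == UNSEEN:
            loop = visit(category)
            if loop:
                return loop
    return None


if __name__ == "__main__":
    # The example from the quoted guideline: one subcategory, two parents.
    ok = {
        "Writers by nationality": ["British writers"],
        "British people by occupation": ["British writers"],
        "British writers": [],
    }
    print(find_loop(ok))

    # A deliberately broken variant in which the child contains its parent.
    broken = dict(ok, **{"British writers": ["Writers by nationality"]})
    print(find_loop(broken))
```

Having two parents is fine under this check; only a chain that returns to its own starting point is reported. The softer requirement, that members of a subcategory “really can be expected” to belong to the parent, is a judgment call that no such check can make.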

There is a category “film series endings by year”; in fact there are 44 such categories, such as “1986 film series endings”. And, as for categories that could be added, why not “people who live in Kingston Road, Oxford”? After all, I once lived there myself. I know this would result in millions of new categories and would never be finished, but why not?