Daan Kolkman made an interesting point in his recent post about algorithms, entitled, charmingly, “‘F**k the algorithm’?: What the world can learn from the UK’s A-level grading fiasco”. The article looks at an example of algorithmic bias and how best to deal with it.

Kolkman describes the recent controversy over A-level examination grades: an algorithm was used to adjust the grades awarded, but it rapidly became clear that the algorithm had some major flaws. Pupils in schools with small class sizes, for example, received grades considerably higher than anticipated, and it was not clear that this advantage was warranted. Another cause was “latent algorithmic bias”: the use of historic data to determine a present-day outcome. For example, there has been significant growth in science journal articles by Chinese academics over the last twenty years, to the point that more science articles are now published in China than in any other country. An algorithm using data from as recently as five years ago would not reflect this rapidly changing East-West balance.
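To make the point about latent bias concrete, here is a minimal, purely illustrative Python sketch. The publication shares and the leading_publisher helper are invented for illustration; they are not real bibliometric figures, and this is not the grading algorithm itself. The sketch simply shows how a rule anchored to a five-year-old snapshot keeps returning yesterday’s answer.

```python
# Purely illustrative: a decision rule that names "the country publishing the
# most science articles" from a stored snapshot of publication shares.
# All numbers are invented for illustration; they are not real data.

snapshot_five_years_ago = {"United States": 0.40, "China": 0.30, "Rest of world": 0.30}
snapshot_today          = {"United States": 0.32, "China": 0.38, "Rest of world": 0.30}

def leading_publisher(shares: dict[str, float]) -> str:
    """Return the country with the largest share of published articles."""
    return max(shares, key=shares.get)

# An algorithm quietly relying on the stale snapshot keeps giving the old answer,
# even though the underlying reality has shifted.
print(leading_publisher(snapshot_five_years_ago))  # -> "United States"  (stale data)
print(leading_publisher(snapshot_today))           # -> "China"          (current data)
```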

The A-level grading fiasco was memorably blamed by Prime Minister Boris Johnson on a “mutant algorithm”, as if the algorithm had seized control of the entire process by itself. Anthropomorphism is an exciting way of looking at the problem, but sadly not a very helpful one.

What is the solution? According to Mr Kolkman, we need a “critical audience” to scrutinize and evaluate algorithms in practice.

Without a critical audience that opposes algorithms and points out their shortcomings, we will keep hearing about the occasional incident with automated decision-making, but the majority of algorithmic screw-ups will never see the light of day.

It’s a lovely idea, but in reality, where is the agency or body that will make up this “critical audience”? It sounds as though he envisages a kind of Which? report, with expert evaluators asking difficult questions about each algorithm. The system breaks down because the independent evaluator has to be funded in some way, and that funding tends to compromise the autonomy of the evaluation. Which?, for example, takes no advertising in its publications, in an effort to remove (or at least to reduce) payment for influence.

Kolkman describes the process in terms of opposition, and there is some truth in what he says. Yet the answer is surely closer to home. Why look for external solutions when this is simply another example of software development in practice? Think about any software. In any well-managed organisation, a developer builds a solution to a use case to the best of their ability, but it is almost unheard of for the resulting software to work on first release and go live without testing. Software isn’t like that. Users, and typically a testing team, get their hands on it, press the wrong buttons, misunderstand what it does, and all of these comments are fed back to the development team, who iterate, and continue to iterate, until an acceptable level of performance is reached.

Today we are familiar with software development, and nobody sacks a developer because the first release of their software has bugs. When it comes to algorithms, however, we still have a romantic idea of the software emerging in perfect shape, like Botticelli’s Venus rising from the waves. It doesn’t happen like that – at least, it shouldn’t.

Perhaps AI is currently enjoying something like a honeymoon: for a brief moment, nobody imagines it could be anything but perfect. We have lost that innocence with every other form of software development, so why do we cling to it when it comes to AI?