The full title of this book is Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, which sums up both the book and the author’s approach. She is less concerned with the maths itself than with its social and political implications. She writes from a liberal standpoint, identifying the political consequences of big data, and I agree with her sentiments completely. It’s a wonderful book, yet it is perhaps slightly churlish of me to want to know more about how to apply the maths sensibly, that is, how to use algorithms that include some human component. For O’Neil, it’s not the maths, it’s the inequality; I optimistically want to write the most democratic, equal-opportunity algorithms, in support of the liberal values she champions.

The inequality is blatant, and O’Neil reveals it very powerfully in the algorithmic systems she describes: “the human victims of WMDs, we’ll see time and again, are held to a far higher standard of evidence than the algorithms themselves” (p. 10).

Like Nate Silver, O’Neil applies her arguments to many domains, including baseball, whose statistical models, she claims, are fair because they are transparent: you can see what the algorithm is doing. “Everyone has access to the stats and can understand more or less how they’re interpreted”. O’Neil goes further: “Baseball has statistical rigor. Its gurus have an immense data set to hand … moreover, their data is highly relevant to the outcomes they are trying to predict.”

However, sporting outcomes depend heavily on luck. That doesn’t mean nothing meaningful can be said about them. A fascinating paper, “Luck Is Hard to Beat: The Difficulty of Sports Prediction” [Aoki 17], takes four sports (basketball, handball, soccer and volleyball) and examines their results over several seasons to determine which is most influenced by luck. Perhaps unsurprisingly, the authors find basketball to be the least luck-influenced of the four. While I admire the rigour of their findings, they don’t mention the obvious common-sense reason why that should be. When the final score in a sport is 1-0, it’s not surprising that luck plays more of a part than in a sport where a typical final score is 86-82. With dozens of scoring events in a game, chance breaks tend to cancel out. In soccer, by contrast, one slight mistake may decide the match, something very unlikely in basketball. Although we can verify this experimentally, we know it as part of our mathematical common sense.
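
The common-sense argument above can be checked with a small simulation. This is not the method used by Aoki et al. It is a minimal sketch that rests on several illustrative assumptions:

- Each team’s score is an independent Poisson count.
- The stronger team scores 10% more often on average.
- A typical weaker side scores about 1.3 goals in a soccer-like game or about 80 points in a basketball-like game.

These figures and the function names `poisson` and `stronger_team_win_rate` are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def stronger_team_win_rate(base_rate: float, edge: float = 1.1,
                           trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated games the stronger team wins outright.

    Draws count as non-wins. Assumption: each team's score is an independent
    Poisson count, and the stronger team's average score is `edge` times
    the weaker team's average score `base_rate`.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        strong = poisson(rng, base_rate * edge)
        weak = poisson(rng, base_rate)
        if strong > weak:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Hypothetical averages for the weaker team: ~1.3 goals vs ~80 points.
    print(f"low-scoring (soccer-like):      {stronger_team_win_rate(1.3):.2f}")
    print(f"high-scoring (basketball-like): {stronger_team_win_rate(80.0):.2f}")
```

Under these assumptions the same 10% edge should win the stronger team only around 40% of low-scoring games, but around 70% of high-scoring ones. Many scoring events per game let skill show through the noise.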

O’Neil’s central complaint is the absence of real data in many of the examples she looks at. Because the modellers substitute proxies for real data, their conclusions are far less reliable. Here, then, is an effective message from this book: use a big enough training set, and don’t use proxies for real data.

This is my point: to apply algorithms we need mathematical literacy, enough mathematical common sense to look at an algorithm’s conclusions and check whether they are plausible. In the English Premier League, for example, one of the most lucrative soccer leagues in the world, there is a clear correlation between team performance and aggregate player wages: the teams that pay their players most get the best results. Of course, it does not follow that if you take a minor team and pay its players higher wages, you will create a winning team. The maths doesn’t tell you that: it simply shows a correlation, and says nothing about cause. This example may seem trite until you read in the book how education authorities took the data from a single classroom, around 30 students, and treated it as statistically valid. Not surprisingly, such a small sample is very easy to manipulate. In one of her telling examples, a teacher was fired because the previous year’s teacher had faked her own class’s results to make them look better. As a consequence, the teacher’s class appeared to have made no progress during the year, and so she was fired.
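
The fragility of a 30-student measure can also be sketched in code. The model below is hypothetical and deliberately naive; it is not the scoring system described in the book. It makes these assumptions:

- A teacher’s “value added” is taken to be the class’s mean year-on-year gain.
- Scores are invented Gaussian draws.
- The `inflation` parameter stands in for results altered by the previous teacher.

All numbers and the function names `simulate_class` and `value_added` are assumptions for illustration.

```python
import random
import statistics


def simulate_class(n_students=30, true_gain=5.0, noise_sd=10.0,
                   inflation=0.0, seed=0):
    """Return (recorded prior-year scores, current scores) for one hypothetical class.

    Every number here is invented for illustration. `inflation` is added only
    to the prior year's *recorded* scores, mimicking results altered by a
    previous teacher; the students' real prior attainment is unchanged.
    """
    rng = random.Random(seed)
    true_prior = [rng.gauss(60, 10) for _ in range(n_students)]
    current = [p + true_gain + rng.gauss(0, noise_sd) for p in true_prior]
    recorded_prior = [p + inflation for p in true_prior]
    return recorded_prior, current


def value_added(prior_scores, current_scores):
    """Naive value-added score: the class's mean gain from one year to the next."""
    return statistics.mean(c - p for p, c in zip(prior_scores, current_scores))


if __name__ == "__main__":
    honest = value_added(*simulate_class(seed=42))
    tampered = value_added(*simulate_class(inflation=8.0, seed=42))
    print(f"honest baseline:   {honest:+.1f} points")
    print(f"inflated baseline: {tampered:+.1f} points")  # 8 points lower than honest

    # Chance variation in the honest score across 1,000 hypothetical classes.
    spread = statistics.stdev(
        [value_added(*simulate_class(seed=s)) for s in range(1000)])
    print(f"spread from chance alone: {spread:.1f} points")
```

The inflation is added only to last year’s recorded scores, so it is subtracted one-for-one from the teacher’s apparent gain. In this sketch, an 8-point inflation more than wipes out a genuine 5-point gain. Even with honest data, chance alone moves a 30-student class’s average gain by about 10/√30, roughly 1.8 points.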

This is a great example, perhaps the best in the book. Unfortunately, many of the other examples that Cathy O’Neil provides, while very powerful, demonstrate political, not mathematical, unfairness. I would have preferred more biography and more of Ms O’Neil using her mathematical knowledge to reveal how we should apply these new techniques. We have to remember as we plough through the book that these algorithms are not in themselves evil; I would have welcomed more case studies of how they can be applied positively. O’Neil touches on some ideas, for example measuring positive signals from the experience of prison, such as better food or more sport: “the goal would be to optimize prisons … for the benefit of both the prisoners and for society at large.” Yet, as she points out, the reason this is not done is political, not a failure of the algorithm.

Aoki 17: Aoki, Raquel YS, Renato M. Assuncao, and Pedro OS Vaz de Melo. “Luck Is Hard to Beat: The Difficulty of Sports Prediction.” Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17), 2017, pp. 1367–1376. https://doi.org/10.1145/3097983.3098045.