This presentation at the London Text Analytics Meetup (January 2018) alarmed me a little. The title “Reducing Misinformation” sounds a bit like the heavy-handed slogan painted on the sides of Thames Valley Police Force cars some years ago: “Reducing Crime, Disorder and Fear”. Somehow an admirable goal had been mixed with a rather oppressive approach. It smacked a little of late-night security squads taking out suspects.
As it happens, the presentation, by David Corney of new startup Factmata, could not have been more polite – even apologetic; the speaker told us more than once how sorry he was to be white and male, as this meant he was not subjected to much of the hate speech directed at other groups.
Factmata’s business is identifying hate speech, fake news, and other unsavoury aspects of the Web. This turns out to be a surprisingly complicated area to deal with – or perhaps it was just the approach his company was taking. One of the most contentious parts of his talk was about how to identify hate speech; his method was to have humans mark tweets for “misinformation”, in fact two groups of humans – one working via CrowdFlower, the other made up of feminists. The thinking behind this was that victims of hate speech are the people best placed to detect slurs.
The basic premise is simple. On the Web and on social media, popularity largely stands in for quality. The PageRank algorithm is easily fooled. Retweeting may mean the tweet is endorsed or that it is appalling – we often don’t know which. Sharing a site is generally taken to imply approval.
Factmata aims to enhance PageRank using credibility and quality scores, and has launched its own news site where users are encouraged to endorse and to comment on the news there (he didn’t show an example).
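The talk did not explain how credibility scores would be combined with PageRank, so what follows is only my own guess at one way it could work: a standard power-iteration PageRank in which the "random jump" favours sites with a higher credibility score instead of treating all sites equally. The function name, the score range of 0 to 1 and the example site names are all assumptions made for illustration; none of this is Factmata's code.

```python
def credibility_pagerank(links, credibility, damping=0.85, iterations=50):
    """Hypothetical sketch: PageRank with a credibility-weighted random jump.

    links:       dict mapping each site to the list of sites it links to
    credibility: dict mapping each site to an assumed score between 0 and 1
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    total = sum(credibility.get(n, 0.0) for n in nodes)

    # Jump distribution proportional to credibility (uniform if no scores given).
    if total > 0:
        jump = {n: credibility.get(n, 0.0) / total for n in nodes}
    else:
        jump = {n: 1.0 / len(nodes) for n in nodes}

    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * jump[n] for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                # Split this site's rank evenly across its outgoing links.
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A site with no outgoing links hands its rank back via the jump.
                for t in nodes:
                    new_rank[t] += damping * rank[n] * jump[t]
        rank = new_rank
    return rank


# Two sites with identical incoming links but different assumed credibility.
example_links = {"hub": ["good", "bad"], "good": ["hub"], "bad": ["hub"]}
example_scores = {"hub": 0.5, "good": 0.9, "bad": 0.1}
ranks = credibility_pagerank(example_links, example_scores)
```

In this toy graph "good" and "bad" are equally popular, since each is linked once from the same hub, yet "good" ends up with the higher rank purely because of its score. That also shows where the difficulty lies: everything depends on who assigns the scores.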
Factmata’s goal is admirable, but their naivety was startling. The question and answer session at the end of the talk revealed some of the gaps in the argument. From the presentation, suggested one questioner, this is clearly an area that does not lend itself to machine learning – it is closer in some ways to spam, and we know how spam is dealt with quite effectively in the main email packages, largely by identifying individual words.
The same could be done to identify the bulk of hate speech – together with identifying sites linking to and from a questionable site.
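The questioner's suggestion can be sketched in a few lines. This is a deliberately crude illustration of the two ideas (matching individual words, and looking at which sites link to or from a known questionable site), not a description of how any real spam filter or Factmata's system works. The word list, the site names and both function names are placeholders I have made up.

```python
import re


def flag_text(text, blocklist):
    """Return the blocklisted words found in the text (case-insensitive)."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(words & blocklist)


def suspect_sites(links, known_bad):
    """Return sites that link to, or are linked from, a known questionable site.

    links:     dict mapping each site to the list of sites it links to
    known_bad: set of sites already judged questionable (by someone)
    """
    suspects = set()
    for source, targets in links.items():
        for target in targets:
            if source in known_bad and target not in known_bad:
                suspects.add(target)
            if target in known_bad and source not in known_bad:
                suspects.add(source)
    return suspects


# Placeholder data for illustration only.
blocklist = {"badword", "worseword"}
print(flag_text("This is a BadWord, truly.", blocklist))

links = {
    "x.example": ["bad.example"],
    "bad.example": ["y.example"],
    "z.example": ["x.example"],
}
print(suspect_sites(links, {"bad.example"}))
```

Even this toy version exposes the hard part: somebody still has to decide what goes on the word list and which sites count as questionable in the first place.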
Sadly, I don’t think the goals are likely to be achieved. The companies with a vested interest in identifying and removing hate speech – Google, Facebook and Twitter – have already stated they are not willing for Factmata to scrape their content. Any attempt to decide how “good” a website is will be fraught with difficulty over who gets to make that determination. The very term “misinformation” suggests a simple world where facts, tweets and websites are either right or wrong; there seemed to be no awareness of the chaos that might ensue in trying to be a supposedly neutral arbiter. “We aren’t going to fact check the Bible – that’s not the way we are going”, he said confidently, but in practice things might prove a little more difficult. He himself mentioned the use of derogatory and sexist language in rap music. Is this offensive or not? It’s certainly not a challenge a machine can solve any time soon.