The second year of the Text Analytics Forum took place in Washington, DC, last week, and it was good to see the conference building on its first iteration and developing in range. Tom Reamy, the conference chair and organiser, has shaped this event very much around his view of the text analytics market, and around his book Deep Text (reviewed here). The event had a great mix of practitioners and users, and an increasing range of solutions was discussed – on which more below.

His view of text analytics is very much an experiential one, based on many assignments with content owners, and has the great value of being thoroughly thought through; but I think he limits the scope of text analytics in his own presentations rather more than the event as a whole might suggest. Tom is strongly in favour of a rule-based approach, although I tried to show in my presentation that not all analytics problems need to be solved by rules (and in fact he himself suggested there are some circumstances where an unsupervised approach works perfectly well).

In his keynote presentation, Tom was very hard on text mining, describing it as “treating words as things rather than understanding them” – but Solr and Elasticsearch don’t understand words either, being entirely string-based, and rules are typically dependent on matching strings (it’s not easy to build a rule that finds “renal” when you search for “kidney”). In any case, text mining can be combined with NLP (as UNSILO does) to provide some meaning-based capability as well, so even if basic text mining is at times meaning-free, the result is not necessarily inferior to other approaches.
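The kidney/renal point can be made concrete with a minimal sketch. The hand-built synonym map below is entirely hypothetical, standing in for the thesaurus or NLP layer a real system would use; the point is simply that pure string matching cannot bridge the two terms:

```python
docs = [
    "Chronic renal failure was observed in the cohort.",
    "Kidney function declined over twelve months.",
    "The liver biopsy showed no abnormality.",
]

def string_match(query, docs):
    """Return docs containing the literal query string (case-insensitive)."""
    return [d for d in docs if query.lower() in d.lower()]

# Hypothetical synonym map; a real system would draw on a thesaurus or ontology.
SYNONYMS = {"kidney": {"kidney", "renal"}}

def concept_match(query, docs):
    """Return docs containing the query or any of its mapped synonyms."""
    terms = SYNONYMS.get(query.lower(), {query.lower()})
    return [d for d in docs if any(t in d.lower() for t in terms)]

print(len(string_match("kidney", docs)))   # 1 – the "renal" document is missed
print(len(concept_match("kidney", docs)))  # 2 – both relevant documents found
```

A string-only engine returns one document for “kidney”; only with some layer of meaning on top does the “renal” document come back as well.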

Nor is it true to say that text analytics is primarily auto-classification: finding a peer reviewer for the manuscript of an academic article, for example, is not a classification exercise.

But my biggest question concerned his statement that “deep learning is a dead end in terms of accuracy”, claiming it can only reach 60% to 70% accuracy, while rule-based systems can deliver up to 92%. Statements like these are meaningless without some description of context, and they miss one key point: where a human indexer provides the gold standard for a trial, the measured accuracy is inevitably capped by the level of human agreement – which is rarely above 70%.
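The ceiling argument is just arithmetic, and a toy example (with illustrative numbers, not figures from the talk) shows it. Suppose two human indexers agree on 70 of 100 items; then even a system that reproduces one indexer perfectly scores only 70% when measured against the other:

```python
# 100 items: the two indexers agree on the first 70 and disagree on the last 30.
indexer_a = [1] * 100
indexer_b = [1] * 70 + [0] * 30

def accuracy(pred, gold):
    """Fraction of items where the prediction matches the gold label."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

system = list(indexer_a)  # a system that exactly reproduces indexer A

print(accuracy(system, indexer_a))  # 1.0 against indexer A
print(accuracy(system, indexer_b))  # 0.7 against indexer B
```

Quoting a single accuracy figure for any approach, rule-based or deep learning, therefore says little unless you also say whose judgments it was measured against.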

The conference itself was admirably wide-ranging, and suggested a number of ways of using text analytics that left everyone at the conference thinking about how they use (or plan to use) this technology. For me, one of the most valuable presentations was not technical at all; it was from the Cognitive Computing Consortium, and showed a simple way of assessing a text analytics application against a number of criteria. Any project could be mapped to a number of sliding scales, such as accuracy or discovery. Key to this visual presentation was the recognition that the two pull against each other: moving towards greater accuracy necessarily makes discovery more limited. This looks to me like a very simple and effective way of presenting some of the trade-offs of these machine tools. Another useful slogan, from Susan Feldman, was “no AI without IA”, which I understand to mean there is a very real need for a human managing the AI-based analytics process, to make sure it delivers sensible and effective results. I couldn’t agree more with that.

All in all, an excellent conference, a leader (like Tom Reamy’s own book) in a field of its own, and all the more valuable for that. I look forward to next year’s event.