Graham White: My Notes: Text Analytics Project Ends

Monday 12 September 2011

Text Analytics Project Ends

Today sees the end of one of my major work streams for 2011 with a presentation of some research to our sponsors. I've been working for a good chunk of the year researching text analysis, specifically the automated expression of facts in controlled natural language. It's always nice to see some work come to fruition. Well, not quite fruition in this case since it's research, but at least it has reached an agreed stopping point, for now.

I haven't often been involved with relatively pure research in my day job, so that, coupled with leading the project, presented a few challenges, which I found most enjoyable. While I can't give away the details, I wanted to outline here the areas this research concerned.

The project was a text analytics project. This is not a new field in itself, and it is a subject on which IBM and my local department (Emerging Technologies) have many well-read and respected experts. For those of you not familiar, text analytics is essentially applying computer systems to text documents so that some sort of processing can be performed. A simple example is the analysis of pages from news web sites to infer what the current news stories are.

One of the complexities we were investigating was natural language processing. This is a major area of research for computer systems at the moment and presents one of the biggest problems in applying computer systems to human-written documents. Our brains are able to parse language in ways we've not yet managed to teach computers, taking into account context, slang, unknown terms and all sorts of other subtle nuances that make it a hard problem for computers to crack.

My recent work has been investigating how we can express things found in documents in the form of controlled natural language, which leads to the question: what on earth is that? Simply put, it's an expression made using normal words but with more rigid semantics than are found in pure natural language. This makes it possible for a computer to parse it, while it still feels fairly natural to the human reader. This sounds great: you get computers talking a language that feels very usable to humans, with all the added power of memory and processing the computer provides. It seems to me this approach might only be a stop-gap solution until computers eventually understand full natural language, which I think will inevitably happen some time.
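To make the idea a little more concrete, here is a minimal sketch in Python of what a toy controlled language might look like. It is purely illustrative and is not the language or the code from the project: the two sentence patterns, the relation names and the functions `parse_fact` and `express_fact` are all invented for this example. The point is only that a sentence written to a fixed pattern reads like English but can be turned into a structured fact, and back again, with very simple code.

```python
import re

# Hypothetical sentence patterns for a toy controlled language.
# Both the wording and the relation names are assumptions made up
# for this sketch; they are not taken from the research project.
PATTERNS = {
    "works_for": re.compile(r"the person (\w+) works for the company (\w+)\."),
    "lives_in": re.compile(r"the person (\w+) lives in the city (\w+)\."),
}

# Matching templates used to write a fact back out as a sentence.
TEMPLATES = {
    "works_for": "the person {subject} works for the company {obj}.",
    "lives_in": "the person {subject} lives in the city {obj}.",
}


def parse_fact(sentence):
    """Return (relation, subject, object), or None if the sentence
    does not follow one of the controlled patterns."""
    text = sentence.strip()
    for relation, pattern in PATTERNS.items():
        match = pattern.fullmatch(text)
        if match:
            return (relation, match.group(1), match.group(2))
    return None


def express_fact(relation, subject, obj):
    """Write a structured fact as a controlled-language sentence."""
    return TEMPLATES[relation].format(subject=subject, obj=obj)


if __name__ == "__main__":
    # A sentence that follows the pattern is parsed into a fact.
    print(parse_fact("the person Alice works for the company Acme."))
    # Free-form natural language is rejected rather than guessed at.
    print(parse_fact("Alice has been at Acme for years"))
    # A fact can be expressed back as a readable sentence.
    print(express_fact("lives_in", "Alice", "London"))
```

The trade-off this shows is the one described above: the rigid patterns are what make the sentences easy for a computer to handle, and they are also why anything outside the patterns is simply not understood.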

Over dinner last night, my wife repeated an opinion I hear from her now and then: that I occasionally "speak funny". This came to light recently while we were on holiday in Ireland. I suspect it's a combination of this type of research seeping into my use of language and my semi-conscious habit of trialling these techniques in the real world, and what better opportunity than when immersed in another English-speaking culture?

So, as this article is published I'll be standing at the front of a room of people talking about the details of our work with my colleagues. Wish us luck!
