Graham White: My Notes: nlp

Over the first few months of this year I have been taking part in a mass online learning course in Natural Language Processing (NLP) run by Stanford University. They publicised a group of eight courses at the end of last year and I didn't hesitate to sign up to the Natural Language Processing course knowing it would fit very well with things I'm working on in my professional role where I'm doing more and more with text analytics and continuing my work in speech to text. There were others I could easily have signed up for too, things like security or machine learning, more or less all of them are relevant for something I'm doing. However, given the time commitment required I decided to fully commit to one course and the NLP one was to be it.

I passed the course with a grade of 85% which was well above the required 70% pass mark. However, the effort and time required to get there was way more than I was expecting and quite a lot more than the expected time the lecturers (Chris Manning and Dan Jurafsky) had said. From memory it was an 8 week course with 10 hours a week required effort to complete the work. As it went on the amount of time required went up significantly, so rather than the 80 hours total I think I spent more like 1½ times that at over 120 hours!

There were four of us at work (that I know of) who embarked on the course but due to the commitment of time I've mentioned above only myself and Dale finished. By the way, Dale has written an excellent post on the structure and content of the course so I'd suggest reading his blog for more details on that stuff, there's little point in me re-posting it as he's written such a good summary.

In terms of the participants on the course, it seems to have been quite a success for Stanford University - this is the first time they have run courses in this way it seems. The lecturers gave us some statistics at a couple of strategic points throughout the course and it seems there were around 40,000 people registering an interest, of which around 5000 were watching the lecture material and around 2000 completed the course having taken part in the homework assignments.

I'm glad I committed as much as I did. If I were one of the 5000 just watching the lectures and not doing the homework material I don't think I would have got as much out of it, but the added time required to complete the homework was significant so perhaps there's a trade-off here? It's certainly the first time I've committed this much of my own personal time (it took over the lives of myself and Dale for quite a few weeks) as I was too busy at work to spend many business hours working on the course so it was all done in evenings and weekends. That's certainly one piece of feedback I gave at the end of the course, Stanford could make the course timing more flexible but also allow more time for the course to be completed.

My experience with the way the assignments were marked was a little different to the way Dale has described in his post. I was already very familiar with the concepts of test, development and held-out sets (three different sets of data used when training NLP systems) so wasn't surprised to see that the modules in the course didn't necessarily have an exact answer to them or more precisely that the code your wrote to perfectly analyse some data on your local system may not get full marks as it was marked against a different data set. This may seem unfair but is common practice in all NLP system training that I know of.

All in all, an excellent course that I'm glad I did. From what I hear of the other courses, they're not as deeply involved as the NLP course so I may well give another one a go in the future but for now I need to get a little of my life back and have a well earned rest from education.

Graham White: My Notes

Sunday 27 May 2012

Natural Language Processing Course