Sunday 27 May 2012

Natural Language Processing Course


Over the first few months of this year I have been taking part in a mass online learning course in Natural Language Processing (NLP) run by Stanford University.  They publicised a group of eight courses at the end of last year and I didn't hesitate to sign up to the Natural Language Processing course knowing it would fit very well with things I'm working on in my professional role where I'm doing more and more with text analytics and continuing my work in speech to text.  There were others I could easily have signed up for too, things like security or machine learning, more or less all of them are relevant for something I'm doing.  However, given the time commitment required I decided to fully commit to one course and the NLP one was to be it.

I passed the course with a grade of 85% which was well above the required 70% pass mark.  However, the effort and time required to get there was way more than I was expecting and quite a lot more than the expected time the lecturers (Chris Manning and Dan Jurafsky) had said.  From memory it was an 8 week course with 10 hours a week required effort to complete the work. As it went on the amount of time required went up significantly, so rather than the 80 hours total I think I spent more like 1½ times that at over 120 hours!

There were four of us at work (that I know of) who embarked on the course but due to the commitment of time I've mentioned above only myself and Dale finished.  By the way, Dale has written an excellent post on the structure and content of the course so I'd suggest reading his blog for more details on that stuff, there's little point in me re-posting it as he's written such a good summary.

In terms of the participants on the course, it seems to have been quite a success for Stanford University - this is the first time they have run courses in this way it seems.  The lecturers gave us some statistics at a couple of strategic points throughout the course and it seems there were around 40,000 people registering an interest, of which around 5000 were watching the lecture material and around 2000 completed the course having taken part in the homework assignments.

I'm glad I committed as much as I did.  If I were one of the 5000 just watching the lectures and not doing the homework material I don't think I would have got as much out of it, but the added time required to complete the homework was significant so perhaps there's a trade-off here?  It's certainly the first time I've committed this much of my own personal time (it took over the lives of myself and Dale for quite a few weeks) as I was too busy at work to spend many business hours working on the course so it was all done in evenings and weekends.  That's certainly one piece of feedback I gave at the end of the course, Stanford could make the course timing more flexible but also allow more time for the course to be completed.

My experience with the way the assignments were marked was a little different to the way Dale has described in his post.  I was already very familiar with the concepts of test, development and held-out sets (three different sets of data used when training NLP systems) so wasn't surprised to see that the modules in the course didn't necessarily have an exact answer to them, or more precisely that the code you wrote to perfectly analyse some data on your local system might not get full marks as it was marked against a different data set.  This may seem unfair but is common practice in all NLP system training that I know of.
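
As a rough illustration of the idea (not the course's actual marking set-up), the sketch below splits a stand-in data file into three portions.  The file names, the 80/10/10 proportions and the use of numbered lines in place of real text are all assumptions for the example.

```shell
# Sketch only: split 100 stand-in examples into three separate sets.
# File names and the 80/10/10 proportions are assumptions.
workdir=$(mktemp -d)
seq 1 100 > "$workdir/all.txt"                            # stand-in for real data
head -n 80 "$workdir/all.txt" > "$workdir/train.txt"      # used to build the system
sed -n '81,90p' "$workdir/all.txt" > "$workdir/dev.txt"   # used to tune it locally
tail -n 10 "$workdir/all.txt" > "$workdir/heldout.txt"    # kept back for final scoring
wc -l "$workdir"/*.txt
```

Code tuned until it scores perfectly on the first two files can still lose marks on the third, which is the situation the course marking put you in.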

All in all, an excellent course that I'm glad I did.  From what I hear of the other courses, they're not as deeply involved as the NLP course so I may well give another one a go in the future but for now I need to get a little of my life back and have a well earned rest from education.

Tuesday 22 May 2012

New PC Install Notes


This post documents how I installed Linux and Windows side-by-side in a dual boot configuration.  This isn't something particularly difficult to do but I wanted to note down what I did so I remember in years to come.  Also, my new PC build contains some very up-to-date hardware (such as UEFI and an SSD drive) and with the combination of the many changes and updates to Fedora 17 (F17) and Windows 7 SP1 (Win7) it made the installation a first of a kind for me in quite a few different ways.

This post may well be updated in the future if I do further installation tweaks and optimisations to the system.  There's also clearly a lot more you can do in terms of system set up than I've written here, installation of drivers in Win7 very much being one of those.  This is simply intended as an installation and base optimisation guide.

Create USB Boot Sticks

The DVD writer I ordered turned out to be faulty, so it's on its way back to the retailer for replacement.  Not one to let this stop an installation going ahead, I wrote install images for F17 and Win7 onto USB sticks.

F17 is still in beta with the full release delayed until 29th May at the time of writing.  I tried writing and booting from the KDE Spin Beta Live image but it stopped with a kernel panic during the boot process after the grub screen.  I found that using a later copy of the boot.iso (which is the same as a netinst.iso) from the distribution nightly builds solved the problem, so clearly whatever the bug is in the beta image, it's already been fixed.

Creating a USB stick for Fedora is pretty easy these days with no messing around on the command line.  I decided to use the liveusb-creator utility, which fires up a GUI from which you basically just select the iso image you want to write and the USB device to write to, then hit "go" and it does the magic for you.

For me, writing the Windows stick was a little more complicated as it requires a Windows system to run the stick writing utility, which has the somewhat strange and long-winded name of the Windows 7 USB/DVD Download Tool.  Fortunately, I had a Windows 7 VM lying around so I used that, and writing the image to the USB stick was just as easy as it was for Fedora.  Simply choose the image file to write, the target device to write it on, then hit "go" and it does the magic for you.

Install Fedora 17

Once I worked around the kernel panic bug I mentioned above and booted with a later F17 image, the installation process was a fairly familiar affair.

With the new PC being a UEFI box there was no need to add the GPT partition when partitioning the SSD. I simply created one large root partition and a 4GB swap partition, leaving half the disk unallocated for the Win7 install later.

Since I didn't boot from the KDE spin image, I swapped out the default Gnome desktop (I've tried hard for a long time to like Gnome 3 but I just don't, so I'm moving back to KDE) and did a lot of package selection using the installer to save me messing around later but also to optimise the amount of data downloaded since I would be installing over my ADSL connection.

One tip when installing Windows after Linux these days is to ensure the os-prober RPM gets installed.  This was done by default for me.  This package allows grub to detect the presence of other operating systems and add boot entries for them in the grub menu.  It'll come in very handy later on...!

SSD Optimisations for Linux

Even with a distribution as current as Fedora 17, the settings chosen for SSD usage are really rather sub-standard.  There are a lot of handy tips and guides out there for which settings you should change or add to the system to enhance both the performance and longevity of your SSD.  I decided to go with the rather comprehensive information provided for the Arch Linux distribution as they have a great wiki page on SSD optimisation.


Update /etc/fstab

I've only got one SSD in my system at the moment, I've removed all my old hard disks with no intention to use them right now as all my data is stored on my NAS.

Add the "noatime" and "discard" options to SSD partitions.  The discard option is the really super-important one as it turns on TRIM.
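
To show where those options go, here's a minimal sketch that adds them to the fourth (mount options) field of an fstab entry.  The UUID, the ext4 file system and the existing "defaults" option are assumptions made up for the example, and it works on a sample string rather than the real /etc/fstab.

```shell
# Illustrative fstab entry; the UUID and ext4 file system are assumptions.
entry='UUID=0a1b2c3d-0000-0000-0000-000000000000 / ext4 defaults 1 1'

# Append noatime and discard to the fourth (mount options) field.
updated=$(printf '%s\n' "$entry" | awk '{ $4 = $4 ",noatime,discard"; print }')
echo "$updated"
```

Once the real entry has been edited and the system rebooted, something like findmnt -no OPTIONS / should list the options actually in use for the root file system.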

Mount /tmp as tmpfs by adding a line similar to the following:
  tmpfs /tmp tmpfs nodev,nosuid,size=75% 0 0

Doing this allows the system to write temporary files to RAM instead of the SSD.  This will improve performance (RAM is faster than SSD) and reduce writes to the SSD (improving the life of the SSD).  I've added an option to increase the allowable size of the tmpfs file system to 75% of RAM from the default of 50%.  This means I can run compilations or other intensive tasks in RAM without ruining the SSD and get the performance benefits too.  With 16GB main RAM, this will allocate up to 12GB for my /tmp directory.  In normal usage I wouldn't expect to use anywhere near this but it's nice to have it there for when I'm not running too many apps but doing something else such as compiling RPMs.
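
The sizing arithmetic is simple enough to sketch.  The tmpfs_size_gb helper below is a hypothetical name of my own, and it just works out the ceiling a given percentage puts on /tmp for a given amount of RAM.

```shell
# Hypothetical helper: upper limit of a tmpfs in GB,
# given total RAM in GB and the size= percentage from fstab.
tmpfs_size_gb() {
  echo $(( $1 * $2 / 100 ))
}

tmpfs_size_gb 16 75   # the size=75% setting used above
tmpfs_size_gb 16 50   # the default of 50%
```

Bear in mind this is only a limit, as tmpfs uses RAM for what is actually stored in it.  After a reboot, df -h /tmp shows the limit that has been applied.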

Change the Scheduler

The recommendation for an SSD is to move away from the default cfq IO scheduler and switch over to the noop scheduler.  This can be done in a variety of ways that are documented in the Arch Linux wiki page I've linked above.  Since I've only got 1 disk in my machine and it's an SSD I changed the scheduler option using grub.

For Fedora 17 this consisted of an edit to the /etc/default/grub file and adding elevator=noop to the existing GRUB_CMDLINE_LINUX stanza.  Then rebuilding the grub configuration with the command:
grub2-mkconfig > /boot/grub2/grub.cfg
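
As a sketch of what that edit looks like, the snippet below appends elevator=noop to a sample GRUB_CMDLINE_LINUX line and then picks the active scheduler out of a sample of the sysfs output.  The existing kernel options, the sample sysfs text and the active_scheduler helper name are all assumptions for illustration.

```shell
# Sample line from /etc/default/grub; the existing options are an assumption.
line='GRUB_CMDLINE_LINUX="rd.md=0 rd.lvm=0 quiet rhgb"'

# Append elevator=noop just inside the closing quote.
new_line=$(printf '%s\n' "$line" | sed -E 's/^(GRUB_CMDLINE_LINUX=".*)"$/\1 elevator=noop"/')
echo "$new_line"

# Hypothetical helper: the active scheduler is the name shown in square brackets.
active_scheduler() {
  sed -E 's/.*\[([^]]+)\].*/\1/'
}
echo 'noop deadline [cfq]' | active_scheduler   # sample sysfs content
```

On the real system, and assuming the SSD is sda, the current setting can be read from /sys/block/sda/queue/scheduler after rebooting.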

Optimise Kernel Parameters

With 16GB main RAM I'm not expecting to do much in the way of swapping.  However, I did add a couple of lines to /etc/sysctl.conf to make the system less likely to do so:
  # make the system less likely to swap
  vm.swappiness=1
  vm.vfs_cache_pressure=50
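
A quick way to sanity-check settings like these is to confirm that each key really exists under /proc/sys, since a misspelt key name will not take effect.  This is a minimal sketch that assumes a Linux system; sysctl_path is a hypothetical helper name and the values printed will be whatever is currently active on the machine.

```shell
# Hypothetical helper: map a sysctl key (vm.swappiness) to its /proc/sys path.
sysctl_path() {
  echo "/proc/sys/$(echo "$1" | tr . /)"
}

# Report the current value of each key, or flag it if the kernel doesn't know it.
for key in vm.swappiness vm.vfs_cache_pressure; do
  path=$(sysctl_path "$key")
  if [ -r "$path" ]; then
    echo "$key = $(cat "$path")"
  else
    echo "$key: no such key"
  fi
done
```

After editing /etc/sysctl.conf, running sysctl -p as root loads the file without needing a reboot.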


Install Windows 7

Similarly to the F17 install, the Win7 install proceeded with little in the way of drama.  I did select the advanced install option when given the chance but that was just to ensure Win7 installed into the free space I had left on the SSD rather than rudely splat my shiny new F17 installation.

And yes, for those of you wondering why I'm installing Win7 at all, especially dual boot rather than in a VM, it's simply for those times when only Windows will do... so gaming mostly I should think.

There's not much else to do with Windows after installation.  Unlike current Linux distributions it's said to detect the presence of an SSD and apply the appropriate optimisations automatically.  Whether this is entirely true or not I suspect I'll never know.

Update the Boot Loader

So Win7 didn't kill off my F17 installation, which is always a bonus, but it does assume divine rights over the entire system and writes its boot loader all over the existing grub installation, allowing you only to boot Windows with no menu options for anything else - not ideal.  Now I need to re-install grub, which I've always found easiest to do by booting a rescue Linux CD and using a chroot to the installed OS.  This has changed slightly with the inclusion of grub 2 so here's what I did.

Boot the F17 USB stick once again but this time choose the "troubleshooting" boot option and select to boot the "rescue environment".  Red Hat based systems have always done a great job of rescue environments so you'll boot into a text based system that asks you if you want various things turned on in the rescue environment such as networking (although they assume you want that these days) and if you want to attempt detection of installed Linux systems (which you do).

Drop out to the shell prompt and chroot to /mnt/sysimage.  Then rebuild the grub config (this is where os-prober, installed above, comes in very handy) to include an entry for booting Win7.  Then re-install grub.  Job done.  Once you've booted to the rescue shell, the commands are something along the lines of:

  chroot /mnt/sysimage
  grub2-mkconfig > /boot/grub2/grub.cfg
  grub2-install /dev/sda
  exit
  exit
  reboot

This assumes you want to install grub to /dev/sda of course.  You'll need to exit (or ctrl+d) twice, once to get out of the chroot and once more to return to the rescue interface.  Then choose to reboot from there as it'll cleanly unmount your file systems and do a better job of clearing up than if you simply rebooted the system yourself.
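
Before rebooting, it can be reassuring to check that the regenerated config really does contain a Windows entry.  The sketch below pulls the menu entry titles out of a grub.cfg-style file.  The sample file contents and the list_entries helper name are made up for illustration; on the real system you'd point it at /boot/grub2/grub.cfg from inside the chroot.

```shell
# Hypothetical helper: print the title of each menuentry in a grub.cfg-style file.
list_entries() {
  sed -n -E "s/^[[:space:]]*menuentry '([^']+)'.*/\1/p" "$1"
}

# Made-up sample standing in for /boot/grub2/grub.cfg.
sample=$(mktemp)
cat > "$sample" <<'EOF'
menuentry 'Fedora Linux' --class fedora --class gnu-linux {
}
menuentry 'Windows 7 (loader) (on /dev/sda3)' --class windows {
}
EOF

list_entries "$sample"
```

If no Windows entry shows up in the real file, it's worth checking that os-prober is installed before running grub2-mkconfig again.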

Sunday 13 May 2012

New PC Build

It's been quite some time (about eight years) since I last built a PC for myself and I've been promising to build a new one for quite a while, now I've finally got round to ordering some parts.  Not that I need to, but I've justified the expense as a treat to myself after our annual bonus came through, having never really spent the bonus on anything particularly exciting before.

This is a short post detailing what I've gone for and why, not to show off, but so in years to come I can look back (as I have before) on what was available at the time and I'll have a record I can get to should I forget the detail of which parts are in my current PC.

I'm still the sort of geek that likes to build my own computers.  I like to do the research and put them together but it's still definitely the best way to get a good deal and the only way to get each component to be the exact one you want.  I also still like to have a PC for some reason, I have a laptop for work so I've already got something mobile and a PC just feels like the right solution for home use.

So with no further ado, on order are:

Case: I'm sticking with my old Antec SLK3700 case, okay so strictly speaking this isn't on order.

Processor: Intel Core i5-3570K (£175)
Sandy Bridge was a real game changer when it came out and firmly handed power back to Intel in the processor market.  I've been waiting for Ivy Bridge to come out for a while, either so I could buy a cheaper Sandy Bridge or plump for the Ivy Bridge if the price difference wasn't too great.  Well, those who know their processors will see I've gone for the Ivy Bridge option.  Currently it's only 20-odd quid more than the equivalent Sandy Bridge processor and I think the extra expense is worth it to get the better on-chip graphics capabilities and minor improvements in speed and energy efficiency.

Motherboard: Asus P8Z77-V LX (£94)
I'm a sucker for a good Asus board.  I've used them in most of my PC builds so that's where I start looking when I want a new machine.  I nearly went for the LE version of this board, which was another 20 quid, but I decided I wouldn't be using the extra features it provides (surround-sound audio and extra SATA sockets) so I pocketed the difference and went for this one instead.

Memory: Kingston (4x4GB) DDR3 1600MHz XMP HyperX (£65)
I very nearly went for some Corsair low profile DIMMs but was swayed by the vendor support list for DIMMs for the motherboard chosen above.  I'm still in shock that you can get 16GB RAM for 65 quid!

Storage: Intel 120GB 520 Series SSD (£140)
I've got 4 hard disks in my PC at the moment, one of which is not connected, another is hardly used and the remaining 2 are coupled together in a striped RAID array in an attempt to get some sort of speed out of them.  Hard disks really are the bottleneck in your PC these days so I've decided to shove in a top notch SSD from Intel.  They're the only manufacturer to offer a 5 year warranty and also came highly recommended (under the Hitachi brand, Intel and Hitachi work together on their SSDs) from one of my respected colleagues in the storage department at work.  120GB should be ample for my storage requirements on the PC.  The disk will be split to dual boot both Windows 7 64-bit (for those occasions when only Windows will do and a bit of gaming) and the mainstay of Fedora x86_64.  My data other than the operating system already lives on a NAS.

PSU: CoolerMaster 600W Silent Pro Modular (£63)
Unfortunately my current 620W ATX power supply only has a 4 pin CPU connector rather than the currently common 8 pin EPS so I've begrudgingly had to fork out for a new PSU.  This CoolerMaster one gets some decent reviews for being quiet, efficient and delivering good consistent power within the tolerances required by ATX.  The 600W rating will also leave me with some overhead should I decide to whack in a high-end GPU at some point in the future.

Thermal Paste: Arctic Silver 5 (£5)
I've got some really old unbranded thermal paste knocking around somewhere but decided to invest in some decent stuff for this build so the Arctic Silver was the way to go.

DVD Writer: Samsung S222AL 22x with Lightscribe (£15)
I've got 2 optical drives at the moment, both are IDE and with the new breed of motherboards (or not necessarily even the ones that are particularly new) IDE is long since dead so I've opted to get a SATA DVD writer for this build.  Similar to the memory, I'm amazed you can pick something like this up for 15 quid!

Front Case Ports: Akasa USB 3.0 Card Reader (£20)
My current case has a couple of USB ports on the front, I thought it would be useful to throw a couple of USB 3.0 ports to the front of the case too.  This unit also has a built-in multi-card reader so I'll no longer have to hunt for my USB SDHC reader every time I want to copy pictures from my camera.

Case Fan: Antec TrueQuiet 120 (£7)
A 12cm fan for the front of my case, I've got a slot to fit another one in so I thought why not given the heat output I'd expect from this build.

Keyboard and Mouse: Logitech Desktop MK120 (£13)
My current keyboard is PS/2 and has seen better days.  I still like a simple keyboard with none of these funny curves or multimedia keys you can get these days so went for this cheap set from Logitech.
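
For the record, the prices listed above can be totted up with a quick bit of shell arithmetic.  The figures are taken straight from the list, and the case is left out as it's being reused rather than bought.

```shell
# Prices in pounds, in the order listed above (case excluded as it's reused).
total=0
for price in 175 94 65 140 63 5 15 20 7 13; do
  total=$((total + price))
done
echo "Total: £${total}"   # prints "Total: £597"
```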

The obvious gap in this build is the lack of a GPU.  As I mentioned above, I've left overhead in the power supply to put in a GPU in the future should I choose to do so.  I'm going to do this build and run on the GPU built into the processor.  I'll be interested to see what the performance of the HD 4000 is for my needs, if it's sufficient then great, otherwise I'll be tempted towards an NVidia GTX-560 card.  I guess it all depends on if I do a little more gaming than I do currently (which is next to none on the PC) and whether the HD 4000 is up to the job.

The other thing I've got my eye on is an up-rated cooler from the stock cooler supplied with the 3570K.  I quite like the look of the Corsair H60 Hydro should the need arise.