Wednesday 2 December 2020

Open Sourcing a NetworkManager VPN Plugin

It's not every day I find myself publishing a new project to open source and even less so when that requires release approval at work.  I hope, over the years, I've written some useful bits and pieces and this time around I was keen to publish my work on the Internet rather than internally within the company.  This requires following due process of course and seeking the relevant approval for the publication to take place.

Fortunately, in the right circumstances, IBM are very amenable to releasing code to open source.  I was convinced enough that a NetworkManager plugin to add to the existing list of VPN plugins would not conflict with the business that an open source approval would be fairly trivial.  Happily, I was correct, and going through the process wasn't too arduous with a few forms to fill in.  These were, of course, designed for much bigger releases than I was planning, so vastly over-engineered for this particular case, but at least due diligence was applied.

On to the project and the code.  It's not a world-changer but a small VPN plugin for NetworkManager to drive Cisco AnyConnect, made available as NetworkManager-anyconnect on GitHub.  So I now know more than I'd care to mention about the inner workings of NetworkManager VPN plugins.  They're not very well documented (hardly documented at all in fact) so they're quite hard work to produce by looking over the existing code in available plugins.  I started off from the OpenVPN plugin, which turned out to be a mistake as its code base is vastly bigger than that required for a plugin as simple as the one I wanted to write.  Were I to start again, I would recommend starting from the SSH VPN plugin instead as this is actually very nicely set out and doesn't include a lot of the shared bloat that comes with other plugins that are formally a part of NetworkManager.


Wednesday 3 April 2019

Helping Disabled Passengers Travel with Confidence

This is a reproduction of a blog post originally made on the IBM Emerging Technology blog at https://www.ibm.com/blogs/emerging-technology/helping-disabled-passengers-travel-with-confidence/ (link now dead).  Original article published on 3rd April 2019, re-publication to this blog was on 11th June 2021.

Introduction

Disabled or disadvantaged passengers have every right to fear travelling, with multiple high-profile cases of unfair and appalling levels of service evident in the media, particularly for rail passengers. Our team have been tackling this problem recently in collaboration with the IBM Travel and Transport team, with 50% funding through Innovate UK’s call for Accelerating Innovation in Rail (round 4). The project was the brainchild of Sam Hopkins and won the internal IBM Hybrid Hackathon event in 2016.

The Problem

The rail industry in the UK is a complex one and travelling on our railways can be a difficult experience for anybody. If you’re disabled then the experience can be difficult at best and demeaning or impossible at worst. Depending on the type and severity of their disability, disabled travellers need to plan for eventualities such as whether a station is fully accessible, whether and when staff are available to help, when and where they might be able to use a toilet, and how and where to board and alight the train; the list is as long as the range of issues people may have. Today, assistance is on offer, but arranging it is complex due to the nature of our fragmented system, and passengers (or their carers) may need to contact multiple companies 24 hours in advance of travel when planning their journey. This process of organising their journey is extremely time consuming and can be as much of a frustration as the journey itself.

The Challenges 

The UK rail industry is highly fragmented with different parts of the operation split among different companies. For example, Network Rail oversee the infrastructure such as maintaining tracks and signalling; Rolling Stock Operating Companies (ROSCOs) own the trains; and Train Operating Companies (TOCs) operate the trains and run passenger services in different regions around the UK. In order for a disabled passenger to make a journey, the Train Operating Companies are legally obliged to provide support both off the train at the station and on board the train during the journey. However, a passenger may be travelling through a large number of regions and across service providers, and hence needs to understand which company will be responsible for their journey at any given stage. Understanding this takes quite a bit of research and can be difficult. It also requires that everything they need on their journey lines up across the network they’re travelling on, and this is certainly not guaranteed on our network today.

Our Solution

We have created a mobile application for disabled passengers, supported by an enterprise grade back end architecture. During the project we established a principle that the disabled passenger should always be connected to a member of staff. This connection provides continuous support available whenever required by the passenger and is designed to help alleviate the anxiety felt by this group of passengers when travelling. The connection comes in the form of a chat application similar to how passengers may already use common applications such as WhatsApp or Facebook Messenger. The passenger can type anything they want to the member of staff they’re connected to and get a human-level response to their query from staff local to their situation. During their journey, the passenger is handed over between staff members such that they’re always connected with a local staff member who can offer physical assistance if required.

A typical journey story starts with the user programming their journey in a similar way to how they might on existing rail applications. They search for their departure and destination stations and select a time and route they wish to travel. Some time later, they embark upon their journey. When they arrive at the station, they either press a button to start the process or the app uses geolocation to detect their arrival at the station. They are connected to a member of the station staff who can answer questions and provide physical assistance to board their first train. Once boarded, the station staff hand the passenger over to a member of staff on board the train. This hand-over process is transparent to the user and they will simply see a new greeting from the next member of staff in the chain once they have been connected. The process continues with the on train staff handing over to station staff at the destination station in order to provide assistance alighting the train. This process can repeat as many times as necessary until the final destination is reached.

Technical Implementation 

Our solution has a strong server-side implementation with an MVP level front end to exercise the APIs provided.

We base the solution entirely within the IBM Cloud, using a Cloudant instance for database hosting, App ID for authentication services, Message Sight for highly reliable messaging built on top of the MQTT protocol, and NodeJS for writing our APIs on top of the Express framework and where necessary calling out to other APIs such as the Transport API. Security and privacy have been key concerns in the design to ensure chat messages are securely delivered only to their intended recipient. The application itself is authenticated at each of its routes and all of the server-side APIs also use the same App ID authentication. Novel to this solution is the capability of third-party authentication from Message Sight to App ID, a first-of-a-kind implementation.
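
Since Message Sight speaks standard MQTT, the passenger-to-staff chat connection can be pictured with a few lines of client code. The sketch below uses the open source mqtt.js client; the broker URL, credentials, topic names and message shape are my own illustrative assumptions rather than details of the real solution.

// Minimal sketch of passenger/staff chat over MQTT using the mqtt.js client.
// Broker URL, credentials, topic names and payload fields are illustrative assumptions.
const mqtt = require('mqtt')

const client = mqtt.connect('mqtts://broker.example.com:8883', {
  username: 'passenger-123',        // hypothetical identity issued after App ID login
  password: 'token-from-app-id'
})

client.on('connect', () => {
  // Each passenger listens on their own topic; a staff hand-over simply changes
  // which staff member publishes to it, so the passenger sees one continuous chat.
  client.subscribe('chat/passenger-123/inbound')

  // Send a message to whichever staff member is currently connected
  client.publish('chat/passenger-123/outbound', JSON.stringify({
    from: 'passenger-123',
    text: 'I need help boarding at platform 4',
    sentAt: new Date().toISOString()
  }))
})

client.on('message', (topic, payload) => {
  console.log(`Message on ${topic}: ${payload.toString()}`)
})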

The front end application is currently written in VueJS. This has been designed with accessibility in mind given the intended audience. It is a simple interface conforming to web accessibility standards that is compatible with screen readers. This component was never intended to be the final solution and we see another interface being developed in the future that is likely to build upon the current one. For example, a fully native solution or perhaps one written to be more native-like using technologies such as NativeScript.

Moving Forward 

The solution is complete and ready, subject to customisation, to be adopted by the UK train operating companies. However, we do see the need for an improved user interface to be developed before the solution is truly ready to be used in the wild.

Further to the current solution for the rail industry, we recognise the issues faced by disabled travellers don’t stop with the UK rail system. Similar issues are faced when using our road network, taxi services, ferries and of course our airports. The principle established within this project of alleviating anxiety through the connection to a supporting human member of staff is likely to remain. We hope this is extended into the other areas we’re considering with the next most obvious choice to tackle being the airport use case.

Monday 5 November 2018

VueJS Example for IBM App ID

I was recently working on a project in VueJS that needed an authorisation layer added to it.  It turns out there aren't any existing examples of how to do this anywhere, unusually not even on Stack Overflow.  So I set about writing one and thought I would share it.  My work was based upon some other useful examples and information, particularly a blog post from the IBM Cloud blog.

Before I go any further, the code samples are available and documented on GitHub as follows:

  1. IBM App ID API Server
  2. App ID VueJS Client

The code is deliberately split into two such that:
  1. the API Server is used to demonstrate how to secure an API on the server side.  This is done with the WebAppStrategy of App ID, which is simply an implementation of a strategy package for passportjs (see the sketch below).  The code here isn't anything particularly new over existing examples you can find on the web but it's necessary in order to fully demonstrate the capabilities of the client code.
  2. the VueJS Client is used to demonstrate two things:
    1. how to secure a VueJS route for which I can currently find no example implementations on the web
    2. how to call an API that has been secured by App ID by passing credentials through from the client application to the API server
The API Server should be relatively trivial to get up and running as it's a standard NodeJS API implementation using Express.  If you refer to the WebAppStrategy and the blog post I mention above then you'll see the sample code I've come up with is broadly the same i.e. an amalgamation of the two.
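
To give a feel for the shape of that server-side code, here's a minimal sketch of an Express API protected by App ID's WebAppStrategy through passportjs. It's hand-written for illustration based on the public ibmcloud-appid SDK rather than copied from the sample repository: the session secret, redirect URI and the user profile attribute returned are placeholder assumptions, and it leaves out the /auth/logged status endpoint that the client code further down calls.

const express = require('express')
const session = require('express-session')
const passport = require('passport')
const WebAppStrategy = require('ibmcloud-appid').WebAppStrategy

const app = express()
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true }))
app.use(passport.initialize())
app.use(passport.session())

passport.serializeUser((user, cb) => cb(null, user))
passport.deserializeUser((user, cb) => cb(null, user))

// App ID credentials are normally picked up from the bound service instance
// (or environment); only the redirect URI is given explicitly here.
passport.use(new WebAppStrategy({ redirectUri: 'http://localhost:3000/auth/callback' }))

// Kick off and complete the App ID login flow
app.get('/auth/login', passport.authenticate(WebAppStrategy.STRATEGY_NAME))
app.get('/auth/callback', passport.authenticate(WebAppStrategy.STRATEGY_NAME))

// Anything behind this middleware requires an authenticated session
app.get('/protected/get-some-info',
  passport.authenticate(WebAppStrategy.STRATEGY_NAME),
  (req, res) => res.json({ hello: req.user ? req.user.name : 'world' })  // profile attribute is an assumption
)

app.listen(3000)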

The VueJS Client code can be simple to get up and running as well but it's probably more important to understand how it was created such that you can apply the same principles in your own application(s).  For this then, the explanation is a little longer...

Start by running the VueJS command line interface (CLI) to create a bare project; for the sample to make sense you will need to add the VueX and Router components using the tool:
vue create vue-client
Then understand the 3 modifications you need to make in order to have a working set of authenticated routes.

1. A store for state. 
It doesn't really matter how you achieve this in VueJS; you can use any form of local state storage.  The example code I have come up with uses VueX and a modification to the store.js code you get from the CLI above.  The idea of this is such that the client application can cache whether the user has already authenticated themselves.  If they have not, then the client must request authentication via the server.  If they have, then all the credentials required for making an authenticated call to a server-side API are already available in the browser.  Essentially, this is a speed-up mechanism that stops the client from requesting credentials on each API call, since the session store for the authentication actually lives on the server side when using App ID.
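
As an illustration, the cached state can be as small as a logged flag plus the user profile. The snippet below is a minimal sketch of the kind of store.js modification described above (it matches the setUser mutation and user.logged flag used by the navigation guard later in this post), not the exact code from the repository.

// Minimal Vuex store caching the authentication state on the client side.
import Vue from 'vue'
import Vuex from 'vuex'

Vue.use(Vuex)

export default new Vuex.Store({
  state: {
    // Whether we already know this user has an authenticated server-side session
    user: { logged: false }
  },
  mutations: {
    // Called by the navigation guard once the server confirms the login
    setUser (state, user) {
      state.user = user
    }
  }
})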

2. A new VueJS Component
This is the component whose route is to be protected via authentication.  In the case of the example code below, the standard vue cli "About" component has been used and modified slightly to include an authenticated call to the server API.  The thing to note here is that the credentials from the client side must be sent over to the server with each API call.  Using the fetch API as below to implement your GET request means you have to add the credentials: 'include' option.

<template>
  <div class="about">
    <h1>This is a protected page</h1>
    <h2>hello: {{ hello }}</h2>
  </div>
</template>

<script>
export default {
  data: function () {
    return {
      hello: undefined
    }
  },
  computed: {
    user () {
      return this.$store.state.user
    }
  },
  methods: {
    getProtectedAPI () {
      // credentials: 'include' sends the session cookie so the server-side
      // App ID WebAppStrategy can authenticate this request
      fetch('http://localhost:3000/protected/get-some-info',{
            credentials: 'include',
          }).then(res => res.text())
          .then(body => {
            console.dir(body)
            this.hello = JSON.parse(body).hello
          })
    },
  },
  created() {
    this.getProtectedAPI()
  }
} 
</script>

3. A VueJS Navigation Guard
You need to write a function that will be added as a piece of VueJS middleware upon each route change.  The middleware is inserted automatically by the Vue Router when using the beforeEnter call on a route.  This is known in VueJS as a Navigation Guard.

function requireAuth(to, from, next) {
  // Testing authentication state of the user
  if (!store.state.user.logged) {
    // Not sure if user is logged in yet, testing their login
    const isLoggedUrl = "http://localhost:3000/auth/logged"
    fetch(isLoggedUrl, {credentials: 'include'}).then(res => res.json()).then(isLogged => {
      if (isLogged.logged) {
        // User is already logged in, storing
        store.commit("setUser", isLogged)
        next()
      } else {
        // User is not logged in, redirecting to App ID
        window.location.href=`http://localhost:3000/auth/login?redirect=${to.fullPath}`
      }
    }).catch(e => {
      // TODO: do something sensible here so the user sees their login has failed
      console.log("Testing user login failed - D'oh!")
    })
  } else {
    // User already logged in
    next()
  }
}

The requireAuth function does the following in plain English:

  1. Using the VueJS client side cache, test if the user is already logged in
  2. If they are not, then ask the server if the user is already logged in
    1. If they are not, then redirect them to the server login page
    2. If they are, then cache the information and load the next piece of middleware
  3. If they are, then simply load the next piece of middleware


Each route you want to protect with the above function must have a beforeEnter: requireAuth parameter specified on the route.  When this is done, VueJS will call the requireAuth function before the component specified by the route is loaded.

{
  path: '/protected',
  name: 'protected',
  beforeEnter: requireAuth,
  component: Protected
}

Note: there are methods by which you don't have to call window.location.href to redirect the user to the login page (which does seem like a bit of a nasty hack).  However, these methods require the modification of the webpack configuration and so were kept out of scope of this example for the purposes of being simple.

Sunday 23 April 2017

New Thinkpad P50

It's been a while, but true to our 4 year hardware refresh cycle, I've just received my latest laptop - a Lenovo P50.  I've been installing it with Fedora 25 since Friday and configuring and copying data over this weekend ready to swap laptops first thing this week.  I'm looking forward to trying out the new machine although I'm not quite sure why, as the specs are barely different from the machine I was given 4 years ago.  It's certainly the best indication I've personally experienced yet of Moore's Law coming to a complete halt, with many of the other specifications not improving a huge amount either.  The two most noticeable differences are likely to be the more powerful graphics chip and the inclusion of an SSD.  That said, there is twice as much RAM in this machine, and I had upgraded my previous machine with an SSD as well so that particular upgrade isn't going to be noticeable for me at least.

My previous machine was a W530 and the one I had before that was a T61p (with a T41p before that) and so I'm well used to this particular line of Thinkpad laptops.

Here are the specifications of the machine I've got; as ever there are variants of the P50, so if you have one or are thinking of getting one the specifications could be a little different but will be broadly similar to this:

  • Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz (5433.79 bogomips in Linux)
  • 32GB DDR4 2133MHz
  • Samsung MZNLN512 (PM871) 512GB SSD
  • 15.6" 1920 x 1080 IPS (non-touch)
  • 6 Cell Battery
  • Wireless A/C
  • NVIDIA Quadro M1000M 4 GB
  • Front Facing Web Cam, Mini Display Port, HDMI Out, Headphone, 4x USB3, Smart Card Reader, GBit Ethernet, Thunderbolt, Fingerprint Reader
So looking at those and comparing in more detail to what I had before, it seems my gut feeling was pretty good.  The CPU benchmarks are more-or-less exactly the same and certainly within tolerances of error, as well as other performance increases that will affect the benchmarks such as the memory clock speed.  Here's the comparison between the W530 CPU and the P50 CPU:


The same can't be said of the GPU benchmarks though so it looks like GPUs are continuing to gain in power even when CPU speed increases have run out of steam:

The other noticeable difference I hadn't spotted before is the battery size.  That's very apparent when you pick the machine up as it's actually a little bit thinner (probably also due to the lack of DVD/combo drive) as well as not as deep i.e. it doesn't have the big battery sticking out of the back that has been commonplace on this line of Thinkpad machines over the past decade or so.  I'm guessing (without having done any research on the matter) that this is probably due to improvements in battery technology, so I'd think Lenovo have probably moved over to Li-ion or Li-po batteries.

In terms of running and using the machine, it does seem very nice so far as one might expect.  It's running Fedora 25 very nicely and hasn't caused me any issues at all during setup.  I'm not really expecting any either as most if not all of the hardware seems pretty well supported by Linux these days.  I think, in fact, Lenovo even offer to supply this machine pre-installed with Linux if you want.  That said, there looks to be one possible sticking point in terms of hardware support at the moment but this is very minor.  That is, the built-in fingerprint reader doesn't seem to have a driver available on Linux yet.  I did some very brief research into this yesterday and it's not clear why vendor support is lacking for the device at the moment, although I did find at least one effort that has gone a fairly long way towards reverse engineering it and starting to write a driver, so I would guess within the next year we'll see some sort of support for the fingerprint reader too.

All in all then it's a good machine even though it's not a huge upgrade over my 4 year old laptop!

Friday 14 November 2014

Tackling Cancer with Machine Learning

For a recent Hack Day at work I spent some time working with one of my colleagues, Adrian Lee, on a little side project to see if we could detect cancer cells in a biopsy image.  We've only spent a couple of days on this so far but already the results are looking very promising with each of us working on a distinctly different part of the overall idea.

We held an open day in our department at work last month and I gave a lightning talk on the subject which you can see on YouTube:


There were a whole load of other talks given on the day that can be seen in the summary blog post over on the ETS (Emerging Technology Services) site.



Monday 22 April 2013

Speech to Text

Apologies to the tl;dr brigade, this is going to be a long one... 

For a number of years I've been quietly working away with IBM research on our speech to text programme. That is, working with a set of algorithms that ultimately produce a system capable of listening to human speech and transcribing it into text. The concept is simple: train a system for speech to text - speech goes in, text comes out. However, the process and algorithms to do this are extremely complicated from just about every angle you look at it – computationally, mathematically, operationally, and in terms of evaluation, time and cost. This is a completely separate topic and area of research from the similar sounding text to speech systems that take text (such as this blog) and read it aloud in a computerised voice.

Whenever I talk to people about it they always appear fascinated and want to know more. The same questions often come up. I'm going to address some of these here in a generic way, leaving out those that I'm unable to talk about. I should also point out that I'm by no means a speech expert or linguist but have developed enough of an understanding to be dangerous in the subject matter, and that (I hope) allows me to explain things in a way that others not familiar with the field are able to understand. I'm deliberately not linking out to the various research topics that come into play during this post as the list would become lengthy very quickly and this isn't a formal paper after all; Internet searches are your friend if you want to know more.

I didn't know IBM did that?
OK so not strictly a question but the answer is yes, we do. We happen to be pretty good at it as well. However, we typically use a company called Nuance as our preferred partner.

People have often heard of IBM's former product in this area called Via Voice for their desktop PCs which was available until the early 2000s. This sort of technology allowed a single user to speak to their computer for various different purposes and required the user to spend some time training the software before it would understand their particular voice. Today's speech software has progressed beyond this to systems that don't require any training by the user before they use it. Current systems are trained in advance in order to attempt to understand any voice.

What's required?
Assuming you have the appropriate software and the hardware required to run it on then you need three more things to build a speech to text system: audio, transcripts and a phonetic dictionary of pronunciations. This sounds quite simple but when you dig under the covers a little you realise it's much more complicated (not to mention expensive) and the devil is very much in the detail.

On the audio side you'll need a set of speech recordings. If you want to evaluate your system after it has been trained then a small sample of these should be kept to one side and not used during the training process. This set of audio used for evaluation is usually termed the held out set. It's considered cheating if you later evaluate the system using audio that was included in the training process – since the system has already “heard” this audio before, it would have a higher chance of accurately reproducing it later. The creation of the held out set leads to two sets of audio files: the held out set and the majority of the audio that remains, which is called the training set.

The audio can be in any format your training software is compatible with but wave files are commonly used. The quality of the audio both in terms of the digital quality (e.g. sample rate) as well as the quality of the speaker(s) and the equipment used for the recordings will have a direct bearing on the resulting accuracy of the system being trained. Simply put, the better quality you can make the input, the more accurate the output will be. This leads to another bunch of questions such as but not limited to “What quality is optimal?”, “What should I get the speakers to say?”, “How should I capture the recordings?” - all of which are research topics in their own right and for which there is no one-size-fits-all answer.

Capturing the audio is one half of the battle. The next piece in the puzzle is obtaining well transcribed textual copies of that audio. The transcripts should consist of a set of text representing what was said in the audio as well as some sort of indication of when during the audio a speaker starts speaking and when they stop. This is usually done on a sentence by sentence basis, or for each utterance as they are known. These transcripts may have a certain amount of subjectivity associated with them in terms of where the sentence boundaries are and potentially exactly what was said if the audio wasn't clear or slang terms were used. They can be formatted in a variety of different ways and there are various standard formats for this purpose from an XML DTD through to CSV.

If it has not already become clear, creating the transcription files can be quite a skilled and time consuming job. A typical industry expectation is that it takes approximately 10 man-hours for a skilled transcriber to produce a well formatted transcription of 1 hour of audio. This time, plus the cost of collecting the audio in the first place, is one of the factors making speech to text a long, hard and expensive process. This is particularly the case when put into context that most current commercial speech systems are trained on at least 2000+ hours of audio, with the minimum recommended amount being somewhere in the region of 500+ hours.

Finally, a phonetic dictionary must either be obtained or produced that contains at least one pronunciation variant for each word said across the entire corpus of audio input. Even for a minimal system this will run into tens of thousands of words. There are, of course, already phonetic dictionaries available, such as the Oxford English Dictionary, which gives a pronunciation for each word it contains. However, this would only be appropriate for one regional accent or dialect without variation. Hence, producing the dictionary can also be a long and skilled manual task.

What does the software do?
The simple answer is that it takes audio and transcript files and passes them through a set of really rather complicated mathematical algorithms to produce a model that is particular to the input received. This is the training process. Once the system has been trained, the model it generates can be used to take speech input and produce text output. This is the decoding process. The training process requires lots of data and is computationally expensive but the model it produces is very small and computationally much less expensive to run. Today's models are typically able to perform real-time (or faster) speech to text conversion on a single core of a modern CPU. It is the model and the software surrounding the model that is the piece exposed to users of the system.

Various different steps are used during the training process to iterate through the different modelling techniques across the entire set of training audio provided to the trainer. When the process first starts, the software knows nothing of the audio; there are no clever bootstrapping techniques used to kick-start the system in a certain direction or pre-load it in any way. This allows the software to be entirely generic and work for all sorts of different languages and quality of material. Starting in this way is known as a flat start or context independent training. The software simply chops up the audio into regular segments to start with and then performs several iterations where these boundaries are shifted slightly to match the boundaries of the speech in the audio more closely.

The next phase is context dependent training. This phase starts to make the model a little more specific and tailored to the input being given to the trainer. The pronunciation dictionary is used to refine the model to produce an initial system that could be used to decode speech into text in its own right at this early stage. Typically, context dependent training, while an iterative process in itself, can also be run multiple times in order to hone the model still further.

Another optimisation that can be made to the model after context dependent training is to apply vocal tract length normalisation. This works on the theory that the audibility of human speech correlates to the pitch of the voice, and the pitch of the voice correlates to the vocal tract length of the speaker. Put simply, it's a theory that says men have low voices and women have high voices and if we normalise the wave form for all voices in the training material to have the same pitch (i.e. same vocal tract length) then audibility improves. To do this an estimation of the vocal tract length must first be made for each speaker in the training data such that a normalisation factor can be applied to that material and the model updated to reflect the change.

The model can be thought of as a tree although it's actually a large multi-dimensional matrix. By reducing the number of dimensions in the matrix and applying various other mathematical operations to reduce the search space the model can be further improved upon both in terms of accuracy, speed and size. This is generally done after vocal tract length normalisation has taken place.

Another tweak that can be made to improve the model is to apply what we call discriminative training. For this step the theory goes along the lines that all of the training material is decoded using the current best model produced from the previous step. This produces a set of text files. These text files can be compared with those produced by the human transcribers and given to the system as training material. The comparison can be used to inform where the model can be improved and these improvements applied to the model. It's a step that can probably be best summarised by learning from its mistakes, clever!

Finally, once the model has been completed it can be used with a decoder that knows how to understand that model to produce text given an audio input. In reality, the decoders tend to operate on two different models. The audio model for which the process of creation has just been roughly explained; and a language model. The language model is simply a description of how language is used in the specific context of the training material. It would, for example, attempt to provide insight into which words typically follow which other words via the use of what natural language processing experts call n-grams. Obtaining information to produce the language model is much easier and does not necessarily have to come entirely from the transcripts used during the training process. Any text data that is considered representative of the speech being decoded could be useful. For example, in an application targeted at decoding BBC News readers then articles from the BBC news web site would likely prove a useful addition to the language model.
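
To make the n-gram idea a little more concrete, here's a toy sketch (in JavaScript, to match the other code on this blog) that counts bigrams and estimates the probability of one word following another from relative frequency. Real language model toolkits add smoothing, back-off and much larger n, but the core bookkeeping really is this simple; the example text and function names are mine.

// Toy bigram model: count adjacent word pairs and estimate P(next | previous).
function bigramCounts (text) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean)
  const counts = {}
  for (let i = 0; i < words.length - 1; i++) {
    const bigram = `${words[i]} ${words[i + 1]}`
    counts[bigram] = (counts[bigram] || 0) + 1
  }
  return counts
}

function bigramProbability (counts, previous, next) {
  // Relative frequency: occurrences of "previous next" over all bigrams starting with "previous"
  const total = Object.keys(counts)
    .filter(key => key.startsWith(previous + ' '))
    .reduce((sum, key) => sum + counts[key], 0)
  return total === 0 ? 0 : (counts[`${previous} ${next}`] || 0) / total
}

const counts = bigramCounts('the cat sat on the mat the cat slept')
console.log(bigramProbability(counts, 'the', 'cat'))  // 2 of the 3 bigrams starting "the" -> ~0.67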

How accurate is it?
This is probably the most common question about these systems and one of the most complex to answer. As with most things in the world of high technology it's not simple, so the answer is the infamous “it depends”. The short answer is that in ideal circumstances the software can perform at near human levels of accuracy, which equates to accuracy in excess of 90%. Pretty good you'd think. It has been shown that human performance is somewhere in excess of 90% and is almost never 100% accurate. The test for this is quite simple: you get two (or more) people to independently transcribe some speech and compare the results from each transcriber; almost always there will be a disagreement about some part of the speech (if there's enough speech that is).
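
For what it's worth, accuracy in this field is normally reported as word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. Here's a toy sketch of that calculation, illustrative only and not how any real evaluation tooling is implemented:

// Word error rate: word-level edit distance divided by the number of reference words.
function wordErrorRate (reference, hypothesis) {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean)
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean)

  // Standard dynamic-programming edit distance over words
  const d = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : (j === 0 ? i : 0))))

  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution or match
      )
    }
  }
  return d[ref.length][hyp.length] / ref.length
}

console.log(wordErrorRate('the cat sat on the mat', 'the cat sat on mat'))  // 1 error / 6 words ≈ 0.17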

It's not often that ideal circumstances are present or can even realistically be achieved. Ideal would be transcribing a speaker with a similar voice and accent to those which have been trained into the model; they would speak at the right speed (not too fast and not too slowly) and they would use a directional microphone that didn't do any fancy noise cancellation, etc. What people are generally interested in is the real-world situation, something along the lines of “if I speak to my phone, will it understand me?”. This sort of real-world environment often includes background noise and a very wide variety of speakers potentially speaking into a non-optimal recording device. Even this can be a complicated answer for the purposes of accuracy. We're talking about free, conversational style speech in this blog post and there's a huge difference between recognising any and all words versus recognising a small set of command and control words for when you want your phone to perform a specific action. In conclusion then, we can only really speak about the art of the possible and what has been achieved before. If you want to know about accuracy for your particular situation and your particular voice on your particular device then you'd have to test it!

What words can it understand? What about slang?
The range of understanding of a speech to text system is dependent on the training material. At present, the state of the art systems are based on dictionaries of words and don't generally attempt to recognise new words for which an entry in the dictionary has not been found (although these types of systems are available separately and could be combined into a speech to text solution if necessary). So the number and range of words understood by a speech to text system is currently (and I'm generalising here) a function of the number and range of words used in the training material. It doesn't really matter what these words are, whether they're conversational and slang terms or proper dictionary terms, so long as the system was trained on those then it should be able to recognise them again during a decode.

Updates and Maintenance
For the more discerning reader, you'll have realised by now a fundamental flaw in the plan laid out thus far. Language changes over time, people use new words and the meaning of words changes within the language we use. Text-speak is one of the new kids on the block in this area. It would be extremely cumbersome to need to train an entire new model each time you wished to update your previous one in order to include some set of new language capability. The models produced are able to be modified and updated with these changes without the need to go back to a full standing start and training from scratch all over again. It's possible to take your existing model built from the set of data you had available at a particular point in time and use this to bootstrap the creation of a new model which will be enhanced with the new materials that you've gathered since training the first model. Of course, you'll want to test and compare both models to check that you have in fact enhanced performance as you were expecting. This type of maintenance and update to the model will be required for any and all of these types of systems as they're currently designed, as the structure and usage of our languages evolve.

Conclusion
OK, so not necessarily a blog post that was ever designed to draw a conclusion but I wanted to wrap up by saying that this is an area of technology that is still very much in active research and development, and has been so for at least 40-50 years or more! There's a really interesting statistic I've seen in the field that says if you ask a range of people involved in this topic the answer to the question “when will speech to text become a reality” then the answer generally comes out at “in ten years time”. This question has been asked consistently over time and the answer has remained the same. It seems then, that either this is a really hard nut to crack or that our expectations of such a system move on over time. Either way, it seems there will always be something new just around the corner to advance us to the next stage of speech technologies.

Tuesday 19 March 2013

Going Back to University



A couple of weeks ago I had the enormous pleasure of returning to Exeter University where I studied for my degree more years ago than seems possible.  Getting involved with the uni again has been something I've long wanted to do, in an attempt to give something back to the institution to which I owe so much, having got good qualifications there and, not least, met my wife there too!  I think early on in a career it's not necessarily something I would have been particularly useful for, since I was closer to the university than to my working life in age, mentality and a bunch of other factors I'm sure.  However, getting a bit older makes me feel readier to provide something tangibly useful in terms of giving something back both to the university and to the current students.  I hope that having been there recently with work it's a relationship I can start to build up.

I should probably steer clear of saying exactly why we were there but there was a small team from work, some of whom I knew well such as @madieq and @andysc and one or two I hadn't come across before.  Our job was to work with some academic staff for a couple of days and so it was a bit of a departure from my normal work with corporate customers.  It's fantastic to see the university from the other side of the fence (i.e. not being a student), hearing about some of the things going on there and seeing a university every bit as vibrant and ambitious as the one I left in 2000. Of course, there was the obligatory wining and dining in the evening which just went to make the experience all the more pleasurable.

I really hope to be able to talk a lot more about things we're doing with the university in the future.  Until then, I'm looking forward to going back a little more often and potentially imparting some words (of wisdom?) to some students too.

Saturday 9 March 2013

New Thinkpad W530

It's been quite a while since I got my last laptop upgrade at work, coming up to 5 years in fact.  We have a 4 year refresh programme so I'm a little overdue but have just been given a shiny new Thinkpad W530 from Lenovo.  This seems to be our current standard issue machine for "developers" which is our way of saying "power users".  I'm part of the software development business and hence get one of these.  The up side of course is the latest and most powerful technology at a reasonably high specification, the downside is they're really quite big and heavy and the power brick - well it really is a brick.  I'll spare giving a full review of the laptop itself as there are plenty of them out there already and you'll know how to find them, however, there are one or two things I wanted to say about the machine and in particular regarding my preferred use of Linux rather than the software it comes pre-installed with.

Here are the specification highlights of the machine I've got (there is a bit of variation available with the W530):

  • Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz (5187.87 bogomips in Linux)
  • 16GB DDR3 1600MHz
  • 500GB (7200rpm) HDD
  • 15.6" 1920x1080
  • 9 Cell Battery
  • Wireless N
  • NVidia Quadro K1000M
  • Front Facing Web Cam, Mini Display Port, VGA Out, Headphone, 2x USB3, 2x USB2.0, Smart Card Reader, Firewire, DVD Burner, GBit Ethernet
It came with firmware version 2.07 which was only 3 months old but had already been superseded by two newer versions when I got it earlier this week (there is a firmware readme available).  The newer versions fixed a couple of well known issues with screen corruption under Linux and the fan always running at full speed (and hence being noisy).  So I downloaded and applied the updated version before I did anything else.


The next thing I did was tweak a few settings in the BIOS to my liking and install Fedora 18 with the KDE desktop.  The installation went very smoothly using the integrated graphics card on the Ivy Bridge CPU.  The W530 has Optimus built in for efficient switching between the integrated card and discrete NVidia card for a great combo of power/performance; it is, however, designed for Windows and Linux support hasn't quite caught up yet although there is an open source option available - which I'm yet to try.  Post installation I installed the latest NVidia drivers available from the RPM Fusion repository (304.64) ready to switch to using the graphics subsystem in discrete only mode.  The advantage of this is greater graphical processing power and also the ability to use external display devices.  The integrated graphics card is only able to drive the laptop screen and doesn't output via the VGA port or display port.  The down side to the NVidia card is a greater power draw and so reduced battery life.  Also, at the time of writing the Nouveau driver doesn't support the Quadro K1000M card so you're forced into using the proprietary driver.  This situation can only improve over time and hopefully Optimus support will grow in Linux too, but I'm not holding my breath on that one given NVidia's attempt to put support into the Linux kernel was rejected by the kernel developers last year due to it not being GPL code.

Away from the graphics subsystem, which was always going to be the most difficult thing under Linux on this machine, the rest of it appears to be very well supported.  There are a few bits and pieces I haven't quite got around to trying in the couple of days since I got it but my impression is generally quite good.  Speed, as you would expect, is very good although nowhere near my home machine which is a similar specification but contains an SSD instead of an HDD.  Consequently, I put the speed boost I see at home down to this more or less entirely.

I've also moved away from Gnome (I don't get on with Gnome 3) and gone back to using KDE once again which I had moved away from 5 years ago when I installed my previous laptop as KDE 4 was pretty shocking at the time as well.  I've used KDE a lot more than I have Gnome in terms of years of elapsed usage but I did get on very well with Gnome 2 for the past 5 years and I'm sure I'll miss it.  That said, I can't see myself ever moving to Gnome 3 unless the developers go back on their current manifesto of treating users like idiots.  It'll be interesting to see how the Mate desktop progresses and whether XFCE picks up as well given they both have benefited from Gnome 3's unfortunate design decisions and have a much smaller community of users and developers than either Gnome or KDE.

In general then, I'm pleased with the new machine.  It's up and running to my liking in a very short period of time.  The graphics are bound to be a pain until I get used to relying on the nvidia-settings utility once again.  However, the other benefits it brings in terms of larger memory and greater processing power over my old machine are probably worth it.

Thursday 1 March 2012

Failing to Invent

We IBM employees are encouraged, indeed incented, to be innovative and to invent.  This is particularly pertinent for people like myself working on the leading edge of the latest technologies.  I work in IBM Emerging Technologies which is all about taking the latest available technology to our customers.  We do this in a number of different ways but that's a blog post in itself.  Innovation is often confused with or used interchangeably with invention but they are different; invention for IBM means patents, patenting and the patent process.  That is, if I come up with something inventive I'm very much encouraged to protect that idea using patents and there are processes and help available to allow me to do that.


This comic strip really sums up what can often happen when you investigate protecting one of your ideas with a patent.  It struck me recently while out to dinner with friends that there's nothing wrong with failing to invent as the cartoon above says Leibniz did.  It's the innovation that's important here and unlucky for Leibniz that he wasn't seen to be inventing.  It can be quite difficult to think of something sufficiently new that it is patent-worthy and this often happens to me and those I work with while trying to protect our own ideas.

The example I was drawing upon on this occasion was an idea I was discussing at work with some colleagues about a certain usage of your mobile phone [I'm being intentionally vague here].  After thinking it all through we came to the realisation that while the idea was good and the solution innovative, all the technology was already known and available, and assembled in the way we were proposing, but used somewhere completely different.

So, failing to invent is no bad thing.  We tried and on this particular occasion decided we could innovate but not invent.  Next time things could be the other way around but according to these definitions we shouldn't be afraid to innovate at the price of invention anyway.

Monday 12 September 2011

Text Analytics Project Ends

Today sees the end of one of my major work streams for 2011 with a presentation of some research to our sponsors.  I've been working for a good chunk of the year researching text analysis, specifically, the automated expression of facts in controlled natural language.  It's always nice to see some work come to fruition, well not quite fruition in this case since it's research but at least it's reached an agreed stopping point - for now.

I haven't often been involved with relatively pure research in my day job so that coupled with leading the project presented a few challenges in itself which was most enjoyable.  While I can't give away the details, I wanted to express the areas this research concerned here.

The project was a text analytics project, not a new field in itself and a subject on which IBM and my local department (Emerging Technologies) contain many well read and respected experts.  For those of you not familiar, text analytics is essentially applying computer systems to text documents such that some sort of processing can be performed e.g. (simple example) the analysis of pages from news web sites to infer what the current news stories are.

One of the complexities we were investigating was natural language processing.  This is a major area of research for computer systems at the moment and presents one of the biggest problems of applying computer systems to human written documents.  Our brain is able to parse language in ways we've not yet managed to teach computers to do, taking into account context, slang, unknown terms and all sorts of other subtle nuances that make it a hard problem to crack for computers.

My recent work has been investigating how we can express things found in documents in the form of controlled natural language which leads to the question of what on earth is that?  Simply put, it's an expression made using normal words but using more rigid semantics than are found in pure natural language.  This makes it possible to parse it using a computer but it still feels fairly natural to the human reader as well.  This sounds great as you get computers talking a language that feels very usable to humans but with all the added power of memory and processing provided by the computer.  It seems to me this approach might only be a stop-gap solution until computers (inevitably it'll happen some time) eventually understand full natural language.

While having a discussion last night with my wife over dinner she expressed an opinion I sometimes hear from her: that I occasionally "speak funny".  This came to light recently when on holiday in Ireland, and I suspect it's a combination of both this type of research seeping into my use of language and my semi-conscious approach to trialling these techniques in the real world - and what better opportunity than when immersed in another English speaking culture.

So, as this article is published I'll be standing at the front of a room of people talking about the details of our work with my colleagues.  Wish us luck!

Wednesday 7 September 2011

Hursley Extreme Blue 2011 Presentations

For the first time since starting my own blog I've written a post on the Eightbar blog.  It's a site originally set up by a bunch of us working at Hursley to talk about the interesting stuff we're working on in order to show the many different faces of Hursley and IBM.  I didn't want to reproduce the entire blog post here so I'll leave you with a link to the Hursley Extreme Blue 2011 Presentations post but since comments are currently turned off on Eightbar feel free to have any discussion here.

Sunday 3 January 2010

Teach, Yourself

I've recently had the opportunity to teach a class of students on a couple of different occasions. Something I've not done for quite a while now but something well worth doing every now and then. I say every now and then; I don't think I have the vocal stamina to do it for more than a few days and I especially don't think I have the special quality it takes to teach children. The classes were both very different: one a knowledgeable internal audience for a day's course, the other a three day course for a customer where the participants were only a short way along the road towards learning what I had to say. Both were similar, on the topics of Linux cluster administration and all the various technologies it takes to run a cluster.

Clustering technologies really are varied and it takes a few years' experience before an administrator has a good overview of the inner workings of how everything hangs together. For example, I mostly covered Linux administration and clustered administration with xCAT but to fully understand it you need a fair bit of background knowledge: your Linux OS, hardware configuration and control, network architecture, storage, clustered file systems, remote management, parallel computing, computer services (NTP, DNS, FTP, NFS, HTTP, TFTP, etc) administration, and so on. The list really is quite long and while this is starting to sound like a "Look at all the stuff I know!" blow-your-own-trumpet type blog post, the point I think I'm trying to make is that the list of people who know (and I mean really know) all this stuff in any given company isn't very long. You'll know the guy, the one who everyone always asks when something is wrong with their machine, the "he fixes everything" guy. While these people are hard to find, locked away in a small room somewhere (think IT Crowd), it can be even harder to teach the "I already know quite a lot" guy - but I took up the challenge anyway.

There's nothing like teaching every now and then to keep you grounded. Some people in your class will challenge what you're saying so you have to make sure you're right and know how and why you're right; the how and the why are very important when teaching. Sometimes the class miss the point of what you said, and this resets your view on the assumptions you make, the assumptions you work with every day. When someone in the class misses the point it's often to do with how you've expressed it rather than their lack of understanding. Some people might find this challenge to their knowledge of the fundamentals of what they work with quite stressful but I look at it as an opportunity. Sure, you might set yourself up for a fall but that's all part of the thrill. Fortunately, I don't mind speaking in front of a crowd, at least not about something with which I'm familiar, but the audience does add to the experience.

I can be a bit of a show-off, I know that, so teaching can be very satisfying. Great for the ego, almost. However, I take great pleasure from imparting knowledge to others. The opportunity to help others understand something about which they previously knew little is not to be missed as far as I'm concerned. It plays to your inner show-off if you're prepared to stand on, and risk falling off of, the pedestal you put yourself on when you stand in front of a class.

If you're knowledgeable in a certain area then I would heartily recommend you share what you know in front of an audience, risk making a fool of yourself in front of your audience. I think the rewards are good if you don't often teach. Have what you know challenged. Have your assumptions highlighted. Find out the little gaps you didn't think you had. Most of all, have fun doing it.

Monday 17 November 2008

Super Computing Project Ends

Not blogged in a while, will spare the details/excuses but back in September I had the opportunity to get back into Super Computing for a one-off project. It's this that has kept me so busy throughout October and into November where I was really very well submerged into work for an extended period. Normally I like to keep balanced in my work-life balance. However, this project demanded a lot of time and attention and fortunately my wife, Beth, was away for a couple of weeks on business too so I really had the opportunity to get stuck in.

As I suspected (from much experience in the area) in my previous post, there really was a lot of information missing at the beginning of the project. This is to be expected; the customer cannot be expected to know 100% what they want, and they may not even know what is available, possible or on offer. To compound this, even the best sales team can't look into minute details when proposing a solution or making a bid. So we had a productive kick-off meeting, made a lot of good decisions and recorded these to make the design details concrete.

What happened next you can never be prepared for, but we've probably all experienced it. Yes, the iron fist of the bean counters barged its ugly way into the project. The previously neatly agreed deadlines and design proposals were lobbed into the air with such careless abandon it was almost humorous. The promise at the start of the project from my ex-manager that "this one would be different", and we "wouldn't have to work silly hours or cut corners", was about to be viciously broken into pieces. (Wow, I can sound like such a drama queen). Project deadlines were brought forward by give-or-take 50% simply to meet a financial deadline (of one of the parties involved, not IBM I should add) way out of my control. When you're met with this kind of single-minded decision making as the technical leader of a project it's very frustrating, but you know there is nothing more to be done except save time where you can and work your butt off. Anybody who knows me and my work will know I hate to deliver anything less than the very best solution I think is right. So I openly request of Mr Financial Man, whoever and wherever you are: listen to people who, like yourself, are considered the experts in their area.

This was a Linux cluster, which for the uninitiated consists of a number of computers set up to work together on a single large problem that would take any one of them an unrealistic amount of time to compute. Using the example raised by OzBeefer on my last post, climate modelling algorithms are quite large and complex, taking a single computer a long time to run, so loads of them need to work together in order to predict the weather in the future before it becomes the past! Fortunately, the design of this cluster wasn't rocket science and it was quite small. Things progressed very smoothly without any of the problems that might otherwise have occurred, and with some very long working hours (read 12+ hours/day) the cluster was eventually delivered to the new schedule at the customer location.

Well, I could ramble on about Linux, clustering, algorithms, networks, storage, optimisations and the projects I've worked on for hours yet, which gives me an idea... for those people reading this in the Hursley community: answers on the back of a postcard, comment, e-mail or whatever if you think hearing my ramblings about this stuff might be vaguely interesting some time over lunch, tea/coffee, beers, etc?

Wednesday 17 September 2008

Dipping my toe back into super computing

I've been out of university and working in IT for a good few years now and held several different roles along the way. When I started work I was doing an internal support role after which I moved to my current department, Emerging Technologies, where I've held several roles including being an emerging tech specialist and working for IBM's Linux Integration Centre.

I've used an attention-grabbing title for work I very much enjoy doing but, due to certain circumstances, thought I had left behind in my career. Super computing sounds very grand and conjures up all sorts of ideas, along with other grand titles for the type of work I do such as Beowulf Clustering. The term I prefer, but which can also be misleading, is High Performance Computing (HPC). There are all sorts of misconceptions about HPC but easily my favourite is when people pretend to joke, but expecting a serious answer, ask something like “Wow, how many frames per second can you get on Quake with that?”.

The opportunity has come up for me to go back to this area for a one-off project, working with some of my ex-colleagues. I'm very much looking forward to getting stuck in, as the work is usually interesting and my old team are fantastic. As ever with these projects, there are an enormous number of unknowns at the start. I already feel at home knowing the list of things we don't know yet – where the hardware is right now (delivery due soon), what the software stack will be, firmware levels, network layout and design, naming conventions, management and monitoring requirements, storage requirements, job scheduling, operating system, tweaks and configuration, etc. That's all part of the fun though: I get to work things out along the way and fill in the gaps for areas that, through no fault of their own, people just don't think about answering until implementation time. The kick-off meeting is due soon, so I'm looking forward to getting people to think about all the tiny details I'll need in order to supply our customer with the best-suited system I can.

Wednesday 23 July 2008

Next Generation Linux

The folk following me on Twitter are probably sick of hearing about identity management, the mainstay of my work this year. So I was glad to get out of the office last week to present at an IBM conference in London called "Next Generation Linux". A thank-you note I received reminded me I should blog about it (always nice to receive those!). Next Generation Linux is an event IBM are running in various locations worldwide this year, looking at what comes next for Linux in business.

Being a Linux geek working in a software services organisation called Emerging Technology Services, and with my contacts, I like to think I was the natural choice for the pitch titled "Emerging Linux Technologies". I only had a short amount of time to present a vast field, so I narrowed it down to five topics compelling for business and talked about the following:
  1. Virtualisation
    OK, not strictly an emerging technology, as many businesses have already adopted it. But it was a good opener, setting the scene for some of my other topics, and it gave me the opportunity to briefly run through a few virtualisation technologies for Linux.
  2. Cloud Computing
    An exciting name and concept for what is essentially some very well thought out system administration. The technology has always been feasible in principle, but it's being made practical now by commodity hardware capable of remote management and some neat software ideas holding it all together. The really novel parts are the way applications can be deployed to run in the cloud environment and the fact that we can actually package this up as a solution now. It's the realisation of "On Demand" computing.
  3. Project Big Green
    Green computing is becoming much more of a concern as businesses start to run out of room in data centres, power requirements head skywards and running costs steadily increase. Last year IBM announced a re-investment of $1 billion into research towards green computing, which gives businesses the opportunity to cut running costs and jump on the green bandwagon at the same time. Green computing is essentially about consolidating services, allowing spare compute power to be utilised elsewhere, and making sure equipment is produced and disposed of in an environmentally sound way. It's those three words we hear in all good green campaigns: reduce, reuse, recycle. Do it!
  4. Security Enhanced Linux (SELinux)
    One of my specialisms and a topic I could ramble on about for a long time, so I'll try to keep it brief. In this short pitch I pointed out that security is still an issue in 2008 and that a breach can cost you dearly. Enter SELinux: an overview of what it is and where it comes from, plus a comparison with other technologies such as AppArmor, makes a good start. To get to the crux of SELinux, though, I explained the differences between Discretionary Access Control (DAC) and Mandatory Access Control (MAC) and the ultimate advantages SELinux brings for security (there's a tiny illustrative sketch of the idea just after this list).
  5. Real Time Linux
    Real time really is an emerging area, with both of IBM's current Linux partners, Red Hat and SUSE, bringing out offerings recently. Real time is built from the hardware up through the OS and, in the case of hard real time, into the applications too. IBM have certified particular System x hardware as real-time capable and provide firmware and support for it now. Next comes the Linux piece, where some of the functionality removed from the firmware must be implemented in the kernel; there are plenty of ways of doing this, and SUSE and Red Hat take care of it in their supported offerings. IBM have also built some enhancements to Java, introducing a modified garbage collector (Metronome) and providing ahead-of-time (AOT) compilation while complying with RTSJ, all of which add up to the ability to write real-time Java apps - interesting! So we can now offer a full real-time system on non-specialised hardware, using a commercially available operating system and a language loads of people can program in, backed by IBM through WebSphere Real Time. Boy, that sounds like an advert, sorry about that, but it is a great idea, very cool! (There's a small sketch of the Linux-level side of this after the list too.)
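
Since MAC is hard to picture in the abstract, here's a tiny sketch (my own illustration, not something from the talk) using libselinux. The point it makes: every process runs with a security context of its own, and policy decisions are made against that context rather than just against the user ID it runs as.

    /* Query this process's SELinux state and context.
     * Build: gcc selinux_demo.c -lselinux
     */
    #include <stdio.h>
    #include <selinux/selinux.h>

    int main(void)
    {
        char *con = NULL;

        if (!is_selinux_enabled()) {
            printf("SELinux is not enabled on this system\n");
            return 0;
        }

        /* enforcing (1) blocks policy denials, permissive (0) only logs them */
        printf("Enforcing mode: %d\n", security_getenforce());

        /* the context (user:role:type:level) this process runs under */
        if (getcon(&con) == 0) {
            printf("My security context: %s\n", con);
            freecon(con);
        }
        return 0;
    }

Even root gets a context like this, which is exactly the extra safety net MAC gives you over plain DAC.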
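
The pitch itself concentrated on the Java side, but for the curious this is roughly what real time looks like from a plain C program on Linux. Again, it's my own illustrative sketch using standard POSIX calls, not anything IBM-specific: lock memory so page faults can't add latency, then ask for fixed-priority FIFO scheduling so ordinary time-shared work can't pre-empt the thread. It needs a real-time capable kernel and suitable privileges to succeed.

    #include <stdio.h>
    #include <sched.h>
    #include <sys/mman.h>

    int main(void)
    {
        struct sched_param sp = { .sched_priority = 80 };  /* 1..99 for SCHED_FIFO */

        /* keep all current and future pages resident in RAM */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall");

        /* switch this process to the real-time FIFO scheduler */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
            perror("sched_setscheduler");
        else
            printf("Running with SCHED_FIFO priority %d\n", sp.sched_priority);

        /* ... time-critical work would go here ... */
        return 0;
    }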

This is all very much in brief; if you want to know more then get in touch or leave a comment.

Wednesday 18 June 2008

New Thinkpad T61p

Gutted! A couple of weeks ago now I had a bad Friday: the train home from London, where I'd been working with a customer all day, was stupidly late and I had to change twice instead of going directly home. Then I got home and fired up my laptop to send the e-mails I'd written during the day, and the darned thing didn't work, argh! It seems that, in spite of working all day, my T41p had died on the trip home. After I reported the problem at work the following Monday it was decided the T41p needed a new motherboard and wasn't economical to fix, so I was issued with a shiny new T61p a few days later.


I've been pleasantly surprised by my new laptop. I wasn't expecting great things since IBM sold the Thinkpad business to Lenovo, but this thing is actually quite nice. I'll spare you the full gory details and leave those to the technical specifications page. However, it has some nice additions over my previous laptop, namely built-in firewire (not that I'm likely to use it), a built-in SD card reader (used that already), an extra USB port (always handy), a DVD writer, a hardware wireless off switch (presumably for use on planes), an enormous hard disk (compared to the T41p anyway), and a lovely 15.4" widescreen capable of 1920x1200, backed by a 256MB NVidia graphics card.

Unfortunately, it came pre-installed with Vista, so that (along with the stupid Vista sticker next to the keyboard) was the first thing to go. I've installed Red Hat Enterprise Workstation 5.2 on it, which may sound like an odd choice, but IBM have a layer of software designed to sit on top of Red Hat to enable us to install things like Lotus Notes, Sametime, etc. This is known internally as the Open Client and works really nicely. Clearly there are later and greater distributions I could use, but on this issue I like to support IBM and the internal community of Linux desktop users, so I choose to go with the officially provided solution.

I've been up and running for a week now with no problems so far; I've been able to do everything I could do with my old laptop and everything I need to be able to do my job. Of course, I make some modifications to the way things work to suit my tastes (such as running KDE instead of Gnome), but these all work well too, which is a great reflection on the modular nature of everything involved with Linux. I hope I continue to be surprised and pleased with the machine, and I'm definitely surprised at how easy the transition between the two machines has been for me.

Tuesday 3 June 2008

Showing Off Linux

Thanks to Ian Hughes for the picture on his Flickr. Yesterday, at work, the Hursley Linux Special Interest Group ran a little trade-show-style event for a couple of hours after lunch. The idea was to give folks around the lab a bit of away-from-your-desk time to see what we Linux geeks have been getting up to. Various people interested in using Linux inside and outside work came along to demo their gadgets.

The picture shows me showing off my old Linux audio centre. Also at the event were the day's main organiser Jon Levell (showing Fedora 9 and an Eee PC), Nick O'Leary (showing his N800 and various Arduino gadgets), Gareth Jones (showing his accelerometer-based USB rocket launcher and Bluetooth tweetjects), Andy Stanford-Clark (showing his NSLU2-driven house and an OLPC), Laura Cowen (showing an OLPC), Steve Godwin (showing MythTV), and Chris Law (showing Amora).

I thought it was quite a nice little selection of Linux-related stuff for the masses of people turning up to look through; there were plenty of other things we could have shown too, of course. The afternoon seemed very much a success, generating some real interest in the various demo items and lots of interesting questions too. Thanks to everyone for taking part!