I have finally finished writing the main subroutines of the HMM implementation, and the code is now in its testing phase. Testing led to some modifications in the code, especially in the vector quantization part. Besides this, work on the Openmoko tools and platform is also going on.
I have finally come close to finishing the code for the HMM implementation for speech recognition. I have finished writing the training algorithm and the codebook design using vector quantization, with some optimizations.
The first optimization was to train the HMM models with the segmental K-means procedure rather than the Baum-Welch algorithm. Baum-Welch requires more processing because it accounts for all possible hidden state sequences for a given observation sequence. The segmental K-means method instead uses the Viterbi algorithm to find the single best state sequence, re-estimates the model parameters from that segmentation, and iterates until the model is trained. It has been reported to give good results with less processing than Baum-Welch. The other optimization concerns the probability density function. As this project aims at a small recognition vocabulary (around 5 to 10 words), vector quantization will be used instead of continuous observation densities. Vector quantization is faster and yields good results for applications on small embedded devices.
Testing of the code with speech samples from .wav files has now started.
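To make the vector quantization step concrete, here is a minimal sketch in C of how a feature vector can be mapped to the index of its nearest codeword, so that the discrete HMM only needs a table lookup per frame. This is an illustration, not the project's actual code: the names vq_distance and vq_quantize, the dimension VQ_DIM and the 16-bit integer features are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VQ_DIM 3  /* hypothetical feature dimension, chosen only for illustration */

/* Squared Euclidean distance between two integer feature vectors.
 * The difference of two int16_t values fits in int32_t, and the
 * running sum is kept in 64 bits so it cannot overflow here. */
static int64_t vq_distance(const int16_t *a, const int16_t *b)
{
    int64_t sum = 0;
    for (size_t i = 0; i < VQ_DIM; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        sum += (int64_t)d * d;
    }
    return sum;
}

/* Return the index of the codeword closest to `feature`.
 * That index is the discrete observation symbol fed to the HMM. */
static size_t vq_quantize(const int16_t codebook[][VQ_DIM],
                          size_t codebook_size,
                          const int16_t *feature)
{
    assert(codebook_size > 0);

    size_t best = 0;
    int64_t best_dist = vq_distance(codebook[0], feature);

    for (size_t k = 1; k < codebook_size; k++) {
        int64_t dist = vq_distance(codebook[k], feature);
        if (dist < best_dist) {
            best_dist = dist;
            best = k;
        }
    }
    return best;
}
```

Calling vq_quantize once per frame turns an utterance into a sequence of small integers, which is the form a discrete-observation HMM works on. The search is linear in the codebook size, which seems acceptable for the small codebooks a 5 to 10 word vocabulary would need.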
I finally got my Neo FreeRunner this week. Thanks to Daniel Willman and to DHL, who were quite fast. I had to pay the customs clearance charges (about $18), but that was nothing compared to having a FreeRunner in my hands and working on it. Here are some pictures of the FreeRunner in running mode.
It is really great to work on Openmoko :)
For many days I could not access the svn repository for my project because my ssh public keys were not authorized, so I posted the code on the Openmoko projects wiki instead. At last, after struggling a bit, I fixed the problem and committed my code to the svn repository. You can access the code and documentation for the speech recognition project from here.
Initially, I wrote some code using floating point calculations. But after discussions with the developers, my mentor and other contributors, I came to realize that the success of speech recognition depends on the time and memory used for processing. As the ARM processor on the phone has no built-in floating point hardware, all floating point calculations are emulated in software, and as a result the efficiency is terribly poor. I then rewrote the whole processing in fixed point 16:16 notation (16 integer bits and 16 fractional bits). I have written some code, which is available here. I use a 32 bit integer to represent a fixed point number. The multiplication and division subroutines for fixed point numbers are written as macros for faster operation.
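For readers who have not used fixed point arithmetic before, the sketch below shows in C what 16:16 multiplication and division macros can look like. It is only an illustration under my own naming: fix16_t, FIX_MUL, FIX_DIV, fix_from_int and fix_to_int are hypothetical names for this example and are not necessarily what the project code uses.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16_t;  /* 16 integer bits : 16 fractional bits */

#define FIX_SHIFT 16
#define FIX_ONE   ((fix16_t)1 << FIX_SHIFT)  /* the value 1.0 in 16:16 */

/* Multiply: widen to 64 bits so the product cannot overflow, then
 * drop the 16 extra fractional bits. Right-shifting a negative value
 * is implementation-defined in C; common ARM and x86 compilers use
 * an arithmetic shift, which is what this sketch assumes. */
#define FIX_MUL(a, b) \
    ((fix16_t)(((int64_t)(a) * (int64_t)(b)) >> FIX_SHIFT))

/* Divide: scale the numerator by 2^16 first so the quotient keeps
 * its fractional bits. The scaling is written as a multiplication to
 * avoid left-shifting a negative value. `b` must be non-zero. */
#define FIX_DIV(a, b) \
    ((fix16_t)(((int64_t)(a) * FIX_ONE) / (int64_t)(b)))

/* Convert an integer to 16:16; the integer part must fit in 16 bits. */
static inline fix16_t fix_from_int(int n)
{
    assert(n >= -32768 && n <= 32767);
    return (fix16_t)(n * FIX_ONE);
}

/* Convert 16:16 back to an integer, truncating toward zero. */
static inline int fix_to_int(fix16_t x)
{
    return (int)(x / FIX_ONE);
}
```

For example, FIX_MUL(FIX_ONE / 2, FIX_ONE / 2) gives FIX_ONE / 4, that is 0.5 × 0.5 = 0.25. Neither macro checks whether the result still fits in 32 bits, so the caller has to keep values within the 16:16 range.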
The coding phase is going on, and as soon as enough of the code is finished I will start working on the Openmoko tools. I will then try to build all my code with the Openmoko cross compiler for the ARM processor.
Comments and suggestions will be highly appreciated.
With the passing days, my GSoC 2008 project is making progress. I have submitted the first release of the code and the design document (although it is not complete). They can be viewed at Design Document and code. The ongoing code will soon be uploaded to the same page, with a notification here. I would like to have readers’ feedback and comments.
This is my professional blog, which I will also use as the progress report for my GSoC 2008 project: Speech recognition facility in Openmoko.
My project includes a detailed study of the Hidden Markov Model and the DSP techniques related to it, understanding the Openmoko tools and software, developing code for speech recognition, porting it to the Openmoko platform and finally testing it on the real hardware. During the first week of the project, I studied the Openmoko tools and read a lot of documentation about them. However, owing to some problems with the Moko Makefile build, I decided with my mentor to go ahead with HMM theory. With the help of some IEEE journals and two books (Fundamentals of Speech Recognition and Digital Processing of Speech Signals by Prof. L. Rabiner), I went into a detailed study of the Hidden Markov Model for speech recognition. I have also written some pieces of code, which will soon be uploaded (details will be out later). A detailed document on the application design is also being prepared simultaneously.