Monday, March 17, 2008

Wednesday, March 12, 2008

Monday, March 10, 2008

Levenshtein Distances working nicely

These are the 1-NN edit distances to the training set for each of my 9 test images. The first 4 motorbike images show a significantly lower edit distance than the last 5. The edit distances are computed by treating the extracted appearance patches of an image as a word in which each patch is a single character. Here, the matching cost between two image patches was computed as a straight SSD between the patches. The cost of inserting a gap was computed as the matching cost (SSD) of the patch against a canonical 11x11 patch of uniform intensity 128.
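For reference, here is a minimal sketch of the dynamic programme described above (not the code I actually ran; patch_edit_distance is a hypothetical name, and the patches are assumed to arrive as cell arrays of 11x11 matrices in the order they were extracted):

% Hypothetical sketch, e.g. saved as patch_edit_distance.m.
% patchesA and patchesB are cell arrays of 11x11 double patches.
function d = patch_edit_distance(patchesA, patchesB)
gapPatch = 128 * ones(11, 11);        % canonical uniform-intensity patch
ssd = @(p, q) sum((p(:) - q(:)).^2);  % matching cost between two patches
n = numel(patchesA);
m = numel(patchesB);
D = zeros(n + 1, m + 1);
% Aligning a prefix against nothing costs the sum of gap penalties,
% each being the SSD of the patch against the uniform patch.
for i = 1:n
    D(i + 1, 1) = D(i, 1) + ssd(patchesA{i}, gapPatch);
end
for j = 1:m
    D(1, j + 1) = D(1, j) + ssd(patchesB{j}, gapPatch);
end
% Standard Levenshtein recursion, but with real-valued costs.
for i = 1:n
    for j = 1:m
        subCost = D(i, j)     + ssd(patchesA{i}, patchesB{j});
        delCost = D(i, j + 1) + ssd(patchesA{i}, gapPatch);
        insCost = D(i + 1, j) + ssd(patchesB{j}, gapPatch);
        D(i + 1, j + 1) = min([subCost, delCost, insCost]);
    end
end
d = D(n + 1, m + 1);
end

The 1-NN distance reported above is then just the minimum of this quantity over all training images.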

Friday, March 7, 2008

Monday, March 3, 2008

Three are better than two

When I use three Gaussian components instead of two, things look a little better.

Throwing in more Gaussians

The variability in the appearance of a single part across different training images suggests that a single Gaussian may not be sufficient to capture the underlying data. I decided to try a mixture of Gaussians for each part (with diagonal covariances). The Netlab software for Matlab turned out to be very useful here, as it has built-in routines for learning and using Gaussian mixture models (e.g. the gmm, gmminit, gmmem and gmmprob scripts were a big help).
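Roughly, the Netlab calls look like this (a sketch only, with made-up variable names and synthetic stand-in data; appFeats plays the role of the N x d matrix of appearance vectors collected for one part across the training images):

appFeats = randn(200, 15);                       % stand-in for the real appearance vectors

ncentres = 2;                                    % mixture components per part
mix = gmm(size(appFeats, 2), ncentres, 'diag');  % diagonal-covariance GMM

options = zeros(1, 18);
options(14) = 100;                               % max number of EM iterations
mix = gmmem(mix, appFeats, options);             % fit the mixture by EM

% Appearance log probability of a candidate patch's feature vector:
logp = log(gmmprob(mix, appFeats(1, :)));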

Here are the resulting log probabilities when using 2 mixture components for each part's appearance. In this case, the default EM initialization is used (uniform priors, random means and identity covariances).
Next, EM was initialized using the gmminit script, which initializes the centres and priors using k-means on the data. The covariance matrices are calculated as the sample covariance of the points closest to the corresponding centres.
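In sketch form, the two initializations differ only in whether gmminit is run before gmmem (same stand-in data as in the previous snippet; a freshly constructed gmm model already starts from uniform priors, random means and unit covariances, which is the "default" case):

appFeats = randn(200, 15);
options = zeros(1, 18);
options(14) = 100;

% (a) default initialization: run EM straight from the freshly created model
mixA = gmm(size(appFeats, 2), 2, 'diag');
mixA = gmmem(mixA, appFeats, options);

% (b) k-means initialization via gmminit, then EM from there
mixB = gmm(size(appFeats, 2), 2, 'diag');
kmopts = zeros(1, 18);
kmopts(14) = 10;                    % a few k-means iterations for the init
mixB = gmminit(mixB, appFeats, kmopts);
mixB = gmmem(mixB, appFeats, options);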

Sunday, March 2, 2008

Reducing dimensionality with random projections instead of PCA

Instead of reducing the dimensionality of the appearance patches using PCA, I tried using a random projection matrix (similar to the one defined in question 1 here). The matrix was generated once during training and the same one was used again during testing. This approach does not seem to work any better than the previous PCA approach.
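The linked definition isn't reproduced here, so the sketch below just uses a generic Gaussian random projection as a stand-in; the 11x11 (121-dimensional) patch size comes from the earlier posts, while the target dimensionality k = 15 is only an example value:

k = 15;
R = randn(121, k) / sqrt(k);   % random projection matrix, generated once at
                               % training time and reused at test time
patch = rand(11, 11) * 255;    % stand-in for an extracted appearance patch
x = patch(:)';                 % flatten to a 1 x 121 row vector
y = x * R;                     % 1 x k reduced-dimensionality appearance vector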

Here are the total log probabilities of the same test images that were used previously. Image 1 has taken an undesirable dip, and image 4 hasn't been pulled far enough up from the negative test images.

The appearance probabilities of the negative test images have gone up relative to the bike images.
Of course, the location probabilities are exactly the same as before, because these are unaffected by the method of dimensionality reduction on the appearance patches.
Here are the reconstructed patches obtained by back-projecting to 121 dimensions. For this, the reduced-dimensionality patches were multiplied by the pseudo-inverse of the random projection matrix that was used.
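In sketch form, the back-projection is just a multiplication by the pseudo-inverse followed by a reshape (same assumed sizes and stand-in projection matrix as in the previous snippet):

k = 15;
R = randn(121, k) / sqrt(k);        % same (hypothetical) projection matrix as before
patch = rand(11, 11) * 255;
y = patch(:)' * R;                  % reduced-dimensionality vector

xRec = y * pinv(R);                 % back-project to 121 dimensions
patchRec = reshape(xRec, 11, 11);   % approximate (lossy) reconstruction of the patch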