Wednesday, February 27, 2008

Fixing a bug

After closer examination of the extracted patches, I discovered that they were actually being sorted by Y-coordinate instead of X. I fixed that bug, and the tests then showed better results.

This figure shows the patches extracted from the 47 training images. Each row shows the 10 patches extracted from a single motorbike training image, now sorted by X-coordinate.

The image below shows the 9 images used for testing (in row-major order).

Here are the resulting log probabilities (for location, appearance, and their sum) for each of the test images.

The location probability of the fifth image, which is a car, is quite high. This can be seen easily from the locations of its extracted patches (which also show why the location probability of the ninth image is so low, as it should be). However, the appearance probability for the car is low. In general, the locations of the patches are doing a better job of differentiating the classes than the appearances. The appearance probability of the fourth bike is very low; it is shown below along with its extracted patches.
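The fix itself is small: order each image's patches by the X-coordinate of their detected locations rather than the Y-coordinate. My code is MATLAB, but the idea can be sketched in Python (the function and array names here are illustrative, not from my actual code):

```python
import numpy as np

def sort_patches_by_x(patches, locations):
    """Return patches and locations ordered by X-coordinate.

    patches:   (n, d) array, one appearance descriptor per patch
    locations: (n, 2) array of (x, y) patch centers
    The bug was sorting on column 1 (the y-coordinate) instead of
    column 0 (the x-coordinate).
    """
    order = np.argsort(locations[:, 0])  # column 0 holds x
    return patches[order], locations[order]
```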

Wednesday, February 20, 2008

Reconstructed patches

For debugging purposes, I reconstructed the patches by projecting them back to 121 dimensions and displaying them as images. The first four rows show motorbike patches, and the remaining rows show patches from cars and faces. Can't see much difference.
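The reconstruction step is just the inverse of the PCA projection: multiply each patch's coefficient vector by the basis, add back the mean, and reshape to 11x11 (121 = 11^2). A Python sketch, with random data standing in for the real patches and a hypothetical subspace size:

```python
import numpy as np

# Assumed setup: patches are 11x11 (121-dim) and were projected to k
# PCA coefficients; the basis and mean come from the training patches.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 121))      # stand-in for real patch data
mean = train.mean(axis=0)
# principal axes from the SVD of the centered data
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
k = 15                                   # hypothetical subspace size
basis = Vt[:k]                           # (k, 121)

coeffs = (train - mean) @ basis.T        # project down to k dims
recon = coeffs @ basis + mean            # project back up to 121 dims
patch_images = recon.reshape(-1, 11, 11) # view each row as an 11x11 patch
```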

Edit: These patches were sorted (erroneously) by Y-coordinate. The correct patches, sorted by X-coordinate, are shown below in the second figure.

Wednesday, February 13, 2008

Sorting by X-coordinate

Running the same experiment after sorting the features by X-coordinate (instead of by saliency), I get these probabilities:

CombinedLogProb =

-240.8994
-206.9385
-228.6303
-249.9772
-293.5449
-261.3568
-279.4719
-255.3435
-270.4987
-296.9481

>> LogProbApp

LogProbApp =

-144.7198
-114.0909
-130.4656
-151.1481
-166.9280
-150.5252
-166.1861
-108.0655
-125.5857
-161.1468

>> LogProbLoc

LogProbLoc =

-96.1796
-92.8476
-98.1648
-98.8291
-126.6168
-110.8315
-113.2858
-147.2780
-144.9131
-135.8014
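As a sanity check, the combined score is just the sum of the appearance and location terms (the two models are treated as independent, so their scores add in log space); the numbers above are consistent with this up to rounding:

```python
import numpy as np

# Log probabilities copied from the output above.
log_app = np.array([-144.7198, -114.0909, -130.4656, -151.1481, -166.9280,
                    -150.5252, -166.1861, -108.0655, -125.5857, -161.1468])
log_loc = np.array([-96.1796, -92.8476, -98.1648, -98.8291, -126.6168,
                    -110.8315, -113.2858, -147.2780, -144.9131, -135.8014])
combined = np.array([-240.8994, -206.9385, -228.6303, -249.9772, -293.5449,
                     -261.3568, -279.4719, -255.3435, -270.4987, -296.9481])

# Independence of appearance and location => the log scores add.
print(np.allclose(log_app + log_loc, combined, atol=1e-3))  # True
```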

Using more clean motorbike data

Previously, I had used 20 training images for parameter estimation of the location and appearance models. I re-ran the tests with 47 training images of motorbikes (sans background clutter), then ran the recognition procedure on 10 test images consisting of motorbikes, cars, and faces.

The results were as follows:

CombinedLogProb =

-249.1764
-233.0733
-226.2195
-293.5680
-257.1131
-304.4388
-284.6287
-251.9015
-254.5245
-297.4117

>> LogProbApp

LogProbApp =

-141.6296
-118.7043
-118.2580
-179.9357
-138.6968
-186.3235
-162.7556
-110.6465
-137.2324
-161.1289

>> LogProbLoc

LogProbLoc =

-107.5468
-114.3690
-107.9615
-113.6323
-118.4163
-118.1153
-121.8730
-141.2551
-117.2921
-136.2827

Images 1-4 were bikes, 5-7 were cars, 8-9 were faces and 10 was another bike.

Wednesday, February 6, 2008

Model learning and recognition sans clutter and occlusion (for now)

Rob Fergus has been kind enough to email me a link to his code for his CVPR '03 paper. However, it isn't running for me at the moment; it seems I need to recompile some MEX files. The difficulty is that the Linux workstations in the APE Lab have a different version of the gcc compiler than the one required, and installing the right one first is a bit of a pain.

So I'm going ahead with this on my own at the moment. The main complications in this method arise from trying to deal with occlusion and clutter. That's what forces an exhaustive search over an exponentially large hypothesis space during both learning and recognition. For now, I'll work with clean data and assume all the features arise from the object and not from background (as is the case with most of the images in the Caltech motorbike dataset).
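To get a feel for the size of that hypothesis space: if each of P parts maps either to one of N detected features (distinct features per part) or to "occluded", the number of assignments is the sum over k of C(P, k) * N!/(N-k)!. Plugging in numbers of roughly the scale involved here (illustrative values, not the paper's exact configuration):

```python
from math import comb, perm

N, P = 30, 6  # ~30 detected features, 6 parts (illustrative scale)

# Choose which k parts are visible, then assign them to distinct features.
hypotheses = sum(comb(P, k) * perm(N, k) for k in range(P + 1))
print(hypotheses)  # over half a billion hypotheses per image
```

Assuming clean data with every feature belonging to the object collapses this to a single assignment, which is what makes plain ML estimation feasible below.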

Using this idea, I ran an experiment with 20 training images and about 10 features per image (also equal to the number of parts, since all features are assumed to arise from the object for now). Since there is no hidden variable now, I estimated the parameters for the appearance of each part using plain maximum-likelihood estimation. In addition, I estimated the ML parameters for the joint density of the locations of all parts. Then, using these parameters, I ran the recognition procedure on the following images.

The first three images were selected from within the training set of 20 images, so the probability of recognition is expected to be high for them. The last image was selected from outside the training set and was deliberately chosen to be quite dissimilar from the training images. While running the recognition code, there were numerical issues because the location parameters were ill-conditioned: the covariance matrix of the joint Gaussian density over part locations was nearly singular. Perhaps this happened because I wasn't using enough data. Also, I haven't imposed an ordering constraint on the X-coordinates of the detected features. Looking at the log probabilities from just the appearance models, they were -50.9192, -54.2892, -57.3182, and -792.5911 for the four images respectively.
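One standard fix I may try for the near-singular covariance is adding a small ridge term to its diagonal (a regularization hedge on my part, not something from Fergus's code). A Python sketch of the ML location model with that regularization (all names here are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_location_model(locs, ridge=1e-3):
    """ML estimate of the joint Gaussian over part locations.

    locs: (n_images, 2 * n_parts) array; each row stacks the (x, y)
    coordinates of all parts for one training image.
    A small ridge added to the covariance diagonal keeps it from
    being nearly singular when training data is scarce.
    """
    mu = locs.mean(axis=0)
    cov = np.cov(locs, rowvar=False) + ridge * np.eye(locs.shape[1])
    return mu, cov

def location_log_prob(mu, cov, test_locs):
    """Log probability of test part locations under the fitted model."""
    return multivariate_normal(mu, cov).logpdf(test_locs)
```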

It's probably a good thing that the fourth image had a lower matching probability as it does seem quite different from the other motorbike images in the training data.