Wednesday, February 6, 2008

Model learning and recognition sans clutter and occlusion (for now)

Rob Fergus has been kind enough to email me a link to his code for his CVPR '03 paper. However, it isn't running for me at the moment, and it seems I need to recompile some MEX files. The difficulty is that the gcc version installed on the Linux workstations in the APE Lab is different from the one the MEX files need, and installing the right one first is a bit of a pain.

So I'm going ahead with this on my own for the moment. The main complications in this method come from dealing with occlusion and clutter; they are what force an exhaustive search over an exponentially large hypothesis space during both learning and recognition. For now, I'll work with clean data and assume that all the features arise from the object and not from the background (as is the case with most of the images in the Caltech motorbike dataset).
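To get a feel for why that search is so expensive, here is a quick back-of-the-envelope count of the hypothesis space (my own Python sketch, not anything from the paper's code): each hypothesis assigns every part either to one of the detected features or to "occluded", with no feature claimed twice.

    from math import comb, factorial

    def num_hypotheses(num_parts, num_features):
        # Each part either claims one of the detected features (no feature can
        # be claimed twice) or is marked occluded; count all such assignments.
        total = 0
        for k in range(min(num_parts, num_features) + 1):  # k = unoccluded parts
            total += comb(num_parts, k) * factorial(num_features) // factorial(num_features - k)
        return total

    print(num_hypotheses(10, 25))  # with background clutter: trillions of hypotheses
    print(num_hypotheses(10, 10))  # hundreds of millions if the correspondence is unknown
    # With clean data and a known feature-to-part correspondence there is exactly
    # one hypothesis, so learning reduces to plain ML estimation.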

Using this idea, I ran an experiment with 20 training images and about 10 features per image (also equal to the number of parts, since all features are assumed to arise from the object for now). Since there is no hidden variable in this setting, I estimated the parameters for the appearance of each part using plain Maximum Likelihood estimation. In addition, I estimated the ML parameters for the joint density of the locations of all parts. Then, using these parameters, I ran the recognition procedure on four test images.

The first three images were selected from within the training set of 20 images, so the probability of recognition is expected to be high for them. The last image is from outside the training set and was deliberately chosen to be quite dissimilar from the training images.

While running the recognition code, I hit numerical issues because the location parameters were ill-conditioned: the covariance matrix of the joint Gaussian density over the part locations was nearly singular. Perhaps this happened because I wasn't using enough data. I also haven't imposed an ordering constraint on the X coordinates of the detected features. Looking at the log probabilities from the appearance models alone, they were -50.9192, -54.2892, -57.3182 and -792.5911 for the four images respectively.
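For concreteness, the learning and scoring steps above amount to something like the following Python/NumPy sketch. The array shapes are illustrative, random arrays stand in for the real descriptors and locations, and the small ridge added to the covariance is just one possible workaround for the near-singularity mentioned above.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Illustrative shapes only: 20 training images, P = 10 parts, and a
    # hypothetical D-dimensional appearance descriptor per part. Random arrays
    # stand in for the real features extracted from the motorbike images.
    N, P, D = 20, 10, 15
    appearance = np.random.randn(N, P, D)   # per-part appearance descriptors
    locations = np.random.randn(N, 2 * P)   # (x, y) of all parts, stacked per image

    # Learning: with no hidden feature-to-part assignment, the parameters are
    # plain ML estimates.
    app_mean = appearance.mean(axis=0)            # (P, D) appearance means
    app_var = appearance.var(axis=0) + 1e-6       # diagonal appearance variances
    loc_mean = locations.mean(axis=0)             # (2P,) mean part locations
    loc_cov = np.cov(locations, rowvar=False)     # joint Gaussian over locations
    # With only 20 images and a 20-dimensional location vector, the ML covariance
    # is close to singular; adding a small ridge is one possible workaround.
    loc_cov += 1e-3 * np.eye(2 * P)

    # Recognition: log-likelihood of a new image's features under the model.
    def appearance_loglik(feats):
        # Sum of independent per-part Gaussian log densities; feats is (P, D).
        return multivariate_normal.logpdf(
            feats.ravel(), mean=app_mean.ravel(), cov=np.diag(app_var.ravel()))

    def location_loglik(locs):
        # Joint Gaussian log density over the stacked part locations; locs is (2P,).
        return multivariate_normal.logpdf(locs, mean=loc_mean, cov=loc_cov)

    test_feats, test_locs = appearance[0], locations[0]
    print(appearance_loglik(test_feats) + location_loglik(test_locs))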

It's probably a good thing that the fourth image had a lower matching probability as it does seem quite different from the other motorbike images in the training data.
