The Kadir and Brady feature detector picks out salient features in the image and returns their locations and scales. For notational convenience, the locations and scales of all these features are aggregated into the vectors X and S. The third key source of information is appearance, so we now need to compute, for a given image, the vector A containing the appearances of all the features.
To compute the appearance of a single feature, it is cropped out of the image using a square mask and then scaled down to an 11 x 11 patch. This patch can be thought of as a single point in a 121-dimensional appearance space. However, 121 dimensions is too high, so we reduce the dimensionality of the appearance space using PCA and keeping the top 10-15 principal components. The best reference for PCA that I have found so far is Prof. Nuno Vasconcelos' slides (nos. 28 and 29 give an outline) from his ECE 271A course. My code for computing the principal components from training data and projecting new data onto these principal components is posted here and here.
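A minimal sketch of the crop-and-rescale step, assuming the image is a grayscale NumPy array; the function name, the way the scale maps to a crop radius, and the nearest-neighbour resampling are my assumptions, not the original code:

```python
import numpy as np

def feature_patch(image, x, y, scale, out_size=11):
    """Crop a square patch centred at (x, y) with half-width `scale`,
    downsample it to out_size x out_size, and return it as a flat
    appearance vector (121-dimensional for out_size=11)."""
    half = int(round(scale))
    patch = image[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    # Nearest-neighbour resampling to out_size x out_size -- a simple
    # stand-in for whatever interpolation the real pipeline uses.
    rows = np.linspace(0, patch.shape[0] - 1, out_size).round().astype(int)
    cols = np.linspace(0, patch.shape[1] - 1, out_size).round().astype(int)
    small = patch[np.ix_(rows, cols)]
    return small.ravel()  # one point in the 121-dimensional appearance space
```

Each detected feature would go through this once, and the resulting 121-d vectors are what PCA is applied to.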
During the learning stage, a fixed PCA basis of 10-15 dimensions is computed from the patches around all detected regions across all training images. I'm not sure whether I should compute a single basis for all the classes or a separate basis for each class.
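The basis computation and the projection of new features could be sketched as follows, assuming all training patches are stacked into an (N, 121) NumPy array; the function names are placeholders and this is not the code I posted:

```python
import numpy as np

def fit_pca_basis(patches, k=15):
    """Compute a fixed k-dimensional PCA basis from an (N, 121) array of
    appearance vectors gathered across all training images."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    # SVD of the centred data: rows of vt are the principal directions,
    # ordered by decreasing variance, so the first k form the basis.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project_appearance(patch_vec, mean, basis):
    """Project a new 121-d appearance vector onto the fixed basis,
    giving the k-dimensional appearance used during learning/recognition."""
    return basis @ (patch_vec - mean)
```

With a single shared basis, `fit_pca_basis` would be run once over patches from every class; the per-class alternative would simply run it separately on each class's patches.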