So the appearance extraction process seems to be working quite well for bikes with a starting scale of 23. I wasn't sure that a single starting scale would work well for all categories, though. The detected features for faces and the tiled appearance patches are shown below:
Perhaps a smaller starting scale would work better? But that would mean tweaking the starting scale for each category, which would defeat the whole purpose, so that's ruled out. Here are similar results for cars:
Wednesday, January 30, 2008
Improving the appearances of the parts
The features extracted earlier didn't seem to provide much information. It's quite difficult even for a human to look at those extracted features and say that they belong to a motorbike. So I compared the results of my feature detection phase (which looked mostly like this) with the results of feature detection from Rob Fergus's paper, which look like this:
The problem seemed to be the scale of the features detected. Somehow, small local features were firing more strongly than the more important larger ones. I gradually increased the smallest admissible scale for detected features and finally settled on a starting scale of 23 (it was 3 earlier). Using this starting scale and keeping the top 20 saliency values, the outputs on various bikes looked like this:
This seems much better and closer to the output of Fergus et al. I extracted these newly detected features, resized them, and tiled them into the image shown below (a rough Matlab sketch of the selection and tiling steps appears at the end of this post). The 9 rows show the features (each rescaled to an 11 x 11 patch) extracted from the 9 motorbikes shown above, in row-major order.
Now, we can at least see the tyres of the motorbike in almost all the input images. The new appearances of the parts seem to provide more information about the image's category.
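As promised above, here is a rough Matlab sketch of the selection and tiling steps. The variable names and the [x, y, scale, saliency] row layout are my own assumptions rather than the actual interface of the Kadir-Brady code, the input data is a random placeholder just so the snippet runs, and imresize / mat2gray come from the Image Processing Toolbox:

    % Selecting features: restrict the starting scale, then keep the top 20.
    % Placeholder detections; each row is assumed to be [x, y, scale, saliency].
    feats = [200*rand(200,2), 3 + 40*rand(200,1), rand(200,1)];

    minScale = 23;                               % smallest admissible scale
    feats    = feats(feats(:,3) >= minScale, :); % discard small-scale features
    [~, ord] = sort(feats(:,4), 'descend');      % rank by saliency
    feats    = feats(ord(1:min(20, size(feats,1))), :);   % keep the 20 most salient

    % Tiling: rescale each cropped patch to 11 x 11 and lay them out,
    % one row of tiles per input image.
    patches = cell(1, 9);                        % placeholder: 9 images, 20 patches each
    for i = 1:9
        patches{i} = arrayfun(@(j) rand(20 + j), 1:20, 'UniformOutput', false);
    end

    ps    = 11;                                  % tile (patch) size
    tiled = zeros(numel(patches)*ps, 20*ps);
    for i = 1:numel(patches)
        for j = 1:numel(patches{i})
            p = imresize(patches{i}{j}, [ps ps]);           % rescale to 11 x 11
            tiled((i-1)*ps + (1:ps), (j-1)*ps + (1:ps)) = mat2gray(p);
        end
    end
    imshow(tiled);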
Monday, January 28, 2008
Appearance of detected features
Wednesday, January 23, 2008
Feature Extraction (Appearance)
The Kadir and Brady feature detector picks out a set of salient features from the image and gives us their locations and scales. For notational convenience, the locations and scales of all these features are aggregated into the vectors X and S. The third key source of information is appearance, so we now need to compute the vector A for a given image, which will contain the appearances of all the features.
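In Matlab terms, building X and S is just a matter of stacking the per-feature values returned by the detector. A minimal sketch (the [x, y, scale] row layout and the values are placeholders of mine, not the detector's actual output format):

    % Placeholder detector output: one row per feature, assumed to be [x, y, scale].
    feats = [120 80 15; 200 140 22; 95 160 18];
    X = feats(:, 1:2)';   % 2 x N matrix of feature locations
    S = feats(:, 3)';     % 1 x N vector of feature scales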
To compute the appearance of a single feature, it is cropped out of the image using a square mask and then scaled down to an 11 x 11 patch. This patch can be thought of as a single point in a 121-dimensional appearance space. However, 121 dimensions is too high, and we need to reduce the dimensionality of the appearance space. This is done using PCA and keeping the top 10-15 components. The best reference for PCA that I have found so far is Prof. Nuno Vasconcelos' slides (nos. 28 and 29 give an outline) from his ECE 271A course. My code for computing the principal components from training data and projecting new data onto these principal components is posted here and here.
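To make the per-feature step concrete, here is a rough sketch of how I think of it. The function name and interface are mine for illustration, not the posted code; mu and basis are the PCA mean and basis learned as described in the next paragraph, and the square mask half-width is taken to be the detected scale:

    function a = featureAppearance(img, x, y, s, mu, basis)
        % img: grayscale image; (x, y, s): a detected feature's centre and scale
        % mu:  121x1 mean patch; basis: 121xd matrix of principal components
        half   = round(s);                              % half-width of the square mask
        [h, w] = size(img);
        r1 = max(1, round(y) - half);  r2 = min(h, round(y) + half);
        c1 = max(1, round(x) - half);  c2 = min(w, round(x) + half);
        patch = imresize(double(img(r1:r2, c1:c2)), [11 11]);   % crop and rescale
        p = patch(:);                                   % point in 121-dimensional space
        a = basis' * (p - mu);                          % project onto the PCA basis
    end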
During the learning stage, a fixed PCA basis of 10-15 dimensions is computed. This fixed basis is computed using the patches around all detected regions across all training images. I'm not sure whether I need to compute a single basis for all the classes or a separate basis for each class.
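For reference, here is a minimal sketch of how that fixed basis can be computed, assuming the vectorised 11 x 11 patches from all detected regions across the training images have been collected as the columns of a 121 x N matrix (the input here is random and only there to make the snippet run):

    P  = rand(121, 500);                    % placeholder: columns are vectorised patches
    d  = 15;                                % number of principal components to keep
    mu = mean(P, 2);
    C  = cov((P - mu)');                    % 121 x 121 covariance of the patches
    [V, D] = eig(C);
    [~, order] = sort(diag(D), 'descend');  % eigenvectors by decreasing variance
    basis = V(:, order(1:d));               % 121 x d fixed PCA basis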
Wednesday, January 16, 2008
Detecting Salient Regions
There is some useful Matlab code here for running the Kadir and Brady feature detector. The detected salient regions are marked by circles in the picture. There are probably too many features detected here; the desired number is around 30. I played around a bit with the parameters in the code and was able to reduce the number of detected features. The new detections are shown in the second figure.
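For completeness, overlaying the detections as circles is straightforward once they are available. A small sketch, with a placeholder file name and placeholder detections (viscircles is from the Image Processing Toolbox):

    img   = imread('motorbike.jpg');                      % placeholder file name
    feats = [120 80 15; 200 140 22; 95 160 18];           % placeholder [x, y, scale] rows
    imshow(img); hold on;
    viscircles(feats(:, 1:2), feats(:, 3), 'Color', 'y'); % one circle per salient region
    hold off;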