<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5796160763961086590</id><updated>2011-07-07T13:31:55.357-07:00</updated><title type='text'>Object categorization with generative models</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>18</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-7568337951135445926</id><published>2008-03-17T16:41:00.000-07:00</published><updated>2008-03-17T16:45:59.291-07:00</updated><title type='text'>Project Report</title><content type='html'>is uploaded &lt;a href="http://www.cse.ucsd.edu/%7Eysaraf/projects/ysaraf.pdf"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/5796160763961086590-7568337951135445926?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/7568337951135445926/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=7568337951135445926' title='40 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7568337951135445926'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7568337951135445926'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/03/project-report.html' title='Project Report'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>40</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-8237254456139552596</id><published>2008-03-12T12:06:00.000-07:00</published><updated>2008-03-12T12:09:35.426-07:00</updated><title type='text'>Presentation Slides</title><content type='html'>Are uploaded &lt;a href="http://www.cse.ucsd.edu/~ysaraf/projects/ysaraf_presentation.ppt"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-8237254456139552596?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/8237254456139552596/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=8237254456139552596' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/8237254456139552596'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/8237254456139552596'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/03/presentation-slides.html' title='Presentation Slides'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-8791481617588232042</id><published>2008-03-10T15:27:00.001-07:00</published><updated>2008-12-10T09:40:37.202-08:00</updated><title type='text'>Levenshtein Distances working nicely</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pLBr18s-GoE/R9W2CAbnShI/AAAAAAAAAfs/Eo2k-7k4mi0/s1600-h/1-NN-dists.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_pLBr18s-GoE/R9W2CAbnShI/AAAAAAAAAfs/Eo2k-7k4mi0/s400/1-NN-dists.jpg" alt="" id="BLOGGER_PHOTO_ID_5176243492324067858" border="0" /&gt;&lt;/a&gt;These are the 1-NN edit distances to the training set for each of my 9 test images. The first 4 motorbike images are showing a significantly lower edit distance than the last 5. The edit distances are computed by considering the extracted appearance patches to be words where each patch is a single character. Here, the matching cost between two image patches was computed using a straight SSD between the patches. 
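&lt;br /&gt;&lt;br /&gt;A minimal Python sketch of this kind of patch-sequence edit distance follows. It is an illustration rather than the code used for the plot above: the function names, the flattened-list patch layout and the default arguments are assumptions, and the gap cost follows the uniform-intensity patch described in this post.&lt;pre&gt;
```python
# Illustrative sketch only: names and data layout are assumptions,
# not the Matlab code used for the experiments in this post.


def ssd(p, q):
    """Sum of squared differences between two flattened patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def patch_edit_distance(seq_a, seq_b, patch_len=121, gap_value=128):
    """Levenshtein-style distance between two sequences of patches.

    Matching one patch against another costs their SSD; inserting or
    deleting a patch costs its SSD to a uniform patch of gap_value.
    """
    gap_patch = [gap_value] * patch_len
    n, m = len(seq_a), len(seq_b)
    # dist[i][j] is the cheapest alignment of the first i patches of
    # seq_a with the first j patches of seq_b.
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + ssd(seq_a[i - 1], gap_patch)
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ssd(seq_b[j - 1], gap_patch)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                # match the two patches against each other
                dist[i - 1][j - 1] + ssd(seq_a[i - 1], seq_b[j - 1]),
                # align the patch from seq_a with a gap
                dist[i - 1][j] + ssd(seq_a[i - 1], gap_patch),
                # align the patch from seq_b with a gap
                dist[i][j - 1] + ssd(seq_b[j - 1], gap_patch),
            )
    return dist[n][m]


def nearest_neighbour_distance(test_seq, training_seqs):
    """1-NN edit distance from one test image to the training set."""
    return min(patch_edit_distance(test_seq, t) for t in training_seqs)
```
&lt;/pre&gt;Each image is passed as a list of patches in the order they were extracted, and the 1-NN distance is the smallest edit distance to any training sequence.&lt;br /&gt;&lt;br /&gt;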
The cost of inserting a gap was computed as the matching cost (SSD) of the patch with a canonical 11x11 patch having uniform intensity of 128.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-8791481617588232042?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/8791481617588232042/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=8791481617588232042' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/8791481617588232042'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/8791481617588232042'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/03/levenshtein-distances-working-nicely.html' title='Levenshtein Distances working nicely'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_pLBr18s-GoE/R9W2CAbnShI/AAAAAAAAAfs/Eo2k-7k4mi0/s72-c/1-NN-dists.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-7283935211037010865</id><published>2008-03-07T13:32:00.000-08:00</published><updated>2008-03-07T13:39:55.163-08:00</updated><title type='text'>My project report..</title><content type='html'>in its draft form is &lt;a 
href="http://www.cse.ucsd.edu/%7Eysaraf/projects/project_draft.pdf"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-7283935211037010865?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/7283935211037010865/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=7283935211037010865' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7283935211037010865'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7283935211037010865'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/03/my-project-report.html' title='My project report..'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-7830976828579817859</id><published>2008-03-03T02:02:00.000-08:00</published><updated>2008-12-10T09:40:37.895-08:00</updated><title type='text'>Three are better than two</title><content type='html'>When I use three Gaussian components instead of &lt;a href="http://learningtoseethings.blogspot.com/2008/03/throwing-in-more-gaussians.html"&gt;two&lt;/a&gt;, things look a little better. 
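&lt;br /&gt;&lt;br /&gt;For reference, the appearance log probability of a patch under such a mixture can be sketched as below. This is a minimal Python illustration rather than the Netlab code used for these experiments; the function names and the list-based parameter layout are assumptions made for the example.&lt;pre&gt;
```python
import math

# Illustrative sketch only: the parameter layout is an assumption and
# does not mirror the Netlab data structures.


def diag_gaussian_logpdf(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def mixture_logprob(x, priors, means, variances):
    """Log probability of x under a mixture of diagonal Gaussians."""
    terms = [
        math.log(p) + diag_gaussian_logpdf(x, m, v)
        for p, m, v in zip(priors, means, variances)
    ]
    # Log-sum-exp keeps the sum stable when the terms are very negative.
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```
&lt;/pre&gt;Going from two components to three only changes the length of the priors, means and variances lists.&lt;br /&gt;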
&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R8vNR83g-9I/AAAAAAAAAfk/g-J6gSPe2a4/s1600-h/LogProbApp_MoG3.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R8vNR83g-9I/AAAAAAAAAfk/g-J6gSPe2a4/s400/LogProbApp_MoG3.jpg" alt="" id="BLOGGER_PHOTO_ID_5173454305245592530" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pLBr18s-GoE/R8vNRc3g-8I/AAAAAAAAAfc/WHuLazU6QL4/s1600-h/LogProb_MoG3.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_pLBr18s-GoE/R8vNRc3g-8I/AAAAAAAAAfc/WHuLazU6QL4/s400/LogProb_MoG3.jpg" alt="" id="BLOGGER_PHOTO_ID_5173454296655657922" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-7830976828579817859?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/7830976828579817859/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=7830976828579817859' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7830976828579817859'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7830976828579817859'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/03/three-are-better-than-two.html' title='Three are better than 
two'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_pLBr18s-GoE/R8vNR83g-9I/AAAAAAAAAfk/g-J6gSPe2a4/s72-c/LogProbApp_MoG3.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-7990847133494974707</id><published>2008-03-03T01:15:00.001-08:00</published><updated>2008-12-10T09:40:38.387-08:00</updated><title type='text'>Throwing in more Gaussians</title><content type='html'>The variability in the appearance of a single part across different training images &lt;a href="http://learningtoseethings.blogspot.com/2008/02/fixing-bug.html"&gt;here&lt;/a&gt; suggests that a single Gaussian may not be sufficient to capture the underlying data. I decided to try out a mixture of Gaussians for each part (with diagonal covariances). The &lt;a href="http://www.ncrg.aston.ac.uk/netlab/"&gt;Netlab&lt;/a&gt; software for Matlab turned out to be very useful here, as it has built-in routines for learning and using Gaussian mixture models (the gmm, gmminit, gmmem and gmmprob scripts were a big help).&lt;br /&gt;&lt;br /&gt;Here are the resulting log probabilities when using 2 mixture components for each part's appearance. 
In this case,  the default EM initialization is used (uniform priors, random means and identity covariances).&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pLBr18s-GoE/R8vBys3g-2I/AAAAAAAAAes/25v-j7l-akU/s1600-h/LogProbApp_MoG2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_pLBr18s-GoE/R8vBys3g-2I/AAAAAAAAAes/25v-j7l-akU/s400/LogProbApp_MoG2.jpg" alt="" id="BLOGGER_PHOTO_ID_5173441673746774882" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pLBr18s-GoE/R8vBzM3g-3I/AAAAAAAAAe0/bueHKsQq7P4/s1600-h/LogProb_MoG2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_pLBr18s-GoE/R8vBzM3g-3I/AAAAAAAAAe0/bueHKsQq7P4/s400/LogProb_MoG2.jpg" alt="" id="BLOGGER_PHOTO_ID_5173441682336709490" border="0" /&gt;&lt;/a&gt;Next, EM was initialized using the gmminit script, which initializes the centers and priors using k-means on the data. 
The covariance matrices are calculated as the sample covariance of the points closest to the corresponding centres.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pLBr18s-GoE/R8vJXs3g-4I/AAAAAAAAAe8/qimUjxsvRWo/s1600-h/LogProbApp_MoG2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_pLBr18s-GoE/R8vJXs3g-4I/AAAAAAAAAe8/qimUjxsvRWo/s400/LogProbApp_MoG2.jpg" alt="" id="BLOGGER_PHOTO_ID_5173450005983329154" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R8vJX83g-5I/AAAAAAAAAfE/NHJt5YXk6HY/s1600-h/LogProb_MoG2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R8vJX83g-5I/AAAAAAAAAfE/NHJt5YXk6HY/s400/LogProb_MoG2.jpg" alt="" id="BLOGGER_PHOTO_ID_5173450010278296466" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-7990847133494974707?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/7990847133494974707/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=7990847133494974707' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7990847133494974707'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7990847133494974707'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/03/throwing-in-more-gaussians.html' title='Throwing in more 
Gaussians'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_pLBr18s-GoE/R8vBys3g-2I/AAAAAAAAAes/25v-j7l-akU/s72-c/LogProbApp_MoG2.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-396090251073517857</id><published>2008-03-02T14:35:00.001-08:00</published><updated>2008-12-10T09:40:38.963-08:00</updated><title type='text'>Reducing dimensionality with random projections instead of PCA</title><content type='html'>Instead of reducing the dimensionality of the appearance patches using PCA, I tried using a random projection matrix (similar to the one defined in question 1 &lt;a href="http://www-cse.ucsd.edu/classes/fa07/cse252c/hw2.pdf"&gt;here&lt;/a&gt;). The matrix was generated once during training and the same one was reused during testing. This approach does not seem to work any better than the previous PCA approach.&lt;br /&gt;&lt;br /&gt;Here are the total log probabilities of the same test images that were used previously. 
Image 1 has taken an undesirable dip, and image 4 hasn't been pulled up enough to separate it from the negative test images.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pLBr18s-GoE/R8ssDM3g-yI/AAAAAAAAAeM/Yze_3wSTMDc/s1600-h/LogProb_G.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_pLBr18s-GoE/R8ssDM3g-yI/AAAAAAAAAeM/Yze_3wSTMDc/s400/LogProb_G.jpg" alt="" id="BLOGGER_PHOTO_ID_5173277030470449954" border="0" /&gt;&lt;/a&gt;The appearance log probabilities of the negative test images have gone up relative to those of the bike images.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pLBr18s-GoE/R8ssDs3g-zI/AAAAAAAAAeU/tMzlSjH6RQc/s1600-h/LogProbApp_G.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_pLBr18s-GoE/R8ssDs3g-zI/AAAAAAAAAeU/tMzlSjH6RQc/s400/LogProbApp_G.jpg" alt="" id="BLOGGER_PHOTO_ID_5173277039060384562" border="0" /&gt;&lt;/a&gt;Of course, the location probabilities are exactly the same as before because they are unaffected by the method of dimensionality reduction applied to the appearance patches.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pLBr18s-GoE/R8ssGM3g-0I/AAAAAAAAAec/79Lx5ov-VdY/s1600-h/logProbLoc_G.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_pLBr18s-GoE/R8ssGM3g-0I/AAAAAAAAAec/79Lx5ov-VdY/s400/logProbLoc_G.jpg" alt="" id="BLOGGER_PHOTO_ID_5173277082010057538" border="0" /&gt;&lt;/a&gt;Here are the reconstructed patches obtained by back projecting to 121 dimensions. 
For this, the reduced dimensionality patches were multiplied by the pseudo-inverse of the random projection matrix that was used.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pLBr18s-GoE/R8ssGs3g-1I/AAAAAAAAAek/zw0A4GDblx0/s1600-h/reconstructed_G.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_pLBr18s-GoE/R8ssGs3g-1I/AAAAAAAAAek/zw0A4GDblx0/s400/reconstructed_G.jpg" alt="" id="BLOGGER_PHOTO_ID_5173277090599992146" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-396090251073517857?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/396090251073517857/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=396090251073517857' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/396090251073517857'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/396090251073517857'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/03/reducing-dimensionality-with-random.html' title='Reducing dimensionality with random projections instead of PCA'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/_pLBr18s-GoE/R8ssDM3g-yI/AAAAAAAAAeM/Yze_3wSTMDc/s72-c/LogProb_G.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-7671980270585514156</id><published>2008-02-27T15:07:00.001-08:00</published><updated>2008-12-10T09:40:40.248-08:00</updated><title type='text'>Fixing a bug</title><content type='html'>&lt;div style="text-align: left;"&gt;After closer examination of the extracted patches, I discovered that the sorting of the patches was actually happening by Y-coordinate instead of X. I fixed that bug and then the tests were showing better results.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;This figure shows  the patches extracted from the 47 training images. Each row shows the 10 patches extracted from a single motorbike training image, now sorted by X-coordinate.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XttDbXn3I/AAAAAAAAAdU/-J8bYMEO_kU/s1600-h/bike_patches.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XttDbXn3I/AAAAAAAAAdU/-J8bYMEO_kU/s400/bike_patches.jpg" alt="" id="BLOGGER_PHOTO_ID_5171801105374879602" border="0" /&gt;&lt;/a&gt;The image below shows the 9 images used for testing (in row major order):&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R8Xt_jbXn4I/AAAAAAAAAdc/OUGhCPSqGj8/s1600-h/test_images.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R8Xt_jbXn4I/AAAAAAAAAdc/OUGhCPSqGj8/s400/test_images.jpg" alt="" id="BLOGGER_PHOTO_ID_5171801423202459522" border="0" /&gt;&lt;/a&gt;Here are the resulting log probabilities (for location, appearance and sum) for each of the test images:&lt;a onblur="try 
{parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R8XuGjbXn5I/AAAAAAAAAdk/aQS9DreYT24/s1600-h/logProbLoc.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R8XuGjbXn5I/AAAAAAAAAdk/aQS9DreYT24/s400/logProbLoc.jpg" alt="" id="BLOGGER_PHOTO_ID_5171801543461543826" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XuLDbXn6I/AAAAAAAAAds/7EECP9TitCU/s1600-h/LogProbApp.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XuLDbXn6I/AAAAAAAAAds/7EECP9TitCU/s400/LogProbApp.jpg" alt="" id="BLOGGER_PHOTO_ID_5171801620770955170" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XuPDbXn7I/AAAAAAAAAd0/yvjeFb0p2cI/s1600-h/LogProb.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XuPDbXn7I/AAAAAAAAAd0/yvjeFb0p2cI/s400/LogProb.jpg" alt="" id="BLOGGER_PHOTO_ID_5171801689490431922" border="0" /&gt;&lt;/a&gt;The location probability of the fifth image, which is a car, is quite high. This can be seen easily from the locations of the extracted patches &lt;a href="http://learningtoseethings.blogspot.com/2008/02/using-more-clean-motorbike-data.html"&gt;here&lt;/a&gt; (which also shows why the location probability of the ninth image is so low, as should be the case). However, the appearance probability for it is low. In general, the locations of the patches are doing a better job at differentiating the classes.  The appearance probability of the fourth bike is very low. 
Showing it below along with its extracted patches.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XyvDbXn8I/AAAAAAAAAd8/6gzwhwlidLI/s1600-h/bike4.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XyvDbXn8I/AAAAAAAAAd8/6gzwhwlidLI/s400/bike4.jpg" alt="" id="BLOGGER_PHOTO_ID_5171806637292756930" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pLBr18s-GoE/R8XyvTbXn9I/AAAAAAAAAeE/lMD6T9ZXWrE/s1600-h/bike4patches.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_pLBr18s-GoE/R8XyvTbXn9I/AAAAAAAAAeE/lMD6T9ZXWrE/s400/bike4patches.jpg" alt="" id="BLOGGER_PHOTO_ID_5171806641587724242" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-7671980270585514156?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/7671980270585514156/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=7671980270585514156' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7671980270585514156'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7671980270585514156'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/02/fixing-bug.html' title='Fixing a 
bug'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_pLBr18s-GoE/R8XttDbXn3I/AAAAAAAAAdU/-J8bYMEO_kU/s72-c/bike_patches.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-6131803752708760774</id><published>2008-02-20T15:55:00.000-08:00</published><updated>2008-12-10T09:40:40.619-08:00</updated><title type='text'>Reconstructed patches</title><content type='html'>For debugging purposes, I reconstructed the patches by projecting them back to 121 dimensions and displaying them as images. The first four rows show motorbike patches and the remaining rows show patches from cars and faces. I can't see much difference between the two groups.&lt;br /&gt;&lt;br /&gt;Edit: These patches were sorted (erroneously) by Y-coordinate. 
The correct patches, sorted by X-coordinate, are shown below in the second figure.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pLBr18s-GoE/R7y-VDbXn1I/AAAAAAAAAdE/1W3ml2e5Jiw/s1600-h/reconstructed.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_pLBr18s-GoE/R7y-VDbXn1I/AAAAAAAAAdE/1W3ml2e5Jiw/s400/reconstructed.jpg" alt="" id="BLOGGER_PHOTO_ID_5169215741221117778" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XsBDbXn2I/AAAAAAAAAdM/4XFhhXv6hf0/s1600-h/reconstructed.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_pLBr18s-GoE/R8XsBDbXn2I/AAAAAAAAAdM/4XFhhXv6hf0/s400/reconstructed.jpg" alt="" id="BLOGGER_PHOTO_ID_5171799249949007714" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-6131803752708760774?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/6131803752708760774/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=6131803752708760774' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/6131803752708760774'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/6131803752708760774'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/02/reconstructed-patches.html' title='Reconstructed 
patches'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_pLBr18s-GoE/R7y-VDbXn1I/AAAAAAAAAdE/1W3ml2e5Jiw/s72-c/reconstructed.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-1483649946667480323</id><published>2008-02-13T16:21:00.000-08:00</published><updated>2008-02-13T16:23:20.397-08:00</updated><title type='text'>Sorting by X-coordinate</title><content type='html'>Running the same experiment after sorting the features by X coordinate (instead of saliency), I get these probabilities:&lt;br /&gt;&lt;br /&gt;CombinedLogProb =&lt;br /&gt;&lt;br /&gt; -240.8994&lt;br /&gt; -206.9385&lt;br /&gt; -228.6303&lt;br /&gt; -249.9772&lt;br /&gt; -293.5449&lt;br /&gt; -261.3568&lt;br /&gt; -279.4719&lt;br /&gt; -255.3435&lt;br /&gt; -270.4987&lt;br /&gt; -296.9481&lt;br /&gt;&lt;br /&gt;&gt;&gt; LogProbApp&lt;br /&gt;&lt;br /&gt;LogProbApp =&lt;br /&gt;&lt;br /&gt; -144.7198&lt;br /&gt; -114.0909&lt;br /&gt; -130.4656&lt;br /&gt; -151.1481&lt;br /&gt; -166.9280&lt;br /&gt; -150.5252&lt;br /&gt; -166.1861&lt;br /&gt; -108.0655&lt;br /&gt; -125.5857&lt;br /&gt; -161.1468&lt;br /&gt;&lt;br /&gt;&gt;&gt; LogProbLoc&lt;br /&gt;&lt;br /&gt;LogProbLoc =&lt;br /&gt;&lt;br /&gt;  -96.1796&lt;br /&gt;  -92.8476&lt;br /&gt;  -98.1648&lt;br /&gt;  -98.8291&lt;br /&gt; -126.6168&lt;br /&gt; -110.8315&lt;br /&gt; -113.2858&lt;br /&gt; -147.2780&lt;br /&gt; -144.9131&lt;br /&gt; -135.8014&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-1483649946667480323?l=learningtoseethings.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/1483649946667480323/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=1483649946667480323' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/1483649946667480323'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/1483649946667480323'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/02/sorting-by-x-coordinate.html' title='Sorting by X-coordinate'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-4125310463617953134</id><published>2008-02-13T15:07:00.000-08:00</published><updated>2008-12-10T09:40:40.952-08:00</updated><title type='text'>Using more clean motorbike data</title><content type='html'>Previously, I had used 20 training images for parameter estimations of the location and appearance models. I re-ran the tests with 47 training images of motorbikes (sans background clutter). 
I then ran the recognition procedure on 10 different test images consisting of motorbikes, cars and faces.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pLBr18s-GoE/R7N9YzbXnzI/AAAAAAAAAc0/WdEDxiw6EyY/s1600-h/test_image9out.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_pLBr18s-GoE/R7N9YzbXnzI/AAAAAAAAAc0/WdEDxiw6EyY/s400/test_image9out.jpg" alt="" id="BLOGGER_PHOTO_ID_5166611062599425842" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pLBr18s-GoE/R7N9UTbXnyI/AAAAAAAAAcs/QwXPdyLCH14/s1600-h/test_image5out.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_pLBr18s-GoE/R7N9UTbXnyI/AAAAAAAAAcs/QwXPdyLCH14/s400/test_image5out.jpg" alt="" id="BLOGGER_PHOTO_ID_5166610985290014498" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The results were as follows:&lt;br /&gt;&lt;br /&gt;CombinedLogProb =&lt;br /&gt;&lt;br /&gt;-249.1764&lt;br /&gt;-233.0733&lt;br /&gt;-226.2195&lt;br /&gt;-293.5680&lt;br /&gt;-257.1131&lt;br /&gt;-304.4388&lt;br /&gt;-284.6287&lt;br /&gt;-251.9015&lt;br /&gt;-254.5245&lt;br /&gt;-297.4117&lt;br /&gt;&lt;br /&gt;&gt;&gt; LogProbApp&lt;br /&gt;&lt;br /&gt;LogProbApp =&lt;br /&gt;&lt;br /&gt;-141.6296&lt;br /&gt;-118.7043&lt;br /&gt;-118.2580&lt;br /&gt;-179.9357&lt;br /&gt;-138.6968&lt;br /&gt;-186.3235&lt;br /&gt;-162.7556&lt;br /&gt;-110.6465&lt;br /&gt;-137.2324&lt;br /&gt;-161.1289&lt;br /&gt;&lt;br /&gt;&gt;&gt; LogProbLoc&lt;br /&gt;&lt;br /&gt;LogProbLoc =&lt;br /&gt;&lt;br /&gt;-107.5468&lt;br /&gt;-114.3690&lt;br /&gt;-107.9615&lt;br /&gt;-113.6323&lt;br /&gt;-118.4163&lt;br /&gt;-118.1153&lt;br /&gt;-121.8730&lt;br /&gt;-141.2551&lt;br /&gt;-117.2921&lt;br /&gt;-136.2827&lt;br /&gt;&lt;br 
/&gt;Images 1-4 were bikes, 5-7 were cars, 8-9 were faces and 10 was another bike.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-4125310463617953134?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/4125310463617953134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=4125310463617953134' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/4125310463617953134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/4125310463617953134'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/02/using-more-clean-motorbike-data.html' title='Using more clean motorbike data'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_pLBr18s-GoE/R7N9YzbXnzI/AAAAAAAAAc0/WdEDxiw6EyY/s72-c/test_image9out.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-1389481253360333603</id><published>2008-02-06T00:47:00.000-08:00</published><updated>2008-12-10T09:40:43.954-08:00</updated><title type='text'>Model learning and recognition sans clutter and occlusion (for now)</title><content type='html'>Rob Fergus has been kind enough to email me a link to his code for his CVPR '03 paper. 
However, it isn't running for me yet, and it seems that I need to recompile some MEX files. The difficulty is that the Linux workstations in the APE Lab have a different version of gcc installed than the one that's needed, and installing the right one first is a bit of a pain.&lt;br /&gt;&lt;br /&gt;So I'm going ahead with this on my own for now. The main complications in this method arise from trying to deal with occlusion and clutter. That's what forces an exhaustive search over an exponentially large hypothesis space during both learning and recognition. For now, I'll work with clean data and assume that all the features arise from the object and none from the background (as is the case with most of the images in the Caltech motorbike dataset).&lt;br /&gt;&lt;br /&gt;Using this idea, I ran an experiment with 20 training images and about 10 features (also equal to the number of parts, since all features are assumed to arise from the object for now). Since there is no hidden variable now, I estimated the parameters for the appearance of each part using plain Maximum Likelihood estimation. In addition, I estimated the ML parameters for the joint density of the locations of all parts. 
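&lt;br /&gt;&lt;br /&gt;With no hidden variables, the ML estimate for a Gaussian is just the sample mean and the sample variance (dividing by n) of the training data. The following is only a minimal Python sketch of that step, not the code used for this experiment: it assumes a diagonal-covariance Gaussian for the appearance of a single part (the post does not say which covariance form was used), the toy numbers are made up, and the names fit_gaussian_ml and gaussian_log_prob are hypothetical.&lt;br /&gt;&lt;pre&gt;
```python
import math


def fit_gaussian_ml(samples):
    """ML estimate of a diagonal Gaussian: per-dimension mean and variance."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    # The ML variance divides by n (not n - 1); the floor avoids a zero variance.
    var = [
        max(sum((s[d] - mean[d]) ** 2 for s in samples) / n, 1e-9)
        for d in range(dim)
    ]
    return mean, var


def gaussian_log_prob(x, mean, var):
    """Log density of x under the diagonal Gaussian (mean, var)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


# Toy descriptors for one part across four training images (made-up numbers).
train = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [2.0, 4.0]]
mean, var = fit_gaussian_ml(train)
near = gaussian_log_prob([2.0, 4.0], mean, var)   # looks like the training data
far = gaussian_log_prob([9.0, -3.0], mean, var)   # looks nothing like it
print(mean, var, near > far)
```
&lt;/pre&gt;Under this simplified model, summing gaussian_log_prob over the parts of a test image would give an appearance log probability for that image.&lt;br /&gt;&lt;br /&gt;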
Then, using these parameters, I ran the recognition procedure on the following images:&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R6o_4oA-tUI/AAAAAAAAAcE/jE3pFVwn3k0/s1600-h/image10.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R6o_4oA-tUI/AAAAAAAAAcE/jE3pFVwn3k0/s400/image10.jpg" alt="" id="BLOGGER_PHOTO_ID_5164010164780447042" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R6pBjoA-tVI/AAAAAAAAAcM/OMb06pfCoZw/s1600-h/image11.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R6pBjoA-tVI/AAAAAAAAAcM/OMb06pfCoZw/s400/image11.jpg" alt="" id="BLOGGER_PHOTO_ID_5164012003026449746" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pLBr18s-GoE/R6pBn4A-tWI/AAAAAAAAAcU/N6WANVQ-hIw/s1600-h/image12.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_pLBr18s-GoE/R6pBn4A-tWI/AAAAAAAAAcU/N6WANVQ-hIw/s400/image12.jpg" alt="" id="BLOGGER_PHOTO_ID_5164012076040893794" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pLBr18s-GoE/R6pBs4A-tXI/AAAAAAAAAcc/rsR4DYGUbjI/s1600-h/image40.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_pLBr18s-GoE/R6pBs4A-tXI/AAAAAAAAAcc/rsR4DYGUbjI/s400/image40.jpg" alt="" id="BLOGGER_PHOTO_ID_5164012161940239730" border="0" /&gt;&lt;/a&gt;The first three images were selected from within the training set of 20 images. 
Thus, the recognition probability is expected to be high for these. The last image was selected from outside the training set and deliberately chosen to be quite dissimilar from the training images. While running the recognition code, I ran into numerical issues because the location parameters were ill-conditioned: the covariance matrix of the joint Gaussian density for the locations of the parts was nearly singular. Perhaps this happened because I wasn't using enough data; with about 10 parts, the joint location vector has roughly 20 coordinates, and 20 training images are too few to estimate a full covariance matrix of that size reliably. Also, I haven't yet imposed an ordering constraint on the X coordinates of the detected features. The log probabilities for recognition from just the appearance models were -50.9192, -54.2892, -57.3182 and -792.5911 for the 4 images respectively.&lt;br /&gt;&lt;br /&gt;It's probably a good thing that the fourth image had a much lower matching probability, as it does seem quite different from the other motorbike images in the training data.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-1389481253360333603?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/1389481253360333603/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=1389481253360333603' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/1389481253360333603'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/1389481253360333603'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/02/model-learning-and-recognition-sans.html' title='Model learning and recognition sans clutter and occlusion (for 
now)'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_pLBr18s-GoE/R6o_4oA-tUI/AAAAAAAAAcE/jE3pFVwn3k0/s72-c/image10.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-5001546244325867526</id><published>2008-01-30T13:52:00.000-08:00</published><updated>2008-12-10T09:40:44.994-08:00</updated><title type='text'>Extracting features from faces and cars</title><content type='html'>So the appearance extraction process seems to be working quite well for bikes with a starting scale of 23. I wasn't sure that a single scale will work well for all categories. The detected features for faces and the tiled appearance patches are shown below:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_pLBr18s-GoE/R6D0HYA-tQI/AAAAAAAAAbk/ApQklY5k2S8/s1600-h/salient_features_faces.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_pLBr18s-GoE/R6D0HYA-tQI/AAAAAAAAAbk/ApQklY5k2S8/s400/salient_features_faces.png" alt="" id="BLOGGER_PHOTO_ID_5161393580509410562" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R6D0NIA-tRI/AAAAAAAAAbs/gl--BGBKPDc/s1600-h/face_patches.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R6D0NIA-tRI/AAAAAAAAAbs/gl--BGBKPDc/s400/face_patches.png" alt="" id="BLOGGER_PHOTO_ID_5161393679293658386" border="0" /&gt;&lt;/a&gt;Perhaps, a smaller starting 
scale would work better? But that would mean tweaking the starting scale for each different type of category which would defeat the whole purpose. So that's ruled out. Here are similar results for cars:&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R6D0YIA-tSI/AAAAAAAAAb0/uy6KqP4FpMo/s1600-h/salient_features_cars.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R6D0YIA-tSI/AAAAAAAAAb0/uy6KqP4FpMo/s400/salient_features_cars.png" alt="" id="BLOGGER_PHOTO_ID_5161393868272219426" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R6D0eIA-tTI/AAAAAAAAAb8/jJdL-vt1MrE/s1600-h/cars_patches.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R6D0eIA-tTI/AAAAAAAAAb8/jJdL-vt1MrE/s400/cars_patches.png" alt="" id="BLOGGER_PHOTO_ID_5161393971351434546" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-5001546244325867526?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/5001546244325867526/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=5001546244325867526' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/5001546244325867526'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/5001546244325867526'/><link rel='alternate' type='text/html' 
href='http://learningtoseethings.blogspot.com/2008/01/extracting-features-from-faces-and-cars.html' title='Extracting features from faces and cars'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_pLBr18s-GoE/R6D0HYA-tQI/AAAAAAAAAbk/ApQklY5k2S8/s72-c/salient_features_faces.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-7849778195920402593</id><published>2008-01-30T00:15:00.001-08:00</published><updated>2008-12-10T09:40:45.496-08:00</updated><title type='text'>Improving the appearances of the parts</title><content type='html'>Looking at the features that were extracted earlier, they didn't seem to be providing much information. It's quite difficult even for a human to look at &lt;a href="http://learningtoseethings.blogspot.com/2008/01/appearance-of-detected-of-features.html"&gt;those features&lt;/a&gt; extracted and say that they belong to a motorbike. 
So I compared the results of my feature detection phase (which looked mostly like &lt;a href="http://3.bp.blogspot.com/_pLBr18s-GoE/R6A59oA-tNI/AAAAAAAAAbM/08hyQAt_B9c/s1600-h/image_0001_out2.jpg"&gt;this&lt;/a&gt;) with the feature detection results from Rob Fergus' paper, which look like this:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pLBr18s-GoE/R6Ayr4A-tJI/AAAAAAAAAas/Xw_VET7VHt4/s1600-h/salient_features_fergus.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_pLBr18s-GoE/R6Ayr4A-tJI/AAAAAAAAAas/Xw_VET7VHt4/s320/salient_features_fergus.PNG" alt="" id="BLOGGER_PHOTO_ID_5161180902318847122" border="0" /&gt;&lt;/a&gt;The problem seemed to be the scale of the detected features. Somehow, small local features were firing more strongly than the larger, more important ones. I gradually increased the smallest admissible scale for detected features and finally settled on a starting scale of 23 (earlier it was 3). Using this starting scale and keeping the 20 features with the highest saliency values, the outputs on various bikes looked like this:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_pLBr18s-GoE/R6A1fIA-tKI/AAAAAAAAAa0/j2N7AAV9UVE/s1600-h/salient_features.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_pLBr18s-GoE/R6A1fIA-tKI/AAAAAAAAAa0/j2N7AAV9UVE/s400/salient_features.png" alt="" id="BLOGGER_PHOTO_ID_5161183981810398370" border="0" /&gt;&lt;/a&gt;This seems much better and closer to the output of Fergus et al. I extracted these newly detected features, resized them and tiled them into the image shown below. 
The 9 rows show the rescaled features (into an 11 x 11 patch) extracted from the 9 motorbikes shown above in row major order.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pLBr18s-GoE/R6A2IoA-tMI/AAAAAAAAAbE/UHFip4PW-vg/s1600-h/bike_patches.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_pLBr18s-GoE/R6A2IoA-tMI/AAAAAAAAAbE/UHFip4PW-vg/s400/bike_patches.png" alt="" id="BLOGGER_PHOTO_ID_5161184694774969538" border="0" /&gt;&lt;/a&gt;Now, we can at least see the tyres of the motorbike in almost all the input images. The new appearances of the parts seem to provide more information about the image's category.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-7849778195920402593?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/7849778195920402593/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=7849778195920402593' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7849778195920402593'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7849778195920402593'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/01/improving-appearances-of-parts.html' title='Improving the appearances of the parts'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_pLBr18s-GoE/R6Ayr4A-tJI/AAAAAAAAAas/Xw_VET7VHt4/s72-c/salient_features_fergus.PNG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-4481909039474474247</id><published>2008-01-28T15:32:00.000-08:00</published><updated>2008-12-10T09:40:45.716-08:00</updated><title type='text'>Appearance of detected of features</title><content type='html'>I ran through a bunch of motorbike images (Caltech dataset) and ran the feature detector on them. I extracted an appearance patch around the top 20 features in each image. The picture below shows what those patches look like (from 10 images).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pLBr18s-GoE/R55nGYA-tII/AAAAAAAAAak/cjtKUXzw4xU/s1600-h/patches.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_pLBr18s-GoE/R55nGYA-tII/AAAAAAAAAak/cjtKUXzw4xU/s320/patches.png" alt="" id="BLOGGER_PHOTO_ID_5160675582236603522" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-4481909039474474247?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/4481909039474474247/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=4481909039474474247' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/4481909039474474247'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5796160763961086590/posts/default/4481909039474474247'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/01/appearance-of-detected-of-features.html' title='Appearance of detected of features'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_pLBr18s-GoE/R55nGYA-tII/AAAAAAAAAak/cjtKUXzw4xU/s72-c/patches.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-2281171355160551271</id><published>2008-01-23T14:37:00.000-08:00</published><updated>2008-01-23T15:28:01.890-08:00</updated><title type='text'>Feature Extraction (Appearance)</title><content type='html'>The Kadir and Brady feature detector picks out a bunch of salient features from the image and gives us their locations and scales. For notational convenience, the locations and scales of all these features are aggregated into the vectors &lt;span style="font-weight: bold;"&gt;X&lt;/span&gt; and &lt;span style="font-weight: bold;"&gt;S&lt;/span&gt;. The third key source of information is appearance, and we now need to compute the vector &lt;span style="font-weight: bold;"&gt;A&lt;/span&gt; for a given image, which will contain the appearances of all the features.&lt;br /&gt;&lt;br /&gt;To compute the appearance of a single feature, the feature is cropped out of the image using a square mask and then scaled down to an 11 x 11 patch. This patch can be thought of as a single point in a 121-dimensional appearance space. However, 121 dimensions is too many, so we need to reduce the dimensionality of the appearance space. This is done by applying PCA and keeping the top 10-15 components. 
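&lt;br /&gt;&lt;br /&gt;The reduction step can be sketched roughly as follows. This is a minimal, dependency-free Python sketch and not the code linked from this post: it finds the leading components by power iteration with deflation (an eigendecomposition or SVD routine would be the usual choice), it assumes k does not exceed the rank of the data, and the names top_components and project are hypothetical. The toy patches have 2 entries each, whereas a real 11 x 11 patch would have 121.&lt;br /&gt;&lt;pre&gt;
```python
import math
import random


def top_components(patches, k, iters=200):
    """Return (mean, components): the top-k principal directions of the patches."""
    n, dim = len(patches), len(patches[0])
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    mean = [sum(p[d] for p in patches) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in patches]
    # Covariance matrix (dim x dim) of the centered patches.
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
        for i in range(dim)
    ]
    components = []
    for _component in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _step in range(iters):
            # Power iteration: repeatedly multiply by cov and renormalize.
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(
            v[i] * sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)
        )
        components.append(v)
        # Deflate: remove the direction just found before looking for the next.
        cov = [
            [cov[i][j] - eigval * v[i] * v[j] for j in range(dim)]
            for i in range(dim)
        ]
    return mean, components


def project(patch, mean, components):
    """Coordinates of a patch in the reduced appearance space."""
    return [
        sum((x - m) * c for x, m, c in zip(patch, mean, comp))
        for comp in components
    ]


# Toy "patches" that vary along a single direction (made-up numbers).
toy = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, components = top_components(toy, k=1)
coords = project([3.0, 3.0], mean, components)
print(components, coords)
```
&lt;/pre&gt;The sign of each component is arbitrary, so only the magnitude of a projected coordinate is meaningful when comparing runs.&lt;br /&gt;&lt;br /&gt;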
The best reference for PCA that I have found so far is Prof. Nuno Vasconcelos' &lt;a href="http://www.svcl.ucsd.edu/courses/ece271A-F06/handouts/Dimensionality.pdf"&gt;slides&lt;/a&gt; (nos. 28 and 29 give an outline) from his ECE 271A course. My code for computing the principal components from training data and projecting new data onto these principal components is posted &lt;a href="http://docs.google.com/Doc?docid=dt2n4wj_1jd7pb43w&amp;amp;hl=en"&gt;here&lt;/a&gt; and &lt;a href="http://docs.google.com/Doc?docid=dt2n4wj_2cjcczjhr&amp;amp;hl=en"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;During the learning stage, a fixed PCA basis of 10-15 dimensions is computed from the patches around all detected regions across all training images. I'm not sure whether I should compute a single basis for all the classes or a separate basis for each class.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-2281171355160551271?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/2281171355160551271/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=2281171355160551271' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/2281171355160551271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/2281171355160551271'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/01/feature-extraction-appearance.html' title='Feature Extraction 
(Appearance)'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-7458495104225522250</id><published>2008-01-16T15:05:00.000-08:00</published><updated>2008-12-10T09:40:46.172-08:00</updated><title type='text'>Detecting Salient Regions</title><content type='html'>There is some useful Matlab code &lt;a href="http://www.robots.ox.ac.uk/%7Etimork/salscale.html"&gt;here&lt;/a&gt; for running the Kadir and Brady feature detector. The detected salient regions are marked by circles in the picture. There are probably too many features detected here; the desired number of features is around 30. I played around a bit with the parameters in the code and was able to reduce the number of detected features. 
The new detections are shown in the second figure.&lt;div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_pLBr18s-GoE/R6A6O4A-tOI/AAAAAAAAAbU/dHjw8vlqaBY/s1600-h/image_0001_out1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_pLBr18s-GoE/R6A6O4A-tOI/AAAAAAAAAbU/dHjw8vlqaBY/s400/image_0001_out1.jpg" alt="" id="BLOGGER_PHOTO_ID_5161189200195663074" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_pLBr18s-GoE/R6A59oA-tNI/AAAAAAAAAbM/08hyQAt_B9c/s1600-h/image_0001_out2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_pLBr18s-GoE/R6A59oA-tNI/AAAAAAAAAbM/08hyQAt_B9c/s400/image_0001_out2.jpg" alt="" id="BLOGGER_PHOTO_ID_5161188903842919634" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-7458495104225522250?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/7458495104225522250/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=7458495104225522250' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7458495104225522250'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7458495104225522250'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2008/01/detecting-salient-regions.html' 
title='Detecting Salient Regions'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_pLBr18s-GoE/R6A6O4A-tOI/AAAAAAAAAbU/dHjw8vlqaBY/s72-c/image_0001_out1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5796160763961086590.post-7154359546105430038</id><published>2007-11-29T00:14:00.000-08:00</published><updated>2007-11-29T00:19:57.804-08:00</updated><title type='text'>For starters..</title><content type='html'>My project proposal is done and &lt;a href="http://www-cse.ucsd.edu/classes/wi08/cse190-a/blogs.html"&gt;up&lt;/a&gt; on the class website. It should be fun working on the project this Winter quarter!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5796160763961086590-7154359546105430038?l=learningtoseethings.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learningtoseethings.blogspot.com/feeds/7154359546105430038/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5796160763961086590&amp;postID=7154359546105430038' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7154359546105430038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5796160763961086590/posts/default/7154359546105430038'/><link rel='alternate' type='text/html' href='http://learningtoseethings.blogspot.com/2007/11/for-starters.html' title='For 
starters..'/><author><name>Yatharth</name><uri>http://www.blogger.com/profile/00878779902122318694</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
