Thursday, June 15, 2006

Computer vision and understanding

From ScienceDaily:

Now [...] researchers in Carnegie Mellon University's School of Computer Science have found a way to help computers understand the geometric context of outdoor scenes and thus better comprehend what they see. The discovery promises to revive an area of computer vision research all but abandoned two decades ago because it seemed insoluble. It may ultimately find application in vision systems used to guide robotic vehicles, monitor security cameras and archive photos.

Using machine learning techniques, Robotics Institute researchers Alexei Efros and Martial Hebert, along with graduate student Derek Hoiem, have taught computers how to spot the visual cues that differentiate between vertical surfaces and horizontal surfaces in photographs of outdoor scenes. They've even developed a program that allows the computer to automatically generate 3-D reconstructions of scenes based on a single image.

This is a tremendous advance. On the surface (no pun intended), it reads like a small step: computers somehow telling whether surfaces in a photo are vertical or horizontal. As I see it (pun intended), it is actually a way for a computer to interpret what it sees in a very human-like way.

Let me explain.

When we see something, we don't just see it as a computer might see an image. When a computer sees an image, it merely records the picture samples (called pixels) within its field of view. Pixels are just numbers arranged in a fixed order, usually a rectangular grid. That's all.
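
To make "just numbers in a rectangle" concrete, here is a minimal sketch in Python. The 3x4 grid and its brightness values are made up for illustration; a real photo has millions of pixels and usually three numbers (red, green, blue) per pixel.

```python
# A tiny, made-up grayscale "image": 3 rows by 4 columns.
# Each number is one pixel's brightness (0 = black, 255 = white).
image = [
    [0, 50, 100, 150],
    [50, 100, 150, 200],
    [100, 150, 200, 255],
]

height = len(image)       # number of rows
width = len(image[0])     # number of columns

# All the computer "knows" at this point is arithmetic on the numbers,
# for example the average brightness of the whole picture.
mean_brightness = sum(sum(row) for row in image) / (height * width)

print(f"{width}x{height} pixels, mean brightness {mean_brightness:.1f}")
```

Nothing in this grid says "building" or "ground". Any such meaning has to come from an interpretation step layered on top of the numbers.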

When humans look at a scene or image, our eyes take in the raw picture, and we also understand what we see by interpreting it in a familiar context. It is this last part that is vital to humans and now to computers. To see what I'm talking about, recall that you can judge the distance to a building simply by looking. Strictly speaking, the image alone can't give you that distance, because you usually don't know the height of the building.

The way you intuit the distance is through your prior knowledge of the usual sizes of buildings (say, by counting the rows of windows, which tell you how many floors there are). From this information you get an idea of the distance to that building. That is, your mind lets you judge distance by interpreting the image in light of earlier experience.
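
The reasoning above can be sketched with the similar-triangles rule of a simple pinhole camera: distance = real height x focal length / height in the image. This is only an illustration of the intuition, not the method the Carnegie Mellon researchers used. The function name, the 3-meter floor height, and the focal length in pixels are all assumptions chosen for the example.

```python
def estimate_distance_m(floors, floor_height_m, focal_length_px, image_height_px):
    """Rough distance to a building under a pinhole-camera model.

    The building's real height is guessed from prior knowledge
    (number of floors times a typical floor height), then compared
    with how tall the building appears in the image.
    """
    if image_height_px <= 0:
        raise ValueError("image_height_px must be positive")
    real_height_m = floors * floor_height_m
    return real_height_m * focal_length_px / image_height_px


# Hypothetical numbers: 10 rows of windows, about 3 m per floor (an assumption),
# a camera with a focal length of 1000 pixels, and a building 200 pixels tall
# in the picture.
distance = estimate_distance_m(
    floors=10, floor_height_m=3.0, focal_length_px=1000.0, image_height_px=200.0
)
print(distance)  # 150.0
```

Without the guess about floor height, the same 200 pixels could be a small building nearby or a huge one far away, which is exactly why prior experience matters.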

That computers are now capable of similar reasoning and learning is truly amazing, and it is one more step toward computers seeing as we see.
