MIT Press, 1990. — 342 p.
The aim of research into computational vision is to perform on a machine that task which our own visual systems appear to achieve quite effortlessly. That task is seeing: taking those two-dimensional images that fall on our retinae and from them obtaining percepts of the three-dimensional world around us. It is those three-dimensional percepts that enable us and, at an infinitely humbler level, our machines to take intelligent actions in the world, actions which most often will involve motion.
Motion in our universe is all-pervading, on scales ranging from the atomic to the astronomic. It threads through our everyday existence, and is the symptom of change and purpose. It is crucial to have an understanding of motion and an appreciation of its likely consequences at an early stage in our sensory processes. Robots, if they are to act intelligently in our own everyday environment, will have to be similarly endowed.
To those who, while gazing from the window of a moving car, have considered why roadside objects seem to flash by whereas the distant landscape appears to move hardly at all, it comes as no surprise that moving imagery tells us a great deal about the disposition of the world around us. Emulating this ability to recover information about the world by processing such visual motion has been the aim of the research we describe in this book.
But a moment's comparison of the current state of machine visual intelligence with that of a baby only a few months old will indicate that our understanding of, and ability to solve, such natural processing problems is somewhat restricted. Most problems turn out to be hard, and visual motion interpretation is no exception. In these circumstances, setting anything in print seems an act of pompous futility, like scratching in the sand and hoping that the next wave of progress will leave the scribble legible. And yet it seems to us that the last decade of work in the machine interpretation of visual motion has yielded a key fact: that it is both feasible and computationally practicable to derive visual motion from moving imagery, to compute scene structure from that visual motion and, in simple cases, to use that recovered geometry for the recognition of objects. In other words, visual motion can enable traversal of bottom-up, data-driven paths through the visual processing hierarchy, paths which link images to objects. The computational approach espoused perhaps most forcibly in Marr's writings is (yet again) vindicated. This book evidences that through an experimental exploration of a few of those paths.
Image, Scene and Motion
Computing Image Motion
Structure from Motion of Points
The Structure and Motion of Edges
From Edges to Surfaces
Structure and Motion of Planes
Visual Motion Segmentation
Matching to Edge Models
Matching to Planar Surfaces
Commentary