Project Description

Presenter: James M. Rehg, Ph.D.

Presented: August 6, 2015

Recent progress in miniaturizing digital cameras and improving battery life has created a growing market for wearable cameras, exemplified by products such as GoPro and Google Glass. At the same time, the field of computer vision, which is concerned with the automatic extraction of information about the world from images and video, has also made rapid progress due to the increasing availability of image data, increases in computational power, and the emergence of machine learning methods such as deep learning.

The analysis of video captured from body-worn cameras is an emerging subfield of computer vision known as First Person Vision (FPV). FPV provides new opportunities to model and analyze human behavior, create personalized records of visual experiences, and improve the treatment of a broad range of mental and physical health conditions. In this talk I will provide an introduction to some of the concepts and methods from computer vision which underlie the analysis of first person videos. In particular, I will focus on the automatic analysis of video to track the motion of the camera and recover the 3D geometry of the scene, recognize activities, and detect and recognize objects of interest. I will also briefly discuss the role of visual attention in FPV. The presentation will not assume any prior knowledge of computer vision. This is the first of two talks in a series; the second presentation, scheduled for Aug. 20, will focus on specific FPV technologies in the context of MD2K.

Learning objectives

Following the presentation, attendees will be able to:

  • Describe some basic analysis goals for first person video and identify some of the challenges posed by automatic video analysis.
  • Summarize the relationship between the movement of a body-worn camera in 3D, the motion induced in a video sequence, and methods for estimating video motion.
  • Outline a basic approach to activity recognition in first person video using either object or motion features, including the major system components and sources of error.
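
To give a flavor of the second objective, here is a toy sketch of how camera motion induces motion in the video and how that motion can be estimated. It assumes a purely translational, integer-pixel shift between two frames and recovers it with phase correlation; this is an illustration only, not the method presented in the talk (real FPV footage involves 3D rotation, parallax, and motion blur, and practical systems use feature tracking or optical flow):

```python
import numpy as np

def estimate_translation(frame_a, frame_b):
    """Estimate the integer-pixel shift taking frame_b to frame_a
    via phase correlation (frequency-domain cross-correlation)."""
    Fa = np.fft.fft2(frame_a)
    Fb = np.fft.fft2(frame_b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12   # keep only phase information
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts larger than half the frame into negative offsets.
    h, w = frame_a.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Synthetic example: shift a random "scene" by a known camera motion.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
moved = np.roll(scene, shift=(3, -5), axis=(0, 1))
print(estimate_translation(moved, scene))  # recovers (3, -5)
```

The peak of the correlation surface marks the displacement that best aligns the two frames; in a real pipeline this per-frame motion estimate would feed into camera tracking and 3D scene recovery.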

About the Presenter

Dr. James M. Rehg (pronounced “ray”) is a Professor in the School of Interactive Computing at the Georgia Institute of Technology, where he is co-Director of the Computational Perception Lab (CPL) and Director of the Center for Behavioral Imaging.
