Abstract:
This thesis describes a method for efficient summarization of long surveillance videos.
The method consists of four phases: ground plane calibration; detection and tracking of
scene objects; extraction of information about the objects in the scene; and generation and
visualization of the summaries. The method assumes a static camera whose extrinsic
parameters (3D position and orientation) and intrinsic parameters (focal length, principal
point, lens distortion) are all unknown. Ground plane calibration is achieved by computing
a homography [1] between the camera view of the scene and the corresponding location in Google Earth. Detection
and tracking are based on techniques described in [2, 3]. Planar homography and single-view
metrology [4, 5] are used to calculate the widths, heights, positions, and speeds of objects in the
scene. The method generates a summary of a video sequence by choosing a single
image of each tracked object and overlaying it on the background image. The method
chooses these images so as to minimize the overlap between them. For each tracked object,
its trajectory is shown as a sequence of vectors corresponding to the object's motion between
successive frames. The method also generates a video synopsis: a summary of all activity over
a specified period of time. Since the speeds and sizes of objects are calculated, the method can
additionally generate sequences based on various combinations of these object properties.
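The ground plane calibration and speed measurement described above can be sketched as follows. This is a minimal illustration, not the thesis implementation. It assumes that at least four point correspondences have been picked by hand between the camera image and Google Earth. It also assumes that the Google Earth positions have already been converted to a local metric frame, for example meters east and north of a chosen origin. The homography is estimated with the standard normalized direct linear transform (DLT). All function names are hypothetical.

```python
import numpy as np


def _normalize(pts):
    """Hartley normalization: centroid to the origin, mean distance to sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return ph[:, :2], T


def estimate_homography(img_pts, ground_pts):
    """Estimate H (3x3) with ground ~ H @ image via the normalized DLT (N >= 4)."""
    src, Ts = _normalize(np.asarray(img_pts, dtype=float))
    dst, Td = _normalize(np.asarray(ground_pts, dtype=float))
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    # Undo the normalization: H = Td^-1 @ Hn @ Ts.
    H = np.linalg.inv(Td) @ vt[-1].reshape(3, 3) @ Ts
    return H / H[2, 2]


def to_ground(H, img_pts):
    """Map image points (N, 2) to ground-plane coordinates (N, 2)."""
    pts = np.asarray(img_pts, dtype=float)
    ph = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]


def ground_speed(H, foot_prev, foot_curr, dt):
    """Speed in ground units per second between two foot points dt seconds apart."""
    g = to_ground(H, [foot_prev, foot_curr])
    return float(np.linalg.norm(g[1] - g[0]) / dt)
```

To get an object's speed, map its foot point through H in two successive frames and divide the ground distance by the frame interval. Here the foot point is assumed to be the bottom center of its bounding box. In practice, a robust estimator such as RANSAC would typically wrap the DLT when correspondences are noisy.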
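Object heights can be recovered with single-view metrology. The sketch below uses one common formulation, which is an assumption and not necessarily the exact method of [4, 5]. It takes three inputs: the vertical vanishing point v, the vanishing line l of the ground plane, and a reference object of known height standing on the ground. v and l are given in homogeneous image coordinates and are assumed to have been estimated already. For a base point b and top point t, the ratio ||b × t|| / (|l · b| ||v × t||) is proportional to the object's true height, so a single reference fixes the metric scale. Function names are hypothetical.

```python
import numpy as np


def _homog(p):
    """Lift an image point (x, y) to homogeneous coordinates (x, y, 1)."""
    return np.array([float(p[0]), float(p[1]), 1.0])


def height_measure(base, top, v, l):
    """Quantity proportional to the metric height of a vertical segment.

    base, top: image points of the object's ground contact and its top.
    v: homogeneous vertical vanishing point; l: homogeneous ground vanishing line.
    The value does not depend on the homogeneous scaling of base and top.
    """
    b, t = _homog(base), _homog(top)
    return np.linalg.norm(np.cross(b, t)) / (abs(np.dot(l, b)) * np.linalg.norm(np.cross(v, t)))


def object_height(base, top, ref_base, ref_top, ref_height, v, l):
    """Metric height, scaled by a reference object of known height on the same plane."""
    return ref_height * height_measure(base, top, v, l) / height_measure(ref_base, ref_top, v, l)
```

The unknown proportionality constant is the same for every object on the ground plane, so it cancels in the ratio taken in `object_height`.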
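The abstract states that the object images are chosen to minimize their mutual overlap, but it does not specify the optimization. A simple greedy heuristic, assumed here for illustration only, handles each track in turn. It picks the frame whose bounding box overlaps least with the snapshots already placed. Tracks with fewer candidate frames go first, since they have the least freedom. The box format and function names are hypothetical, and every track is assumed to have at least one box.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in image pixels


def intersection_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def select_snapshots(tracks: Dict[int, List[Box]]) -> Dict[int, int]:
    """Greedily choose one frame index per track to keep total overlap small.

    tracks maps a track id to its per-frame bounding boxes (one candidate per frame).
    """
    placed: List[Box] = []
    picks: Dict[int, int] = {}
    # Most constrained tracks (fewest candidate frames) are placed first.
    for tid in sorted(tracks, key=lambda k: len(tracks[k])):
        boxes = tracks[tid]
        best = min(range(len(boxes)),
                   key=lambda i: sum(intersection_area(boxes[i], p) for p in placed))
        picks[tid] = best
        placed.append(boxes[best])
    return picks
```

A greedy pass does not guarantee the global minimum of total overlap. An exact or iterative optimization could replace it when that matters.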
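The trajectory display draws one vector per pair of successive frames, which reduces to taking differences of consecutive object positions. In this hypothetical sketch, positions are assumed to be per-frame image coordinates of the tracked object, for example its foot point. Each returned pair gives an arrow's start point and its displacement.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def trajectory_vectors(positions: List[Point]) -> List[Tuple[Point, Point]]:
    """Return (start, displacement) for each pair of successive frames."""
    return [((x0, y0), (x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
```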
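Sizes and speeds are computed per object. A synopsis restricted to a time window and to a combination of properties can therefore be sketched as filtering tracks with a predicate before rendering. The record fields, units, and function below are hypothetical. The abstract does not specify how the thesis actually combines properties.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrackedObject:
    """Hypothetical per-track summary produced by the measurement phase."""
    track_id: int
    first_frame: int
    last_frame: int
    speed_mps: float  # mean ground-plane speed in meters per second (assumed unit)
    height_m: float   # estimated height in meters
    width_m: float    # estimated width in meters


def select_objects(objects: List[TrackedObject], start: int, end: int,
                   predicate: Callable[[TrackedObject], bool]) -> List[TrackedObject]:
    """Keep objects visible at some frame in [start, end] that satisfy predicate."""
    return [o for o in objects
            if o.last_frame >= start and o.first_frame <= end and predicate(o)]
```

Consider a call such as `select_objects(objs, 0, 9000, lambda o: o.speed_mps > 5 and o.height_m < 2.5)`. It would keep only fast, low objects that appear within the first 9000 frames.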