
W4S: A Real-Time System for Detecting and Tracking People in 2½D

Ismail Haritaoglu, David Harwood and Larry S. Davis
Computer Vision Laboratory, University of Maryland, College Park, MD 20742, USA

Abstract. W4S is a real-time visual surveillance system for detecting and tracking people and monitoring their activities in an outdoor environment by integrating real-time stereo computation into an intensity-based detection and tracking system. Unlike many systems for tracking people, W4S makes no use of color cues. Instead, W4S employs a combination of stereo, shape analysis and tracking to locate people and their parts (head, hands, feet, torso) and create models of people's appearance so that they can be tracked through interactions such as occlusions. W4S is capable of simultaneously tracking multiple people even with occlusion. It runs at 5-20 Hz for 320x120 resolution images on a dual-Pentium 200 PC.

1 Introduction

W4S is a real-time system for tracking people and their body parts in monochromatic stereo imagery. It constructs dynamic models of people's movements to answer questions about What they are doing, and Where and When they act. It constructs appearance models of the people it tracks in 2½D so that it can track people (Who?)
through occlusion events in the imagery.

W4S represents the integration of a real-time stereo (SVM) system with a real-time person detection and tracking system (W4 [14]) to increase its reliability. SVM [12] is a compact, inexpensive real-time device for computing dense stereo range images which was recently developed by SRI. W4 [14] is a real-time visual surveillance system for detecting and tracking people in an outdoor environment using only monochromatic video.

In this paper we describe the computational models employed by W4S to detect and track people. These models are designed to allow W4S to determine types of interactions between people and objects, and to overcome the inevitable errors and ambiguities that arise in dynamic image analysis (such as instability in segmentation processes over time, splitting of objects due to coincidental alignment of object parts with similarly colored background regions, etc.). W4S employs a combination of shape analysis and robust techniques for tracking to detect people, and to locate and track their body parts using both intensity and stereo. W4S builds "appearance" models of people so that they can be identified after occlusions or after interactions during which W4S cannot track them individually.

The incorporation of stereo has allowed us to overcome the difficulties that W4 encountered with sudden illumination changes, shadows and occlusions. Even low resolution range maps allow us to continue to track people successfully, since stereo analysis is not significantly affected by sudden illumination changes and shadows, which make tracking much harder in intensity images. Stereo is also very helpful in analyzing occlusions and other interactions. W4S has the capability to construct a 2½D model of the scene and its human inhabitants by combining a 2D cardboard model, which represents the relative positions and size of the body parts, and range, as shown in Figure 1.

Fig. 1. Examples of detection result:
intensity image (left), detected people and their body parts from the intensity only (middle), and their placement in the 2½D scene by W4S (right).

W4S has been designed to work with only visible monochromatic video sources. While most previous work on detection and tracking of people has relied heavily on color cues, W4S is designed for outdoor surveillance tasks, and partic

