Abstract
This paper presents a complete, functional system that detects people and tracks their motion in either a live camera feed or pre-recorded video sequences. The system consists of two main modules: detection and tracking. Automatic detection locates human faces by fusing color and feature-based information, which allows it to handle faces in different orientations and poses (frontal, profile, intermediate). To avoid false detections, a number of decision criteria are employed. Tracking is performed using a variant of the well-known Kanade-Lucas-Tomasi tracker, while occlusion is handled through a re-detection stage. Manual intervention is allowed to assist both modules if required. In manual mode, the system can track any object of interest, provided that the object offers enough features to track. The system supports calibrated cameras, for which it can provide 3-D coordinates of any tracked object of interest. It has been tested with very good results on a variety of video sequences, including a database of studio video sequences for which 3-D ground truth data, originating from a 4-camera infrared tracking system, exist.
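The abstract names a Kanade-Lucas-Tomasi variant without giving its details. As an illustration only, and not the authors' implementation, the sketch below shows the textbook building blocks that such trackers share: a Shi-Tomasi style minimum-eigenvalue score that indicates whether a window has "enough features to track", and a single Lucas-Kanade step that estimates the translation of that window between two frames. All function names, the window size, and the synthetic test image are assumptions made for this example.

```python
import math

def gradient(img, x, y):
    """Central-difference image gradient at pixel (x, y)."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy

def window(cx, cy, half):
    """Pixel coordinates of a (2*half+1)-square window centred on (cx, cy)."""
    return [(x, y)
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]

def gradient_matrix(img, cx, cy, half):
    """Entries (gxx, gxy, gyy) of the 2x2 matrix G = sum of g * g^T over the window."""
    gxx = gxy = gyy = 0.0
    for x, y in window(cx, cy, half):
        gx, gy = gradient(img, x, y)
        gxx += gx * gx
        gxy += gx * gy
        gyy += gy * gy
    return gxx, gxy, gyy

def min_eigenvalue(img, cx, cy, half):
    """Smaller eigenvalue of G; a large value marks a well-textured window."""
    gxx, gxy, gyy = gradient_matrix(img, cx, cy, half)
    return 0.5 * (gxx + gyy - math.sqrt((gxx - gyy) ** 2 + 4.0 * gxy * gxy))

def lucas_kanade_step(prev, curr, cx, cy, half):
    """One translation estimate (dx, dy) solving G d = b, or None if G is singular."""
    gxx, gxy, gyy = gradient_matrix(prev, cx, cy, half)
    bx = by = 0.0
    for x, y in window(cx, cy, half):
        gx, gy = gradient(prev, x, y)
        dt = curr[y][x] - prev[y][x]  # temporal intensity difference
        bx -= gx * dt
        by -= gy * dt
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        return None  # textureless window: the motion cannot be determined
    dx = (gyy * bx - gxy * by) / det
    dy = (gxx * by - gxy * bx) / det
    return dx, dy

def make_image(size, shift_x=0.0, shift_y=0.0):
    """Hypothetical smooth test pattern, optionally shifted by a sub-pixel amount."""
    c = size // 2
    def f(u, v):
        return u * u + v * v + 0.5 * u * v
    return [[f(x - c - shift_x, y - c - shift_y) for x in range(size)]
            for y in range(size)]

prev_frame = make_image(11)
curr_frame = make_image(11, 0.3, -0.2)
dx, dy = lucas_kanade_step(prev_frame, curr_frame, 5, 5, 3)
print("estimated shift:", dx, dy)
```

On this synthetic quadratic pattern the single step recovers the shift (0.3, -0.2) up to floating-point rounding. On real video, a practical tracker would iterate the step, work over an image pyramid, and discard windows whose minimum eigenvalue falls below a threshold.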
Cite this article
Stamou, G.N., Krinidis, M., Nikolaidis, N. et al. A monocular system for person tracking: Implementation and testing. J Multimodal User Interfaces 1, 31–47 (2007). https://doi.org/10.1007/BF02910057
DOI: https://doi.org/10.1007/BF02910057

