centered computation
fashion of the Visual Interface Cues (VICs) paradigm [3, 24].
We propose that interaction gestures can be captured in
a local neighborhood around the manipulated object based
on the fact that the user only initiates manipulative gestures
when his or her hands are close enough to the objects. The
advantage of this scheme is that it is efficient and highly
flexible. The dimensions of the local neighborhood volume
around the manipulated object can be adjusted conveniently
according to the nature of the particular interaction
environment and the applicable gestures. For example, in a
desktop interaction environment where the interaction elements
are represented as small icons on a flat panel and manipulative
gestures are only initiated when the user’s hand
is near the surface of the panel, we only need to observe
a small volume above the panel with the icon sitting at the
center of its base. The height and diameter of the volume
are kept small, yet large enough to capture sufficient visual
cues for successful gesture recognition.
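The neighborhood test described above amounts to a simple geometric check. The sketch below illustrates it with a cylindrical volume standing on the panel; the point representation, coordinate convention, and dimensions are illustrative assumptions, not the paper's implementation:

```python
import math

def in_local_volume(point, icon_center, radius, height):
    """Return True if a 3D point lies inside the cylindrical volume
    standing on the panel with the icon at the center of its base.

    point, icon_center: (x, y, z), with z measured above the panel surface.
    radius, height: dimensions of the observed volume (tunable per setup).
    """
    dx = point[0] - icon_center[0]
    dy = point[1] - icon_center[1]
    dz = point[2] - icon_center[2]
    # Inside if within the disc horizontally and within the height vertically.
    return 0.0 <= dz <= height and math.hypot(dx, dy) <= radius

# Gesture processing is triggered only when hand points enter the volume.
hand_point = (0.1, 0.05, 0.2)
print(in_local_volume(hand_point, (0.0, 0.0, 0.0), radius=0.3, height=0.5))  # True
```

Shrinking the radius and height makes the check cheaper and more selective; enlarging them captures more context around the icon, which is the flexibility discussed above.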
The remainder of this paper is structured as follows. In
Section 2 we present a novel method to efficiently capture
the 3D spatial information of the gesture without carrying
out a full-scale disparity computation. We discuss how
to learn the cluster structure of the appearance and motion
features via an unsupervised learning process in Section
3. Two ways to model the dynamics of the gestures,
i.e., forward HMMs [10, 19] and multilayer neural networks
[6], are also presented. In Section 4 we demonstrate
our real-time system that implements the proposed method
and present the results of gesture recognition.
1.1 Related Work
A general overview of the state of the art in gesture analysis
for vision-based human-computer interaction is given in [22].
Robust hand localization and tracking, modeling the constraints
of hand motion and recognizing temporal gesture
patterns are among the most difficult and active research areas.
Compared to other techniques, such as neural networks and
rule-based methods [14], HMMs [23, 24] and their extensions [2]
are a popular scheme for modeling temporal gestures.
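To illustrate how an HMM scores a temporal gesture, here is a minimal forward-algorithm sketch; the two-state model and all of its parameters are invented for the example and do not come from the cited systems:

```python
def forward(obs, pi, A, B):
    """Forward algorithm: probability of an observation sequence under an HMM.

    obs: list of observation symbol indices
    pi:  initial state distribution, pi[i]
    A:   transition matrix, A[i][j] = P(next state j | state i)
    B:   emission matrix, B[i][k] = P(symbol k | state i)
    """
    n = len(pi)
    # Initialization with the first observation.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion over the remaining observations.
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
    # Termination: total probability of the sequence.
    return sum(alpha)

# Toy 2-state model with 2 observation symbols (parameters are illustrative).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(forward([0, 1, 0], pi, A, B))
```

In a gesture recognizer, one such model is trained per gesture class and the model assigning the highest sequence probability wins.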
Many HCI systems [12, 14, 16, 21, 22] have been reported
that enable the user to use gestures as a controlling
or communicative medium to manipulate interaction objects.
The hand or fingertips are detected from cues such as visual
appearance, shape, or even body temperature sensed by
infrared cameras. A variety of algorithms have been
applied to track the hand [22], such as the Kalman filter and
particle filter [5].
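As a concrete illustration of the simplest tracker mentioned above, here is a minimal one-dimensional Kalman filter with a constant-position motion model; the noise parameters and measurements are illustrative, not taken from the cited systems:

```python
def kalman_1d(measurements, q=1e-3, r=0.1, x0=0.0, p0=1.0):
    """Minimal 1D Kalman filter with a constant-position model.

    q: process noise variance, r: measurement noise variance,
    x0/p0: initial state estimate and its variance.
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: state assumed constant, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Noisy readings of a hand coordinate hovering near 5.0.
print(kalman_1d([5.1, 4.9, 5.2, 5.0, 4.8]))
```

Real hand trackers run one such filter per coordinate (or a joint multivariate filter with a velocity term); particle filters [5] replace the Gaussian assumption with a sampled posterior, which copes better with clutter and occlusion.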
With a model-based approach [1, 13], it is possible to
capture the gesture in higher dimensionality than 2D. In [1]
the 3D hand model is represented as a set of synthetic images
of the hand with different finger configurations viewed
from different viewpoints. Image-to-model matching
is carried out using Chamfer distance computation. One of
the difficulties of this approach is to construct a good 3D
model of the hand that can accommodate the variation across
different users.
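The Chamfer matching step used in [1] can be sketched as follows. This is a brute-force directed Chamfer distance between two edge-point sets; real systems precompute a distance transform of the edge image for speed, and the point sets here are toy data:

```python
import math

def chamfer_distance(template_pts, image_pts):
    """Directed Chamfer distance: average distance from each template
    edge point to its nearest image edge point. Lower means a better
    image-to-model match. Brute force for clarity; production code
    looks up a precomputed distance transform instead.
    """
    total = 0.0
    for tx, ty in template_pts:
        total += min(math.hypot(tx - ix, ty - iy) for ix, iy in image_pts)
    return total / len(template_pts)

# Toy example: a template matches a shifted copy of itself better
# than an unrelated point set.
template = [(0, 0), (1, 0), (2, 0)]
close = [(0, 1), (1, 1), (2, 1)]       # same shape, shifted by one unit
far = [(10, 10), (11, 12), (9, 13)]
print(chamfer_distance(template, close) < chamfer_distance(template, far))  # True
```

Matching a synthetic hand image against the observed edge map then reduces to finding the model view with the smallest Chamfer distance.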