Gesture Recognition Using 3D Appearance and Motion Features
Guangqi Ye, Jason J. Corso, Gregory D. Hager
Computational Interaction and Robotics Laboratory
The Johns Hopkins University
grant@cs.jhu.edu
Abstract

We present a novel 3D gesture recognition scheme that combines
the 3D appearance of the hand and the motion dynamics
of the gesture to classify manipulative and controlling
gestures. Our method does not directly track the hand. Instead,
we take an object-centered approach that efficiently
computes the 3D appearance using a region-based coarse
stereo matching algorithm in a volume around the hand.
The motion cue is captured by temporally differencing the appearance
features. An unsupervised learning scheme is carried
out to capture the cluster structure of these feature-volumes.
Then, the image sequence of a gesture is converted to a series
of symbols that indicate the cluster identity of each
image pair. Two schemes (forward HMMs and neural networks)
are used to model the dynamics of the gestures. We
implemented a real-time system and performed numerous
gesture recognition experiments to analyze the performance
with different combinations of the appearance and motion
features. The system achieves a recognition accuracy of over
96% when using both the proposed appearance and motion
cues.
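The pipeline summarized above (3D appearance/motion features, unsupervised clustering into a symbol alphabet, and forward-HMM scoring) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensionality, cluster count, and HMM parameters are assumed values, and random vectors stand in for the real stereo-derived features.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(features, centroids):
    """Map each feature vector to the index of its nearest cluster centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(features, k, iters=20):
    """Minimal k-means, standing in for the unsupervised clustering step."""
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(features, centroids)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return centroids

def forward_loglik(symbols, pi, A, B):
    """Log-likelihood of a symbol sequence under a discrete HMM, computed
    with the scaled forward algorithm. pi: initial state probabilities,
    A[i, j]: transition probabilities, B[state, symbol]: emission probabilities."""
    alpha = pi * B[:, symbols[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for s in symbols[1:]:
        alpha = (alpha @ A) * B[:, s]
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

# Toy demo: random 6-D vectors stand in for the appearance + motion feature
# of each image pair; a gesture becomes a string of cluster symbols.
train_feats = rng.normal(size=(200, 6))
centroids = kmeans(train_feats, k=8)
gesture = quantize(rng.normal(size=(15, 6)), centroids)

# A 3-state left-to-right ("forward") HMM with uniform emissions.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
B = np.full((3, 8), 1.0 / 8)
score = forward_loglik(gesture, pi, A, B)
```

In the full system, one such HMM would be trained per gesture class, and a new symbol sequence would be assigned to the class whose model yields the highest forward log-likelihood.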
1 Introduction
Gestures are an important interaction medium
in current human-computer interaction (HCI) environments
[3, 4, 11, 12, 14, 16, 18, 21, 24, 25, 26]. Furthermore,
for 3D virtual environments (VE) in which the user
manipulates 3D objects, gestures are more appropriate and
powerful than traditional interaction media such as a mouse
or a joystick. Vision-based gesture processing also offers
greater convenience and immersiveness than approaches based on
mechanical devices.
Most reported gesture recognition work in the literature
(see Section 1.1) relies heavily on visual tracking and
template recognition algorithms. However, general human
motion tracking is well-known to be a complex and difficult
problem [8, 17]. Additionally, while template matching
may be suitable for static gestures, its ability to capture
the spatio-temporal nature of dynamic gestures is in doubt.
Alternatively, methods that attempt to capture the 3D information
of the hand [11] have been proposed. However, it is
well-known that, in general circumstances, the stereo problem
is difficult to solve reliably and efficiently.
Human hands and arms are highly articulated, deformable
objects, and hand gestures normally consist of 3D
global and local motion of the hands and arms. Manipulative
and interaction gestures [14] have a temporal nature
that involves complex changes of hand configuration. The
complex spatial properties and dynamics of such gestures
render the problem too difficult for pure 2D (e.g. template
matching) methods. Ideally we would capture the full 3D
information of the hands to model the gestures [11]. However,
the difficulty and computational complexity of visual
3D localization and robust tracking prompt us to question
the necessity of doing so for gesture recognition.
To that end, we present a novel scheme to model and
recognize 3D temporal gestures using 3D appearance and
motion cues without tracking and explicit localization of
the hands. Instead, we follow the site-