The objectives of the project during the workshop will be to research and
develop fusion strategies and applications in the following areas:
• Gesture and speech recognition
• 3D search and retrieval
• Descriptor extraction using sketches
• Multimodal user interface for 3D S&R
• Speech-driven manipulation of the S&R interface
• Prototype implementation of the speech-driven 3D S&R platform using sketches
• Implementation of the virtual assembly application environment
Detailed technical description:
A. Technical description
This project aims mainly to develop a 3D search and retrieval (S&R) platform whose
queries are formed not from a 3D model, but from speech and from sketches generated
by gestures. The usability of the platform will be tested in a virtual assembly
application. The project is clearly very ambitious for the short period of four weeks;
however, it is expected to contribute significantly to the following technical points:
1. 3D S&R platform: Low-level geometrical characteristics of the objects/parts in
the database will be extracted using the spherical trace transform. Variations of
the method that will help in the development of the final multimodal S&R
platform will be studied.
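The idea of a rotation-invariant, shell-based shape signature can be sketched as follows. This is a simplified stand-in, not the actual spherical trace transform: it samples an occupancy function on concentric spheres and keeps per-radius statistics, which are rotation-invariant in expectation.

```python
import numpy as np

def spherical_shells_descriptor(occupancy, n_radii=8, n_dirs=64, seed=0):
    """Simplified shell descriptor: sample `occupancy` (a callable
    p -> {0, 1} for points p in [-1, 1]^3) on concentric spheres and
    keep the mean and variance per radius as crude shell features."""
    rng = np.random.default_rng(seed)
    # random unit directions, shared by all radii
    dirs = rng.normal(size=(n_dirs, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    feats = []
    for r in np.linspace(0.1, 1.0, n_radii):
        vals = np.array([occupancy(r * d) for d in dirs], dtype=float)
        # statistics over the whole sphere do not depend on the
        # object's orientation, only on how much of each shell it fills
        feats.extend([vals.mean(), vals.var()])
    return np.array(feats)

# toy object: a ball of radius 0.5 centred at the origin
ball = lambda p: 1.0 if np.linalg.norm(p) <= 0.5 else 0.0
desc = spherical_shells_descriptor(ball)
```

For the ball, inner shells are fully occupied (mean 1, variance 0) and outer shells are empty, so the feature vector cleanly separates objects of different radial extent.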
2. Gesture recognition: Specific gestures will be used to guide the multimodal S&R
platform. In addition, the system will be able to track the motions of the hands so
as to extract the sketches of the objects.
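The data flow from hand tracking to sketch extraction can be illustrated with a toy tracker. A real system would use a proper hand detector on camera input; here the "hand" is simply the centroid of bright pixels in each synthetic frame, and the centroids are collected into a sketch polyline.

```python
import numpy as np

def track_bright_centroid(frames, threshold=0.5):
    """Toy hand tracker: in each frame, take the centroid of pixels
    above `threshold` as the hand position, and collect the positions
    into a 2D sketch polyline."""
    trail = []
    for frame in frames:
        ys, xs = np.nonzero(frame > threshold)
        if len(xs) == 0:
            continue  # hand not visible in this frame
        trail.append((xs.mean(), ys.mean()))
    return trail

# synthetic frames: a bright one-pixel "hand" moving along a diagonal
frames = []
for t in range(5):
    f = np.zeros((16, 16))
    f[t + 2, t + 2] = 1.0
    frames.append(f)

sketch = track_bright_centroid(frames)
```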
3. Speech recognition: Specific speech commands will guide the S&R interface as
well as the virtual assembly application.
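Mapping recognized speech onto interface actions can be as simple as a small command grammar. The command words and action names below are illustrative assumptions; the speech recognizer itself is assumed to be provided externally and to emit word strings.

```python
# Hypothetical command grammar: recognized words -> interface actions.
COMMANDS = {
    "search": "START_QUERY",
    "rotate": "ROTATE_VIEW",
    "select": "SELECT_RESULT",
    "assemble": "PLACE_PART",
}

def interpret(utterance):
    """Return the first known command found in a recognized utterance,
    or None if no command word is present."""
    for word in utterance.lower().split():
        if word in COMMANDS:
            return COMMANDS[word]
    return None

action = interpret("please rotate the model")
```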
4. Integration of gesture & speech: Integration of gesture and speech modalities in
a unified multimodal interface.
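One plausible fusion scheme, sketched under assumptions not fixed by the project, is late fusion on timestamps: each speech command is paired with the gesture event closest in time, provided it occurred within a small window.

```python
def fuse(gesture_events, speech_events, window=0.5):
    """Late-fusion sketch: pair each speech command with the gesture
    event closest in time, if one occurred within `window` seconds.
    Events are (timestamp, label) tuples."""
    fused = []
    for ts, cmd in speech_events:
        candidates = [(abs(ts - tg), g) for tg, g in gesture_events
                      if abs(ts - tg) <= window]
        if candidates:
            _, gesture = min(candidates)
            fused.append((cmd, gesture))
        else:
            fused.append((cmd, None))  # speech-only action
    return fused

gestures = [(0.9, "point"), (2.4, "circle")]
speech = [(1.0, "SELECT_RESULT"), (4.0, "START_QUERY")]
result = fuse(gestures, speech)
```

Saying "select" while pointing at a result thus becomes a single fused action, while a command with no nearby gesture falls back to a speech-only action.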
5. Definition of “3D sketch descriptors”: The 3D sketches obtained through gesture
recognition and tracking will be processed to create descriptors, which will be
used to evaluate the similarity of a sketch to existing 3D models. This is the
most challenging research part of the project, since there is no direct link
between a 3D sketch (i.e. lines in 3D space) and a 3D virtual object. Specific
gestures corresponding to actions such as deforming and combining are expected
to be useful for the descriptor extraction.
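As one possible point of departure (an assumption, not the project's final descriptor), a sketch of 3D points can be summarized by a normalized histogram of pairwise distances, a D2-style shape distribution, which could also be computed for points sampled from the 3D models, making sketch-to-model comparison possible.

```python
import numpy as np

def d2_descriptor(points, n_bins=16, n_pairs=2000, seed=0):
    """D2-style descriptor: normalized histogram of distances between
    randomly chosen pairs of the 3D sketch points."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(pts), n_pairs)
    j = rng.integers(0, len(pts), n_pairs)
    d = np.linalg.norm(pts[i] - pts[j], axis=1)
    d /= d.max() or 1.0  # scale invariance
    hist, _ = np.histogram(d, bins=n_bins, range=(0, 1))
    return hist / hist.sum()

def similarity(h1, h2):
    """Histogram intersection in [0, 1]; 1 means identical."""
    return float(np.minimum(h1, h2).sum())

line = [(t, 0, 0) for t in np.linspace(0, 1, 50)]
helix = [(np.cos(t), np.sin(t), t / 6) for t in np.linspace(0, 6, 50)]
s_same = similarity(d2_descriptor(line), d2_descriptor(line))
s_diff = similarity(d2_descriptor(line), d2_descriptor(helix))
```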
6. Multimodal 3D S&R platform: The 3D S&R platform will be manipulated by
the multimodal (gesture & speech) interface. The following figure illustrates a
block diagram of the architecture of the platform.
[Figure: Block diagram of the multimodal search and retrieval platform. Gesture and speech inputs are processed by gesture recognition, speech recognition, and action selection in the user interface, which drives the 3D search and retrieval block (search and retrieval module; descriptor extraction using sketches).]
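The retrieval step at the core of the architecture can be sketched as a minimal ranking loop: descriptor extraction is assumed to happen upstream (from sketches for the query, from 3D models for the database), and models are ranked by descriptor distance. The model names and 4-D descriptors below are purely illustrative.

```python
import numpy as np

def rank_models(query_desc, database):
    """Rank database models by Euclidean distance between descriptor
    vectors; smallest distance (best match) first."""
    scored = [(np.linalg.norm(np.asarray(query_desc) - np.asarray(d)), name)
              for name, d in database.items()]
    return [name for _, name in sorted(scored)]

# toy three-model database with hypothetical descriptors
db = {
    "cube":   [1.0, 0.0, 0.0, 0.0],
    "sphere": [0.0, 1.0, 0.0, 0.0],
    "cone":   [0.0, 0.0, 1.0, 0.0],
}
ranking = rank_models([0.9, 0.1, 0.0, 0.0], db)
```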
7. Virtual assembly application: The multimodal 3D search and retrieval platform
will be tested in the context of a virtual assembly application. The testing scenario
will be the assembly of a 3D puzzle using sketches.
B. Resources needed
Equipment:
• A personal computer will be needed for each participant.
• At least one of them should have 512 MB of memory or more and a
GeForce 5700 or better graphics card.
• Cameras for gesture capturing.
• Microphones for voice recording.
Software:
• Speech recognition software (to be provided by the participants)
• G