Efficient Visual Search for Objects in Videos
Visual search using text-retrieval methods can rapidly and accurately
locate objects in videos despite changes in camera viewpoint,
lighting, and partial occlusions.
By Josef Sivic and Andrew Zisserman
ABSTRACT | We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google [9] retrieves web pages containing particular words, by specifying the query as an image of the object or scene. In our approach, each frame of the video is represented by a set of viewpoint invariant region descriptors. These descriptors enable recognition to proceed successfully despite changes in viewpoint, illumination, and partial occlusion. Vector quantizing these region descriptors provides a visual analogy of a word, which we term a "visual word." Efficient retrieval is then achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. The final ranking also depends on the spatial layout of the regions. Object retrieval results are reported on the full length feature films "Groundhog Day," "Charade," and "Pretty Woman," including searches from within the movie and also searches specified by external images downloaded from the Internet. We discuss three research directions for the presented video retrieval approach and review some recent work addressing them: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.

KEYWORDS | Object recognition; text retrieval; viewpoint and scale invariance
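
The retrieval pipeline outlined in the abstract (descriptors quantized into visual words, an inverted file, frequency-based weighting) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-dimensional toy descriptors, the hand-picked three-word vocabulary, the function names, and the use of standard tf-idf weighting with cosine similarity. The spatial-layout ranking step mentioned in the abstract is not included.

```python
import math
from collections import Counter, defaultdict

def nearest_word(descriptor, vocabulary):
    """Vector quantization: index of the closest cluster centre (visual word)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(descriptor, vocabulary[i]))

def build_index(frames, vocabulary):
    """frames: {frame_id: [descriptor, ...]} -> (inverted file, per-frame word counts)."""
    inverted = defaultdict(set)   # visual word -> ids of frames containing it
    counts = {}                   # frame id -> Counter of visual words
    for frame_id, descriptors in frames.items():
        words = Counter(nearest_word(d, vocabulary) for d in descriptors)
        counts[frame_id] = words
        for w in words:
            inverted[w].add(frame_id)
    return inverted, counts

def tfidf(words, inverted, n_frames):
    """Weight each word by (frequency in this frame) * log(N / frames containing it)."""
    total = sum(words.values())
    return {w: (c / total) * math.log(n_frames / len(inverted[w]))
            for w, c in words.items() if inverted.get(w)}

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query_descriptors, vocabulary, inverted, counts):
    """Rank frames sharing at least one visual word with the query."""
    n = len(counts)
    q_words = Counter(nearest_word(d, vocabulary) for d in query_descriptors)
    q_vec = tfidf(q_words, inverted, n)
    candidates = set()
    for w in q_words:             # the inverted file limits work to relevant frames
        candidates |= inverted.get(w, set())
    scored = [(cosine(q_vec, tfidf(counts[f], inverted, n)), f) for f in candidates]
    return [f for _, f in sorted(scored, key=lambda t: (-t[0], t[1]))]

# Toy data (hypothetical): three visual words and three frames.
vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
frames = {
    "f1": [(0.1, 0.2), (9.8, 0.1)],
    "f2": [(0.3, 0.1), (0.2, 9.9)],
    "f3": [(0.1, 9.7), (0.0, 10.2)],
}
inverted, counts = build_index(frames, vocabulary)
query = [(9.9, 0.2), (0.1, 0.1)]
ranking = search(query, vocabulary, inverted, counts)
print(ranking)
```

In this sketch, frames that share no visual word with the query are never scored, which is the property that makes an inverted file attractive for large collections.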

The aim of this research is to retrieve those key frames and shots of a video containing a particular object with the ease, speed, and accuracy with which web search engines such as Google [9] retrieve text documents (web pages) containing particular words. An example visual object
query and retrieved results are shown in Fig. 1. This paper
investigates whether a text retrieval approach can be successfully employed for this task.
Identifying an (identical) object in a database of images is a challenging problem because the object can have a different size and pose in the target and query images, and also the target image may contain other objects ("clutter") that can partially occlude the object of interest. However, successful methods now exist which can match an object’s visual appearance despite differences in viewpoint, lighting,
and partial occlusion [22]–[24], [27], [32], [38], [39],
[41], [49], [50]. Typically, an object is represented by a set
of overlapping regions each represented by a vector
computed from the region’s appearance. The region
extraction and descriptors are built with a controlled
degree of invariance to viewpoint and illumination
conditions. Similar descriptors are computed for all images
in the database. Recognition of a particular object proceeds
by nearest neighbor matching of the descriptor vectors,
followed by disambiguating or voting using the spatial
consistency of the matched regions, for example by
computing an affine transformation between the query
and target image [19], [22]. The resul
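
The matching procedure described in this paragraph (nearest neighbor matching of descriptor vectors, then a spatial consistency check via an affine transformation) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the pixel tolerance `tol`, the toy coordinates, and the strategy of fitting an affine map to every triple of matches and counting the matches that agree with it are all hypothetical choices, not the specific method of the cited works.

```python
from itertools import combinations

def nearest_neighbour_matches(query, target):
    """query/target: lists of (position, descriptor). Returns (query_pos, target_pos) pairs."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    matches = []
    for q_pos, q_desc in query:
        t_pos, _ = min(target, key=lambda t: sq_dist(q_desc, t[1]))
        matches.append((q_pos, t_pos))
    return matches

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule; None if (near-)singular."""
    d = det3(m)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution

def fit_affine(pairs):
    """Affine map x' = a*x + b*y + c, y' = d*x + e*y + f from three point pairs."""
    m = [(x, y, 1.0) for (x, y), _ in pairs]
    row_x = solve3(m, [q[0] for _, q in pairs])
    row_y = solve3(m, [q[1] for _, q in pairs])
    if row_x is None or row_y is None:
        return None               # the three query points are collinear
    return row_x, row_y

def apply_affine(A, p):
    (a, b, c), (d, e, f) = A
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)

def count_consistent(matches, tol=2.0):
    """Largest number of matches agreeing with one affine map (exhaustive over triples)."""
    best = 0
    for triple in combinations(matches, 3):
        A = fit_affine(triple)
        if A is None:
            continue
        inliers = 0
        for p, q in matches:
            px, py = apply_affine(A, p)
            if (px - q[0]) ** 2 + (py - q[1]) ** 2 <= tol ** 2:
                inliers += 1
        best = max(best, inliers)
    return best

# Toy matches (hypothetical): four follow (x, y) -> (2x + 3, 2y + 4), one is wrong.
matches = [((0, 0), (3, 4)), ((10, 0), (23, 4)), ((0, 10), (3, 24)),
           ((10, 10), (23, 24)), ((5, 5), (100, 100))]
support = count_consistent(matches)
print(support)
```

The exhaustive search over triples is only practical for a handful of matches; it is used here to keep the example short and deterministic.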

