A Review of Algorithms for Audio Fingerprinting
Pedro Cano and Eloi Batlle
Universitat Pompeu Fabra
Barcelona, Spain
Email: pedro.cano, eloi.batlle
Ton Kalker and Jaap Haitsma
Philips Research Eindhoven
Eindhoven, The Netherlands
Email: ton.kalker@ieee.org, jaap.haitsma@philips.com
Abstract—An audio fingerprint is a content-based compact
signature that summarizes an audio recording. Audio Fingerprinting
technologies have recently attracted attention since they
allow the monitoring of audio independently of its format and
without the need of meta-data or watermark embedding. The
different approaches to fingerprinting are usually described
with different rationales and terminology depending on the
background: Pattern matching, Multimedia (Music) Information
Retrieval or Cryptography (Robust Hashing). In this paper, we
review different techniques mapping functional parts to blocks
of a unified framework.
I. INTRODUCTION
Audio fingerprinting is best known for its ability to link
unlabeled audio to corresponding metadata (e.g. artist and song
name), regardless of the audio format. Although there are more
applications of audio fingerprinting, such as content-based
integrity verification or watermarking support, this review
focuses primarily on identification. Audio fingerprinting or
content-based audio identification (CBID) systems extract a
perceptual digest of a piece of audio content, i.e. the fingerprint,
and store it in a database. When the system is presented with
unlabeled audio, it calculates the fingerprint and matches it against
those stored in the database. Using fingerprints and matching algorithms,
distorted versions of a recording can be identified as
the same audio content.
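The extract, store and match flow described above can be illustrated with a deliberately small Python sketch. Everything in it is an assumption made for illustration: the frame length, the 0.25 decision threshold, the synthetic "recordings", and the names `toy_fingerprint`, `bit_error_rate`, `identify`, `synth_recording` and `distort` are hypothetical, and the frame-energy feature is a placeholder chosen for brevity rather than one of the methods reviewed in this paper.

```python
import random

FRAME = 256  # samples per frame (assumed value, for illustration only)


def toy_fingerprint(samples):
    """One bit per pair of adjacent frames: 1 if the frame energy increased."""
    energies = [
        sum(x * x for x in samples[i:i + FRAME])
        for i in range(0, len(samples) - FRAME + 1, FRAME)
    ]
    return [1 if b > a else 0 for a, b in zip(energies, energies[1:])]


def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the common length of two fingerprints."""
    n = min(len(fp_a), len(fp_b))
    return sum(a != b for a, b in zip(fp_a, fp_b)) / n


def identify(query_fp, database, threshold=0.25):
    """Return (title, ber) of the closest entry, or (None, ber) if too far."""
    best_title, best_ber = None, 1.0
    for title, fp in database.items():
        ber = bit_error_rate(query_fp, fp)
        if ber < best_ber:
            best_title, best_ber = title, ber
    return (best_title if best_ber <= threshold else None), best_ber


def synth_recording(seed, n_frames=128):
    """Hypothetical 'recording': noise whose loudness changes frame by frame."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_frames):
        gain = rng.uniform(0.1, 1.0)
        samples.extend(gain * rng.uniform(-1.0, 1.0) for _ in range(FRAME))
    return samples


def distort(samples, seed=1234):
    """Halve the volume and add low-level noise (a stand-in for a noisy channel)."""
    rng = random.Random(seed)
    return [0.5 * x + rng.uniform(-0.02, 0.02) for x in samples]


# Enrolment: store one fingerprint per known title.
database = {f"song-{s}": toy_fingerprint(synth_recording(s)) for s in (1, 2, 3)}

# Identification: fingerprint a distorted copy and look it up.
query_fp = toy_fingerprint(distort(synth_recording(2)))
print(identify(query_fp, database))
```

The sketch assumes the query is already aligned with the stored recording. It compares fingerprints by their bit error rate rather than by equality, which is what allows a distorted copy to be matched.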
A source of difficulty when automatically identifying audio
content derives from its high dimensionality and the significant
variance of the audio data for perceptually similar content.
The simplest approach that one may think of, the direct
comparison of the digitized waveform, is neither efficient
nor effective. A more efficient implementation of this approach
could use a hash method, such as MD5 (Message Digest 5)
or CRC (Cyclic Redundancy Checking), to obtain a compact
representation of the binary file. In this setup, one compares
the hash values instead of the whole files. However, hash
values are fragile: a single bit flip is sufficient for the hash
to change completely. This setup is therefore not robust to
compression or to minimal distortions of any kind and, in fact, it
cannot be considered as content-based identification since it
does not consider the content, understood as information, just
the bits.
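The fragility of hash values is easy to observe. The following minimal Python sketch uses a made-up byte string as a stand-in for an audio file and inverts a single bit. The data and the helper names `flip_bit` and `md5_bit_difference` are illustrative assumptions; only `hashlib.md5` and `zlib.crc32` from the Python standard library are relied upon.

```python
import hashlib
import zlib


def flip_bit(data: bytes, byte_index: int = 0, bit_index: int = 0) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    changed = bytearray(data)
    changed[byte_index] ^= 1 << bit_index
    return bytes(changed)


def md5_bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 128 MD5 digest bits differ between a and b."""
    da = int.from_bytes(hashlib.md5(a).digest(), "big")
    db = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(da ^ db).count("1")


# Hypothetical stand-in for the raw bytes of an audio file.
original = bytes(range(256)) * 64
corrupted = flip_bit(original)

print("MD5   original :", hashlib.md5(original).hexdigest())
print("MD5   corrupted:", hashlib.md5(corrupted).hexdigest())
print("CRC32 original :", format(zlib.crc32(original), "08x"))
print("CRC32 corrupted:", format(zlib.crc32(corrupted), "08x"))
print("MD5 bits changed:", md5_bit_difference(original, corrupted), "of 128")
```

The two files differ in one bit out of more than a hundred thousand, yet their digests share no useful resemblance, so comparing hashes can only confirm bit-exact equality. A lossy codec or any added noise changes far more than one bit, which is why this setup cannot identify perceptually identical audio.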
An ideal fingerprinting system should fulfill several requirements.
It should be able to accurately identify an item,
regardless of the level of compression and distortion or
interference in the transmission channel. Depending on the
application, it should be able to identify whole titles from
excerpts a few seconds long (a property known as granularity
or robustness to cropping), which requires methods for dealing
with shifting, that is, a lack of synchronization between the
extracted fingerprint and those stored in the database. It should
also be able to deal with other sources of degradation such
as pitching (playing audio faster or slower), equalization,
background noise, D/A-A/D conversion, speech and audio
coders (such as GSM or MP3), etc. Th