The Universe of Disco

Mark Dominus (陶敏修)
mjd@pobox.com

Sun, 06 Aug 2017

How Shazam works

Yesterday I discussed an interesting failure on the part of Shazam, a phone app that can recognize music by listening to it. I said I had no idea how it worked, but I did not let that stop me from pulling the following vague speculation out of my butt:

I imagine that it does some signal processing to remove background noise, accumulates digests of short sections of the audio data, and then matches these digests against a database of similar digests, compiled in advance from a corpus of recordings.

Julia Evans provided me with the following reference: “An Industrial-Strength Audio Search Algorithm” by Avery Li-Chun Wang of Shazam Entertainment, Ltd. Unfortunately the paper has no date, but on internal evidence it seems to be from around 2002–2006.

M. Evans summarizes the algorithm as follows:

find the strongest frequencies in the music and times at which those frequencies happen

look at pairs (freq_1, time_1, freq_2, time_2) and turn those pairs into hashes (by subtracting time_1 from time_2)

look up those hashes in your database

She continues:

so basically Shazam will only recognize identical recordings of the same piece of music—if it's a different performance the timestamps the frequencies happen at will likely be different and so the hashes won't match

Thanks Julia!
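
To make the three steps concrete, here is a minimal Python sketch of the pairing, hashing, and lookup. It is a toy under stated assumptions, not Shazam's implementation. It assumes the peak-finding step has already been done and that each recording is given as a list of (time, frequency) peaks. The names `FAN_OUT`, `hashes`, `build_index`, and `identify` are hypothetical. So is the scoring rule, which counts matching hashes that agree on a single time offset between sample and recording; the summary above does not specify how matches are scored.

```python
from collections import Counter, defaultdict

FAN_OUT = 3  # hypothetical: how many later peaks each peak is paired with


def hashes(peaks, fan_out=FAN_OUT):
    """Yield ((freq_1, freq_2, time_2 - time_1), time_1) for nearby peak pairs."""
    peaks = sorted(peaks)  # (time, freq) tuples, so this orders by time
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(tracks):
    """Map each hash to the (track name, time_1) places where it occurs."""
    index = defaultdict(list)
    for name, peaks in tracks.items():
        for h, t1 in hashes(peaks):
            index[h].append((name, t1))
    return index


def identify(index, sample_peaks):
    """Return the track whose hashes match at one consistent offset, or None."""
    votes = Counter()
    for h, t_sample in hashes(sample_peaks):
        for name, t_track in index.get(h, []):
            # Assumed scoring: a true match lines up at a single time offset.
            votes[(name, t_track - t_sample)] += 1
    if not votes:
        return None
    (name, _offset), _count = votes.most_common(1)[0]
    return name


# Made-up peak data, purely for illustration.
tracks = {
    "song_a": [(0, 440), (1, 880), (2, 660), (3, 550), (4, 990)],
    "song_b": [(0, 300), (1, 500), (2, 700), (3, 900), (4, 1100)],
}
index = build_index(tracks)

# An excerpt of song_a starting two time units in: same spacing between peaks.
print(identify(index, [(0, 660), (1, 550), (2, 990)]))  # song_a

# The same notes played at half speed: the time differences change,
# so none of the hashes match.
print(identify(index, [(0, 660), (2, 550), (4, 990)]))  # None
```

The second lookup illustrates Julia's point: because the time difference is part of the hash, a performance with different timing produces different hashes, even when the frequencies are the same.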

Moving upwards from the link Julia gave me, I found a folder of papers maintained by Dan Ellis, formerly of the Columbia University Electrical Engineering department, founder of Columbia's LabROSA, the Laboratory for the Recognition and Organization of Speech and Audio, and now a Google research scientist.

In the previous article, I asked about research on machine identification of composers or musical genre. Some of M. Ellis’s LabROSA research is closely related to this. See for example:

There is a lot of interesting-looking material available there for free. Check it out.

(Is there a word for when someone gives you a URL like http://host/a/b/c/d.html and you start prying into http://host/a/b/c/ and http://host/a/b/ hoping for more goodies? If not, does anyone have a suggestion?)
