Quibbles and Bits

Brainz the Size of a Planet, Part 3

I have been describing the free-access crowd-sourced music metadata database MusicBrainz over the course of the last two issues of Copper.  I started out by describing the need for what I term ‘rich metadata’, and then gave an overview of MusicBrainz itself.  In this final installment I want to discuss a practical application for MusicBrainz in a real-world consumer product like a Music Server, and conclude by mentioning some of the limitations and drawbacks that MusicBrainz still presents.

Fundamental to the concept of a Music Server is the metadata associated with the music in the server.  So if iTunes (to use a widely-recognized example) is able to display your music by Album, including images of the Album Art, lists of track names, performers, composers and the like, it is only because it has access to that information in the individual tracks’ metadata.  Try importing a track into iTunes with no metadata and see what you get!  So metadata is mission-critical to the whole concept of a functioning Music Server, and the more in-depth and detailed the metadata, the richer can be the experience of using the Music Server.

Simple software like the aforementioned iTunes gets all its metadata from your music files.  That doesn’t make for an enticing user experience.  These days, most music files come with basic metadata, which is why you do actually see data when you import them into iTunes.  But if you ever ask yourself why certain data isn’t displayed in iTunes, it’s probably because that data isn’t in the files in the first place.  So if you want to develop your own Music Server, and you want it to improve upon what iTunes has to offer, you need to be able to find better metadata than iTunes can find.  This is where resources like MusicBrainz come in.

So here I am, and I’m writing the software that imports the first track of the first album into my new Music Server, and I want to download its metadata from MusicBrainz.  How do I go about that?  Let’s just assume I am importing the least helpful possible category of track – one that has no metadata at all associated with it.  So I don’t even know what song it is.  How can I find that on MusicBrainz?  It turns out there is a tool to do just that.  I can include in my Music Server App a process to analyze any given music file and produce what is called an Acoustic Fingerprint (if you are familiar with Shazam you will have some idea of the concept).  I can submit my Acoustic Fingerprint to a free on-line service called AcoustID, and if it has that fingerprint on file it will give me a whole bunch of information about the track.  Included in that information is usually a direct link to the track on MusicBrainz.  So, finally, I can go to MusicBrainz, and download a whole raft of metadata associated with my track.
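To make that round trip a little more concrete, here is a minimal Python sketch of the AcoustID step.  It is an illustration under assumptions, not a finished importer: it assumes the fingerprint and the track duration have already been computed (AcoustID fingerprints are generated with the Chromaprint library), that you have registered your own client key with AcoustID, and that the lookup response has already been fetched and decoded from JSON.  The function names and the min_score threshold are my own inventions for the example.

```python
from urllib.parse import urlencode

# AcoustID web service lookup endpoint.
ACOUSTID_LOOKUP = "https://api.acoustid.org/v2/lookup"


def build_lookup_url(client_key: str, duration_s: int, fingerprint: str) -> str:
    """Build a lookup request that asks AcoustID to include MusicBrainz recordings."""
    params = {
        "client": client_key,        # your own application key (assumed to exist)
        "duration": duration_s,      # track length in whole seconds
        "fingerprint": fingerprint,  # fingerprint string computed beforehand
        "meta": "recordings",        # ask for the linked MusicBrainz recordings
    }
    return f"{ACOUSTID_LOOKUP}?{urlencode(params)}"


def recording_ids(response: dict, min_score: float = 0.5) -> list[str]:
    """Collect MusicBrainz recording IDs from a decoded response, best match first.

    min_score is a hypothetical threshold chosen for this sketch.
    """
    if response.get("status") != "ok":
        return []
    results = sorted(
        response.get("results", []),
        key=lambda r: r.get("score", 0.0),
        reverse=True,
    )
    ids: list[str] = []
    for result in results:
        if result.get("score", 0.0) < min_score:
            continue
        # A fingerprint with no known link to MusicBrainz has no "recordings" entry.
        for rec in result.get("recordings", []):
            if rec["id"] not in ids:
                ids.append(rec["id"])
    return ids
```

Each ID that comes back can then be used to ask MusicBrainz for the metadata attached to that recording.  Note that an empty list is a perfectly normal outcome: it simply means AcoustID does not have the fingerprint on file, or has it without a MusicBrainz link.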

It sounds brilliant – and it is – but life is never quite as simple as that.  You will encounter a number of obstacles which in some cases cannot be worked around at all, and in some cases can only be resolved by logic of the if … but … and … so variety which always sounds straightforward in the general but falls apart in the particular.  Let me give you a couple of specific examples.

Here’s how the music industry works.  The Bee Gees’ record label owns the rights to all their recordings.  They sell those rights separately to distributors in different countries.  After a while the Bee Gees stop releasing new albums and so the distributors grow impatient and start to release Greatest Hits and other compilation albums.  Usually this will be done at the prompting of the record label who can support the activity with album art, material to fill the booklet, and such like.  So the Greatest Hits albums in each country will tend to look the same.   But sometimes a track that was generally a bust turns out to have been a big hit in, say, Paraguay.  So the Paraguayan distributor will want that track on his version of the Greatest Hits release.  We end up with multiple versions of the Greatest Hits album.  And it just goes on from there.

So here I am with a track I am trying to import into my new Music Server.  AcoustID has identified it as the Bee Gees track “Massachusetts”.  However, MusicBrainz now tells me that the following Bee Gees albums all feature this recording:

  • Horizontal
  • The Studio Albums 1967-1968: Horizontal
  • Best of the Bee Gees
  • The Very Best of the Bee Gees
  • Best of the Best
  • Bee Gees Story
  • Bee Gees Gold, Vol 1
  • Best Ballads
  • Greatest Hits
  • Their Greatest Hits: The Record
  • Number Ones
  • Tales From the Brothers Gibb: A History in Song 1967-1990
  • Mythology The 50th Anniversary Collection
  • For Whom The Bell Tolls

Go on, how many of those could you have named … ?  And to complicate it further, the track also appears on a number of compilation albums that combine tracks from various other popular artists.

My task would be made a little easier if I already had some metadata in my files to be going on with.  For example, if the file’s metadata said the album title was “Horizontal” then I would be able to home in on the best match straight away.  But if my metadata said “Bee Gees Greatest Hits” what would I do then?  That’s not on the MusicBrainz list.  Of course, whoever entered “Bee Gees Greatest Hits” into my metadata may have made an error, or may have intentionally edited “Greatest Hits” to avoid confusion with another album of the same name from another artist.  Or the album may actually exist, but nobody has entered it into MusicBrainz yet.  MusicBrainz can’t help you if you’re dealing with problems like that.
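Here is one way the “home in on the best match” step might look in Python.  It is only a sketch: it uses the standard library’s difflib to compare the album title found in the file against the titles MusicBrainz returned, and the function name and the 0.8 cutoff are assumptions I picked for illustration.  A real Music Server would want to weigh other clues as well, such as track count and track position.

```python
import difflib
from typing import Optional


def best_album_match(
    tagged_title: str, candidates: list[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the candidate album title closest to the tagged one, or None.

    Comparison ignores case.  cutoff is the minimum similarity (0 to 1)
    a candidate needs before we trust it; 0.8 is an arbitrary choice.
    """
    # Map case-folded titles back to their original spelling.
    by_folded = {title.casefold(): title for title in candidates}
    hits = difflib.get_close_matches(
        tagged_title.casefold(), list(by_folded), n=1, cutoff=cutoff
    )
    return by_folded[hits[0]] if hits else None
```

With a tagged title of “Horizontal” this returns the Horizontal album straight away.  With “Bee Gees Greatest Hits” and a cutoff of 0.8 it returns nothing, because the nearest candidate, “Greatest Hits”, is not similar enough.  Lowering the cutoff would produce a match, but it would also produce more wrong guesses, which is exactly the sort of trade-off that sounds straightforward in the general and falls apart in the particular.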

You see, a big problem is that the MusicBrainz database is far from complete.  For example, if I do a MusicBrainz search for recordings of the Duke Ellington standard “Caravan” I find a staggering 202 different Recordings of it.  It is possible, of course, that some of these could be duplicates that need to be merged together, but let’s ignore that for the time being.  However, if I next look separately for all of the Recordings on MusicBrainz entitled “Caravan”, there are thousands of them!  The large majority are not associated with a Work, and so it is not immediately clear how many of those are covers of the Ellington classic.  But a good many of them evidently are, and so if one of those albums is in your music library, MusicBrainz won’t know if it is a recording of the Ellington tune, and furthermore, won’t know who should get the songwriting (or composer) credits.

This is something that will annoy jazz aficionados in particular.  If a song is playing, you would really like to be able to click on that song title and get a list of other covers of that song in your library.  It would be doubly frustrating if you know for sure that you had such a cover version, but your Music Server was unable to find it.

Let’s go back to the end of the last-but-one paragraph.  There is an important limitation there.  Writers don’t write Tracks, they write Works which are then recorded and made into Tracks.  To hammer the point home, in MusicBrainz a Track itself does not have a writer (or composer) … only a Work has that.  So if an album has been entered into MusicBrainz, but the person entering it has not entered the Work relationships for each Track, then there can be no songwriter or composer metadata associated with those Tracks.
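The relationship can be sketched as a small data model.  This is a deliberately simplified illustration in Python, not the actual MusicBrainz schema: the class and function names are hypothetical, and the only features carried over from MusicBrainz are that a Track points to a Recording, that a Recording may or may not be linked to a Work, and that each Work has its own unique identifier (its MBID).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Work:
    mbid: str                     # unique MusicBrainz identifier of the Work
    title: str
    writers: list[str] = field(default_factory=list)


@dataclass
class Recording:
    title: str
    artist: str
    work: Optional[Work] = None   # None when nobody has entered the relationship


@dataclass
class Track:
    position: int
    recording: Recording


def writers_of(track: Track) -> list[str]:
    """Writers are reachable only through the Work; no Work means no credits."""
    work = track.recording.work
    return list(work.writers) if work else []


def other_covers(playing: Recording, library: list[Recording]) -> list[Recording]:
    """Other recordings in the library linked to the same Work as the one playing."""
    if playing.work is None:
        return []
    return [
        r for r in library
        if r is not playing
        and r.work is not None
        and r.work.mbid == playing.work.mbid
    ]
```

Notice that other_covers compares Work identifiers rather than titles.  That is what lets it tell two unrelated songs called “Caravan” apart, and it is also why a cover whose Recording was never linked to its Work silently drops out of the list.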

This is both a weakness and a strength of MusicBrainz.  It is a strength, because it is the only way to cleanly handle a milieu where, for example, there can be multiple entirely different and unrelated songs, all having the same title, but having different writers … such as “Caravan”.  But it is a weakness because of the workload it imposes on someone who has to enter this information.  This workload arises primarily because each time you create a new Work in MusicBrainz it becomes incumbent upon you to search MusicBrainz to establish that the Work has not already been created.  And having created the Work, you will have the same issue to deal with when it comes to adding in the writers (or composers).  Choosing an obvious example, in MusicBrainz there are currently nine different persons called “John Smith”, as well as a John Stafford Smith, a John Christopher Smith, a John Angus Smith, a John “Jubu” Smith, and a Johnny Smith.  Is your “John Smith” one of these, or will you have to create yet another one?

Few people who have the patience to enter a new album into MusicBrainz have the additional patience required to enter what you might call a “minimum desirable” selection of metadata.  And if you’re the impatient sort, and take shortcuts, you could end up entering bad data.  Hopefully somebody would pick that up and correct it, but oftentimes they don’t (particularly when the data in question is obscure), and the bad data will be stuck in there for a long time.  In any case, it can take a good hour out of your day to do a proper job of entering a single album, particularly if the Works need to be created.

Myself, I tend to enter an album into MusicBrainz whenever I import it into my Music Server, if it’s not already there, and I’m usually quite diligent.  Since January 2017 I’ve been a prolific contributor.  But sometimes even I get discouraged.  For instance, I recently entered the album Xover by Blue Lab Beats (I got it as part of my B&W Society of Sound subscription).  The principals of this band have both real names and stage names (which is not immediately obvious from the album credits), and many of the contributors (performers, writers, etc.) look as though they too may have both.  So, concerned that I might spend a lot of time and effort entering erroneous data about people I know nothing about, I simply declined to enter any Work relationships for the Tracks on the album.  Maybe someone who is more au courant than I will step in at some stage and create the missing relationships.

Another significant issue is that the tools available to enter data into MusicBrainz are arcane and sometimes quite opaque.  There is little in the way of educational material to help you climb what is a rather steep learning curve.  There are a few web browser plug-ins that can be used to automate some of the more cumbersome tasks, but frankly, you’ll need to have climbed very close to the top of the learning curve before you want to consider messing with those.  The support community is not an easy place to hang about in, unless you are very comfortable with IRC (unlikely, if you’ve never been a hacker).  And the ‘Style Guidelines’, which are there to guide you through the morass of ambiguities that you quickly encounter, are like road signs here in Montreal – they only really make sense to people who have already figured out what they are trying to say.  Having said all that, the MusicBrainz community is a very welcoming and a very helpful one … not to mention exceedingly intelligent, diligent, and well-informed.  It’s just not particularly user-friendly.

As a crowd-sourced enterprise, and a free one at that, MusicBrainz relies on the freely-given contributions and efforts of individuals around the globe.  If I have encouraged you to get involved, that can only be a good thing.  Even if all you decide to do is make sure that your own music library is properly represented on MusicBrainz – and you can do that at your own comfortable pace – that would be a great thing.