23 January 2009

Mp3 Blog Aggregator: status and some code

My last post was a lamentation about how the current set of mp3 blog aggregators don't do it for me, and at the same time a declaration that I would do something about it.

I've spent my spare moments this week hacking at the problem and it's starting to bear fruit.

The first of it is a simple ID3 reader implemented in Python. It reaches out over the tubes and grabs the ID3 information from an mp3 hosted on a server somewhere. Nothing too complicated, except that it can also be configured to extract any images embedded in the mp3.
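To give a feel for what reading ID3 over the wire involves, here is a rough sketch (not the code from my reader). It assumes an ID3v2.3 or v2.4 tag at the start of the file, ignores extended headers and unsynchronisation, and only pulls out text frames. The function names are made up for this example, and `read_remote_tags` counts on the server honoring HTTP `Range` requests so that only the tag gets downloaded.

```python
import struct
import urllib.request


def synchsafe_to_int(data: bytes) -> int:
    """Decode a synchsafe integer: only the low 7 bits of each byte count."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def parse_id3v2_header(head: bytes):
    """Return (major_version, tag_size), or None if there is no ID3v2 tag."""
    if len(head) < 10 or head[:3] != b"ID3":
        return None
    return head[3], synchsafe_to_int(head[6:10])


def parse_text_frames(body: bytes, major: int) -> dict:
    """Collect text frames (IDs starting with 'T') from a v2.3/v2.4 tag body."""
    encodings = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}
    frames = {}
    pos = 0
    while pos + 10 <= len(body):
        frame_id = body[pos:pos + 4]
        if not frame_id.isalnum():  # zero padding or garbage: stop
            break
        raw_size = body[pos + 4:pos + 8]
        # v2.4 frame sizes are synchsafe; v2.3 uses a plain big-endian int.
        if major == 4:
            size = synchsafe_to_int(raw_size)
        else:
            size = struct.unpack(">I", raw_size)[0]
        data = body[pos + 10:pos + 10 + size]
        pos += 10 + size
        if frame_id.startswith(b"T") and frame_id != b"TXXX" and data:
            codec = encodings.get(data[0], "latin-1")
            text = data[1:].decode(codec, errors="replace")
            frames[frame_id.decode("ascii")] = text.rstrip("\x00")
    return frames


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch bytes start..end (inclusive) of a remote file."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        data = resp.read()
        # A 200 means the server ignored Range and sent the whole file.
        return data[start:end + 1] if resp.status == 200 else data


def read_remote_tags(url: str) -> dict:
    """Download just the tag of a remote mp3 and return its text frames."""
    header = parse_id3v2_header(fetch_range(url, 0, 9))
    if header is None:
        return {}
    major, size = header
    return parse_text_frames(fetch_range(url, 10, 10 + size - 1), major)
```

Embedded images live in `APIC` frames, which have a different layout from the text frames handled here, so extracting them would need its own parsing step.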

Knowledgeable readers might be asking: "why didn't he use one of the three or four existing Python ID3 libraries?" The answer is this: I plan to build a blog crawler (mentioned later) and a website for this idea, and to do it all in Python. As a warm-up exercise, I figured it would be good to write a simple ID3 reader. I had already done one in Java, so it mainly became an exercise in seeing how the Java idioms I'm used to translate into Python. (Note: if you bother to download and read the code, please be gentle. It's the first real Python I've written. Feedback is appreciated too.)

The crawler is mostly done. It came together more quickly than I thought it would, although it still has rough edges. It runs a few times a day, notes new blog posts, and stores whatever information it finds in a database (PostgreSQL).
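The "notes new blog posts" part boils down to an insert that quietly does nothing when a post has already been seen. Here is a minimal sketch of that idea. It is not my actual crawler code: the table and column names are invented, and it uses `sqlite3` from the standard library as a stand-in for the real database so that it runs anywhere.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) tables if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " url TEXT PRIMARY KEY, blog TEXT NOT NULL, title TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mp3_links ("
        " post_url TEXT NOT NULL REFERENCES posts(url),"
        " mp3_url TEXT NOT NULL,"
        " UNIQUE (post_url, mp3_url))"
    )
    return conn


def record_post(conn, blog, url, title, mp3_urls):
    """Store a post and its mp3 links; return True only if the post is new."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO posts (url, blog, title) VALUES (?, ?, ?)",
            (url, blog, title),
        )
        if cur.rowcount == 0:  # post URL already known from an earlier run
            return False
        conn.executemany(
            "INSERT OR IGNORE INTO mp3_links (post_url, mp3_url) VALUES (?, ?)",
            [(url, mp3) for mp3 in mp3_urls],
        )
    return True
```

Keying posts on their URL means the crawler can run as often as it likes without creating duplicates; each run just reports which posts it hadn't seen before.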

The website is where the work needs to be done now. I have gotten no further than creating a few simple query+display pages that I've been using to view results from the crawler. I've experimented with different ways to present data (entry-centric vs mp3-centric) and still haven't come up with something I like. I've got time though. And the longer I wait, the more useful data I'll have from the crawler.

I'm still using Pylons for the website, although I had second thoughts after spending too much time fighting Mako and the way it manhandled my nice Unicode mp3 tags.

I have yet to tackle the problem of dynamic RSS generation, but I have some good ideas in my head for that.