More permanent stuff at http://www.dusbabek.org/~garyd

05 August 2005

The 43rd Artist

Back to Tagfriendly. The 43rd artist was actually "Various," with 277,990 albums listed. The root of the problem was, of course, me. You see, I created a simple Java object to represent the minimum indexable data for Song, Album and Artist.

It all starts with a particular artist. I look up the artist data and create an Artist object. Then I look up the album data for that artist. If there are fewer than 1000 albums, I create an Album object for each one right there. I then iterate over the Albums, creating Song objects. Once the Song objects are created, the indexing can begin. The Artist--Album--Song links work in both directions, because I want to include album and artist data in the song index (for fast lookups). The same idea applies to albums and songs.
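Here is a minimal sketch of what that two-way linking might look like in Java. The actual classes aren't shown here, so the fields (id, name, title) and the helpers addAlbum, addSong and indexText are hypothetical stand-ins, not the real Tagfriendly code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal model: every object keeps a link to its parent and a
// list of its children, so the indexer can reach artist and album data from
// a Song (and songs from an Artist) without another lookup.
final class Artist {
    final long id;
    final String name;
    final List<Album> albums = new ArrayList<>();

    Artist(long id, String name) {
        this.id = id;
        this.name = name;
    }

    // Creates an Album and wires up both directions of the link.
    Album addAlbum(long albumId, String title) {
        Album album = new Album(albumId, title, this);
        albums.add(album);
        return album;
    }
}

final class Album {
    final long id;
    final String title;
    final Artist artist; // back-link to the owning artist
    final List<Song> songs = new ArrayList<>();

    Album(long id, String title, Artist artist) {
        this.id = id;
        this.title = title;
        this.artist = artist;
    }

    // Creates a Song and wires up both directions of the link.
    Song addSong(long songId, String title) {
        Song song = new Song(songId, title, this);
        songs.add(song);
        return song;
    }
}

final class Song {
    final long id;
    final String title;
    final Album album; // back-link to the owning album

    Song(long id, String title, Album album) {
        this.id = id;
        this.title = title;
        this.album = album;
    }

    // Text for the song index: album and artist fields ride along so a
    // song search can match on them without extra lookups.
    String indexText() {
        return title + " " + album.title + " " + album.artist.name;
    }
}
```

The catch with links in both directions is that holding on to one Artist keeps every Album and Song under it reachable, which matters a lot once an artist has a huge album list.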

So what happens if an artist has more than 1000 albums? Easy: I create an array with just the album ids and iterate over that. On every iteration I create the Album object and proceed as usual, creating the Song objects and linking everything together.
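A rough sketch of the two loading paths might look like this. It assumes a hypothetical loadAlbum function that builds one album (with its songs) from an id and a hypothetical index callback; the names AlbumLoader and EAGER_LIMIT are mine, and the Artist back-links are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongFunction;

// Hypothetical sketch of the two strategies: build every album up front for
// small artists, or walk a plain array of ids for large ones.
final class AlbumLoader {
    // Below this many albums, every Album object is built up front.
    static final int EAGER_LIMIT = 1000;

    // albumIds:  ids looked up for one artist
    // loadAlbum: builds a full album (with its songs) from an id
    // index:     indexes one album and its songs
    static <A> void indexAlbums(long[] albumIds,
                                LongFunction<A> loadAlbum,
                                Consumer<A> index) {
        if (albumIds.length < EAGER_LIMIT) {
            // Small artist: materialize every album first, then index them.
            List<A> albums = new ArrayList<>(albumIds.length);
            for (long id : albumIds) {
                albums.add(loadAlbum.apply(id));
            }
            albums.forEach(index);
        } else {
            // Large artist: only the ids stay in memory; each album is built
            // right before it is indexed, so (as long as index keeps no
            // reference) only one album graph is live at a time.
            for (long id : albumIds) {
                index.accept(loadAlbum.apply(id));
            }
        }
    }
}
```

The id array is cheap (8 bytes per album), which is why this path holds up well past 1000 albums. If the Artist still keeps a list of every Album it has seen, though, memory grows anyway.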

Even this isn't a problem on its own, but it only delays the inevitable. Artist 43, "Various," comes along with some 278k albums, and the indexer is bound to get bogged down. I could make this work if I had unlimited memory, but I don't. So for now, if an artist has more than 5000 albums, not all of them get indexed with the artist. Sorry, folks.
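The cap itself might be expressed like this. The 5000 limit is from the post; which albums survive the cut isn't stated, so keeping the first 5000 ids is an assumption, as are the names AlbumCap, albumsToIndex and isTruncated.

```java
import java.util.Arrays;

// Hypothetical cap: artists with more than MAX_ALBUMS_PER_ARTIST albums only
// get that many albums indexed together with the artist.
final class AlbumCap {
    static final int MAX_ALBUMS_PER_ARTIST = 5000;

    // Returns the album ids to index with the artist. Keeping the first N
    // ids is an assumption; any selection rule would fit the idea.
    static long[] albumsToIndex(long[] albumIds) {
        if (albumIds.length <= MAX_ALBUMS_PER_ARTIST) {
            return albumIds;
        }
        return Arrays.copyOf(albumIds, MAX_ALBUMS_PER_ARTIST);
    }

    // True when the artist's album list will be incomplete in the index.
    static boolean isTruncated(long[] albumIds) {
        return albumIds.length > MAX_ALBUMS_PER_ARTIST;
    }
}
```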

According to my records, this only affects four artists, all of them falling into the "Various" category. So as far as my use cases go, this is a no-brainer. Anyone curious about all the albums filed under the artist "Various" is going to get incomplete data.
