More permanent stuff at http://www.dusbabek.org/~garyd

03 August 2005

Many Artists

I've been working more lately on Tagfriendly. I've decided to ditch Tomcat/Servlets. I don't know what took me so long to realize it was a dead end. I think I was in denial most of the time.

I am still using Java, so feel free to kick me while I am down.

Anyway, I've got the database builder so that it can load all 24 million songs in under 8 hours. Not too bad. The indexer has problems though. (It connects to the database and builds a full-text index.) It chokes up on the 43rd artist, which is "Various Artists." And by that I mean that the indexer quickly goes from processing about 30 albums a second down to about 10. Yuck. I cut off each index at 750k songs, so I know the problem is not with index size*.

I knew there would be a lot of albums under that listing, but I didn't think it would be too big of a deal. Apparently, to the PostgreSQL JDBC driver, it is. I haven't solved the problem yet, but I will probably chunk the artists that have more than a set threshold of albums, so that no single query has to pull an entire listing at once. That is, if JDBC is the problem.
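
Here is a minimal sketch of what I mean by chunking. It is not the real indexer code: the AlbumChunker class and its chunk method are made-up names, and I am assuming the album IDs for one artist are already in a list. Each batch would then be fetched and indexed with its own query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tagfriendly): splits one artist's album IDs
// into fixed-size batches so each batch can be queried and indexed separately.
class AlbumChunker {
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the slice so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

An artist under the threshold comes back as a single chunk, so the same code path can handle everybody. The chunk size itself is something I would have to find by experiment.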

The purpose of this post is to document the number of albums assigned to some of the artists. The query I ran was to select the artists who had more than 1000 albums listed. Many I expected to see, but there were a few surprises there. Here goes:

Artist/Album Count

  1. Various Artists 125807
  2. Varios 6870
  3. various 7294
  4. Various 277990
  5. Depeche Mode 1772
  6. The Rolling Stones 1329
  7. Various artists 1764
  8. Sampler 1957
  9. Pink Floyd 2547
  10. Beethoven 1819
  11. Verschiedene 1121
  12. Compilation 1856
  13. Miles Davis 1568
  14. VA 3595
  15. Eric Clapton 1195
  16. Led Zeppelin 2000
  17. V.A. 1521
  18. Bruce Springsteen 1706
  19. Wolfgang Amadeus Mozart 1894
  20. various artists 1528
  21. U2 2185
  22. Frank Sinatra 1448
  23. The Beatles 2428
  24. Queen 1277
  25. Various Artist 3298
  26. Metallica 2089
  27. Mozart 2557
  28. David Bowie 1144
  29. Diverse 4036
  30. Soundtrack 1210
  31. Vários 1437
  32. Pearl Jam 1155
  33. Bob Dylan 2171
  34. Madonna 1331
  35. Grateful Dead 2032
  36. Nirvana 1032
  37. Jimi Hendrix 1054
  38. VARIOUS 1214
  39. Santana 1092
  40. Deep Purple 1476
  41. Elton John 1044
  42. Prince 1591
  43. Johann Sebastian Bach 1472
  44. Elvis Presley 3101
  45. Iron Maiden 1228
  46. Louis Armstrong 1266
  47. Black Sabbath 1335
  48. Bach 1024
  49. Ludwig van Beethoven 1013
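
For the record, the query behind that list is just a group-and-count with a cutoff. Below is a rough sketch of the same logic in Java. The AlbumCounts class, the artistsOver method, and the table and column names in the SQL comment are all made up for illustration; they are not the actual Tagfriendly schema.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the "artists with more than N albums" query.
class AlbumCounts {
    // Roughly equivalent SQL, assuming a hypothetical table albums(artist, title):
    //   SELECT artist, COUNT(*) FROM albums GROUP BY artist HAVING COUNT(*) > 1000;
    static Map<String, Integer> artistsOver(List<String> albumArtists, int threshold) {
        Map<String, Integer> counts = new TreeMap<>();
        // One list entry per album; count how often each artist name appears.
        for (String artist : albumArtists) {
            counts.merge(artist, 1, Integer::sum);
        }
        // Keep only artists strictly above the threshold.
        counts.values().removeIf(n -> n <= threshold);
        return counts;
    }
}
```

Note that the grouping is on the exact artist string, which is why "Various", "various" and "VARIOUS" each get their own row in the list.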


I was probably most surprised to see Pearl Jam and Nirvana make the list (especially Nirvana). Then I remembered that Pearl Jam is probably one of the most bootlegged bands of the past 10 years, and bootlegs certainly show up on CDDB/FreeDB.

It is nice to see that there is still a market for new classical recordings after a century or more. How much of today's pop do you think will stand up to that kind of scrutiny?

* Yes, I take a hit at the end when the indices have to be merged, but believe me, it is worth it.
