I've been working more lately on Tagfriendly. I've decided to ditch Tomcat/Servlets. I don't know what took me so long to realize it was a dead end. I think I was in denial most of the time.
I am still using Java, so feel free to kick me while I am down.
Anyway, I've got the database builder so that it can load all 24 million songs in under 8 hours. Not too bad. The indexer has problems though. (It connects to the database and builds a full-text index.) It chokes up on the 43rd artist, which is "Various Artists." And by that I mean that the indexer quickly goes from processing about 30 albums a second down to about 10. Yuck. I cut off each index at 750k songs, so I know the problem is not with index size*.
I knew there would be a lot of albums under that listing, but I didn't think it would be too big of a deal. Apparently, to the PostgreSQL JDBC driver, it is. I haven't solved the problem yet, but I will probably chunk the artists that have more than a set threshold of albums. That is, if JDBC is the problem.
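For what it's worth, here is a rough sketch of what that chunking could look like. It only works out the offset/limit windows for one artist's albums; it does not touch the database. The class and method names (`AlbumChunker`, `chunks`) are made up for illustration, and it assumes the albums can be read back in a stable order (say, ordered by a primary key) so the windows line up with `OFFSET`/`LIMIT` in the query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the actual indexer.
public class AlbumChunker {

    /** One window over an artist's albums: skip `offset` rows, read `limit` rows. */
    public static final class Chunk {
        public final int offset;
        public final int limit;

        Chunk(int offset, int limit) {
            this.offset = offset;
            this.limit = limit;
        }
    }

    /**
     * Splits albumCount albums into windows of at most `threshold` albums each.
     * An artist at or under the threshold gets a single window.
     */
    public static List<Chunk> chunks(int albumCount, int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        List<Chunk> out = new ArrayList<>();
        for (int offset = 0; offset < albumCount; offset += threshold) {
            // The last window may be shorter than the threshold.
            out.add(new Chunk(offset, Math.min(threshold, albumCount - offset)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Pink Floyd's 2547 albums, read 1000 at a time.
        for (Chunk c : chunks(2547, 1000)) {
            System.out.println("OFFSET " + c.offset + " LIMIT " + c.limit);
        }
    }
}
```

Each window would then become one query against the database. If the driver really is the culprit, it may also be worth knowing that the PostgreSQL JDBC driver reads a query's whole result set into memory by default; it only fetches in batches when autocommit is off and a fetch size is set on the statement.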
The purpose of this post is to document the number of albums assigned to some of the artists. The query I ran was to select the artists who had more than 1000 albums listed. Many I expected to see, but there were a few surprises there. Here goes:
Artist/Album Count
- Various Artists 125807
- Varios 6870
- various 7294
- Various 277990
- Depeche Mode 1772
- The Rolling Stones 1329
- Various artists 1764
- Sampler 1957
- Pink Floyd 2547
- Beethoven 1819
- Verschiedene 1121
- Compilation 1856
- Miles Davis 1568
- VA 3595
- Eric Clapton 1195
- Led Zeppelin 2000
- V.A. 1521
- Bruce Springsteen 1706
- Wolfgang Amadeus Mozart 1894
- various artists 1528
- U2 2185
- Frank Sinatra 1448
- The Beatles 2428
- Queen 1277
- Various Artist 3298
- Metallica 2089
- Mozart 2557
- David Bowie 1144
- Diverse 4036
- Soundtrack 1210
- Vários 1437
- Pearl Jam 1155
- Bob Dylan 2171
- Madonna 1331
- Grateful Dead 2032
- Nirvana 1032
- Jimi Hendrix 1054
- VARIOUS 1214
- Santana 1092
- Deep Purple 1476
- Elton John 1044
- Prince 1591
- Johann Sebastian Bach 1472
- Elvis Presley 3101
- Iron Maiden 1228
- Louis Armstrong 1266
- Black Sabbath 1335
- Bach 1024
- Ludwig van Beethoven 1013
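For anyone who wants to run the same kind of count, the query was along these lines. This is a sketch, not the exact thing I ran: the table and column names (`albums`, `artist`) are placeholders, and the Java method is just an in-memory equivalent of the same group-and-filter so the logic can be tried without a database.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration; names are not from the real schema.
public class ProlificArtists {

    // Assumed schema: one row per album, with an "artist" column.
    static final String SQL =
        "SELECT artist, COUNT(*) AS album_count FROM albums "
      + "GROUP BY artist HAVING COUNT(*) > ?";

    /** Returns the artists that appear more than minAlbums times, with their counts. */
    public static Map<String, Long> overThreshold(List<String> albumArtists, long minAlbums) {
        return albumArtists.stream()
            // Count albums per artist name (exact, case-sensitive match).
            .collect(Collectors.groupingBy(a -> a, TreeMap::new, Collectors.counting()))
            .entrySet().stream()
            // Keep only the artists over the threshold.
            .filter(e -> e.getValue() > minAlbums)
            .collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, TreeMap::new));
    }
}
```

The grouping is an exact string match, which is why "Various", "various" and "VARIOUS" each get their own row in the list above.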
I was probably most surprised to see Pearl Jam and Nirvana make the list (especially Nirvana). Then I remembered that Pearl Jam is probably one of the most bootlegged bands of the past 10 years, and bootlegs certainly show up on CDDB/FreeDb.
It is nice to see that there is still a market for new classical recordings after a century or more. How much of today's pop do you think will stand up to that kind of scrutiny?
* Yes, I take a hit at the end when the indices have to be merged, but believe me, it is worth it.