More permanent stuff at http://www.dusbabek.org/~garyd

11 March 2010

Running Multiple Cassandra Nodes on a Single Host

One of the first Cassandra tickets I worked on had me reviewing some code that visualized the node ring.  Properly testing the code required that I run a cluster. 

But I didn't have access to a cluster.  Nor did I feel like creating a virtual cluster by building a VM and cloning it several times.  What I wanted was to run several instances of Cassandra on a single machine with multiple interfaces, all pointed at the same compiled code (without multiple svn checkouts).

The Cassandra wiki explains how to tweak Cassandra settings by editing cassandra.in.sh, but doesn't explain what needs to be done to run concurrent instances.

It turned out not to be too difficult.  Still, I figured it might be daunting enough to Cassandra noobs (of whom we're seeing more lately, thanks to some great exposure) that a blog post might be helpful.

This tutorial assumes that you'll want to run multiple instances of Cassandra on code built by ant and not a standalone jar.  I am also assuming that you are a) just playing around, or b) intend to do some development.  This is not a tutorial explaining how Cassandra should be run in production.

Note: I apologize for the way this looks.  Blogger is not a friend of ordered lists.

  1. Make sure you've got aliases to localhost (e.g.: 127.0.0.2, 127.0.0.3, etc.).  Mac OS X doesn't have this enabled by default, so you'll have to manually create aliases:

    sudo ifconfig lo0 alias 127.0.0.2 up
    sudo ifconfig lo0 alias 127.0.0.3 up
  2. Decide where you're going to keep things.  You can keep them with your code, but that just isn't neat.  Pick a directory somewhere, call it $cass_stuff.
  3. Then, for each node in your little cluster, do this:

    1. From your svn checkout, copy the conf directory into $cass_stuff.  You can rename it to something like conf0 (or conf1, etc.).  I'll assume $conf from here on out.
    2. Copy bin/cassandra.in.sh to $cass_stuff.  Give it a name that helps you associate it with the conf directory you just created (node0.in.sh or whatever).
    3. Open node0.in.sh in an editor and make the following changes:

      1. Hardcode cassandra_home to the location of your trunk.  This will give you the flexibility to run Cassandra from anywhere.
      2. Set CASSANDRA_CONF to the conf directory you just created.
      3. In JVM_OPTS, change the jdwp address= setting.  The default is 8888, but you should include the unique IP you chose for this node along with the port, e.g.: 127.0.0.2:8888.  Not specifying a host causes the debugger to bind to 0.0.0.0:8888, and you'll have port binding problems when you bring up more than one node.
      4. Pick a unique port for com.sun.management.jmxremote.port, but make sure you have at least one node listening on 8080, since all the Cassandra tools assume JMX is listening there.  Unfortunately, you can't pick the JMX host; 0.0.0.0 is assumed.  I was under the impression this could be changed by specifying java.rmi.server.hostname, but had no luck going down that road.  (Please leave a comment if you figure out a way for this to work, but I think it might be hopeless.)
    4. Open $cass_stuff/$conf/storage-conf.xml in an editor and make the following changes:

      1. Specify unique locations for CommitLogDirectory and DataFileDirectory.  Don't bother with CalloutLocation or StagingFileDirectory.
      2. Replace ListenAddress with the unique IP you chose for this node (e.g.: 127.0.0.2).
      3. Replace RPCAddress with the same IP.
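
Steps 3.1 and 3.2 are the same for every node, so they can be scripted.  What follows is only a sketch: make_node and its three arguments (the path to your checkout, the $cass_stuff directory, and a node number) are made-up names, not anything that ships with Cassandra.  The edits in steps 3.3 and 3.4 still have to be made by hand.

```shell
#!/bin/sh
# Sketch only: make_node is a hypothetical helper, not part of Cassandra.
# usage: make_node <checkout> <cass_stuff> <node-number>
make_node() {
    checkout=$1
    cass_stuff=$2
    n=$3

    # Refuse to clobber a node that was already set up.
    if [ -e "$cass_stuff/conf$n" ] || [ -e "$cass_stuff/node$n.in.sh" ]; then
        echo "node $n already exists in $cass_stuff" >&2
        return 1
    fi

    mkdir -p "$cass_stuff" || return 1
    # Step 3.1: private copy of the conf directory (conf0, conf1, ...).
    cp -R "$checkout/conf" "$cass_stuff/conf$n" || return 1
    # Step 3.2: private copy of the include script (node0.in.sh, ...).
    cp "$checkout/bin/cassandra.in.sh" "$cass_stuff/node$n.in.sh" || return 1
}
```

Calling make_node /path/to/trunk "$cass_stuff" 0 (both paths are placeholders) would leave you with $cass_stuff/conf0 and $cass_stuff/node0.in.sh, ready for editing.
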
To run, you may wish to use a separate script for each node:

#!/bin/sh
CASSANDRA_INCLUDE=$cass_stuff/
export CASSANDRA_INCLUDE
cd
bin/cassandra -f

One downside to this approach is that if you're tracking trunk, it is your responsibility to make sure you notice changes to the default storage-conf.xml and cassandra.in.sh and apply them to your environments.


Cassandra is supported by an active and welcoming community.  If you'd like to learn more about the project, check out our wiki or mailing list, or hop on #cassandra on freenode.

15 December 2009

Dear Entrepreneurs, this is something I would pay for...

Dear Entrepreneurs,

This is something I would gladly pay $20 a month for...

A device that, according to my tastes, downloads new music from the Internet whenever it connects.  I would be able to listen to music without restriction while I am disconnected from the network.  I wouldn't own the music, except for roughly 20 tracks a month that I select, which would then become mine as MP3s (or FLAC, or whatever DRM-less technology makes sense).  I could then load them into iTunes, give them to my brother, or (if I'm feeling sinister) make them available on a P2P network.

The music could come from anywhere: iTunes, Amazon, The Labels, or artists themselves.

The content sources exist.  The recommendation engines exist.  Devices exist. 

I suspect the audience/market exists.  (At least, I hope so.  If not, and nobody is willing to pay for music, we're going to need to find another model.  And it will still necessarily involve a money exchange between producers and consumers and/or advertisers.)

Is there such a system already?

13 December 2009

Christmas Mix 2009

I've been making Christmas mixes for my family the last couple of years.  It's not your typical Bing Crosby stuff, and requires some digging on my part.  I finally started blogging about it last year and think I'm going to make it a tradition.  So here goes... Christmas with an indie slant.  And I did a better job checking on the lyrics this year for family appropriateness.

The links this year are coming at you from Lala by way of Google.  Message me if things stop working.  (This blog post has turned out much like my Christmas shopping: it gets sloppy towards the end.)

1.  "Holiday Road" by Matt Pond PA.  This is the only repeat from last year's list.  I love this song because the Vacation movies still connect with me at a level I am entirely uncomfortable with.

2.  "Blue Christmas" by Dread Zeppelin.  Believe it or not, there is a nice smattering of Christmas to choose from with these guys. Where else can you get Elvis, Led Zeppelin, Reggae and Christmas in one track?

3.  "Christmas is Going to the Dogs" by Eels.  Hard choice between this and "Everything's Gonna Be Cool This Christmas".

4.  "I Wish It Was Christmas Today"  by Julian Casablancas.  This one is for the kids.  I wish that I could still feel the way I did when I was a young boy after Thanksgiving.  Christmas, although only four weeks away, seemed like it sat on the other side of eternity.  As an adult, it comes and goes so fast I barely have time to enjoy it.  Message to kids: enjoy it while you can.  Responsibility steals the fun from Christmas!

5.  "Christmas Time is Here Again (Bring Out the Joy!)" by My Morning Jacket.  Peaceful.  I'll let you google for this one.  It's a live take from a radio broadcast.

6.  "Listening to Otis Redding at Home During Christmas" by Okkervil River.  Not a traditional Christmas tune, but a good one to follow MMJ, if only for the indie vibe.  This song reminds me of "New Slang" by The Shins, but with less jade and desperation.  Slightly more hopeful. :)

7.  "X-Mas Card" by MU330.  Not my normal thing, but the instrumental intro with the horns is fun.

8.  "Yule Shoot Your Eye Out" by Fall Out Boy.  If you haven't checked out "Can You See Santa From the Southside," now is the time to skedaddle over to Amazon and do so.

9.  "Baby, It's Cold Outside (Mulato Beat Remix)" by Louis Armstrong and Velma Middleton.  Shopko gave away a Christmas sampler in 2004 and this was on it.  This is, by far, the Christmas album that gets the most play in our house (not the one this song links to).  It comes on while we're preparing meals and we find ourselves breaking frequently to get our grooves on.  No kidding.  Six people from 2 to 35 shaking a leg in the kitchen.

10. "O Come All Ye Faithful" by Weezer.  Traditional Christmas tune done right by a modern band.

And some bonus songs from last year's mix:

Bonus 1:  "Fairytale of New York" by the Pogues.  This one is definitely not for the kids and is a guilty pleasure of mine.  Who can resist: "You're a bum, you're a punk / You're an old slut on junk."  Ahh, the holidays.

Bonus 2:  "Frosty the Snowman" by the Cocteau Twins.  Year after year, my favorite Frosty rendition.

02 November 2009

Building Cassandra Thrift Bindings on OS X

A few weeks ago, I came to Rackspace to work full-time on Cassandra in their cloud division.  So far, I'm having fun and learning new things.  Apart from Cassandra, one of the projects I get to figure out is Thrift.  Thrift is a tool that allows you to define a service interface and then generate stubbed service bindings in different programming languages.  A programmer then takes the generated code and makes it do the things it is supposed to.  In an ideal world, this simplifies the process of, say, stubbing in a PHP client that can speak to a server stubbed in Java.

Right now, I'm the lone wolf in the office doing development on OS X.  The glitches so far have been minor, but I was forewarned that I might want to reconsider [using linux] when it came time to work with Thrift.  Well, that time started today.  I've been a faithful linux user for about 10 years, but I've been a faithful Mac user even longer.  I'm not ready to make the switch to full-time linux yet; I like my Mac.

Fortunately, Google was my friend when it came to figuring out the secrets of building Thrift on OS X.  Credit goes to Nathan Ostgard and his blog post for getting me going in the right direction.

1.  You definitely want to install MacPorts.
2.  Install boost and log4j:
sudo port install boost
sudo port install jakarta-log4j

3.  Download and install thrift
curl -o thrift.tgz "http://gitweb.thrift-rpc.org/?p=thrift.git;a=snapshot;h=HEAD;sf=tgz"
tar -xvf thrift.tgz
cd thrift
echo "thrift.extra.cpath = /opt/local/share/java/jakarta-log4j.jar" > ~/.thrift-build.properties
./bootstrap.sh
./configure --prefix=/opt/local

4.  You're going to get an error during configure:
./configure: line 16440: syntax error near unexpected token `MONO,'
./configure: line 16440: `  PKG_CHECK_MODULES(MONO, mono >= 2.0.0, net_3_5=yes, net_3_5=no)'

I couldn't figure out how to tell configure "no csharp, please" through the command line, so I just commented out lines 16439-16442 and ran configure again:

./configure --prefix=/opt/local

5.  You know the drill:
make
sudo make install
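
The hand edit in step 4 can also be done non-interactively with sed.  A minimal sketch, assuming the line numbers from the error message still match your copy of configure (they will differ from one Thrift snapshot to the next); comment_out_lines is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: comment_out_lines is a hypothetical helper.
# usage: comment_out_lines <file> <first-line> <last-line>
comment_out_lines() {
    # Prefix every line in the range with '#'.
    # The untouched original is kept next to it as <file>.bak.
    sed -i.bak "${2},${3}s/^/#/" "$1"
}
```

For the configure script above, that would be comment_out_lines configure 16439 16442, run just before the second ./configure --prefix=/opt/local.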

That's it for Thrift.  The next step is to generate the Cassandra client.  The Cassandra wiki has steps to generate a python client.  This works fine except that the thrift python module was installed to a place where the OS X python can't see it. You'll get the following error if you try to run Cassandra-remote:

Traceback (most recent call last):
  File "./Cassandra-remote", line 11, in <module>
    from thrift.transport import TTransport
ImportError: No module named thrift.transport

There is probably a right way to fix this problem, a way that is right for OS X, but I had no patience.  I added the following line to my ~/.profile:

export PYTHONPATH=/usr/lib/python2.6/site-packages

Restart terminal, navigate back to the directory where the python client was generated and try again.  Cassandra-remote should spew out a verbose usage directive.

That's it; you're done.

If you found this useful, or have feedback, please let me know.  I use gmail (gdusbabek).  I also emit the occasional tweet; just follow gdusbabek. 

If you're interested in learning more about Cassandra, there is an active and helpful IRC channel (#cassandra) on freenode and mailing lists as well.  The wiki also contains useful information for beginners.

14 September 2009

id3 for Python

I've been meaning to package up some python code I wrote earlier this year and release it for free as open source software.  Several things held me back from doing this.  The biggest reason, by far, is that I'm still not proficient at python, and feel like I'm exposing myself by putting this code out for the world to see.

But then I remembered that my blog doesn't have a lot of readers anyway.  So no worries there.  And besides, maybe I can garner some constructive criticism to make my python code better.  :)


http://www.dusbabek.org/~garyd/id3_python/

This library is capable of reading almost any correct ID3v2.3 or 2.4 tag, some incorrect ones, and then fails gracefully when things get hopeless.  (I run smoke tests on my mp3 collection, which has a lot of nasty debris from the Napster years.)

It supports unicode, and does a good job handling PIC/APIC tags.  I should also mention it is in production at Tagfriendly.

For those that do come across this, I'm still trying to figure something out.  All the documentation I've come across says that I should structure my directories as such:

MyModule/
    id3/
        __init__.py
        stuff.py
    setup.py
    tests.py

I include id3.py inside the id3/ directory because that's where I think it should go.  But then when I build and test the module, the only way I can access the code is if I import id3.id3, but I wish I only had to import id3.  Clearly, I'm not doing something right.

My solution, and I know this isn't right, is to do away with the id3/ directory altogether and just have id3.py rubbing elbows with setup.py and tests.py.  Anyone know what gives?

Well, enough of all that... id3 for Python is available at http://www.dusbabek.org/~garyd/id3_python/.

P.S.  Thanks to my friends on IRC for reminding me about this.

31 August 2009

My New Workout

I have been attending the gym regularly, religiously even, for the last 10 years for early morning workouts.  What started out as short 20-minute workouts for an out-of-shape 25-year-old has turned into 45-minute cardio sessions where I routinely burn 900 calories.  It's been a very good thing for me; I wouldn't trade it for the world.

When I started, I would bring headphones and watch the morning news.  Then began a succession of portable music players that started with a Diamond Rio 500, which I cherished, and culminated with the two iPods I currently use.  Over the last several years, gradually, the monotony of a daily workout began combining with my music against me, hampering my motivation.  Maybe I can chalk it up to age, or plateauing, but regardless of the cause: I needed to find something else.

I thought that an iPod video (30 GB) would help, but it turns out the screen is much too small to watch while the rest of my body is moving on a treadmill or elliptical machine.  (It is still great on an airplane though.)  I've been using an iPod shuffle for the last two years, alternating between general conference sessions and music.  I tried podcasts for a while, but the iPod has such a crappy podcast interface--you can't queue them up in a playlist, and I don't want to fiddle with buttons in the middle of my workout.

Total frustration.

Then something happened.  My gym began installing televisions on all the cardio equipment.  I thought my problem was solved--I'd be able to go back to just bringing a pair of headphones again.  That joy was short-lived, as I realized that most of what's on television in the early morning is generally crap (infomercials) and sometimes downright crude (infomercials for porn--no kidding!).  And the news is, well, frankly: not worth watching anymore.  Back to square one...

I don't know why it took so long for the idea to sink in, but I realized that I could connect my iPod video directly to the televisions on the cardio machines using a $5 composite cable I already owned (thank you, Sony), and watch my iPod videos on the TVs.  I finally gave it a try this morning.  I watched two TED talks, and an APM Marketplace podcast about high-frequency trading.  The best part is that when it was all over, my 45 minute workout felt like only 15 minutes had gone by! 

Undeniable WIN!

My next task is to find more podcasts that don't suck.  I've already found a decent language training podcast, but what would be really nice is to get hold of some of the Google tech talks, as they are usually excellent and I don't mind watching them over (indeed, some of them ought to be watched multiple times to absorb the information).  I couldn't find them through the iTunes store, but maybe some kind soul has made them available on the outside.  Google, ironically, has not.  In fact, the Google tech talks are strewn across video.google.com and youtube.com now that YouTube is part of Google.  The older video.google videos are easily downloadable, but the YouTube ones are not.  At least, not without some real work.

Then again, there are enough [free] things at iTunes U to keep my mornings occupied for a long time.  They have a special section just for computer science educational content.  Double plus woot!

Anyway, all this is a long way of saying something short: iPod + cardio TV == excellent workout.

22 July 2009

Adventures in Javascript

Note: Blogger doesn't give me a good way to preview the image before it goes live.  If it is too small, my apologies.  I'll try to get that fixed before your news reader pulls the feed.

The Question
Sometimes I think programmers tend to accept things the way they are without really questioning them, especially when it comes to language quirks.  This is something I remember coming across when I was first learning Javascript.  I was reminded of it recently while reading a Javascript book.

Javascript pros will probably quickly recognize what's going on here.



At first I thought the third definition of MyFunc would wipe out the others.  And just to verify that declaring a function in that manner normally makes it into the assumed contexts, I did it with YourFunc.

What gives?

Go ahead, take a guess... 

The Answer
The interpreter processes standalone function declarations before it evaluates any other statement in the script (this is usually called hoisting).  So it is as if 'function MyFunc...' were written before everything else.  This can be verified by calling MyFunc() in the first line of the script.  The expressions (which include assignments) are then evaluated in order.  So what appears to be the second assignment of MyFunc is really the third, and the third is really the first.