More permanent stuff at http://www.dusbabek.org/~garyd

11 March 2010

Running Multiple Cassandra Nodes on a Single Host

One of the first Cassandra tickets I worked on had me reviewing some code that visualized the node ring.  Properly testing the code required that I run a cluster. 

But I didn't have access to a cluster. Neither did I feel like creating a virtual cluster by building a VM and cloning it several times.  What I wanted was to run several instances of Cassandra on a single machine with multiple interfaces, all pointed at the same compiled code (without multiple svn checkouts).

The Cassandra wiki explains how to tweak Cassandra settings by editing cassandra.in.sh, but doesn't explain what needs to be done to run concurrent instances.

It turned out not to be too difficult.  Still, I figured it might be daunting enough to Cassandra noobs (of whom we're seeing more lately, thanks to some great exposure) that a blog post might be helpful.

This tutorial assumes that you'll want to run multiple instances of Cassandra on code built by ant and not a standalone jar.  I am also assuming that you are a) just playing around, or b) intend to do some development.  This is not a tutorial explaining how Cassandra should be run in production.

Note: I apologize for the way this looks.  Blogger is not a friend of ordered lists.

  1. Make sure you've got aliases to localhost (e.g.: 127.0.0.2, 127.0.0.3, etc.).  Mac OS X doesn't have this enabled by default, so you'll have to manually create aliases:

    sudo ifconfig lo0 alias 127.0.0.2 up
    sudo ifconfig lo0 alias 127.0.0.3 up

  2. Decide where you're going to keep things.  You can keep them with your code, but that just isn't neat.  Pick a directory somewhere, call it $cass_stuff.
  3. Then, for each node in your little cluster, do this:

    1. From your svn checkout, copy the conf directory into $cass_stuff.  You can rename it to something like conf0 (or conf1, etc.).  I'll refer to it as $conf from here on out.
    2. Copy bin/cassandra.in.sh to $cass_stuff.  Give it a name that helps you associate it with the conf directory you just created (node0.in.sh or whatever).
    3. Open node0.in.sh in an editor and make the following changes:

      1. Hardcode cassandra_home to the location of your trunk.  This will give you the flexibility to run Cassandra from anywhere.
      2. Set CASSANDRA_CONF to the conf directory you just created.
      3. In JVM_OPTS, change the jdwp address= setting.  The default is 8888, but you should include the unique IP you chose for this node along with the port, e.g.: 127.0.0.2:8888.  Not specifying a host causes the debugger to bind to 0.0.0.0:8888, and you'll have port binding problems when you bring up more than one node.
      4. Pick a unique port for com.sun.management.jmxremote.port, but make sure you have at least one node listening on 8080, since all the Cassandra tools assume JMX is listening there.  Unfortunately, you can't pick the JMX host; 0.0.0.0 is assumed.  I was under the impression this could be changed by specifying java.rmi.server.hostname, but had no luck going down that road.  (Please leave a comment if you figure out a way for this to work, but I think it might be hopeless.)
    4. Open $cass_stuff/$conf/storage-conf.xml in an editor and make the following changes:

      1. Specify unique locations for CommitLogDirectory and DataFileDirectory.  Don't bother with CalloutLocation or StagingFileDirectory.
      2. Set ListenAddress to the unique IP you chose for this node (e.g.: 127.0.0.2).
      3. Set RPCAddress to that same IP.
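
Step 3 lends itself to scripting. What follows is a minimal sketch, not something that ships with Cassandra. The function name make_node, the per-node names (conf$n, node$n.in.sh, data$n) and the JMX numbering (8080 plus the node index) are assumptions made for illustration. The sed patterns also assume the default files contain address=8888, jmxremote.port=8080 and single-line XML elements, so check the generated files by hand before trusting them.

```shell
#!/bin/sh
# Sketch only: builds the per-node files described in step 3.
# Usage: make_node TRUNK CASS_STUFF INDEX IP   (use absolute paths)
# Hypothetical names: make_node, conf$n, node$n.in.sh, data$n.
make_node() {
    trunk=$1; cass_stuff=$2; n=$3; ip=$4
    conf="$cass_stuff/conf$n"
    include="$cass_stuff/node$n.in.sh"
    data="$cass_stuff/data$n"

    # Refuse to clobber a node that was already set up.
    if [ -e "$conf" ]; then
        echo "$conf already exists" >&2
        return 1
    fi
    mkdir -p "$cass_stuff" || return 1

    # 3.1: copy the conf directory.
    cp -R "$trunk/conf" "$conf" || return 1

    # 3.2 and 3.3: copy cassandra.in.sh while rewriting the four settings.
    # Assumption: node 0 keeps JMX on 8080, node 1 gets 8081, and so on.
    jmx_port=$((8080 + $n))
    sed -e "s|^cassandra_home=.*|cassandra_home=$trunk|" \
        -e "s|^CASSANDRA_CONF=.*|CASSANDRA_CONF=$conf|" \
        -e "s|address=8888|address=$ip:8888|" \
        -e "s|jmxremote\.port=[0-9]*|jmxremote.port=$jmx_port|" \
        "$trunk/bin/cassandra.in.sh" > "$include" || return 1

    # 3.4: unique data locations and this node's IP in storage-conf.xml.
    sed -e "s|<CommitLogDirectory>[^<]*<|<CommitLogDirectory>$data/commitlog<|" \
        -e "s|<DataFileDirectory>[^<]*<|<DataFileDirectory>$data/data<|" \
        -e "s|<ListenAddress>[^<]*<|<ListenAddress>$ip<|" \
        -e "s|<RPCAddress>[^<]*<|<RPCAddress>$ip<|" \
        "$trunk/conf/storage-conf.xml" > "$conf/storage-conf.xml" || return 1

    # Each node also needs its own log file (log4j.appender.R.File).
    if [ -f "$trunk/conf/log4j.properties" ]; then
        sed -e "s|^log4j\.appender\.R\.File=.*|log4j.appender.R.File=$data/system.log|" \
            "$trunk/conf/log4j.properties" > "$conf/log4j.properties" || return 1
    fi
}
```

Calling it as make_node /path/to/trunk /path/to/cass_stuff 0 127.0.0.2 and then again with 1 and 127.0.0.3 would give two nodes, with node 0 keeping JMX on 8080 for the tools.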

To run each node, you may wish to use a separate launcher script:

#!/bin/sh
CASSANDRA_INCLUDE=$cass_stuff/
export CASSANDRA_INCLUDE
cd
bin/cassandra -f
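
As a sketch of what a complete launcher might look like: CASSANDRA_INCLUDE has to name the include file itself rather than the directory that holds it, and the cd has to land in the checkout so that bin/cassandra resolves. The function name start_node and its two arguments are hypothetical, and both paths are assumed to be absolute.

```shell
#!/bin/sh
# Sketch only: start one node in the foreground.
# Usage: start_node TRUNK INCLUDE_FILE   (absolute paths; names are hypothetical)
start_node() {
    trunk=$1; include=$2

    # CASSANDRA_INCLUDE must point at a file such as node0.in.sh,
    # not at the directory that contains it.
    if [ ! -f "$include" ]; then
        echo "include file not found: $include" >&2
        return 1
    fi

    # Run in a subshell so the cd and the export do not leak
    # into the calling shell.
    (
        CASSANDRA_INCLUDE=$include
        export CASSANDRA_INCLUDE
        cd "$trunk" || exit 1
        exec bin/cassandra -f
    )
}
```

With the layout suggested earlier, that would be invoked as start_node /path/to/trunk "$cass_stuff/node0.in.sh", once per node and each in its own terminal since -f keeps Cassandra in the foreground.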

One downside to this approach is that if you're tracking trunk, it's up to you to notice changes to the default storage-conf.xml and cassandra.in.sh and apply them to each node's copies.


Cassandra is supported by an active and welcoming community.  If you'd like to learn more about the project, check out our wiki, mailing list or hop on #cassandra on freenode.

8 comments:

Jin-Su said...

Thank you, Gary. I don't need to mess around with VMs whew...

jeremy said...

One small thing I'm doing differently is hardcoding my cassandra_home in my node(x).in.sh files to a symbolically linked directory. That way I can update that symlink to point to trunk or a branch or whatever.

jeremy said...

Also, the arg in step 3-c-iii was taken out of the default cassandra.in.sh JVM_OPTS - it was "-Xrunjdwp:transport=dt_socket,server=y,address=8888,suspend=n \" in case people are wondering.

jeremy said...

one more small nit - in the script near the end, the directory didn't show up for some reason... Below is what I ended up with:

#!/bin/sh
CASSANDRA_INCLUDE=$cass_stuff/
export CASSANDRA_INCLUDE
cd $CASSANDRA_HOME
bin/cassandra -f

Masood Mortazavi said...

On the current trunk, I tried these procedures, with some variations, to make two servers run on the same machine. Both daemons live and assume a token on the ring. However, neither sees the other, so they do not divide the ring.

Pavlo Baron said...

you might need to provide the include file name

CASSANDRA_INCLUDE=$cass_stuff/

otherwise it tells you it's a directory

keith said...

One simple addition, you need to also change the log4j.appender.R.File value inside log4j.properties for each node

Tyler Hobbs said...

Masood,

I think you need to set the Seeds in storage-conf.xml for this to not happen.