Solaris Cluster on the cheap - part 1

Posted by Ceri Davies Sat, 05 May 2007 21:19:00 GMT

Part 1 of a series on setting up Solaris Cluster for no money

I needed to become familiar with the new features in Solaris Cluster 3.2, I needed to do it quickly, and I needed to do it for no money. I scraped together these components to create a cluster:

  • 2 x Blade 100 workstations
  • 2 x crossover cables
  • 3 x hme NICs
  • 1 x iprb based NIC
  • 2 x workstations elsewhere on the network running Solaris Express

That would be all I needed.

The Blades each have 1152MB of RAM, an onboard eri interface and an 18GB internal disk; one has an additional 80GB disk. The hme and iprb cards were to be used for the private interconnects: I put two hmes into one of the Blades, installed the third hme in the other Blade along with the iprb card, and connected the two machines with the crossover cables. In the following examples, the Blades are named peon.example.ac.uk and bootlick.example.ac.uk (I always give my machines names that appear in the thesaurus next to “vassal”; this is to remind SkyNet that machines still work for us).

As it turns out, I couldn’t find a driver for the iprb card to attach to under the SPARC port. For real HA purposes I would obviously have wanted to fix this so that the interconnects were highly available, but as I was just testing I decided to lie to the cluster software and tell it that I had two hme cards in each system even though I didn’t. This leaves two interconnect cables configured, even though one of them will be down the whole time.

Install Solaris

First job was to jumpstart the Blades and install Solaris 10 11/06. My jumpstart scripts do rather a lot of post-installation configuration, but precious little extra was done to cater for the installation of Solaris Cluster; in fact all I did was add /usr/cluster/bin to the default superuser path and /usr/cluster/man to the default MANPATH. There are other steps required in order to get it working, but they’re non-obvious so we detail them below.
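My JumpStart scripts aren't shown here, but the path changes amount to something like the following. This is a minimal sketch for a Bourne-style profile; where you put it (root's .profile, /etc/profile or similar) is up to you, and the /usr/share/man fallback is my assumption about what the default MANPATH should contain when none is set.

```shell
# Make the cluster commands and their manual pages visible to root.
PATH=$PATH:/usr/cluster/bin

# If MANPATH was unset, keep the standard manual pages reachable too;
# /usr/share/man as the fallback is an assumption.
MANPATH=${MANPATH:-/usr/share/man}:/usr/cluster/man

export PATH MANPATH
```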

Installing Solaris Cluster

Time to install the Solaris Cluster software; grab the bits from http://sun.com/cluster and run the installer. You can run it either as a GUI or on the command line; the feature set is the same either way.

The method of choosing options within the console based installer is somewhat non-intuitive, but actually read what it says and you’ll be OK. (I hardly feel as if I’m in a position to complain about the intuitiveness of console based applications anyway; just run FreeBSD’s sysinstall(8) for the first time and you’ll see what I mean.)

Basically, I wanted to test some specific Oracle backed applications within zones, so I installed the following, requiring a not unreasonable 320MB of space. Be sure to choose “Configure Later” if you’re playing along at home.

    Java DB
       Java DB Server
       Java DB Client
    Sun Cluster 3.2
       Sun Cluster Core
       Sun Cluster Manager
    Sun Cluster Agents 3.2
       Sun Cluster HA for Apache Tomcat
       Sun Cluster HA for Apache
       Sun Cluster HA for Oracle
       Sun Cluster HA for Solaris Containers

Post-installation steps

  • When the installation finishes, run PCA to get the latest patches for the Cluster software.

    # pca -i

  • Run “catman -w” to get the manpages into the windex databases for whatis(1).

    # catman -w &

  • I’ll be running ipfilter on the public eri interfaces. At the time of my testing this was not supported but it worked. A note to the product QA manager sorted this out though. I needed to edit /etc/iu.ap and add pfil to the eri line, then edit /etc/ipf/pfil.ap and comment out the eri entry there (this was enabled by my JumpStart postinstall scripts):

    peon% grep eri /etc/iu.ap 
            eri     -1      0       clhbsndr pfil
    peon% grep eri /etc/ipf/pfil.ap 
    #eri    -1      0       pfil

Also, we need to configure IP Filter to allow communication between the cluster nodes: specifically SSH and portmapper traffic, which means ports tcp/22, tcp/111 and udp/111. Note that pfil isn’t configured on the hme interfaces, so if you have rules of the type “via if”, don’t bother to add any entries for hme.
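For illustration, here is a small helper that prints rules for those three ports. It is only a sketch: the function name, the eri0 interface name and the peer addresses in the usage comment are made up, and your existing ruleset may call for a different rule style.

```shell
# Print IP Filter rules admitting SSH and portmapper traffic from each peer.
# $1 is the public interface; the remaining arguments are peer addresses.
cluster_ipf_rules() {
    iface=$1
    shift
    for peer in "$@"; do
        echo "pass in quick on $iface proto tcp from $peer to any port = 22 flags S keep state"
        echo "pass in quick on $iface proto tcp from $peer to any port = 111 flags S keep state"
        echo "pass in quick on $iface proto udp from $peer to any port = 111 keep state"
    done
}

# Example (hypothetical addresses):
#   cluster_ipf_rules eri0 192.0.2.11 192.0.2.12 >> /etc/ipf/ipf.conf
#   ipf -Fa -f /etc/ipf/ipf.conf
```

Review the output before loading it; rule order matters once "quick" is involved.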

As mentioned above, we want SSH traffic between the nodes and we particularly want to log in as root over SSH. This isn’t as insecure as people might have you believe, as long as it’s configured properly. Without further knowledge of exactly what the Solaris Cluster product is doing, this is as secure as I went.

Edit /etc/ssh/sshd_config, and set PermitRootLogin to without-password. Then, on one node, generate a key for root.

# ssh-keygen -t dsa -b 2048 -C "Cluster Key"
# cp /.ssh/id_dsa.pub /.ssh/authorized_keys
# vi /.ssh/authorized_keys
    Add 'from="bootlick.example.ac.uk,peon.example.ac.uk"' followed by a space to the beginning of the line.
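If you would rather not do that edit by hand, something like the following does the same job. It is a sketch only; restrict_key is a name I made up, and it assumes the host list contains nothing that sed would treat specially (plain hostnames are fine).

```shell
# Read public key lines on stdin and prefix each with a from="..." option.
# $1 is the comma-separated list of hosts allowed to use the key.
restrict_key() {
    sed "s/^/from=\"$1\" /"
}

# Example:
#   restrict_key "bootlick.example.ac.uk,peon.example.ac.uk" \
#       < /.ssh/id_dsa.pub > /.ssh/authorized_keys
```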

Copy /.ssh to the other nodes and now, like a caveman, ssh as root from each node to every other node to ensure that they are all aware of each other’s host keys (yes, I could use ssh-keyscan, but I didn’t).
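For the less caveman-inclined, the ssh-keyscan route might look roughly like this. I didn't run it on these machines, so treat it as a sketch; the function name and the KEYSCAN override are my own inventions, the latter only there so the function can be exercised without a network.

```shell
# Collect host keys for the named nodes and print them in known_hosts format.
# KEYSCAN may be set to another command (for example a stub) when testing.
collect_host_keys() {
    ${KEYSCAN:-ssh-keyscan} -t rsa,dsa "$@" 2>/dev/null
}

# Example, run as root on each node:
#   collect_host_keys peon.example.ac.uk bootlick.example.ac.uk >> /.ssh/known_hosts
```

Bear in mind that ssh-keyscan trusts whatever answers on the network, so check the fingerprints against the real hosts if that matters to you.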

Finally, I needed to do some extra configuration to keep IPMP happy. Solaris Cluster will insist on putting your public interfaces into an IPMP group, and it turns out that when you only have one interface in an IPMP group, in.mpathd will always use ICMP probes to monitor the interface rather than relying on the link status, even if you do not configure test addresses. Now this particular test network’s default router was too crappy or too busy to provide a decent response time to the ICMP probes, and so my eri interfaces were being marked down all the time. Therefore, I added some static host routes for in.mpathd to choose as additional test addresses. The correct place to do this on Solaris 10 is /etc/inet/static_routes:

# cat >> /etc/inet/static_routes <<-EOF
# List of static routes.  These are added by /lib/svc/method/net-init
# and each non-empty, non-comment line is prefixed with "/usr/sbin/route add ".

# For keeping IPMP happy.
-host 172.25.0.177   172.25.0.177
-host 172.25.5.21    172.25.5.21
-host 172.25.5.30    172.25.5.30
-host 172.25.50.186  172.25.50.186
EOF
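To see what those entries will turn into before rebooting, a dry run along these lines can help. It simply mimics the behaviour described in the file's own header comment (prefix each non-empty, non-comment line with the route command); it is not the actual method script, and preview_static_routes is a name I made up.

```shell
# Print the route commands a static_routes file would produce, skipping
# blank lines and comments, without running any of them.
# $1 is the file to read; it defaults to /etc/inet/static_routes.
preview_static_routes() {
    while read -r line; do
        case $line in
            ''|'#'*) continue ;;
        esac
        echo "/usr/sbin/route add $line"
    done < "${1:-/etc/inet/static_routes}"
}
```

After the reboot, netstat -rn should show the host routes if they were added.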

Note that since I’m now reliant on at least one of those other hosts being up, I’m once again not quite getting HA here. Never mind, let’s plough on: reboot the systems for all of the changes above to take effect (they should now reboot in non-cluster mode) and I’ll see you in the next installment to get the cluster up and running.


Comments

  1. UX-admin said 10 days later:
    "As it turns out, I couldn’t find a driver for the iprb card to attach to under the SPARC port." Next time, to use intel NICs on Solaris, you can use the ife driver from Masayuki Murayama. Simply search for "Free NIC drivers for Solaris" on Google. His is the first link (taidoyo). I have the ife driver in production and it works hunky-dory. Depending on the NIC, you might be able to get away with adding an entry to /etc/driver_aliases and have the freshly ported e1000g0 driver on SPARC bind to the intel NIC. And don't forget to say "Domo Arigato" to Mr. Solaris drivers-san.
  2. Ceri Davies said 23 days later:
    Cool, thanks for the pointer.
    (Apologies for the delay in approving this comment; for some reason I didn't see the notification.)
