Posted by Ceri Davies
Sat, 03 Mar 2007 15:52:00 GMT
Where I work, I look after three highly available clusters running Veritas Cluster Server on Solaris. The hardware is old enough that maintenance is becoming prohibitively expensive, so we’re planning to buy new hardware over the next six months or so. Veritas tried to hold us over a barrel on support costs not so long ago, and so this seemed to be a good time to investigate moving to a different HA system.
The obvious choice for me was Solaris Cluster 3.2 (or Sun Cluster 3.2 as it was called at release). It had originally seemed that the setup I wanted to implement would be suboptimal, but the release of 3.2 fixed all of the issues that had stood in its way.
Mmm, free
Also good is that Solaris Cluster is free to run, even in production, although it must be relicensed (at a reasonable rate) if you wish to buy support. It also supports a large variety of server and storage hardware, so it was no hassle to just download the software and crack on with testing; one barrier to adoption nipped in the bud.
What, no host-based packet filter?
After testing out the design that I had envisioned, it seemed that everything that I had wanted the software to do was in there; the only fly in the ointment was that the IP Filter packet filter was not supported. It worked for some scenarios, but the lack of official support would have been a problem for us.
At around this time, the QA manager for Solaris Cluster, John Blair, happened to post a blog entry introducing himself on the Sun Cluster Oasis[1]. So I asked him about the IP Filter situation.
Ask, and ye shall receive
Less than six weeks later, IP Filter is officially supported for failover services. That’s an amazing response time. I’m not even a paying customer.
Professional services
Even before discovering this little nugget, I proceeded to obtain quotes for the licensing and support costs for Solaris Cluster 3.2. As I mentioned above, they’re quite reasonable.
However, I was told at this point that Sun Professional Services would be required to come in and perform the installation and configuration of the cluster before support could be obtained, and this was far from reasonably priced. At this point I was pretty angry and a little disappointed; I’m a big fan of Sun and couldn’t see why they would throw away customers like this.
I went so far as to complain about it publicly. It was later pointed out that there is an option for a simple installation validation, which is much more reasonable, and by pointing this out I hereby absolve myself from the FUD-spreading.
You coming or what?
At this point it’s still not clear that we’ll end up running Solaris Cluster on these platforms, but I’m hopeful that we will. The design that I want to implement slots right into the Solaris Cluster design, and the implementation is therefore very simple and easy to understand (and, by extension, easier to document, which is more the point for me!).
The title of this post is a small admission that I may be starting to sound like this around the office, sorry guys :)
[1] Note to Sun Marketing; there’s some rebranding to be done here :)
Posted in Software, Solaris, Sun, Clustering, Veritas | Tags cluster, ha, solaris, sun | 2 comments
Posted by Ceri Davies
Sat, 24 Feb 2007 13:27:00 GMT
I wrote an article for Sun’s BigAdmin a few months ago.
Due to a backlog of articles, it was only published a couple of weeks ago, but Sun gave me double the usual number of free stuff points in compensation for the delay.
What’s extra nice is that my article has been up on the front page of Sun Developer Network for a couple of days now. Not because it’s particularly brilliant; I think it’s just “put a BigAdmin article on SDN” week.
Still cool though :)
Posted in Software, Solaris, Sun | Tags sun | no comments
Posted by Ceri Davies
Fri, 15 Dec 2006 19:46:00 GMT
Patching Solaris is difficult
Patching Solaris has historically been hard work, involving cross-referencing the installed patches (showrev -p) with the installed release (cat /etc/release) and the latest Recommended patch cluster and patch report at SunSolve.
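To give a feel for the cross-referencing involved, here is a rough Python sketch of the simplest part of it: comparing installed patch revisions against a list of latest revisions. It assumes that each line of showrev -p output begins with "Patch: NNNNNN-RR"; the patch IDs, the sample data and the function names are all made up for illustration, and the sketch ignores the Obsoletes and Requires fields, which any real tool would have to honour.

```python
import re

# Matches the leading "Patch: NNNNNN-RR" field of a `showrev -p` line
# (assumed format: six-digit patch ID, two-digit revision).
PATCH_RE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b")


def installed_patches(showrev_output):
    """Return {patch_id: highest installed revision} from `showrev -p` text."""
    installed = {}
    for line in showrev_output.splitlines():
        match = PATCH_RE.match(line.strip())
        if match:
            patch_id, rev = match.group(1), int(match.group(2))
            # Several revisions of one patch may be listed; keep the newest.
            installed[patch_id] = max(rev, installed.get(patch_id, 0))
    return installed


def outdated(installed, latest):
    """List (patch_id, installed_rev, latest_rev) where a newer revision exists.

    installed_rev is None when the patch is not installed at all.
    """
    report = []
    for patch_id, latest_rev in sorted(latest.items()):
        current = installed.get(patch_id)
        if current is None or current < latest_rev:
            report.append((patch_id, current, latest_rev))
    return report


# Hypothetical sample data: these patch IDs and package names are invented.
SAMPLE_SHOWREV = """\
Patch: 900001-17 Obsoletes: 900000-30 Requires: Incompatibles: Packages: SUNWexa
Patch: 900001-24 Obsoletes: Requires: Incompatibles: Packages: SUNWexb
Patch: 900002-14 Obsoletes: Requires: Incompatibles: Packages: SUNWexc
"""
# Hypothetical "latest revision" table, standing in for a patch report.
SAMPLE_LATEST = {"900001": 36, "900002": 14, "900003": 4}

if __name__ == "__main__":
    found = installed_patches(SAMPLE_SHOWREV)
    for patch_id, have, want in outdated(found, SAMPLE_LATEST):
        print(patch_id, have, want)
```

Even this toy version has to cope with multiple revisions of the same patch being listed, and it still says nothing about dependencies or about which patches apply to the installed release, which is where the real work lies.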
Sun tools actually make it harder
Since this is such a nightmare, Sun have offered a huge number of methods for patching Solaris systems. Some are no longer properly maintained or don’t support recent releases, some are heavy X-based monsters, and some have a huge dependency list (33 packages for smpatch in Solaris 10 6/06, and that’s not only the ‘light’ version, but is also incomplete). The one thing they have in common is that they suck.
smpatch is the worst of the lot
After a recent experience where smpatch not only rendered a production machine unbootable, but required three reboots to do so and then failed to even offer all of the available patches, I’ve had enough.
From now on we’ll be using Patch Check Advanced (PCA) from http://www.par.univie.ac.at/solaris/pca/.
Posted in Software, Solaris, Sun
Posted by Ceri Davies
Mon, 31 Jul 2006 19:29:00 GMT
Some guy asked on comp.unix.solaris for a tool for Solaris that could drop a TCP connection without killing the associated process.
I pointed out that we had tcpdrop(8) from OpenBSD which did this, whereupon Casper Dik informed me of the existence of the TCP_IOC_ABORT_CONN ioctl which does the same job on Solaris.
So I did a dirty port of tcpdrop(8) which you can download if you want.
It’s been tested on Solaris 9, but might not even compile on other versions. You get to keep both pieces if it breaks.
Posted in FreeBSD, Software, Solaris | no comments
Posted by Ceri Davies
Wed, 14 Jun 2006 19:52:00 GMT
I recently had cause to arrange for Oracle’s RDBMS 10g to be started at boot time. This led me to despair somewhat at the state of shell scripting in general, and I will rant a little on that subject. This isn’t really intended as an attack on this particular script, but these are the issues that arose from it.
The Oracle installation provides a couple of scripts named dbshut and dbstart that look like they’ll do that job. Indeed, the top of dbstart states:
# This script is used to start ORACLE from /etc/rc(.local).
The corresponding Subversion log from the import tells a different tale:
$ svn log -r99 dbstart
Password for 'ceri':
-------------------------------------------------------------------
r99 | ceri | 2006-06-09 15:27:33 +0100 (Fri, 09 Jun 2006) | 5 lines
Add a bunch of scripts used for looking after our databases.
Mainly culled from our live systems, with the notable exception
of the Oracle provided utilities dbshut and dbstart, which are
as out of the box here (and therefore do not work).
-------------------------------------------------------------------
Posted in Oracle, Software, Solaris | 4 comments
Posted by Ceri Davies
Fri, 28 Apr 2006 19:57:00 GMT
Our T2000 arrived. We went for the 8-core 1.0GHz, 16GB version to allow us to use the Solaris resource management tools to carve it up in various configurations.
So far I’ve used it to great effect to test out an Oracle Data Guard configuration that I’m building, and particularly to understand how you are supposed to set up your clients in order to benefit from the redundancy goodness — some notes on that will be forthcoming real soon as I couldn’t find a single useful document on that.
Michael Bushkov’s Summer of Code project, cached(8), which adds caching for nsswitch along with enabling nsswitch for the services, protocols and rpc databases, finally got committed today. This is really interesting work, similar to nscd(1M) on Solaris[1], but with each user having their own cache.
Also, I discovered that a post on BSDNews is a good way to saturate a crappy cable modem link. :-)
Oh yeah, and John Birrell is making superb progress on a DTrace port to FreeBSD.
[1] Yes, nscd(1M) has a bad reputation, but I strongly believe that’s because people don’t understand how to work it.
Posted in FreeBSD, Oracle, Solaris, Sun | no comments | no trackbacks
Posted by Ceri Davies
Fri, 24 Mar 2006 19:28:00 GMT
I added a new system board to one of our v880s this afternoon, after which picld(1M) spat out the following errors on boot, despite all diagnostics having passed:
Mar 24 16:48:21 vleappc picld[320]: ERROR running psvc_check_temperature_policy_0
on CPU5_DIE_TEMPERATURE_SENSOR (2757672)
Mar 24 16:48:21 vleappc picld[320]: ERROR running psvc_check_temperature_policy_0
on CPU5_DIE_TEMPERATURE_SENSOR (2757672)
Mar 24 16:48:21 vleappc picld[320]: No such file or directory
Mar 24 16:48:21 vleappc picld[320]: No such file or directory
Mar 24 16:48:21 vleappc picld[320]: ERROR running psvc_check_temperature_policy_0
on CPU7_DIE_TEMPERATURE_SENSOR (2757768)
Mar 24 16:48:21 vleappc picld[320]: No such file or directory
Mar 24 16:48:21 vleappc picld[320]: ERROR running psvc_check_temperature_policy_0
on CPU7_DIE_TEMPERATURE_SENSOR (2757768)
Mar 24 16:48:21 vleappc picld[320]: No such file or directory
The solution is as simple as a reconfigure boot (boot -r from the OpenBoot ok prompt, or touch /reconfigure before rebooting).
Update [07/04/2006]: Chris Mackowski mailed me to point out that devfsadm -C works just as nicely too, of course. Thanks Chris.
Posted in Solaris, Sun | no comments | no trackbacks
Posted by Ceri Davies
Sun, 04 Dec 2005 12:19:00 GMT
Adrian Steinmann and I knocked the crunchgen(1) patch into submission; it’s now available for testing. All pre-existing configuration files should produce the same code — only use of the new libs_so keyword should make a difference. We’re looking at a 6 week MFC period or so.
I can’t mail Marius Nünnerich for some reason, so if you know him, let him know that his patch doesn’t work on the FreeBSD cluster; use bytes doesn’t exist in Perl 5.00503. I suspect that we can just get away without it.
I finally managed to buildworld after about 6 weeks; took a make installincludes and a make install in lib/libmemstat for some reason; I must have screwed something up somewhere. I can start work on some of that kernel side stuff I mentioned, though I don’t really have a test machine :-/
On which note, I’ve decided on a laptop that I want but I don’t think that December is the best time of year to be buying. I’m expecting the usual raft of price drops in the new year, and if the rumour about Intel-based Apple laptops coming in January turns out to be true, I may be better off with one of them (assuming that it will run Windows too; that’s the real clincher here).
Sun released practically their entire Enterprise suite as free (as in beer, for now). This means that the Java Availability Suite which, despite the crappy name, is Sun Cluster and some other tools, is now free, and so is the Grid Engine. The Grid Engine supports a whole bunch of operating systems and is a much better candidate for a compile farm than I thought Condor was. I strongly suspect that the Linux bits will work fine on FreeBSD too. I’ll be firing up the old Netra for a fiddle when these downloads finish; Stef will be pleased, as I suppose it will save on heating.
Posted in FreeBSD, Apple, Solaris, Condor, Sun | no comments | no trackbacks
Posted by Ceri Davies
Thu, 17 Nov 2005 10:18:00 GMT
ZFS has finally been unleashed and it looks as good as we’ve been promised.
Check out the flash demo showing how to create pools, filesystems and quotas (not traditional quotas as you know them). Bryan Cantrill has collected a morass of links to further information which I will spend the rest of this week salivating over. Unfortunately I will need to discover bfu before I get to try it.
It’s good to have something interesting to learn again.
Posted in Solaris, Sun | no comments | no trackbacks
Posted by Ceri Davies
Thu, 27 Oct 2005 11:47:00 GMT
I installed Solaris 10 on an old x86 server in order to get some zones(5) and jumpstart development done (the jumpstart document currently references Solaris 11. Ooh.)
Since this was a really old server, it didn’t have quite enough memory to run the installation in graphical mode (which is good, because I don’t like it), so when the freshly installed Solaris booted into a blank screen I wasn’t too concerned - I simply needed to configure X.
Posted in Solaris, Sun | no comments | no trackbacks