Doubts Even Here: the point of disbelief

Posted by Ceri Davies Sat, 30 May 2009 21:29:00 GMT

It’s slightly late in the day, but as I’m currently picking apart which parts of VMware’s VI/vSphere stack are actually useful in *my* Real World, I’m going to respond to Chuck Hollis’ blog post “Why Oracle Doesn’t Like VMware”, even though it was posted nearly a month ago.

To state my position clearly, I agree with Chuck on 80% of his points, particularly with the general sentiment that it would be trivial for Oracle to support VMware as a platform but for the fact that they may lose business if they do. However, his point #4, “VMware Functionality Competes With Oracle DBMS Features”, is completely disingenuous.

There is *no* VMware functionality that competes with Oracle DBMS features, although VMware marketing would like you to believe that there is. Let’s break Chuck’s examples down.

Chuck says about RAC:

On one hand, we've got a multi-server configuration running
 Oracle's latest (and most expensive) RAC product.  It's doing load
 balancing, high availability, and making the hardware function as
 a giant pool.

Let’s think about this RAC setup. It will be serving the same database via multiple instances. The clients know about each instance and will choose another when the first fails.

And about VMware:

On the other hand, we've got the same multi-server configuration
 running the much cheaper Oracle SE on VMware. 

It too is load balancing, offers high availability, and makes the
 hardware function as a single giant pool.  Many of the management
 tasks are handled quite well outside of Oracle's domain.

Right now I’m thinking that Chuck doesn’t know, or more likely has conveniently forgotten, how RAC works, and that he has made similar mistakes regarding VMware’s features. Either that, or he completely believes VMware’s hype, much as they’d both like us all to do.

VMware is not like RAC

A RAC configuration consists of multiple instances serving the same database. This isn’t even conceptually similar to multiple VMs running multiple Oracle SE instances with, necessarily, multiple databases.

“load-balancing”

I’ve only been administering tens of Oracle DBMS databases for 4 years, but I have no idea how one would load balance read/write clients across multiple database+instances in anything approaching a productive way. I’m going to go as far as to say that, at least generally, you cannot.

“high availability”

I’ve already mentioned what I think about VMware HA. It doesn’t offer good protection against network failures and it doesn’t offer any protection against FC storage failures. In fact, you can’t even mirror FC storage with VMware unless you get your SAN to do it[1].

Additionally, even when a failure is detected, the only fix is to restart the VM and thereby the Oracle instance, implying the loss of in-flight transactions.

“hardware functioning as a single giant pool”

While both RAC and VMware can be argued to make the hardware they run on function as a single pool, these two pools have entirely different purposes. The “RAC pool” will take a database query and do the same thing regardless of which node handles it, while the “VMware pool” will not (unless, of course, you happen to be running RAC in it).
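To make the difference between the two pools concrete, here is a toy model in Python. It is only a sketch: the `Database` and `Instance` classes and the instance names are made up for illustration and have nothing to do with any Oracle or VMware API. The only point is that RAC instances share one database, while each VM in a VMware pool carries its own.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Stands in for one Oracle database (hypothetical toy type)."""
    rows: dict = field(default_factory=dict)


@dataclass
class Instance:
    """An instance answers queries against whichever database it has open."""
    name: str
    database: Database

    def query(self, key):
        return self.database.rows.get(key)


# "RAC pool": several instances open the SAME database.
shared = Database({"order-1": "shipped"})
rac_pool = [Instance("rac1", shared), Instance("rac2", shared)]

# "VMware pool": each VM runs its own instance with its OWN database.
vm_pool = [
    Instance("vm1", Database({"order-1": "shipped"})),
    Instance("vm2", Database()),
]

# Ask every node in each pool the same question and collect the answers.
rac_answers = {inst.query("order-1") for inst in rac_pool}
vm_answers = {inst.query("order-1") for inst in vm_pool}
```

Every node in the toy RAC pool gives the same answer, so a client can be sent to any of them. The nodes in the toy VMware pool disagree, which is why spreading read/write clients across them is not load balancing in any useful sense.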

There’s more: Fault Tolerance.

Chuck then goes on to say, regarding the VMware setup:

And VMware brings a few very cool features to the table
 that Oracle doesn't, like real fault tolerance[...]

That’s a little shocking.

While RAC can be used to provide high availability, there are RAC customers (probably a large proportion of them) who use RAC in order to scale past a single server. VMware Fault Tolerance doesn’t even allow you to scale past a single virtual CPU.

VMware Fault Tolerance has other issues, such as a limited list of supported CPUs, the requirement to reboot a VM on most of those CPUs in order to enable it (and since you have to turn FT off in order to patch the ESX cluster it’s running on, that’s a big deal), lack of support for thin-provisioned VMs, an inability to support physical RDM, the requirement for a dedicated gigabit NIC and some other more minor ones. However, I’m also worried that it might use the same algorithm to determine failure of the Primary VM as VMware HA does - the documentation certainly mentions heartbeats, file locking on shared storage to prevent mistaken failover and that “failover occurs if the host running the Primary VM fails” which is the same terminology as the VMware HA documentation uses. I wonder if the loss of FC connectivity or the VM network will cause a failover here?

Don’t believe the hype

So, much as I agree with Chuck’s main point, I’m annoyed at the over-hyping of VMware’s availability features because I don’t believe that they’re as good as Chuck would like me to think. At the end of the day, as an Oracle and VMware customer, I’d love to see Oracle’s database supported in VMware, but I’m well aware of the limitations of both and need this kind of misleading information being disseminated like a hole in the head.


[1] Note that this means that if you use Raw Device Mapping (RDM) to try to mirror in your OS instead, you’re just as at risk as if you hadn’t bothered, because the mapping is stored in the non-fault-tolerant .vmx file.


VMware HA?

Posted by Ceri Davies Sun, 24 May 2009 22:07:00 GMT

As a platforms engineer responsible for design, build and maintenance, I spend a lot of time thinking about failure. As good system administrators and system designers will know, this means evaluating the impact of the failure of every aspect of the system, as well as assessing whether this impact is actually important to the SLA, in order to design around it where necessary.

VMware HA

VMware HA, a feature of VMware’s Virtual Infrastructure and vSphere products (both of which I will refer to, incorrectly, as “ESX” below), believes that it has a space to fill here. VMware HA has two features:

  1. If an ESX host fails, VMware HA will take the Virtual Machines that it was running and power them on on another host in the cluster, thereby restoring those VMs to service;
  2. VMware HA can monitor if a guest operating system is still responding and perform the same action if it is not.

It does all this in a manner transparent to applications and guest operating systems and on paper it sounds really useful. It’s not a sub-second response to an outage, but for a large number of situations it would be good enough.

Problems

I’ve recently undertaken to design a VMware-based infrastructure for my employer and have been performing a number of experiments with VMware HA. The results of this testing are not promising, and I’m therefore blogging my findings with a number of possible outcomes in mind:

  1. Someone corrects me, points me to a reference and I fix my test setup;
  2. People stop telling me VMware HA is the answer to all ills.

As further background to the complaints below, a typical ESX installation uses at least three NICs; one (or more) for the ESX service console, one for VMotion and one (or more) for Virtual Machine traffic. VMware HA heartbeats are carried over the service console NICs.

VM Networks are not protected

My first finding was that disconnecting the NICs carrying Virtual Machine traffic (and thereby causing those VMs to become disconnected from the world outside the ESX host) does not cause VMware HA to initiate a failover, as the machine is still “in cluster”. Further to that, it doesn’t even cause a warning flag or even a log entry to be displayed in the high-level vCenter interface; if you specifically look at the Network Configuration page on that host, you see a small red “x” next to the NIC.

Storage networks are not protected

ESX can use SAS, iSCSI, NFS or Fibre Channel as a remote (i.e., not physically in the ESX host) storage backend. I’ve tested NFS and Fibre Channel, and losing all paths to the storage doesn’t cause a failover of the VMs. Again, no warning flag is displayed on the ESX host in vCenter. A warning is logged, but no error, and the warning is misleading - entries were logged regarding attempts to fail over to an alternative path, but no error was logged stating that these attempts had failed.

Heartbeat network failure does not trigger a failover (by default)

In the default configuration, even losing the heartbeat networks will not cause VMs to be failed over. This is because ESX uses file locks to indicate when a VM is powered on on another host in the cluster, and these locks remain in place even when a host becomes disconnected from the other hosts in the cluster, as the default setting is to take no action when this occurs. Changing this setting to “Power Off VM” or “Shut Down VM” does allow the loss of heartbeats to cause a failover. Even when an ESX host panics, in my experience, a failover may not occur, as the locks can still be held until the host is manually power-cycled.
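The findings above can be summarised as a small decision table. The Python below is just my own test results written down as code; it is not VMware’s algorithm and does not call any VMware API, and the names `Failure`, `IsolationResponse` and `failover_observed` are invented for this sketch.

```python
from enum import Enum


class Failure(Enum):
    """Failure scenarios from the tests described in this post."""
    HOST_POWER_OFF = "host suddenly powers off"
    VM_NETWORK_DOWN = "NICs carrying VM traffic disconnected"
    STORAGE_PATHS_DOWN = "all paths to remote storage lost"
    HEARTBEAT_NETWORK_DOWN = "service console (heartbeat) NICs disconnected"


class IsolationResponse(Enum):
    """What a host does with its VMs when it loses contact with the cluster."""
    NO_ACTION = "take no action (the default in my setup)"
    POWER_OFF_VM = "Power Off VM"
    SHUT_DOWN_VM = "Shut Down VM"


def failover_observed(failure, isolation_response=IsolationResponse.NO_ACTION):
    """Return True if the VMs were restarted on another host in my tests.

    Host panics are left out on purpose: there a failover "may not" occur,
    so the outcome is not a simple yes or no.
    """
    if failure is Failure.HOST_POWER_OFF:
        return True
    if failure is Failure.HEARTBEAT_NETWORK_DOWN:
        # The VM file locks are only released if the isolated host
        # stops its VMs, which requires a non-default setting.
        return isolation_response is not IsolationResponse.NO_ACTION
    # Loss of the VM network or of storage paths never triggered a failover.
    return False


# Outcome of each scenario with the default isolation response.
results = {f.name: failover_observed(f) for f in Failure}
```

With the defaults, the only entry in `results` that comes out true is `HOST_POWER_OFF`.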

HA?

The conclusion that I draw here is that VMware HA actually only protects service if an ESX host suddenly powers off. I really don’t feel that VMware HA deserves the “H” in its moniker and would prefer that it be renamed VMware SEPCA (Slightly Elevated in Particular Circumstances Availability) from this point onwards; there’s a good history of renaming products and product features so this shouldn’t be too much turmoil. This would help people to think more about what they’re actually getting rather than assuming that they’re covered by ESX.


Disabling mailto: in Firefox

Posted by Ceri Davies Fri, 09 Jan 2009 10:08:00 GMT

Not interesting, but I couldn’t find this reliably anywhere else, hence the blog post for Google.

For Firefox 2.x (and maybe 3.x, I don’t know yet) set the following in user.js or about:config to disable mailto: handling.

user_pref("network.protocol-handler.external.mailto", false);
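If you would rather script this than edit the file by hand, something like the following Python sketch would do. The pref line is the one above; the function name `disable_mailto` is made up, and the profile directory here is a throwaway temporary directory rather than a real Firefox profile, so point it at your own profile at your own risk.

```python
import tempfile
from pathlib import Path

# The preference line from this post, written exactly as it goes in user.js.
MAILTO_PREF = 'user_pref("network.protocol-handler.external.mailto", false);'


def disable_mailto(profile_dir):
    """Append the mailto pref to user.js in profile_dir unless already there.

    Returns True if the file was changed, False if the pref was present.
    """
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text() if user_js.exists() else ""
    if MAILTO_PREF in existing:
        return False
    if existing and not existing.endswith("\n"):
        existing += "\n"
    user_js.write_text(existing + MAILTO_PREF + "\n")
    return True


# Demonstration against a temporary directory standing in for a profile.
with tempfile.TemporaryDirectory() as profile:
    first = disable_mailto(profile)    # writes the pref
    second = disable_mailto(profile)   # pref already present, no change
    contents = (Path(profile) / "user.js").read_text()
```

Running it twice is harmless, since the second call notices the line is already there.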


OpenSolaris Test Farm

Posted by Ceri Davies Fri, 31 Oct 2008 20:10:00 GMT

I decided long ago that I didn’t want to run machines at home 24x7 and I didn’t want to spend money chasing performance. In fact, my main machine at home has an 800MHz VIA C3 CPU and just 512MB of RAM, and every other machine that I could be running OpenSolaris on is just as bad. I have access to much better hardware at work but it’s work hardware and not for that job.

Watching my attempts to build ONNV fail after 7 hours has always been slightly disheartening, especially when I’ve had to LiveUpgrade to a recent build first. Needless to say, the lack of good hardware with up-to-date tools has been somewhat of a barrier to my involvement. However, as of earlier this month, Sun has been providing the OpenSolaris project with a hosted farm of test machines that contributors can use to build and test software. There are two kinds of accounts, one for building software and the other for testing it.

The first, a Build Server account, gives you 15GB of disk space on a variety of machines of different processor types, which you can then use for compilation. These machines run builds of Nevada and compilers that are recent enough to build ONNV, and they do it quickly; building ONNV on a 16-core x4600 in the Test Farm takes under one hour[1], which is a massive boon to me if nobody else. Setting up an account is a simple matter of clicking “Add Account” on the test farm interface and waiting a few minutes.

The second kind of account reserves you an entire machine with console and SP access to splat your software over so that you can see if the machine still boots. There’s a bit more of a queue for one of these systems as obviously they can’t be used in parallel by more than one user but, again, all you need to do is click the relevant button in the interface and wait for a mail telling you that your server is ready.

If you have signed the Sun Contributor Agreement for OpenSolaris then you’re already set up on the Test Farm so go and grab an account now!


[1] And would probably take a lot less time if I could set DMAKE_MAX_JOBS higher than 4 :)


Converting SVR4 packages to datastream format and back

Posted by Ceri Davies Thu, 30 Oct 2008 20:54:00 GMT

OK, so SVR4 packages on Solaris are not the future, but for those of us with existing Solaris installations, we’ll be using pkgadd for some time.

While vendors seem to vary in their preference for datastream format (i.e. a single file) packages and what the Solaris documentation calls “file system format” packages (a directory), it’s pretty certain that a download of a package from the Internet will come in datastream format.

The datastream format is useful for distribution, but it’s slightly annoying for scripting and I personally hate not knowing what’s in a datastream package before I “run” it (preinstall scripts could do anything to your system, while the package itself could overwrite anything too). Therefore, I like to explode them to file system format and check them out first. This can be done easily with the pkgtrans tool.

To convert a datastream package to a file system package:

pkgtrans package-0.4.2.pkg /tmp all

The command above will explode all packages contained in the package-0.4.2.pkg file to directories in /tmp.

To convert a directory of file system format packages to datastream format, assemble them in the same directory, say /tmp, and run:

pkgtrans -s /tmp /tmp/foo.pkg all

This will create a single file, foo.pkg, containing all of the packages available in /tmp.

Invocations of the form pkgtrans -s Solaris_11/Product /tmp/nv97_sparc.pkg all are a slightly tidier way of keeping installation archives around.
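Since I said the datastream format is annoying for scripting, here is a rough Python sketch of how the two invocations above could be wrapped. It only uses the pkgtrans arguments shown in this post (the optional -s, source, destination and the package list); the helper names `pkgtrans_argv` and `run_pkgtrans` are my own invention, and actually running it obviously needs a Solaris box with pkgtrans on the PATH.

```python
import shutil
import subprocess


def pkgtrans_argv(source, dest, packages=("all",), datastream=False):
    """Build a pkgtrans command line like the ones in this post.

    datastream=True adds -s, so dest is written as a single datastream file;
    otherwise the packages in source are exploded into the directory dest.
    """
    argv = ["pkgtrans"]
    if datastream:
        argv.append("-s")
    argv += [str(source), str(dest), *packages]
    return argv


def run_pkgtrans(source, dest, packages=("all",), datastream=False):
    """Run pkgtrans if it is installed; refuse politely if it is not."""
    if shutil.which("pkgtrans") is None:
        raise RuntimeError("pkgtrans not found; this needs a Solaris system")
    argv = pkgtrans_argv(source, dest, packages, datastream)
    return subprocess.run(argv, check=True)


# The two examples from the post, built but not executed.
explode = pkgtrans_argv("package-0.4.2.pkg", "/tmp")
bundle = pkgtrans_argv("/tmp", "/tmp/foo.pkg", datastream=True)
```

Building the argument list separately from running it means you can print or log exactly what would be executed before letting it loose on a real system.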


Book Review: Network Administration with FreeBSD 7

Posted by Ceri Davies Mon, 22 Sep 2008 20:31:00 GMT

I was approached some time ago by Packt Publishing with a request to review Network Administration with FreeBSD 7, a new book by Babak Farrokhi who is a FreeBSD committer among other things.

Right from the Preface it seemed clear that neither the author nor any of the editorial staff were native English speakers which, being the way that I am, made it very difficult to get into. Somewhere around chapter three, however, this improved drastically, and I could finally concentrate on the content. That is good news, because the content is excellent.

Network administrators at any scale, from LAN to WAN, will find something useful. Routing protocols such as OSPF and BGP are covered, and there’s a good chapter on IPsec (and non-IPsec) tunnels which was directly useful to me personally while I was reading. Also welcome was information on IPv6 and a chapter on kernel tuning.

With multi-core systems such as Sun’s X4450 behemoth and the fine-grained locking in the network stack that FreeBSD enjoys now that GIANT is pretty much a distant memory, using FreeBSD on an off-the-shelf system to run hardcore bits of the network is practical, and this book works really well as a way to find out what the system is capable of and how to get started. Short chapters on getting familiar with FreeBSD and basic administration are also included, so there’s really no excuse not to take a look!


Moved

Posted by Ceri Davies Thu, 26 Jun 2008 18:43:00 GMT

Since it was set up, this machine has lived on my gateway FreeBSD machine in what has since become the boy’s bedroom. It moved from there to my own bedroom, and I recently started turning it off at night to try to save power (and sleep better!).

This led to a number of people emailing me about outages - which was quite flattering; didn’t realise anyone would care about my crappy blog that much - because they were in a timezone disparate enough to be trying to get to it during the hours I had it turned off.

Therefore, and with the realisation that I could get a hosted solution for $12/month (US), I have moved this blog to a Solaris[1] zone hosted at Gangus Internet. Service (technically and customer-servicely) so far has been absolutely impeccable.

If you are reading this, I guess the move went OK.


[1] Which is not a reflection on FreeBSD in any way. Questions such as were asked after this commit are unnecessary :)


New release of tcpdrop for Solaris

Posted by Ceri Davies Fri, 16 May 2008 21:09:00 GMT

Some years ago I ported tcpdrop to Solaris from the FreeBSD version. I did it very quickly as a proof of concept and never got round to quite getting the error handling right or worrying about Solaris 10 privileges.

After spending the required 14 seconds looking at the privileges stuff, it became pretty clear that the required privilege for using tcpdrop was PRIV_SYS_IP_CONFIG. This cannot be asserted in a non-global zone, so if you are one of the many people who have emailed me asking if it can work in a non-global zone, the answer is “no, it can’t”. Not only that, but there’s nothing I can do about it.

Also in this release, I fixed up the error messages so that they are at least correct :)

The next release will feature a manpage in man format, rather than the current mdoc one which can’t actually be formatted on Solaris. Anyone who knows an automated method to convert from mdoc to man, please shout.

Anyway, the new release is available for download, knock yourselves out.


Check your T5220!

Posted by Ceri Davies Fri, 15 Feb 2008 21:44:00 GMT

We took delivery of a pair of T5220s last week for implementing a new cluster.

All in all they’re nice systems, but the components such as the disk fillers and the DVD drive seem a little on the too plastic side; since the front USB ports are integrated on the DVD drive panel, there’s even a note in the hardware manual warning you to be careful not to unseat the DVD drive when removing USB devices! The PCI risers seem a little wobbly too.

We were a little surprised by how much heavier they seem to be when compared to the T2000; racking a T2000 can be done on your own but I wouldn’t want to do a T5220 alone. The T5220 is about 4 inches longer though, which may account for the extra weight.

Since we’re a little nerdy, the first thing we did with the system was take the service panel off to check out the innards. It turned out to be a good job that we had, as the air duct had completely come away from its correct position during transit, and this would have made the system very unhappy. Particularly so, since the fan modules were a little loose too.


Profile support for zsh

Posted by Ceri Davies Wed, 12 Dec 2007 21:47:00 GMT

I’ve been using zsh for ages now, and the lack of a pfzsh implementation has been a minor annoyance for some of that time.

I happened to be looking at the csh source code and noticed how trivial the pfcsh implementation was and so, using the SFW code as a base, I threw pfzsh together yesterday afternoon.

Now, it turns out that the OpenSolaris FGAP project will be solving this in a different way, so a putback to SFW is unlikely. However, I’m going to find this useful in the meantime, so if you will too, download either the patch or an x86 package.

