Posted by Ceri Davies
Mon, 21 Dec 2009 22:23:00 GMT
We, apparently randomly, had a Solaris 10 host reboot and fail to come back up a couple of weeks ago.
On investigation, GRUB was failing to load the menu.lst file from the root pool and the Solaris GRUB findroot extension was failing to find any boot signatures, despite the pool on the disk being consistent and in good health.
As we couldn’t find any symptoms other than “it doesn’t work”, we rebuilt the machine and moved on with our lives. Until today, when it happened again…
Posted in Solaris | no comments | no trackbacks
Posted by Ceri Davies
Sun, 22 Nov 2009 00:08:00 GMT
For anyone struggling to find the text of the Digital Economy Bill, the current text is on the UK Parliament web site.
Posted in General | no comments | no trackbacks
Posted by Ceri Davies
Fri, 20 Nov 2009 21:31:00 GMT
The Digital Economy Bill has been proposed. Act now, before it becomes an Act. You can write to your MP via Write to Them right now. Write to a Lord or two while you’re there, it won’t hurt.
As for why, see Cory Doctorow’s summary, the Open Rights Group’s campaign (although I think this focusses a lot on the disconnection and dismisses some of the other crap) and what other countries are doing.
Here’s the text of the letter I sent to my MP; it’s heavily based on Cory’s article above. If you’re writing - and if you’re not, why not? You should write even if you support the Bill (but what are you doing here?!), it’s that important - don’t just copy this, put your spin on it.
Dear Jennifer Willott,
I’m writing to outline in no uncertain terms that I feel that the Digital Economy Bill, as proposed, should be rejected by Parliament. This single issue is critical enough to directly influence how I will be voting in the next election.
Firstly, disconnection from the Internet is disproportionate and unacceptable, particularly in order to prevent what is a civil offence. Additionally, the fallout of this on members of the public who are unrelated to the accused offender is non-trivial. Examples:
1) I have bank accounts that I cannot access by any method other than online, and I also work from home quite a lot. Another member of my household getting our connection disconnected would have an obvious negative impact;
2) Many people do not know how to secure their wireless access points. A malicious person could easily drive around Plasnewydd potentially getting unsuspecting households disconnected by abusing their wireless connectivity (encryption of wireless access points does not really help, as all wireless protocols are rather insecure);
3) Students or young professionals in shared households could lose access due to no fault of their own, possibly directly affecting their jobs.
Apart from the question of disconnection, the new rating system will fail to protect children or consumers as all such rating systems do, and will actually make it more difficult for smaller businesses or startups to enter or stay in the market.
The secondary legislation grant to the Secretary of State is a huge concern, allowing the Secretary of State to pretty much do anything without consulting Parliament, including devolving frightening powers and levels of privacy intrusion, probably to private companies (who, as the T-Mobile data selling story this week shows, are often as poor as the government at protecting citizens’ data).
Finally, the Bill lacks any means of improving the digital economy that it purports to protect. Widening access to the Internet, rather than trying to curb it, should be the focus of the Bill, yet there is no mention of helping the poor to gain Internet access, nothing to ensure that broadband access becomes cheap and operator-neutral. There is no mention of how publicly funded media or intellectual property, such as that from the BBC, Arts Council grantee works, etc. are made available to people to actually access, embrace and use to drive creation of new art.
The Bill seems entirely focussed on protecting the income of industry (many of whom are struggling to create original content anyway - see the huge number of film “remakes” and musical cover versions) at the expense of the citizen’s ability to do what other countries are currently declaring basic rights (http://news.cnet.com/8301-17939_109-10374831-2.html).
I urge you to oppose this Bill.
Yours sincerely,
Ceri Davies
Posted in General | no comments
Posted by Ceri Davies
Sat, 30 May 2009 21:29:00 GMT
It’s slightly late in the day, but as I’m currently picking apart what parts of VMware’s VI/vSphere stack are actually useful in *my* Real World, I’m going to respond to Chuck Hollis’ blog post Why Oracle Doesn’t Like VMware, even though it was posted nearly a month ago.
To state my position clearly, I agree with Chuck on 80% of his points, particularly with the general sentiment that it would be trivial for Oracle to support VMware as a platform but for the fact that they may lose business if they do. However, his point #4, “VMware Functionality Competes With Oracle DBMS Features” is completely disingenuous.
There is *no* VMware functionality that competes with Oracle DBMS features, although VMware marketing would like you to believe that there is. Let’s break Chuck’s examples down.
Chuck says about RAC:
On one hand, we've got a multi-server configuration running
Oracle's latest (and most expensive) RAC product. It's doing load
balancing, high availability, and making the hardware function as
a giant pool.
Let’s think about this RAC setup. It will be serving the same database via multiple instances. The clients know about each instance and will choose another when the first fails.
and about VMware:
On the other hand, we've got the same multi-server configuration
running the much cheaper Oracle SE on VMware.
It too is load balancing, offers high availability, and makes the
hardware function as a single giant pool. Many of the management
tasks are handled quite well outside of Oracle's domain.
I’m thinking right now that Chuck doesn’t know, or more likely has conveniently forgotten, how RAC works and also seems to have made similar mistakes regarding VMware’s features. Either that, or he’s completely believing VMware’s hype, much as they’d both like us all to do.
VMware is not like RAC
A RAC configuration consists of multiple instances serving the same database. This isn’t even conceptually similar to multiple VMs running multiple Oracle SE instances with, necessarily, multiple databases.
“load-balancing”
I’ve only been administering tens of Oracle DBMS databases for 4 years, but I have no idea how one would load balance read/write clients across multiple database+instances in anything approaching a productive way. I’m going to go as far as to say that, at least generally, you cannot.
“high availability”
I’ve already mentioned what I think about VMware HA. It doesn’t offer good protection against network failures and it doesn’t offer any protection against FC storage failures. In fact, you can’t even mirror FC storage with VMware unless you get your SAN to do it[1].
Additionally, even when a failure is detected, the only fix is to restart the VM and thereby the Oracle instance, implying the loss of in-flight transactions.
“hardware functioning as a single giant pool”
While both RAC and VMware can be argued to make the hardware they run on function as a single pool, these two pools have entirely different purposes. The “RAC pool” will take a database query and do the same thing regardless of which node handles it, while the “VMware pool” will not (unless, of course, you happen to be running RAC in it).
There’s more: Fault Tolerance.
Chuck then goes on to say, regarding the VMware setup:
And VMware brings a few very cool features to the table
that Oracle doesn't, like real fault tolerance[...]
That’s a little shocking.
While RAC can be used to provide high-availability, there are (probably a large proportion of) RAC customers who would be using RAC in order to scale past a single server. VMware Fault Tolerance doesn’t even allow you to scale past a single virtual CPU.
VMware Fault Tolerance has other issues, such as a limited list of supported CPUs, the requirement to reboot a VM on most of those CPUs in order to enable it (and since you have to turn FT off in order to patch the ESX cluster it’s running on, that’s a big deal), lack of support for thin-provisioned VMs, an inability to support physical RDM, the requirement for a dedicated gigabit NIC and some other more minor ones. However, I’m also worried that it might use the same algorithm to determine failure of the Primary VM as VMware HA does - the documentation certainly mentions heartbeats, file locking on shared storage to prevent mistaken failover and that “failover occurs if the host running the Primary VM fails” which is the same terminology as the VMware HA documentation uses. I wonder if the loss of FC connectivity or the VM network will cause a failover here?
Don’t believe the hype
So, much as I agree with Chuck’s main point, I’m annoyed at the over-hyping of VMware’s availability features because I don’t believe that they’re as good as Chuck would like me to. At the end of the day, as an Oracle and VMware customer I’d love to see Oracle’s database supported in VMware, but I’m well aware of the limitations of both and need this kind of misleading information being disseminated like a hole in the head.
[1] Note that this means that if you use Raw Device Mapping (RDM) to try to mirror in your OS instead, you’re just as at risk as if you hadn’t bothered, because the mapping is stored in the non-fault-tolerant .vmx file.
Posted in Clustering, Oracle, VMware | 1 comment | no trackbacks
Posted by Ceri Davies
Sun, 24 May 2009 22:07:00 GMT
As a platforms engineer responsible for design, build and maintenance I spend a lot of time thinking about failure. As good system administrators and system designers will know, this means evaluating the impact of the failure of every aspect of the system as well as assessing whether this impact is actually important to the SLA in order to design around it where necessary.
VMware HA
VMware HA, a feature of VMware’s Virtual Infrastructure and vSphere products (both of which I will refer to, incorrectly, as “ESX” below), believes that it has a space to fill here. VMware HA has two features:
- If an ESX host fails, VMware HA will power on the Virtual Machines that it was running on another host in the cluster, thereby restoring that VM to service;
- VMware HA can monitor if a guest operating system is still responding and perform the same action if it is not.
It does all this in a manner transparent to applications and guest operating systems and on paper it sounds really useful. It’s not a sub-second response to an outage, but for a large number of situations it would be good enough.
Problems
I’ve recently undertaken to design a VMware based infrastructure for my employer and have been performing a number of experiments with VMware HA. The results of this testing are not promising and I’m therefore blogging my findings with a number of possible outcomes in mind:
- Someone corrects me, points me to a reference and I fix my test setup;
- People stop telling me VMware HA is the answer to all ills.
As further background to the complaints below, a typical ESX installation uses at least three NICs; one (or more) for the ESX service console, one for VMotion and one (or more) for Virtual Machine traffic. VMware HA heartbeats are carried over the service console NICs.
VM Networks are not protected
My first finding was that disconnecting the NICs carrying Virtual Machine traffic (and thereby causing those VMs to become disconnected from the world outside the ESX host) does not cause VMware HA to initiate a failover, as the machine is still “in cluster”. Further to that, it doesn’t cause a warning flag or even a log entry to be displayed in the high-level vCenter interface; the only indication is a small red “x” next to the NIC, and you only see that if you specifically look at the Network Configuration page on that host.
Storage networks are not protected
ESX can use SAS, iSCSI, NFS or Fibre Channel as a remote (i.e., not physically in the ESX host) storage backend. I’ve tested NFS and Fibre Channel, and losing all paths to the storage doesn’t cause a failover of the VMs. Again, no warning flag is displayed on the ESX host in vCenter. A warning is logged, but no error, and the warning is misleading - entries were logged regarding attempts to fail over to an alternative path, but no error was logged stating that these attempts had failed.
Heartbeat network failure does not trigger a failover (by default)
In the default configuration, even losing the heartbeat networks will not cause VMs to be failed over. This is because ESX uses file locks to indicate when a VM is powered on on another host in the cluster, and these locks remain in place even when a host becomes disconnected from the other hosts in the cluster, as the default setting is to take no action when this occurs. Changing this setting to “Power Off VM” or “Shut Down VM” does allow the loss of heartbeats to cause a failover. Even when an ESX host panics, in my experience, a failover may not occur as the locks can still be held until the host is manually power-cycled.
HA?
The conclusion that I draw here is that VMware HA actually only protects service if an ESX host suddenly powers off. I really don’t feel that VMware HA deserves the “H” in its moniker and would prefer that it be renamed to VMware SEPCA (Slightly Elevated in Particular Circumstances Availability) from this point onwards; there’s a good history of renaming products and product features so this shouldn’t be too much turmoil. This would help people to think more about what they’re actually getting rather than assuming that they’re covered by ESX.
Posted in VMware | 3 comments | no trackbacks
Posted by Ceri Davies
Fri, 09 Jan 2009 10:08:00 GMT
Not interesting, but I couldn’t find this reliably anywhere else, hence the blog post for Google.
For Firefox 2.x (and maybe 3.x, I don’t know yet) set the following in user.js or about:config to disable mailto: handling.
user_pref("network.protocol-handler.external.mailto", false);
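If you would rather script the user.js route than edit the file by hand, something like the sketch below should do. The function name is made up for illustration, and the profile directory is passed in as an argument because where it lives varies by platform and I'm not going to guess at yours.

```shell
# Append the mailto pref to user.js in the given Firefox profile directory,
# unless an identical line is already there.
# disable_mailto_handler is a hypothetical helper name, not part of Firefox.
disable_mailto_handler() {
    profile_dir=$1
    pref='user_pref("network.protocol-handler.external.mailto", false);'

    if [ ! -d "$profile_dir" ]; then
        echo "no such profile directory: $profile_dir" >&2
        return 1
    fi

    # Make sure user.js exists so that grep has something to read.
    touch "$profile_dir/user.js"

    # -x matches whole lines only, -F treats the pref as a fixed string.
    if ! grep -qxF "$pref" "$profile_dir/user.js"; then
        printf '%s\n' "$pref" >> "$profile_dir/user.js"
    fi
}
```

Running it twice against the same profile leaves a single copy of the line, so it should be safe to drop into a setup script.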
Posted in Aide memoire | no comments | no trackbacks
Posted by Ceri Davies
Fri, 31 Oct 2008 20:10:00 GMT
I decided long ago that I didn’t want to run machines at home 24x7 and I didn’t want to spend money chasing performance. In fact, my main machine at home has an 800MHz VIA C3 CPU and just 512MB of RAM, and every other machine that I could be running OpenSolaris on is just as bad. I have access to much better hardware at work but it’s work hardware and not for that job.
Watching my attempts to build ONNV fail after 7 hours has always been slightly disheartening, especially when I’ve had to LiveUpgrade to a recent build first. Needless to say, the lack of good hardware with up-to-date tools has been somewhat of a barrier to my involvement. However, as of earlier this month, Sun has been providing the OpenSolaris project with a hosted farm of test machines that contributors can use to build and test software. There are two kinds of accounts, one for building software and the other for testing it.
The first, a Build Server account, gives you 15GB of disk space on a variety of machines of different processor types, which you can then use for compilation. These machines run builds of Nevada and compilers that are recent enough to be able to build ONNV and they do it quickly; building ONNV on a 16-core x4600 in the Test Farm takes under one hour[1], which is a massive boon to me if nobody else. Setting up an account is a simple matter of clicking “Add Account” on the test farm interface and waiting a few minutes.
The second kind of account reserves you an entire machine with console and SP access to splat your software over so that you can see if the machine still boots. There’s a bit more of a queue for one of these systems as obviously they can’t be used in parallel by more than one user but, again, all you need to do is click the relevant button in the interface and wait for a mail telling you that your server is ready.
If you have signed the Sun Contributor Agreement for OpenSolaris then you’re already set up on the Test Farm so go and grab an account now!
[1] And would probably take a lot less time if I could set DMAKE_MAX_JOBS higher than 4 :)
Posted in OpenSolaris, Software, Sun | no comments | no trackbacks
Posted by Ceri Davies
Thu, 30 Oct 2008 20:54:00 GMT
OK, so SVR4 packages on Solaris are not the future, but for those of us with existing Solaris installations, we’ll be using pkgadd for some time.
While vendors seem to vary in their preference for datastream format (i.e. a single file) packages and what the Solaris documentation calls “file system format” packages (a directory), it’s pretty certain that a download of a package from the Internet will come in datastream format.
The datastream format is useful for distribution, but it’s slightly annoying for scripting and I personally hate not knowing what’s in a datastream package before I “run” it (preinstall scripts could do anything to your system, while the package itself could overwrite anything too). Therefore, I like to explode them to file system format and check them out first. This can be done easily with the pkgtrans tool.
To convert a datastream package to a file system package:
pkgtrans package-0.4.2.pkg /tmp all
The command above will explode all packages contained in the package-0.4.2.pkg file to directories in /tmp.
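Once a package has been exploded, the bits worth reading are the pkginfo file, anything under the install directory (which is where preinstall and friends live) and the pkgmap, which lists what will be written where. Below is a minimal sketch of a helper to dump all three; the name inspect_pkg is made up for illustration, and it assumes the usual file system format layout with the first line of pkgmap starting with a colon.

```shell
# Summarise an exploded (file system format) package directory before
# installing it. inspect_pkg is a hypothetical helper, not a Solaris tool.
inspect_pkg() {
    pkgdir=$1

    if [ ! -f "$pkgdir/pkginfo" ]; then
        echo "not a file system format package: $pkgdir" >&2
        return 1
    fi

    echo "== pkginfo =="
    cat "$pkgdir/pkginfo"

    echo "== install scripts =="
    if [ -d "$pkgdir/install" ]; then
        # preinstall, postinstall, checkinstall and so on, if present
        ls "$pkgdir/install"
    else
        echo "(none)"
    fi

    echo "== pkgmap entries =="
    if [ -f "$pkgdir/pkgmap" ]; then
        # Skip the leading ':' line; assumed to hold part and size details.
        grep -v '^:' "$pkgdir/pkgmap" || true
    else
        echo "(no pkgmap)"
    fi
}
```

Pointing it at one of the directories that pkgtrans created under /tmp gives a quick look at what the package claims to be and which scripts it would run, before pkgadd gets anywhere near it.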
To convert a directory of file system format packages to datastream format, assemble them in the same directory, say /tmp, and run:
pkgtrans -s /tmp /tmp/foo.pkg all
This will create a single file, foo.pkg, containing all of the packages available in /tmp.
Invocations of the form pkgtrans -s Solaris_11/Product /tmp/nv97_sparc.pkg all are a slightly tidier way of keeping installation archives around.
Posted in Solaris, Aide memoire | no comments | no trackbacks
Posted by Ceri Davies
Mon, 22 Sep 2008 20:31:00 GMT
I was approached some time ago by Packt Publishing with a request to review Network Administration with FreeBSD 7, a new book by Babak Farrokhi who is a FreeBSD committer among other things.
Right from the Preface it seemed clear that neither the author nor any of the editorial staff were native English speakers which, being the way that I am, made it very difficult to get into. Somewhere around chapter three, however, this improved drastically and I could finally manage to concentrate on the content. Which is good news, because the content is excellent.
Network administrators at any scale, from LAN to WAN, will find something useful. Routing protocols such as OSPF and BGP are covered, and there’s a good chapter on IPsec (and non-IPsec) tunnels which was directly useful to me personally while I was reading. Also welcome was information on IPv6 and a chapter on kernel tuning.
With multi-core systems such as Sun’s X4450 behemoth and the fine-grained locking in the network stack that FreeBSD enjoys now that GIANT is pretty much a distant memory, using FreeBSD on an off-the-shelf system to run hardcore bits of the network is practical, and this book works really well as a way to find out what the system is capable of and how to get started. Short chapters on getting familiar with FreeBSD and basic administration are also included, so there’s really no excuse not to take a look!
Posted in FreeBSD | no comments
Posted by Ceri Davies
Thu, 26 Jun 2008 18:43:00 GMT
Since it was set up, this blog has lived on my gateway FreeBSD machine in what has since become the boy’s bedroom. It moved from there to my own bedroom, and I recently started turning it off at night to try to save power (and sleep better!).
This led to a number of people emailing me about outages - which was quite flattering; didn’t realise anyone would care about my crappy blog that much - because they were in a timezone disparate enough to be trying to get to it during the hours I had it turned off.
Therefore, and with the realisation that I could get a hosted solution for $12/month (US), I have moved this blog to a Solaris[1] zone hosted at Gangus Internet. Service (technically and customer-servicely) so far has been absolutely impeccable.
If you are reading this, I guess the move went OK.
[1] Which is not a reflection on FreeBSD in any way. Questions such as were asked after this commit are unnecessary :)
Posted in Solaris, General | no comments