Doubts Even Here: the point of disbelief

Posted by Ceri Davies Sat, 30 May 2009 21:29:00 GMT

It’s slightly late in the day, but as I’m currently picking apart what parts of VMware’s VI/vSphere stack are actually useful in *my* Real World, I’m going to respond to Chuck Hollis’ blog post Why Oracle Doesn’t Like VMware, even though it was posted nearly a month ago.

To state my position clearly, I agree with Chuck’s on 80% of his points, particularly with the general sentiment that it would be trivial for them to support VMware as a platform but for the fact that they may lose business if they do. However, his point #4, “VMware Functionality Competes With Oracle DBMS Features” is completely disingenuous.

There is *no* VMware functionality that competes with Oracle DBMS features, although VMware marketing would like you to believe that there is. Let’s break Chuck’s examples down.

Chuck says about RAC:

On one hand, we've got a multi-server configuration running
 Oracle's latest (and most expensive) RAC product.  It's doing load
 balancing, high availability, and making the hardware function as
 a giant pool.

Let’s think about this RAC setup. It will be serving the same database via multiple instances. The clients know about each instance and will choose another when the first fails

and about VMware:

On the other hand, we've got the same multi-server configuration
 running the much cheaper Oracle SE on VMware. 

It too is load balancing, offers high availability, and makes the
 hardware function as a single giant pool.  Many of the management
 tasks are handled quite well outside of Oracle's domain.

Now I’m thinking right now that Chuck doesn’t know, or more likely has conveniently forgotten, how RAC works and also seems to have made similar mistakes regarding VMware’s features. Either that, or he’s completely believing VMware’s hype, much as they’d both like us all to do.

VMware is not like RAC

A RAC configuration consists of multiple instances serving the same database. This isn’t even conceptually similar to multiple VMs running multiple Oracle SE instances with, necessarily, multiple databases.

“load-balancing”

I’ve only been administering tens of Oracle DBMS databases for 4 years, but I have no idea how one would load balance read/write clients across multiple database+instances in anything approaching a productive way. I’m going to go as far as to say that, at least generally, you can not.

“high availability”

I’ve already mentioned what I think about VMware HA. It doesn’t offer good protection against network failures and it doesn’t offer any protection against FC storage failures. In fact, you can’t even mirror FC storage with VMware unless you get your SAN to do it[1].

Additionally, even when a failure is detected, the only fix is to restart the VM and thereby the Oracle instance, implying the loss of in-flight transactions.

“hardware functioning as a single giant pool”

While both RAC and VMware can be argued as making the hardware they run on function as a single pool, these two pools have an entirely different purpose. The “RAC pool” will take a database query and do the same thing regardless of which node handles it, while the “VMware pool” will not (unless, of course, you happen to be running RAC in it).

There’s more: Fault Tolerance.

Chuck then goes on to say, regarding the VMware setup:

And VMware brings a few very cool features to the table
 that Oracle doesn't, like real fault tolerance[...]

That’s a little shocking.

While RAC can be used to provide high-availability, there are (probably a large proportion of) RAC customers who would be using RAC in order to scale past a single server. VMware Fault Tolerance doesn’t even allow you to scale past a single virtual CPU.

VMware Fault Tolerance has other issues, such as a limited list of supported CPUs, the requirement to reboot a VM on most of those CPUs in order to enable it (and since you have to turn FT off in order to patch the ESX cluster it’s running on, that’s a big deal), lack of support for thin-provisioned VMs, an inability to support physical RDM, the requirement for a dedicated gigabit NIC and some other more minor ones. However, I’m also worried that it might use the same algorithm to determine failure of the Primary VM as VMware HA does - the documentation certainly mentions heartbeats, file locking on shared storage to prevent mistaken failover and that “failover occurs if the host running the Primary VM fails” which is the same terminology as the VMware HA documentation uses. I wonder if the loss of FC connectivity or the VM network will cause a failover here?

Don’t believe the hype

So much as I agree with Chuck’s main point, I’m annoyed at the over-hyping of VMware’s availability features because I don’t believe that they’re as good as Chuck would like me to. At the end of the day, as an Oracle and VMware customer I’d love to see Oracle’s database supported in VMware, but I’m well aware of the limitations of both and need this kind of misleading information being disseminated like a hole in the head.


[1] Note that this means that if you use Raw Device Mapping (RDM) to try to mirror in your OS instead, you’re just as at risk as if you hadn’t bothered because the mapping is stored in the not-fault tolerant .vmx file).

Posted in , ,  | 1 comment | no trackbacks

Shell scripting for DBMS vendors

Posted by Ceri Davies Wed, 14 Jun 2006 19:52:00 GMT

I recently had cause to arrange for Oracle’s RDBMS 10g to be started at boot time. This led me to despair somewhat at the state of shell scripting in general, and I will rant a little on that subject. This isn’t really intended as an attack on this particular script, but these are the issues that arose from it.

The Oracle installation provides a couple of scripts named dbshut and dbstart that look like they’ll do that job. Indeed, the top of dbstart states:

# This script is used to start ORACLE from /etc/rc(.local).

The corresponding RCS log from the import tells a different tale:

$ svn log -r99 dbstart
Password for 'ceri': 
-------------------------------------------------------------------
r99 | ceri | 2006-06-09 15:27:33 +0100 (Fri, 09 Jun 2006) | 5 lines

Add a bunch of scripts used for looking after our databases.
Mainly culled from our live systems, with the notable exception
of the Oracle provided utilities dbshut and dbstart, which are
as out of the box here (and therefore do not work).

-------------------------------------------------------------------
Read more...

Posted in , ,  | 4 comments

Brief news

Posted by Ceri Davies Fri, 28 Apr 2006 19:57:00 GMT

Our T2000 arrived. We went for the 8-core 1.0GHz, 16GB version to allow us to use the Solaris resource management tools to carve it up in various configurations.

So far I’ve used it to great effect to test out an Oracle Data Guard configuration that I’m building, and particularly to understand how you are supposed to set up your clients in order to benefit from the redundancy goodness — some notes on that will be forthcoming real soon as I couldn’t find a single useful document on that.

Michael Bushkov’s Summer of Code project, cached(8), which adds caching for nsswitch along with enabling nsswitch for the services, protocols and rpc databases finally got committed today. This is really interesting work, similar to nscd(1M) on Solaris1, but with each user having their own cache.

Also, I discovered that a post on BSDNews is a good way to saturate a crappy cable modem link. :-)

Oh yeah, and John Birrell is making superb progress on a DTrace port to FreeBSD.

1 Yes, nscd(1M) has a bad reputation, but I strongly believe that’s because people don’t understand how to work it.

Posted in , , ,  | no comments | no trackbacks

SQL*Plus and parameter handling: the "Right Way"

Posted by Ceri Davies Sun, 05 Feb 2006 22:15:00 GMT

A few months ago I complained about ropey parameter handling in SQL*Plus. I found a solution.

Read more...

Posted in  | no comments | no trackbacks

Is the disk half full or half empty?

Posted by Ceri Davies Fri, 13 Jan 2006 11:54:00 GMT

I continue to learn about Oracle, and that includes finding tasks that should be simple but that seem intolerably difficult. I’m sure that this is a result of my ignorance, so please do correct me if you know better.

So long as you have a basic understanding of Oracle, you’ll know that filling up a tablespace is bad. Here’s a bit of SQL that tells you which tablespaces are getting full:

SELECT t.tablespace_name,
     decode(t.status, 'ONLINE', t.status, nls_initcap(t.status)) AS status,
     (tsa.bytes / 1024) AS ts_size,
     ((tsa.bytes - decode(tsf.bytes, null, 0, tsf.bytes)) / 1024) AS used,
     (decode(tsf.bytes, null, 0, tsf.bytes) / 1024) AS free,
     (round((1 - decode(tsf.bytes, null, 0,tsf.bytes) / tsa.bytes) * 100)) AS percent
     FROM     sys.dba_tablespaces t, sys.sm$ts_avail tsa, sys.sm$ts_free tsf
     WHERE    t.tablespace_name = tsa.tablespace_name
     AND      t.tablespace_name = tsf.tablespace_name (+)
     ORDER BY percent DESC, t.status, t.tablespace_name
/

which gives output somewhat like:

TABLESPACE_NAME                STATUS TS_SIZE       USED       FREE    PERCENT
------------------------------ ------ ---------- ---------- ---------- ----------
XDB                            ONLINE 46080      45824      256        99
CWMLITE                        ONLINE 20480      15168      5312       74
SYSTEM                         ONLINE 512000     358656     153344     70
DRSYS                          ONLINE 20480      4928       15552      24
WHATEVER                       ONLINE 2097152    120768     1976384    6
TOOLS                          ONLINE 10240      64         10176      1
INDX                           ONLINE 25600      64         25536      0
UNDOTBS1                       ONLINE 1048576    1856       1046720    0
USERS                          ONLINE 25600      64         25536      0

9 rows selected.

This gives some pretty good stats, but there is a problem. These statistics only count the blocks that are currently allocated, and tablespaces can be set to autoextend.

This means that the statistics above don’t actually indicate whether you’re about to run out of space or not, since a tablespace that looks %99 full above might be allowed to autoextend another 5GB[1]. Basically, you don’t want to get out of bed on the basis of this script, because by the time you do, the tablespace might have autoextended and fixed the problem all by itself. So let’s try to work around that.

Strictly speaking, tablespaces do not autoextend; the data files that act as a tablespace’s backing store do. Oracle maintains a table called DBA_DATA_FILES which contains file names, internal file ids, and the AUTOEXTENSIBLE flag indicating whether they may autoextend or not. So we can qualify the output of the previous script a little with an AUTOEXTEND column:

SELECT t.tablespace_name,
     decode(t.status, 'ONLINE', t.status, nls_initcap(t.status)) AS status,
     (tsa.bytes / 1024) AS ts_size,
     ((tsa.bytes - decode(tsf.bytes, null, 0, tsf.bytes)) / 1024) AS used,
     (decode(tsf.bytes, null, 0, tsf.bytes) / 1024) AS free,
     (round((1 - decode(tsf.bytes, null, 0,tsf.bytes) / tsa.bytes) * 100)) AS percent,
     dbf.autoextensible AS autoextend
     FROM     sys.dba_tablespaces t, sys.sm$ts_avail tsa,
               sys.sm$ts_free tsf, dba_data_files dbf
     WHERE    t.tablespace_name = tsa.tablespace_name
     AND      t.tablespace_name = dbf.tablespace_name (+)
     AND      t.tablespace_name = tsf.tablespace_name (+)
     ORDER BY percent DESC, t.status, t.tablespace_name

/

which looks like:

TABLESPACE_NAME                STATUS TS_SIZE    USED       FREE       PERCENT    AUT
------------------------------ ------ ---------- ---------- ---------- ---------- ---
XDB                            ONLINE 46080      45824      256         99        YES
CWMLITE                        ONLINE 20480      15168      5312        74        YES
SYSTEM                         ONLINE 512000     358656     153344      70        YES
DRSYS                          ONLINE 20480       4928      15552       24        YES
WHATEVER                       ONLINE 2097152     120768    1976384      6        NO
TOOLS                          ONLINE 10240         64      10176        1        YES
INDX                           ONLINE 25600         64      25536        0        YES
UNDOTBS1                       ONLINE 1048576       1856    1046720      0        YES
USERS                          ONLINE 25600         64      25536        0        YES

9 rows selected.

As you can see, where the XDB tablespace looked in trouble before, we can now safely ignore (assuming that the disk that the data file resides on has space, but that’s another article - see footnote 1).

Unfortunately, this is broken too. A tablespace may be made up of multiple data files, and we haven’t made any consideration for that at all here; in fact the above extract will just give multiple rows if a tablespace has more than one data file. What’s worse, those rows may contain different values in the AUTOEXTEND column, since one data file may be set to autoextend while another is not.

I don’t actually have any production databases that contain tablespaces comprising of multiple data files, so for now, I’ve given up at this point. Someone must be monitoring this properly though, so please let me know how you work out whether to get out of bed or not in this situation.

[1] The effect that having 15 tablespaces all set to autoextend has on the interpretation of the amount of free disk reported by df(1) is an exercise for an entirely different day…


Note: Apologies for the whitespace here. I can’t get markdown to stop fiddling with it for some reason.

Posted in  | 1 comment | no trackbacks

Periodic jobs in the Oracle RDBMS

Posted by Ceri Davies Mon, 03 Oct 2005 19:47:00 GMT

My foray into Oracle continues; today I got to see the horrors of the DBMS_JOB package, which is used for scheduling periodic tasks (think cron(8)).

This particular database had a couple of jobs scheduled, but they were not being run. I knew how to sort that out; in 9i there is a database parameter called job_queue_processes which governs how many job runners are allowed concurrently, and this was set to zero:

SQL> show parameters job

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes                  integer     0

There were, therefore, no job runners:

SQL> !ps -ef|grep j
  oracle  6183  6182  0 14:13:15 pts/1    0:00 grep j
  oracle  6182  6166  0 14:13:15 pts/1    0:00 /bin/sh -c ps -ef|grep j

This is a dynamic parameter, so I changed it:

SQL> alter system set job_queue_processes = 2;

System altered.

SQL> !ps -ef|grep j
  oracle  6226  6166  0 14:17:35 pts/1    0:00 /bin/sh -c ps -ef|grep j
  oracle  6219     1  0 14:17:30 ?        0:00 ora_cjq0_WXYZ
  oracle  6220     1  0 14:17:30 ?        0:00 ora_j000_WXYZ
  oracle  6222     1  0 14:17:30 ?        0:00 ora_j001_WXYZ

Since there were jobs that needed running, the ora_cjq controller instantly fired off a pair of ora_jnnn processes to run them. Unfortunately, one of them was an analyze of the SYS schema, which immediately caused deadlocks since this was the middle of the day. And this despite the job not being scheduled until what I thought was 3am:

SQL> select job,last_date,next_date,
 substr(interval,1,24) interval,
 substr(what,1,48) what
 from dba_jobs
 where job = '2';

JOB LAST_DATE NEXT_DATE INTERVAL                         WHAT
--- --------- --------- ------------------------ ------------------------------------------------
  2 03-OCT-05 17-OCT-05 trunc(sysdate+14)+27/24  dbms_utility.analyze_schema('SYS','COMPUTE');

However, Oracle insisted on playing catchup there and then, which made me doubt whether trunc(sysdate+14)+27/24 really did mean “14 days from now at 3am”.

Unfortunately the docs suck big time; the definition of INTERVAL doesn’t exist in the Oracle9i Database Administrator’s Guide, and the excerpt below tells you everything you need to know in order to understand my pain (my emphasis):

If you always want to automatically execute a job at a specific time, regardless of the last execution (for example, every Monday), […]

It just strikes me that “Monday” isn’t all that specific

Posted in  | 1 comment | no trackbacks

SQL*Plus and parameter handling

Posted by Ceri Davies Fri, 23 Sep 2005 12:01:00 GMT

In my new job, I’ve inherited the dubious pleasure of babysitting some Oracle 9i databases. Though I am somewhat familiar with Oracle, I really only know enough to know that I don’t know enough, so when I wanted to see where all the space was going, having already identified the tablespace that I was interested in, I issued the following, perhaps naive, statement:

SELECT segment_name, segment_type, bytes/1024/1024 MB
     FROM dba_segments
     WHERE tablespace_name = 'WIBBLE_DATA'
     ORDER BY bytes;

That turned out to be a lot of typing for something that I wanted to do a lot, so I decided to throw it in a script, named segmentsbytablespace.sql (yes, that *is* quite a long file name – your point?). The next logical step was obviously to parameterise the tablespace name. I simply cannot believe the can of worms that this opened.

If you’ve run a script in SQL*Plus before, you’ll probably know that arguments to scripts can be accessed by the positional parameters &1, &2, …, &n. So the script just becomes:

SELECT segment_name, segment_type, bytes/1024/1024 MB
    FROM dba_segments
    WHERE tablespace_name = '&1'
    ORDER BY bytes
    /

and is run like this:

SQL> @segmentsbytablespace WIBBLE_DATA

That would be too easy; there are a couple of problems with this.

The first is simply that if you leave out the parameter, SQL*Plus prompts you for its value, like this:

SQL> @segmentsbytablespace
Enter value for 1:

Enter value for what now? I’d like it to say Enter value for tablespace. It looks like it is impossible for me to do that (unless I throw that variable at the database server and ask if it is NULL or not; I’m not doing that). This issue is, at least, cosmetic.

The second problem is that running the script with an argument defines the same argument for the next invocation, i.e. in this example:

SQL> @segmentsbytablespace WIBBLE_DATA
SQL> @segmentsbytablespace

both scripts believe that they have been passed WIBBLE_DATA as an argument. I have to explicitly undefine the positional parameter 1 at the end of my script to avoid inadvertently passing it to another script executed in the same session.

For those who know SQL*Plus, you also benefit from this “feature” if you use the &&1 form of variable dereferencing in order to avoid prompting your users for the same value more than once.

Perhaps I missed something, but this is documented behaviour and looks like an accident waiting to happen to me.

Posted in  | no comments | no trackbacks