Shell scripting for DBMS vendors

Posted by Ceri Davies Wed, 14 Jun 2006 19:52:00 GMT

I recently had cause to arrange for Oracle’s RDBMS 10g to be started at boot time. This led me to despair somewhat at the state of shell scripting in general, and I will rant a little on that subject. This isn’t really intended as an attack on this particular script, but these are the issues that arose from it.

The Oracle installation provides a couple of scripts named dbshut and dbstart that look like they’ll do that job. Indeed, the top of dbstart states:

# This script is used to start ORACLE from /etc/rc(.local).

The corresponding revision control log from the import tells a different tale:

$ svn log -r99 dbstart
Password for 'ceri': 
-------------------------------------------------------------------
r99 | ceri | 2006-06-09 15:27:33 +0100 (Fri, 09 Jun 2006) | 5 lines

Add a bunch of scripts used for looking after our databases.
Mainly culled from our live systems, with the notable exception
of the Oracle provided utilities dbshut and dbstart, which are
as out of the box here (and therefore do not work).

-------------------------------------------------------------------

Complaint #1 — No shebang

That's nothing to do with Ricky Martin, obviously.

For some reason, the first line of dbstart (and dbshut, but I'll be talking about dbstart in the main) reads:

:

rather than the traditional

#!/bin/sh

I've seen this practice a lot over the years, from numerous different people, and I have no idea where it came from.

While I can understand that a naive user might elide the magic number from ignorance — and even get away with it in a very simple script — for some reason the author of this script has gone out of their way to put something in that place, so why a colon? It means nothing, it does nothing. If there is a reason for this, then I'd be very glad to hear it since I can't fathom why such a diverse body of people do it.
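For reference, `:` is the shell's null command: it expands its arguments, does nothing else, and exits with status 0. A minimal check, which should behave the same in any Bourne-family shell (the variable name `colon_status` is only for illustration):

```shell
#!/bin/sh
# ':' is the null command: it does nothing and always succeeds.
:
colon_status=$?
echo "exit status of ':' is ${colon_status}"
```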

Subcomplaint #1.a — What shell is this for?

The lack of the shebang is annoying to me for another, more important, reason too. While I can easily tell the difference between Bourne shell and C shell style script, if the script is over a certain size then it can be a little difficult to spot exactly which flavour of that family you are dealing with. I used to assume plain old POSIX /bin/sh and, given the hint about rc.local above, that might have been a reasonable thing to do. Unfortunately, this assumption has become dangerous for a number of reasons:

  • A large number of users do not know that sh, ksh, bash, zsh et al. are all different;
  • if a script is missing a shebang, the author can probably be assumed to:
    • be one of them;
    • have done little or no error checking;
    • have unwittingly used a shell specific construct where at all possible.
  • since shell scripts are evaluated lazily, I can't know that I have guessed the right shell, even if the script seems to run OK.
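The last point can be sketched as follows. The `[[ ... ]]` test is a ksh/bash extension rather than POSIX sh, yet a shell that lacks it will only complain if that code is actually reached, so a clean run proves little. `RUN_RARE_PATH` and `rare_path` are names made up for this illustration.

```shell
#!/bin/sh
# RUN_RARE_PATH and rare_path are hypothetical names for this sketch.
rare_path() {
    # '[[' is not POSIX; a strict /bin/sh fails here, but only when called.
    if [[ "$1" == ora* ]]; then
        echo "matched"
    fi
}

# The non-portable code is only reached when the caller asks for it.
if [ "${RUN_RARE_PATH:-no}" = "yes" ]; then
    rare_path oracle
fi

result="finished without touching the bash-only code"
echo "$result"
```

Run with `RUN_RARE_PATH=yes` under a shell without `[[` and the same script fails, which is exactly the kind of breakage that a quick test run does not reveal.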

In these days where everyone is a Linux administrator, it's now safest to assume that the script is written for bash. That's particularly annoying because, application requirements notwithstanding, I usually don't have bash on my systems.

Subcomplaint 1.a.i — Learn how export works

If you read this far, you probably already know that this doesn't work in /bin/sh:

export FOO=`wibble | blah -foo`
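A form that works in the traditional Bourne shell as well as its descendants is to assign first and export separately. A minimal sketch, with `echo` and `tr` standing in for the placeholder `wibble | blah -foo` pipeline and `child_view` being a name invented here:

```shell
#!/bin/sh
# Assign, then export: two steps, portable to pre-POSIX /bin/sh.
FOO=`echo oracle | tr '[a-z]' '[A-Z]'`
export FOO

# A child process sees the exported value.
child_view=`sh -c 'echo "$FOO"'`
echo "child sees FOO=${child_view}"
```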

I also want (need, demand, etc. ...) you to know that:

You don't have to export a variable

Seriously. If it's of no use outside your script, don't export it.

You only have to export a variable once

Not every time you change its value. Christ.
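Both points can be checked with a short experiment. `LOCAL_ONLY` and `SHARED` are hypothetical names, and the child `sh -c` stands in for any program the script launches.

```shell
#!/bin/sh
LOCAL_ONLY="scratch value"   # never exported: invisible to child processes
SHARED="first"
export SHARED                # exported exactly once
SHARED="second"              # reassignment keeps the export attribute

# Ask a child shell what it can see.
child_local=`sh -c 'echo "${LOCAL_ONLY:-unset}"'`
child_shared=`sh -c 'echo "$SHARED"'`
echo "child sees LOCAL_ONLY=${child_local}, SHARED=${child_shared}"
```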

Complaint #2 — Untested, non-standard options

There is a rant in which I rail against using any feature of any command that isn't documented in a certain document in shell scripts, but this isn't it. This one is far worse.

The original dbstart contains the following:

LOGMSG="logger -puser.alert -s "

No version of Solaris supports a -s flag to logger(1), so what is this doing in the Solaris distribution?
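One way to avoid depending on `-s` is to write to standard error explicitly. This is a sketch and not what dbstart does: `log_alert` and the overridable `LOGGER` variable are names invented here, and it assumes only that logger(1) accepts `-p`.

```shell
#!/bin/sh
# LOGGER and log_alert are hypothetical names for this sketch.
LOGGER=${LOGGER:-logger}

log_alert() {
    # Send the message to syslog, then echo it to stderr ourselves
    # instead of relying on a non-portable -s flag.
    $LOGGER -p user.alert "$*"
    echo "$*" >&2
}
```

A call such as `log_alert "Database startup failed"` would then reach both syslog and the terminal, whatever flags the local logger happens to support.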

Complaint #3 — I am a dumbass; make provision

dbstart says that it can be run from rc.local. First thing I did was throw it in /etc/init.d and run /etc/init.d/dbstart start. It kind of worked — /etc/init.d/dbstart stop didn't. It isn't supposed to be run like that.

First of all, I had to create a wrapper script for /etc/init.d:

#!/bin/sh --
# $Id: oracle 101 2006-06-09 15:00:16Z ceri $
# ----------------------------------------------------------------------------
# "THE BEER-WARE LICENSE" (Revision 42):
# <ceri@FreeBSD.org> wrote this file.  As long as you retain this notice you
# can do whatever you want with this stuff. If we meet some day, and you think
# this stuff is worth it, you can buy me a beer in return.   Ceri Davies
# ----------------------------------------------------------------------------

# Start/stop oracle.
# If you haven't customized the dbstart and dbshut scripts, then don't use
# this, because they don't work otherwise.  To force you into doing this,
# this script will fail unless you unset CHKPT below.

# Set BASEPATH to the directory where dbshut and dbstart are.
BASEPATH=/u01/app/oracle/product/10.2.0/db_1/bin
CHKPT="I am stupid lazy"

#
# No more user serviceable parts inside...
#
DBSHUT=${BASEPATH}/dbshut
DBSTART=${BASEPATH}/dbstart

ORA_UID=oracle

case "$1" in
        start)
                echo "Starting Oracle"
                su ${ORA_UID} -c "${DBSTART} ${CHKPT}"
                ;;
        stop)
                echo "Stopping Oracle"
                su ${ORA_UID} -c "${DBSHUT} ${CHKPT}"
                # sleep a bit to let it shut down
                sleep 15
                ;;
        *)
                echo "Usage: $0 { start | stop }"
                exit 1
                ;;
esac

Then I added the following near the top of both dbshut and dbstart:

# If we got passed a parameter, someone doesn't know what they are
# doing.  Bail.
if [ ! -z "$1" ]; then
       echo "$0 doesn't take parameters, bailing." >&2
       exit 1
fi

I am now protected from myself.

Complaint #4 — Having taken the trouble to get this far, the basic functionality of the script should at least fucking well work

At this point I had the script working from the command line, so I rebooted and checked on the Oracle processes. The instance had started, but there was no listener.

Cutting a long story short, it seems that lsnrctl, which starts the listener, requires the ORACLE_HOME environment variable to be set and dbstart goes out of its way to ensure that this is not the case.
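A hedged sketch of the sort of guard that would have caught this: make sure ORACLE_HOME is set and exported before calling lsnrctl. The path is inferred from the BASEPATH used in the wrapper script above, and `start_listener` is a name invented for illustration; it is not how dbstart itself is structured.

```shell
#!/bin/sh
# Assumed location, derived from BASEPATH in the wrapper; adjust to suit.
ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
export ORACLE_HOME

# start_listener is a hypothetical helper, not part of Oracle's scripts.
start_listener() {
    if [ -z "${ORACLE_HOME}" ]; then
        echo "ORACLE_HOME is not set; refusing to run lsnrctl" >&2
        return 1
    fi
    "${ORACLE_HOME}/bin/lsnrctl" start
}
```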

I'm done for now.


Comments

  1. Mike Jones said about 23 hours later:
    Instructions? Not sure what they are ;-)
  2. ceri said about 23 hours later:

    That seems to be a common complaint, in that he has set the ORACLE_HOME_LISTNER variable to somewhere in his own home directory.

    However, step 3 of the configuration instructions in the script reads:

    3) Set ORACLE_HOME_LISTNER

    so I don't see it as too much of a problem ;)

  3. Mike Jones said about 23 hours later:
    I've not installed Oracle since I left Cardiff so my memory might be hazy - but on 10g isn't the dbstart script broken because the developer has hard-coded some directory locations on his own system? I'm sure I had to go through making changes before it would work.
  4. derek said 10 days later:
    dude, looks like plenty of issues with dbstart and dbshut in recent times. you're right, the scripts won't accept parameters in solaris. this is explained in metalink Note: 1019793.6, which also explains that the init.d script must su - oracle -c before running the script. this assumes that the shell will be defined by the "oracle" account's .profile, methinks.

    theoretically, the script goes on to process the oratab, but the latest versions of dbstart/shut look in the wrong place on solaris: it should be looking in /var/opt/oracle but instead looks in /etc/oracle, so this has to be edited, as it is only correct for almost every other flavour of unix. also check that the scripts are executable (natch) and that your oratab file is correctly set up with a "Y" flag for the SIDs you want restarted. if there is no Y flag then dbstart/shut will ignore those lines.

    also, in the oracle9i version of these scripts they wouldn't work properly if the database only used an spfile.ora rather than an init.ora, so again more editing was required:

    - add the following line:
        SPFILE=${ORACLE_HOME}/dbs/spfile${ORACLE_SID}.ora
      after this line:
        PFILE=${ORACLE_HOME}/dbs/init${ORACLE_SID}.ora
    - change:
        if [ -f $PFILE ] ; then
      to:
        if [ -f $PFILE -o -f $SPFILE ] ; then

    regards listener startup, this was never automatically a part of the older dbstart/shut scripts and always had to be added by the dba. if your system is simple, with one listener per database, per ORACLE_HOME, then it's easily done. if your system is complex, with multiple listeners with individual listener names and passwords per database, per ORACLE_HOME, then you're in for some fun scripting....

    cheers, -derek-
