AIX - Multibos


multibos magics:

Multibos
1. What is multibos?
2. Requirements for multibos
3. How to create a multibos instance
4. How to update a standby BOS
5. How to mount and unmount a standby BOS
6. How to start an interactive shell
7. How to install a fileset to a standby BOS
8. How to remove a fileset in a standby BOS
9. How to install an efix to a standby BOS
10. How to remove an efix from a standby BOS
11. How to tell what instance you are booted from
12. How to change the bootlist for multibos
13. How to rebuild the standby BOS boot image
14. How to migrate with multibos
15. Where are the logs for multibos?
16. How to remove an instance of multibos
17. Debugging Failed multibos
18. Things to be aware of


1. What is multibos?

multibos is multiple copies of the base operating system (BOS) on the same root volume group (rootvg). You can simultaneously maintain two bootable instances of a BOS. The instance of a BOS associated with the booted BLV is the active BOS. The instance of a BOS associated with the BLV that has not been booted is the standby BOS. Only two instances of BOS are supported per rootvg.
The multibos setup operation creates a standby Base Operating System (BOS) that boots from a distinct Boot Logical Volume (BLV). This creates two bootable instances of BOSes on a given rootvg. You can boot from either instance of a BOS by specifying the respective BLV as an argument to the bootlist command, or using system firmware boot operations.
The multibos utility allows you to access, install, maintain, update, and customize the standby BOS either during setup or during any subsequent customization operations. Installing maintenance or technology level updates to the standby BOS does not change system files on the active BOS. This allows for concurrent update of the standby BOS, while the active BOS remains in production.
The multibos utility has the ability to copy or share logical volumes and file systems. By default, the multibos utility copies the BOS file systems (currently the /, /usr, /var, /opt, and /home directories), associated log devices, and the boot logical volume. You can make copies of additional BOS objects (see the -L flag). All other file systems and logical volumes are shared between instances of the BOS. Separate log device logical volumes (those not contained within the file system) are not supported for copy and will be shared.

Flags:

-a = Update_all                     -N = Block bosboot
-B = Bosboot operation              -n = Block cleanup
-b = Install bundle file            -p = Preview only
-c = Customization operation        -R = Remove operation
-e = Exclude file                   -S = Shell operation
-f = Fix List file                  -s = Setup operation
-i = image.data file                -t = Block bootlist change
-L = Additional LVs file            -u = Unmount operation
-l = Install device or directory    -X = Expand file systems as needed
-m = Mount operation                -M = Mksysb setup operation

The preview option, using the -p flag, applies to the setup, remove, mount, unmount, and customization operations. If you specify the preview option, then the operation provides information about the action that will be taken, but does not perform actual changes.

The multibos -X flag auto-expansion feature allows for automatic file system expansion, if space is necessary to perform multibos-related tasks. Run all multibos operations with this flag.


2. Requirements for multibos

Following are the general requirements and limitations:

* The multibos utility is supported on AIX 5L Version 5.3 with the 5300-03
Recommended Maintenance package and higher versions.

* The current rootvg must have enough space for each BOS object copy.
BOS object copies are placed on the same disk or disks as the original.

* The total number of copied logical volumes cannot exceed 128. The total
number of copied logical volumes and shared logical volumes are subject to
volume group limits.


3. How to create a multibos instance

To preview the creation of a multibos instance
# multibos -Xsp

If the preview is successful, create the multibos instance
# multibos -Xs
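
The preview-then-run sequence above can be wrapped in a small helper. This is only a sketch: run_with_preview is a made-up name, and it assumes that multibos returns a non-zero exit status when a preview fails.

```shell
# Sketch: run a multibos operation only if its preview (-p) succeeds.
# run_with_preview is a hypothetical helper, not part of AIX.
# Assumption: multibos exits non-zero when the preview fails.
run_with_preview() {
    op_flags=$1                       # "s" for setup, "R" for remove, ...
    if multibos -X"${op_flags}"p; then
        multibos -X"${op_flags}"      # preview was clean, do the real run
    else
        echo "preview failed, not running multibos -X${op_flags}" >&2
        return 1
    fi
}
```

For example, run_with_preview s would run multibos -Xsp and then, only if that succeeds, multibos -Xs.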


4. How to update a standby BOS

To do an update_all on an existing multibos instance
# multibos -Xac -l <device or directory>

To create a multibos instance and update it at the same time
# multibos -Xsa -l <device or directory>


5. How to mount and unmount a standby BOS

To mount a standby BOS
# multibos -Xm

To unmount a standby BOS
# multibos -Xu


6. How to start an interactive shell

# multibos -S
That will give you the following prompt
MULTIBOS>
To exit out of the interactive shell just type exit
Note: When you exit back to the active instance, the standby instance is still mounted.
To unmount the standby instance:
multibos -Xu



7. How to install a fileset to a standby BOS

To install one or more filesets to an existing multibos instance
Create a bundle file with the names of the filesets to be installed
vi /tmp/list
I: invscout.websm
I:
etc.
Then to install it
multibos -Xc -b /tmp/list -l <device or directory>
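
If you would rather generate the bundle file than edit it by hand, something like the following could work. It is a sketch only: make_bundle is a made-up helper name, and it writes each entry as I: followed directly by the fileset name, which is an assumption about the accepted bundle layout.

```shell
# Sketch: write a bundle file for use with "multibos -Xc -b".
# make_bundle is a hypothetical helper, not part of AIX.
# Assumption: one "I:<fileset>" entry per line is an acceptable layout.
make_bundle() {
    bundle_file=$1
    shift
    : > "$bundle_file"                # start with an empty file
    for fileset in "$@"; do
        printf 'I:%s\n' "$fileset" >> "$bundle_file"
    done
}
```

For example, make_bundle /tmp/list invscout.websm creates /tmp/list with a single I: entry.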


8. How to remove a fileset in a standby BOS

First initiate an interactive shell
# multibos -S
MULTIBOS> smitty remove
Then exit the interactive shell
MULTIBOS> exit


9. How to install an efix to a standby BOS

Create a bundle file with the names of the efixes to be installed
vi /tmp/efixes
E:IY#####.#####.epkg.Z
E:
etc.
Then to install it
multibos -Xc -b /tmp/efixes -l <device or directory>

10. How to remove an efix from a standby BOS

First initiate an interactive shell
# multibos -S
Verify there is an efix
MULTIBOS> /usr/sbin/emgr -P
Then to remove it
MULTIBOS> /usr/sbin/emgr -r -L
Then exit the interactive shell
MULTIBOS> exit

11. How to tell what instance you are booted from

There may be times when you need to determine which multibos instance you are booted from. You can determine that with
# bootinfo -v

That will either return hd5 or bos_hd5
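
In a script, the returned BLV name can be turned into a readable message. The helper below is a sketch with a made-up name (describe_blv); it only knows the two BLV names mentioned above.

```shell
# Sketch: translate the BLV name printed by "bootinfo -v" into a message.
# describe_blv is a hypothetical helper, not part of AIX.
describe_blv() {
    case "$1" in
        hd5)     echo "booted from the standard instance (hd5)" ;;
        bos_hd5) echo "booted from the bos_ instance (bos_hd5)" ;;
        *)       echo "unexpected BLV name: $1"; return 1 ;;
    esac
}
```

On AIX you would call it as describe_blv "$(bootinfo -v)".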

12. How to change the bootlist for multibos

The bootlist command supports multiple BLVs.
To boot from disk hdisk0 and BLV bos_hd5, and have it display the bootlist you would enter the following:
# bootlist -m normal -o hdisk0 blv=bos_hd5
hdisk0 blv=bos_hd5

You can also specify to boot from the current active BOS in case the standby BOS doesn’t boot.
# bootlist -m normal -o hdisk0 blv=bos_hd5 hdisk0 blv=hd5
hdisk0 blv=bos_hd5
hdisk0 blv=hd5
After the system is rebooted from the standby BOS, the standby BOS logical volumes are mounted over the usual BOS mount points, such as /, /usr, /var, and so on.
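
The "standby first, active as fallback" bootlist shown above can be built from the name of the active BLV. This is a sketch under the assumption that the only two BLV names in play are hd5 and bos_hd5; standby_blv and bootlist_cmd are made-up helper names, and the command is only printed, not run.

```shell
# Sketch: print a bootlist command that tries the standby BLV first and
# falls back to the active BLV.
# standby_blv and bootlist_cmd are hypothetical helpers, not part of AIX.
# Assumption: the two instances use the BLV names hd5 and bos_hd5.
standby_blv() {
    case "$1" in
        hd5)     echo bos_hd5 ;;
        bos_hd5) echo hd5 ;;
        *)       return 1 ;;
    esac
}

bootlist_cmd() {
    disk=$1                           # e.g. hdisk0
    active=$2                         # e.g. the output of "bootinfo -v"
    standby=$(standby_blv "$active") || return 1
    echo "bootlist -m normal -o $disk blv=$standby $disk blv=$active"
}
```

For example, bootlist_cmd hdisk0 "$(bootinfo -v)" prints the command to review before you run it.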


13. How to rebuild the standby BOS boot image

# multibos -XB


14. How to migrate with multibos

Starting with 5.3 TL9 you can add a 6.1 TL2 (or above) instance. This is done with the new -M flag. You must be running the 64-bit kernel.
This isn’t really a migration because it populates the second instance using a mksysb based on the new release.

In 6.1 TL2 a new flag (-M) was added to the mksysb command which allows you to create a mksysb for use with multibos. It creates a backup of BOS (/, /usr, /var, /opt).
bos.alt_disk_install.boot_images must be installed.

To preview the multibos migration
multibos -M -spX

To perform the multibos migration
multibos -M -sX

It is not advised to run in this environment for an extended period of time. There could be problems if tfactor or maps are used. Be aware that 6.1 specific attributes may not be reflected in the standby instance.


15. Where are the logs for multibos?

You can view the multibos log via the following:
alog -of /etc/multibos/logs/op.alog | pg

There are also other logs in the /etc/multibos/logs directory
- scriptlog..txt : A log of commands being run during the current
shell operation.
- scriptlog..txt.Z : A compressed log of commands run during a
previous shell operation.

If the multibos instance is updated there is a log (on the standby side) located at
/var/adm/ras/install_all_updates.log


16. How to remove an instance of multibos

# multibos -RX

The multibos remove operation performs the following steps:
1. All boot references to the standby BLV are removed.
2. The bootlist is set to the active BLV. You can skip this step using the -t flag.
3. Any mounted standby BLVs are unmounted.
4. Standby file systems are removed.
5. Remaining standby logical volumes are removed.


17. Debugging Failed multibos

NOTE: THESE FLAGS ARE UNDOCUMENTED AS THEY ARE FOR IBM SUPPORT USE ONLY
[-D]
Specifies that complete debug output be generated.

[-d ]
Specifies that debug output be generated for the specified function.

[-T ]
Specifies that a breakpoint be set in the specified function. Upon hitting the function, a multibos shell will be opened.
NOTE: This option requires user interaction once the function is entered.

Gathering Testcase Data
The following are useful as testcase data:
A general snap (snap -g).
The contents of /etc/multibos.
The contents of /bos_inst/etc/multibos.
NOTE: This requires mounting the standby instance

If you ever encounter a situation where you have the bos_ LVs but are missing the corresponding normal LVs, your lsvg -l rootvg output will look similar to the following:

# lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd6 paging 2 2 1 open/syncd N/A
hd8 jfs2log 1 1 1 open/syncd N/A
hd3 jfs2 1 1 1 open/syncd /tmp
hd1 jfs2 1 1 1 open/syncd /home
hd11admin jfs2 1 1 1 open/syncd /admin
lg_dumplv sysdump 4 4 1 open/syncd N/A
livedump jfs2 1 1 1 open/syncd /var/adm/ras/livedump
bos_hd5 boot 1 1 1 closed/syncd N/A
bos_hd4 jfs2 2 2 1 closed/syncd /bos_inst
bos_hd2 jfs2 10 10 1 closed/syncd /bos_inst/usr
bos_hd9var jfs2 2 2 1 closed/syncd /bos_inst/var
bos_hd10opt jfs2 2 2 1 closed/syncd /bos_inst/opt

To recover from the above scenario, run multibos -sX
That creates another instance and adds all of the normal LVs back.


There may be times when you have run multibos -RX to remove the bos_ instance, but you still have the normal hd5, are missing bos_hd5, and your lsvg -l rootvg output looks similar to the following:

# lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd6 paging 50 50 1 open/syncd N/A
hd5 boot 1 1 1 closed/syncd N/A
hd8 jfs2log 1 1 1 open/syncd N/A
hd3 jfs2 16 16 1 open/syncd /tmp
hd1 jfs2 5 5 1 open/syncd /home
lg_dumplv sysdump 8 8 1 open/syncd N/A
bos_hd4 jfs2 1 1 1 open/syncd /
bos_hd2 jfs2 28 28 1 open/syncd /usr
bos_hd9var jfs2 2 2 1 open/syncd /var
bos_hd10opt jfs2 3 3 1 open/syncd /opt
lg_dumplv1 sysdump 4 4 1 closed/syncd N/A

To correct the above scenario, perform the following steps. They assume rootvg is on hdisk0; change the hdisk to match your system:
# rmlv hd5
# rmlv bos_hd5
# chpv -c hdisk0
# mklv -t boot -y bos_hd5 -ae rootvg 1 hdisk0
# cd /dev
# ln rbos_hd5 ipl_blv
# rm ipldevice
# ln rhdisk0 ipldevice
# bosboot -ad /dev/ipldevice
# bosboot -ad /dev/hdisk0
# bootlist -om normal hdisk0 blv=bos_hd5
# multibos -Xs
That creates the "standard named" multibos instance.
-- You can then remove it or reboot to whichever instance you need
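
Both scenarios in this section come down to one of the two boot logical volumes being absent from the lsvg -l rootvg output. The sketch below (missing_blvs is a made-up helper name) reads that output on standard input and prints whichever of hd5 and bos_hd5 is not listed. It assumes the LV name is the first column, as in the listings above.

```shell
# Sketch: report which boot logical volumes are absent from
# "lsvg -l rootvg" output supplied on standard input.
# missing_blvs is a hypothetical helper, not part of AIX.
# Assumption: the LV name is the first whitespace-separated column.
missing_blvs() {
    awk '
        $1 == "hd5"     { have_hd5 = 1 }
        $1 == "bos_hd5" { have_bos = 1 }
        END {
            if (!have_hd5) print "hd5"
            if (!have_bos) print "bos_hd5"
        }'
}
```

On AIX you would call it as lsvg -l rootvg | missing_blvs; no output means both BLVs are present.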


18. Things to be aware of

Migration is not currently supported with multibos. If you have multibos on your system you will need to remove it before doing a migration.

No Alternate Disk Installation operations are currently supported for this environment.
Currently, the only supported method of backup and recovery of a rootvg with
multiple instances is mksysb through CD, tape, or NIM.

To determine whether multibos is on a system before performing a migration or alt_disk operation, run lsfs. If multibos is on the system, you will see entries whose logical volume names start with bos_ (for example /dev/bos_hd4).
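
The lsfs check can be scripted. The sketch below (has_multibos_fs is a made-up helper name) reads lsfs output on standard input and succeeds if any device name in the first column starts with /dev/bos_. The first-column layout is an assumption about the lsfs output format.

```shell
# Sketch: succeed if lsfs output on standard input lists a bos_ device.
# has_multibos_fs is a hypothetical helper, not part of AIX.
# Assumption: the device name (e.g. /dev/bos_hd4) is the first column.
has_multibos_fs() {
    awk '$1 ~ /^\/dev\/bos_/ { found = 1 } END { exit !found }'
}
```

On AIX you would call it as: lsfs | has_multibos_fs && echo "multibos instance present"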


An undocumented verify operation is run from the inittab during boot. The inittab
entry looks like this:

mbverify:23456789:wait:/usr/sbin/multibos -V 2>&1 | alog -t boot > /dev/console

It is highly recommended that the user not modify this entry. This verify
operation allows the multibos utility to synchronize changes in logical volumes
and filesystems between the active and standby instances. This entry also
synchronizes the ODM and devices on initial boot after a mksysb restore.
Without this operation, both the active and standby instances could become
inconsistent with normal filesystem and logical volume operations.
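
Because the entry matters, it can be worth confirming that it is still present after system changes. This is a sketch with a made-up helper name (has_mbverify); it simply looks for a line starting with mbverify: in the given file, defaulting to /etc/inittab.

```shell
# Sketch: succeed if the inittab file contains an active mbverify entry.
# has_mbverify is a hypothetical helper, not part of AIX.
# A line that does not begin with "mbverify:" (for example one that has
# been commented out) is treated as missing.
has_mbverify() {
    grep -q '^mbverify:' "${1:-/etc/inittab}"
}
```

For example: has_mbverify || echo "mbverify entry is missing from /etc/inittab"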

More info can be found in /usr/lpp/bos/README.multibos

You may be wondering whether it is okay to keep running in the bos_ instance. It is, but we recommend against staying there indefinitely, for two reasons:
1) Programs you run may require the normal LV names, such as hd4 instead of bos_hd4.
2) You will not be able to boot into maintenance mode with the bos_ instance.
Once you have determined that the updated bos_ instance is stable for your environment, remove the other instance, run multibos again to create a new instance, and boot to it so that you are back on the normal LV names.

HACMP - Power HA Commands

Note:
HA commands sometimes fail to run when their directories are not in your PATH, so make sure the following is configured on your server:

export PATH=$PATH:/usr/es/sbin/cluster:/usr/es/sbin/cluster/utilities:/usr/es/sbin/cluster/sbin:/usr/es/sbin/cluster/cspoc
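
If the export line ends up in a profile that is sourced more than once, the same directories get appended repeatedly. A sketch of one way to avoid that is shown below; path_add is a made-up helper name, and the directories are the ones from the export line above.

```shell
# Sketch: append the cluster directories to PATH only when missing.
# path_add is a hypothetical helper, not part of PowerHA.
path_add() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

for dir in /usr/es/sbin/cluster /usr/es/sbin/cluster/utilities \
           /usr/es/sbin/cluster/sbin /usr/es/sbin/cluster/cspoc; do
    path_add "$dir"
done
export PATH
```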
PowerHA(HACMP) Commands
How to start cluster daemons (options, in order: clstrmgr, clsmuxpd, broadcast message, clinfo, cllockd)
clstart -m -s -b -i -l
How to show cluster state and substate (depends on clinfo)
 clstat
SNMP-based tool to show cluster state
 cldump
Similar to cldump, perl script to show cluster state
 cldisp
How to list the local view of the cluster topology
 cltopinfo
How to list the local view of the cluster subsystems
 clshowsrv -a
How to show all necessary info about HACMP
 clshowsrv -v
How to show HACMP version
 lslpp -L | grep cluster.es.server.rte
How to verify the HACMP configuration
 /usr/es/sbin/cluster/diag/clconfig -v -O                                                                                                    
How to list app servers configured including start/stop scripts
 cllsserv
How to locate the resource groups and display their status
 clRGinfo -v
How to rotate some of the log files
 clcycle
A cluster ping program with more arguments
 cl_ping
Cluster rsh program that takes cluster node names as arguments
 clrsh
How to find out the name of the local node
 get_local_nodename
How to check the HACMP ODM
 clconfig
How to put online/offline or move resource groups
 clRGmove
How to list the resource groups
 cllsgrp
How to create a large snapshot of the hacmp configuration
 clsnapshotinfo
How to show short resource group information
 cllsres
How to list the cluster manager state
 lssrc -ls clstrmgrES
Cluster manager states

· ST_NOT_CONFIGURED: Node never started
· ST_INIT: Node configured but down (not running)
· ST_STABLE: Node up and running
· ST_RP_RUNNING
· ST_JOINING
· ST_BARRIER
· ST_CBARRIER
· ST_VOTING
· ST_RP_FAILED: Node with event error
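
To use the state in a script, it can be pulled out of the lssrc -ls clstrmgrES output. This is a sketch: cluster_state and is_cluster_stable are made-up helper names, and it assumes the output contains a line of the form "Current state: ST_STABLE", which may differ between releases.

```shell
# Sketch: extract the cluster manager state from "lssrc -ls clstrmgrES"
# output supplied on standard input.
# cluster_state and is_cluster_stable are hypothetical helpers.
# Assumption: the output has a line such as "Current state: ST_STABLE".
cluster_state() {
    awk '/^Current state:/ { print $3; exit }'
}

is_cluster_stable() {
    [ "$(cluster_state)" = "ST_STABLE" ]
}
```

For example: lssrc -ls clstrmgrES | is_cluster_stable || echo "cluster manager is not stable"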
How to show heartbeat information
 lssrc -ls topsvcs
How to check logs related to hacmp
 odmget HACMPlogs
How to list all information from topology HACMP
 lssrc -ls topsvcs
How to show all info about group
 lssrc -ls grpsvcs
How to list the logs
 cllistlogs
How to list the resources defined for all resource groups
 clshowres
How to show resource information by resource group
 clshowres -g'RG'
How to show resource information by node
 clshowres -n'NODE'
How to locate the resource groups and display status (-s)    
 clfindres
How to list interface name/interface device name/netmask associated with a specified ip label / ip address of a specific node
 clgetif
Cluster verification utility
 clverify
How to list cluster topology information
 cllscf
X utility for cluster configuration
 xclconfig
X utility for hacmp management 
 xhacmpm
X utility for cluster status
 xclstat
How to force shutdown cluster immediately without releasing resources
 clstop -f -N
How to do graceful shutdown immediately with no takeover
 clstop -g -N
How to do graceful shutdown immediately with takeover
 clstop -gr -N
How to sync the cluster topology
 cldare -t
How to do the mock sync of topology
 cldare -t -f
How to sync the cluster resources
 cldare -r
How to do the mock sync of resources
 cldare -r -f
How to list the name and security level of the cluster
 cllsclstr
How to list the info about the cluster nodes
 cllsnode
How to list info about node69
 cllsnode -i node69
How to list the PVID of the shared hard disk for resource group dataRG
 cllsdisk -g dataRG
How to list all cluster networks
 cllsnw
How to list the details of network ether1
 cllsnw -n ether1
How to show network ip/nonip interface information
 cllsif
How to list the details of network adapter node1_service
 cllsif -n node1_service
How to list the shared vgs which can be accessed by all nodes
 cllsvg
How to list the shared vgs in resource group dbRG
 cllsvg -g dbRG
How to list the shared lvs
 cllslv
How to list the shared lvs in the resource group dbRG
 cllslv -g dbRG
How to list the PVID of disks in the resource group appRG
 cllsdisk -g appRG
How to list the shared file systems
 cllsfs
How to list the shared file systems in the resource group sapRG
 cllsfs -g sapRG
How to show info about all network modules
 cllsnim
How to show info about ether network module
 cllsnim -n ether
How to list the runtime parameters for the node node1
 cllsparam -n node1
How to add a cluster definition with name dcm and id 3
 claddclstr -i 3 -n dcm
How to create resource group sapRG with nodes n1,n2 in cascade
 claddgrp -g sapRG -r cascading -n n1 n2
Creates an application server ser1 with startscript as /usr/start and stop script as /usr/stop
 claddserv -s ser1 -b /usr/start -e /usr/stop
How to change cluster definitions name to dcmds and id to 2
 clchclstr -i 2 -n dcmds
How to change the cluster security to enhanced
 clchclstr -s enhanced
How to delete the resource group appRG and related resources
 clrmgrp -g appRG
How to remove the node node69
 clrmnode -n node69
How to remove the adapter named node69_svc
 clrmnode -a node69_svc
How to remove all resources from resource group appRG
 clrmres -g appRG
How to remove the application server app69
 clrmserv app69
How to remove all application servers
 clrmserv ALL
How to list the nodes with active cluster manager processes from cluster manager on node node1
 clgetactivenodes -n node1
How to return a pingable address from node node1
 clgetaddr node1
How to list the info about resource group sapRG
 clgetgrp -g sapRG
How to list the participating nodes in the resource group sapRG
 clgetgrp -g sapRG -f nodes
How to get the ip label associated to the resource group
 clgetip sapRG
How to list the network for ip 192.168.100.2, netmask 255.255.255.0
 clgetnet 192.168.100.2 255.255.255.0
How to list the VG of LV nodelv
 clgetvg -l nodelv
How to add node5 to the cluster
 clnodename -a node5
How to change the cluster node name srv5 to srv3
 clnodename -o srv5 -n srv3