CAA - Cluster Aware AIX



Disclaimer: The following information was gathered from various sources, and credit goes to them.
If a newer release adds functions not covered here, try them at your own risk.
_________________________________________________________________________________
Was CAA introduced with AIX 7.1?
No.
The Cluster Aware function of AIX was introduced with AIX 7.1 and AIX 6.1 TL6. This new technology builds clustering technologies into the AIX base operating system. Cluster Aware provides the AIX kernel with heartbeating, health management and monitoring capabilities.
Using Cluster Aware AIX you can easily create a cluster of AIX nodes with the following capabilities (please note that CAA by itself is not a high availability (HA) solution):

  • Clusterwide event management - The AIX Event Infrastructure allows event propagation across the cluster so that applications can monitor events from any node in the cluster.
    • Communication and storage events
      • Node UP and node DOWN
      • Network adapter UP and DOWN
      • Network address change
      • Point-of-contact UP and DOWN
      • Disk UP and DOWN
    • Predefined and user-defined events
  • Clusterwide storage naming service - When a cluster is defined or modified, the AIX interfaces automatically create a consistent shared device view across the cluster. A global device name, such as cldisk5, would refer to the same physical disk from any node in the cluster.
  • Clusterwide command distribution - The clcmd command provides a facility to distribute a command to a set of nodes that are members of a cluster. For example, the command clcmd date returns the output of the date command from each of the nodes in the cluster.   
  • Clusterwide communication making use of networking and storage communication
Cluster Aware AIX taken alone is not a high availability solution and does not replace existing high availability products. It can be seen as a set of commands and services that cluster software can exploit to provide high availability and disaster recovery support to external applications. CAA does not provide the application monitoring and resource failover capabilities that PowerHA provides. In fact, IBM PowerHA SystemMirror 7.1 and even Reliable Scalable Cluster Technology (RSCT) use the built-in AIX clustering capabilities. The reason for introducing cluster functions built into AIX was to simplify the configuration and management of high availability clusters. It also lays a foundation for future AIX capabilities and the next generation of PowerHA SystemMirror.

Note: Cluster Aware AIX capability is included in AIX 7.1 Standard or Enterprise Editions, but is not included in AIX 7.1 Express Edition.

Creating the cluster

Before creating the cluster, there are some things to consider.
CAA uses IP-based network communication and storage interface communication through Fibre Channel and SAS adapters. When both types of communication are used, all nodes in the cluster can always communicate with any other node in the cluster configuration, which helps eliminate "split brain" incidents.

Network

A multicast/unicast address is used for cluster communication between the nodes in the cluster. Therefore, you need to ensure proper network configuration on each node. Each node must have at least one IP address configured on its network interface. The IP address is used as a basis for creating an IP multicast/unicast address, which the cluster uses for internal communications. Also check that entries for these IP addresses exist in every node's /etc/hosts file.
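For illustration, assuming two nodes named nodeA and nodeB (using the addresses that appear in the lscluster output later in this article), every node's /etc/hosts should contain matching entries such as:
# grep node /etc/hosts
9.3.199.216   nodeA.ibm.com   nodeA
9.3.199.128   nodeB.ibm.com   nodeB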

Storage

Each node of the cluster should have common storage devices available, either SAN or SAS disks. These storage devices are used for the cluster repository disk and for any clustered shared disks. If Fibre Channel devices will be used, the following procedure must be followed before creating the cluster (SAS adapters do not require special setup):
  1. Run the following command:
rmdev -Rl fcsX
Note: X is the number of your adapter. If you booted from the Fibre Channel adapter, you do not need to complete this step.
  2. Run the following command:
chdev -l fcsX -a tme=yes 
Note: If you booted from the Fibre Channel adapter, add the -P flag. The target mode enabled (tme) attribute must be set for the FC adapter to be supported for cluster storage communication.
  3. Run the following command:
chdev -l fscsiX -a dyntrk=yes -a fc_err_recov=fast_fail
  4. Run the cfgmgr command.

Note: If you booted from the Fibre Channel adapter and used the -P flag, you must reboot.
  5. Verify the configuration changes by running the following command:
lsdev -C | grep sfwcom
The following is an example of the output displayed from the lsdev -C | grep sfwcom command:
lsdev -C | grep sfwcom
sfwcomm0 Available 01-00-02-FF Fiber Channel Storage Framework Comm
sfwcomm1 Available 01-01-02-FF Fiber Channel Storage Framework Comm
The above procedure has to be performed on all nodes of the cluster.
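If a node has several Fibre Channel adapters, the steps above can be scripted. The following is only a rough per-node sketch (the adapter list discovered with lsdev is an assumption about your configuration, and it ignores the boot-from-FC special case described above):
# Sketch: prepare every fcsX / fscsiX pair on the local node for CAA storage communication
for fcs in $(lsdev -Cc adapter -F name | grep '^fcs'); do
    num=${fcs#fcs}
    rmdev -Rl $fcs                                               # unconfigure the adapter and its children
    chdev -l $fcs -a tme=yes                                     # enable target mode
    chdev -l fscsi$num -a dyntrk=yes -a fc_err_recov=fast_fail
done
cfgmgr                                                           # reconfigure the devices
lsdev -C | grep sfwcom                                           # the sfwcomm devices should now be Available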
The cluster repository disk is used as the central repository for the cluster configuration data. A disk size of 10 GB is recommended. The minimum is 1 GB.
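Before running mkcluster you can confirm that a candidate repository disk is large enough; hdisk1 below is only an example disk name. Either of the following commands prints the disk size in MB:
# bootinfo -s hdisk1
# getconf DISK_SIZE /dev/hdisk1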


The following commands can be used for creating and managing the cluster:
lscluster Used to list cluster configuration information.
mkcluster Used to create a cluster.
chcluster Used to change a cluster configuration.
rmcluster Used to remove a cluster configuration.
clcmd Used to distribute a command to a set of nodes that are members of a cluster.


See the man pages for the different options.

Create the cluster:

Create the cluster with mkcluster command:
# mkcluster -n mycluster -m nodeA,nodeB -r hdisk1 -d hdisk2,hdisk3
mkcluster: Cluster shared disks are automatically renamed to names such as
cldisk1, [cldisk2, ...] on all cluster nodes. However, this cannot take place while a disk is busy or on a node which is down or not reachable. If any disks cannot be renamed now, they will be renamed later by the clconfd daemon, when the node is available and the disks are not busy.


The name of the cluster is mycluster, the nodes are nodeA and nodeB, the repository disk is hdisk1, and the shared disks are hdisk2 and hdisk3. Note that the repository disk and the shared disks are automatically renamed to caa_private0, cldisk1, and cldisk2 respectively. These names are the same on both nodes, no matter what their initial hdisk numbers were (which could differ on each node).
Before mkcluster command:
# lspv
hdisk0 0050187a43833dc5 rootvg active
hdisk1 0050187a8de3af7d None active
hdisk2 0050187a70d8c6cf None
hdisk3 none None
After mkcluster command:
# lspv
hdisk0 0050187a43833dc5 rootvg active
caa_private0 0050187a8de3af7d caavg_private active
cldisk1 0050187a70d8c6cf None
cldisk2 none None
When the cluster is ready, a special volume group (caavg_private) with new logical volumes and file systems is created.
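You can have a quick look at what was created on the repository disk; the logical volume and file system names vary by AIX level, so treat this only as a check rather than as expected output:
# lspv | grep caa_private0
# lsvg -l caavg_private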
When you create a cluster with the mkcluster command, the following actions are performed (taken from http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=%2Fcom.ibm.aix.clusteraware%2Fclaware_architecture.htm):
  • The cluster is created using the mkcluster command.
  • The cluster configuration is written to the raw section of the cluster repository disk.
  • Special volume groups and logical volumes are created on the cluster repository disk.
  • Cluster file systems are created on the special volume group.
  • Cluster services are made available to other functions in the operating system, such as Reliable Scalable Cluster Technology (RSCT) and PowerHA® SystemMirror.
  • Storage framework register lists are created on the cluster repository disk.
  • A global device namespace is created and interaction with LVM starts for handling associated volume group events.
  • A clusterwide multicast address is established.
  • The node discovers all of the available communication interfaces.
  • The cluster interface monitoring starts.
  • The cluster interacts with Autonomic Health Advisory File System (AHAFS) for clusterwide event distribution.
  • The cluster exports cluster messaging and cluster socket services to other functions in the operating system, such as Reliable Scalable Cluster Technology (RSCT) and PowerHA SystemMirror.
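The AHAFS integration mentioned above can be tried out through the /aha file system. The following is only a rough sketch based on the AIX Event Infrastructure documentation; the monitor file path, the attribute keywords, and whether a ksh file descriptor is sufficient (IBM's samples use a small C consumer) should be verified on your AIX level:
# Sketch: register for clusterwide node state events and block until one arrives
mkdir -p /aha/cluster/nodeState.monFactory                       # usually already present
exec 3<> /aha/cluster/nodeState.monFactory/nodeStateEvent.mon    # open (and create) the monitor file
print -u3 "CHANGED=YES;CLUSTER=YES;WAIT_TYPE=WAIT_IN_READ"       # write the wait attributes
cat <&3                                                          # blocks here, then prints the event data
exec 3<&-                                                        # close the monitoring session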

To check the status of the cluster, use lscluster with the following options:
-c Lists the cluster configuration.
-d Lists the cluster storage interfaces.
-i Lists the cluster configuration interfaces on the local node.
-m Lists the cluster node configuration information.

To check whether the cluster is operating properly, execute any clusterwide command:
#clcmd date
-------------------------------
NODE nodeA
-------------------------------
Wed Jun 6 11:19:44 EEST 2012
-------------------------------
NODE nodeB
-------------------------------
Wed Jun 6 11:19:44 EEST 2012


To remove the cluster, just type:
# rmcluster -n mycluster
rmcluster: Removed cluster shared disks are automatically renamed to names such
as hdisk10, [hdisk11, ...] on all cluster nodes. However, this cannot
take place while a disk is busy or on a node which is down or not
reachable. If any disks cannot be renamed now, you must manually
rename them by removing them from the ODM database and then running
the cfgmgr command to recreate them with default names. For example:
rmdev -l cldisk1 -d
rmdev -l cldisk2 -d
cfgmgr
Cluster Aware AIX makes it very easy to create a cluster with a minimal set of commands and little user intervention. In our opinion, one of the best features it provides is the common disk naming used on all the participating nodes in the cluster.
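The easiest way to see the common naming in practice is to run lspv on all nodes through clcmd once the cluster is up; every node should report the same caa_private0 and cldisk names even if the original hdisk numbers differed:
# clcmd lspv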


Examples

  1. To list the cluster configuration for all nodes, enter:
lscluster -m
A sample of the output follows:
# lscluster -m
Calling node query for all nodes...
Node query number of nodes examined: 2

                    Node name: nodeA.ibm.com
                    Cluster shorthand id for node: 1
                    uuid for node: 84088524-b124-11e3-8210-32c8e74b1e02
                    State of node:  UP  NODE_LOCAL
                    Smoothed rtt to node: 0
                    Mean Deviation in network rtt to node: 0
                    Number of clusters node is a member in: 1
                    CLUSTER NAME       TYPE  SHID   UUID
                    Sample local        84ee37f4-b124-11e3-8210-32c8e74b1e02

                    Number of points_of_contact for node: 0
                    Point-of-contact interface & contact state
                     n/a

            ------------------------------

                    Node name: nodeB.ibm.com
                    Cluster shorthand id for node: 2
                    uuid for node: 8492a5a6-b124-11e3-8210-32c8e74b1e02
                    State of node:  UP
                    Smoothed rtt to node: 70
                    Mean Deviation in network rtt to node: 82
                    Number of clusters node is a member in: 1
                    CLUSTER NAME       TYPE  SHID   UUID
                    Sample local        84ee37f4-b124-11e3-8210-32c8e74b1e02

                    Number of points_of_contact for node: 2
                    Point-of-contact interface & contact state
                    dpcom  UP  RESTRICTED
                    en0  UP
  2. To list the cluster network statistics for the local node, enter:
lscluster -s
A sample of the output follows:
 # lscluster -s
            Cluster Network Statistics:

            pkts seen: 33861217                   passed: 32052241
            IP pkts: 5778096                      UDP pkts: 1934943
            gossip pkts sent: 1463320             gossip pkts recv: 688759
            cluster address pkts: 0               CP pkts: 1808962
            bad transmits: 5                      bad posts: 4
            Bad transmit (overflow - disk ): 0
            Bad transmit (overflow - tcpsock): 0
            Bad transmit (host unreachable): 0
            Bad transmit (net unreachable): 0
            Bad transmit (network down): 0
            Bad transmit (no connection): 0
            short pkts: 0                         multicast pkts: 1808880
            cluster wide errors: 0                bad pkts: 0
            dup pkts: 0                           dropped pkts: 14
            pkt fragments: 1                      fragments queued: 0
            fragments freed: 0
            pkts pulled: 0                        no memory: 0
            rxmit requests recv: 10               requests found: 3
            requests missed: 7                    ooo pkts: 0
            requests reset sent: 7                reset recv: 0
            remote tcpsock send: 0                tcpsock recv: 0
            rxmit requests sent: 0
            alive pkts sent: 0                    alive pkts recv: 0
            ahafs pkts sent: 2                    ahafs pkts recv: 0
            nodedown pkts sent: 0                 nodedown pkts recv: 1
            socket pkts sent: 62                  socket pkts recv: 54
            cwide pkts sent: 275321               cwide pkts recv: 275318
            socket pkts no space: 0               pkts recv notforhere: 0
            Pseudo socket pkts sent: 0            Pseudo socket pkts recv: 0
            Pseudo socket pkts dropped: 0
            arp pkts sent: 1                      arp pkts recv: 2
            stale pkts recv: 0                    other cluster pkts: 4
            storage pkts sent: 1                  storage pkts recv: 1
            disk pkts sent: 174                   disk pkts recv: 0
            unicast pkts sent: 275364             unicast pkts recv: 82
            out-of-range pkts recv: 0
            IPv6 pkts sent: 0                     IPv6 pkts recv: 122
            IPv6 frags sent: 0                    IPv6 frags recv: 0
            Unhandled large pkts: 0
            mrxmit overflow     : 0               urxmit overflow: 0
  3. To list the interface information for the local node, enter:
lscluster -i
A sample of the output follows:
# lscluster -i
            Network/Storage Interface Query

            Cluster Name:  Sample
            Cluster uuid:  84ee37f4-b124-11e3-8210-32c8e74b1e02
            Number of nodes reporting = 2
            Number of nodes expected = 2

            Node nodeA.ibm.com
            Node uuid = 84088524-b124-11e3-8210-32c8e74b1e02
            Number of interfaces discovered = 2
                    Interface number 1 en0
                            ifnet type = 6 ndd type = 7
                            Mac address length = 6
                            Mac address =  32:C8:E7:4B:1E:02
                            Smoothed rrt across interface = 0
                            Mean Deviation in network rrt across interface = 0
                            Probe interval for interface = 100 ms
                            ifnet flags for interface = 0x1E080863
                            ndd flags for interface = 0x0021081B
                            Interface state  UP
                            Number of regular addresses configured on interface = 1
                            IPv4 ADDRESS: 9.3.199.216  broadcast 9.3.199.255  netmask 255.255.254.0
                            Number of cluster multicast addresses configured on interface = 1
                            IPv4 MULTICAST ADDRESS: 228.3.199.216  broadcast 0.0.0.0  netmask 0.0.0.0
                    Interface number 2 dpcom
                            ifnet type = 0 ndd type = 305
                            Mac address length = 0
                            Mac address =  00:00:00:00:00:00
                            Smoothed rrt across interface = 750
                            Mean Deviation in network rrt across interface = 1500
                            Probe interval for interface = 22500 ms
                            ifnet flags for interface = 0x00000000
                            ndd flags for interface = 0x00000009
                            Interface state  UP  RESTRICTED  AIX_CONTROLLED
                    Pseudo Interface
                            Interface State DOWN

            Node nodeB.ibm.com
            Node uuid = 8492a5a6-b124-11e3-8210-32c8e74b1e02
            Number of interfaces discovered = 2
                    Interface number 1 en0
                            ifnet type = 6 ndd type = 7
                            Mac address length = 6
                            Mac address =  32:C8:EF:AD:7C:02
                            Smoothed rrt across interface = 0
                            Mean Deviation in network rrt across interface = 0
                            Probe interval for interface = 990 ms
                            ifnet flags for interface = 0x1E084863
                            ndd flags for interface = 0x0021081B
                            Interface state  UP
                            Number of regular addresses configured on interface = 1
                            IPv4 ADDRESS: 9.3.199.128  broadcast 9.3.199.255  netmask 255.255.254.0
                            Number of cluster multicast addresses configured on interface = 1
                            IPv4 MULTICAST ADDRESS: 228.3.199.216  broadcast 0.0.0.0  netmask 0.0.0.0
                    Interface number 2 dpcom
                            ifnet type = 0 ndd type = 305
                            Mac address length = 0
                            Mac address =  00:00:00:00:00:00
                            Smoothed rrt across interface = 750
                            Mean Deviation in network rrt across interface = 1500
                            Probe interval for interface = 22500 ms
                            ifnet flags for interface = 0x00000000
                            ndd flags for interface = 0x00000009
                            Interface state  UP  RESTRICTED  AIX_CONTROLLED
                    Pseudo Interface
                            Interface State DOWN
  4. To list the storage interface information for the cluster, enter:
lscluster -d
A sample of the output follows:
# lscluster -d
            Storage Interface Query

            Cluster Name:  Sample
            Cluster uuid:  84ee37f4-b124-11e3-8210-32c8e74b1e02
            Number of nodes reporting = 2
            Number of nodes expected = 2
            Node nodeA.ibm.com
            Node uuid = 84088524-b124-11e3-8210-32c8e74b1e02
            Number of disk discovered = 1
                    hdisk4
                      state : UP
                      uDid  :
                      uUid  : 76c94719-7335-ded6-10e2-77d61ff7998c
                      type  : REPDISK
            Node nodeB.ibm.com
            Node uuid = 8492a5a6-b124-11e3-8210-32c8e74b1e02
            Number of disk discovered = 1
                    hdisk0
                      state : UP
                      uDid  : 382300c4f4f700004c0000000140799c6e39.3105VDASD03AIXvscsi
                      uUid  : 76c94719-7335-ded6-10e2-77d61ff7998c
                      type  : REPDISK
  5. To list the cluster configuration, enter:
lscluster -c
A sample of the output follows:
# lscluster -c
Cluster Name: Sample
Cluster UUID: 8e1d89da-b39d-11e3-91e7-d24dc2d9d309
Number of nodes in cluster = 2
        Cluster ID for node nodeA.ibm.com: 1
        Primary IP address for node nodeA.ibm.com: 9.3.207.132
        Cluster ID for node nodeB.ibm.com: 2
        Primary IP address for node nodeB.ibm.com: 9.3.207.218
Number of disks in cluster = 1
        Disk = hdisk6 UUID = 57208624-fda4-d404-a7c0-8e425e2941a4 cluster_major = 0 cluster_minor = 1
Multicast for site LOCAL: IPv4 228.3.207.132 IPv6 ff05::e403:cf84
Communication Mode: multicast
Local node maximum capabilities: HNAME_CHG, UNICAST, IPV6, SITE
Effective cluster-wide capabilities: HNAME_CHG, UNICAST, IPV6, SITE




CAA (Cluster Aware AIX)

CAA is an AIX feature that gives the AIX kernel the capability to provide specific cluster services, such as heartbeating and node monitoring. Besides this, using Cluster Aware AIX you can easily create a cluster of AIX nodes. CAA does not replace PowerHA; it provides several services for PowerHA. PowerHA 7.1 and RSCT use the built-in AIX clustering capabilities, which simplifies the configuration and management of the cluster.

CAA needs the following ports on all nodes for network communication:
4098 (for multicast)
6181
16191
42112
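A quick sanity check that these ports are free (and, once cluster services are running, that CAA is actually using them) could look like the following; the daemon behind each port varies between AIX levels, so treat this only as a rough check:
# netstat -an | egrep "4098|6181|16191|42112"
# lssrc -s clconfd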


These CAA commands can be used for managing clusters:

lscluster list cluster configuration information
-c cluster configuration
-d disk (storage) configuration
-i interfaces configuration
-m node configuration
mkcluster create a cluster
chcluster change a cluster configuration
rmcluster remove a cluster configuration
clcmd run a command on all nodes of a cluster

----------------
PowerHA uses a shared disk to store Cluster Aware AIX (CAA) information. At least a 512 MB (and no more than 460 GB) shared disk is needed for this cluster repository disk. (This disk cannot be used for application storage or any other purpose.)

CAA stores the repository disk related information in the ODM class CuAt, as part of the cluster information.

# odmget CuAt | grep -p cluster

CuAt:
        name = "cluster0"
        attribute = "node_uuid"
        value = "52a6b8be-fff8-11e5-8e37-56a1a7627864"
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 3

CuAt:
        name = "cluster0"
        attribute = "clvdisk"
        value = "d7063c81-3f64-b5f7-d82b-fa8ed99bfe61"
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 2

If this ODM entry is missing (which can cause a node to fail to join the cluster), it can be repopulated (and the node forced to join the cluster) using the clusterconf command:

clusterconf -r hdiskx        # hdiskx is the repository disk
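After that, it is worth verifying that the node sees the repository disk and has rejoined the cluster, for example:
# lscluster -d        (the repository disk should report state UP and type REPDISK)
# lscluster -m        (the node should report State of node: UP)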