IBM PowerHA 7.1 heartbeat over SAN
Configuration steps
Talor Holloway (talor@adventone.com), Senior Consultant, Advent One

Summary: This article describes the prerequisite configuration and the steps required to use the IBM® PowerHA® 7.1 storage area network (SAN) heartbeat. In PowerHA 7.1, the disk heartbeat has been replaced by a SAN heartbeat, which should be included in a resilient PowerHA architecture.

Date: 21 Jan 2013
Level: Introductory


Introduction
IBM PowerHA SystemMirror for AIX is clustering software that allows a resource or group of resources (an application) to be automatically or manually moved to another IBM AIX® system in the event of a system failure. Heartbeat and failure detection are performed over all interfaces available to the cluster. These can be network interfaces, Fibre Channel (FC) adapter interfaces, and the Cluster Aware AIX (CAA) repository disk. In PowerHA 6.1 and earlier versions, heartbeat over FC adapter interfaces was not supported; instead, a SAN-attached heartbeat disk was made available to both nodes and used for heartbeat and failure detection. In PowerHA 7.1, the use of heartbeat disks is no longer supported, and configuring heartbeat over SAN is the supported method to use in their place. For heartbeat over SAN to take place, the FC adapter in the AIX system needs to be configured to act as both a target and an initiator. In most SAN environments, an initiator device belongs to the server and is typically a host bus adapter (HBA), while a target is typically a storage device, such as a storage controller or a tape device. The IBM AIX 7.1 Information Center contains a list of FC adapters that support target mode; these adapters can be used for heartbeat over SAN.

Overview
In this article, I have provided simple examples of how to set up the SAN heartbeat in two scenarios: the first with two AIX systems using physical I/O, and the second with two AIX logical partitions (LPARs) using Virtual I/O Server and N-Port ID Virtualization (NPIV). In each example, we have a two-node PowerHA 7.1 cluster, with each node residing on a different IBM POWER® processor-based server. This article does not cover how to configure shared storage, advanced network communications, or application controllers. It is a practical example of how to build a very simple cluster and get the SAN heartbeat working.

Requirements
The following minimum requirements must be met to ensure that we can create the cluster and configure the SAN heartbeat:

- AIX 6.1 or preferably AIX 7.1 needs to be installed on both AIX systems, using the latest technology level and service pack.
- PowerHA 7.1 needs to be installed on both AIX systems, using the latest service pack.
- The FC adapters in the servers must support target mode and be on a supported level of firmware. If NPIV is in use, they must be 8 Gbps adapters supporting NPIV. NPIV support is required for Scenario 2 in this article.
- If Virtual I/O Server is in use, the VIOS code should be at the latest service pack of Virtual I/O Server 2.2. This is required for Scenario 2 in this article.
- If NPIV is in use, the fabric switches must have NPIV support enabled. This is required for Scenario 2 in this article.
- There must be a logical unit number (LUN) allocated to both AIX systems for use as the CAA repository disk.
- There must be a LUN allocated to both AIX systems for use as shared storage for the cluster.
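If you want to confirm these levels and the adapter capability up front, the following commands can be run from AIX. This is a minimal sketch only; fcs0 is used as an example adapter name, and cluster.es.server.rte is just one of the PowerHA filesets you may want to check.

# Show the AIX technology level and service pack
oslevel -s

# Show the installed PowerHA base fileset and its level
lslpp -l cluster.es.server.rte

# Show the firmware (microcode) level of an FC adapter
lsmcode -cd fcs0

# List the allowable values of the tme attribute; adapters that do not
# support target mode will not have this attribute at all
lsattr -Rl fcs0 -a tme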

Scenario 1: Two nodes using physical I/O

In this scenario, we have a very simple environment where there are two POWER processor-based systems, each with a single instance of AIX. These systems are in a PowerHA cluster and connected through redundant SAN fabrics to shared storage. The following figure gives a high-level overview of this scenario.

Figure 1. Overview of scenario 1

SAN zoning requirements

Before the cluster can be created, SAN zoning is required. You need to configure the following two types of zones:

- Storage zones
- Heartbeat zones

To configure the zoning, first log in to each of the nodes, verify that the FC adapters are available, and capture the worldwide port number (WWPN) of each adapter port, as shown in the following example.

root@ha71_node1:/home/root# lsdev -Cc adapter |grep fcs
fcs0 Available 02-T1 8Gb PCI Express Dual Port FC Adapter
fcs1 Available 03-T1 8Gb PCI Express Dual Port FC Adapter
fcs2 Available 02-T1 8Gb PCI Express Dual Port FC Adapter
fcs3 Available 03-T1 8Gb PCI Express Dual Port FC Adapter
root@ha71_node1:/home/root# for i in `lsdev -Cc adapter |awk '{print $1}' |grep fcs`; do print ${i} - $(lscfg -vl $i |grep Network |awk '{print $2}' |cut -c21-50| sed 's/../&:/g;s/:$//'); done
fcs0 - 10:00:00:00:C9:CC:49:44
fcs1 - 10:00:00:00:C9:CC:49:45
fcs2 - 10:00:00:00:C9:C8:85:CC
fcs3 - 10:00:00:00:C9:C8:85:CD
root@ha71_node1:/home/root#

root@ha71_node2:/home/root# lsdev -Cc adapter |grep fcs
fcs0 Available 02-T1 8Gb PCI Express Dual Port FC Adapter
fcs1 Available 03-T1 8Gb PCI Express Dual Port FC Adapter
fcs2 Available 02-T1 8Gb PCI Express Dual Port FC Adapter
fcs3 Available 03-T1 8Gb PCI Express Dual Port FC Adapter
root@ha71_node2:/home/root# for i in `lsdev -Cc adapter |awk '{print $1}' |grep fcs`; do print ${i} - $(lscfg -vl $i |grep Network |awk '{print $2}' |cut -c21-50| sed 's/../&:/g;s/:$//'); done
fcs0 - 10:00:00:00:C9:A9:2E:96
fcs1 - 10:00:00:00:C9:A9:2E:97
fcs2 - 10:00:00:00:C9:CC:2A:7C
fcs3 - 10:00:00:00:C9:CC:2A:7D
root@ha71_node2:/home/root#

After the WWPNs are known, zoning can be performed on the fabric switches. Zone the HBA adapters to the storage ports on the storage controller used for the shared storage, and also create zones that can be used for the heartbeat. The following diagram gives an overview of how the heartbeat zones should be created.

Figure 2. Overview of creating heartbeat zones (scenario 1)

Ensure that you zone one port from each FC adapter on the first node to a port on each FC adapter on the second node.
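The exact zoning commands depend on the fabric switch vendor and your naming conventions. As an illustration only, the following is a hypothetical Brocade Fabric OS sketch that creates the heartbeat zone between fcs0 on each node on one fabric, using the WWPNs captured above; the alias, zone, and configuration names are invented, and an equivalent zone for the fcs2 pair would be created on the second fabric.

alicreate "ha71_node1_fcs0", "10:00:00:00:c9:cc:49:44"
alicreate "ha71_node2_fcs0", "10:00:00:00:c9:a9:2e:96"
zonecreate "hb_ha71_node1_node2_fabA", "ha71_node1_fcs0; ha71_node2_fcs0"
cfgadd "fabricA_cfg", "hb_ha71_node1_node2_fabA"
cfgsave
cfgenable "fabricA_cfg"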

Device configuration in AIX

After the zoning is complete, the next step is to enable target mode on each adapter device in AIX. In the SAN zoning example, the adapters fcs0 and fcs2 on each node have been used for the SAN heartbeat zones. For target mode to be enabled, both dyntrk (dynamic tracking) and fast_fail need to be enabled on the fscsi device, and target mode needs to be enabled on the fcs device. This needs to be performed on each adapter that has been used for a heartbeat zone. To enable target mode, perform the following steps on both nodes.

root@ha71_node1:/home/root# rmdev -l fcs0 -R
fscsi0 Defined
fcs0 Defined
root@ha71_node1:/home/root# rmdev -l fcs2 -R
fscsi2 Defined
fcs2 Defined
root@ha71_node1:/home/root# chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail
fscsi0 changed
root@ha71_node1:/home/root# chdev -l fscsi2 -a dyntrk=yes -a fc_err_recov=fast_fail
fscsi2 changed
root@ha71_node1:/home/root# chdev -l fcs0 -a tme=yes
fcs0 changed
root@ha71_node1:/home/root# chdev -l fcs2 -a tme=yes
fcs2 changed
root@ha71_node1:/home/root# cfgmgr
root@ha71_node1:/home/root#

root@ha71_node2:/home/root# rmdev -l fcs0 -R
fscsi0 Defined
fcs0 Defined
root@ha71_node2:/home/root# rmdev -l fcs2 -R
fscsi2 Defined
fcs2 Defined
root@ha71_node2:/home/root# chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail
fscsi0 changed
root@ha71_node2:/home/root# chdev -l fscsi2 -a dyntrk=yes -a fc_err_recov=fast_fail
fscsi2 changed
root@ha71_node2:/home/root# chdev -l fcs0 -a tme=yes
fcs0 changed
root@ha71_node2:/home/root# chdev -l fcs2 -a tme=yes
fcs2 changed
root@ha71_node2:/home/root# cfgmgr
root@ha71_node2:/home/root#

If the devices are busy, make the changes with the -P option at the end of each command and restart the server. This causes the change to be applied at the next start of the server.
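If you prefer to script this step rather than type each command, the same changes can be applied in a small loop. This is only a sketch, and it assumes that fcs0 and fcs2 are the adapters used for the heartbeat zones on every node, as in the example above.

# Run as root on each node; adjust the adapter list to match your heartbeat zones
for fcs in fcs0 fcs2
do
    num=${fcs#fcs}                        # adapter instance number, for example 0
    rmdev -l ${fcs} -R                    # move the adapter and its children to Defined
    chdev -l fscsi${num} -a dyntrk=yes -a fc_err_recov=fast_fail
    chdev -l ${fcs} -a tme=yes            # enable target mode
done
cfgmgr                                    # configure the adapters again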

The target mode setting can be verified by checking the attributes of the fscsi and fcs devices. This should be checked for each of the fcs0 and fcs2 adapters on both nodes. The following example shows how to check fscsi0 and fcs0 on one of the nodes.

root@ha71_node1:/home/root# lsattr -El fscsi0
attach       switch     How this adapter is CONNECTED         False
dyntrk       yes        Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail  FC Fabric Event Error RECOVERY Policy True
scsi_id      0xbc0e0a   Adapter SCSI ID                       False
sw_fc_class  3          FC Class for Fabric                   True
root@ha71_node1:/home/root# lsattr -El fcs0 |grep tme
tme          yes        Target Mode Enabled                   True
root@ha71_node1:/home/root#

After the target mode is enabled, we should next look for the available sfwcomm devices. These devices are used for the PowerHA error detection and heartbeat over SAN. Check whether these devices are available on both nodes.

root@ha71_node1:/home/root# lsdev -C |grep sfwcomm
sfwcomm0 Available 02-T1-01-FF Fibre Channel Storage Framework Comm
sfwcomm1 Available 03-T1-01-FF Fibre Channel Storage Framework Comm
sfwcomm2 Available 02-T1-01-FF Fibre Channel Storage Framework Comm
sfwcomm3 Available 03-T1-01-FF Fibre Channel Storage Framework Comm
root@ha71_node1:/home/root#

Scenario 2: Two nodes using Virtual I/O Server

In this scenario, we have a slightly more complex environment where there are two POWER processor-based systems, each with dual Virtual I/O Servers and client LPARs using NPIV. These LPARs are in a PowerHA cluster and connected using redundant SAN fabrics to shared storage. When VIOS is used, what differs from the physical I/O scenario is that the FC ports of the Virtual I/O Servers must be zoned together. A private virtual LAN (VLAN) with the port VLAN ID of 3358 (3358 is the only VLAN ID that will work) is then used to carry the heartbeat communication over the hypervisor from the Virtual I/O Server to the client LPAR, which is our PowerHA node.

In this case, the following high-level steps are required:

1. Zone the VIOS ports together.
2. Turn on target mode on the VIOS FC adapters.
3. Configure the private 3358 VLAN for heartbeat traffic.
4. Configure the PowerHA cluster.

The following figure gives a high-level overview of this scenario.

Figure 3. Overview of Scenario 2

SAN zoning requirements

Before the cluster can be created, SAN zoning is required. You need to configure the following two types of zones:

- Storage zones
  - Contain the LPARs' virtual WWPNs
  - Contain the storage controller's WWPNs
- Heartbeat zones (contain the VIOS physical WWPNs)
  - The VIOS on each machine should be zoned together.
  - The virtual WWPNs of the client LPARs should not be zoned together.

When performing the zoning, log in to each of the VIOS (both VIOS on each managed system), verify that the FC adapters are available, and capture the WWPN information for zoning. The following example shows how to perform this step on one VIOS.

$ lsdev -type adapter |grep fcs
fcs0 Available 02-T1 8Gb PCI Express Dual Port FC Adapter
fcs1 Available 03-T1 8Gb PCI Express Dual Port FC Adapter
fcs2 Available 02-T1 8Gb PCI Express Dual Port FC Adapter
fcs3 Available 03-T1 8Gb PCI Express Dual Port FC Adapter
$ for i in `lsdev -type adapter |awk '{print $1}' |grep fcs`; do print ${i} - $(lsdev -dev $i -vpd |grep Network |awk '{print $2}' |sed 's/Address.//g' | sed 's/../&:/g;s/:$//'); done
fcs0 - 10:00:00:00:C9:B7:65:32
fcs1 - 10:00:00:00:C9:B7:65:33
fcs2 - 10:00:00:00:C9:B7:63:60
fcs3 - 10:00:00:00:C9:B7:63:61

The virtual WWPNs also need to be captured from the client LPARs for the storage zones. The following example shows how to perform this step on both nodes.

root@ha71_node1:/home/root# lsdev -Cc adapter |grep fcs
fcs0 Available 02-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 03-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 02-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 03-T1 Virtual Fibre Channel Client Adapter
root@ha71_node1:/home/root# for i in `lsdev -Cc adapter |awk '{print $1}' |grep fcs`; do print ${i} - $(lscfg -vl $i |grep Network |awk '{print $2}' |cut -c21-50| sed 's/../&:/g;s/:$//'); done
fcs0 - c0:50:76:04:f8:f6:00:40
fcs1 - c0:50:76:04:f8:f6:00:42
fcs2 - c0:50:76:04:f8:f6:00:44
fcs3 - c0:50:76:04:f8:f6:00:46
root@ha71_node1:/home/root#

root@ha71_node2:/home/root# lsdev -Cc adapter |grep fcs
fcs0 Available 02-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 03-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 02-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 03-T1 Virtual Fibre Channel Client Adapter
root@ha71_node2:/home/root# for i in `lsdev -Cc adapter |awk '{print $1}' |grep fcs`; do print ${i} - $(lscfg -vl $i |grep Network |awk '{print $2}' |cut -c21-50| sed 's/../&:/g;s/:$//'); done
fcs0 - C0:50:76:04:F8:F6:00:00
fcs1 - C0:50:76:04:F8:F6:00:02
fcs2 - C0:50:76:04:F8:F6:00:04
fcs3 - C0:50:76:04:F8:F6:00:06
root@ha71_node2:/home/root#
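If you prefer not to log in to every client LPAR, the virtual WWPNs can also be read from the Hardware Management Console (HMC) command line. The following is a sketch only; the managed system name (Server-8205-E6B-SN10AB12C) and LPAR name (ha71_node1) are made-up examples, and the output format can vary between HMC levels.

# List the virtual FC adapters (including both WWPNs per adapter) defined
# in the partition profile of the client LPAR
lssyscfg -r prof -m Server-8205-E6B-SN10AB12C \
    --filter "lpar_names=ha71_node1" -F virtual_fc_adapters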

After the WWPNs are known, zoning can be performed on the fabric switches. Zone the LPARs' virtual WWPNs to the storage ports on the storage controller used for the shared storage, and also create zones containing the VIOS physical ports, which will be used for the heartbeat. The following figure gives an overview of how the heartbeat zones should be created.

Figure 4. Overview of creating heartbeat zones (scenario 2)

Virtual I/O Server FC adapter configuration

After the zoning is complete, the next step is to enable target mode on each adapter device in each VIOS. In the SAN zoning example, the fcs0 and fcs2 adapters on each VIOS have been used for the SAN heartbeat zones. For target mode to be enabled, both dyntrk (dynamic tracking) and fast_fail need to be enabled on the fscsi device, and target mode needs to be enabled on the fcs device. This needs to be performed on each adapter that has been used for a heartbeat zone. To enable target mode, perform the following steps on both VIOS on each managed system.

$ chdev -dev fscsi0 -attr dyntrk=yes fc_err_recov=fast_fail -perm
fscsi0 changed
$ chdev -dev fcs0 -attr tme=yes -perm
fcs0 changed
$ chdev -dev fscsi2 -attr dyntrk=yes fc_err_recov=fast_fail -perm
fscsi2 changed
$ chdev -dev fcs2 -attr tme=yes -perm
fcs2 changed
$ shutdown -restart

A restart of each VIOS is required; it is therefore strongly recommended to modify one VIOS at a time.
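After each VIOS comes back from the restart, the settings can be confirmed from the padmin shell before moving on to the second VIOS. A minimal sketch, assuming the same fcs0/fscsi0 and fcs2/fscsi2 devices used above:

$ lsdev -dev fcs0 -attr tme
$ lsdev -dev fcs2 -attr tme
$ lsdev -dev fscsi0 -attr dyntrk
$ lsdev -dev fscsi0 -attr fc_err_recov

Each command should report the value set earlier (yes or fast_fail).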

Virtual I/O Server network configuration

When VIOS is in use, the physical FC adapters belonging to the VIOS are zoned together; however, for the client LPAR (HA node) connectivity, a private VLAN must be configured to provide this. The VLAN ID must be 3358 for this to work. The following figure describes the virtual Ethernet setup.

Figure 5. Virtual Ethernet setup

First, log in to each of the VIOS, and add an additional VLAN to each shared Ethernet bridge adapter. This provides the VIOS connectivity to the 3358 VLAN, and connectivity between the VIOS on each managed system. The following figure shows how this additional VLAN can be added to the bridge adapter.

Figure 6. Adding the additional VLAN to the bridge adapter
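After the additional VLAN has been added, you can confirm from each VIOS that the shared Ethernet (bridge) adapter is now carrying VLAN 3358. This is a sketch only; ent6 is a made-up example device name for the shared Ethernet adapter.

$ entstat -all ent6 | grep -i vlan

The output should list 3358 among the VLAN tag IDs of the underlying trunk adapter.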

Next, create a virtual Ethernet adapter on the client partition, and set the port VLAN ID to 3358. This provides the client LPAR connectivity to the 3358 VLAN.

Figure 7. Creating a virtual Ethernet adapter on the client partition

From AIX, run the cfgmgr command to pick up the virtual Ethernet adapter. Do not put an IP address on this interface.
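After running cfgmgr, you can confirm that AIX discovered the new adapter and that it carries the 3358 port VLAN ID. A quick sketch, assuming the new adapter is discovered as ent1 (hypothetical):

# Confirm the new virtual Ethernet adapter is present
lsdev -Cc adapter | grep ent

# Check the port VLAN ID of the new adapter
entstat -d ent1 | grep "Port VLAN ID"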

After this is complete, the SAN heartbeat is ready for use, and we can move on to creating our PowerHA cluster.

PowerHA cluster configuration

The first step, before creating the cluster, is to perform the following tasks:

- Edit /etc/environment and add /usr/es/sbin/cluster/utilities and /usr/es/sbin/cluster/ to the $PATH variable.
- Populate /etc/cluster/rhosts.
- Populate /usr/es/sbin/cluster/netmon.cf.
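A minimal sketch of these preparation steps is shown below. The cluster node IP addresses are the ones used later in this example; the gateway address placed in netmon.cf is a made-up example, and in practice you would list one or more addresses outside the cluster that each node should be able to ping.

# /etc/environment - append the cluster paths to the PATH definition, for example:
#   PATH=/usr/bin:/etc:/usr/sbin:...:/usr/es/sbin/cluster:/usr/es/sbin/cluster/utilities

# /etc/cluster/rhosts - one entry per cluster node, then refresh clcomd
echo "172.16.5.251" >> /etc/cluster/rhosts
echo "172.16.5.252" >> /etc/cluster/rhosts
refresh -s clcomd

# /usr/es/sbin/cluster/netmon.cf - address(es) to ping for network health checks
echo "172.16.5.1" >> /usr/es/sbin/cluster/netmon.cf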

After this is complete, the cluster can be created using smitty sysmirror or on the command line. In the following example, I have created a simple two-node cluster called ha71_cluster.

root@ha71_node1:/home/root # clmgr add cluster ha71_cluster NODES="ha71_node1 ha71_node2"
Warning: to complete this configuration, a repository disk must be defined.

Cluster Name: ha71_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: None
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE ha71_node1:
    Network net_ether_01
        ha71_node1    172.16.5.251
NODE ha71_node2:
    Network net_ether_01
        ha71_node2    172.16.5.252

No resource groups defined

Initializing..
Gathering cluster information, which may take a few minutes...
Processing...

Retrieving data from available cluster nodes. This could take a few minutes.

Start data collection on node ha71_node1
Start data collection on node ha71_node2
Collector on node ha71_node1 completed
Collector on node ha71_node2 completed
Data collection complete

Completed 10 percent of the verification checks
Completed 20 percent of the verification checks
Completed 30 percent of the verification checks
Completed 40 percent of the verification checks
Completed 50 percent of the verification checks
Completed 60 percent of the verification checks
Completed 70 percent of the verification checks
Completed 80 percent of the verification checks
Completed 90 percent of the verification checks
Completed 100 percent of the verification checks
IP Network Discovery completed normally

Current cluster configuration:

Discovering Volume Group Configuration

root@ha71_node1:/home/root #

After creating the cluster definition, the next step is to check whether there is a free disk on each node that can be used for the CAA repository.

root@ha71_node1:/home/root# lsdev -Cc disk
hdisk0 Available 00-00-01 IBM MPIO FC 2107
hdisk1 Available 00-00-01 IBM MPIO FC 2107
root@ha71_node1:/home/root# lspv
hdisk0          000966fa5e41e427          rootvg          active
hdisk1          000966fa08520349          None
root@ha71_node1:/home/root#

root@ha71_node2:/home/root# lsdev -Cc disk
hdisk0 Available 00-00-01 IBM MPIO FC 2107
hdisk1 Available 00-00-01 IBM MPIO FC 2107
root@ha71_node2:/home/root# lspv
hdisk0          000966fa46c8abcb          rootvg          active
hdisk1          000966fa08520349          None
root@ha71_node2:/home/root#

From the above output, it is clear that hdisk1 is free on each node (it belongs to no volume group), and the matching physical volume ID shows that it is the same shared LUN on both nodes, so it can be used for the repository.

Next, modify the cluster definition to include the cluster repository disk. This can be performed using smitty hacmp or on the command line. The following example shows how to perform this step on the command line, using hdisk1 as the repository disk.

root@ha71_node1:/home/root # clmgr modify cluster ha71_cluster REPOSITORY=hdisk1

Cluster Name: ha71_cluster
Cluster Connection Authentication Mode: Standard
Cluster Message Authentication Mode: None
Cluster Message Encryption: None
Use Persistent Labels for Communication: No
Repository Disk: hdisk1
Cluster IP Address:
There are 2 node(s) and 1 network(s) defined
NODE ha71_node1:
    Network net_ether_01
        ha71_node1    172.16.5.251
NODE ha71_node2:
    Network net_ether_01
        ha71_node2    172.16.5.252

No resource groups defined

Current cluster configuration:

root@ha71_node1:/home/root #

The next step is to verify and synchronize the cluster configuration. This can be performed using smitty hacmp or on the command line. The following example shows how to synchronize the cluster topology and resources on the command line.

root@ha71_node1:/home/root # cldare -rt
Timer object autoclverify already exists

Verification to be performed on the following:
    Cluster Topology
    Cluster Resources

Retrieving data from available cluster nodes. This could take a few minutes.

Start data collection on node ha71_node1
Start data collection on node ha71_node2
Collector on node ha71_node2 completed
Collector on node ha71_node1 completed
Data collection complete

Verifying Cluster Topology...

Completed 10 percent of the verification checks

WARNING: Multiple communication interfaces are recommended for networks that
use IP aliasing in order to prevent the communication interface from becoming a
single point of failure. There are fewer than the recommended number of
communication interfaces defined on the following node(s) for the given network(s):

    Node:                                Network:
    ----------------------------------   ----------------------------------
    ha71_node1                           net_ether_01
    ha71_node2                           net_ether_01

Completed 20 percent of the verification checks
Completed 30 percent of the verification checks

Saving existing /var/hacmp/clverify/ver_mping/ver_mping.log to
/var/hacmp/clverify/ver_mping/ver_mping.log.bak
Verifying clcomd communication, please be patient.
Verifying multicast communication with mping.

Verifying Cluster Resources...

Completed 40 percent of the verification checks
Completed 50 percent of the verification checks
Completed 60 percent of the verification checks
Completed 70 percent of the verification checks
Completed 80 percent of the verification checks
Completed 90 percent of the verification checks
Completed 100 percent of the verification checks

... etc ...

Committing any changes, as required, to all available nodes...
Adding any necessary PowerHA SystemMirror entries to /etc/inittab and /etc/rc.net for IPAT on node ha71_node1.
Adding any necessary PowerHA SystemMirror entries to /etc/inittab and /etc/rc.net for IPAT on node ha71_node2.

Verification has completed normally.

root@ha71_node1:/home/root #

Now that a basic cluster has been configured, the last step is to verify that the SAN heartbeat is up. The lscluster -i command displays the cluster interfaces and their status. The sfwcom (Storage Framework Communication) interface is the SAN heartbeat. In the following example, we check this from one of the nodes to ensure that the SAN heartbeat is up.

root@ha71_node1:/home/root # lscluster -i sfwcom
Network/Storage Interface Query

Cluster Name:  ha71_cluster
Cluster uuid:  7ed966a0-f28e-11e1-b39b-62d58cd52c04
Number of nodes reporting = 2
Number of nodes expected = 2

Node ha71_node1
Node uuid = 7ecf4e5e-f28e-11e1-b39b-62d58cd52c04
Number of interfaces discovered = 3
        Interface number 3 sfwcom
                ifnet type = 0 ndd type = 304
                Mac address length = 0
                Mac address = 0.0.0.0.0.0
                Smoothed rrt across interface = 0
                Mean Deviation in network rrt across interface = 0
                Probe interval for interface = 100 ms
                ifnet flags for interface = 0x0
                ndd flags for interface = 0x9
                Interface state UP
root@ha71_node1:/home/root #

The interface state UP shows that the SAN heartbeat is working, which is good news.
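Beyond lscluster -i, a couple of other CAA views are useful for confirming the overall cluster state; a brief sketch:

# List the cluster nodes and their state as seen by CAA
lscluster -m

# List the cluster storage interfaces, including the repository disk
lscluster -d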

The remaining steps for cluster configuration, such as configuring shared storage, mirror pools, file collections, application controllers, monitors, and so on, are not covered in this article.
