Solaris Unlimited: Sun Cluster 3.1

Defining Clustering:
1. Clustering is a general term for a group of two or more separate servers operating as a single unit.
2. A cluster is a collection of 2 or more systems that work together as a single, continuously available system to provide applications, system resources and data to users.

Cluster general characteristics:
1. Separate server nodes, each booting from its own, non-shared, copy of the operating system.
2. Dedicated h/w interconnects, providing private transport only between the nodes of the same cluster.
3. Multiported storage, providing paths from at least two nodes in the cluster to each physical storage device storing data for the applications running in the cluster.

Sun Cluster is the s/w used to build a cluster environment that provides increased availability and performance.

Note:
Sun Cluster 3.0 supports 8 Nodes
Sun Cluster 3.1 supports 16 Nodes
Sun Cluster 3.2 supports 32 Nodes

HA – High Availability:
1. Clusters are generally marketed as the only way to provide high availability for the applications that run on them.
2. HA can be defined as the minimization of downtime rather than the complete elimination of downtime.

HA standards:
Usually phrased with wording such as “Provides 5 nines availability”. This means 99.999% uptime for the application or about 5 min of downtime per year. One clean server reboot often already exceeds that amount of downtime.
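
The "about 5 min" figure can be checked with a little arithmetic. The following is a minimal Python sketch, assuming a 365-day year; the function name downtime_minutes_per_year is made up for this illustration.

```python
# Minutes in a non-leap year (assumption: 365 days).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent):
    """Return the yearly downtime budget, in minutes, for a given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Compare three, four and five nines.
for availability in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(availability)
    print(f"{availability}% uptime allows {minutes:.2f} min of downtime per year")
```

For 99.999% this works out to roughly 5.26 minutes per year, which is why a single clean reboot can already use up the whole budget.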

How do clusters provide HA?
In the case of any single h/w or s/w failure in the cluster, application services and data are recovered automatically (without human intervention) and quickly (faster than a server reboot). This is done by taking advantage of the redundant servers in the cluster and the redundant server-to-storage paths.

Scalability:
1. Clusters also provide an integrated h/w and s/w environment for scalability.
2. Scalability is defined as the ability to increase application performance by supporting multiple instances of applications on different nodes in the cluster.

Sun cluster hardware environment:
1. Cluster nodes with local disk (Unshared disk)
2. Multi host storage (Shared disk)
3. Cluster interconnect (Provides the channel for inter node communication)
4. Public n/w interface (Used by client systems to access data services on the cluster. Usually the primary n/w interface)
5. Removable media configured as global devices, such as tape and CD-ROM drives.

Note: It is recommended to implement the cluster with the same hardware family, preferably with the same model.

Sun cluster software environment:
1. Solaris 10 OS s/w
2. Sun Cluster 3.1 s/w (Separate s/w, not on the OS CD)
3. Data service application (Sun Cluster Agent CD)
4. Volume management (SVM or VxVM). The exception is when volume management is handled on the storage box itself.

Terminology:
Cluster-node:
1. Is a system that runs both Solaris 10 OS s/w and Sun Cluster s/w.
2. Every node in the cluster is aware when another node joins or leaves the cluster.

Cluster interconnect:
1. This connection is established between all cluster nodes and is used solely by cluster nodes for private and data service communications.
2. This communication path is also known as the private n/w.
3. There are 2 variations of interconnect:
i. Point-to-Point
ii. Junction based (The junctions must be switches, not hubs)

CCR – Cluster Configuration Repository:
1. It’s a private, cluster wide, distributed database for storing information about the configuration and state of the cluster.
2. CCR contains the following information:
i. Cluster & node names
ii. Cluster transport configuration
iii. The names of SVM disk sets or VxVM disk groups
iv. A list of nodes that can master each disk group or disk set
v. Operational parameter values for data services
vi. Paths to data service callback methods
vii. Current cluster status
3. CCR is accessed when error/recovery situations occur or when the general cluster status changes, such as a node leaving or joining the cluster.

Local devices:
1. These devices are accessible only on a node that is running the service and has a physical connection to the cluster. They are not highly available devices.

Global device:
1. These devices are highly available to any node in the cluster. If a node fails while providing access to a global device, the Sun Cluster s/w switches over to another path to the device and redirects access to that path.
2. This access is known as global device access.
3. Provides simultaneous access to the raw (character) device associated with storage devices from all nodes, regardless of where the storage is physically attached.

Device ID (Global Naming Scheme):
1. Each device in the cluster environment is assigned a unique ID.
2. Access to the global device is possible through the unique device ID (DID) assigned by the DID driver instead of the traditional Solaris device names.
3. Eg: /dev/did/(r)dsk/d2s3
4. It’s important to note that DIDs themselves are just a global naming scheme and not a global access scheme.
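
To make the naming scheme concrete, here is a minimal Python sketch that splits a DID path into its parts. It assumes only the /dev/did/(r)dsk/dNsM layout from the example above; the function name parse_did_path is hypothetical and is not a Sun Cluster tool.

```python
import re

# Matches paths such as /dev/did/rdsk/d2s3 (layout assumed from the example above).
DID_PATH = re.compile(r"^/dev/did/(r?)dsk/d(\d+)s(\d+)$")


def parse_did_path(path):
    """Return (is_raw_device, did_instance, slice_number) for a DID slice path."""
    match = DID_PATH.match(path)
    if match is None:
        raise ValueError(f"not a DID slice path: {path}")
    raw_flag, instance, slice_number = match.groups()
    # "rdsk" is the raw (character) device, "dsk" is the block device.
    return raw_flag == "r", int(instance), int(slice_number)


print(parse_did_path("/dev/did/rdsk/d2s3"))
```

The same DID instance number (d2 here) names the same device on every node, which is the whole point of the scheme.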

Global file system:
1. This feature makes file systems simultaneously available on all nodes, regardless of their physical location.
2. UFS, VxFS and HSFS are supported
a. Eg: # mount -o global,logging /dev/md/dataset/dsk/d100 /global/nfs

Data Services/Sun Agents:
1. A data service is a combination of s/w and configuration files that enables an application to run without modification in a Sun Cluster configuration.
2. The s/w of the data service provides the following operations:
i. Starting and stopping the application
ii. Monitoring faults in the application and recovering from these faults
3. Configuration files define the properties of the resource, which represents the application to the RGM (Resource Group Manager)

Data service (Sun Agent) packages are programs that monitor various applications and handle stopping, starting, and migrating them on the cluster.
Once the data services are installed, you are ready to perform application-specific configuration.

Note:
Resource groups usually contain a logical host resource (virtual IP address for the resource), a data storage resource, and one or more application resources.

Resource:
1. In the context of a cluster, the word resource refers to any element above the layer of the cluster framework that can be turned on or off and can be monitored in the cluster.
2. A resource is an instance of a resource type that is defined cluster wide.
NOTE:
Data services utilize several types of resources. Application & n/w resources form a basic unit that is managed by the RGM.

Resource group:
1. A resource group is a collection of resources.
2. Resource groups are either fail over or scalable:
i. Fail over resource group: Is a collection of services that always run together on one node of the cluster at one time and simultaneously fail over or switch over to another node.
ii. Scalable resource group: Describes a collection of services that simultaneously run on one or more nodes.

Resource type:
1. Is a collection of properties that describe an application to the cluster. This collection includes the information about how the application is to be started, stopped and monitored on nodes of the cluster.
2. Eg:
i. The resource type for Sun Cluster HA for NFS is SUNW.nfs
ii. The resource type for Sun Cluster HA for Apache is SUNW.apache

Data service types:
1. Fail over data service
2. Scalable data service
3. Parallel data service

Fail over data service:
a. Fail over is the process by which the cluster automatically relocates an application from a failed primary node to a designated redundant secondary node.
b. Fail over services use a fail over resource group, which is a container for application instance resources & n/w resources (logical hostnames).
c. NOTE: Logical hostnames are IP addresses that can be configured up on one node, and later automatically configured down on the original node and configured up on another node.
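
The relocation described above can be modelled in a few lines. This is a toy Python sketch, not how the RGM is implemented: the class FailoverResourceGroup, its fields, and the names nfs-rg, nfs-lh, node1 and node2 are all hypothetical. It only illustrates that the logical hostname and the application move together to the next healthy node.

```python
from dataclasses import dataclass


@dataclass
class FailoverResourceGroup:
    name: str
    logical_hostname: str  # n/w resource that clients connect to
    application: str       # application instance resource
    node_list: list        # candidate nodes, in order of preference
    primary: str           # node currently hosting every resource in the group

    def fail_over(self, failed_node, healthy_nodes):
        """Move the whole group to the first healthy node in node_list."""
        if failed_node != self.primary:
            return self.primary  # the group was not running on the failed node
        for node in self.node_list:
            if node != failed_node and node in healthy_nodes:
                self.primary = node  # hostname and application move together
                return node
        raise RuntimeError("no healthy node left to host the resource group")


rg = FailoverResourceGroup("nfs-rg", "nfs-lh", "nfs-server", ["node1", "node2"], "node1")
print(rg.fail_over("node1", {"node2"}))
```

Because clients only ever use the logical hostname, they reconnect to the same address after the move and do not need to know which physical node is serving them.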

Scalable data service:
a. This enables application instances to run on multiple nodes simultaneously. This service uses 2 resource groups:
i. Scalable resource group
ii. Fail over resource group
b. The scalable resource group contains the application resources, and the fail over resource group contains the n/w resources (shared addresses) on which the scalable resource depends.

Parallel data service:
a. Sun Cluster systems provide an environment that shares parallel execution of applications across all the nodes of the cluster by using parallel databases.
b. Sun Cluster support for Oracle Parallel Server/Real Application Clusters is a set of packages that, when installed, enables Oracle Parallel Server/Real Application Clusters to run on Sun Cluster nodes.

Cluster Application Modes:
a. Failover
b. Scalability or Load Balancing

Failover:
The application runs on only one node at a time. If the controlling node fails, then the application and any other associated resources are passed to another node that was previously on standby.

Scalability/Load Balancing:
The application runs on more than one node at the same time, without any failover.

Public n/w interface:
Clients connect to the cluster through the public network interfaces.

IPMP – Internet Protocol Network Multipathing:
Software that uses fault monitoring and failover to prevent loss of node availability because of a single network adapter or cable failure. IP network multipathing failover uses sets of network adapters called IP Network Multipathing groups (multipathing groups) to provide redundant connections between a cluster node and the public network. The fault monitoring and failover capabilities work together to ensure availability of resources.

Note:
More than one highly available application can be run on a cluster at a time.

Cluster Topologies:
Cluster configurations come in four types:
a. Clustered pair topology
(Two or more pairs of nodes are each physically connected to some external storage shared by the pair. A 2-node cluster falls under this topology)
b. Pair+N
c. N+1
d. Multiported N*N

Multi host disk storage:
Storage that is connected to more than one node at a time. It provides the following benefits:
i. Global access to file systems
ii. Multiple access paths to file systems and data
iii. Tolerance for single node failures

Cluster Time:

Time on all nodes in a cluster must be synchronized.

Heart Beat:
A periodic message sent across all available cluster interconnect transport paths. Lack of a heartbeat after a specified interval and number of retries might trigger an internal failover of transport communication to another path. Failure of all paths to a cluster member results in the CMM (Cluster Membership Monitor) re-evaluating the cluster quorum.
During normal operation, each node regularly sends out heartbeat information across the private network to let every other node know about its health.
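
The interval-and-retries rule can be sketched as follows. This is a simplified Python model and not the actual transport code: the function names, the 1-second interval and the 3 retries are assumptions chosen for illustration, not Sun Cluster defaults.

```python
def path_is_down(last_heartbeat, now, interval, retries):
    """A path counts as down once `retries` intervals pass with no heartbeat."""
    return now - last_heartbeat > interval * retries


def evaluate_paths(last_seen_by_path, now, interval=1.0, retries=3):
    """Return (usable_paths, membership_check_needed) for one remote node.

    last_seen_by_path maps a transport path name to the time (in seconds)
    at which the last heartbeat arrived on that path.
    """
    usable = [
        path
        for path, last_seen in last_seen_by_path.items()
        if not path_is_down(last_seen, now, interval, retries)
    ]
    # Only when every path is silent does membership need to be re-evaluated.
    return usable, not usable


# One path went quiet: traffic can move to the remaining path.
print(evaluate_paths({"path-a": 9.5, "path-b": 4.0}, now=10.0))
# All paths went quiet: the membership monitor has to step in.
print(evaluate_paths({"path-a": 1.0, "path-b": 2.0}, now=10.0))
```

Losing a single path is handled inside the transport layer, while losing every path to a node is what escalates to a membership decision.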

Split brain:
A condition in which a cluster breaks up into multiple partitions, with each partition forming without knowledge of the existence of any other.

Amnesia:
A condition in which a cluster restarts after a shutdown with stale cluster configuration data (CCR). Eg: On a 2-node cluster with only Node1 operational, if a cluster configuration change occurs on Node1, Node2’s CCR becomes stale (out of date). If the cluster is shut down and then restarted on Node2, an amnesia condition results because of Node2’s stale CCR.

Quorum device:
1. A quorum device is a disk shared by 2 or more nodes that contributes votes used to establish a quorum for the cluster to run.
2. A quorum device acquires quorum vote counts based on the number of node connections to the device.
3. It acquires a maximum vote count of (n-1), where n is the number of nodes connected to the quorum device.
4. Each node is assigned exactly one vote.

NOTE:
1. There must be a majority (more than 50% of all possible votes present) to form a cluster.
2. A single quorum device can be automatically configured by ‘scinstall’ for a 2-node cluster only.
3. All other quorum devices are manually configured after the Sun Cluster s/w installation is complete.
4. (n/2)+1 quorum votes are required, where n is the total number of configured votes. (Similar to the majority rule for SVM state database replicas)
5. Quorum device rules:
i. Must be available to both nodes in a 2-node cluster
ii. Information is maintained globally in the CCR database

Understanding Quorum:
Note:
1. To form a cluster and offer services, the nodes in a cluster must first reach quorum.
2. Each node in a configured cluster has one quorum vote.
3. Never let the number of quorum device votes exceed the number of node votes.
4. Always use the minimum number of quorum devices possible to achieve quorum, or the health of the cluster will depend on the health of the shared disks configured as quorum devices.

The quorum equation states that, to operate, a cluster must have at least the total number of configured votes (TCV) divided by two (Note: Remainders are discarded), plus one.
Q = TCV/2 + 1

If a cluster cannot reach quorum, then it does not form.
The individual cluster nodes do not boot fully, but wait until enough votes are available to reach quorum. If a running cluster loses quorum, the affected nodes panic and try to reboot (assuming auto-boot? is set to true on those nodes).

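Every shared device configured as a quorum device has votes totaling the number of nodes connected to it (TCD) minus one.
Q = TCD - 1

For a 2-node cluster without a quorum device, the quorum required to operate is:

Q = TCV/2 + 1 = (2)/2 + 1 = 2

Votes if one node fails: 1 (less than 2, so the surviving node cannot keep the cluster running)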
When you introduce a quorum device, the equation changes. This Sun Cluster configuration, shown in the following figure, is one of the most common.

Quorum required to operate (remainder discarded):

Q = (2 + 1)/2 + 1 = 2

Votes if one node fails: (1 + 1) = 2, so the surviving node and the quorum device together still reach quorum.
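
The two worked examples can be reproduced with a short Python sketch. It uses only the formulas stated in this section (one vote per node, connected nodes minus one for a quorum device, and TCV/2 + 1 with the remainder discarded); the function names are made up for this illustration.

```python
def quorum_device_votes(connected_nodes):
    """Votes held by a quorum device: connected nodes minus one."""
    return connected_nodes - 1


def quorum_required(total_configured_votes):
    """Q = TCV/2 + 1, with the remainder of the division discarded."""
    return total_configured_votes // 2 + 1


def can_operate(votes_present, total_configured_votes):
    """True when the votes still present reach the required quorum."""
    return votes_present >= quorum_required(total_configured_votes)


# 2-node cluster without a quorum device: one vote per node.
tcv = 2
print(quorum_required(tcv), can_operate(1, tcv))

# 2-node cluster with one quorum device connected to both nodes.
tcv = 2 + quorum_device_votes(2)
# The surviving node keeps its own vote and holds the quorum device vote.
print(quorum_required(tcv), can_operate(1 + 1, tcv))
```

In the first case the surviving node holds 1 of the 2 required votes and cannot continue; in the second it holds 2 of the 2 required votes, which is why a 2-node cluster needs a quorum device.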

Understanding SCSI reservations:
With SCSI-3 capable storage where there are more than 2 paths to the storage, this reservation is accomplished by each available cluster node registering a key by writing it to the disk. The controlling node is tagged as the owner, and the other nodes are tagged as capable of becoming the owner. If a node fails, the remaining nodes remove the failed node’s key from the disk, and it is no longer eligible to own the quorum device. Once the failed node recovers and rejoins the cluster, its key is re-registered.
In the event that the controlling node leaves the cluster, the remaining eligible nodes compete to gain control of the quorum devices.

Note:
The /globaldevices file system is later renamed to /global/.devices/node@nodeid (where nodeid represents the number that is assigned to a node when it becomes a cluster member).

Creating an Administrative Console:
To use a machine outside the cluster as an administrative console, add the SUNWccon and SUNWscman packages to the administrative machine. These include the programs ctelnet, cconsole, crlogin and others. They allow the administrator to type commands into one window and have them echoed in multiple cluster node windows.

Configuring the cluster:
When the first cluster node is installed, the cluster enables installmode and is not completely configured. With installmode enabled, the first cluster node sponsors additional nodes until quorum can be reached. Recall that in a 2-node cluster, a quorum device must be configured before the cluster will operate.

Important cluster commands:
cconsole, ctelnet, crlogin - multi-window, multi-machine, remote console, login and telnet commands
sccheck - check for and report on vulnerable Sun Cluster configurations
scconf - update the Sun Cluster software configuration
scdidadm - global device identifier configuration and administration utility wrapper
scinstall - install Sun Cluster software and initialize new cluster nodes
scrgadm - manage registration and unregistration of resource types, resource groups, and resources
scsetup - interactive cluster configuration tool
scshutdown - shut down a cluster
scstat - monitor the status of Sun Cluster
scswitch - perform ownership and state changes of resource groups and disk device groups in a Sun Cluster configuration

Tuesday, May 10, 2011
