Friday, November 1, 2013

Data Storage High Availability

Storage High Availability with NetApp


As customers and service providers consolidate more and more applications and workloads onto shared storage infrastructures, it is challenging to maintain an infrastructure that is “always available.” Increased workloads and utilization drive a higher duty cycle, putting pressure on the storage architecture. A broader group of users and applications requires increased coordination of downtime for storage management and hardware upgrades/refreshes to prevent unintended outages. And because many different users, groups, or customers with different needs may be using the shared storage infrastructure at the same time, the impact of a failure proportionally increases.


NetApp aims to reduce the cost and complexity of protecting your IT environment from downtime and data loss. NetApp storage is designed with high availability, flexibility, and efficiency in mind. A suite of capabilities within the NetApp FAS platform protects against component failures and even entire system/data center failures to keep your critical business operations running. These functions work in tandem with NetApp storage efficiency technologies to reduce capacity and operational costs so that you can provide high availability (HA) for more of your environment.
An HA pair controller configuration provides data availability by transferring the data service of an unavailable controller to the surviving partner. The transfer is often transparent to end users and applications, and the data service is quickly resumed with no detectable interruption to business operation.
Alternate Control Path (ACP) provides out-of-band management on disk shelves that use serial-attached SCSI (SAS) technology. ACP is completely separate from the SAS data path and enhances data availability by enabling the storage controller to nondisruptively and automatically reset a misbehaving component.

ACP is a well-crafted and logical piece of NetApp filer engineering. It works simply by keeping the data path and the out-of-band management path on separate lanes. The logical layout is as follows:


Controller-to-stack connections: Each storage system controller is connected to each stack of disk shelves through a dedicated Ethernet port:

Controller 1/A always connects to the top shelf IOM A square port in a stack.
Controller 2/B always connects to the bottom shelf IOM B circle port in a stack.
 
In essence, you daisy-chain all IOMs of all shelves and then connect the two remaining ports to the two controllers. If you have a single controller, you connect just one port. How exactly you daisy-chain does not really matter, but keeping the suggested order makes the setup easier to support.
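
The cabling rule above can be sketched as a small Python function. This is only an illustrative model: the function name `acp_chain`, the port labels, and the particular chain order (every IOM A from top to bottom, then every IOM B) are assumptions made for the example, not a documented NetApp procedure.

```python
def acp_chain(num_shelves, controllers=("A", "B")):
    """List the ACP links for one stack as (from, to) pairs.

    Hypothetical model: the chain order used here is an assumption.
    """
    if num_shelves < 1:
        raise ValueError("a stack needs at least one shelf")
    if len(controllers) not in (1, 2):
        raise ValueError("expected one or two controllers")

    # Assumed order: IOM A of every shelf (top to bottom), then IOM B.
    ioms = [f"shelf{n}/IOM-A" for n in range(1, num_shelves + 1)]
    ioms += [f"shelf{n}/IOM-B" for n in range(1, num_shelves + 1)]

    # The first controller attaches to the top shelf's IOM A.
    links = [(f"controller-{controllers[0]}", ioms[0])]
    # Daisy-chain every IOM to the next one in the list.
    links += list(zip(ioms, ioms[1:]))
    # A second controller, if present, attaches to the bottom shelf's IOM B.
    if len(controllers) == 2:
        links.append((ioms[-1], f"controller-{controllers[1]}"))
    return links


for src, dst in acp_chain(2):
    print(f"{src} -> {dst}")
```

With a single controller, `acp_chain(2, controllers=("A",))` leaves the far end of the chain unconnected, which matches the "connect just one port" case.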



Redundancy is the essence of this design. With that in mind, let us look at HA pairs and the benefits they provide.

What is an HA Pair
An HA pair is two storage systems (nodes) whose controllers are connected to each other either directly or, in the case of a fabric-attached MetroCluster, through switches and FC-VI interconnect adapters.
You can configure the HA pair so that each node in the pair shares access to a common set of storage, subnets, and tape drives, or each node can own its own distinct set of storage. The nodes are connected to each other through an NVRAM adapter, or, in the case of systems with two controllers in a single chassis, through an internal interconnect. This allows one node to serve data that resides on the disks of its failed partner node. Each node continually monitors its partner and mirrors the data in the other’s nonvolatile memory (NVRAM or NVMEM).
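
To make the monitoring and mirroring idea concrete, here is a toy Python model of two partner nodes. It is a sketch under stated assumptions, not how Data ONTAP is implemented: the `Node` class, `make_pair`, and `check_partner` are hypothetical names, and the nonvolatile memory is reduced to a plain list of pending writes.

```python
class Node:
    """Hypothetical stand-in for one controller in an HA pair."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.partner = None
        self.nvram = []           # this node's own pending writes
        self.partner_mirror = []  # mirrored copy of the partner's writes
        self.serving = {name}     # whose data this node currently serves

    def write(self, data):
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        self.nvram.append(data)
        # Mirror the write into the partner's nonvolatile memory.
        if self.partner is not None and self.partner.alive:
            self.partner.partner_mirror.append(data)

    def check_partner(self):
        """Take over if the partner is down; return the writes to replay."""
        if not self.partner.alive and self.partner.name not in self.serving:
            self.serving.add(self.partner.name)
            return list(self.partner_mirror)
        return []


def make_pair(name_a, name_b):
    a, b = Node(name_a), Node(name_b)
    a.partner, b.partner = b, a
    return a, b


node1, node2 = make_pair("node1", "node2")
node1.write("block-17")
node1.alive = False             # simulate a controller failure
replay = node2.check_partner()  # node2 notices and takes over
print(sorted(node2.serving), replay)
```

Because every write was mirrored before the failure, the surviving node already holds what it needs to continue serving its partner's data.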

Benefits of HA Pair
HA pairs provide fault tolerance and the ability to perform nondisruptive upgrades and maintenance.
Configuring storage systems in an HA pair provides the following benefits:
• Fault tolerance
When one node fails or becomes impaired, a takeover occurs, and the partner node continues to serve the failed node’s data.
• Nondisruptive software upgrades
When you halt one node and allow takeover, the partner node continues to serve data for the halted node while you upgrade the node you halted.
• Nondisruptive hardware maintenance
When you halt one node and allow takeover, the partner node continues to serve data for the halted node while you replace or repair hardware in the node you halted.
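
The upgrade case above amounts to a rolling sequence: halt one node, let the partner serve, upgrade, then repeat for the other node. The Python sketch below only orders those steps. The function name `rolling_upgrade` and the version strings are hypothetical, and the final "restart" step for each node is an assumption, since the description above stops at the upgrade itself.

```python
def rolling_upgrade(versions, target):
    """Return (new_versions, steps) for a two-node HA pair.

    Hypothetical sketch: it only sequences the steps, it runs nothing.
    """
    if len(versions) != 2:
        raise ValueError("an HA pair has exactly two nodes")
    versions = dict(versions)  # work on a copy of the caller's mapping
    steps = []
    for node in list(versions):
        partner = next(n for n in versions if n != node)
        # Halt the node and let its partner take over.
        steps.append(f"halt {node}: {partner} takes over and serves both data sets")
        versions[node] = target
        steps.append(f"upgrade {node} to {target}")
        # Assumed final step: the upgraded node returns to service.
        steps.append(f"restart {node}: it resumes serving its own data")
    return versions, steps


new_versions, steps = rolling_upgrade({"node1": "old", "node2": "old"}, "new")
for step in steps:
    print(step)
```

At every step exactly one node is halted at most, so one controller is always available to serve data.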

