ADR in Action

Major Financial Services Firm

This customer turned to ADR to manage recovery operations for a large and rapidly growing farm of virtual servers supporting both production and development applications, distributed across two sites more than 26 miles apart. The first objective was to recover from a production server failure by automatically moving production applications to other available production servers at the same site, to achieve optimal workload balance. If additional servers failed, or there was no excess production capacity available, the second objective was to automatically move production workloads to remote servers at the second site, shutting down development servers as required, so that production service levels were maintained. The environment included VMware ESX, IBM Blade Servers, IBM DS4300 Storage Arrays, Cisco MDS Storage Services, and Kashya Replication. ADR successfully orchestrated the complex workflows and analysis required to identify failures and coordinate all virtual server and storage activities needed to meet both service objectives, within strict service level guidelines.
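The two objectives above amount to a tiered placement policy: same-site production servers first, then remote production servers, then remote development servers that are shut down to make room. The following is a minimal sketch of that ordering only, not ADR's actual implementation. The Server class, the abstract capacity units, the site and workload names, and the idea that a shut-down development server is repurposed for production are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical model: capacity and workload sizes are abstract units.
    name: str
    site: str
    role: str                      # "production" or "development"
    capacity: int
    healthy: bool = True
    workloads: dict = field(default_factory=dict)  # workload name -> size

    def free(self):
        return self.capacity - sum(self.workloads.values())


def _pick(servers, site, size, same_site):
    # Healthy production servers with room, at the same or the other site.
    # Choosing the one with the most free capacity keeps load balanced.
    candidates = [s for s in servers
                  if s.healthy and s.role == "production"
                  and (s.site == site) == same_site
                  and s.free() >= size]
    return max(candidates, key=Server.free, default=None)


def _free_dev_server(servers, site, size, actions):
    # Last resort (assumed behavior): shut down a remote development
    # server and repurpose it for the production workload.
    for s in servers:
        if (s.healthy and s.role == "development"
                and s.site != site and s.capacity >= size):
            s.workloads.clear()
            s.role = "production"
            actions.append(("shutdown_dev", s.name, None))
            return s
    return None


def plan_recovery(failed, servers):
    """Return an ordered list of (action, subject, target) tuples."""
    actions = []
    failed.healthy = False
    # Place the largest workloads first so they get the best choice of hosts.
    for name, size in sorted(failed.workloads.items(), key=lambda kv: -kv[1]):
        target = (_pick(servers, failed.site, size, same_site=True)
                  or _pick(servers, failed.site, size, same_site=False)
                  or _free_dev_server(servers, failed.site, size, actions))
        if target is None:
            actions.append(("unplaced", name, None))
            continue
        target.workloads[name] = size
        actions.append(("move", name, target.name))
    return actions


# Usage: production server a1 at site A fails.
a1 = Server("a1", "A", "production", 10, workloads={"billing": 6, "web": 3})
a2 = Server("a2", "A", "production", 10, workloads={"crm": 5})
b1 = Server("b1", "B", "production", 10, workloads={"reports": 8})
b2 = Server("b2", "B", "development", 10, workloads={"dev-build": 4})

plan = plan_recovery(a1, [a1, a2, b1, b2])
for step in plan:
    print(step)
```

In this example the small workload fits on the surviving same-site server, while the large one fits nowhere in production and so triggers the shutdown of the remote development server. A real deployment would also have to coordinate storage replication and network changes, which this sketch leaves out.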

Large Telecommunications Company

This customer has a virtualized two-site environment similar to the one above, but relies on VMware's HA and DRS products for high availability of virtual server clusters within each site. The unique challenges for this customer were to extend high availability across multiple sites, model multiple failure scenarios in addition to virtual server failure, integrate with existing problem management systems, and enable both automated and manual failover and failback processes. In addition to VMware, the environment included 3Par storage, Cisco Global Site Selector, and BMC Remedy Service Management. With ADR, the customer easily modeled complex failover and failback workflows (11-16 distinct process steps each) and multi-tiered failure detection scenarios, including loss of site contact, storage failure, network failure, insufficient server capacity, and more. Over 40 operations stakeholders from multiple disciplines participate in each recovery event, and the resulting failure detection models consistently outperformed manual processes in rapidly detecting failures throughout the environment and recommending appropriate recovery workflows. Full site failover consistently completes in less than one hour, dramatically reducing downtime and lost-business costs.

