Setting the bar on data centre best practices

Joel Stone, VP, global data center operations, CenturyLink

In data centre economics, design is half the battle. Site location, modular builds, state-of-the-art facilities technologies and high-efficiency IT equipment have filtered down from the hyperscale sites to inspire new levels of computing productivity for the enterprise. The other half of the battle is waged on the management front, where operational process is increasingly viewed as the key to optimal resource management and reliability. Life cycle approaches that take into account both upfront investments and operational factors over time have demonstrated cost and uptime benefits. TCO calculations have shown that the costs of staffing, management and the energy to power and cool a facility over the typical 20-year data centre lifespan can outweigh CAPEX costs by a factor of four, and it has been estimated that close to 70 percent of significant, unplanned events that shut down the data centre are caused by human error.

If the giant Internet sites serve as the source of advanced data centre design and technology, who provides the model for good practice in data centre operation? In this area, the Uptime Institute has delivered advanced tactics in its Management & Operations (M&O) Stamp of Approval, with CenturyLink Technology Services more recently offering flank support through its commitment to seek M&O certification across its global footprint of 57 data centres.

Introduced in February 2012, the M&O Stamp is an operational certification based on requirements developed in Tier Standard: Operational Sustainability, a 2010 Uptime publication that introduced a new standard designed to align operational behaviour with a data centre’s Tier specs. Reviewed and verified by an M&O Coalition composed of stakeholders from the enterprise, real estate, outsourced operations and multi-tenant provider industries, these criteria were adapted to address the needs of the many data centres, and older ones in particular, that were never designed to achieve Tier certification but could also benefit from adoption of management and operational practices contained in Operational Sustainability.

Rooted in 20 years of research into the causes of data centre downtime, which has convinced the Uptime Institute that human error is the primary cause of failure, the M&O Stamp is a tool focused on operational process and behaviour that remains independent of data centre design, typology, infrastructure or size. It provides a score-based assessment in a number of areas, including staffing and organization (staffing levels, qualifications and skills mix); training and professional development; preventative maintenance programs and processes; operating conditions and housekeeping; and planning, management and coordination practices and resources. This assessment in turn provides a benchmark against which the organization can measure process improvement and, ultimately, achieve certification once the required criteria have been met.

While a number of providers – such as Equinix, Baidu, Colt Technology Services, Interxion and Fortune Data Centres – have obtained the M&O Stamp since its introduction, the process has so far been largely piecemeal. CenturyLink’s announcement in early December that it would work to achieve certification across its global footprint of 57 data centres represents a huge commitment to enterprise-wide operational excellence, as well as an endorsement of Uptime expertise in this field. Commenting on the initiative, David Meredith, SVP and GM of global operations for CenturyLink Technology Services, noted: “We view Uptime Institute as the preeminent global authority on data center standards and believe that businesses should hold their data center providers to high standards for operational excellence. A data center’s design is a strong foundation, but keeping businesses running 24/7 requires solid processes and highly qualified staff.” For its part, the Uptime Institute has applauded CenturyLink’s “volunteering for a level of scrutiny that no other company has undertaken at this scale.” As Uptime Institute CTO Lee Kirby explained, “by adopting an industry standard for existing operations, CenturyLink assures its clients that retirement, turnover, and contractor selection will not disrupt the fundamentals of the operations – and that availability levels remain steady,” characteristics that Kirby believes will offer CenturyLink “meaningful competitive differentiation.”

Differentiation of the company’s product offering is in fact a key goal of the certification process, according to Joel Stone, VP of global data center operations at CenturyLink: “We have a number of clients in multiple locations, and we are trying to offer that global consistency that will allow us to be the ‘go to’ colocation and data centre operations provider.” While acknowledging regional variations in demand and in delivery of services, Stone cited brand recognition for consistency in global delivery as an outcome of certification that will help CenturyLink address an important market segment. “For years, Savvis, now CenturyLink, has made claims for global processing procedures and for operation according to high standards,” he explained, “but we’re now trying to ensure that that is validated everywhere by an external, third party, and that where there are differences, these are aligned at all sites through consistent definitions.”

At the outset, M&O certification is expected to deliver marketing opportunity, but over time, Stone believes standardization will also deliver considerable operational benefits. On the issue of training, for example, he noted that CenturyLink’s commitment to obtaining and maintaining its certification will drive global, concurrent training updates. This is important in Stone’s view because the data centre environment is not static; rather “it is an ever evolving asset” that must be properly managed by staff with up-to-date skills. And while the provider community may have the resources to invest in cutting-edge technologies and appropriate management, this is not a core competency for most enterprises, which are therefore unable to match those capabilities at competitive cost. Stone believes that by signing up for global M&O certification, CenturyLink is demonstrating a willingness to invest in training and other best practices needed for consistency, reliability and economy in service delivery. On the issue of cost, the benefits of certification are not limited to improved operational efficiencies, but extend to intangibles like the “true business cost of an incident or human error” to the customer and to CenturyLink. While Stone does not expect that this or any program can totally eliminate human error, Uptime certification can “identify areas where we have exposure… [and] create a base line from which we can address exposures so that we become better” operationally. And unlike Tier status, for which investment in retroactive certification may not be warranted in some existing data centre facilities, the M&O Stamp will deliver benefits across all CenturyLink locations because it is about people – a common resource across all types of facilities.

So far, CenturyLink has achieved M&O certification at three sites, and has worked closely with the Uptime Institute on a schedule for rollout of the program over the next two years to all 57 locations – training five operations teams per quarter, with 2016 the anticipated completion date. Some facilities that are in close proximity to each other and share the same operations team – in the case of the Sterling, Virginia location, for example, where there are six data centres within a five-mile radius of one another – will undergo certification at the same time to streamline the process. As Stone described it, certification involves an assessment audit, identification of failures, quick remediation of any issues and retesting for certification, which lasts two years from the date of approval. Because CenturyLink has committed to recertification once the initial two years are up, the process will be virtually continuous – a large undertaking that CenturyLink sees initially as a marketing opportunity, but that will translate, Stone believes, into a vehicle for introducing the credibility and rigour needed for ongoing process improvement: “Data centres change,” he added. “Staff changes, and policies and procedures are updated and new equipment added – so this is an ongoing commitment to ensure that we are meeting change with proper processes. It’s so that we set the bar on the new, expected industry standard. We’re first in line, and we’re going to demonstrate that it is best practice that others should follow.”
