Renowned for pushing the envelope on data centre innovation, hyperscale infrastructure providers pose a challenging conundrum for consumers of data centre services: should businesses emulate cutting-edge design and operational techniques, or take direct advantage of them? A decision to adopt lessons learned depends to some extent on an honest assessment of the user’s compute needs and relative appetite, ability and budget to build state-of-the-art facilities. But an even more telling exercise involves investigating the source of the hyperscalers’ reputation: how exactly do they achieve service delivery performance that most enterprise operators can only dream of?
InsightaaS was recently offered a glimpse into the hyperscale world as it has been constructed by OVH, a French service provider founded by the Klaba family in 1999, which now offers public and private cloud, VPS, dedicated servers and other web hosting services to businesses around the globe. The largest web services provider in Europe, OVH is making substantial investment in the development of new products and services – OVH’s public cloud offering is seeing the most rapid growth – and in expansion of its data centre footprint; the company has opened two facilities in Asia Pacific and in 2017 acquired VMware’s vCloud Air business to expand its presence in the US.
The OVH discovery of America began back in 2012, though, in Canada, with the establishment of a large data centre in Beauharnois, just south of Montreal. The Beauharnois facility epitomizes many of the key principles of hyperscale operation, as well as the design, process and business logic that set OVH apart in the industry. Central to the OVH service vision is the “industrialization” of compute production, a concept the company works to realize through a DIY compute factory strategy. Except for some smaller operations in Asia that are housed in colo facilities, all of OVH’s facilities are built ‘new’ – often in existing industrial structures – allowing for standardization in design, set-up and provisioning, facilities operation and infrastructure management. In the Beauharnois facility, OVH has incorporated many ‘industrial’ attributes, including the following.
Scale – and economies thereof
In choosing new data centre sites, OVH has tended to buy old industrial buildings that can be brought online quickly, without the delays associated with usage permits and related issues. In Beauharnois, the company worked with local authorities to acquire a building from Rio Tinto Alcan that formerly housed an aluminum smelter: it occupies 26,000 square meters of a six-hectare site in the Beauharnois industrial park. Currently, the company owns six of the eight halls in the building and is negotiating with Rio Tinto to complete the acquisition. It now uses 18,000 m2 of physical space (and will gain another 6,000 m2 with the addition of the remaining two halls) and is building out compute capacity as needed, in modular fashion.
OVH budgeted $130 million overall for the building acquisition, refurbishment and equipment purchases. Initially, it constructed tall towers within the physical space to house the IT infrastructure, as these could more easily vent hot air out of the top of the building. However, this set-up proved slow and costly: Beauharnois data centre manager Fabrice Fossaert estimates that it takes three months to build a tower but only three weeks to assemble a container, and the towers were too high for forklifts to easily place server racks. OVH therefore adopted a new approach to housing servers that takes advantage of the industrial production capabilities of its new manufacturing facility in Croix, France. To enable quick assembly and easy rack deployment, the company has built containers in which racks are stacked in parallel to a manageable height, with a second row of racks assembled on the other side of a central hallway. When fully built out, there will be 20 containers in each of six halls at Beauharnois, and with 1,600 servers per container, there will be approximately 32,000 servers in total online. Today, the facility utilizes approximately 25 percent of total potential capacity (2 of 8 halls).
The DIY data centre
Beyond the potential for delivery scale that the Beauharnois facility’s size provides, the use of standardized containers facilitates server deployment and delivers cost savings for the company through economies of scale. Containers are mass produced at the Croix facility, which operates with a high level of automation, advanced robotics, laser cutters and other precision machining tools, according to specs developed by a team of 30 specialists in industrial engineering. Based in France, this group works to optimize data centre components for quality, cost and ease of deployment – in Canada and at other OVH locations. According to François Sterin, OVH’s chief industrial officer, the engineering group, working in mechanical and hardware innovation labs, uses a process called “one; ten; hundred”: one is the prototype; 10 models are created to test viability in preproduction; mass production begins with 100 pieces; and ultimately this extends to 1,000-piece production, which characterizes “steady stream, full production.” The goal, he explained, is to create very short cycles for the development of precision components, enabling the company to move very quickly from an idea to industrial production.
In addition to containers, OVH also manufactures its own racks – “we have built our own box in a box,” Sterin added, a model that is “very industrial, and very different from that in other data centres.” The OVH “hori racks” are built horizontally, with space between them to accommodate a forklift that stacks three racks on top of each other. At Beauharnois, the company has even fabricated its own man trap: “we realized we could save around 50K for each [of these],” he explained. “When you have 20 in a building, you can save a million just on man traps [by building in-house instead of buying commercial products] so we have built the man trap, integrating all the software that is needed for monitoring.”
Kicking Henry Ford’s tires – embracing supply chain management
Server build at OVH is carried out at the local level. In a production room at Beauharnois, servers are assembled, equipped with all the necessary cabling, placed in racks and tested through workload simulation for CPU and configuration viability before being sent out for ‘lift and shift’ in the data centre. As part of this quality assurance, “we push servers to the maximum to make sure they are working properly,” a production specialist noted, and this is done before the racks are deployed to avoid failure in the working area of the data centre. Components are largely from Asia – CPUs are Intel, AMD and Nvidia, storage is Western Digital, etc. – and technicians follow a set assembly procedure, though depending on customer needs, they may add more RAM or SSD to the configuration.
While OVH manages much of its build internally, externally sourced components such as CPUs or the UPSes in the facility are always used in the same increments, so that the company is able to buy in bulk and share supply across its various regions. According to Sterin, “the demand forecast might change from one location to another, and we need to be able to make the industrial supply chain work for this.” The supply chain for servers and for the data centre infrastructure is managed centrally to ensure standardization – “that is our strength – mastering the supply chain – from the components to the rack in the data centres,” he noted. The Canadian facility participates both upstream and downstream in this chain: the Beauharnois server production unit also supplies hardware to US sites in Hillsboro, Oregon and Vint Hill, Virginia.
In Canada, most compute capacity is dedicated server; however, as Sterin explained, infrastructure devoted to the OVH public cloud offering is similar, with some small differences. Housed in separate containers, servers in the public cloud are more powerful, there are more network switches (a vLAN rack at the end of the row), and two power supplies are provided for greater redundancy. Cloud servers are connected to a UPS rather than to batteries (as dedicated servers are), and security is tighter in cloud areas, as only authorized personnel are allowed to enter the container. Though the public cloud area tends to be hotter than sections where customer utilization is low, Sterin argued that “the look and feel” is basically the same as in dedicated server sections: “it’s all built in house and we rack and stack as quickly as possible.”
OVH does accommodate customizable configurations – an HPC offering with big processors, large network attached storage arrays and water cooling at the back of the rack, for example, is available – and keeps a small stock of machines on site so it can quickly deploy custom solutions – vSAN is available in two hours. However, the primary goals are standardization, regularity, and simplicity in deployment. This uniformity is designed to support two management objectives: according to Sterin, “It’s different than in a colocation where you have different customers. Because we control everything ourselves, there is homogeneity, and standardization. We know how the air flows in the servers, we have the supplies we need, and because we control the whole lifecycle, it’s easier to manage. In a colo, there’s a mix and match of everything, and they have to go with the least common denominator in terms of set up and control.” This control allows OVH data centre operators to rely on standardized practices for maintenance and other processes developed by a global operations team. In addition, OVH has established a DOC (Data centre Operating Centre), a central monitoring room that manages standardized infrastructure across all data centres. For example, the DOC monitors temperatures, providing a layer of supervision above that which is delivered locally.
A second goal is resource sharing. While there may be local requirements – in electricity for North America, Europe and Asia, for example – standardized servers, deployment and operational models for maintenance and management (for example, the frequency with which operators check infrastructure) mean that staff can move between data centres and become operational very quickly. This flexibility to address resource needs at sites that may require more people has been used in North America, as Canadian staff have been shared with OVH US data centres.
Water – cool!
Within data centres generally, cooling is a critical issue. Typically, approximately 40 percent of the energy spend within enterprise facilities is devoted to cooling, a burden that is likely to become only more onerous as server densities increase to meet new compute-intensive demand associated with Big Data, AI and other applications. At OVH, the uniformity generated by the lifecycle control of data centre equipment and components described above has made possible the implementation of mass-scale water cooling techniques that are simple, yet effective in reducing data centre PUE and carbon footprint. Water cooling is not new – many organizations have used evaporative and other techniques to remove heat, typically in specialized applications. But as Sterin pointed out, OVH has evolved its original “craft” approach to water cooling, which began 15 years back, and industrialized it for application on a mass scale (for 300,000 commoditized servers in its primary facility in Roubaix, France). “Our technology is not rocket science,” he explained. “But the way we have applied it to an operation of such large scale is where we see hope. Other providers are doing this, but we have been doing it for 15 years and have learned. We have operational experience, and that’s where we have an edge. We have teams that know exactly how this works.” Innovation, in other words, can be a genius idea, but can also lie in how technology is applied.
At Beauharnois, OVH has deployed water cooling to the server in a closed-loop system that pumps water into the server, circulates it through a water block with a copper plate on the CPU, returns the water to a larger tank, and then pumps it outside, where dry coolers lower the temperature. In winter, the outside temperature cools the water, which contains a small amount of glycol to prevent freezing. OVH also uses a small amount of chlorine to manage water quality: the pipes that connect to servers are fairly small, and if bacteria accumulate inside, water flow and cooling efficiency are reduced. Excess heat is extracted through large exhausts on top of the racks to the outside, though in winter this excess heat is used to help heat the facility. In the summer, the data centre runs at hotter temperatures (35+ degrees for the ambient temperature of the building; it’s hotter inside the servers), though operators open the windows and use simple fans to take full advantage of free air cooling (containers help protect the servers from dust). According to Sterin, because there is no evaporative cooling, no chillers or mechanical cooling, Beauharnois has a good PUE rating (1.09 in the summer months), and the facility has run with these environmentals for several years with no degradation of the servers.
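The PUE figure quoted above is simple arithmetic: PUE (Power Usage Effectiveness) is total facility energy divided by the energy delivered to the IT equipment. A minimal sketch, with illustrative kilowatt figures rather than OVH's actual meter readings:

```python
# PUE = total facility power / IT equipment power; 1.0 is the
# theoretical ideal, meaning zero cooling/lighting/conversion overhead.

def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Return the Power Usage Effectiveness ratio."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw

# Illustrative numbers (not OVH's): a 1,000 kW IT load drawing
# 1,090 kW at the meter gives the 1.09 PUE quoted in the article,
# i.e. roughly 9% of power goes to non-IT overhead.
summer_pue = pue(1090.0, 1000.0)
overhead_fraction = summer_pue - 1.0  # approximately 0.09
```

The same formula explains why eliminating chillers and mechanical cooling matters so much: those loads sit entirely in the numerator's overhead.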
Beyond physical space constraints, access to power is a critical challenge for data centre facilities that look to scale. Like many data centre operators in Quebec, OVH is able to take advantage of very competitive hydro electricity rates at the Beauharnois site. Additionally, the Beauharnois energy supply is high-quality power that is superabundant. Energy comes from one of the largest run-of-river hydro stations in Quebec (on the St. Lawrence) to a 120 kV OVH high-voltage substation, where a 100 MVA transformer steps down the feed that enters the building. Initially, OVH used a small transformer (20 MVA) but worked with Hydro Quebec to upgrade to 100 MVA last year. Electric supply is managed in the power control room by a staff expert, formerly of Rio Tinto, who has long-term experience managing power flow in the building. Sterin estimates that OVH is currently using only 10 percent of available power in the facility – and hence can support significant growth.
At Beauharnois, two power feeds into the building ensure power redundancy. Within the data centre, several racks (e.g., the cloud servers) are connected to UPSes that convert power to AC and regulate flow to protect the IT equipment. The facility has 23 UPS units rated at 500 kW each, a smaller size that enables a flexible, modular approach: units are deployed to match the compute resources they serve. Other racks operate on 20-volt DC power, managed by centralized PDUs and battery units that connect to servers through individual fuses. The bridge between the main power, the battery, cables and fuses is proprietary OVH technology designed to manage power issues. With this battery system, if there is a problem, only one server is impacted, not the entire rack: “it’s like a UPS on board the rack,” Sterin explained. And with the help of 10 diesel generators that kick in should there be a large-scale outage, Beauharnois has 48 hours of generator power to keep machines up until the line is repaired.
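The modular-UPS logic above reduces to simple sizing arithmetic. A back-of-envelope sketch using the article's figures (23 units at 500 kW); the example IT load is an assumption for illustration, not an OVH specification:

```python
import math

# Figures from the article: 23 UPS units rated at 500 kW each.
UPS_UNITS = 23
UPS_RATING_KW = 500

# Total conditioned-power capacity across all units: 11,500 kW.
total_ups_capacity_kw = UPS_UNITS * UPS_RATING_KW

def ups_units_needed(it_load_kw: float, unit_kw: float = UPS_RATING_KW) -> int:
    """Smallest number of UPS units that covers a given IT load.

    Ignores N+1 redundancy margins for simplicity; the point is that
    small unit sizes let deployed capacity track the load closely.
    """
    return math.ceil(it_load_kw / unit_kw)

# Hypothetical example: a 2,600 kW load needs six 500 kW units,
# wasting only 400 kW of headroom versus a single oversized unit.
units_for_example_load = ups_units_needed(2600)
```

The design choice is the same one the article attributes to OVH: right-sizing in 500 kW increments avoids stranding capital in oversized power infrastructure.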
Extending server lifecycle
The disposable server, an approach adopted by some hyperscalers to enable rapid replacement of commodity servers, is becoming less viable from both economic and environmental perspectives. At OVH, the goal is to give servers a ‘second’ and sometimes even a ‘third’ life – the ‘reuse’ in the three “Rs” – before breakdown into components for recycling. According to Sterin, if a customer releases a server at Beauharnois, it may go back out into inventory in the same configuration. However, if it is out of date or based on a reference architecture that OVH no longer sells, it will be de-racked and sometimes rebuilt. Refurbishment is a first goal, which can provide value to the company and to customers: OVH, for example, offers refurbished machines at a reasonable price (less than 10 euros per month) to organizations that may not have mission-critical requirements. Servers that cannot be rebuilt are sent to the knock-down area and separated into components; RAM, motherboards and discs are tested for reuse, and components that are no longer usable by OVH are assigned to a broker program, where they are sold to recyclers as scrap or into a secondary market. Sterin claims that very little goes to landfill, and that reuse and the reverse supply chain produce good value for the company.
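The reuse-first flow described above is essentially a decision tree. A hypothetical sketch of that routing logic, as one reading of the article (the state names and rules are illustrative, not OVH's actual inventory system):

```python
# Hypothetical routing for a server a customer has released, following
# the reuse-before-recycle order described in the article.

def route_released_server(current_architecture: bool, rebuildable: bool) -> str:
    """Return the next step in a released server's lifecycle."""
    if current_architecture:
        # Same configuration goes straight back into inventory: a 'second life'.
        return "return to inventory"
    if rebuildable:
        # Out-of-date reference architecture: de-rack and refurbish,
        # then offer at a reduced price.
        return "de-rack and refurbish"
    # Otherwise knock down: RAM, motherboard and discs are tested for
    # reuse; unusable parts go to the broker program for recycling.
    return "knock down for components"
```

The ordering encodes the priority the article describes: refurbishment first, component recovery second, scrap last.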
New technology deployment – buy, partner or build?
This kind of lifecycle management relies on sound knowledge of inventory, both in stock and on the data centre floor – each server has a bar code that identifies its location and age so that operators know when to decommission and replace it – as well as sound monitoring. At Beauharnois, OVH has a basic monitoring system that senses temperature and detects when a server or UPS unit goes down, reporting into the monitoring room via a text-based alarm that alerts operators.
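Threshold-based monitoring of this kind can be sketched in a few lines. The threshold value and device names below are hypothetical, chosen only to illustrate the pattern of turning sensor readings into text alarms:

```python
# Minimal sketch of threshold monitoring: each reading carries a device
# name, a temperature, and an up/down state; out-of-range readings
# produce text alarms of the kind routed to the monitoring room.

TEMP_LIMIT_C = 40.0  # illustrative threshold, not an OVH figure

def check_reading(device: str, temp_c: float, is_up: bool) -> list:
    """Return the text alarms raised by a single sensor reading."""
    alerts = []
    if not is_up:
        alerts.append(f"ALARM: {device} is down")
    if temp_c > TEMP_LIMIT_C:
        alerts.append(
            f"ALARM: {device} at {temp_c:.1f} C exceeds {TEMP_LIMIT_C:.1f} C"
        )
    return alerts
```

A healthy reading yields no alarms; a down device that is also overheating yields two, one per condition, so operators see exactly what tripped.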
This system was built in-house; however, in deploying new technologies, OVH mixes build, buy and partner tactics, depending on the type of technology. UPS units, for example, are sourced through third parties, and OVH recently announced a new partnership with NVIDIA, under which it will offer the NVIDIA GPU Cloud (NGC) software platform to support customers looking to deploy artificial intelligence – providing customer access to third-party technologies where it does not make sense for the company to reinvent the wheel. In advancing data centre automation, on the other hand, OVH is more likely to work towards its own solution. So while it is considering commercial DCIM options, OVH is more focused on the development of a smart PDU that will introduce thresholds for power consumption, functionality better suited to OVH business needs. Since Beauharnois is not a colocation facility and because power is inexpensive in Quebec, customers are not requesting capabilities that are featured in many DCIM packages, such as power monitoring at the server level.
Ultimately OVH will likely implement its own system; as Sterin noted, this will help the company achieve its overarching goal of remaining “very lean and frugal.” In each of its design, operations and process decisions, the company looks to maximize efficiency and to speed time to delivery. In the integration of new technologies, for example, it takes advantage of supply chain management to shorten lead time to delivery: “What we want to be able to do is speed deployment so that from the time we receive the component, assemble the server and connect it into the data centre we take one hour. [This is possible because] we don’t have suppliers that we have to wait for and because we can balance our production levels depending on the customer needs,” he added.
Optimizing process is part of the company’s lean approach to operations, designed to maximize value. “It’s a capital-intensive business,” Sterin explained, “so we can’t let things sit idle – components represent working capital. We need to make sure that every time we access a supply chain, we have some assembly. As soon as we’ve assembled the servers, we put them in a horizontal bay or rack, where we’ve automated the interfaces and all the electricity is pre-cabled, so when it arrives in the data centre, we just use a forklift to plug in the racks. It’s plug and play – very modular – so you can easily build more capacity. Given the amount of data and capacity that we have to manage, if you don’t master your supply chain, it’s really hard to compete in this business.” By mastering the supply chain, OVH is able to remove margins, support innovation that matters and reduce waste. For example, servers have no packaging – they contain just the motherboard and thin metal – as customers will never see packaging that would only increase cost. And by building from the ground up – locating in regions with clean, cheap power, acquiring old buildings for new data centre sites, deploying modularly, deploying efficiency technologies at scale – the company is able to manage down both cost and carbon footprint. For enterprise consumers of data centre resources, efficiency and cost savings achieved through this scale, data centre expertise and defined process may be difficult to replicate – a challenge that raises the proverbial question: why try?