Management algorithms a cloud panacea?

In the article below, Kevin Fogarty tackles the thorny topic of data centre power management, a huge subject that encompasses cloud and virtualization technologies, the politics of efficiency vs. uptime in facilities management, and power and carbon pricing. He has chosen to focus in this piece on recent research exploring the potential for task and power management algorithms to resolve power consumption challenges, an issue that Fogarty rightly argues has been wrongly identified as the problem with cloud. While more research focus on energy use in the data centre is clearly warranted (the carbon impact of this segment is growing more quickly than end user device or networking categories), increased power consumption has much to do with rapid growth in demand for computing resources overall and less to do with a lack of efficiency improvements. In fact, Fogarty offers a well-documented take on change in data centre management over the last decade, with virtualization and cloud technologies at the centre of this story, and new algorithms treated as a welcome extension of these efforts. And as Fogarty points out, growth in demand for computing resources has much to do with value: the GeSI SMARTer 2020 report has concluded, for example, that on the carbon emissions front alone, ICT has the potential to deliver abatement seven times greater than its own footprint, while on the computing front, the cost and agility benefits of cloud are now well recognized.

Kevin Fogarty, IT journalist

Researchers focused on increasing the power efficiency of data centres have recently been building a set of algorithms designed to make large-scale IT installations far less power hungry. Many, however, justify the need for power-conservation methods by pointing at the high volumes of power used by data centres worldwide, without acknowledging the vast increase in efficiency data centres are usually believed to have achieved.

Implicit in much of the discussion is the assumption that data centres and cloud computing are wasteful and that the efficiency of their power use should be improved before either is allowed to spread any further.

Researchers are certainly correct that cloud computing is far from as efficient as it could be, but not in the assumption that algorithms designed to tune performance levels of data centre hardware to improve its power efficiency are one of very few options available or in use to reduce the burn.

In fact, according to studies of data centre-performance metrics from the Uptime Institute, DCD Institute and other computing-industry associations, both data centre technology vendors and data centre operators have made tremendous improvements in power efficiency during the past decade or so: by redesigning data centre hardware or using different types altogether, by improving both the technology and the configuration of data centre hardware and the HVAC systems that keep it cool, and by using, funding or building new sources of renewable energy at a far higher rate than most other industries.

Data centres still do use a tremendous volume of electrical power, and so, by proxy, do the cloud-computing services that live in hyperscale data centre facilities. However, data centres and the cloud represent a tremendous improvement in the efficient use of power, money and the labor of IT departments. They concentrate most of the computing power needed by even large companies in a small number of purpose-built, professionally managed facilities where power use can be monitored and managed, rather than scattering tens of thousands of servers among thousands of small offices and department facilities, as was most common a decade ago, when IT departments often had difficulty keeping track of servers, let alone the amount of power they used.

Cloud computing — which was invented specifically to make more efficient use of existing computers and networks by dragging local networks and standalone servers and any other computing resource into a common pool of resources that any application, server or end user in an organization could access — helps when it works the way it's supposed to.

Data centres, and large-scale public and private clouds, do still consume massive amounts of energy and do need to keep improving their technology and management techniques, possibly even by incorporating performance-management algorithms created by researchers intent on improving the efficiency of a single machine or set of machines at a time.

Neither cloud nor data centres in general, however, are profligate wasters of power; they use a lot of electricity, but they use it to produce computing power and capabilities whose value is higher than that of the resources they use, and they continue to increase the volume of power they put out while reducing the volume of electricity required to do it.

Green computing leads to the cloud

Current data centre and cloud-computing technology reflects a fundamental shift in corporate computing from the physical to the virtual that made systems management more efficient, and delivered a huge return on investment by replacing thousands of small, inefficient servers that had to be maintained in person with a few dozen or a few hundred powerful servers stuffed into increasingly dense racks.

The first wave of x86-based server virtualization gained momentum during the first decade of this century, for example, due to the rapid and dramatic return on the investment in the technology produced by consolidating the data and applications running on a large number of departmental servers into workloads running as virtual machines on just one powerful data centre server.

Virtualized servers had far higher utilization levels compared to standalone hardware, and drastically cut the amount of power used, the maintenance and IT services required and the amount of hardware running within an IT infrastructure. Eliminating a dozen departmental servers also eliminated a dozen separate power supplies, a dozen UPS power backups, a dozen sets of hard drives, memory, motherboards and processors, network connections, as well as the air conditioning required to compensate for the heat put off by aging, haphazardly maintained standalone computers stuck in the wiring closets of a dozen separate facilities.

Virtualization didn't eliminate the power required to run departmental workloads, but it shifted the load from departmental offices to the data centre, where the computing hardware, power supplies and cooling systems were far more efficient. In most cases, it probably reduced overall power use considerably, though the savings are hard to document: few organizations are meticulous enough to measure the amount of power saved by removing one server from one departmental office, or to add up those incremental savings into a grand total.

Data centre power use — a legitimate concern

Almost every organization operating its own data centre knows how much it costs to power both the systems, storage and networking hardware and the cooling systems required to keep them from overheating. As corporate IT infrastructures expanded and became ever more concentrated within corporate data centres, the required volume of computing resources increased, server racks became denser, and the heat put out by those servers rose sharply, as did the power required to cool them, despite continual improvements in the power efficiency of both the computing equipment and the HVAC systems that keep it from melting down.

The volume of electricity used to power data centres worldwide rose 6.8 percent between 2012 and 2013, but that is a fraction of the 19 percent growth in data centre power demand between 2011 and 2012, according to the annual DCD Institute Industry Census survey of global data centre operators.

The efficiency with which data centres use power has increased by 34 percent since 2007, according to the 2013 edition of an annual survey of 1,000 data centres by the Uptime Institute.

In 2007, an Uptime Institute survey showed an average score of 2.5 on the standardized Power Usage Effectiveness (PUE) metric for data centre power use; by 2013 that number had dropped (become more efficient) to a PUE of 1.65.

The percentage of North American data centre operators who said conserving power was at or near the top of their agendas dropped to just half in the same survey, however — a survey that showed sloppiness in the way power use and power costs are reported and even some obvious gaffes in the way the efficiency of the operation of the data centre itself is tracked.

"It is worth mentioning that 6 percent of respondents in the 2013 survey reported a PUE less than 1.0 — which is, in fact, impossible — so take these self-reported PUE numbers with a grain of salt," the Uptime report warned.
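For readers unfamiliar with the metric, PUE is the ratio of the total energy a facility draws to the energy that actually reaches its IT equipment, so a value of 1.0 would mean no overhead at all for cooling or power distribution. The short Python sketch below shows why a reported figure below 1.0 cannot be right. It is illustrative only: the function names and the sample meter readings are assumptions made for this example, not data from the Uptime survey.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        # The facility total already includes the IT load, so it cannot be smaller.
        raise ValueError("total facility energy is below IT energy (PUE < 1.0)")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value: float) -> float:
    """Fraction of total facility energy that never reaches the IT equipment."""
    if pue_value < 1.0:
        raise ValueError("a PUE below 1.0 is not physically meaningful")
    return 1.0 - 1.0 / pue_value


# Hypothetical meter readings for one month (not survey data).
example = pue(total_facility_kwh=165_000, it_equipment_kwh=100_000)
print(f"PUE = {example:.2f}, overhead share = {overhead_share(example):.1%}")
```

At a PUE of 1.65, roughly 39 percent of the energy entering the building goes to cooling, power conversion and other overhead rather than to computing.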

The demand for power for cooling data centres has grown so high that cloud-computing providers including Google and Facebook, whose services require so-called hyperscale data centres with requirements for capacity and power orders of magnitude greater than the vast majority of enterprise data centres, have begun routinely to site those facilities in places with cold climates that can help keep them cool. In 2011 Facebook began building a data centre in Luleå, Sweden, 100km south of the Arctic Circle, in an area that markets its frigid climate under the data centre-friendly name "Node Pole." Facebook opened a second data centre there in 2013.

Google, which spends between $5 billion and $10 billion per year on data centre construction and maintenance, has spent more than $1 billion on a facility in Hamina, Finland, which is 415 miles south of the Arctic Circle and, by 2015, will be entirely powered by five wind farms in Sweden, the entire output of which Google has purchased for the next 10 years. (Google says it is hiring in Hamina, if you're interested.)

Considering the size of globe-spanning cloud-service provider data centres, it's not surprising that the owners would go to great lengths to save money on power and on cooling — which makes up about half the cost of operating an average data centre.

Making the cloud more efficient

It is possible, however, to cut the cost of operating a cloud computing infrastructure using an algorithm that controls the location and performance of virtual machines in a way that minimizes the use of both CPU time and memory capacity, according to a paper published last week by researchers at the University of Oran in Algeria (3,010 miles from the Arctic Circle).

The volume of power used, and the resulting carbon emissions from the sources producing it, can be reduced considerably using a two-stage process that first evaluates the amount of CPU and RAM capacity available along with upper and lower limits set by administrators, according to researchers Jouhra Dad and Ghalem Belalem of the Dept. of Computer Science at the University of Oran, whose paper "Cutting the cloud computing carbon cost" was published in the Sept. 12 International Journal of Information Technology.

The process then assigns individual VMs to those resources using a resource-optimization calculation called "the knapsack problem," a metaphor for a process that determines the most valuable combination of objects it is possible to get into a particular space or assign to a particular resource, so that none of the most valuable is left behind unnecessarily.
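As a rough illustration of the idea, the Python sketch below applies a textbook 0/1 knapsack to a single host. Stage one works out how much CPU is still usable under an administrator-set upper limit; stage two picks the set of waiting VMs with the highest total value that fits in that headroom. This is not the authors' algorithm. The single CPU dimension, the `Vm` fields, the priority values and the 80 percent limit are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vm:
    name: str
    cpu: int    # CPU demand in whole units (hypothetical scale)
    value: int  # priority assigned by the operator (hypothetical)


def headroom(capacity: int, used: int, upper_limit_pct: int) -> int:
    """Stage one: CPU units still usable before the host crosses its upper limit."""
    return max(0, capacity * upper_limit_pct // 100 - used)


def knapsack(vms: list[Vm], budget: int) -> list[Vm]:
    """Stage two: 0/1 knapsack by dynamic programming over the CPU budget."""
    # best[i][b] = best total value using the first i VMs within b CPU units
    best = [[0] * (budget + 1) for _ in range(len(vms) + 1)]
    for i, vm in enumerate(vms, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip this VM
            if vm.cpu <= b:              # option 2: place it, if it fits
                best[i][b] = max(best[i][b], best[i - 1][b - vm.cpu] + vm.value)
    # Walk back through the table to recover which VMs were chosen.
    chosen, b = [], budget
    for i in range(len(vms), 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(vms[i - 1])
            b -= vms[i - 1].cpu
    return chosen[::-1]


# Hypothetical workload: four VMs waiting for a host with 100 CPU units, 10 in use.
pending = [Vm("web", 30, 6), Vm("db", 40, 9), Vm("batch", 50, 7), Vm("cache", 20, 4)]
budget = headroom(capacity=100, used=10, upper_limit_pct=80)  # 80 - 10 = 70 units
placed = knapsack(pending, budget)
print([vm.name for vm in placed])  # prints ['web', 'db']
```

A production scheduler would have to weigh RAM alongside CPU and handle many hosts at once, which turns the problem into a multi-dimensional, multiple-knapsack variant that is usually solved with heuristics rather than an exact table.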

These two are not the first to realize that the same technology that allows cloud operators to move, schedule and assign or limit resources to virtual machines to maximize the utilization of the available hardware would also allow the minimization of power expenditures using those same techniques.

In January of 2014, a group of researchers at IBM and Trinity College, Dublin published results from a set of algorithms collectively titled Stratus that created a model of a global cloud-computing infrastructure which minimized power use and carbon emissions while still accomplishing pre-defined computing and data-transfer goals.

In addition, the Open Data Center Alliance published a guide that uses the alliance's data centre-efficiency-measuring approach called the Carbon Footprint and Energy Efficiency Usage Model to trace energy use and carbon emission levels across a cloud ecosystem.

Stop the cloud until you make it more efficient?

There has, in fact, been a whole series of papers based on analysis of cloud-computing management systems and the potential for power and carbon-emissions savings that could result from VM migrations and task management whose goal is to save power as well as run existing workloads.

Even in well-managed cloud infrastructures with plenty of resources, management tends to be wasteful of the cloud's own resources, which could be far better utilized using any number of easily available methods and task-assigning algorithms, according to researchers who blame a wealth of computing resources for the lack of desire to conserve them.

"Reliance on Moore's law to solve inefficiencies has increased the problem (of resource use in cloud computing)," according to "Impact of Algorithms on Green Computing," by Sanjay Kumar of the computer science department at Ravishankar Shukla University in Raipur, India, who writes that the flip side of Moore's law is that "software efficiency halves every 18 months, compensating Moore's Law."

The papers, which sound like criticism of cloud computing in general, are actually an extension of the conclusions of a 2011 study by the Carbon Disclosure Project in London, which concluded that European companies could cut their carbon emissions in half by 2020 by moving their IT systems from their existing platforms to cloud computing systems.

US companies, the study said, could save $12.3 billion in energy costs and save enough oil to power 5.7 million cars for a year by shifting much of their IT infrastructure to the cloud.

Cloud-computing infrastructures are generally more efficient in terms of energy use than existing IT infrastructures, when the comparison is between on-premise and external cloud systems, according to a report on small- and medium-sized businesses by the Natural Resources Defense Council.

Neither that report nor the other surveys or research reports citing resource savings from the use of specific algorithms were specific about how much could be saved in time, energy, carbon or processing power, how quickly, or at what cost, by improving operational management of cloud infrastructures.

Sources quoted by London's The Guardian newspaper in 2011 argued that even the motivation to move to cloud platforms depended more on business concerns, including time to market, than it did on cost savings or efficiency issues.

Keep improving, don't stop building

All missed the point by focusing on one aspect of a larger picture and evaluating it independent of all the others: cloud-computing infrastructures managed automatically by algorithms designed to save power may indeed do so, but they are simply an improvement on existing ways to manage cloud infrastructures. The cloud itself is an improvement on the comparatively disarrayed virtualized-server-based infrastructure that usually depended on hypervisors and management software from a single vendor and provided little insight into what a given virtual machine was up to at any given time, let alone how efficiently it was using power controlled by hardware three levels of abstraction below it.

Virtual-server infrastructures were more efficient than un-virtualized data centres and both were more efficient than distributed computing models that put a single application on an underutilized, overpowered server to minimize the chance that anything on it would crash, then stuck the hardware in a departmental wiring closet where its only routine maintenance came from administrative assistants and interns sent in occasionally to turn it off and then back on to get a frozen application working again.

Saying we could slow the growth in demand for power by managing the placement and operation of virtual servers within a cloud infrastructure is perfectly fair and accurate — but also perfectly obvious.

The speed with which global cloud infrastructures are currently growing reflects not a need to outrun the inefficiency of the operation of those clouds, but the need of organizations and end users stuck in far less efficient infrastructures to drag themselves into the 21st century, and their desire to take advantage of services within the cloud that they could never create or buy outright themselves.

Task-management algorithms that improve the efficiency with which cloud infrastructures are managed and conserve the resources of data centres and service providers that make up the cloud are important to the continued development and operation of cloud infrastructures. They are not a revolution that needs to happen to keep cloud computing infrastructures from collapsing on themselves, at least not any more than the performance enhancements and efficiency improvements that prevented exactly the same thing as local-area networks grew into global ones, as massive server-based applications shrank into versions that could be used from a smartphone, and as devices as small as a radio was 20 years ago became powerful enough to run graphics and software that only the most sophisticated and powerful data centres of two decades ago could have managed.

Power-use-minimizing algorithms will help make cloud computing less wasteful of its own resources, but they are not the balm required to make cloud computing efficient enough to be acceptable even to the most carbon-emissions-conscious energy monitor. The cloud is already more efficient than the resources it is replacing, as well as being far more powerful, far more flexible and far more egalitarian in both availability and usability.

Developers, cloud-platform service providers and internal IT organizations should all research and adopt more efficient ways to manage and configure their cloud projects; they should not hold up a single cloud project to wait for more power-efficient ways to do the same thing. There's plenty of time to gild that lily; it's not necessary to keep it from growing until it's possible to figure out how to make it emerge already gilded and packaged into a corsage.