At an Intel Experience presentation held in Toronto last fall to showcase use cases for new chip capabilities, Scott Overson, GM of Intel’s Enterprise & Government Datacenter Sales organization, pointed to significant growth in computing needs. “Compute demand is growing at a 60 percent compounded annual growth rate of MIPS, versus a 25 percent annual growth rate for data. So compute demand is growing twice as fast as data,” he explained. Overson attributed this growth to two factors: the need for significant performance boosts in each generation of products to support advanced use cases like AI, security, and virtualization, and the expansion of overall compute requirements associated with the ongoing transformation of industries that is pushing processes across the spectrum from physical to digital formats.

That was the best of times; now, we are in the worst of times. Today, Intel announced an additional pledge of $50 million for a pandemic response initiative that aims to combat the coronavirus by accelerating access to technology at the point of patient care, speeding scientific research and ensuring access to online learning for students. A key goal of the Intel COVID-19 Response and Readiness Initiative is to drive advances in diagnosis, treatment and vaccine development by leveraging technologies such as AI, high performance computing, and edge-to-cloud service delivery.
The world has turned on its head as governments, citizens and businesses struggle to mitigate the worst impacts of a global pandemic; however, common to these pre- and post-coronavirus perspectives is the ongoing acceleration of compute demand. And if tech forecasting on the pandemic’s overall impact on IT markets is fraught with uncertainty due to the sheer scope of the economic fallout, growth in specific areas is clear. Physical distancing practices and remote work, for example, are driving exponential uptake of collaboration solutions, which in turn depend on optimal performance and dynamic resource capacity in the underlying IT infrastructure that supports online services. In December 2019, the maximum number of daily meeting participants for the video conference app Zoom was 10 million; this March, that number soared to more than 200 million daily meeting participants. At the same time, infrastructure providers such as Microsoft are anticipating capacity constraints resulting from a spike in demand for cloud computing related to the pandemic, and are responding with plans to prioritize usage of Azure for healthcare, government and collaboration purposes. In countries like Canada, information and communications technology has been identified as an essential service, and staff, including data centre operators and a range of other ICT workers, have been flagged as performing functions that, from a national security perspective, the country cannot dispense with during the pandemic.
But how are tech companies – or the many other organizations on the front line of pandemic response – able to step up so quickly to meet new demands?
In IT, the answer is ongoing innovation. As Overson noted in his November presentation, at Intel, research into new product design takes place five years in advance of rollout. So the client and server solutions under development by India’s Council of Scientific and Industrial Research and the International Institute of Information Technology to enable faster and less expensive COVID-19 testing, or AI-based genome sequencing and risk-based analysis of COVID-19 patients’ comorbidities, are based on features and capabilities that Intel created much earlier. AVX-512 VNNI, the recently introduced Vector Neural Network Instructions that enable AI capability in hardware and that Overson singled out as an example of AI innovation, is an extension of instructions introduced in 2013. And while high-performance Xeon processors are the product of multi-generational development, the core software and instructions for Intel DL Boost, which speeds the processing of instructions in server processors during deep learning operations, and for Intel Optane DC persistent memory, which allows users to derive insights from data more quickly, were initially written by Intel and subsequently developed for scale by ecosystem software partners.
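As a rough illustration of how ecosystem software discovers and exploits such hardware features, the short Python sketch below checks whether a Linux host reports the AVX-512 VNNI flag before an application selects an optimized code path. The file location and flag name are Linux kernel conventions, not part of any Intel product, and the sketch is only an assumption about how such a check might look.

```python
# Hedged sketch: detect AVX-512 VNNI on a Linux host before choosing
# a deep-learning code path. The "avx512_vnni" flag name follows the
# Linux kernel's /proc/cpuinfo convention; this is illustrative only.

def cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags reported by the kernel."""
    flags = set()
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
    return flags

if __name__ == "__main__":
    if "avx512_vnni" in cpu_flags():
        print("AVX-512 VNNI reported: a VNNI-accelerated inference path can be used")
    else:
        print("AVX-512 VNNI not reported: fall back to a generic code path")
```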
Another answer lies in a platform approach, which Intel believes is essential to supporting technology that is increasingly complex, and increasingly pervasive in “the data centric era.” According to Overson, “It’s not just the chip. There are lots of companies that will talk to you about the technology they are making within their chip. At Intel, we have an unparalleled capability end-to-end for building the platform. And that means the edge, all the way back to the data centre, in a data centric world, and includes instructions, capabilities, and work at a software level. We are no longer a company of engineers building software for hardware, we are now looking to develop one API – one platform – and investing significantly in the power of the platform, and in the power of software.”
A good example of software work beyond silicon design is Intel Data Center Manager (Intel DCM), a data centre monitoring solution that helps to optimize operations and maximize uptime, two goals that will become increasingly critical as the current pandemic places additional strain on infrastructure resources. Introduced 10 years ago as a reference platform and then an SDK for OEM partners such as Dell, Lenovo and HP, which integrated the software into their products, Intel DCM is now used by a range of ISVs (e.g., Schneider and Siemens in their DCIM products), and has since evolved into a second, commercial solution for enterprises looking to monitor and manage servers and other data centre devices. Installed on-premises in Windows or Linux versions as a software instance (one node for up to 20,000 devices) on a physical system or a virtual machine, the solution gathers telemetry data from all of the devices, including servers, UPSs and more, and aggregates thermal, power and device health data in a console with dashboards providing different views of the IT environment. DCM can be installed in multiple data centres that are on the same network, and can be opened up for remote operation.
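DCM’s own interfaces are not shown here, but the kind of out-of-band telemetry such a console aggregates can be sketched against the DMTF Redfish API that most recent baseboard management controllers expose. The snippet below is only an illustration of that mechanism, not of Intel DCM’s implementation; the BMC address, credentials and chassis identifier are placeholders, and exact resource paths vary by vendor.

```python
# Hedged sketch: pull power and thermal telemetry from a device's BMC
# over the DMTF Redfish API. This is not Intel DCM's interface, just an
# illustration of the data such a console aggregates. The BMC address,
# credentials and chassis id below are placeholders.
import requests

BMC = "https://bmc.example.local"   # placeholder management address
AUTH = ("operator", "password")     # placeholder credentials

def get(resource):
    """Fetch a Redfish resource as JSON (self-signed BMC certs are common)."""
    r = requests.get(BMC + resource, auth=AUTH, verify=False, timeout=10)
    r.raise_for_status()
    return r.json()

def sample(chassis_id="1"):
    """Return (watts consumed, list of sensor temperatures in Celsius)."""
    power = get(f"/redfish/v1/Chassis/{chassis_id}/Power")
    thermal = get(f"/redfish/v1/Chassis/{chassis_id}/Thermal")
    watts = power["PowerControl"][0].get("PowerConsumedWatts")
    temps = [t["ReadingCelsius"] for t in thermal["Temperatures"]
             if t.get("ReadingCelsius") is not None]
    return watts, temps

if __name__ == "__main__":
    watts, temps = sample()
    print(f"power draw: {watts} W, sensor temperatures: {temps}")
```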
Ron Pullis, strategic business development and new technology enablement lead at Intel, has outlined DCM use cases as follows:
- Automate health monitoring – all devices can be monitored, grouped by brand or by deployment in a physical setting. If a fan is starting to fail, this can be identified on a dashboard and proactive maintenance initiated.
- Improve system manageability – insight into device location, versioning, and workload enables better systems management.
- Simplify capacity planning – monitoring power consumption in devices, or measuring server utilization provides insight into racks that need to be moved, or racks that may need a power boost. Data can be used in budget planning (financial and physical) for the data centre.
- Identify dead and underutilized servers – allows for identification and redeployment/repurposing of ghost servers.
- Measure energy use by device – consumption data can be used in capacity planning and procurement.
- Pinpoint power/thermal issues – every device acts as a sensor, and data is pulled every 20 seconds to automatically generate a report. This appears in a dashboard as a histogram, with the distribution of temperatures identifying over-cooling or an imbalance in the thermal profile. The report helps to identify issues such as workloads that are not distributed properly, or an airflow problem.
- Create power-aware job scheduling tasks – DCM can perform power-aware job scheduling and VM migration, scheduling workloads to ensure there is adequate power for the job.
- Increase rack densities – device power/thermal data tells operators how many devices should be placed in a rack, and what position (top or bottom) they should occupy.
- Set power policies and caps – policy can enable the prioritization of critical workloads. In a power outage, power can be routed to ensure critical servers stay up, while others are powered down. Rules can establish, for example, that power for a system won’t exceed 300 watts, enforced via control of CPU frequency, on a server or rack basis (a hedged sketch of this kind of cap appears after this list).
- Improve data centre thermal profile – DCM provides finer granularity in terms of thermal data, allowing operators to fine tune set temperatures and cooling techniques.
- Optimize application power consumption – device data enables operators to ensure application workloads are in optimal locations/devices, based on power requirements.
- Avoid expensive PDUs and smart power strips – PDUs have been used traditionally to monitor power consumption, but for the last decade, servers have exposed telemetry (via baseboard management controllers) that includes power consumption. DCM can pull that information directly from the system, without using PDUs or smart power strips.
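The power-capping use case above can likewise be sketched against the standard Redfish schema. Again, this is only an illustration of the mechanism, not Intel DCM’s implementation: the 300-watt figure mirrors the example in the list, and the BMC address, credentials and chassis path are placeholders that vary by vendor.

```python
# Hedged sketch of the power-capping idea from the list above: ask a
# device's BMC, via the DMTF Redfish Power resource, to hold the system
# under a fixed wattage (typically enforced by throttling CPU frequency).
# Not Intel DCM's API; address, credentials and chassis path are placeholders.
import requests

BMC = "https://bmc.example.local"   # placeholder management address
AUTH = ("operator", "password")     # placeholder credentials

def set_power_cap(limit_watts, chassis_id="1"):
    """PATCH a per-chassis power limit, mirroring the 300 W example above."""
    body = {"PowerControl": [{"PowerLimit": {"LimitInWatts": limit_watts}}]}
    r = requests.patch(
        f"{BMC}/redfish/v1/Chassis/{chassis_id}/Power",
        json=body, auth=AUTH, verify=False, timeout=10,
    )
    r.raise_for_status()
    return r.status_code

if __name__ == "__main__":
    print("power cap applied, HTTP status:", set_power_cap(300))
```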
According to Rami Radi, senior software application engineer, Intel Data Center Management Solutions, this diverse list of use cases helps to differentiate Intel DCM from other DCIM or server management solutions: “It’s a merger between both ideas, designed to help with challenges such as capacity planning, asset management, inventory management, thermal issues, cooling analysis or energy monitoring. It’s a little bit of everything – and it’s vendor agnostic. Most solutions in the market are specific to their particular brand. DCM is not only vendor agnostic on the server side, it also supports a lot of other IT devices such as PDUs, switches, NASes and SANs. Under the hood, there are a ton of protocols that we support to enable this… so it doesn’t matter what the technology is, or what the platform is – as long as the device exposes some telemetry information via a dedicated user management port, we are able to manage it.”
This more holistic view, combined with remote environmental management, enables DCM to deliver the operational cost savings and infrastructure efficiencies that are critical in times of rapidly accelerating compute demand. Ultimately, it is the product of Intel’s own broader, solutions-oriented vision of its role in the data centre, one designed to provide what Overson called “balance from a performance perspective – so that there’s not a bottleneck in any one subsystem of the platform.” He concluded: “We continue to look at a much broader view of where we can contribute and lead in the data centre. It’s not just about processing data, where we will continue to be a cornerstone moving forward, but also moving the data with network and communications products and solutions, and also storing the data through our SSDs and persistent memory product family. So one approach is expanding breadth and relevance in multiple categories, and a second is the growing importance of software solutions and platforms beyond the silicon design. We will continue to be a leader in silicon design and manufacturing, and we’re putting that in a richer, more robust context across data centre solutions.”