The role of energy in research computing
According to the U.S. Energy Information Administration (EIA), data centers in the US account for 3-5% of all electricity used today. Consider that while the power footprint of an average household is under 100 Watts per square foot (e.g., http://www.eia.gov/tools/faqs/faq.cfm?id=97&t=3), floor space in data centers can exceed 10,000 W/sq ft. The trajectory for power use is upward, as society's reliance on computing technology continues to grow.
Fundamentals of electrical usage for computing
The physics of electricity in computing
The central processing units (CPUs) at the heart of modern computers include tens of millions (e.g., the Intel Atom) to billions (for Intel's and AMD's latest multi-core processors) of tiny transistors. Binary logic is the basis for computational operations, including the FLOP, or FLoating point OPeration. Binary logic is implemented via logic gates made up of transistors: one type of gate for a binary "AND," another for "OR," and yet another for not-and, or "NAND." Computer memory, similarly, represents data as a series of high or low electrical values, representing '1' and '0.'
Here's the point: a logic gate is a type of electrical circuit that takes two electrical signals as input, and produces one signal as output. The energy of the second signal, which is not output, is released as heat. Thus, we see that a CPU that uses, say, 100 Watts of electricity will need to dissipate quite a lot of that power as heat. The message here is simple: the act of computation on modern digital electronics produces heat as a byproduct.
There is a lot of detail, which I won't attempt to thoroughly cover here. For example, there are numerous other components in computers, and other subsystems on the CPU itself, that have their own power utilization tendencies. A single computer, whether it's a desktop workstation, or in a supercomputer cluster, cloud cluster, or elsewhere, will consume anywhere from a little under 100 Watts (for a laptop computer) to over 1000 Watts (for a high end workstation, or computer with multiple CPU sockets, or sets of accelerator devices like GPUs and the Xeon Phi). These computers have all kinds of mechanisms for temporarily shutting off, or minimizing, power utilization for components not in use. Nevertheless, the statement above prevails: the act of computation requires electricity, and produces heat.
Supercomputers are groups of individual computational units, grouped closely together and linked by high performance networks. This can concentrate thousands of kW of power consumption in just a few square feet, and the waste heat will need to be removed lest the components overheat. This is typically accomplished by blowing cool air and/or water (or another liquid with desirable properties) over the components.
Moving air and/or water can require a substantial proportion of the total electrical consumption of the system. The power usage effectiveness (PUE) number, the ratio of total facility power to the power delivered to the computing equipment, is a measure of this "overhead" for cooling. Historically, especially with air-cooled systems and in data centers that didn't keep incoming (cool) and outgoing (warm) air adequately separated, PUEs of 1.5 or greater were common. A PUE of 1.5 means that for every 100 watts needed for computation, a further 50 watts is required for cooling.
In modern data centers using best practices for cooling, a PUE of 1.1 to 1.3 is more common. This indicates that cooling will consume an additional 10 to 30% of the energy use of the system. This is based on long-term utilization, since at any moment many factors might influence the cooling needs. Such factors include how busy the computer system is (corresponding to the heat produced through computation and other activity), the outdoor temperature and humidity, and other factors.
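The PUE arithmetic above can be sketched in a few lines. This is a minimal illustration, not a measurement tool: the function names are hypothetical, and it follows the text in treating all overhead as cooling, although in practice other facility loads contribute too.

```python
def total_facility_power(it_power_watts: float, pue: float) -> float:
    """Total facility power implied by an IT (computing) load and a PUE value."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_power_watts * pue


def overhead_power(it_power_watts: float, pue: float) -> float:
    """Power spent on overhead (treated here as cooling) for a given IT load."""
    return total_facility_power(it_power_watts, pue) - it_power_watts


if __name__ == "__main__":
    # PUE values taken from the text: historical (1.5) and modern (1.1 to 1.3).
    for pue in (1.5, 1.3, 1.1):
        watts = overhead_power(100.0, pue)
        print(f"PUE {pue}: about {watts:.0f} W of overhead per 100 W of computation")
```

Because PUE is a long-term average, a calculation like this is best applied to annual or monthly energy figures rather than to instantaneous readings.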
Most people have lost work on a computer due to a power loss or other sudden failure. With a supercomputer, the impact of such loss can be amplified by the scale of computation ongoing. A big computational job might run on 1000 nodes, comprising 24,000 CPU cores, for a week. That's over 4 million CPU hours (24,000 cores × 168 hours). If there is a power loss at the end of the 7th day, that work might be lost and need to be restarted (modulo checkpointing and other techniques that can be applied to mitigate such losses). Additionally, the sensitive electronics that make up the supercomputer and its many subsystems might fail if the power is unsteady or is lost.
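The scale of the potential loss can be worked out directly. The sketch below uses the job size from the text (1000 nodes, 24 cores per node, one week); the 6-hour checkpoint interval is a hypothetical value chosen only to illustrate how checkpointing bounds the loss.

```python
def cpu_hours(nodes: int, cores_per_node: int, wall_hours: float) -> float:
    """CPU hours consumed by a job occupying every core for wall_hours."""
    return nodes * cores_per_node * wall_hours


def worst_case_lost_cpu_hours(nodes: int, cores_per_node: int,
                              checkpoint_interval_hours: float) -> float:
    """Work lost if a failure hits just before the next checkpoint is written."""
    return cpu_hours(nodes, cores_per_node, checkpoint_interval_hours)


if __name__ == "__main__":
    week_hours = 7 * 24  # 168 hours
    total = cpu_hours(1000, 24, week_hours)
    print(f"Full job: {total:,.0f} CPU hours")

    # Hypothetical checkpoint interval of 6 hours (an assumption, not from the text).
    lost = worst_case_lost_cpu_hours(1000, 24, 6)
    print(f"Worst-case loss with 6-hour checkpoints: {lost:,.0f} CPU hours")
```

Without checkpoints, a failure at the end of the run costs the full total; with them, the exposure shrinks to one checkpoint interval, at the price of the time and I/O spent writing checkpoints.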
Therefore, supercomputers, along with most enterprise computing equipment found in data centers, are likely to be protected by some sort of backup power system. A typical scenario is a bank of batteries that automatically takes over the power load, if the power to the facility should fail. These uninterruptible power supply (UPS) devices usually also serve to condition the power against momentary surges or drops. Depending on the needs of the center, the UPS might be backed by an on-site generator. Thus, the UPS might only need to last a few minutes, until the generator is started.
The data center industry has standards and practices for redundancy of power source, cooling, power conditioning, networking, and other important factors. Most research computing centers have somewhat lesser needs, but might be collocated with equipment with greater needs. Capital costs for a UPS can be a substantial fraction of the cost of a supercomputer, but it will serve multiple purposes, and last longer than the systems it services.
Energy and Data Movement
In manufacturing, there are growing trends towards vertical integration, and towards collocating manufacturing facilities with highly reliable and inexpensive electricity. For example, Renewable Energy Corporation, which produces silicon products for photovoltaics, recently relocated their world headquarters to central Washington. This is a location with some of the least expensive and most reliable electricity in the country, generated via hydro power.
In computing, similar decisions apply at the micro and macro level. At the macro level, we find some of the largest supercomputing facilities in areas of relatively inexpensive electricity. For example, two of the largest academic supercomputers, at the University of Tennessee and the University of Illinois, are in states with relatively inexpensive bulk power (generated via nuclear and hydroelectric). Central Washington has become home to huge data centers operated by Dell, Microsoft and others.
Cheap and Reliable Electricity
At the macro level, electricity for supercomputers (including for cooling and air handling) can dominate operational costs. This has favored growth of computer data centers in areas with relatively inexpensive and reliable electricity. A cooler climate, such as is found in the Pacific Northwest, is also appealing. Advances in cloud computing and the ubiquity of high-speed networks have made it appealing to many industries to locate computers far from their other operations, in order to have lower electricity costs.
This relocation is even the case for manufacturing industries. For example, Boeing makes extensive use of supercomputing for simulations of airflow and other aspects of aircraft design. Instead of collocating their supercomputers with their design team or corporate office, Boeing places some of its computers at commercial data centers, where power is less expensive and Boeing can outsource data center services, rather than hiring their own personnel.
But what of data movement, and the suitability of remote access? The good news is that command and control of computers can happen remotely, with relatively low-speed connectivity. Submitting a computational job can be done just as easily from a home subscriber Internet connection as from a computer on a high-speed research network. Most analysis, and some visualization, can also happen remotely. For complex visualizations, it might be necessary to be on a high-speed network, or to download input data and perform the visualizations on local resources. Experts have a variety of choices in how to approach this.
The challenge is in large-scale data transfer. Simulations can take huge input data sets, and generate correspondingly large output data. For example, a climate model might require topography, observations, and some sort of initial conditions as input. Input data for a global climate model can easily exceed tens of gigabytes, which is non-trivial to transfer, even on a high-speed network. Output data from a climate model, which might span several decades, might be a terabyte or more. Computational scientists might make many runs of the model, with different parameters, to discover trends. Such a computational campaign, with associated analysis and visualization, can be hundreds of TB to over a petabyte. At these volumes, data transfer time can dominate time to completion of analysis.
The solution, as with vertical integration in manufacturing, is to keep the computation and the data "in house," at the same facility, on the highest speed networks. This makes sense for analysis, and also results in energy efficiency:
- Scientists and engineers do their heavy computation, including analysis and visualization, on systems that benefit from economies of scale (versus everyone having large workstations or deskside clusters that are unused much of the time).
- Those remote systems are more likely to be at inexpensive locations for power, with renewable electricity generation.
- Savings in costs and some savings in energy occur by not needing the highest-end networking to the desktop.
Large-scale Data Movement is to be Avoided
Getting data into and out of remote systems can still be a challenge, but input, at least, might only need to be supplied one time. For bulk transfer, "sneakernet" might still be the most effective: ship a load of disk drives or tapes to the data center, for loading to systems on the local network.
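Doing the math, let's imagine a campus has a 10 Gb/s uplink, and a scientist needs to transfer data to a remote center. The fastest possible transfer times are shown in the middle column below. This assumes full-speed throughput and exclusive use of the network link, which are unlikely on shared operational networks. The right-hand column shows the same transfers if effective throughput falls to 100 Mb/s.
|Data volume|At 10 Gb/s (best case)|At 100 Mb/s effective|
|1 gigabyte|under 1 second|around 1.5 minutes|
|1 terabyte|around 13 minutes|around 1 day|
|1 petabyte|around 9 days|around 2.5 years|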
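The arithmetic behind ideal transfer times can be reproduced with a short sketch. It assumes decimal units (1 gigabyte = 10^9 bytes), no protocol overhead, and exclusive use of the link; the function names are hypothetical, and the 100 Mb/s rate is an illustrative assumption for degraded throughput on a shared network.

```python
def transfer_seconds(num_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time: total bits divided by the link rate."""
    return num_bytes * 8 / link_bits_per_second


def humanize(seconds: float) -> str:
    """Express a duration in the largest unit that keeps the value >= 1."""
    units = (("years", 365 * 86400), ("days", 86400),
             ("hours", 3600), ("minutes", 60))
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    volumes = {"1 gigabyte": 1e9, "1 terabyte": 1e12, "1 petabyte": 1e15}
    rates = {"10 Gb/s": 10e9, "100 Mb/s (assumed degraded rate)": 100e6}
    for rate_name, rate in rates.items():
        print(rate_name)
        for volume_name, size in volumes.items():
            print(f"  {volume_name}: {humanize(transfer_seconds(size, rate))}")
```

Changing the entries in `rates` shows how strongly the result depends on the throughput actually achieved rather than on the nominal link speed.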
As we become increasingly able to scale out computations to many more compute cores with more memory, data volumes will rapidly become (or have already become) untenable for distribution. The only logical approach for large-scale simulation is to collocate all phases of analysis, from input data, to computation, to analysis and visualization. This is directly analogous to vertical integration in manufacturing. In such situations, a single site will be engaged in everything from raw material processing to finished products. And, in such manufacturing, the location is chosen based on ready availability and low costs for the most important and/or costly resources: electricity, water, location, and/or labor.
Local Data Movement is also to be Avoided
Even at the system or data center level, data movement can be a problem. Lustre, GPFS and other high-speed parallel filesystems are designed to make it feasible to generate very large output datasets from parallel computation. Even so, bandwidth to storage for a mid-size supercomputer is often at the gigabyte/second level, which can be challenging for output that is many terabytes in size. Various strategies can decrease the wait time for output, such as having different computational nodes write simultaneously on independent data paths, and interspersing file output with computation.
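A rough estimate shows why output at this scale is a concern and how the two strategies help. The sketch below is an idealized model with hypothetical function names: it assumes bandwidth scales linearly with the number of independent data paths and that overlapped computation hides write time perfectly, neither of which a real filesystem guarantees. The 10 TB output size and 8 paths are illustrative values.

```python
def write_seconds(output_bytes: float, bytes_per_second_per_path: float,
                  paths: int = 1) -> float:
    """Ideal time to write output when `paths` independent paths share the load."""
    if paths < 1:
        raise ValueError("at least one data path is required")
    return output_bytes / (bytes_per_second_per_path * paths)


def visible_wait_seconds(write_time_s: float, overlapped_compute_s: float) -> float:
    """Wait time left over after hiding output behind concurrent computation."""
    return max(0.0, write_time_s - overlapped_compute_s)


if __name__ == "__main__":
    output = 10e12   # 10 TB of output (illustrative)
    per_path = 1e9   # 1 GB/s per path, the order of magnitude given in the text

    single = write_seconds(output, per_path)
    striped = write_seconds(output, per_path, paths=8)
    print(f"One path:    {single / 3600:.1f} hours")
    print(f"Eight paths: {striped / 60:.1f} minutes")

    # If 15 minutes of computation can proceed while the striped write runs:
    print(f"Visible wait: {visible_wait_seconds(striped, 15 * 60) / 60:.1f} minutes")
```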
Within a research computer, data and data movement can contribute significantly to energy utilization:
- Spinning disk drive: around 3 Watts each. If each node in a 1000-node system has a disk drive, that's 3 kW. Often the drives spin continuously, even if not in active use. A 1 PB storage appliance might have over 400 disk drives. Data transfer speeds are on the order of GB/s or less.
- Solid state drive: around 10% of the power of a spinning disk, with comparable data transfer speeds. SSDs typically have some ability to greatly reduce electricity needs when not in active use.
- Memory: from 2-5 Watts per DIMM. A 1000-node system with two CPUs per node and 4 slots per CPU has 8,000 DIMMs, which would use 16-40 kW (about 28 kW at the midpoint of 3.5 Watts) just for memory. Like most disk configurations, memory is "always on" if the node is on, even if it's not doing any work. Transfer speed is perhaps two orders of magnitude higher than for disk drives.
- On-die memory (such as L1/L2 cache): power usage is difficult to quantify, since the cache is part of the CPU package. With modern multicore packages consuming 80-150 Watts each, data movement to and from cache and to the microprocessor electronics is a major portion of total electrical consumption. Transfer speeds are aligned with the microprocessor frequency, so latency is on the order of nanoseconds, yet storage volumes are relatively low.
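The per-component figures in the list above can be rolled up for a whole system. This sketch uses the per-device Watt values from the text and the 1000-node, two-CPU, four-slots-per-CPU configuration; the function names are hypothetical. Note that the 2-5 Watt range per DIMM gives 16 to 40 kW for memory, with 28 kW corresponding to the 3.5 Watt midpoint.

```python
def disk_power_kw(nodes: int, drives_per_node: int = 1,
                  watts_per_drive: float = 3.0) -> float:
    """Power drawn by spinning disks across the system, in kW."""
    return nodes * drives_per_node * watts_per_drive / 1000.0


def ssd_power_kw(nodes: int, drives_per_node: int = 1,
                 watts_per_spinning_drive: float = 3.0,
                 fraction_of_spinning: float = 0.10) -> float:
    """Power for SSDs, taken as a fraction of the spinning-disk figure."""
    spinning = disk_power_kw(nodes, drives_per_node, watts_per_spinning_drive)
    return spinning * fraction_of_spinning


def memory_power_kw(nodes: int, cpus_per_node: int, slots_per_cpu: int,
                    watts_per_dimm: float) -> float:
    """Power drawn by all DIMMs in the system, in kW."""
    dimms = nodes * cpus_per_node * slots_per_cpu
    return dimms * watts_per_dimm / 1000.0


if __name__ == "__main__":
    nodes = 1000
    print(f"Spinning disks: {disk_power_kw(nodes):.1f} kW")
    print(f"SSDs instead:   {ssd_power_kw(nodes):.1f} kW")
    for watts in (2.0, 3.5, 5.0):
        kw = memory_power_kw(nodes, cpus_per_node=2, slots_per_cpu=4,
                             watts_per_dimm=watts)
        print(f"Memory at {watts} W per DIMM: {kw:.0f} kW")
```

These are nameplate-style estimates for devices that are powered on; they do not account for the idle power-saving modes mentioned above.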
Fundamentally, we can view data movement as a major limitation on time to solution for computation. Data movement takes time, and requires power. For most computer technologies, simply having a data device in the system requires some power, even if the device is not in active use.
In the spreadsheet accompanying this monograph, some tools for estimating energy use are provided. Manufacturers are a good first source of such information, and then the individual components' energy needs may be considered across an entire system. Another feature of the spreadsheet is the embedded energy: the energy it takes to manufacture a system and get it to the data center. This includes mineral extraction and processing, as well as manufacturing. Materials have an impact beyond energy use, notably the environmental and human costs of their extraction and processing. These are discussed in more detail in the section, Equipment lifecycles and technology refresh.