Keeping I.T. cool

Editorial Type: Opinion Date: 2021-11-22 Tags: Management, Data Centre, Strategy, Servers, Supermicro
Dieter Bodemer, Systems Solutions Manager at Supermicro, discusses the options available for liquid cooling in modern data centres

Data centres worldwide are estimated to consume between 1% and 3% of global electricity as of 2021. Depending on the geography, a significant share of this power may come from fossil fuel plants. In an effort to reduce the carbon footprint of data centres and decrease operating costs, new cooling technologies can be implemented that lower cooling costs and reduce the Power Usage Effectiveness (PUE) of the data centre.

The current generation of high-end CPUs from the major manufacturers uses between 270 and 280 Watts per socket. Add in the latest GPUs used for AI training and inferencing at 400 Watts each, and a single server can require over 2 kW, all of which ends up as heat that must be removed. While more powerful fans can help somewhat, servers are reaching the point where liquid cooling may be required. Installing liquid cooling technologies not only cools these hotter systems better than air cooling but also reduces the need for massive data centre air conditioning units.
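
As a rough illustration of how quickly these numbers add up, here is a minimal sketch of a server power budget. The component counts and wattages are illustrative assumptions based on the figures above, not a specific product configuration:

```python
# Rough power budget for a hypothetical high-end server.
# All wattages are illustrative assumptions, not measured values.
CPU_WATTS = 280      # per socket, current high-end CPU
GPU_WATTS = 400      # per accelerator, AI training/inference GPU
OTHER_WATTS = 300    # assumed memory, storage, NICs, fans, PSU losses

num_cpus, num_gpus = 2, 4
total_watts = num_cpus * CPU_WATTS + num_gpus * GPU_WATTS + OTHER_WATTS
print(f"Estimated server load: {total_watts} W")  # 2460 W, well over 2 kW
```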

Looking at the physics of liquid cooling compared to air cooling, liquid removes heat faster because it is far denser: many more molecules are in contact with the hot surface than with widely spaced air molecules, so a given volume of liquid can carry away far more heat.
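
To put numbers on that intuition, a minimal sketch comparing the volumetric heat capacity of water and air (approximate textbook values at room temperature; real coolants differ):

```python
# Volumetric heat capacity = density * specific heat capacity.
# Approximate room-temperature values; actual coolant properties vary.
WATER_DENSITY = 998.0          # kg/m^3
WATER_SPECIFIC_HEAT = 4186.0   # J/(kg*K)
AIR_DENSITY = 1.2              # kg/m^3
AIR_SPECIFIC_HEAT = 1005.0     # J/(kg*K)

water_vol_heat = WATER_DENSITY * WATER_SPECIFIC_HEAT  # ~4.18 MJ/(m^3*K)
air_vol_heat = AIR_DENSITY * AIR_SPECIFIC_HEAT        # ~1.2 kJ/(m^3*K)

# Per unit volume and per degree of temperature rise, water absorbs
# roughly 3,500x more heat than air.
print(f"Ratio: {water_vol_heat / air_vol_heat:.0f}x")
```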

LIQUID COOLING OPTIONS
Many years ago, supercomputers were immersed in a special liquid to cool their components. As technology progressed, air cooling prevailed, but with the recent rise in the performance and power draw of high-end CPUs, new options are needed.

The most popular liquid-based methods for cooling CPUs and GPUs are Direct To Chip (sometimes shortened to D2C), full system immersion, and rear door heat exchangers. Looking in more detail at each of these:

  • Direct To Chip (D2C) - in this method, chilled liquid is circulated through cold plates mounted on each CPU or GPU in a closed loop. The now warmer liquid is then returned to a chilling unit that brings it back down to the specified temperature, and the cycle begins again. This method is the most popular today, and the chilling units can be installed in each rack or in a central location. Thus, liquid cooling can be applied to certain servers or to a specific rack (a flow-rate sizing sketch follows this list).
  • Immersion Cooling - for immersion cooling, the entire server is placed in a tank filled with a dielectric (electrically non-conductive) liquid. The cooler liquid at the bottom of the tank rises as it absorbs heat from the hot CPUs, and once at the top of the tank it is pumped out to be chilled. Certain server features must be turned off, such as the fans, but most servers operate without issue when immersed in such a liquid. This method removes heat most efficiently, but the downside is that additional real estate within a data centre is used for these tanks.
  • Rear Door Heat Exchanger (RDHx) - although not technically applying liquid cooling directly to the CPUs, this method brings the power of liquid cooling closer to where the heat is produced. A rear door containing liquid-filled pipes is installed on the rack. Many data centres have hot aisles and cold aisles, and these doors face the hot aisle. As warm or hot air exits the servers, it passes over the cold coils and is cooled. The now warmer liquid in the coils is returned to a cooling system external to the rack. The benefit of RDHx is that it can be fitted to any rack and applied only to specified racks within a data centre.
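
As a back-of-the-envelope sizing sketch for a D2C loop, the required coolant flow follows from the energy balance Q = flow × specific heat × temperature rise. The heat load, temperature rise, and water-like coolant properties below are assumptions for illustration only:

```python
# How much coolant must a D2C loop circulate to remove a given heat load?
# Energy balance: heat_load = mass_flow * specific_heat * delta_T.
# All figures are illustrative assumptions.
HEAT_LOAD_W = 2000.0     # heat to remove from one server, watts
DELTA_T_K = 10.0         # assumed coolant temperature rise across the loop
SPECIFIC_HEAT = 4186.0   # J/(kg*K), water-like coolant
DENSITY = 998.0          # kg/m^3

mass_flow = HEAT_LOAD_W / (SPECIFIC_HEAT * DELTA_T_K)   # kg/s
litres_per_min = mass_flow / DENSITY * 1000 * 60
print(f"Flow needed: {litres_per_min:.1f} L/min")  # roughly 2.9 L/min
```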

REDUCING COSTS
Liquid cooling reduces OPEX by removing or lessening the need for Computer Room Air Conditioners (CRACs). In addition, the server fans can be run at lower speeds, since less air needs to flow over the hot electronics, further reducing power consumption.
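
The fan savings are larger than they may first appear: by the commonly cited fan affinity laws, fan power scales roughly with the cube of fan speed, so modest speed reductions yield outsized savings. A minimal sketch (the baseline fan wattage is an illustrative assumption):

```python
# Fan affinity laws: power scales approximately with the cube of speed.
# Baseline wattage is an illustrative assumption for one server.
BASELINE_FAN_WATTS = 100.0   # fans at full speed

for speed_fraction in (1.0, 0.8, 0.6):
    watts = BASELINE_FAN_WATTS * speed_fraction ** 3
    print(f"{speed_fraction:.0%} speed -> {watts:.0f} W")
# 100% -> 100 W, 80% -> 51 W, 60% -> 22 W
```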

The PUE of a data centre is the total electricity delivered to the data centre divided by the electricity consumed by the servers, storage, and network infrastructure. While a value of 1.0 would mean that all power goes to the computing and associated infrastructure, a typical PUE is in the 1.5 to 1.8 range. Implementing liquid cooling technologies can reduce the PUE to the 1.1 to 1.2 range. A quick calculation shows that over a year, liquid cooling can cut operating costs enough to let data centre operators upgrade components (to more efficient ones) or purchase additional servers and storage systems.
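
To make that quick calculation concrete, here is a sketch of the annual savings from a PUE improvement. The IT load, PUE values, and electricity price are assumptions chosen for illustration:

```python
# Annual electricity saved by improving PUE, for a fixed IT load.
# Facility power = IT power * PUE, so the overhead shrinks as PUE
# approaches 1.0. All inputs are illustrative assumptions.
IT_LOAD_KW = 500.0     # servers, storage, and network
PUE_AIR = 1.6          # typical air-cooled facility
PUE_LIQUID = 1.15      # with liquid cooling
PRICE_PER_KWH = 0.12   # assumed electricity price, in local currency

hours_per_year = 24 * 365
saved_kwh = IT_LOAD_KW * (PUE_AIR - PUE_LIQUID) * hours_per_year
print(f"Energy saved: {saved_kwh:,.0f} kWh/year")           # ~1,971,000 kWh
print(f"Cost saved:   {saved_kwh * PRICE_PER_KWH:,.0f}/yr")  # ~236,520
```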

SUPERMICRO SOLUTIONS
Supermicro has years of experience in supplying liquid cooled servers with various technologies. The BigTwin, SuperBlade, GPU, and Ultra product lines, featuring 3rd Gen Intel Xeon Scalable Processors, can easily accommodate a liquid cooling option for maximum performance. Supermicro works with customers and a range of vendors to deliver the optimal solution for a given workload.

Through the Supermicro Rack Plug and Play program, systems that require liquid cooling can be completely tested with the desired liquid cooling technology before delivery. In addition, Supermicro's manufacturing facilities are equipped with the latest liquid cooling options, and the company works closely with the leading suppliers.

PLANNING FOR THE FUTURE
Choosing the most appropriate liquid cooling technology, whether D2C, immersion, or RDHx, depends on many factors. These include whether it is a retrofit situation, for example, installing new and hotter servers into an existing rack; an RDHx system would be the best choice for that environment. If new servers generate more than 1 kW of heat each, then D2C would be the best option, particularly when a server contains dual CPUs and multiple GPUs. Immersion cooling may be the best choice for environments where systems are expected to run very hot and where even two-phase D2C may not provide sufficient cooling. And when a new physical data centre is being designed, the architects and building managers need to plan for the future and install options for the next generation of servers, which will undoubtedly generate more heat.
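
The rules of thumb above can be summarised in a short sketch. The function name and thresholds are illustrative, mirroring this article's guidance rather than any formal sizing methodology:

```python
def suggest_cooling(retrofit: bool, server_heat_kw: float,
                    extreme_heat: bool) -> str:
    """Rule-of-thumb cooling choice, following the guidelines above."""
    if retrofit:
        # New, hotter servers going into an existing rack.
        return "Rear Door Heat Exchanger (RDHx)"
    if extreme_heat:
        # Systems expected to run very hot, beyond what D2C can handle.
        return "Immersion cooling"
    if server_heat_kw > 1.0:
        # e.g. dual CPUs plus multiple GPUs in one chassis.
        return "Direct To Chip (D2C)"
    return "Air cooling may still suffice"

print(suggest_cooling(retrofit=False, server_heat_kw=2.2,
                      extreme_heat=False))  # Direct To Chip (D2C)
```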

Whatever the preferred solution, liquid cooling will ultimately become a critical technology for keeping the latest and future generations of CPUs and GPUs within their operating thermal envelopes. While liquid cooling may be easier to justify for very high-end clusters today, upcoming CPUs and GPUs will undoubtedly require such solutions to deliver their maximum performance.

More info: www.supermicro.com