Datacenters Bleed Watts And Cash – All Because They're Afraid To Flip A Switch
Datacenter power consumption has become a major concern in recent years, as utilities struggle to keep up with growing demand and operators are forced to seek alternative means to keep the lights on.
According to Uptime Institute, curbing energy consumption – and by extension lowering operating costs – could be as simple as flipping the switch on any one of the performance- and power-management mechanisms built into modern systems.
We're not talking about a trivial amount of power either. In a blog post this week, Uptime analyst Daniel Bizo wrote that simply enabling OS-level governors and power profiles could result in a 25 to 50 percent reduction in energy consumption. Scaled across a whole datacenter, those savings add up pretty quickly.
Additionally, enabling processor C-states can lead to a nearly 20 percent reduction in idle power consumption. In a nutshell, C-states dictate which aspects of the chip can be turned off during idle periods.
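For the curious, this is roughly what checking those switches looks like on a Linux box. A minimal sketch, assuming a stock kernel exposing the usual cpufreq and cpuidle sysfs files (the paths are the kernel's own, but the script is ours, not something from Uptime's report):

```python
# Minimal sketch: inspect the power-management switches on a Linux
# host via the kernel's cpufreq and cpuidle sysfs interfaces.
# Reading is unprivileged; changing governors or C-states needs root.
from pathlib import Path

CPU0 = Path("/sys/devices/system/cpu/cpu0")

def current_governor() -> str:
    """Return the active cpufreq governor, e.g. 'performance' or 'powersave'."""
    return (CPU0 / "cpufreq/scaling_governor").read_text().strip()

def c_states() -> list[tuple[str, bool]]:
    """List each cpuidle C-state and whether it is currently disabled."""
    out = []
    for state in sorted((CPU0 / "cpuidle").glob("state*")):
        name = (state / "name").read_text().strip()
        disabled = (state / "disable").read_text().strip() == "1"
        out.append((name, disabled))
    return out

if __name__ == "__main__":
    print("governor:", current_governor())
    for name, disabled in c_states():
        print(f"{name}: {'disabled' if disabled else 'enabled'}")
```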
The problem, according to Bizo, is that these features are disabled by default on most server platforms today, and that enabling them is often associated with performance instability and added latency.
That's because, whether you're talking about C- or P-states, the transition from a low-performance state like P6 to full power at P0 takes time. For some workloads, that delay can have a negative effect on observed performance.
However, Bizo argues that outside of a select few latency-sensitive workloads – like technical computing, financial transactions, high-speed analytics, and real-time operating systems – enabling these features will have negligible, if any, impact on performance while offering a substantial reduction in power consumption.
Do you really need all that perf anyway?
Uptime's argument is rooted in the belief that modern chips are capable of delivering far more performance than is required to maintain an acceptable quality of service.
"If a second for a database query is still within tolerance, there is, by definition, limited value to having a response under one tenth of a second just because the server can process a query that fast when loads are light. And, yet, it happens all the time," Bizo wrote.
Citing benchmark data published by Standard Performance Evaluation Corp. and The Green Grid, Uptime reports that modern servers typically achieve their best energy efficiency when their performance is limited to something like P2.
Making matters more difficult, while there are numerous tools out there for maintaining SLAs and QoS, over-performance isn't something that's typically tracked.
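To illustrate the idea (this is a hypothetical sketch, not an existing monitoring tool), tracking over-performance could be as simple as counting how many requests finish an order of magnitude faster than the SLA ever asked for:

```python
# Hypothetical sketch of tracking over-performance: count how many
# requests complete an order of magnitude faster than the SLA target.
# The helper name and thresholds are illustrative, not a real tool.
def overperformance_ratio(latencies_s: list[float], sla_s: float = 1.0) -> float:
    """Fraction of requests finishing in under a tenth of the SLA target,
    i.e. speed the service-level agreement never asked for."""
    if not latencies_s:
        return 0.0
    fast = sum(1 for t in latencies_s if t < sla_s / 10)
    return fast / len(latencies_s)

# Example: database queries against the one-second tolerance from
# Bizo's example. Three of the five came back 10x faster than needed.
samples = [0.04, 0.07, 0.09, 0.30, 0.80]
print(f"{overperformance_ratio(samples):.0%} of queries ran 10x faster than required")
```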
There's an argument to be made that the faster a computation completes, the less energy it consumes overall. For example, drawing 500 watts to finish a task in one minute uses less energy in total than drawing 300 watts for two minutes.
However, Bizo points out, the gains aren't always that clear cut. "The energy consumption curve for semiconductors gets steeper the closer the chip pushes to the top of its performance envelope."
In other words, there's often a point of diminishing returns, after which you're burning more power for minimal gains. In this case, running a chip at 500 watts just to shave off an extra two or three seconds compared to running at 450 watts probably isn't worth it.
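The arithmetic behind both examples is easy to check, since energy is just power multiplied by time. A quick sketch using the wattages quoted above, and assuming a one-minute baseline for the diminishing-returns case (the task length is our assumption, not the article's):

```python
# Back-of-the-envelope check of both claims. Energy is power x time,
# so racing to idle wins the first comparison but loses the second,
# once the chip is deep into diminishing returns.
def energy_kj(watts: float, seconds: float) -> float:
    """Energy in kilojoules for a given power draw and duration."""
    return watts * seconds / 1000

# Race to idle: 500 W for one minute beats 300 W for two minutes.
print(energy_kj(500, 60))    # 30.0 kJ
print(energy_kj(300, 120))   # 36.0 kJ

# Diminishing returns: burning 500 W to shave three seconds off a
# one-minute task costs more energy than settling for 450 W.
print(energy_kj(500, 57))    # 28.5 kJ
print(energy_kj(450, 60))    # 27.0 kJ
```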
Plenty of knobs and levers to turn
The good news is CPU vendors have developed all manner of techniques for managing power and performance over the years. Many of these are rooted in mobile applications, where energy consumption is a far more important metric than in the datacenter.
According to Uptime, these controls can have a major impact on system power consumption and don't necessarily have to kneecap the chip by limiting its peak performance.
The most power-efficient of these regimes, according to Uptime, are software-based controls, which have the potential to cut system power consumption by anywhere from 25 to 50 percent – depending on how sophisticated the operating system's governor and power plan are.
However, these software-level controls also impart the biggest latency hit, which may make them impractical for bursty or latency-sensitive jobs.
By comparison, Uptime found that hardware-only implementations designed to set performance targets tend to be far faster when switching between states – which means a lower latency hit. The trade-off is the power savings aren't nearly as impressive, topping out around ten percent.
A combination of software and hardware offers something of a happy medium, allowing the software to give the underlying hardware hints as to how it should respond to changing demand. Bizo cites power savings of between 15 and 20 percent when using performance-management features of this nature.
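One real-world instance of this hint-passing model (our example, not one Bizo names) is the energy/performance preference knob that Linux's intel_pstate driver exposes when running in hardware P-state (HWP) mode. A minimal sketch, assuming such a system and root privileges:

```python
# Sketch of the software-hints-hardware model, assuming a Linux box
# whose intel_pstate driver runs in hardware P-state (HWP) mode and
# therefore exposes an energy/performance preference per CPU.
# Writing these files requires root.
import os
from pathlib import Path

def set_epp(cpu: int, preference: str = "balance_power") -> None:
    """Hint to the hardware how to trade performance against energy.
    Typical values: performance, balance_performance, balance_power, power."""
    cpufreq = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq")
    allowed = (cpufreq / "energy_performance_available_preferences").read_text().split()
    if preference not in allowed:
        raise ValueError(f"{preference!r} not one of {allowed}")
    (cpufreq / "energy_performance_preference").write_text(preference)

# Example: bias every CPU toward saving power while leaving the
# hardware free to sprint when demand spikes.
for cpu in range(os.cpu_count() or 1):
    set_epp(cpu, "balance_power")
```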
While there are still performance implications associated with these tools, the actual impact may not be as bad as you might think. "Arguably, for most use cases, the main concern should be power consumption, not performance," Bizo wrote. ®