An in-depth study of GPU frequency-scaling latency and its optimization on modern architectures
| dc.contributor.author | Velička, Daniel | |
| dc.contributor.author | Vysocký, Ondřej | |
| dc.contributor.author | Yasal, Osman | |
| dc.contributor.author | Říha, Lubomír | |
| dc.date.accessioned | 2026-04-23T12:00:04Z | |
| dc.date.available | 2026-04-23T12:00:04Z | |
| dc.date.issued | 2026 | |
| dc.description.abstract | The move towards exascale systems in High-Performance Computing and the demand for Artificial Intelligence have brought together thousands of CPUs and even more GPU accelerators. This massive hardware consolidation has made energy optimization a critical challenge. The immense energy consumption creates a cascade of secondary issues: it increases the carbon footprint, generates significant heat that demands advanced cooling, and causes dramatic power fluctuations that threaten the stability of the electrical grid. Although energy-saving techniques based on Dynamic Voltage and Frequency Scaling are well understood for CPUs, a critical knowledge gap exists for GPU accelerators, limiting the ability to apply similar optimizations. This paper presents a method for measuring how long it takes the CPU to adjust the operating frequency of the GPU (switching latency), and how long the frequency change itself takes to complete (transition latency). The approach employs a minimal iterative workload that makes runtime differences between frequency pairs statistically distinguishable. It first measures execution times for each frequency and then determines the switching and transition latency of the change from an initial to a target frequency by tracking runtime changes and repeating measurements to ensure statistical robustness. Finally, the methodology filters out outliers caused by external factors such as driver management or system interruptions. The methodology is implemented in the open-source LATEST [1] tool with support for NVIDIA GPU accelerators. It is evaluated on three GPUs from different architecture generations: GH200, A100-SXM4, and Quadro RTX 6000. The results show that the transition latency ranges from hundreds of microseconds up to hundreds of milliseconds, and that the vast majority of the time is spent in the GPU applying the frequency change. Of the analysed GPUs, the GH200 exhibited the widest range, with switching latencies spanning from 5.6 ms to 477 ms and transition latencies from 0.2 ms to 471 ms. Additionally, the transition latency measurement can be used to identify manufacturing variability of accelerators, showing differences in frequency scaling reactivity. Our analysis identifies specific frequency pairs with high switching latencies. This creates a dilemma: the slow transitions discourage the use of these pairs, yet the target frequencies themselves may be highly efficient in terms of energy consumption. To address this, we introduce an indirect switching method that leverages an intermediate frequency. This technique circumvents the overhead, allowing the system to reach these efficient frequency states without the high latency penalty of a direct transition. On the GH200, indirect frequency switching reduced latency by between 250 and 431 ms for a single frequency change. | |
| dc.description.firstpage | art. no. 108331 | |
| dc.description.source | Web of Science | |
| dc.description.volume | 179 | |
| dc.identifier.citation | Future Generation Computer Systems. 2026, vol. 179, art. no. 108331. | |
| dc.identifier.doi | 10.1016/j.future.2025.108331 | |
| dc.identifier.issn | 0167-739X | |
| dc.identifier.issn | 1872-7115 | |
| dc.identifier.uri | http://hdl.handle.net/10084/158460 | |
| dc.identifier.wos | 001657954500001 | |
| dc.language.iso | en | |
| dc.publisher | Elsevier | |
| dc.relation.ispartofseries | Future Generation Computer Systems | |
| dc.relation.uri | https://doi.org/10.1016/j.future.2025.108331 | |
| dc.rights | © 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies. | |
| dc.subject | GPU | |
| dc.subject | energy efficient computing | |
| dc.subject | DVFS | |
| dc.subject | transition latency | |
| dc.title | An in-depth study of GPU frequency-scaling latency and its optimization on modern architectures | |
| dc.type | article | |
| dc.type.status | Peer-reviewed | |
| dc.type.version | publishedVersion |
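
The indirect switching idea described in the abstract can be illustrated with a minimal sketch. It assumes a table of measured switching latencies per frequency pair is already available, and that the cost of an indirect switch is simply the sum of its two hops; both are assumptions made for illustration. The function name `best_switch_path` and all frequencies and latency values below are hypothetical and are not taken from the paper or from the LATEST tool.

```python
from typing import Dict, List, Tuple

# (from_mhz, to_mhz) -> measured switching latency in milliseconds
LatencyTable = Dict[Tuple[int, int], float]


def best_switch_path(
    latency_ms: LatencyTable, start: int, target: int
) -> Tuple[List[int], float]:
    """Pick the direct switch or the cheapest route via one intermediate frequency.

    Assumption: an indirect switch costs the sum of its two hops.
    """
    best_path = [start, target]
    best_cost = latency_ms[(start, target)]

    # Every frequency that appears in the table is a candidate intermediate.
    candidates = {f for pair in latency_ms for f in pair} - {start, target}
    for mid in candidates:
        first = latency_ms.get((start, mid))
        second = latency_ms.get((mid, target))
        if first is None or second is None:
            continue  # one of the hops was never measured
        if first + second < best_cost:
            best_path, best_cost = [start, mid, target], first + second

    return best_path, best_cost


# Hypothetical measurements (illustrative values only).
table: LatencyTable = {
    (1980, 1260): 450.0,  # slow direct transition
    (1980, 1500): 8.0,
    (1500, 1260): 12.0,
}

path, cost = best_switch_path(table, 1980, 1260)
print(path, cost)
```

With these made-up values the route through 1500 MHz is selected, because its two hops together cost less than the direct transition. When no intermediate frequency is cheaper, the function returns the direct pair unchanged.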