An in-depth study of GPU frequency-scaling latency and its optimization on modern architectures

dc.contributor.author: Velička, Daniel
dc.contributor.author: Vysocký, Ondřej
dc.contributor.author: Yasal, Osman
dc.contributor.author: Říha, Lubomír
dc.date.accessioned: 2026-04-23T12:00:04Z
dc.date.available: 2026-04-23T12:00:04Z
dc.date.issued: 2026
dc.description.abstract: The move towards exascale systems in High-Performance Computing and the growing demand for Artificial Intelligence have brought together thousands of CPUs and even more GPU accelerators. This massive hardware consolidation has made energy optimization a critical challenge. The immense energy consumption creates a cascade of secondary issues: it increases the carbon footprint, generates significant heat that demands advanced cooling, and causes dramatic power fluctuations that threaten the stability of the electrical grid. Although energy-saving techniques based on Dynamic Voltage and Frequency Scaling (DVFS) are well understood for CPUs, a critical knowledge gap exists for GPU accelerators, limiting the ability to apply similar optimizations. This paper presents a method for measuring how long it takes the CPU to adjust the operating frequency of the GPU (switching latency) and how long the frequency change itself takes to complete (transition latency). The approach employs a minimal iterative workload whose runtimes at different frequencies can be distinguished statistically. It first measures execution times at each frequency. It then determines the switching and transition latencies of a change from an initial to a target frequency by tracking runtime changes, repeating the measurements to ensure statistical robustness. Finally, the methodology filters out outliers caused by external factors such as driver management or system interruptions. The methodology is implemented in the open-source LATEST [1] tool with support for NVIDIA GPU accelerators. It is evaluated on three GPUs from different architecture generations: GH200, A100-SXM4, and RTX Quadro 6000. The results show that transition latency ranges from hundreds of microseconds to hundreds of milliseconds, and that the vast majority of this time is spent by the GPU applying the frequency change.
Of the analysed GPUs, the GH200 exhibited the widest range, with switching latencies from 5.6 ms to 477 ms and transition latencies from 0.2 ms to 471 ms. Additionally, transition latency measurements can be used to identify manufacturing variability across accelerators, which shows up as differences in frequency-scaling reactivity. Our analysis identifies specific frequency pairs with high switching latencies. This creates a dilemma: the slow transitions discourage the use of these pairs, yet the target frequencies themselves may be highly energy efficient. To address this, we introduce an indirect switching method that passes through an intermediate frequency. This technique circumvents the overhead, allowing the system to reach these efficient frequency states without the high latency penalty of a direct transition. On GH200, indirect frequency switching reduced the latency of a single frequency change by 250 to 431 ms.
dc.description.firstpage: art. no. 108331
dc.description.source: Web of Science
dc.description.volume: 179
dc.identifier.citation: Future Generation Computer Systems. 2026, vol. 179, art. no. 108331.
dc.identifier.doi: 10.1016/j.future.2025.108331
dc.identifier.issn: 0167-739X
dc.identifier.issn: 1872-7115
dc.identifier.uri: http://hdl.handle.net/10084/158460
dc.identifier.wos: 001657954500001
dc.language.iso: en
dc.publisher: Elsevier
dc.relation.ispartofseries: Future Generation Computer Systems
dc.relation.uri: https://doi.org/10.1016/j.future.2025.108331
dc.rights: © 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
dc.subject: GPU
dc.subject: energy efficient computing
dc.subject: DVFS
dc.subject: transition latency
dc.title: An in-depth study of GPU frequency-scaling latency and its optimization on modern architectures
dc.type: article
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
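
The abstract describes the measurement and the indirect-switching idea only at a high level. As an illustration, here is a minimal Python sketch of how such post-processing could look. It assumes that each workload iteration's start time (relative to the frequency-change request) and runtime have already been recorded. It also assumes the target frequency's mean and standard deviation of iteration runtime were calibrated beforehand. Several choices are illustrative assumptions rather than the LATEST tool's actual implementation: all names (`Iteration`, `estimate_transition_latency`, `drop_outliers`, `best_switch`), the 3-sigma band, the three-iteration confirmation rule, and the use of Tukey's IQR fences as the outlier filter. The demo values are toy numbers, not measurements from the paper.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Iteration:
    start_ms: float    # iteration start time, relative to the change request (t = 0)
    runtime_ms: float  # measured duration of one workload iteration


def in_band(runtime, mean, std, k=3.0):
    """True if runtime lies within k standard deviations of the calibrated mean."""
    return abs(runtime - mean) <= k * std


def estimate_transition_latency(trace, target_mean, target_std, confirm=3):
    """Return the start time of the first run of `confirm` consecutive iterations
    whose runtimes match the target-frequency calibration, or None if the trace
    never settles. Requiring several matches ignores a single stray iteration."""
    run = 0
    for i, it in enumerate(trace):
        if in_band(it.runtime_ms, target_mean, target_std):
            run += 1
            if run == confirm:
                return trace[i - confirm + 1].start_ms
        else:
            run = 0
    return None


def drop_outliers(samples):
    """Discard repeated-measurement outliers using Tukey's 1.5 * IQR fences
    (an assumed filter; needs at least two samples)."""
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in samples if lo <= s <= hi]


def best_switch(latency, src, dst):
    """Compare the direct src->dst change with every two-step src->mid->dst route
    in the latency table and return (total_ms, path) for the cheapest option."""
    best = (latency[(src, dst)], [src, dst])
    for (a, mid), first in latency.items():
        if a != src or mid in (src, dst) or (mid, dst) not in latency:
            continue
        total = first + latency[(mid, dst)]
        if total < best[0]:
            best = (total, [src, mid, dst])
    return best


if __name__ == "__main__":
    # Toy trace: iterations slow down from ~10 ms to ~7 ms once the change lands.
    trace = [Iteration(0.0, 10.0), Iteration(10.0, 10.1), Iteration(20.1, 7.0),
             Iteration(27.1, 7.1), Iteration(34.2, 6.9), Iteration(41.1, 7.0)]
    print(estimate_transition_latency(trace, target_mean=7.0, target_std=0.1))  # 20.1
    # Toy latency table in ms, keyed by (from_MHz, to_MHz); not measured data.
    lat = {(1980, 345): 480.0, (1980, 1200): 20.0, (1200, 345): 30.0}
    print(best_switch(lat, 1980, 345))  # (50.0, [1980, 1200, 345])
```

The two-step search mirrors the abstract's idea of routing a slow direct change through an intermediate frequency when the sum of the two hops is cheaper. A real table would come from repeated, outlier-filtered measurements of every frequency pair.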
