Using synthetic data for pretraining partial discharge detection in overhead transmission lines

Publisher

Springer Nature

Abstract

Accurate detection of partial discharges (PDs) in medium-voltage overhead transmission lines is critical for preemptive maintenance and for avoiding costly outages, yet it is hampered by scarce labeled data and pervasive electromagnetic interference. This paper investigates a hybrid simulation-and-data-driven framework in which synthetically generated PD signals are used to pretrain deep neural networks, which are subsequently fine-tuned on a limited set of real overhead-line measurements. The synthetic pipeline systematically varies PD repetition rates, amplitude distributions, vegetation-contact scenarios, and noise conditions, producing diverse time-series and spectrogram-like representations that approximate real operating environments. We conduct a comprehensive ablation study across multiple architectures (Convolutional Neural Networks (CNNs), a Vision Transformer (ViT), and a Long Short-Term Memory (LSTM) network) and analyze their sensitivity to granular sweeps of synthetic-data parameters. CNN-based models decisively outperform their ViT and LSTM counterparts on the spectrogram-based classification task; the ViT and LSTM models fail to learn meaningful representations. For the successful CNNs, pretraining on carefully parameterized synthetic datasets, particularly those reflecting higher PD activity such as our Datasets 3 and 4, consistently improves downstream performance on real data, boosting the Matthews Correlation Coefficient (MCC) on imbalanced, cost-sensitive test sets by roughly 10–20% compared with training from scratch. At the same time, we show that poorly aligned synthetic data can degrade generalization, underscoring the need for accurate noise calibration and domain-aligned simulation. Overall, the results confirm that (i) architectural choice is pivotal for PD detection in overhead lines and (ii) well-designed synthetic data is a powerful, practical lever for achieving reliable and cost-effective PD monitoring when real labeled data are limited.
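
The abstract reports results as the Matthews Correlation Coefficient on imbalanced test sets. As a minimal sketch of why that metric suits rare-event detection, the snippet below computes MCC and plain accuracy from binary confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function names are assumptions rather than the authors' code.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all windows classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced test set: 50 PD windows and 950 non-PD windows.
# Classifier A never predicts PD.
always_negative = dict(tp=0, tn=950, fp=0, fn=50)
# Classifier B finds 40 of the 50 PD windows and raises 30 false alarms.
useful = dict(tp=40, tn=920, fp=30, fn=10)

for name, counts in [("always_negative", always_negative), ("useful", useful)]:
    print(name,
          "accuracy=%.3f" % accuracy(**counts),
          "mcc=%.3f" % matthews_corrcoef(**counts))
```

Under these assumed counts the two classifiers have nearly identical accuracy, while MCC separates them clearly: the classifier that never predicts a discharge scores zero.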

Subject(s)

partial discharge detection, synthetic data, deep learning, overhead transmission lines, machine learning

Citation

Scientific Reports. 2025, vol. 15, issue 1, art. no. 45079.