Proposal and Implementation of Churn Prediction system for Telecommunications Company

Publisher

Vysoká škola báňská - Technická univerzita Ostrava

Location

ÚK/Sklad diplomových prací

Signature

201900016

Abstract

The telecommunications industry is a large and important part of the information and communication technology sector. Because the industry is highly competitive, customers commonly switch to another service provider or do not renew a commitment. This customer behavior is called customer churn. It is an expensive business problem, since acquiring new customers costs five to six times more than retaining existing ones. With the continuously decreasing cost of data storage, telecommunications companies have access to various customer-related data sources, which can be used to build predictive models that help identify who is about to leave the company, when, and why.

The main objective of the dissertation thesis is to propose and implement a churn prediction system that helps a selected telecommunications company reduce the number of churning customers and better understand its customer base. The partial goals are to summarize current theoretical, methodological and empirical results; to process the raw data; to divide customers into clusters; to estimate and compare selected classification models; to determine the key factors driving churn; to create a customer knowledge database; and to visualize the data in a selected visualization tool.

The methodological part of the thesis first focuses on the CRISP-DM data mining methodology. It then describes the cluster analysis methods used in the thesis, namely the Gower distance and the k-medoids algorithm, and the classification models: logistic regression, decision trees and random forests. Performance measures for comparing the predictive ability of classification algorithms are also introduced. The last part deals with estimating the future performance of predictive models using approaches such as splitting the data into training and testing sets, cross-validation and bootstrap sampling.

The application part of the thesis is devoted to the proposal of the churn prediction system. Input data in CSV files are loaded into the statistical tool R. Customers are then divided into clusters, and logistic regression, decision tree and random forest models are estimated both for the entire training data set and for each cluster. Customer characteristics, predicted churn probabilities and variable importances are stored in a MySQL relational database, and these data are used to create a dashboard in the visualization tool Qlik Sense. The dashboard is provided to business users as a user-friendly tool for understanding customer behavior.
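The clustering step pairs the Gower distance, which handles a mix of numeric and categorical customer attributes, with the k-medoids algorithm, which needs nothing more than a distance matrix. The following is a minimal pure-Python sketch under stated assumptions, not the thesis code (the thesis works in R): the columns `tenure`, `monthly_charge` and `contract` and the six customers are hypothetical, every variable gets equal weight in the Gower average, and the k-medoids routine is a simple greedy BUILD plus first-improvement SWAP variant of PAM.

```python
"""Gower distance + k-medoids (PAM-style) clustering sketch.

Illustrative only: the column names and customer records are hypothetical.
"""


def gower_matrix(rows, numeric, categorical):
    """Return the pairwise Gower distance matrix for a list of dict records.

    Numeric columns contribute |a - b| / range, categorical columns contribute
    0 on a match and 1 on a mismatch; all columns are weighted equally.
    """
    ranges = {}
    for col in numeric:
        values = [r[col] for r in rows]
        span = max(values) - min(values)
        ranges[col] = span if span > 0 else 1.0  # constant column: all differences are 0
    n, p = len(rows), len(numeric) + len(categorical)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            total = sum(abs(rows[i][c] - rows[k][c]) / ranges[c] for c in numeric)
            total += sum(rows[i][c] != rows[k][c] for c in categorical)
            dist[i][k] = dist[k][i] = total / p
    return dist


def total_cost(dist, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def k_medoids(dist, k, max_iter=100):
    """Greedy BUILD followed by first-improvement SWAP; returns (medoids, labels)."""
    n = len(dist)
    # BUILD: start with the most central point, then greedily add medoids.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))
    cost = total_cost(dist, medoids)
    # SWAP: exchange a medoid with a non-medoid while the total cost decreases.
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:pos] + [h] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break
    # Assign every customer to its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical customers: short-tenure monthly contracts vs. long-tenure two-year contracts.
customers = [
    {"tenure": 2, "monthly_charge": 70.0, "contract": "monthly"},
    {"tenure": 3, "monthly_charge": 75.0, "contract": "monthly"},
    {"tenure": 1, "monthly_charge": 80.0, "contract": "monthly"},
    {"tenure": 48, "monthly_charge": 30.0, "contract": "two_year"},
    {"tenure": 60, "monthly_charge": 25.0, "contract": "two_year"},
    {"tenure": 55, "monthly_charge": 35.0, "contract": "two_year"},
]
dist = gower_matrix(customers, ["tenure", "monthly_charge"], ["contract"])
medoids, labels = k_medoids(dist, k=2)
print("medoids:", medoids, "labels:", labels)
```

Because k-medoids only looks at the precomputed distance matrix, the same routine works with any dissimilarity, and each cluster is represented by an actual customer, which can make the clusters easier to describe to business users.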
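The abstract does not list which performance measures are used to compare the classifiers. The sketch below assumes common choices for a binary churn target (accuracy, precision, recall, specificity, F1 and the area under the ROC curve) and computes them in plain Python from predicted churn probabilities; the 0.5 cut-off and the six example customers are hypothetical. The AUC uses its rank interpretation: the share of (churner, non-churner) pairs in which the churner receives the higher score, with ties counted as one half.

```python
"""Classification performance measures for a binary churn target (1 = churn)."""


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """Share of (churner, non-churner) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one churner and one non-churner")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def safe_div(num, den):
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def classification_metrics(y_true, scores, threshold=0.5):
    """Threshold predicted churn probabilities and summarise performance."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # sensitivity: share of churners caught
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
        "auc": roc_auc(y_true, scores),  # threshold-free
    }


# Hypothetical churn labels and predicted churn probabilities for six customers.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
metrics = classification_metrics(y_true, scores)
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```

On this toy data the first five measures all equal 0.667 and the AUC is 8/9 (about 0.889). Accuracy alone can be misleading when churners are a small minority, which is one reason a threshold-free measure such as the AUC is often reported alongside it.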
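Estimating the future performance of a churn model through resampling can be illustrated at the level of row indices. The following plain-Python sketch is not taken from the thesis: it builds shuffled k-fold splits and a bootstrap sample with its out-of-bag customers, and it evaluates a hypothetical majority-class baseline by cross-validation. In practice, the `fit` and `score` callables (hypothetical names) would wrap logistic regression, a decision tree or a random forest.

```python
"""k-fold cross-validation and bootstrap resampling at the index level."""
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; every index is in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def bootstrap_indices(n, seed=0):
    """Draw n indices with replacement; never-drawn (out-of-bag) ones form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    in_bag = set(train)
    oob = [i for i in range(n) if i not in in_bag]
    return train, oob


def cross_validate(X, y, fit, score, k=5, seed=0):
    """Average the test-fold score of a model refitted on each training part."""
    results = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        results.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(results) / len(results)


# Hypothetical baseline: always predict the majority class of the training part.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def accuracy_of_majority(model, X_test, y_test):
    return sum(1 for t in y_test if t == model) / len(y_test)


X = [[i] for i in range(10)]  # placeholder features, ignored by the baseline
y = [0] * 8 + [1] * 2         # two churners out of ten customers
print("5-fold CV accuracy:", cross_validate(X, y, fit_majority, accuracy_of_majority))
train, oob = bootstrap_indices(len(y))
print("bootstrap size:", len(train), "out-of-bag:", len(oob))
```

On this toy data the baseline reaches a cross-validated accuracy of 0.8, simply the share of non-churners, which is a reminder that a churn model should be judged against such a trivial benchmark.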

Subject(s)

Logistic regression, decision trees, random forests, k-medoids, R, MySQL, Qlik Sense, telecommunications, customer churn