A Weighting Technique to Address Class Imbalance in Churn Prediction Using XGBoost, LightGBM, and CatBoost

Wahyu Nugraha; Muhamad Syarif

doi:10.33633/tc.v22i1.7191

Abstract

Churn is the condition in which a customer moves from one service to another. Customer churn is a growing problem and a major challenge for many banking companies, because it directly affects company profit. Churn behavior therefore needs to be predicted early enough for customer-retention measures to be applied. However, churn prediction models face class imbalance, which causes classification models to perform poorly. The most common solutions to class imbalance fall into three approaches: data-level, algorithm-level, and ensemble approaches. Each approach has drawbacks whose effects are difficult to predict when it is applied to class imbalance. In this study, the researchers experiment with boosting-based ensemble methods for customer churn prediction and attempt to improve their performance on an imbalanced dataset by tuning the scale_pos_weight parameter. The classification algorithms used are XGBoost (extreme gradient boosting), LightGBM (light gradient boosting machine), and CatBoost. The experiments compare the performance of these three boosting-based algorithms while adjusting the weight parameter three times. In testing, the CatBoost model achieved the highest recall, 0.79, while the default CatBoost model had the lowest recall, 0.47. Based on these results, we conclude that the models perform reasonably well on imbalanced data when the scale_pos_weight hyperparameter is used, because it lets the model focus more on the hard-to-detect minority class.

Keywords

Churn Prediction; Class Imbalance; Boosting Classification
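The abstract describes up-weighting the minority (churn) class through scale_pos_weight and judging the models by recall. The sketch below illustrates that mechanism in plain Python rather than through the XGBoost, LightGBM, or CatBoost APIs. It assumes, as the XGBoost documentation describes for binary logistic loss, that scale_pos_weight multiplies the gradient and Hessian of every positive example, and it uses the commonly suggested starting value of (number of negatives) / (number of positives). All function names, the toy labels, and the candidate weights are hypothetical; they are not the authors' settings or data.

```python
import math


def scale_pos_weight_from_labels(labels):
    """Negatives / positives: a common starting value for scale_pos_weight."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("need at least one positive (churn) example")
    return neg / pos


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def weighted_grad_hess(y, margin, pos_weight):
    """Gradient and Hessian of binary log loss w.r.t. the raw margin.

    Assumption (hypothetical helper): positive (churn) rows are multiplied
    by pos_weight, negative rows keep weight 1.0.
    """
    p = sigmoid(margin)
    w = pos_weight if y == 1 else 1.0
    return w * (p - y), w * p * (1.0 - p)


def fit_constant_margin(labels, pos_weight, steps=50):
    """Newton updates on one shared margin (an intercept-only model).

    Each step applies -sum(g) / sum(h), the unregularized leaf value that
    second-order boosting computes for a leaf holding all rows.
    """
    margin = 0.0
    for _ in range(steps):
        g_sum = h_sum = 0.0
        for y in labels:
            g, h = weighted_grad_hess(y, margin, pos_weight)
            g_sum += g
            h_sum += h
        margin -= g_sum / h_sum
    return margin


def recall(y_true, y_pred):
    """TP / (TP + FN): the share of actual churners that are flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    labels = [0] * 8 + [1] * 2  # hypothetical toy data: 20% churners
    spw = scale_pos_weight_from_labels(labels)
    # Hypothetical candidate weights, ending with the neg/pos heuristic.
    for w in (1.0, 2.0, spw):
        p = sigmoid(fit_constant_margin(labels, w))
        print(f"scale_pos_weight={w:.1f} -> base churn probability {p:.3f}")
    print("recall:", recall([1, 1, 0, 1], [1, 0, 0, 1]))
```

For an intercept-only model the weighted optimum is p = wP / (wP + N), where P and N are the positive and negative counts, so with w = N/P the fitted base probability moves from the class prior to 0.5 and both classes contribute equally to the loss. A boosted model applies the same reweighting in every tree, which is why a larger weight tends to raise recall on the churn class at some cost in precision.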
