XGBoost and Random Forest Optimization using SMOTE to Classify Air Quality
Abstract
Air pollution from industrial growth and motorized vehicles seriously threatens human health. Clean air is essential, and pollutant contamination can cause acute respiratory illnesses and other diseases. Several studies have sought to predict air pollution using various algorithms, methods, and data balancing techniques, but further work is needed to achieve better accuracy. This study therefore aims to classify air quality using the XGBoost and Random Forest algorithms, applying the SMOTE technique to address class imbalance. The best-performing model was Random Forest with SMOTE and an 80:20 data split, which achieved an accuracy of 92.4%, an average AUC of 0.98, and a log loss of 0.2366, indicating that SMOTE improved the model's performance on minority classes. Overall, both XGBoost and Random Forest performed better with SMOTE than without it, with accuracy above 90%.
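The oversampling idea behind SMOTE can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, which the abstract does not describe; the function names `smote` and `balance`, the class labels, and the toy data are all hypothetical. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples."""
    if n_synthetic <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest same-class neighbours of the base sample (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        extra = smote(rows, target - len(rows), k=k, seed=seed)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Toy, made-up data: six "good" samples and two "unhealthy" samples.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
     [5.0, 5.0], [5.2, 5.1]]
y = ["good"] * 6 + ["unhealthy"] * 2

X_bal, y_bal = balance(X, y, k=5, seed=42)
print(y_bal.count("good"), y_bal.count("unhealthy"))
```

In a pipeline with an 80:20 split, oversampling of this kind is usually applied to the training portion only, so that synthetic points do not leak into the evaluation data; the abstract does not state which order the study used.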
DOI: https://doi.org/10.26877/asset.v6i1.18136
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Advance Sustainable Science, Engineering and Technology (ASSET)
E-ISSN: 2715-4211
Published by Science and Technology Research Centre
Universitas PGRI Semarang, Indonesia
Website: http://journal.upgris.ac.id/index.php/asset/index
Email: asset@upgris.ac.id