Comparative Evaluation of Machine Learning Algorithms with Data Balancing Approach and Hyperparameter Tuning in Predicting Thyroid Disorder Recurrence

Darnell Ignasius; Rhyan David Levandra; Ramadhan Rakhmat Sani; Ika Novita Dewi

Journal: Jurnal Masyarakat Informatika
Volume: 16
Issue: 2
Pages: 284-300
Year: 2025
Publisher: Institute of Research and Community Services Diponegoro University (LPPM UNDIP)
ISSN: 2777-0648

Abstract

This research evaluates and compares the performance of five machine learning algorithms (Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, and Gradient Boosting) in predicting thyroid disease recurrence from patient data. The analysis was conducted on the Thyroid Disease Dataset from the UCI Machine Learning Repository. The methodology comprises data preprocessing, normalization, and class balancing with the Synthetic Minority Over-sampling Technique (SMOTE), followed by hyperparameter tuning with GridSearchCV to optimize model performance. The results show that the ensemble-based models, Random Forest and Gradient Boosting, consistently outperform the other algorithms in accuracy and robustness, achieving 95–96% accuracy across the tested scenarios. A key finding is that SMOTE significantly improves recall for the minority class, highlighting its value for imbalanced medical datasets.
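To make the class-balancing step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is an illustration only, not the authors' implementation: the function name `smote_sketch`, its parameters, and the toy feature vectors are assumptions introduced here, and a real pipeline would more likely rely on a maintained library implementation.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours.

    Hypothetical helper for illustration; a simplified version of the SMOTE idea.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up feature vectors (assumed already normalized; not from the thyroid dataset).
majority = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.25], [0.05, 0.1], [0.25, 0.15]]
minority = [[0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]

# Oversample the minority class until both classes have the same size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority class already occupies. In an evaluation setup, oversampling is normally applied to the training split only, so that test performance is measured on real samples.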