
CatBoost with Physics-Based Metaheuristics for Thyroid Cancer Recurrence Prediction

BioData Mining

A machine learning approach that combines the CatBoost classifier with physics-based metaheuristic algorithms for efficient, feature-optimized prediction of thyroid cancer recurrence.
Authors

Proshenjit Sarker

Kwonhue Choi

Abdullah-Al Nahid

Md Abdus Samad

Published

December 9, 2025

Abstract

Thyroid Cancer (TC) is the uncontrolled growth of malignant cells in the thyroid gland, with a higher recurrence rate than other cancers. Early detection of TC recurrence (TCR) is crucial for timely intervention. This study develops machine-learning algorithms that reduce features while maintaining high performance. Previous studies on the Differentiated Thyroid Cancer Recurrence (DTCR) dataset struggled to improve performance with feature reduction, and misclassification causes remained unexplored.

This work proposes three Physics-based Metaheuristic Algorithms (PBMHAs), namely Energy Valley Optimization (EVOA), Equilibrium Optimization (EOA), and Electromagnetic Field Optimization (EFOA), combined with the Categorical Boosting (CatBoost) classifier. SHAP is used to analyze feature importance. CatBoost without optimization (Only_CB) achieved 95.83% Accuracy, 92.42% F-score, 96.29% Precision, and 89.27% Recall using all 16 features. After optimization, EVOA_CB reached 96.35% mean accuracy, while EOA_CB and EFOA_CB each achieved 96.17%. EOA_CB excluded 11 less important features, and EFOA_CB attained the highest mean AUC of 0.994 with the lowest computational times. Additionally, this work provides insights into the factors contributing to misclassification. Using a 70:30 train-test split over 5 folds, EVOA_CB performed best on six selected features, with 96.35% Accuracy, 93.34% F-score, and 96.19% Precision. SHAP highlighted response, risk, and N as the most important features. These findings support early, efficient detection of TC recurrence with fewer features.

Keywords

Thyroid cancer, Cancer recurrence, Feature selection, Physics-based metaheuristic algorithms, Energy valley optimization, Equilibrium optimization, Electromagnetic field optimization, CatBoost, Explainable AI, SHAP

Key Contributions

  • Optimized Feature Selection: Applied three PBMHAs (EVOA, EOA, and EFOA) to select the most relevant features, resulting in reduced and informative feature sets (5-9 features vs. 16 original)

  • Enhanced Classification Performance: CatBoost with optimized hyperparameters achieved high predictive accuracy (96.35% for EVOA_CB) in distinguishing recurred from non-recurred patients

  • Explainability and Feature Insights: Utilized SHAP to interpret model predictions, identifying response, risk, and N as the most influential features for TCR outcomes

  • Analysis of Misclassified Cases: Investigated misclassified instances to uncover potential anomalies and understand model limitations

  • Model Efficiency: Achieved accurate TCR prediction using compact feature sets without compromising performance
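
The wrapper-style feature selection named in the first contribution can be illustrated with a minimal sketch. A feature subset is encoded as a binary mask over the 16 features, and a search loop looks for the mask with the best fitness. Everything in the block is an assumption made for illustration: `toy_fitness` stands in for a cross-validated CatBoost score, the `INFORMATIVE` indices are hypothetical, and the bit-flip hill climb is a plain stand-in for the search step, not an implementation of EVOA, EOA, or EFOA.

```python
import random

N_FEATURES = 16                      # the DTCR dataset has 16 features
INFORMATIVE = {0, 3, 7, 9, 12, 15}   # hypothetical "useful" feature indices


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward informative features and
    lightly penalise every selected feature to favour compact subsets."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits / len(INFORMATIVE) - 0.01 * sum(mask)


def bit_flip_search(fitness, n_features, iterations=2000, seed=0):
    """Greedy bit-flip hill climb over binary feature masks.
    A real PBMHA would replace this loop with its own update rules."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(n_features)]
    best_score = fitness(best)
    for _ in range(iterations):
        candidate = best[:]
        candidate[rng.randrange(n_features)] ^= 1   # flip one feature in or out
        score = fitness(candidate)
        if score > best_score:                      # keep strict improvements only
            best, best_score = candidate, score
    return best, best_score


mask, score = bit_flip_search(toy_fitness, N_FEATURES)
selected = [i for i, bit in enumerate(mask) if bit]
print(f"selected {len(selected)} of {N_FEATURES} features: {selected}")
```

Because the toy fitness is separable, the hill climb recovers exactly the six hypothetical indices. With a real classifier in place of `toy_fitness`, the fitness landscape is no longer separable, which is the usual motivation for population-based metaheuristics over a greedy search.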

Links

  • Published paper
  • Full Text PDF
  • Data Repository
  • GitHub Repository

Dataset & Methods

Dataset: DTCR (Differentiated Thyroid Cancer Recurrence)

  • 383 patients (115 recurred, 268 non-recurred)
  • 16 clinicopathological features
  • 15-year follow-up period (minimum 10 years observation)

Algorithms:

  • CatBoost: Gradient boosting with categorical feature support
  • Feature Selection Methods: EVOA, EOA, EFOA (physics-based metaheuristics)
  • Evaluation: 5-fold cross-validation with 70:30 train-test split
  • Explainability: SHAP (Shapley Additive Explanations)
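
As a rough illustration of the evaluation setup, the sketch below draws five stratified 70:30 train-test splits for the class counts listed above. The page does not say how the five folds and the 70:30 split are combined, so repeated random splits and stratification by class are assumptions here, and `stratified_split` is a hypothetical helper written for this example.

```python
import random


def stratified_split(labels, test_percent, rng):
    """Split indices into train/test so each class keeps roughly the same
    proportion in both parts (integer arithmetic avoids rounding drift)."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = len(shuffled) * test_percent // 100
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)


# Class counts as listed for the DTCR dataset: 1 = recurred, 0 = non-recurred.
labels = [1] * 115 + [0] * 268

rng = random.Random(42)  # fixed seed so the splits are reproducible
splits = [stratified_split(labels, 30, rng) for _ in range(5)]

for fold, (train, test) in enumerate(splits, start=1):
    recurred_in_test = sum(labels[i] for i in test)
    print(f"fold {fold}: train={len(train)} test={len(test)} "
          f"recurred in test={recurred_in_test}")
```

Stratifying keeps the minority (recurred) class represented in every test set, which matters for recall and F-score on an imbalanced dataset of this size.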

Performance Comparison

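| Model   | Features | Accuracy (%) | F-score (%) | Precision (%) | Recall (%) | AUC   |
|---------|----------|--------------|-------------|---------------|------------|-------|
| Only_CB | 16       | 95.83        | 92.42       | 96.29         | 89.27      | -     |
| EVOA_CB | 6        | 96.35        | 93.34       | 96.19         | 90.94      | 0.989 |
| EOA_CB  | 5        | 96.17        | 93.12       | 94.31         | 92.21      | 0.989 |
| EFOA_CB | 9        | 96.17        | 93.09       | 95.78         | 91.15      | 0.994 |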
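
The F-score is the harmonic mean of precision and recall, F = 2PR / (P + R). As a quick sanity check, the sketch below recomputes it from the mean precision and recall reported for each model. Small gaps against the reported F-scores are to be expected if those were averaged per fold rather than derived from the mean precision and recall; that averaging scheme is an assumption, since the page does not state it.

```python
# Reported means from the performance table: (precision %, recall %, F-score %).
reported = {
    "Only_CB": (96.29, 89.27, 92.42),
    "EVOA_CB": (96.19, 90.94, 93.34),
    "EOA_CB": (94.31, 92.21, 93.12),
    "EFOA_CB": (95.78, 91.15, 93.09),
}


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


gaps = {}
for model, (precision, recall, reported_f) in reported.items():
    recomputed = f_score(precision, recall)
    gaps[model] = recomputed - reported_f
    print(f"{model}: recomputed {recomputed:.2f} vs reported {reported_f:.2f}")
```

The recomputed values sit slightly above the reported ones, within roughly a third of a percentage point, so the table is internally consistent at that level.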

Key Findings

  • EVOA_CB achieved the highest accuracy (96.35%) with only 6 selected features
  • EOA_CB achieved the greatest feature reduction (5 features) and the highest recall (92.21%)
  • EFOA_CB achieved the highest AUC (0.994) and the lowest testing time (1.51 ms)
  • Most Important Features: Response, Risk, N (lymph node involvement)
  • Structural Incomplete response strongly associated with recurrence
  • Excellent response correlated with non-recurrence
  • Intermediate risk and N1b classification linked to higher recurrence
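
SHAP ranks features such as Response, Risk, and N by their Shapley values. To show what that quantity measures, the sketch below computes exact Shapley values for a toy three-feature score by averaging each feature's marginal contribution over all feature orderings. The scoring function, its weights, the interaction term, and the encoded patient are hypothetical and not taken from the paper. For a trained CatBoost model one would normally use the tree explainer from the `shap` library instead of brute-force enumeration.

```python
from itertools import permutations

FEATURES = ("response", "risk", "N")


def toy_score(x):
    """Hypothetical recurrence score with one interaction term."""
    return (0.5 * x["response"] + 0.3 * x["risk"] + 0.1 * x["N"]
            + 0.1 * x["response"] * x["risk"])


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features switch from baseline to instance."""
    totals = {name: 0.0 for name in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


baseline = {"response": 0, "risk": 0, "N": 0}   # reference patient (all zeros)
patient = {"response": 1, "risk": 1, "N": 1}    # hypothetical encoded patient

phi = shapley_values(toy_score, patient, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, and the interaction term is shared equally between the two features involved. The same additivity property is what makes SHAP feature rankings comparable across patients.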

Citation

BibTeX citation:
@article{sarker2025,
  author = {Sarker, Proshenjit and Choi, Kwonhue and Nahid, Abdullah-Al
    and Abdus Samad, Md},
  title = {CatBoost with {Physics-Based} {Metaheuristics} for {Thyroid}
    {Cancer} {Recurrence} {Prediction}},
  journal = {BioData Mining},
  volume = {18},
  number = {84},
  date = {2025-12-09},
  url = {https://biodatamining.biomedcentral.com/articles/10.1186/s13040-025-00494-1},
  doi = {10.1186/s13040-025-00494-1},
  langid = {en}
}
For attribution, please cite this work as:
Sarker, Proshenjit, Kwonhue Choi, Abdullah-Al Nahid, and Md Abdus Samad. 2025. “CatBoost with Physics-Based Metaheuristics for Thyroid Cancer Recurrence Prediction.” BioData Mining 18 (84). https://doi.org/10.1186/s13040-025-00494-1.
 
