Theoretical and Natural Science

Vol. 38, 24 June 2024

Open Access | Article

Logistic regression for cardiovascular diseases prediction by integrating PCA and K-means ++

Hancheng Miao * 1
1 New York University

* Author to whom correspondence should be addressed.

Theoretical and Natural Science, Vol. 38, 126-132
Published 24 June 2024. © 2023 The Author(s). Published by EWA Publishing
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Citation: Hancheng Miao. Logistic regression for cardiovascular diseases prediction by integrating PCA and K-means ++. TNS (2024) Vol. 38: 126-132. DOI: 10.54254/2753-8818/38/20240569.


Abstract

This study introduces a method for predicting cardiovascular diseases (CVDs) that combines K-means++ clustering, Principal Component Analysis (PCA), and logistic regression. Because CVDs are a leading cause of death worldwide, the study uses a comprehensive dataset to address the challenges of CVD prediction. K-means++ is applied first to improve data quality, PCA then reduces dimensionality, and logistic regression predicts the outcome, achieving high accuracy, specificity, and sensitivity. The approach offers a promising avenue for early and accurate CVD detection, outperforming traditional predictive models. Refining the data through these steps gives the predictive model a solid foundation and improves the reliability and generalizability of its predictions. The integration of these techniques marks a step forward in effective cardiovascular disease management and highlights the importance of data preprocessing in predictive modeling.
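The three stages named in the abstract can be sketched as a short pipeline. This is a minimal illustration, not the author's implementation. It assumes scikit-learn and NumPy, uses synthetic data from `make_classification` in place of the study's dataset, and assumes one possible cleaning rule (dropping training samples that lie unusually far from their K-means++ centroid), since the abstract does not state how clustering is used to improve data quality. The names `clean_with_kmeans`, `run_pipeline`, and `keep_quantile`, and all parameter values, are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def clean_with_kmeans(X, y, n_clusters=2, keep_quantile=0.95, seed=0):
    """Drop samples far from their centroid (assumed rule, not from the paper)."""
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
                random_state=seed)
    labels = km.fit_predict(X)
    # Euclidean distance from each sample to its assigned cluster centre.
    dist = np.linalg.norm(X - km.cluster_centers_[labels], axis=1)
    keep = dist <= np.quantile(dist, keep_quantile)
    return X[keep], y[keep]


def run_pipeline(seed=0):
    # Synthetic stand-in data; the study's dataset is not reproduced here.
    X, y = make_classification(n_samples=1000, n_features=12,
                               n_informative=6, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    # Standardise first: K-means and PCA are both scale-sensitive.
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)

    # Stage 1: K-means++ cleaning, applied to the training split only.
    X_clean, y_clean = clean_with_kmeans(X_train_s, y_train, seed=seed)

    # Stage 2: PCA keeping enough components for 95% of the variance.
    pca = PCA(n_components=0.95, svd_solver="full").fit(X_clean)

    # Stage 3: logistic regression on the reduced features.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(pca.transform(X_clean), y_clean)

    y_pred = clf.predict(pca.transform(X_test_s))
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "n_train": len(y_train),
        "n_train_kept": len(y_clean),
    }


if __name__ == "__main__":
    print(run_pipeline())
```

Fitting the scaler, the clustering step, and PCA on the training split only keeps the held-out split untouched by preprocessing, so the three reported metrics are measured on data the model has not seen. Scores on this synthetic data say nothing about the results reported in the study.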


Keywords: Cardiovascular diseases, PCA, K-means++, logistic regression


Data Availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Volume Title
Proceedings of the 2nd International Conference on Mathematical Physics and Computational Simulation