A Brief Review of Cardiovascular Diseases, Associated Risk Factors and Current Treatment Regimes

Affiliation.

  • 1 Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, United States.
  • PMID: 31553287
  • DOI: 10.2174/1381612825666190925163827

Cardiovascular diseases (CVDs) are the leading cause of premature death and disability in humans, and their incidence is on the rise globally. Given their substantial contribution to the escalating costs of health care, CVDs also impose a high socio-economic burden on the general population. The underlying pathogenesis and progression of nearly all CVDs are predominantly of atherosclerotic origin, leading to the development of coronary artery disease, cerebrovascular disease, venous thromboembolism and peripheral vascular disease, and subsequently causing myocardial infarction, cardiac arrhythmias or stroke. The aetiological risk factors leading to the onset of CVDs are well recognized and include hyperlipidaemia, hypertension, diabetes, obesity, smoking and lack of physical activity. They collectively represent more than 90% of the CVD risks in all epidemiological studies. Despite the high fatality rate of CVDs, the identification and careful prevention of the underlying risk factors can significantly reduce the global epidemic of CVDs. Besides favorable lifestyle modifications, primary regimes for the prevention and treatment of CVDs include lipid-lowering drugs, antihypertensives, and antiplatelet and anticoagulation therapies. Despite their effectiveness, significant gaps in the treatment of CVDs remain. In this review, we discuss the epidemiology and pathology of the major CVDs that are prevalent globally. We also determine the contribution of well-recognized risk factors towards the development of CVDs and the prevention strategies. Finally, therapies for the control and treatment of CVDs are discussed.

Keywords: Atherosclerosis; epidemiological studies; hypertension; platelets; stroke; thrombosis.

Copyright© Bentham Science Publishers; For any queries, please email at [email protected].

Similar articles

  • Prevention of cardiovascular diseases: Role of exercise, dietary interventions, obesity and smoking cessation. Buttar HS, Li T, Ravi N. Exp Clin Cardiol. 2005 Winter;10(4):229-49. PMID: 19641674 Free PMC article.
  • Prevention of Cardiovascular Diseases with Anti-Inflammatory and Anti-Oxidant Nutraceuticals and Herbal Products: An Overview of Pre-Clinical and Clinical Studies. Jain S, Buttar HS, Chintameneni M, Kaur G. Recent Pat Inflamm Allergy Drug Discov. 2018;12(2):145-157. doi: 10.2174/1872213X12666180815144803. PMID: 30109827 Review.
  • Optimal risk factor modification and medical management of the patient with peripheral arterial disease. Chi YW, Jaff MR. Catheter Cardiovasc Interv. 2008 Mar 1;71(4):475-89. doi: 10.1002/ccd.21401. PMID: 18307227 Review.
  • Cardiovascular risk factors among high-risk individuals attending the general practice at king Abdulaziz University hospital: a cross-sectional study. Ghamri RA, Alzahrani NS, Alharthi AM, Gadah HJ, Badoghaish BG, Alzahrani AA. BMC Cardiovasc Disord. 2019 Nov 27;19(1):268. doi: 10.1186/s12872-019-1261-6. PMID: 31775642 Free PMC article.
  • Update on cardiovascular disease in post-menopausal women. Gorodeski GI. Best Pract Res Clin Obstet Gynaecol. 2002 Jun;16(3):329-55. doi: 10.1053/beog.2002.0282. PMID: 12099666 Review.
  • Mesenchymal stem cells as future treatment for cardiovascular regeneration and its challenges. Seow KS, Ling APK. Ann Transl Med. 2024 Aug 1;12(4):73. doi: 10.21037/atm-23-1936. Epub 2023 Dec 29. PMID: 39118948 Free PMC article. Review.
  • Exploring the link between T-regulatory cells and inflammatory cytokines in atherogenesis: findings from patients with stable angina pectoris. Tahtouh Zaatar M, Othman R, Abou Samra E, Karam M. Ann Med Surg (Lond). 2024 Jun 4;86(8):4456-4462. doi: 10.1097/MS9.0000000000002150. eCollection 2024 Aug. PMID: 39118685 Free PMC article.
  • Comparative observation of the effectiveness and safety of remimazolam besylate versus dexmedetomidine in gastrointestinal surgery in obese patients. Deng YF, Jiang XR, Feng ZG. World J Gastrointest Surg. 2024 May 27;16(5):1320-1327. doi: 10.4240/wjgs.v16.i5.1320. PMID: 38817287 Free PMC article.
  • Organic Nanoparticles in Progressing Cardiovascular Disease Treatment and Diagnosis. Udriște AS, Burdușel AC, Niculescu AG, Rădulescu M, Balaure PC, Grumezescu AM. Polymers (Basel). 2024 May 16;16(10):1421. doi: 10.3390/polym16101421. PMID: 38794614 Free PMC article. Review.
  • Analysis of the Associations of Measurements of Body Composition and Inflammatory Factors with Cardiovascular Disease and Its Comorbidities in a Community-Based Study. Tarabeih N, Kalinkovich A, Ashkenazi S, Cherny SS, Shalata A, Livshits G. Biomedicines. 2024 May 11;12(5):1066. doi: 10.3390/biomedicines12051066. PMID: 38791028 Free PMC article.

Coronary Heart Disease Research

For almost 75 years, the NHLBI has been at the forefront of improving the nation’s health and reducing the burden of heart and vascular diseases. Heart disease, including coronary heart disease, remains the leading cause of death in the United States. However, the rate of heart disease deaths has declined by 70% over the past 50 years, thanks in part to NHLBI-funded research. Many current studies funded by the NHLBI focus on discovering genetic associations and finding new ways to prevent and treat the onset of coronary heart disease and associated medical conditions.

NHLBI research that really made a difference

The NHLBI supports a wide range of long-term studies to understand the risk factors of coronary heart disease. These ongoing studies, among others, have led to many discoveries that have increased our understanding of the causes of cardiovascular disease among different populations, helping to shape evidence-based clinical practice guidelines.

  • Risk factors that can be changed: The NHLBI Framingham Heart Study (FHS) revealed that cardiovascular disease is caused by modifiable risk factors such as smoking, high blood pressure, obesity, high cholesterol levels, and physical inactivity. It is why, in routine physicals, healthcare providers check for high blood pressure, high cholesterol, unhealthy eating patterns, smoking, physical inactivity, and unhealthy weight. The FHS found that cigarette smoking increases the risk of heart disease. Researchers also showed that cardiovascular disease can affect people differently depending on sex or race, underscoring the need to address health disparities.
  • Risk factors for Hispanic/Latino adults: The Hispanic Community Health Study/Study of Latinos (HCHS/SOL) found that heart disease risk factors are widespread among Hispanic/Latino adults in the United States, with 80% of men and 71% of women having at least one risk factor. Researchers also used HCHS/SOL genetic data to explore genes linked with central adiposity (the tendency to have excess body fat around the waist) in Hispanic/Latino adults. Before this study, genes linked with central adiposity, a risk factor for coronary heart disease, had been identified in people of European ancestry. These results showed that those genes also predict central adiposity for Hispanic/Latino communities. Some of the genes identified were more common among people with Mexican or Central/South American ancestry, while others were more common among people of Caribbean ancestry.
  • Risk factors for African Americans: The Jackson Heart Study (JHS) began in 1997 and includes more than 5,300 African American men and women in Jackson, Mississippi. It has studied genetic and environmental factors that raise the risk of heart problems, especially high blood pressure, coronary heart disease, heart failure, stroke, and peripheral artery disease (PAD). Researchers discovered a gene variant in African American individuals that doubles the risk of heart disease. They also found that even small spikes in blood pressure can lead to a higher risk of death. A community engagement component of the JHS is putting 20 years of the study’s findings into action by turning traditional gathering places, such as barbershops and churches, into health information hubs.
  • Risk factors for American Indians: The NHLBI actively supports the Strong Heart Study, a long-term study that began in 1988 to examine cardiovascular disease and its risk factors among American Indian men and women. The Strong Heart Study is one of the largest epidemiological studies of American Indian people ever undertaken. It involves a partnership with 12 Tribal Nations and has followed more than 8,000 participants, many of whom live in low-income rural areas of Arizona, Oklahoma, and the Dakotas. Cardiovascular disease remains the leading cause of death for American Indian people. Yet the prevalence and severity of cardiovascular disease among American Indian people has been challenging to study because of the small sizes of the communities, as well as the relatively young age, cultural diversity, and wide geographic distribution of the population. In 2019, the NHLBI renewed its commitment to the Strong Heart Study with a new study phase that includes more funding for community-driven pilot projects and a continued emphasis on training and development. Read more about the goals and key findings of the Strong Heart Study.

Current research funded by the NHLBI

Within our Division of Cardiovascular Sciences, the Atherothrombosis and Coronary Artery Disease Branch of its Adult and Pediatric Cardiac Research Program and the Center for Translation Research and Implementation Science oversee much of our funded research on coronary heart disease.

Research funding  

Find funding opportunities and program contacts for research on coronary heart disease.

Current research on preventing coronary heart disease

  • Blood cholesterol and coronary heart disease: The NHLBI supports new research into lowering the risk of coronary heart disease by reducing levels of cholesterol in the blood. High levels of blood cholesterol, especially a type called low-density lipoprotein (LDL) cholesterol, raise the risk of coronary heart disease. However, even with medicine that lowers LDL cholesterol, there is still a risk of coronary heart disease due to other proteins, called triglyceride-rich ApoB-containing lipoproteins (ApoBCLs), that circulate in the blood. Researchers are working to find innovative ways to reduce the levels of ApoBCLs, which may help prevent coronary heart disease and other cardiovascular conditions.
  • Pregnancy, preeclampsia, and coronary heart disease risk: NHLBI-supported researchers are investigating the link between developing preeclampsia during pregnancy and an increased risk for heart disease over the lifespan. This project uses “omics” data, such as genomics, proteomics, and other research areas, from three different cohorts of women to define and assess preeclampsia biomarkers associated with cardiovascular health outcomes. Researchers have determined that high blood pressure during pregnancy and low birth weight are predictors of atherosclerotic cardiovascular disease in women. Ultimately, these findings can inform new preventive strategies to lower the risk of coronary heart disease.
  • Community-level efforts to lower heart disease risk among African American people: The NHLBI is funding initiatives to partner with churches in order to engage with African American communities and lower disparities in heart health. Studies have found that church-led interventions reduce risk factors for coronary heart disease and other cardiovascular conditions. NHLBI-supported researchers assessed data from more than 17,000 participants across multiple studies and determined that these community-based approaches are effective in lowering heart disease risk factors.

Find more NHLBI-funded studies on preventing coronary heart disease on the NIH RePORTER.

Current research on understanding the causes of coronary heart disease

  • Pregnancy and long-term heart disease: NHLBI researchers are continuing the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-be (nuMoM2b) study to understand the relationship between pregnancy-related problems, such as gestational hypertension, and heart problems. The study also looks at how problems during pregnancy may increase risk factors for heart disease later in life. NuMoM2b launched in 2010, and long-term studies are ongoing, with the goal of collecting high-quality data and understanding how heart disease develops in women after pregnancy.
  • How coronary artery disease affects heart attack risk: NHLBI-funded researchers are investigating why some people with coronary artery disease are more at risk for heart attacks than others. Researchers have found that people with coronary artery disease who have high-risk coronary plaques are more likely to have serious cardiac events, including heart attacks. However, we do not know why some people develop high-risk coronary plaques and others do not. Researchers hope that this study will help providers better identify which people are most at risk of heart attacks before they occur.
  • Genetics of coronary heart disease: The NHLBI supports studies to identify genetic variants associated with coronary heart disease. Researchers are investigating how genes affect important molecular cascades involved in the development of coronary heart disease. This deeper understanding of the underlying causes for plaque buildup and damage to the blood vessels can inform prevention strategies and help healthcare providers develop personalized treatment for people with coronary heart disease caused by specific genetic mutations.

Find more NHLBI-funded studies on understanding the causes of coronary heart disease on the NIH RePORTER.

Recent findings suggest that cholesterol-lowering treatment can lower the risk of heart disease complications in people with HIV.

Current research on treatments for coronary heart disease

  • Insight into new molecular targets for treatment: NHLBI-supported researchers are investigating the role of high-density lipoprotein (HDL) cholesterol in coronary heart disease and other medical conditions. Understanding how the molecular pathways of cholesterol affect the disease mechanism for atherosclerosis and plaque buildup in the blood vessels of the heart can lead to new therapeutic approaches for the treatment of coronary heart disease. Researchers have found evidence that treatments that boost HDL function can lower systemic inflammation and slow down plaque buildup. This mechanism could be targeted to develop a new treatment approach for coronary heart disease.
  • Long-term studies of treatment effectiveness: The NHLBI is supporting the International Study of Comparative Health Effectiveness with Medical and Invasive Approaches (ISCHEMIA) trial EXTENDed Follow-up (EXTEND), which compares the long-term outcomes of an initial invasive versus conservative strategy for more than 5,000 surviving participants of the original ISCHEMIA trial. Researchers have found no difference in mortality outcomes between invasive and conservative management strategies for patients with chronic coronary heart disease after more than 3 years. They will continue to follow up with participants for up to 10 years. Researchers are also assessing the impact of nonfatal events on long-term heart disease and mortality. A more accurate heart disease risk score will be constructed to help healthcare providers deliver more precise care for their patients.
  • Evaluating a new therapy for protecting new mothers: The NHLBI is supporting the Randomized Evaluation of Bromocriptine In Myocardial Recovery Therapy for Peripartum Cardiomyopathy (REBIRTH), for determining the role of bromocriptine as a treatment for peripartum cardiomyopathy (PPCM). Previous research suggests that prolactin, a hormone that stimulates the production of milk for breastfeeding, may contribute to the development of cardiomyopathy late in pregnancy or the first several months postpartum. Bromocriptine, once commonly used in the United States to stop milk production, has shown promising results in studies conducted in South Africa and Germany. Researchers will enroll approximately 200 women across North America who have been diagnosed with PPCM and assess their heart function after 6 months.
  • Impact of mental health on response to treatment: NHLBI-supported researchers are investigating how mental health conditions can affect treatment effectiveness for people with coronary heart disease. Studies show that depression is linked to a higher risk for negative outcomes from coronary heart disease. Researchers found that having depression is associated with poor adherence to medical treatment for coronary heart disease. This means that people with depression are less likely to follow through with their heart disease treatment plans, possibly contributing to their chances of experiencing worse outcomes. Researchers are also studying new ways to treat depression in patients with coronary heart disease.

Find more NHLBI-funded studies on treating coronary heart disease on the NIH RePORTER.

Researchers have found no clear difference in patient survival or heart attack risk between managing heart disease through medication and lifestyle changes compared with invasive procedures. 

Coronary heart disease research labs at the NHLBI

  • Laboratory of Cardiac Physiology
  • Laboratory of Cardiovascular Biology
  • Minority Health and Health Disparities Population Laboratory
  • Social Determinants of Obesity and Cardiovascular Risk Laboratory
  • Laboratory for Cardiovascular Epidemiology and Genomics
  • Laboratory for Hemostasis and Platelet Biology

Related coronary heart disease programs

  • In 2002, the NHLBI launched The Heart Truth®, the first federally sponsored national health education program designed to raise awareness about heart disease as the leading cause of death in women. The NHLBI and The Heart Truth® supported the creation of the Red Dress® as the national symbol for awareness about women and heart disease, and also coordinate National Wear Red Day® and American Heart Month each February.
  • The Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) facilitates access to and maximizes the scientific value of NHLBI biospecimen and data collections. A main goal is to promote the use of these scientific resources by the broader research community. BioLINCC serves to coordinate searches across data and biospecimen collections and provide an electronic means for requesting additional information and submitting requests for collections. Researchers wanting to submit biospecimen collections to the NHLBI Biorepository to share with qualified investigators may also use the website to initiate the application process.
  • Our Trans-Omics for Precision Medicine (TOPMed) Program studies the ways genetic information, along with information about health status, lifestyle, and the environment, can be used to predict the best ways to prevent and treat heart, lung, blood, and sleep disorders. TOPMed specifically supports NHLBI’s Precision Medicine Activities.
  • NHLBI population and epidemiology studies in different groups of people, including the Atherosclerosis Risk in Communities (ARIC) Study, the Multi-Ethnic Study of Atherosclerosis (MESA), and the Cardiovascular Health Study (CHS), have made major contributions to understanding the causes and prevention of heart and vascular diseases, including coronary heart disease.
  • The Cardiothoracic Surgical Trials Network (CTSN) is an international clinical research enterprise that studies heart valve disease, arrhythmias, heart failure, coronary heart disease, and surgical complications. The trials span all phases of development, from early translation to completion, and have more than 14,000 participants. The trials include six completed randomized clinical trials, three large observational studies, and many other smaller studies.

The Truth About Women and Heart Disease Fact Sheet

Learn how heart disease may be different for women than for men.

Explore more NHLBI research on coronary heart disease

The sections above provide you with the highlights of NHLBI-supported research on coronary heart disease. You can explore the full list of NHLBI-funded studies on the NIH RePORTER.

To find more studies:

  • Type your search words into the Quick Search box and press enter.
  • Check Active Projects if you want current research.
  • Select the Agencies arrow, then the NIH arrow, then check NHLBI.

If you want to sort the projects by budget size — from the biggest to the smallest — click on the  FY Total Cost by IC  column heading.

Heart disease risk prediction using deep learning techniques with feature augmentation

  • Open access
  • Published: 14 March 2023
  • Volume 82, pages 31759–31773 (2023)

  • María Teresa García-Ordás 1 ,
  • Martín Bayón-Gutiérrez 1 ,
  • Carmen Benavides 2 ,
  • Jose Aveleira-Mata 1 &
  • José Alberto Benítez-Andrades   ORCID: orcid.org/0000-0002-4450-349X 2  

Cardiovascular diseases stand as one of the greatest risks of death for the general population. Late detection of heart disease strongly affects patients' chances of survival. Age, sex, cholesterol level, sugar level and heart rate, among other factors, are known to influence life-threatening heart problems, but because of the large number of variables, it is often difficult for an expert to evaluate each patient taking all this information into account. In this manuscript, the authors propose using deep learning methods combined with feature augmentation techniques to evaluate whether patients are at risk of suffering cardiovascular disease. The proposed methods outperform other state-of-the-art methods by 4.4%, reaching a precision of 90%, which is a significant improvement, even more so for an affliction that affects a large population.

1 Introduction and related work

Cardiovascular diseases (CVDs) are the main causes of disease burden and mortality all over the world [22]. The term “cardiovascular disease” covers a wide range of conditions affecting the heart and blood vessels and the way blood is pumped and circulated through the body [29]. Heart disease is common and has caused many deaths because it impairs heart function [9]. Over the last few decades, the worldwide population has increasingly suffered from heart disease, which is considered one of the most significant causes of death. About 17.7 million people die annually because of heart disease [19]. Diagnosis of heart disease is made by a specialized doctor and is essential for good treatment. The disadvantage is that such diagnoses may not be entirely objective and are subject to human error.

Heart failure has been the subject of significant research because of its complicated diagnostic procedure [15], which makes a computer-aided decision support system very helpful in this field. One example is presented in [20], where data mining techniques were used to reduce the time it takes to make an accurate prediction of the disease.

Heart diseases are highly varied and cause different types of complications that can reduce quality of life and even cause death, especially in developing countries [30]. Furthermore, the number of deaths due to heart failure is higher in developing countries and in those with poorer health facilities [23]. This highlights the need for a method that can deliver an accurate and early prediction of the risk of heart failure in patients.

For these reasons, many authors have developed methods that help detect heart disease by taking different factors into account. Most of these methods use machine learning techniques to avoid the problems of statistical analysis methods, which fail to capture prognostic information in large datasets containing multi-dimensional interactions [3, 17, 18, 25, 26]. Some of these papers have benefited from large datasets that allow existing diseases to be detected from historical data collected over a long period of time. On the other hand, the results obtained until a few years ago generally focused on identifying abnormal heart behavior without determining whether a case is potentially serious or benign.

In [1], the authors used a boosted decision tree algorithm to capture correlations between patient characteristics and mortality. Patients were labeled as having a high or low risk of death according to whether or not they died early after visiting the hospital. With these labels, the method obtained an area under the curve of 0.88.
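To make the area-under-the-curve figure concrete, the following minimal sketch computes a ROC AUC by direct pairwise comparison. The labels and scores are hypothetical illustration data, not values from [1], and the function name `roc_auc` is our own.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores for eight patients (1 = high-risk label).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # 0.8125: 13 of the 16 positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value such as 0.88 means that most high-risk patients are scored above most low-risk ones.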

Pires et al. [27] proposed the use of different methods, such as neural networks, decision trees, k-nearest neighbors (kNN), the CN2 rule inducer, support vector machines (SVM), and stochastic gradient descent (SGD), obtaining results of up to 87.69%, which is good for heart disease prediction. The article proposes a method capable of detecting heart disease with an acceptable success rate, but it was validated on a limited number of individuals.

More recently, Ali et al. [4] tried to identify the machine learning classifiers with the highest accuracy for these diagnostic purposes. Several supervised machine learning algorithms were applied and compared for performance and accuracy in heart disease prediction. The results showed 100% accuracy using random forest, decision trees and kNN, but the authors reported only the best result achieved in a cross-validation process, which is not very conclusive.
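The concern about reporting only the best cross-validation result can be illustrated with a small sketch. It runs a plain kNN classifier under 5-fold cross-validation on synthetic two-class data and prints the mean, spread, and best fold accuracy. The data, the fold scheme, and all function names are assumptions made for illustration; nothing here reproduces the setup of [4].

```python
import math
import random
import statistics
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracies(xs, ys, n_folds=5, k=3):
    """Return the accuracy on each held-out fold (folds are interleaved)."""
    accuracies = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return accuracies


# Synthetic data: two overlapping 2-D Gaussian classes (fixed seed).
rng = random.Random(0)
xs, ys = [], []
for i in range(100):
    label = i % 2
    center = 1.0 if label else 0.0
    xs.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
    ys.append(label)

fold_acc = k_fold_accuracies(xs, ys)
mean_acc = statistics.mean(fold_acc)
std_acc = statistics.stdev(fold_acc)
best_acc = max(fold_acc)
print(f"mean={mean_acc:.3f} std={std_acc:.3f} best fold={best_acc:.3f}")
```

The best fold can never be below the mean, so quoting it alone gives an optimistic figure; the mean together with the standard deviation is the more informative summary.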

Those papers used classic methods to predict heart diseases. However, the precision of the classifications can be enhanced by using improved techniques, as in the case of Faiayaz et al. [11], who increased the accuracy by 5.68% compared with the original kNN by proposing an improved kNN. In [19], a hybrid random forest with a linear model (HRFLM) technique reached 88.7% precision on the Cleveland dataset, which contains 297 records.

All these works seem very promising, but they have not shown that their results generalize to a larger number of patients.

As classical machine learning models showed promising results for this problem, many other authors have tried to combine different machine learning algorithms.

For example, in [24], many machine learning techniques, such as generalized boosted regression, main-terms Bayesian logistic regression using a Cauchy prior, penalized regression, main-terms logistic regression with and without variable selection, bootstrap aggregation of regression trees, multivariate adaptive regression splines, the arithmetic mean assigning the marginal probability of mortality to each patient, and classification and regression trees using random forest, were used to predict 30-day mortality risk after discharge in patients with heart failure. The results showed that ensemble models outperformed the benchmark models.

The authors of [16] combined five classifier approaches, namely support vector machine, artificial neural network, Naïve Bayes, regression analysis, and random forest, to predict and diagnose the recurrence of cardiovascular disease. The Cleveland and Hungarian datasets from the UCI repository were used. The results demonstrated the better performance of ensemble algorithms, with the random forest method obtaining the best result (98.12% accuracy).
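One common way to combine several classifiers is majority voting, sketched below. How [16] actually combined its models is not described here, so this is an assumption; the three rule-based "classifiers", their thresholds, and the field names are hypothetical and carry no clinical meaning.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by the largest number of classifiers."""
    predictions = [clf(sample) for clf in classifiers]
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical stand-in classifiers (illustrative thresholds only).
def by_cholesterol(patient):
    return int(patient["cholesterol"] > 240)


def by_resting_bp(patient):
    return int(patient["resting_bp"] > 140)


def by_max_heart_rate(patient):
    return int(patient["max_hr"] < 120)


ensemble = [by_cholesterol, by_resting_bp, by_max_heart_rate]
patient = {"cholesterol": 260, "resting_bp": 150, "max_hr": 150}
prediction = majority_vote(ensemble, patient)
print(prediction)  # 1: two of the three classifiers vote for the positive class
```

Using an odd number of binary classifiers avoids tied votes; in practice each callable would be a trained model rather than a fixed rule.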

Ensemble algorithms have performed well not only in heart failure detection but also in other diseases such as breast cancer [ 8 ], hepatitis C [ 10 ] or diabetic retinopathy [ 14 ].

In recent years, deep learning techniques based on neural networks have also been used to address medical problems, including heart failure risk.

Convolutional Neural Networks (CNNs) were used in [ 2 ] for the early identification of individuals at risk of heart failure using solely electrocardiograms (ECGs), obtaining an AUC of 0.78. Adaptive multi-layer networks were also used in [ 28 ] to predict the risk of heart failure, outperforming classical neural networks and even the hybrid and ensemble techniques proposed in previous years. In that work, the Cleveland dataset was used, so the number of samples was small, with only 297 patients evaluated.

In the last year, a new dataset [ 12 ] combining several well-known datasets, namely Cleveland (303 observations), Hungary (294 observations), Switzerland (123 observations), Long Beach VA (200 observations), and Stalog (270 observations), allowed the training of new techniques capable of classifying this larger volume of samples with a very limited number of features (11).

The main goal of this paper is:

Obtaining results for the classification of heart diseases that allow us to have a high percentage of successes in the early detection of the disease.

Taking into account the dataset evaluated, two secondary goals have been achieved:

Finding a new methodology for classification problems with a very low number of features.

Harnessing the properties of convolutional neural networks to improve current feature augmentation techniques.

To achieve these goals, this work proposes an architecture based on convolutional neural networks and a sparse autoencoder for the treatment of the data and its subsequent analysis. It obtains an accuracy of up to 90%, which can be considered a significant advance and a great help in determining the risk of heart diseases.

The rest of this paper is organized as follows: In Section  2 , the classic methods used to compare results and the architectures used in our proposal are explained. Experiments and results, as well as the dataset used, are detailed in Section  3 . Finally, we conclude in Section  4 .

2 Methodology

2.1 Classic methods

Several classic methods have been used as baselines against which to compare the proposed architecture. They are introduced only briefly, as they are all well-known methods.

2.1.1 Decision tree

A decision tree is a model in which each internal (intermediate) node is labeled with an input feature. At these nodes, a decision must be made between several possible options. Arcs are the connections between nodes: the arcs leaving a node labeled with an input feature are either labeled with each of the possible values of the output (target) feature or lead to a subordinate decision node on a different input feature. Leaf nodes are labeled with a class or with a probability distribution over the classes, which means that the tree has assigned the data reaching that leaf to a specific class or to a particular probability distribution. These nodes therefore contain the final selected class.

2.1.2 Random forest

Random forest [ 6 ] is a combination of decision trees. Each of the decision trees that make up the forest is built as follows. First, the number of training samples (N) and the number of variables of the classifier (M) are defined. The number of input variables used to determine the decision at a given node is called m. For each node of the tree, m variables are chosen at random and, from these m variables, the best partition of the set is calculated. To predict a new case, the tree is traversed downwards and the label of the terminal node reached is assigned to the case. This process is repeated for all the trees in the forest, and the label that obtains the highest number of votes is used as the prediction. Random forest is one of the most accurate learning algorithms, as long as a large enough dataset is used [ 7 ].
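The voting step can be illustrated with a minimal sketch. It is not the authors' implementation: the three "trees" below are hypothetical hand-written rules standing in for trained decision trees, and the feature names and thresholds are assumptions chosen only for the example.

```python
from collections import Counter

# Hypothetical stand-ins for trained trees: each maps a sample to a class label.
def tree_a(sample):
    return 1 if sample["oldpeak"] > 1.0 else 0

def tree_b(sample):
    return 1 if sample["max_hr"] < 130 else 0

def tree_c(sample):
    return 1 if sample["exercise_angina"] == 1 else 0

def forest_predict(trees, sample):
    """Return the class label that receives the most votes across all trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

# Illustrative sample (values are made up, not taken from the dataset).
sample = {"oldpeak": 2.3, "max_hr": 150, "exercise_angina": 1}
prediction = forest_predict([tree_a, tree_b, tree_c], sample)
```

Here two of the three stand-in trees vote for class 1, so the forest predicts class 1. In a real forest, each tree would be trained on a bootstrap sample with m randomly chosen variables per node.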

2.1.3 K-nearest neighbors

The k-nearest neighbors (k-NN) method is one of the simplest. Unlike other machine learning algorithms, k-NN does not build a model from the training data. Instead, the computation takes place at the moment a new sample is classified, which makes it a lazy learning method. The input data are vectors of dimension p of the form:

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{ip}) \qquad (1)$$

In the training phase, the vectors and class labels of the training examples are stored. In the classification phase, the distance between the stored vectors and the new vector is calculated, and the k examples closest to the new input are selected. The new sample is assigned the class that occurs most often among the selected vectors. Any metric can be used to calculate the distance, but the most common one is the Euclidean distance (see ( 2 )).

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{p} (x_{ir} - x_{jr})^2} \qquad (2)$$
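As a minimal sketch of this procedure (with toy two-dimensional points that are assumptions for illustration, not data from the paper), the storage, distance and voting steps could look as follows:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    # "Training" is only storage; all the work happens at query time (lazy learning).
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: euclidean(pair[0], query))
    top_labels = [label for _, label in neighbours[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Toy data: two well-separated groups (hypothetical values).
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9)]
train_y = [0, 0, 0, 1, 1]
```

Calling `knn_predict(train_x, train_y, (4.0, 4.0))` selects the two class-1 points and one class-0 point as neighbours, so the majority vote returns class 1.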

2.1.4 AdaBoost

AdaBoost combines several weak classifiers in order to obtain a robust classifier. The main idea is to assign greater weights to misclassified samples and smaller weights to correctly classified ones, so that each new weak classifier focuses more on the cases that were misclassified, thus improving the results. In our case, decision trees have been used as the base method. The algorithm is made up of three steps. First, the weights are initialized and each of the N samples is given the same weight. Second, a weak classifier is trained and the weights are updated: if a sample is correctly classified its weight is reduced, and if it is misclassified its weight is increased to give it more importance. Finally, the weak classifiers obtained in each round are combined into a strong classifier.
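The weight update in the second step can be sketched as below. This assumes the standard discrete AdaBoost formulation with labels in {-1, +1}, which the text does not spell out; the predictions of the weak classifier are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round for labels in {-1, +1}: returns (alpha, new_weights)."""
    # Weighted error of the weak classifier.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1.0 - err) / err)  # assumes 0 < err < 1
    # Correct samples (t * p = +1) shrink; misclassified ones (t * p = -1) grow.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]   # hypothetical weak classifier: one mistake (index 1)
weights = [1 / len(y_true)] * len(y_true)
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After this round the single misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right. The value `alpha` is the vote given to this weak classifier in the final combination.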

2.1.5 XGBoost

The XGBoost classifier has also been used as a baseline for comparison with our proposal. The difference between this method and AdaBoost is that, instead of assigning more weight to misclassified samples in each iteration, XGBoost focuses on reducing the loss. Each iteration fits a new model along the negative gradient of the loss so that the error is reduced further.

2.1.6 Gaussian Naive Bayes (GNB)

This classifier is based on Bayes’ theorem. The predictor variables are assumed to be independent of each other given the class. In the basic Naive Bayes classifier, the dataset is first converted into a frequency table, from which a probability table is created. Bayes’ theorem is then applied, and the class with the highest posterior probability is the result of the prediction. GNB works in the same way, but it assumes that each feature follows a Gaussian (normal) distribution within each class and therefore supports continuous data.
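A compact sketch of the Gaussian variant is given below. It is an illustration under stated assumptions rather than the configuration used in the experiments: the toy rows (maximum heart rate, oldpeak) are made-up values, and a small constant is added to each variance for numerical stability.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate the prior and the per-feature mean and variance of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gnb(model, row):
    """Return the class with the highest log-posterior for `row`."""
    def log_posterior(prior, means, variances):
        log_p = math.log(prior)
        for v, m, s2 in zip(row, means, variances):
            log_p += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return log_p
    return max(model, key=lambda label: log_posterior(*model[label]))

# Toy data (hypothetical): [maximum heart rate, oldpeak]
X = [[170, 0.0], [165, 0.2], [160, 0.1], [120, 2.0], [115, 2.5], [125, 1.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gnb(X, y)
```

Working with log-probabilities avoids numerical underflow when many per-feature likelihoods are multiplied together.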

2.1.7 Multilayer Perceptron (MLP)

A multilayer perceptron is a neural network that has an input layer, an output layer and one or more intermediate layers with a certain number of neurons. Its neurons typically use a non-linear activation function, and each neuron of a layer is connected with the neurons of the previous and next layers, which allows the network to learn complex information from the input data.

2.2 Proposal

2.2.1 Reconstruction and feature augmentation: Sparse Autoencoder (SAE)

In a conventional autoencoder, the latent space has fewer neurons than the input and output layers, so it is possible to represent a feature vector in a reduced way. In the case of the sparse autoencoder (SAE), the latent space has more neurons than the input and output layers. An L1 regularization term is also added to the latent space to force the network to use only some of the neurons for each input. With this type of network, we manage to increase the number of features of the data and analyze them from a different perspective.

In Fig.  1 the typical architecture of a SAE can be seen.

figure 1

Vanilla SAE network architecture. In the latent space, an L1 regularization term is applied

Once the sparse autoencoder has been trained for input reconstruction, the decoder part of the neural network can be detached, leaving only the encoder, which ends at the latent space. The encoder expands the features of the input data in such a way that the initial N features are dissociated to form M features (with N < M). This procedure allows us to keep all the information of the original data and to add additional information that was hidden.
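The shapes involved can be illustrated with a small forward pass. This is only a sketch under assumptions: the weights are random and untrained, the ReLU activation and the sizes N = 4 and M = 10 are choices made for the example, and the helper names are hypothetical.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, weights, bias):
    """Fully connected layer: `weights` has one row per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def make_layer(n_in, n_out, rng):
    """Random, untrained layer parameters (illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
N, M = 4, 10                       # N input features, M > N latent features
enc_w, enc_b = make_layer(N, M, rng)
dec_w, dec_b = make_layer(M, N, rng)

x = [0.2, 0.7, 0.1, 0.9]
latent = relu(dense(x, enc_w, enc_b))         # encoder: N -> M features
reconstruction = dense(latent, dec_w, dec_b)  # decoder: M -> N, needed only in training

# Sparsity term: the L1 norm of the latent activations, added to the training loss.
l1_penalty = sum(abs(h) for h in latent)
```

After training, only `enc_w` and `enc_b` would be kept, and `latent` would serve as the augmented feature vector passed to the classifier.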

2.2.2 Data classification: Convolutional Neural Networks (CNN)

Convolutional neural networks can take two-dimensional data, such as images, as input and are capable of extracting complex features from that data. This information is extracted by the filters (kernels). In the training process, the filter weights are adjusted to produce an accurate feature map of each class.

In the architecture of a CNN, each convolutional layer is usually followed by a pooling layer. The pooling layers are responsible for reducing overfitting and the number of network parameters, so that the computation is not so heavy. The most widely used type of pooling is max-pooling, which selects the maximum value of each window. The last layers of the convolutional network are dense layers, which carry out the classification of the features extracted by the kernels. A vanilla CNN is represented in Fig. 2.
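Max-pooling can be shown on a small example. The sketch below assumes a 2x2 window with stride 2 and an input whose sides are even; the feature map values are arbitrary.

```python
def max_pool_2x2(matrix):
    """2x2 max-pooling with stride 2 on a 2D list whose sides are even."""
    pooled = []
    for r in range(0, len(matrix), 2):
        row = []
        for c in range(0, len(matrix[0]), 2):
            window = (matrix[r][c], matrix[r][c + 1],
                      matrix[r + 1][c], matrix[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
pooled = max_pool_2x2(feature_map)  # [[4, 5], [6, 7]]
```

The 4x4 map is reduced to 2x2, which is how pooling cuts the number of values passed to the following layers.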

figure 2

A vanilla CNN representation

2.2.3 Our proposal: multi task neural network

The proposed architecture consists of two tasks. On the one hand, the features of each sample are augmented using a sparse autoencoder (SAE). On the other hand, the latent space is connected to a classifier that is trained together with the SAE. We have evaluated two different classifiers to attach to the SAE: a traditional MLP (see Fig. 3) and a convolutional neural network (see Fig. 4). The two tasks are described independently below.

figure 3

Multitask neural network composed of Sparse Autoencoder and MLP classifier

figure 4

Multitask neural network composed of Sparse Autoencoder and CNN classifier

In Fig. 5, three different approaches to dealing with data with a small number of features are presented. The first approach takes the whole training dataset as input and carries out a feature augmentation process to increase the number of features. This new dataset, with more features than the original one, is then taken as the input of the classification model, which determines whether the sample belongs to the positive class or not.

figure 5

Schema of three different approaches dealing with data with a small number of features

The second approach skips the feature augmentation step and feeds the model directly with the small dataset.

Our proposed approach is based on training two networks in parallel: a feature augmentation network, the sparse autoencoder, and a classifier network that takes the augmented dataset as input to predict the final class. Training a model with this approach makes the feature augmentation process itself trainable, so that it improves as information is processed and yields more accurate predictions.

3 Experiments and results

3.1 Dataset

The dataset used is made up of 11 clinical features: the patient’s age, sex, type of chest pain (typical angina, atypical angina, non-anginal pain or asymptomatic), the resting blood pressure (mm Hg), the serum cholesterol (mg/dl), the fasting blood sugar (value 1 if FastingBS > 120 mg/dl, and value 0 otherwise), the resting electrocardiogram results (Normal, ST if the patient has ST-T abnormalities, or LVH if the patient shows probable left ventricular hypertrophy), the maximum heart rate (numeric value between 60 and 202), exercise-induced angina (yes or no), the oldpeak (numeric value of the ST depression) and, finally, the slope of the peak exercise ST segment (Up, Flat, Down). Column 12 contains the output class, which can be 1 (heart disease) or 0 (normal).

This dataset was created in September 2021 by combining five datasets that were already available independently but had not been combined before: Cleveland, Hungarian, Switzerland, Long Beach and Stalog. The final heart disease dataset is made up of 918 samples with a similar number of cases for each class: 410 corresponding to the healthy class and 508 to the heart disease class [ 12 ]. For that reason, no methods for dealing with unbalanced classes have been needed.

3.2 Experimental setup

First, a preprocessing step was carried out in order to clean the dataset and extract more useful information from it. The age was dropped and three new columns were added representing an age range: young, adult and elder. In the same way, the resting BP feature was converted into three new columns: lowBP, mediumBP and highBP. Lastly, the Cholesterol feature was converted into three categorical columns that indicate the risk: low, medium and high.

One-hot encoding was applied to three features: ChestPainType, RestingECG and ST_Slope. Finally, Sex and Exercise Angina were processed using a label encoder. At the end of this process, we have a dataset made up of 24 features.
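The following sketch shows how these steps lead from 11 clinical features to 24 columns. The cut-off values for the age, blood pressure and cholesterol ranges are not given in the text, so the numbers below are assumptions, as are the category codes (TA, ATA, NAP, ASY, M, Y) and the dictionary keys used for the record.

```python
def bin3(value, low_cut, high_cut):
    """One-hot encode a number into three ranges: [low, medium, high]."""
    if value < low_cut:
        return [1, 0, 0]
    if value < high_cut:
        return [0, 1, 0]
    return [0, 0, 1]

def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

def preprocess(rec):
    features = []
    features += bin3(rec["Age"], 40, 60)            # young / adult / elder (assumed cut-offs)
    features += bin3(rec["RestingBP"], 90, 130)     # lowBP / mediumBP / highBP (assumed)
    features += bin3(rec["Cholesterol"], 200, 240)  # low / medium / high risk (assumed)
    features += one_hot(rec["ChestPainType"], ["TA", "ATA", "NAP", "ASY"])
    features += one_hot(rec["RestingECG"], ["Normal", "ST", "LVH"])
    features += one_hot(rec["ST_Slope"], ["Up", "Flat", "Down"])
    features.append(1 if rec["Sex"] == "M" else 0)             # label encoding
    features.append(1 if rec["ExerciseAngina"] == "Y" else 0)  # label encoding
    features += [rec["FastingBS"], rec["MaxHR"], rec["Oldpeak"]]  # kept as they are
    return features

# Hypothetical patient record used only to exercise the function.
record = {"Age": 54, "Sex": "M", "ChestPainType": "NAP", "RestingBP": 150,
          "Cholesterol": 195, "FastingBS": 0, "RestingECG": "Normal",
          "MaxHR": 122, "ExerciseAngina": "N", "Oldpeak": 0.0, "ST_Slope": "Up"}
row = preprocess(record)
```

The count works out as 3 + 3 + 3 binned columns, 4 + 3 + 3 one-hot columns, 2 label-encoded columns and 3 unchanged columns, which gives the 24 features mentioned above.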

A 10-fold cross-validation has been carried out for all of the experiments in order to reduce the influence of any particular random split. Every model has been evaluated through an extensive hyperparameter grid search. In the results, the score for the hyperparameter configuration with the best mean value over the 10 folds is presented. The neural network architectures comprise one sparse autoencoder, for feature augmentation, and one classifier, both of which are trained at the same time. We have evaluated two different configurations. The first one uses an MLP classifier and the second one converts the latent space into a two-dimensional matrix to train a 2D convolutional neural network. In both cases, training was carried out using the Adam optimizer because of its fast optimization time, its invariance to rescaling of the gradient and its ability to work with sparse gradients, which considerably improve the performance of the neural network.
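The fold construction and the averaging of scores can be sketched as follows. This is a generic illustration, not the authors' code: the seed, the interleaved fold assignment and the `evaluate` callback are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def mean_cv_score(evaluate, n_samples, k=10):
    """Average the score returned by evaluate(train_idx, test_idx) over k folds."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# 918 samples, as in the dataset described above.
splits = list(k_fold_indices(918, k=10))
```

In a grid search, `mean_cv_score` would be computed once per hyperparameter configuration and the configuration with the highest mean would be reported.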

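The selected loss function was binary cross-entropy for the classification subnet and mean squared error for the decoder. Binary cross-entropy can be defined as:

$$\mathrm{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

where $N$ is the number of samples, $y_i$ is the true class of sample $i$ and $\hat{y}_i$ is the predicted probability of the positive class. It is used because minimizing it is equivalent to fitting the model by maximum likelihood. Mean squared error can be defined as:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2$$

where $x_i$ is an input value and $\hat{x}_i$ is its reconstruction. It is used because it penalizes large errors heavily, which is desirable for feature reconstruction in the autoencoder.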
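A minimal sketch of how the two losses could be combined in the multitask setting is shown below. The text does not state how the two terms are weighted, so the simple sum with a `recon_weight` factor is an assumption, and the function names are hypothetical.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between labels in {0, 1} and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def mean_squared_error(x, x_hat):
    """Mean squared difference between inputs and their reconstructions."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def multitask_loss(y_true, y_prob, x, x_hat, recon_weight=1.0):
    """Classification loss plus weighted reconstruction loss (weighting is assumed)."""
    return (binary_cross_entropy(y_true, y_prob)
            + recon_weight * mean_squared_error(x, x_hat))
```

Because both terms are minimized together, the gradients of the classifier also flow into the encoder, which is what makes the feature augmentation trainable.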

Different latent space sizes were evaluated to study the importance of this parameter in the final classification.

3.3 Results

To compare the proposed method with other approaches, we have carried out an analysis of the classical machine learning methods discussed in Section 2.1. As discussed, a grid search has been applied to find the best hyperparameters for each method. These results can be observed in Fig. 6, where the mean of the 10-fold validation with the best configuration is presented.

figure 6

Results obtained using classical machine learning methods

These results demonstrate the good performance of the MLP neural network, which achieved the best accuracy (86.281%). The Random Forest and AdaBoost ensemble methods performed similarly. In contrast, decision trees obtained the worst performance, with an accuracy of 78.978%, which is 7.30 percentage points (8.46% in relative terms) lower than that of the MLP.

Because of the good performance of neural networks, we have carried out training by combining the MLP with feature augmentation through a sparse autoencoder. In Fig. 7 we present the results achieved with different latent space sizes. All of the values represent the mean accuracy over a 10-fold cross-validation.

figure 7

Results obtained using the architecture which combines MLP for classification and sparse autoencoder for feature augmentation

When the latent space has a size of 100 features, in contrast to the original 11 features, the classification accuracy reaches 89.543%, a relative improvement of 3.78% (3.26 percentage points) over the MLP alone.

The preliminary results showed that classification improves when a multitask neural network trains the SAE and the MLP at the same time. Given that this approach not only performs best for heart problem detection but also allows new features to be generated from the dataset, a new set of experiments was carried out by replacing the MLP classifier with a 2D CNN. This approach helps the CNN to deal with structured data: the SAE rearranges the data, reordering the features and creating new ones as combinations of them, so as to provide a suitable spatial representation. In Fig. 8, the results with different latent space sizes can be compared.

figure 8

Results obtained using the architecture which combines CNN for classification and sparse autoencoder for feature augmentation

In this new set of experiments, the best result was achieved with a latent space of 200 new features, with an accuracy of 90.088%. This is a relative increase of more than 4.4% (3.81 percentage points) over the classic MLP, a notable gain for a problem that can have serious consequences for patients, including death.

In Fig. 9, we can see the improvement of our proposals over the vanilla MLP and the random forest model. As indicated, feature augmentation techniques improve the results achieved on the original dataset. Furthermore, combining the SAE with a 2D convolutional neural network to rearrange the new features increases the accuracy slightly.

figure 9

Comparison of our proposed multitask neural networks with the classical MLP and Random Forest models

Two statistical tests were performed to verify that the results obtained by the proposed approach are superior to those obtained by more traditional techniques: the Kolmogorov-Smirnov test and the Independent Samples t-test.

To perform these tests, the accuracy results were grouped as follows: (group I) existing techniques, comprising the traditional machine learning methods and the traditional MLP, and (group II) the techniques proposed in this paper, i.e. MLP and CNN using SAE. The tests showed that the accuracy obtained by the proposed approach (M = 88.99, SD = 1.13) is significantly higher than that obtained using traditional methods (M = 84.73, SD = 2.61), with a significance of p < 0.001. In the case of the t-test, t(17) = 4.97, p < 0.001.
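For readers who wish to see how such a statistic is formed, the sketch below computes a pooled-variance independent-samples t statistic from summary values. The means and standard deviations are those reported above, but the group sizes are not given in the text: 12 and 7 are assumptions chosen only so that the degrees of freedom equal the 17 implied by t(17), and the equal-variance form of the test is also assumed.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's independent-samples t statistic (equal variances assumed)."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

# Means and SDs come from the text; the group sizes 12 and 7 are assumed.
t_stat, df = pooled_t(88.99, 1.13, 12, 84.73, 2.61, 7)
```

With these assumed sizes the statistic comes out close to the reported value. Other splits of the 19 observations would give a different result, so this should be read as a plausibility check and not as a reproduction of the analysis.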

Some recent works have also taken advantage of the dataset used in this research. In [ 5 ], a study with different machine learning methods was performed. A stacking technique formed by k-NN and Logistic Regression, followed by a k-NN classification using the vote classifier output of the previous techniques, obtained the best results, with an accuracy of 87.24%. In 2022, Ghosh et al. [ 13 ] achieved an accuracy of 86.40% using a Random Forest model. More recently, in [ 21 ], a stacking algorithm was proposed that combines the outputs of Logistic Regression, Random Forest, MLP, CatBoost and Decision Trees and trains a meta-classifier to derive the final result. This method achieved an accuracy of 89.86% under the same experimental conditions as our proposal.

In Table 1, a comparison between the state-of-the-art results and our proposal is presented. Our results using the multitask classifier with CNN outperform all the other published methods.

4 Conclusions

This article provides deep learning-based methods that combine classification and feature augmentation tasks to address the prediction of heart problems in a dataset consisting of patient records from five independent centers. This dataset consists of 918 samples with only 11 clinical characteristics per sample. A new architectural approach has been proposed that combines the sparse autoencoder and the convolutional classifier.

As the dataset only contains 11 features, feature augmentation has been carried out using the sparse autoencoder to extract new features. Thanks to the high number of features extracted, a convolutional neural network can be trained by reshaping them into a 2D array. These two processes are joined in a single network that combines the SAE and the classifier (MLP or CNN), implemented in order to increase the feature extraction ability by taking into account the classifier information obtained as feedback through the backpropagation algorithm. When the SAE is trained jointly, the CNN outperforms the MLP by about 0.6% in accuracy. This suggests that the CNN influences the SAE feature extraction, forcing it to extract more relevant features with spatial location information. The MLP also modifies the feature extraction carried out by the SAE, but the improvement is clearly smaller than with the convolutional network.

A detailed analysis of the number of neurons in the latent space of the sparse autoencoder, which represents the new features, was performed, concluding that the optimal size was 200. This study is interesting because it shows that there is a size beyond which the results worsen, which implies that more neurons are not always better.

With this approach, we have achieved an accuracy of 90.088%, which represents a 4.4% improvement in comparison with the results obtained by classic classifiers (MLP or RF) trained on the same dataset and under the same conditions.

In addition, our method also obtained better results than the state-of-the-art methods that rely on combinations of algorithms (stacking). Moreover, stacking is a computationally expensive technique, since it involves the analysis of several models, sometimes sequentially, to obtain the desired result.

Taking into account that detecting a heart problem in a patient can mean survival, improvements presented in this manuscript, along with the proposed method, are of great interest to specialists in the field.

Data Availability

All the data used in the experiments are available on Kaggle: https://www.kaggle.com/fedesoriano/heart-failure-prediction

Adler ED, Voors AA, Klein L, Macheret F, Braun OO, Urey MA et al (2020) Improving risk prediction in heart failure using machine learning. Eur J Heart Fail 22(1):139–147. https://doi.org/10.1002/EJHF.1628

Akbilgic O, Butler L, Karabayir I, Chang P, Kitzman D, Alonso A et al (2021) Artificial intelligence applied to ecg improves heart failure prediction accuracy. J Am Coll Cardiol 77(18):3045. https://doi.org/10.1016/S0735-1097(21)04400-4

Albert KF, John R, Divyang P, Saleem T, Kevin MT, Carolyn JP et al (2019) Machine learning prediction of response to cardiac resynchronization therapy: improvement versus current guidelines. Circ Arrhythmia Electrophysiol, vol 12(7). https://doi.org/10.1161/CIRCEP.119.007316

Ali MM, Paul BK, Ahmed K, Bui FM, Quinn JMW, Moni MA (2021) Heart disease prediction using supervised machine learning algorithms: performance analysis and comparison. Comput Biol Med 136:104672. https://doi.org/10.1016/J.COMPBIOMED.2021.104672

Araujo M, Pope L, Still S, Yannone C (2021) Prediction of heart disease with machine learning techniques. Graduate Res, Kennesaw State Un

Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324

Caruana R, Karampatziakis N, Yessenalina A (2008) An empirical evaluation of supervised learning in high dimensions. In: Conference: machine learning, proceedings of the twenty-fifth international conference (ICML 2008), Helsinki, Finland

Dalal S, Onyema EM, Kumar P, Maryann DC, Roselyn AO, Obichili MI (2022) A hybrid machine learning model for timely prediction of breast cancer. Int J Model Simul Sci Comput 0(0):2341023. https://doi.org/10.1142/S1793962323410234

Diwakar M, Tripathi A, Joshi K, Memoria M, Singh P, Kumar N (2021) Latest trends on heart disease prediction using machine learning and image fusion. Mater Today: Proc 37(Part 2):3213–3218. https://doi.org/10.1016/J.MATPR.2020.09.078

Edeh MO, Dalal S, Dhaou IB, Agubosim CC, Umoke CC, Richard-Nnabu NE et al (2022) Artificial intelligence-based ensemble learning model for prediction of hepatitis C disease. Front Public Health 10:892371

Faiayaz Waris S, Koteeswaran S (2021) Heart disease early prediction using a novel machine learning method called improved K-means neighbor classifier in python. Mater Today: Proc, https://doi.org/10.1016/J.MATPR.2021.01.570

Fedesoriano Heart failure prediction dataset kaggle. Available from https://www.kaggle.com/fedesoriano/heart-failure-prediction . Accessed 12 September 2022

Ghosh A, Jana S (2022) A study on heart disease prediction using different classification models based on cross validation method. Int J Eng Res Technol, https://doi.org/10.17577/IJERTV11IS060029

Ghouali S, Onyema E, Guellil M, Wajid MA, Clare O, Cherifi W et al (2022) Artificial intelligence-based teleopthalmology application for diagnosis of diabetics retinopathy. IEEE Open J Eng Med Biol, pp 1–11. https://doi.org/10.1109/OJEMB.2022.3192780

Go AS, Mozaffarian D, Roger VL, Benjamin EJ, Berry JD, Blaha MJ et al (2014) Heart disease and stroke statistics—2014 update. Circulation, vol 129(3). https://doi.org/10.1161/01.CIR.0000441139.02102.80

Jan M, Awan AA, Khalid MS, Nisar S (2018) Ensemble approach for developing a smart heart disease prediction system using classification algorithms. Res Rep Clin Cardiol 9:33–45. https://doi.org/10.2147/RRCC.S172035

Khajehali N, Khajehali Z, Tarokh MJ (2021) The prediction of mortality influential variables in an intensive care unit: a case study. Personal Ubiquit Comput, https://doi.org/10.1007/s00779-021-01540-5

Kim YJ, Saqlian M, Lee JY (2022) Deep learning–based prediction model of occurrences of major adverse cardiac events during 1-year follow-up after hospital discharge in patients with AMI using knowledge mining. Personal Ubiquit Comput 26(2):259–267. https://doi.org/10.1007/s00779-019-01248-7

Kondababu A, Siddhartha V, Kumar BB, Penumutchi B (2021) A comparative study on machine learning based heart disease prediction. Mater Today: Proc. https://doi.org/10.1016/J.MATPR.2021.01.475

Krishnaiah V, Narsimha G, Chandra NS (2016) Heart disease prediction system using data mining techniques and intelligent fuzzy approach: a review. Int J Comput Appl 136(2):975–8887

Liu J, Dong X, Zhao H, Tian Y (2022) Predictive classifier for cardiovascular disease based on stacking model fusion. Processes, vol 10(4). https://doi.org/10.3390/pr10040749

Maini E, Venkateswarlu B, Maini B, Marwaha D (2021) Machine learning–based heart disease prediction system for Indian population: an exploratory study done in South India. Med J Armed Forces India 77(3):302–311. https://doi.org/10.1016/J.MJAFI.2020.10.013

Muzammal M, Talat R, Sodhro AH, Pirbhulal S (2020) A multi-sensor data fusion enabled ensemble approach for medical data from body sensor networks. Inf Fusion 53:155–164. https://doi.org/10.1016/J.INFFUS.2019.06.021

Negassa A, Ahmed S, Zolty R, Patel SR (2021) Prediction model using machine learning for mortality in patients with heart failure. Am J Cardiol 153:86–93. https://doi.org/10.1016/J.AMJCARD.2021.05.044

Olsen CR, Mentz RJ, Anstrom KJ, Page D, Patel PA (2020) Clinical applications of machine learning in the diagnosis, classification, and prediction of heart failure. Am Heart J 229:1–17. https://doi.org/10.1016/J.AHJ.2020.07.009

Panahiazar M, Taslimitehrani V, Pereira N, Pathak J (2015) Using EHRs and machine learning for heart failure survival analysis. Stud Health Technol Inform 216:40

Pires IM, Marques G, Garcia NM, Ponciano V (2020) Machine learning for the evaluation of the presence of heart disease. Procedia Comput Sci 177:432–437. https://doi.org/10.1016/J.PROCS.2020.10.058

Samuel OW, Yang B, Geng Y, Asogbon MG, Pirbhulal S, Mzurikwao D et al (2020) A new technique for the prediction of heart failure risk driven by hierarchical neighborhood component-based learning and adaptive multi-layer networks. Future Gener Comput Syst 110:781–794. https://doi.org/10.1016/J.FUTURE.2019.10.034

Soni J, Ansari U, Sharma D, Soni S (2011) Predictive data mining for medical diagnosis: an overview of heart disease prediction. Int J Comput Appl 17(8):43–48. https://doi.org/10.5120/2237-2860

Yang H, Garibaldi JM (2015) A hybrid model for automatic identification of risk factors for heart disease. J Biomed Inform 58:S171–S182. https://doi.org/10.1016/J.JBI.2015.09.006

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research was funded by the Junta de Castilla y Leon grant number LE078G18. This work is partially supported by Universidad de León under the “Programa Propio de Investigación de la Universidad de León 2021” grant.

Author information

Authors and affiliations.

SECOMUCI Research Group, Escuela de Ingenierías Industrial e Informática, Universidad de León, Campus of Vegazana s/n, León, 24071, León, Spain

María Teresa García-Ordás, Martín Bayón-Gutiérrez & Jose Aveleira-Mata

SALBIS Research Group, Department of Electric, Systems and Automatics Engineering, Universidad de León, Campus of Vegazana s/n, León, 24071, León, Spain

Carmen Benavides & José Alberto Benítez-Andrades

Contributions

María Teresa García-Ordás : Conceptualization, Data curation, Methodology, Software, Visualization, Validation, Writing- Original draft preparation. Martín Bayón-Gutiérrez : Data curation, Writing- Original draft preparation. Carmen Benavides : Conceptualization, Supervision, Writing- Reviewing and Editing. Jose Aveleira-Mata : Conceptualization, Supervision, Writing- Reviewing and Editing. José Alberto Benítez-Andrades : Data curation, Methodology, Software, Visualization, Validation, Writing- Reviewing and Editing.

Corresponding author

Correspondence to José Alberto Benítez-Andrades .

Ethics declarations

Competing interests.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

María Teresa García-Ordás, Martín Bayón-Gutiérrez, Carmen Benavides, Jose Aveleira-Mata and José Alberto Benítez-Andrades contributed equally to this work.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

García-Ordás, M.T., Bayón-Gutiérrez, M., Benavides, C. et al. Heart disease risk prediction using deep learning techniques with feature augmentation. Multimed Tools Appl 82 , 31759–31773 (2023). https://doi.org/10.1007/s11042-023-14817-z

Received : 30 December 2021

Revised : 13 September 2022

Accepted : 06 February 2023

Published : 14 March 2023

Issue Date : August 2023

DOI : https://doi.org/10.1007/s11042-023-14817-z

  • Deep learning
  • Sparse autoencoder
  • Convolutional neural network
  • Heart disease

Cardiovascular Research

Cardiovascular Research Topics

  • Heart Rhythm and Arrhythmias
  • HIV and Heart Disease
  • Hypertension
  • Interventional Cardiology
  • Myocardial Biology/Heart Failure
  • Myocardial Protection
  • Myocarditis
  • Neuroprotection in Cardiac Surgery
  • Precision Medicine
  • Preventive Cardiology
  • Stem Cell and Regenerative Biology
  • Women and Heart Disease

Breakthrough Discoveries Core Lab

The Johns Hopkins Core Lab provides access to Small Animal Cardiovascular Phenotyping and Model Core.

COVID-19: Long-term effects

Some people continue to experience health problems long after having COVID-19. Understand the possible symptoms and risk factors for post-COVID-19 syndrome.

Most people who get coronavirus disease 2019 (COVID-19) recover within a few weeks. But some people, even those who had mild versions of the disease, might have symptoms that last a long time afterward. These ongoing health problems are sometimes called post-COVID-19 syndrome, post-COVID conditions, long COVID-19, long-haul COVID-19, and post-acute sequelae of SARS-CoV-2 infection (PASC).

What is post-COVID-19 syndrome and how common is it?

Post-COVID-19 syndrome involves a variety of new, returning or ongoing symptoms that people experience more than four weeks after getting COVID-19. In some people, post-COVID-19 syndrome lasts months or years or causes disability.

Research suggests that between one month and one year after having COVID-19, 1 in 5 people ages 18 to 64 has at least one medical condition that might be due to COVID-19. Among people age 65 and older, 1 in 4 has at least one medical condition that might be due to COVID-19.

What are the symptoms of post-COVID-19 syndrome?

The most commonly reported symptoms of post-COVID-19 syndrome include:

  • Symptoms that get worse after physical or mental effort
  • Lung (respiratory) symptoms, including difficulty breathing or shortness of breath and cough

Other possible symptoms include:

  • Neurological symptoms or mental health conditions, including difficulty thinking or concentrating, headache, sleep problems, dizziness when you stand, pins-and-needles feeling, loss of smell or taste, and depression or anxiety
  • Joint or muscle pain
  • Heart symptoms or conditions, including chest pain and fast or pounding heartbeat
  • Digestive symptoms, including diarrhea and stomach pain
  • Blood clots and blood vessel (vascular) issues, including a blood clot that travels to the lungs from deep veins in the legs and blocks blood flow to the lungs (pulmonary embolism)
  • Other symptoms, such as a rash and changes in the menstrual cycle

Keep in mind that it can be hard to tell if you are having symptoms due to COVID-19 or another cause, such as a preexisting medical condition.

It's also not clear if post-COVID-19 syndrome is new and unique to COVID-19. Some symptoms are similar to those caused by chronic fatigue syndrome and other chronic illnesses that develop after infections. Chronic fatigue syndrome involves extreme fatigue that worsens with physical or mental activity, but doesn't improve with rest.

Why does COVID-19 cause ongoing health problems?

Organ damage could play a role. People who had severe illness with COVID-19 might experience organ damage affecting the heart, kidneys, skin and brain. Inflammation and problems with the immune system can also happen. It isn't clear how long these effects might last. The effects also could lead to the development of new conditions, such as diabetes or a heart or nervous system condition.

The experience of having severe COVID-19 might be another factor. People with severe symptoms of COVID-19 often need to be treated in a hospital intensive care unit. This can result in extreme weakness and post-traumatic stress disorder, a mental health condition triggered by a terrifying event.

What are the risk factors for post-COVID-19 syndrome?

You might be more likely to have post-COVID-19 syndrome if:

  • You had severe illness with COVID-19, especially if you were hospitalized or needed intensive care.
  • You had certain medical conditions before getting the COVID-19 virus.
  • You had a condition affecting your organs and tissues (multisystem inflammatory syndrome) while sick with COVID-19 or afterward.

Post-COVID-19 syndrome also appears to be more common in adults than in children and teens. However, anyone who gets COVID-19 can have long-term effects, including people with no symptoms or mild illness with COVID-19.

What should you do if you have post-COVID-19 syndrome symptoms?

If you're having symptoms of post-COVID-19 syndrome, talk to your health care provider. To prepare for your appointment, write down:

  • When your symptoms started
  • What makes your symptoms worse
  • How often you experience symptoms
  • How your symptoms affect your activities

Your health care provider might do lab tests, such as a complete blood count or liver function test. You might have other tests or procedures, such as chest X-rays, based on your symptoms. The information you provide and any test results will help your health care provider come up with a treatment plan.

In addition, you might benefit from connecting with others in a support group and sharing resources.

  • Open access
  • Published: 06 August 2024

Exploring the predictive factors of heart disease using rare association rule mining

  • Sadeq Darrab 1 ,
  • David Broneske 2 &
  • Gunter Saake 1  

Scientific Reports volume 14, Article number: 18178 (2024)

Cardiovascular diseases continue to be the leading cause of mortality worldwide, claiming a significant number of lives each year. Despite the advancements in predictive models, including logistic regression, neural networks, and random forests, these techniques often lack transparency and interpretability, limiting their practical application in clinical settings. To address this challenge, this research introduces EPFHD-RARMING, an innovative approach designed to enhance the understanding and predictability of heart disease through the discovery of rare and meaningful patterns. EPFHD-RARMING utilizes rare association rule mining to uncover hidden and unexpected rules that identify critical factors contributing to heart disease. This method is particularly adept at identifying high-risk patterns in individuals who appear healthy but may develop heart disease under certain conditions, thus facilitating early intervention and preventive measures. By integrating these insights with established feature engineering techniques, EPFHD-RARMING enhances its practical utility, enabling medical professionals to proactively manage patient care and tailor interventions to individual risk profiles. This study demonstrates the effectiveness of EPFHD-RARMING in providing a deeper, actionable understanding of the complex dynamics of heart disease. The model’s ability to identify and interpret rare patterns holds significant promise for advancing medical analytics and improving patient outcomes. Moreover, the applicability of EPFHD-RARMING extends beyond the healthcare domain, offering valuable insights in various fields where the discovery of rare patterns is critical, such as finance, marketing, and cybersecurity. 
This study conducts a comprehensive evaluation, which demonstrates the superior performance of EPFHD-RARMING compared to traditional predictive models in identifying key factors contributing to heart disease, in terms of interestingness, explainability, and comprehensiveness of insights. The results underscore the potential of this innovative approach to revolutionize our understanding and prediction of heart disease, ultimately contributing to more effective and personalized healthcare solutions. This research emphasizes the importance of rare association rule mining in medical analytics and paves the way for future studies to explore and utilize these techniques across diverse domains.

Introduction

The World Health Organization (WHO) identifies cardiovascular diseases (CVDs) as the foremost cause of mortality globally, presenting a significant global health challenge 1 . This issue extends beyond high-income nations and is increasingly prevalent in low- and middle-income countries, where it imposes a substantial burden on individuals, health systems, and economies 2 . CVD risk factors are widespread throughout diverse populations, regardless of age, gender, socioeconomic status, or geographical location, highlighting the urgency of addressing this public health crisis.

While the WHO and other international organizations, such as the World Heart Federation (WHF), have been pivotal in raising awareness and shaping interventions, the global response to CVDs involves a complex interplay of surveillance, policy-making, and community engagement. Efforts to mitigate the impact of CVDs are multifaceted, ranging from the implementation of educational programs that promote healthy lifestyles to the strategic planning of global health initiatives 3 . The recognition of CVDs as a major global health issue by the WHO is supported by the widespread occurrence of CVDs and their risk factors, as well as the coordinated efforts of international health agencies to address this concern 3 , 4 . Addressing the challenge of CVDs requires a comprehensive approach that includes prevention, control, and global collaboration to reduce the associated morbidity and mortality 4 .

Machine learning (ML) techniques 5 , 6 have shown immense potential across various domains, revolutionizing industries by providing enhanced data analysis, predictive capabilities, and automation. In finance, ML algorithms aid in fraud detection, risk assessment, and algorithmic trading. In marketing, AI-driven analytics offer personalized customer experiences and targeted advertising. The healthcare sector benefits from ML in areas such as disease prediction, patient monitoring, and personalized treatment plans.

In the context of heart disease, a significant global health concern leading to substantial morbidity and mortality rates, the importance of ML becomes particularly evident. The ability to process and analyze vast amounts of medical data enables the early detection of CVDs, allowing for timely and targeted interventions. These methods help in identifying subtle patterns and risk factors that may not be apparent through traditional diagnostic methods. For instance, ML algorithms can analyze electronic health records, imaging data, and genetic information to predict the likelihood of heart disease with remarkable accuracy. This predictive power not only aids in early diagnosis but also in the development of personalized treatment plans, ultimately improving patient outcomes and reducing healthcare costs. These techniques 7 , 8 , 9 , 10 encompass a range of algorithms and models that can analyze complex medical data to identify patterns and predict health risks with high precision.

However, the interpretability and explainability of ML models 11 , particularly in the context of early detection and diagnosis of heart disease, remain significant concerns 12 . While ML models offer substantial advancements in identifying patterns within complex datasets for disease prediction, the opacity of these models can undermine trust and hinder their practical adoption in healthcare settings 12 . This lack of transparency may lead to ethical dilemmas and challenges in clinical decision-making, as healthcare practitioners and patients may not fully understand the reasoning behind AI-driven predictions.

Association rule mining (ARM) is a widely recognized and highly interpretable data mining technique that reveals hidden patterns and correlations among various factors 13 . Its prominence, ease of interpretation, and ability to extract valuable knowledge make it an excellent tool for real-world applications such as market basket analysis and web traffic analysis. However, despite its potential, ARM has not been widely adopted in the field of medicine. This is unfortunate, as association rules can identify every pattern in a given dataset, which is highly beneficial for clinical data analysis. By utilizing association rules, clinicians can quickly and automatically make well-informed diagnoses, extract valuable information, and develop essential knowledge bases.

Despite the advantages of ARM, it presents some challenges. One significant challenge is the generation of many irrelevant and repetitive rules. Furthermore, the most interesting rules often have low support values, referred to as rare rules. Low support thresholds can result in an overwhelming number of rules, complicating their management and analysis. Therefore, appropriate methods are necessary to determine the usefulness of the rules and to identify the most relevant ones 14 .

Rare association rule mining is crucial in enhancing the interpretability and explainability of data, particularly in the context of early detection and diagnosis of heart disease. By identifying infrequent but significant patterns within medical datasets, this technique can uncover subtle correlations between symptoms and heart disease that may not be apparent through traditional frequent pattern mining methods 15, 16, 17. It also presents challenges, such as the need for efficient algorithms that balance finding rare patterns against managing the vast number of potential rules generated 17, 18. Moreover, applying this technique to medical datasets for heart disease diagnosis requires careful consideration of the rarity and relevance of the associations to ensure clinical utility. The technique’s ability to process and analyze complex medical data can lead to the identification of early indicators of heart disease, potentially leading to timely interventions and better patient outcomes. To fully utilize its potential, the associated challenges must be addressed: the large number of generated patterns, noise, and the difficulty of isolating only the interesting rules.

To address these challenges effectively, we propose EPFHD-RARMING (Exploring the Predictive Factors of Heart Disease using Rare Association Rule MinING). Our proposed solution aims to discover the factors contributing to heart disease without generating an excessive number of rules. Instead, we focus on generating only the relevant and interesting rules. The EPFHD-RARMING model incorporates two types of patterns, namely frequent and rare, to generate compelling and meaningful rules. Frequent patterns represent well-known or anticipated patterns, reflecting our existing knowledge (set of beliefs). By leveraging these patterns, we identify a set of rare rules that deviate from the established beliefs, with significantly lower support when additional features are considered. This approach allows us to explore predictive factors and their related symptoms leading to heart disease.

Feature selection has gained extensive attention in recent years due to its significant role in identifying the most important features for model predictions 19 , 20 . However, focusing solely on features and their impact on model predictions misses the importance of determining patterns associated with these features that lead to predictions. Therefore, in our proposed model, we emphasize not only feature selection but also the patterns that may indicate the development of heart disease when these features (symptoms, in our case study of heart disease) are present.

To the best of our knowledge, this study represents the first attempt to utilize simple yet powerful rule mining algorithms to extract symptoms and identify patterns indicative of future heart disease development. The rules generated by our model have the potential to significantly assist clinicians in making informed decisions for the early detection and treatment of heart disease. Our primary objective in this paper is to generate rules that are both insightful and applicable for predicting heart disease, thereby enhancing the explainability and transparency of predictive models.

The main motivations for our research are as follows:

Early detection enhancement: Early detection of heart disease is crucial for reducing mortality rates. Current methods often rely on intricate models that lack interpretability. Our research aims to address this issue by employing rule mining algorithms that provide clear and actionable insights for clinicians.

Improving clinical decision support: Clinicians require tools that facilitate rapid, well-informed decisions. Our model’s primary goal is to generate precise and relevant rules to support clinical decision-making processes, leading to improved patient outcomes.

Enhancing data-driven approaches: With the rise of health data availability, there is an urgent need to effectively utilize this data. Our research focuses on uncovering patterns that are not immediately apparent through conventional analysis methods.

Promoting transparency and trust in ML models: One of the significant challenges in ML models for healthcare is the lack of transparency. By using rule-based models, our research seeks to promote transparency, making it easier for healthcare professionals to understand and trust the outcomes.

For this paper, the main contributions are as follows:

Innovative rule extraction: Our model, EPFHD-RARMING, specifically addresses the challenge of traditional association rule mining, which often produces an excessive number of low-support rules. It extracts a meaningful set of association rules from extensive data, focusing on those that are truly insightful and relevant, thus mitigating the common issue of rule quantity overwhelming quality.

Critical factor identification: Our model effectively uncovers pivotal factors and symptoms of heart disease. Using advanced analytics, it prioritizes the most significant variables associated with cardiovascular risks, thereby enhancing early detection and intervention strategies.

Predictive vulnerability analysis: This approach diverges from conventional models by identifying not only conditions directly linked to heart disease but also seemingly healthy states that may lead individuals to future health risks. This predictive analysis of vulnerabilities provides a deeper, more detailed understanding of potential health trajectories.

Comprehensive data exploration through unsupervised learning: Utilizing the unsupervised tool of Association Rule Mining (ARM), our methodology offers a more complete exploration of datasets to identify overlooked patterns and factors. This comprehensive analysis aids in understanding the complex interactions between variables and heart disease. Furthermore, the rule-based approach enhances interpretability and usability, particularly in clinical settings, making the findings accessible and actionable for medical professionals.

This paper is organized as follows: Section “Background” defines association rules and related concepts. Section “Related work” reviews prior studies on predicting heart disease. Section “Dataset: heart disease” describes the characteristics of the heart disease dataset used in our research. Section “The proposed model: EPFHD-RARMING” details the proposed solution. Section “Experimental results” presents the results of our experiments with the medical dataset. Finally, Section “Discussion” concludes the paper with a discussion of future research directions.

Background

To fully understand the proposed work, it is imperative to review the key concepts and definitions of association rule mining. Association rule mining 21 is an unsupervised learning technique that seeks to uncover hidden patterns in a dataset. It is based on “if-then” logic, also known as association rules. A rule has two components: an antecedent (if) and a consequent (then), both of which are sets of items. For example, using a heart disease dataset, an association rule could be “if 'asymptomatic', 'fasting blood sugar' = 1, 'man', then heart disease,” indicating that patients with chest pain type = 'asymptomatic', fasting blood sugar = 1, and sex = 'man' are more likely to have heart disease.

The process of association rule mining comprises two primary steps:

Identifying interesting patterns: A pattern is a set of items that appear together in a dataset and is considered interesting if it satisfies a threshold constraint.

Generating association rules: Association rules are generated from patterns obtained in the first step by splitting them into an antecedent and a consequent, and then evaluating their quality using metrics such as support, confidence, and lift.
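The two steps can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the paper's algorithm: the transactions, item names, the `max_size` cap, and the helper names `find_patterns` and `generate_rules` are all assumptions made for the example, and real miners avoid exhaustive enumeration.

```python
from itertools import combinations

# Toy transactions; item names are illustrative, not taken from the paper's dataset.
transactions = [
    {"asymptomatic", "fbs=1", "man", "heart_disease"},
    {"asymptomatic", "man", "heart_disease"},
    {"asymptomatic", "fbs=1", "heart_disease"},
    {"typical_angina", "man"},
    {"typical_angina", "fbs=1"},
]

def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in db) / len(db)

def find_patterns(db, min_sup, max_size=3):
    """Step 1: enumerate itemsets whose support meets the threshold."""
    items = sorted(set().union(*db))
    patterns = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = support(combo, db)
            if s >= min_sup:
                patterns[frozenset(combo)] = s
    return patterns

def generate_rules(patterns, db):
    """Step 2: split each pattern into antecedent -> consequent and score it."""
    rules = []
    for pattern, supp_xy in patterns.items():
        for k in range(1, len(pattern)):
            for antecedent in combinations(sorted(pattern), k):
                x = frozenset(antecedent)
                y = pattern - x
                confidence = supp_xy / support(x, db)
                rules.append((x, y, supp_xy, confidence))
    return rules

patterns = find_patterns(transactions, min_sup=0.4)
rules = generate_rules(patterns, transactions)
for x, y, s, c in rules[:5]:
    print(sorted(x), "->", sorted(y), f"supp={s:.2f} conf={c:.2f}")
```

Every pattern with more than one item produces several candidate rules, which is why the second step tends to dominate the output size.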

To clarify concepts and terms associated with association rule mining, we provide the following formal definitions 22 .

Let \(I = \{i_1, i_2, \ldots , i_n\}\) be a set of n unique items and \(DB = \{ T_1, T_2, \ldots , T_m\}\) be a set of m transactions called a dataset. Each transaction \(T_i \subseteq I\) consists of one or more items from I. An association rule is an implication of the form \(X \rightarrow Y\), where \(X \subseteq I\), \(Y \subseteq I\), and \(X \cap Y = \emptyset\). Here, X is called the antecedent of the rule and Y is called the consequent of the rule.

Two key metrics are widely used to assess the quality of an association rule: support and confidence.

Support (Supp): This metric measures the frequency or proportion of transactions that contain both X and Y. It is defined as:

\[\text{Supp}(X \rightarrow Y) = \frac{\sigma (X \cup Y)}{m} \tag{1}\]

where \(\sigma (X \cup Y)\) is the number of transactions that contain both X and Y, and m is the number of transactions in the dataset. Additionally, the support of X is defined as:

\[\text{Supp}(X) = \frac{\sigma (X)}{m} \tag{2}\]

where \(\sigma (X)\) is the number of transactions that contain X.

Confidence (Conf): This metric measures the conditional probability or strength of the rule, indicating how often Y appears in transactions that contain X. It is defined as:

\[\text{Conf}(X \rightarrow Y) = \frac{\text{Supp}(X \rightarrow Y)}{\text{Supp}(X)} \tag{3}\]

where \(\text {Supp}(X \rightarrow Y)\) is calculated as in Eq. (1), and \(\text {Supp}(X)\) is calculated as in Eq. (2).

In general, a high support value indicates that the rule is broadly applicable across the dataset, while a high confidence value suggests that the rule is reliable and has strong predictive power.
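As a small worked illustration of support and confidence, the sketch below evaluates the example rule on five made-up transactions. The data and the helper names `sigma`, `supp`, and `conf` are assumptions for illustration only; exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction

# Hypothetical transactions (one set of items per patient); values are illustrative only.
DB = [
    {"asymptomatic", "fbs=1", "man", "heart_disease"},
    {"asymptomatic", "fbs=1", "man", "heart_disease"},
    {"asymptomatic", "fbs=1", "man"},
    {"asymptomatic", "woman"},
    {"non_anginal", "man"},
]

def sigma(itemset, db):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in db if set(itemset) <= t)

def supp(itemset, db):
    """Support: sigma(itemset) divided by m, the number of transactions."""
    return Fraction(sigma(itemset, db), len(db))

def conf(x, y, db):
    """Confidence of X -> Y: support of X and Y together, divided by support of X."""
    return supp(set(x) | set(y), db) / supp(x, db)

X = {"asymptomatic", "fbs=1", "man"}
Y = {"heart_disease"}

print(supp(X | Y, DB))  # 2/5: two of five transactions contain X and Y
print(supp(X, DB))      # 3/5: three of five transactions contain X
print(conf(X, Y, DB))   # 2/3: Y holds in two of the three transactions with X
```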

Next, we review some key definitions related to association rule mining. These will help us better understand and interpret the results and implications of our proposed approach.

Definition 1

(Frequent and rare pattern) A pattern X whose support satisfies a user-specified support threshold, minSup, is called a frequent pattern, such that \(\text {Supp}(X) \ge minSup\). In contrast, a pattern X that does not satisfy minSup is called a rare pattern.

In our work, we utilize a formal definition of frequent patterns as collections of symptoms that frequently co-occur and are well-known and expected. These patterns may either signify the presence of heart disease or not, as both possibilities are reasonable given their frequent appearance in the dataset. On the other hand, rare patterns are characterized as sets of symptoms that generate unusual and infrequent rules. The most interesting patterns are those that are rare and may potentially contribute to the development of heart disease.

Definition 2

(Strong association rule) An association rule \(X \rightarrow Y\) is called strong if its \(\text {Supp}\) and \(\text {Conf}\) measures satisfy a specified minimum support threshold (minSup) and minimum confidence threshold (minConf), respectively.
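Definitions 1 and 2 reduce to threshold checks, as the following sketch suggests. The threshold values, the pattern names, and the support and confidence numbers are hypothetical and chosen only to exercise both branches.

```python
def classify_pattern(supp, min_sup):
    """Definition 1: frequent if Supp(X) >= minSup, otherwise rare."""
    return "frequent" if supp >= min_sup else "rare"

def is_strong(supp, conf, min_sup, min_conf):
    """Definition 2: a rule is strong if it meets both minSup and minConf."""
    return supp >= min_sup and conf >= min_conf

# Hypothetical thresholds; the source does not fix these values in this section.
MIN_SUP, MIN_CONF = 0.10, 0.80

# Hypothetical pattern supports.
pattern_supports = {
    ("asymptomatic", "man"): 0.35,
    ("asymptomatic", "fbs=1", "woman"): 0.02,
}
labels = {p: classify_pattern(s, MIN_SUP) for p, s in pattern_supports.items()}
print(labels)

# A rule with high confidence but low support is not strong under Definition 2.
print(is_strong(0.35, 0.90, MIN_SUP, MIN_CONF))  # True
print(is_strong(0.02, 0.95, MIN_SUP, MIN_CONF))  # False
```

The second call shows why rare rules need separate treatment: a highly reliable rule is discarded by the strong-rule filter when its support falls below minSup.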

Definition 3

(Unexpected association rule 23) An association rule (rare) \(X \rightarrow Y\) is unexpected with respect to a known (frequent) rule \(A \rightarrow B\) if the following conditions are met:

The antecedents of the rules (i.e., A and X) are statistically significant on DB and have high similarity (above a given similarity threshold).

The consequents of the rules (i.e., B and Y) are opposite to each other.
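One way to read Definition 3 in code is sketched below. Several pieces are assumptions rather than the paper's method: Jaccard similarity stands in for the unspecified similarity measure, the 0.5 threshold is arbitrary, the `OPPOSITE` table hard-codes which consequents contradict each other, and the statistical significance test on the antecedents is omitted entirely. The two example rules are invented and carry no clinical meaning.

```python
def jaccard(a, b):
    """Jaccard similarity of two item sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Assumed encoding of opposite consequents.
OPPOSITE = {
    "heart_disease": "no_heart_disease",
    "no_heart_disease": "heart_disease",
}

def is_unexpected(rare_rule, known_rule, min_similarity=0.5):
    """Check the two conditions of Definition 3, minus the significance test.

    Each rule is a pair (antecedent item set, consequent label).
    """
    (x, y), (a, b) = rare_rule, known_rule
    similar_antecedents = jaccard(x, a) >= min_similarity
    opposite_consequents = OPPOSITE.get(y) == b
    return similar_antecedents and opposite_consequents

# Invented rules for illustration only.
known = ({"feature_a", "feature_b"}, "no_heart_disease")          # frequent belief
rare = ({"feature_a", "feature_b", "feature_c"}, "heart_disease")  # rare deviation

print(jaccard(rare[0], known[0]))  # 2 shared items out of 3 distinct items
print(is_unexpected(rare, known))  # True
```

Here the rare rule extends the known antecedent with one extra item and flips the consequent, which matches the idea of a rule that deviates from an established belief once an additional feature is considered.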

The traditional support-confidence model of frequent pattern generation has become widely popular because of its simplicity: raw frequency counts give the support of a pattern, and conditional probabilities give the confidence of a rule. However, the most frequent patterns are not always the most interesting ones 24. Our approach addresses this limitation by using multiple commonly used metrics (lift, leverage, and conviction) to assess the generated rules. We combine these statistical measures with the previous definitions to find rules that are considered interesting. The metrics are defined as follows:

Lift 25: It is a measure of how much more likely the antecedent and consequent of a rule are to occur together than expected if they were independent. It is defined as the ratio of the observed support of the rule to the expected support if the antecedent and consequent were independent. The formula for lift is:

\[\text{Lift}(X \rightarrow Y) = \frac{\text{Supp}(X \rightarrow Y)}{\text{Supp}(X) \times \text{Supp}(Y)} \tag{4}\]

Leverage (lev) 25: It is a measure of the difference between the observed support of the rule and the expected support if the antecedent and consequent were independent. It is defined as the difference between the observed support of the rule and the product of the supports of the antecedent and consequent. The formula for leverage is:

\[\text{lev}(X \rightarrow Y) = \text{Supp}(X \rightarrow Y) - \text{Supp}(X) \times \text{Supp}(Y) \tag{5}\]

Conviction (conv) 25: It is a measure of how much the antecedent of a rule implies the consequent, or how often the rule would be incorrect if the antecedent and consequent were independent. It is defined as the ratio of the expected frequency of the antecedent occurring with an alternative consequent to the observed frequency of incorrect predictions. The formula for conviction is:

\[\text{conv}(X \rightarrow Y) = \frac{1 - \text{Supp}(Y)}{1 - \text{Conf}(X \rightarrow Y)} \tag{6}\]
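The three metrics can be computed directly from the supports of the rule, its antecedent, and its consequent. The sketch below uses the standard textbook forms (observed support over the product of supports for lift, their difference for leverage, and (1 − Supp(Y)) / (1 − Conf) for conviction); the input values are hypothetical, and returning infinity for a rule with confidence 1 is a convention assumed here.

```python
import math
from fractions import Fraction

def lift(supp_xy, supp_x, supp_y):
    """Observed support divided by the support expected under independence."""
    return supp_xy / (supp_x * supp_y)

def leverage(supp_xy, supp_x, supp_y):
    """Observed support minus the support expected under independence."""
    return supp_xy - supp_x * supp_y

def conviction(supp_xy, supp_x, supp_y):
    """(1 - Supp(Y)) / (1 - Conf(X -> Y)); infinite when the rule never fails."""
    confidence = supp_xy / supp_x
    if confidence == 1:
        return math.inf
    return (1 - supp_y) / (1 - confidence)

# Hypothetical supports: Supp(X -> Y) = 2/5, Supp(X) = 3/5, Supp(Y) = 1/2.
s_xy, s_x, s_y = Fraction(2, 5), Fraction(3, 5), Fraction(1, 2)

print(lift(s_xy, s_x, s_y))        # 4/3: X and Y co-occur more often than chance
print(leverage(s_xy, s_x, s_y))    # 1/10
print(conviction(s_xy, s_x, s_y))  # 3/2
```

A lift of 1, a leverage of 0, and a conviction of 1 all correspond to independence between antecedent and consequent, so values above those baselines indicate a positive association.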

Previous research has generally assumed that generating association rules from patterns is a straightforward process. However, this assumption is not necessarily accurate, as the primary objective of discovering patterns is to create meaningful rules. The sheer number of rules generated from patterns can be quite large, making analysis costly and impractical. This problem becomes even more complex when attempting to identify interesting rules among the rare ones. Therefore, identifying interesting rules from such a large set of rules is a significant challenge. For instance, our heart disease dataset consists of only 1190 transactions, yet it yields 448,981 rare and frequent rules, as illustrated in Fig. 1. Our proposed solution addresses this challenge by producing only the interesting and unexpected rules that can aid clinicians in determining the likelihood of heart disease based on the given symptoms.

Figure 1. A comparison of the number of frequent and rare rules generated from the heart disease dataset.

Heart disease is one of the primary causes of death worldwide, underscoring the importance of early detection to save lives. However, accurately diagnosing and predicting the factors contributing to heart disease remains a significant challenge. This study aims to identify rules that comply with Definitions 1 – 3 . Discovering such rules enables the extraction of surprising association rules from medical data. These surprising association rules are particularly valuable because they contradict established knowledge or expectations, thereby uncovering novel and insightful information. However, not all surprising rules are equally interesting or relevant. Therefore, a key question arises: how can we identify and evaluate the most compelling and significant rules for heart disease detection?

The objective of this paper is to propose a framework that addresses the question of generating and evaluating surprising association rules using a variety of metrics. The framework introduces various metrics, such as support, confidence, lift, leverage, and conviction, which are essential in assessing the importance of the identified rules. By considering these metrics, we aim to emphasize the significance of the generated rules and minimize the possibility of errors in detecting heart disease.

In summary, this paper presents a novel framework for generating and evaluating association rules from medical data, specifically targeting heart disease detection. By integrating multiple statistical measures, our approach aims to uncover meaningful patterns that can significantly enhance the understanding, early detection, and treatment of heart disease. The ultimate goal is to provide a methodology that aids clinicians in making informed decisions, thereby improving patient outcomes and contributing to the broader field of cardiovascular research.

Related work

In spite of the significant challenges presented by heart disease, which remains the leading cause of death worldwide, machine learning techniques have greatly assisted in the analysis of clinical data. These techniques make use of the vast amount of healthcare data that is readily available and have become powerful decision-making and forecasting tools.

Various studies have explored the potential of machine learning in predicting heart disease. In a study 26 , the Random Forest algorithm emerged as the most accurate method for predicting heart disease. Another study 27 proposed an approach that combines various features and classification techniques to enhance prediction accuracy. A further study 28 reviewed machine learning techniques for heart disease prediction, revealing a variety of data mining strategies with varying degrees of effectiveness and accuracy. Similarly, a study 29 performed a comprehensive review of different machine learning techniques, including Artificial Neural Networks, Decision Trees, Fuzzy Logic, K-Nearest Neighbours, Naïve Bayes, and Support Vector Machines, in the context of heart disease prediction.

Furthermore, extensive research has been conducted to predict and evaluate the risk factors associated with heart disease. In a study 30 , various machine learning algorithms, such as logistic regression and KNN, were used to predict and classify patients with heart disease. Another study 31 utilized the optimized LightGBM classifier with improved hyperparameters and a focal loss function optimized through OPTUNA. This model, evaluated on CVD data from the Framingham Heart Institute, achieved an AUC value of 97.8%, outperforming other comparative models in terms of accuracy.

A novel Recommendation System for CVD Prediction Using an IoT Network (DEEP-CARDIO) was proposed in another study 32 , offering prior diagnosis, treatment, and dietary recommendations for cardiac diseases. This system collects data from four biosensors (ECG, pressure, pulse, and glucose) and processes it using an Arduino controller. The BiGRU attention model diagnoses and classifies CVD into five categories, achieving an overall accuracy of 99.90%. Furthermore, the QMBC technique, which employs the Quine McCluskey method to derive the Minimum Boolean expression for the target feature, was introduced 33 . By combining predictions from seven classifiers, the ensemble model forms a comprehensive dataset to apply the minimum Boolean equation with an 80:20 train-to-test ratio. The proposed QMBC model demonstrates superior performance compared to current state-of-the-art models and previously suggested methods, indicating its potential for improved cardiovascular disease prediction.

Although many machine learning techniques have been proposed for the early detection and diagnosis of heart disease, clinicians often struggle to trust these models due to their lack of interpretability. This difficulty in understanding the basis for the predictions compromises the reliability and acceptance of these models. To address this issue, it is essential to focus on developing transparent and interpretable models that enable clinicians and patients to comprehend the underlying mechanisms and have confidence in the model’s predictions. Several studies have investigated the utilization of rule-based methods, particularly association rule mining, in the domain of heart disease detection.

A novel methodology and algorithm for mining distributed medical data sources using association rules, specifically focusing on predicting heart disease, was presented in a study 34 . Another study 35 utilized association rule mining to uncover concealed patterns related to frequently occurring heart diseases within the Bangladeshi population. Associative classification mining was employed in another research 36 to construct a classifier with prediction rules of high interestingness values for accurate heart disease prediction. An enhanced association rule mining approach for detecting coronary artery disease using a heart disease dataset was introduced in a study 37 .

While current methods focus on improving prediction accuracy and identifying factors that contribute to cardiac disease, several limitations persist. One significant challenge is managing unlabeled data, which is crucial for developing robust and comprehensive models. Additionally, these approaches often fail to explore the relationships between various symptoms and heart disease-causing factors, potentially overlooking critical indicators.

Many recent studies 38 have generated rules based on frequent patterns, resulting in predictable and well-known outcomes. Despite their utility, these studies often produce an overwhelming number of rules, making analysis and interpretation costly. To address this limitation, we introduce a novel modeling approach designed to generate a limited number of insightful and interesting rules, enhancing both the efficiency and effectiveness of rule analysis.

Association rule mining, particularly for rare patterns, is essential yet challenging for making decisions about heart disease. In this work, we propose a novel method that not only identifies factors leading to heart disease but also uncovers patterns that may indicate future disease development. Our approach uses frequent patterns as a foundation for discovering interesting patterns associated with heart disease. We developed a model to identify these patterns and their potential to lead to heart disease when combined with specific risk factors.

Dataset: heart disease

The purpose of this section is to provide a brief overview of the heart disease dataset utilized in our research. Our study aims to identify predictive factors that lead to heart disease and to analyze patterns among healthy patients who may develop heart disease in the future. The dataset used in this research was obtained from the well-established IEEE DataPort 39 .

In this study, several popular heart disease datasets are combined to create a comprehensive dataset that was previously unavailable. This new dataset comprises 1190 instances and 12 common features, making it the largest heart disease dataset currently available for research. The dataset was curated from five different sources: Cleveland, Hungarian, Swiss, Long Beach VA, and Statlog (Heart). By integrating these datasets into a single resource, we aim to facilitate the advancement of machine learning and data mining algorithms related to heart disease. This comprehensive dataset will enable researchers to develop more accurate and effective methods for detecting and preventing heart disease.

Tables 1 and 2 present the characteristics of the dataset. Table 1 summarizes several key characteristics, while Table 2 provides a description of the nominal attributes.

To ensure data quality and consistency, a preprocessing pipeline was developed, which included steps such as handling missing values and standardizing the representation of data. Our dataset did not contain any missing or null values; the only anomaly was a single instance in which ST slope was 0. As this instance did not contribute to pattern or rule generation, we removed it, resulting in a total of 1189 transactions. Because our proposed work involves unsupervised techniques rather than classification tasks, preprocessing and feature selection must be tailored to our objectives. These processes are described in detail in the following sections.

The proposed model: EPFHD-RARMING

Our proposed model, EPFHD-RARMING, builds upon previous work 40 and aims to generate rules that facilitate the early detection of heart disease and predict the factors contributing to its development. This is achieved through a three-phase process, specifically designed as a case study for heart disease. Our approach allows for the identification of rare but significant associations that traditional methods often overlook, providing deeper insights into the factors leading to conditions like heart disease. The workflow of this model is illustrated in Fig. 2 . Utilizing our method for enhancing rule-based machine learning in medical datasets, we developed and implemented the Mine Interesting Rules Algorithm 1. This algorithm systematically mines interesting rules from a dataset through three main phases, ensuring comprehensive analysis and interpretation.

Figure 2. EPFHD-RARMING model for detecting heart disease risk factors.

Algorithm 1. Mine interesting rules.

Explanation of the algorithm

In this subsection, we explain the proposed algorithm and demonstrate its functionality. Algorithm 1 outlines a process for mining interesting rules from a dataset through three main phases.

Phase 1: Data preparation and cleaning

In the first phase, the dataset ( ds ) is prepared to handle missing values, outliers, and noise, followed by feature selection and data transformation to make it suitable for ARM. In lines 1–2, the algorithm starts by defining the inputs ( ds , minSup , minRare , simT ) and the expected output, which is a list of interesting rules. Here, ds stands for dataset, minSup for minimum support threshold, minRare for minimum rare support, and simT for similarity threshold.

In line 3, the dataset is cleaned to handle missing values, outliers, and noise, ensuring data quality for further analysis. In line 4, after data cleaning, the dataset undergoes feature selection and data transformation to prepare it for ARM, making the data suitable for pattern discovery.

Phase 2: Pattern discovery and rule extraction

From lines 5 to 12, the second phase involves pattern discovery and rule extraction. In line 5, frequent patterns are identified using a minimum support threshold ( minSup ). From lines 6 to 8, frequent rules are generated and filtered based on specified metrics and categorized into two kinds of rules based on their consequent values (“Yes” for heart disease and “No” for healthy). Similarly, rare patterns are found using both minimum support ( minSup ) and minimum rareness ( minRare ) thresholds, and corresponding rare rules are generated and filtered from lines 9 to 12.

Phase 3: Insightful rule identification and interpretation

In the final phase, from lines 13 to 22, interesting rules are identified by comparing rare rules with frequent ones. The similarity between the antecedents of rare rules (with a “Yes” consequent) and frequent rules (with a “No” consequent) is calculated. If the similarity exceeds a specified threshold ( simT ) and the consequents differ, the pair of rules is considered interesting and added to the list of interesting rules. The algorithm ultimately returns this list of interesting rules.

By following these detailed steps, the algorithm effectively cleans and transforms the data, discovers frequent and rare patterns, generates and filters rules, and identifies interesting rules for further analysis.

The following subsections explain each phase of the model in detail: the data preparation and transformation phase, the pattern discovery and rule extraction phase, and the insightful rule identification and interpretation phase. The aim is to give a comprehensive understanding of how the model operates in support of the early detection of heart disease and the prediction of the factors that contribute to its development.

Data preparation and transformation phase

This section concentrates on the preprocessing phase, which entails transforming the dataset from a supervised classification task to an unsupervised association rule mining task. The preprocessing phase is crucial in preparing the heart disease dataset for the mining process. We examine in detail the two key steps of the preprocessing phase, as outlined in the initial phase of our workflow and described in lines 3–4 of the proposed Algorithm 1.

Selection of features: In this stage, we employ various techniques to identify the most pertinent factors contributing to heart disease. This involves selecting a subset of the most informative features for the subsequent mining process. Appropriate feature selection can enhance the quality and efficiency of the mining process. Consequently, the feature selection process enables us to identify the most crucial attributes related to heart disease. It is a critical step in the knowledge discovery process that can help refine the understanding of heart disease and its associated factors.

Dataset transformation: The transformation of the dataset for heart disease is crucial to ensure that it is suitable for mining association rules. The ideal format for mining association rules is a boolean transactional representation. This representation involves each instance being represented as a set of items, where each item represents a selected feature. The value of each item is either present (1) or absent (0). By undertaking this transformation, the dataset is prepared for further mining to generate association rules.

The following information provides a detailed description of the steps taken for the heart disease dataset that was utilized in this research:

Selection of features

The dataset for cardiovascular diseases contains 12 features, as illustrated in Table 1 . To determine the essential features that contribute to cardiovascular disease, we implemented a feature selection process utilizing five distinct approaches. It is crucial to incorporate all relevant features that influence heart disease to gain a comprehensive understanding of this condition. Through the following selection methods, we derived a final set of 10 features from the 12 features in the heart disease dataset. To choose the most crucial features for the mining process, we employed the following Scikit-learn feature selection techniques:

Feature selection using the chi-squared statistic.

Feature selection using ANOVA F-value.

Features selected through mutual information.

Features selected via Recursive Feature Elimination with logistic regression.

Features selected based on random forest feature importance.

Figure 3 illustrates the significance of the selected features using all the approaches employed in this paper. According to the graph, the features ’ST slope’ and ’oldpeak’ appear to have the greatest influence on the outcome variable, as they are included by all approaches. ’Max heart rate’ , ’exercise angina’ , and ’chest pain type’ occupy a secondary position in terms of importance, being favored by four out of five applied methods. The selection of ’cholesterol’ is made by three of the five methods, while the selection of ’age’ and ’sex’ is made by two methods, and the selection of ’fasting blood sugar’ is made by a single method.

Figure 3. Key features contributing to the prediction of heart disease as identified by multiple feature selection methods.

To comprehensively address the majority of the important factors, all these features are incorporated into our approach. Therefore, the following features were chosen for this study: {’ST slope’, ’age’, ’chest pain type’, ’cholesterol’, ’exercise angina’, ’fasting blood sugar’, ’max heart rate’, ’oldpeak’, and ’sex’} . These features represent the union of the top features from each feature selection method. Additionally, the class feature ’target’ is included. As a result, 10 of the 12 features are utilized in our paper.
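The union step described above can be sketched as follows. This is only an illustration of taking the union of per-method selections and counting how many methods chose each feature; the method and feature names (`method_a`, `f1`, and so on) are placeholders, since the per-method selections are not listed in the text.

```python
from collections import Counter


def combine_selections(selections):
    """Combine per-method feature selections.

    selections: dict mapping a method name to the features it selected.
    Returns (union, votes): the sorted union of all selected features and
    a Counter of how many methods selected each feature.
    """
    votes = Counter(
        feature for chosen in selections.values() for feature in set(chosen)
    )
    union = sorted(votes)
    return union, votes


# Placeholder selections, not the selections obtained in the study.
selections = {
    "method_a": ["f1", "f2", "f3"],
    "method_b": ["f1", "f3", "f4"],
    "method_c": ["f1", "f5"],
}
union, votes = combine_selections(selections)
print(union)
print(votes.most_common())
```

Keeping the vote counts alongside the union preserves the ranking information summarized in Fig. 3 while still retaining every feature chosen by at least one method.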

Undertaking extensive feature selection and concentrating our examination on the chosen features, we endeavor to gain a thorough understanding of the aspects that significantly influence the prevalence of heart disease.

Dataset transformation

To ensure the mining process is effective, it is essential to present all features in the dataset related to heart disease in a binary format. Four continuous attributes, namely ’age’, ’cholesterol’, ’max heart rate’, and ’oldpeak’, must be discretized to convert continuous data into discrete categories or bins. The process for discretizing these four features is as follows:

The ’age’ feature is divided into three bins: ’young’, ’middle-aged’, and ’elderly’. The bin edges are specified as [0, 30, 60, np.inf], where ’np.inf’ represents infinity.

The ’cholesterol’ feature is discretized into three bins: ’chollow’, ’cholnormal’, and ’cholhigh’. The bin edges are defined as [-1, 200, 240, np.inf].

The ’max heart rate’ feature is discretized into three bins: ’heartratelow’, ’heartratenormal’, and ’heartratehigh’. The bin edges are specified as [0, 100, 160, np.inf].

The ’oldpeak’ feature is discretized into three bins: ’oldpeaklow’, ’oldpeakmoderate’, and ’oldpeakhigh’. The bin edges are specified as [-np.inf, 1.0, 2.0, np.inf].
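The binning above can be sketched with the Python standard library. The sketch assumes right-closed intervals, i.e. (low, high], which is the default behaviour of `pandas.cut`; the text does not state which interval convention or library call was used, so this is an assumption. The helper name `discretize` is likewise illustrative.

```python
import math
from bisect import bisect_left

# Bin edges and labels as listed in the text.
BINS = {
    "age": ([0, 30, 60, math.inf],
            ["young", "middle-aged", "elderly"]),
    "cholesterol": ([-1, 200, 240, math.inf],
                    ["chollow", "cholnormal", "cholhigh"]),
    "max heart rate": ([0, 100, 160, math.inf],
                       ["heartratelow", "heartratenormal", "heartratehigh"]),
    "oldpeak": ([-math.inf, 1.0, 2.0, math.inf],
                ["oldpeaklow", "oldpeakmoderate", "oldpeakhigh"]),
}


def discretize(feature, value):
    """Map a continuous value to its bin label, or None if it is out of range.

    Intervals are treated as right-closed: (edges[i], edges[i + 1]].
    """
    edges, labels = BINS[feature]
    idx = bisect_left(edges, value) - 1
    if idx < 0 or idx >= len(labels):
        return None
    return labels[idx]


print(discretize("age", 45))
print(discretize("cholesterol", 0))
print(discretize("oldpeak", 2.5))
```

Under this convention, a lower edge of -1 for cholesterol keeps recorded values of 0 inside the lowest bin, which may be why that edge was chosen.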

Transforming continuous variables into discrete categories streamlines data representation and expedites subsequent data analysis during the mining process. To prepare the dataset for analysis, it is crucial to convert the data into a binary format, such as [0, 1] or True/False. The TransactionEncoder() method is utilized for one-hot encoding, resulting in a final dataset consisting of 1189 rows and 28 dimensions. A representation of the first 5 rows of the preprocessed dataset is depicted in Fig.  4 .

Figure 4. Dataset after preprocessing phase.
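The boolean transactional representation can be illustrated without external dependencies. The sketch below is a simplified stand-in for a transaction encoder such as the TransactionEncoder() mentioned above, not the implementation used in the study; the function name `encode_transactions` and the two toy transactions are assumptions.

```python
def encode_transactions(transactions):
    """One-hot encode a list of transactions.

    Each transaction is an iterable of item labels. Returns (columns, rows),
    where columns is the sorted list of distinct items and each row holds
    True/False flags marking which items are present in that transaction.
    """
    item_sets = [set(t) for t in transactions]
    columns = sorted(set().union(*item_sets)) if item_sets else []
    rows = [[item in items for item in columns] for items in item_sets]
    return columns, rows


# Toy transactions built from category labels (illustrative only).
transactions = [
    ["M", "young", "No"],
    ["F", "elderly", "Yes"],
]
columns, rows = encode_transactions(transactions)
print(columns)
for row in rows:
    print(row)
```

Each distinct category becomes one boolean column, which is how a small number of categorical features expands into a wider binary table.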

Pattern discovery and rule extraction phase

The objective of this subsection is to detail the process of pattern and rule generation, a critical component of the proposed model, as depicted in lines 5–12 of the Algorithm 1. This phase is crucial for identifying and extracting meaningful relationships within the data, permitting the discovery of both frequent and rare patterns that contribute to the overall analysis.

Pattern generation

During the process of pattern discovery, we employ a formal technique to identify relevant patterns, which subsequently generate association rules. This process involves the exploration of both frequent and rare patterns present within the dataset. To accomplish this, we utilize specialized algorithms designed for each type of pattern. The FP-growth algorithm is employed to uncover frequent patterns 41 . This algorithm is well-known for its capacity to produce patterns that satisfy a specified support threshold. By using this algorithm, we are able to pinpoint patterns that frequently occur and hold significance within the dataset. These patterns represent commonly recognized phenomena and expected information.

In addition to frequent patterns, we also uncover rare patterns, which provide distinctive insights. To achieve this, we utilize the Rare Pre-Post (RPP) algorithm 42 . This algorithm enables us to identify rare patterns within the dataset that occur infrequently but are intriguing and valuable.

By merging FP-growth for frequent pattern mining with RPP for rare pattern mining, we can generate a comprehensive set of patterns. This approach allows us to produce a complete set of association rules, encompassing both rare and frequent rules. As a result, we acquire meaningful insights and extract valuable knowledge from the data.
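The split between frequent and rare patterns can be illustrated with a deliberately naive enumeration. This sketch is not FP-growth or RPP: it is a brute-force stand-in that only shows how the two support thresholds partition the itemsets. The function name `mine_patterns`, the `max_len` cap, and the toy data and thresholds are assumptions.

```python
from itertools import combinations


def mine_patterns(transactions, min_sup, min_rare, max_len=3):
    """Enumerate itemsets up to max_len and split them by relative support.

    frequent: support >= min_sup
    rare:     min_rare <= support < min_sup
    Itemsets with support below min_rare are discarded.
    """
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    items = sorted(set().union(*tsets))
    frequent, rare = {}, {}
    for k in range(1, max_len + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            support = sum(itemset <= t for t in tsets) / n
            if support >= min_sup:
                frequent[itemset] = support
            elif support >= min_rare:
                rare[itemset] = support
    return frequent, rare


# Toy data with toy thresholds (far larger than those used on real data).
toy = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a"}]
frequent, rare = mine_patterns(toy, min_sup=0.5, min_rare=0.25)
print(sorted(map(sorted, frequent)))
print(sorted(map(sorted, rare)))
```

Exhaustive enumeration grows exponentially with the number of items, which is why dedicated algorithms are used on the real dataset.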

Rule generation

After analyzing frequent and rare patterns, we can derive association rules. These rules offer valuable insights into the relationships and dependencies between various items and attributes in a dataset. By examining these rules, we can gain a thorough understanding of the underlying patterns and associations in the data.

Our model generates two types of rules: frequent rules and rare rules. Frequent rules represent significant associations within the dataset, determined by their high support and satisfaction of multiple statistical metrics. In this study, we concentrate on rules where the consequent signifies healthy patients without heart disease. These frequent rules expose health attributes or factors connected to good health and the absence of heart disease.

On the other hand, rare patterns lead to rules with low support but significant statistical relevance based on the metrics we applied. The inspection of these rare rules provides insights into unique factors associated with heart disease. In this study, rare rules are used to represent non-healthy patients, particularly those with heart disease.

Analyzing both frequent and rare rules provides a comprehensive understanding of the associations and dependencies present in the data, enabling the determination of which attributes or factors contribute to good health and which indicate heart disease. This dual analysis allows for the extraction of valuable knowledge from the data and the making of informed decisions based on identified patterns.

Insightful rule identification and interpretation

The third and final phase of our proposed model, detailed in lines 13–22 of Algorithm 1, focuses on generating and interpreting interesting rules. This critical phase aims to identify the features that contribute to heart disease and uncover meaningful rules. By establishing a set of beliefs (frequent rules that represent healthy patients without heart disease), we aim to identify individuals with heart disease whose characteristics differ subtly from those of healthy individuals. This approach allows us to highlight the distinct features associated with heart disease, providing valuable insights for early detection and targeted interventions.

  • Interesting rules

By identifying rare rules with low support that satisfy all statistical metrics based on the background information provided, we can further refine the set of interesting association rules. In this study, we aim to identify those rare rules that deviate from the common (frequent) rules representing healthy individuals. To determine the interestingness and unexpected nature of these rules, we consider the following factors:

Similarity of antecedents : We assess the similarity between the antecedents of rare rules and frequent rules using a similarity measure, such as the Jaccard similarity approach.

Contrasting consequents : We evaluate whether the consequents of the rare rules contrast with those of the frequent rules. For a rule to be considered interesting, the consequents should oppose each other.

Low support : The rare rule (Rrule) must meet all predefined metrics, particularly having low support. This indicates its deviation from the normal (frequent) rule (Frule).

By incorporating these criteria, we can filter and prioritize rare rules that exhibit low support, deviate from normal patterns, have antecedents similar to those of frequent rules, and have contrasting consequents. These refined rules offer valuable insights into the underlying patterns and deviations within the dataset, enhancing our understanding of exceptional cases and unexpected associations.
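The pairing criteria above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: rules are represented as (antecedent, consequent) pairs that have already passed the support and metric filters, the similarity measure is Jaccard, and the function names, item labels, and threshold value are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A n B| / |A u B| between two item collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_interesting(rare_rules, frequent_rules, sim_t):
    """Pair each rare rule with frequent rules that have a similar antecedent
    but a different consequent.

    Each rule is an (antecedent, consequent) tuple. Returns a list of
    (rare_rule, frequent_rule, similarity) triples.
    """
    pairs = []
    for rare in rare_rules:
        for freq in frequent_rules:
            similarity = jaccard(rare[0], freq[0])
            if rare[1] != freq[1] and similarity > sim_t:
                pairs.append((rare, freq, similarity))
    return pairs


# Hypothetical rules with placeholder items.
frequent_rules = [({"a", "b", "c"}, "No")]
rare_rules = [
    ({"a", "b", "d"}, "Yes"),  # similar antecedent, opposite consequent
    ({"x", "y"}, "Yes"),       # dissimilar antecedent
    ({"a", "b", "c"}, "No"),   # same consequent, so not contrasting
]
for rare, freq, sim in find_interesting(rare_rules, frequent_rules, sim_t=0.4):
    print(sorted(rare[0]), "=>", rare[1], "vs", sorted(freq[0]), "=>", freq[1], sim)
```

The items on which a paired rare rule and frequent rule differ (here `d` versus `c`) are the candidates to examine as factors that separate the two outcomes.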

Explainability

The utilization of association rule mining, a data mining approach based on rules, is of paramount importance due to its interpretability and ease of comprehension. In our research, we have presented a summary of the noteworthy rules generated by our model, emphasizing the elements that contribute to heart disease. To increase the clarity and understanding of these association rules, our model is supported by extensive documentation, including tables, examples, and illustrations.

Through the identification of the factors in the rare rules that differ from our established beliefs, represented by the frequent rules, we are able to gain deeper insights into the specific factors contributing to heart disease. This knowledge is invaluable for deciphering the underlying causes of heart disease and can play a significant role in its prevention and treatment.

Experimental results

In this section, we present the outcomes of our proposed model, EPFHD-RARMING, which aims to generate concise and valuable rules for heart disease prediction. We provide a comprehensive explanation of the results, demonstrating the effectiveness and efficiency of our model in producing these desired rules without an excessive number. Through extensive analysis, we emphasize the significance of the generated rules and their potential implications in the field of heart disease prediction. The experimental results section details the findings and insights derived from our in-depth experiments and analyses, with a focus on the important rules, factors, and relationships between heart disease risk factors observed by our model. In the subsequent subsections, we elaborate on the results obtained through the use of the proposed approach, EPFHD-RARMING.

Experimental setup

The experiments in this paper were conducted on Google Colab. The following constraints apply when mining frequent and rare patterns.

For frequent patterns, a minimum support threshold, minSup , of 0.01 was established. A pattern is therefore classified as frequent only when it occurs in at least 1% of the instances in the dataset.

Rare patterns are those with support below minSup but at or above the minimum rare support, minRare , which is set to \(minRare = 0.001\) . In other words, a rare pattern has support less than minSup and equal to or greater than minRare .

With regard to rule generation, it is necessary to adhere to several conditions to determine the criteria for compelling rules within the proposed model. These conditions are applicable to both frequent and rare rules. Thus, only rules that meet these stringent criteria are recognized in our proposed model.

Minimum support of rules : Frequent rules must have a support of at least minSup . In the same way, rare rules must have support less than minSup but at least minRare .

Metric requirements : To be considered a strong rule, whether frequent or rare, a rule must satisfy the following metric thresholds.

Confidence : Confidence score must exceed 0.80.

Lift : Lift must be greater than 1.

Leverage : Leverage should be greater than 0.

Conviction : Conviction should be greater than 1.
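The thresholds listed above can be collected into two small predicates. This is a sketch of the stated constraints only; the function names `support_class` and `is_strong` are illustrative and are not taken from the study.

```python
MIN_SUP = 0.01    # minimum support for frequent patterns and rules
MIN_RARE = 0.001  # minimum support for rare patterns and rules


def support_class(support, min_sup=MIN_SUP, min_rare=MIN_RARE):
    """Classify a relative support value as frequent, rare, or discarded."""
    if support >= min_sup:
        return "frequent"
    if support >= min_rare:
        return "rare"
    return "discarded"


def is_strong(confidence, lift, leverage, conviction):
    """Check the metric requirements shared by frequent and rare rules."""
    return (
        confidence > 0.80
        and lift > 1
        and leverage > 0
        and conviction > 1
    )


print(support_class(0.24), support_class(0.009), support_class(0.0005))
print(is_strong(confidence=0.86, lift=1.81, leverage=0.11, conviction=3.73))
```

An infinite conviction, which occurs when confidence equals 1, passes the `conviction > 1` check without special handling.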

In addition, for brevity, we replace the names of columns with abbreviations, as shown in Table 3 . By mapping full column names to their abbreviated counterparts (e.g., ’asymptomatic’ to ’asym’ and ’heart_disease’ to ’yes’), the data representation becomes both clearer and more compact.

Patterns generation

To identify frequent patterns, we employ the FP-growth algorithm 41 . Additionally, we use the RPP algorithm 42 to detect rare patterns that may result in unexpected outcomes. Figure  5 illustrates the results of our case study on heart disease. By applying very low support levels, the graph reveals a substantial number of rare patterns. In this phase of pattern generation, we identified a total of 81,632 patterns, of which 22,178 are frequent and 59,454 are rare.

Figure 5. A comparison of the number of frequent and rare patterns generated from the heart disease dataset.

After generating both frequent and rare patterns, we proceed to derive association rules. Our approach emphasizes identifying the most valuable and significant rules while filtering out the majority of less-relevant rules derived from rare patterns. The subsequent section will detail the process of extracting these intriguing and insightful rules from the comprehensive set of patterns.

Following the generation of frequent and rare patterns as depicted in Fig. 5 , the next step involves formulating an exhaustive set of association rules. As shown in Fig. 1 , a significant number of rules, 55,307 in total, are generated from the frequent patterns, where the antecedent support meets or exceeds the specified minimum support threshold ( \(minSup = 0.01\) ). Additionally, a much larger number of rules, totaling 389,531, are derived from the rare patterns, whose support falls below the minSup threshold. This substantial quantity of rules underscores a critical limitation within the domain of association rule mining and highlights the need for a methodology that facilitates the efficient identification of insightful rules.

The primary goal and challenge of this study are to develop a methodology that can identify rules revealing factors contributing to heart disease detection. Specifically, we focus on rules that are most relevant to this objective.

To mitigate the overwhelming growth of rules and address the aforementioned challenge, we undertook an extensive exploration of rule generation. To enhance understanding, the rules have been organized based on their outcomes, specifically distinguishing between those that indicate the presence or absence of heart disease. Consequently, our focus is directed solely towards rules that relate to the presence or absence of heart disease, aligning with our primary objective. As shown in Fig.  6 , these rules can be classified into four distinct categories:

Frequent rules leading to the occurrence of heart disease.

Frequent rules that emphasize health in the absence of heart disease.

Rare rules indicating the presence of heart disease.

Rare rules suggesting the absence of heart disease.

figure 6

It is essential to highlight that all the rules being considered are considered dependable, as they fulfill all the necessary requirements outlined in our experimental framework. As a result, the rules can be classified into the following four categories:

Type 1: There are 2624 frequent rules with “no heart disease” as their consequent. An example of such a rule is {’maged’, ’fbsugar0’, ’usloping’, ’peaklow’} \(\Rightarrow\) ’No’. Its evaluation metrics are as follows: support (0.24), confidence (0.86), lift (1.81), leverage (0.11), and conviction (3.73).

Type 2: There are 3293 frequent rules with “heart disease” as their consequent. An example of such a rule is {’hrnoraml’, ’exangina1’, ’asym’, ’M’} \(\Rightarrow\) ’Yes’. Its evaluation metrics are as follows: support (0.21), confidence (0.92), lift (1.7), leverage (0.08), and conviction (6.02).

Type 3: A total of 7530 rules are rare and have “no heart disease” as their consequent. For example, the rule {’fbsugar0’, ’tangina’, ’hrnoraml’, ’exangina0’, ’dsloping’, ’M’} \(\Rightarrow\) ’No’ indicates that heart disease is absent. Its evaluation metrics are as follows: support (0.001), confidence (1.0), lift (2.11), leverage (0.0008), and conviction (Infinity).

Type 4: A total of 9381 rare rules have “heart disease” as their consequent. An example of such a rule is {’asym’, ’exangina1’, ’hrnoraml’, ’flat’, ’ncol’, ’elderly’, ’M’} \(\Rightarrow\) ’Yes’. Its evaluation metrics are as follows: support (0.009), confidence (1.0), lift (1.89), leverage (0.004), and conviction (Infinity).
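The four categories can be expressed as a small classification function. The following is a minimal sketch, assuming that a rule counts as frequent when its reported support is at least minSup and that consequents are labelled ’Yes’ and ’No’ as in the examples above; the function name `rule_type` is hypothetical.

```python
def rule_type(support, consequent, min_sup=0.01):
    """Return the rule type (1-4) from the rule's support and consequent.

    Assumption: "frequent" means support >= min_sup; "rare" means below it.
    """
    is_frequent = support >= min_sup
    if consequent == "No":   # no heart disease
        return 1 if is_frequent else 3
    if consequent == "Yes":  # heart disease
        return 2 if is_frequent else 4
    raise ValueError(f"unexpected consequent: {consequent!r}")


# Supports and consequents of the four example rules quoted in the text.
examples = [(0.24, "No"), (0.21, "Yes"), (0.001, "No"), (0.009, "Yes")]
for supp, cons in examples:
    print(supp, cons, "-> Type", rule_type(supp, cons))
```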

Type 1 and 2 (frequent rules)

As shown in Fig.  6 , Type 1 rules consist of 2624 frequent rules associated with the consequence “no heart disease.” These rules represent healthy patients and indicate that the exhibited symptoms do not suggest the presence of heart disease. Conversely, Type 2 rules encompass frequent rules with high support that represent patients with heart disease. These rules reflect a well-established phenomenon where frequent rules exhibit a high level of support and align with specific expectations.

The insights gained from these rule types are widely recognized and can be readily interpreted by domain experts. Numerous studies have extensively examined this category of rules 38 , leading us to regard them as a set of prevailing beliefs, as they encapsulate the most commonly occurring patterns.

Our analysis focuses on Type 1 rules, which we use as the foundation for identifying unexpected and intriguing rules. We elaborate on this endeavor in the following section, “Interesting Rules,” which constitutes the principal contribution of our novel model, EPFHD-RARMING.

Type 3 and 4 (rare rules)

Traditional methodologies concentrate primarily on rules of Types 1 and 2, while rules of Types 3 and 4 are often not given due consideration. However, rules of Types 3 and 4 have the potential to yield more insightful and valuable findings. Because this collection of rare rules is so extensive, identifying the noteworthy ones poses a significant challenge, particularly when attempting to discover unexpected and substantial rules connected to the incidence of heart disease.

As part of our research endeavor, we undertake a comprehensive examination of these commonly overlooked rules. Our primary focus is on type 4 rules, which comprise 9381 rare rules that indicate the presence of “heart disease.” By analyzing these rules, we aim to uncover factors that contribute to the development of heart disease. Conversely, we choose to exclude the rules of type 3, which typically indicate healthy patients and exhibit low levels of support in our dataset. An extensive analysis of these rules would incur excessive costs without yielding significant insights into our primary objective: identifying patients with heart disease. For further information, these categories are depicted in Fig.  6 as rare rules.

The primary objective of this subsection is to identify and investigate the unusual rules that differ from those observed in cases without cardiac conditions. What makes these rules interesting is that their antecedents (factors) closely resemble those of the frequent rules while their consequents contrast with them, resulting in unexpected outcomes. Consequently, we analyzed both Type 4 rules (the rare rules indicating heart disease) and Type 1 rules (the frequent rules indicating healthy patients).

To determine the interesting rules, we employed objective metrics such as lift, confidence, leverage, and conviction, as defined in Definitions 2 and 3 . These rules, whether frequent or rare, must satisfy these metrics to be considered strong and demonstrate their objective interest.
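For reference, the sketch below computes these objective metrics from three support values. It uses the standard textbook formulas for a rule X ⇒ Y, which are assumed here to match Definitions 2 and 3; the function name `rule_metrics` and the input values are illustrative only.

```python
import math


def rule_metrics(supp_xy, supp_x, supp_y):
    """Standard interestingness metrics for a rule X => Y.

    supp_xy: support of X and Y together, supp_x: support of the antecedent,
    supp_y: support of the consequent (all as fractions of the dataset).
    """
    confidence = supp_xy / supp_x
    lift = confidence / supp_y
    leverage = supp_xy - supp_x * supp_y
    # Conviction is unbounded when the rule never fails (confidence = 1).
    conviction = math.inf if confidence >= 1.0 else (1 - supp_y) / (1 - confidence)
    return {
        "support": supp_xy,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Illustrative values, not taken from the dataset.
print(rule_metrics(supp_xy=0.20, supp_x=0.25, supp_y=0.50))
```

A rule with confidence 1.0 yields an infinite conviction under this definition, which is consistent with the “Infinity” values reported for the Type 3 and Type 4 examples.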

Moreover, we explored rare rules that deviate from the normal rules (i.e., frequent rules without heart disease) due to symptoms that reduce the support of the rules. To identify such rules, we utilized the Jaccard metric and set the similarity threshold at 0.80. This allowed our study to identify patterns associated with the absence of heart disease that become rare in the presence of heart disease when another factor is introduced. Consequently, we identified a total of 163 interesting rules using our proposed model. Analyzing these rules can provide valuable insights for medical experts, particularly in identifying symptoms that may be indicative of heart disease.
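The Jaccard metric used here is the size of the intersection of two antecedent sets divided by the size of their union. A minimal sketch follows; the function name `jaccard` is hypothetical and the two antecedent sets are made up for illustration.

```python
def jaccard(antecedent_a, antecedent_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two antecedent sets."""
    a, b = set(antecedent_a), set(antecedent_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Illustrative antecedents: the rare rule adds one factor to the frequent one.
frequent_antecedent = {"f1", "f2", "f3", "f4", "f5"}
rare_antecedent = frequent_antecedent | {"extra factor"}

similarity = jaccard(frequent_antecedent, rare_antecedent)
print(round(similarity, 2), similarity >= 0.80)
```

When a rare rule adds exactly one factor to a frequent rule with k antecedent items, the similarity is k / (k + 1), so the 0.80 threshold is met for k ≥ 4.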

Our model, EPFHD-RARMING, was successful in extracting 163 relevant rules from a vast number of rules. These rules can provide valuable insights to medical experts in their investigation of symptoms that may indicate cardiovascular disease.

Let us analyze two specific rules, one rare and one frequent. The first rule, ’heartrate = normal’, ’oldpeak = high’, ’exercise angina = 0’, ’fasting blood sugar = 0’, ’cholesterol = high’, ’sex = female’ ==> ’Yes’ (heart disease), is classified as a rare rule. The second rule, ’heartrate = normal’, ’exercise angina = 0’, ’fasting blood sugar = 0’, ’cholesterol = high’, ’sex = female’ ==> ’No’ (no heart disease), is categorized as a frequent rule. The two antecedents share five of their six distinct factors, giving a similarity of 0.83. This similarity indicates that when a seemingly healthy patient exhibits a normal heart rate, no exercise-induced angina, normal fasting blood sugar, high cholesterol, and female sex, a high oldpeak value raises a flag for a potential risk of heart disease. In other words, a patient who exhibits the factors of the frequent rule may develop heart disease if their oldpeak value becomes high.

The visualization presented in Fig. 7 illustrates two critical rules derived from our heart disease dataset. These rules highlight the importance of identifying rare but significant patterns that can drastically alter prediction outcomes. The first rule, with a consequent “No heart disease,” has antecedents that include “Oldpeak = low,” “Middle age,” “High heart rate,” “Low cholesterol,” and “Fasting blood sugar = 0.” This rule has a support value of 0.02 and a confidence of 0.82, indicating that it is relatively common and reliable in predicting the absence of heart disease under these conditions.

figure 7

Visualization of the rules highlighting the significance of the ’asymptomatic’ feature in altering predictions from ’No’ to ’Yes’ for heart disease. The graph illustrates how our proposed model identifies interesting rare rules by showing changes in support and confidence when the ’asymptomatic’ condition is added. This example demonstrates the primary results of our work, emphasizing the importance of considering rare but meaningful rules in heart disease detection.

The second rule, which includes the additional antecedent “Asymptomatic,” changes the prediction to “Heart disease.” Despite having a lower support value of 0.004, this rule boasts a higher confidence of 0.83. The transformation from a “No heart disease” to a “Heart disease” consequent upon adding the “Asymptomatic” condition underscores the critical nature of this rare pattern. The similarity in antecedents between these two rules, differing only by the presence of “Asymptomatic,” makes the second rule particularly intriguing and significant for heart disease detection.

This analysis highlights how our proposed model effectively identifies interesting rare rules by examining this example, which demonstrates the primary results of our work. By focusing on such rare but valuable rules, healthcare professionals can better identify and manage patients who might otherwise be overlooked due to the rarity of these conditions. This approach not only enhances the accuracy of heart disease predictions but also contributes to a more nuanced understanding of the various factors involved. The “Asymptomatic” feature, when combined with other symptoms, can change the risk assessment from no heart disease to high risk, emphasizing its role in medical diagnostics.

Explanation and interpretation of interesting rules

The purpose of this section is to explain the generated interesting rules thoroughly, so that they are clearly communicated to and understood by the end user. Our model identified a total of 163 interesting rules, which are visualized in Fig. 8. The graph shows a high similarity between the paired frequent and rare rules, with similarity scores exceeding 0.80. Our focus is on the rare rules that deviate from the frequent rules by introducing additional symptoms, which produces new rules with lower support but more unexpected insights. For example, the labeled rules in the graph show both the frequent rule and the rare rules that diverge from it. Figure 8 visualizes the relationship between frequent rules and rare rules in terms of their support and similarity. The plot uses three dimensions to represent key metrics:

X-axis (frequent rule support): This axis represents the support values of frequent rules, indicating how often these rules occur within the dataset.

Y-axis (rare rule support): This axis represents the support values of rare rules, showing how often these less common rules appear within the dataset.

Z-axis (Jaccard similarity): This axis represents the Jaccard similarity between the antecedents of frequent and rare rules. A higher similarity value indicates a greater overlap between the sets of conditions that define the rules.

figure 8

163 interesting rules plotted in 3D.

The points in the plot are color-coded based on their similarity values, with the color bar on the side providing a reference for the similarity scale. The interactive nature of the plot allows users to hover over individual points to see detailed information, including the ID of the rule pair, the antecedents, consequents, support, and confidence of both frequent and rare rules, and the similarity value for each pair of rules.

In the highlighted example, the point represents a pair of rules with the following details:

Frequent rule: {Antecedents: ’Middle age’, ’Male’, ’Fasting blood sugar = 0’, ’Upsloping ST slope’, ’Non-anginal pain’, ’Exercise-induced angina = 0’}, Consequent: ’No heart disease’, Support: 0.048, and Confidence: 0.89.

Rare rule: {Antecedents: ’Middle age’, ’Male’, ’Fasting blood sugar = 0’, ’Upsloping ST slope’, ’Non-anginal pain’, ’Exercise-induced angina = 0’, ’OldPeak high’}, Consequent: ’Heart disease’, Support: 0.003, and Confidence: 1.

Similarity (Jaccard): 0.857.

This specific pair of rules is significant because the addition of the antecedent ’OldPeak High’ in the rare rule changes the consequent from ’No heart disease’ to ’Heart disease’. Despite the rare rule having a much lower support value, the high Jaccard similarity (0.86) with the frequent rule indicates that the conditions for both rules are very similar. This insight is crucial as it highlights how a slight change in conditions can alter the outcome, emphasizing the importance of considering rare rules in heart disease prediction and diagnosis. The confidence levels of both rules also suggest their reliability, making them valuable for further analysis and application in medical diagnostics.
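The pairing of a frequent “no heart disease” rule with a similar rare “heart disease” rule can be sketched as follows. This is an illustration under stated assumptions, not the EPFHD-RARMING implementation: the function name `find_interesting_pairs`, the input format (lists of non-empty antecedent sets), and the unrelated second rare rule are all hypothetical, while the first pair reuses the highlighted example.

```python
def find_interesting_pairs(frequent_no_rules, rare_yes_rules, threshold=0.80):
    """Pair frequent 'No' antecedents with similar rare 'Yes' antecedents.

    Returns tuples (frequent, rare, added_factors, similarity) for every pair
    whose Jaccard similarity reaches the threshold. Antecedents are assumed
    to be non-empty.
    """
    pairs = []
    for freq in map(frozenset, frequent_no_rules):
        for rare in map(frozenset, rare_yes_rules):
            similarity = len(freq & rare) / len(freq | rare)
            if similarity >= threshold:
                # Factors present only in the rare rule flip the consequent.
                pairs.append((freq, rare, sorted(rare - freq), similarity))
    return pairs


frequent_no = [{
    "Middle age", "Male", "Fasting blood sugar = 0", "Upsloping ST slope",
    "Non-anginal pain", "Exercise-induced angina = 0",
}]
rare_yes = [
    frequent_no[0] | {"OldPeak high"},       # the highlighted rare rule
    {"Elderly", "Female", "Flat ST slope"},  # unrelated rule, made up here
]

pairs = find_interesting_pairs(frequent_no, rare_yes)
for _, _, added, sim in pairs:
    print(added, round(sim, 3))
```

Only the first rare rule is paired with the frequent rule: the two share six of seven distinct factors, and the set difference isolates ’OldPeak high’ as the added factor.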

Table 4 presents the most interesting rules based on their similarity measure. It is important to note that all these rules are frequent and correspond to healthy patients, becoming rare and indicative of heart disease when an additional symptom is included. A possible explanation for the occurrence of interesting rare rules in the dataset is that adding another factor or symptom to frequent rules reduces their support and makes them rare.

Let us take rule number 7 in Table 4 to illustrate how interesting rules are generated. Without the factor highlighted in red, a high oldpeak value, the frequent rule ’middle-aged’, ’high heart rate’, ’male’, ’fasting blood sugar = 0’, ’upsloping ST slope’, ’no exercise-induced angina’ ==> ’no heart disease’ suggests that individuals with these factors are generally free from heart disease. However, when a rare rule is formed by adding a high ’oldpeak’ value, the resulting rule ’high oldpeak’, ’middle-aged’, ’high heart rate’, ’male’, ’fasting blood sugar = 0’, ’upsloping ST slope’, ’no exercise-induced angina’ ==> ’heart disease’ identifies patients at risk of heart disease. The support of the frequent rule is 0.06, which, with 1189 patients in the dataset, corresponds to approximately 71 healthy patients with these factors. The support of the rare rule drops to 0.002, which corresponds to only about 2 patients with these factors who actually have heart disease.
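The patient counts quoted above follow from multiplying each support by the dataset size. A short check, assuming the reported supports are rounded values so that the resulting counts are approximate (the function name `patients_covered` is hypothetical):

```python
N_PATIENTS = 1189  # dataset size stated in the text


def patients_covered(support, n_records=N_PATIENTS):
    """Approximate number of records matched by a rule with the given support."""
    return round(support * n_records)


frequent_count = patients_covered(0.06)   # frequent rule, no heart disease
rare_count = patients_covered(0.002)      # rare rule, heart disease
print(frequent_count, rare_count)
```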

The following section provides an in-depth analysis of the factors contributing to the development of heart disease. Our findings underscore the significance of these factors and their impact on the prediction of heart disease, further validating our proposed model, EPFHD-RARMING. This analysis not only offers deeper insights into the relationship between these factors and heart disease but also enhances the interpretability and explainability of our findings.

The rules outlined in Tables 5, 6, 7, 8, 9 and 10 pertain to healthy individuals when the crucial “red feature” is absent: they are the most frequent rules associated with no heart disease (the set of beliefs) and have high support. The “red feature” is the pivotal element that transforms these common rules, characterized by high support and the absence of cardiac disease, into uncommon rules with reduced support that indicate cardiac disease. In the following subsections, we examine these contributing factors in detail.

ST depression induced by exercise relative to rest (oldpeak)

The oldpeak value is central to the interesting rules, as it is present in 69 of the 163 rules under investigation. In each of these 69 cases, adding a high oldpeak value to a frequent rule that describes healthy patients yields a rare rule whose consequent is heart disease, which makes a high oldpeak strongly indicative of heart disease.

Our model places a high emphasis on a high oldpeak value, as it is associated with about 42% of the interesting rules (69 of 163). This suggests a strong connection between a high oldpeak value and an increased risk of heart disease. This information is valuable in identifying the frequent rules for healthy patients that include these factors, and it serves as a significant alarm for potential heart disease in patients with high oldpeak values.

The presence of a high oldpeak value (indicated as ’peakhigh’) is a critical factor that triggers the transition from frequent, benign patterns to rare, high-risk patterns. This shift highlights the importance of monitoring oldpeak values closely, as they can serve as early warning signs for the onset of cardiovascular disease. The transformation from frequent to rare rules not only underscores the predictive power of oldpeak but also demonstrates the utility of our model in identifying these crucial changes in health status.

In Table  5 , we present the top 10 rules that demonstrate the transformation of frequent rules (common in healthy patients) into rare rules as a result of high oldpeak values, signifying the development of heart disease. These rules illustrate the significant impact of oldpeak on cardiovascular risk and the value of our model in uncovering these patterns.

To illustrate how unexpected rare rules are generated, consider Rule 1 in Table 5. This rule indicates that if the cholesterol level is within the normal range, the patient is middle-aged, the heart rate is high, fasting blood sugar is normal, the peak exercise ST segment is upsloping, the chest pain type is non-anginal pain, and exercise-induced angina is absent, then the rule points to heart disease once the “Oldpeak” value is high.

This example underscores the importance of analyzing various factors in conjunction with oldpeak to make early and accurate diagnoses. Understanding these contributing factors can support the development of effective treatments aimed at reducing mortality associated with cardiovascular diseases. Knowing which frequent rules turn into heart disease rules when “Oldpeak” (ST depression induced by exercise relative to rest) is high is therefore of utmost importance for individuals at high risk. High oldpeak values often indicate ischemia, or reduced blood flow to the heart muscle, which is a critical factor in the development of cardiovascular disease. By identifying patterns where high oldpeak values correlate with other risk factors, healthcare providers can develop more targeted interventions to manage and mitigate these risks.

The insights provided by our model highlight the need for comprehensive evaluations that consider the interplay of multiple factors. By identifying these critical patterns, healthcare professionals can better assess and manage patients at risk, ultimately improving outcomes and reducing the burden of cardiovascular diseases.

Overall, the presence of a high oldpeak value as a significant marker in our model underscores the importance of detailed cardiovascular assessments and proactive management strategies. The rules identified by our model provide a roadmap for clinicians to follow, ensuring that at-risk patients receive the necessary care to prevent the progression of heart disease. This proactive approach can lead to earlier interventions, better patient outcomes, and a reduction in the overall incidence of cardiovascular events.

Please note that this explanation is applicable to the remaining rules found in Table  5 , emphasizing the broad applicability and significance of our findings across different patient profiles.

The slope of the peak exercise ST segment (ST slope)

The experimental results of the proposed model show that the ST slope plays a significant role in the onset of cardiovascular disease, particularly when its value is ’flat’. According to the results, 28 of the 163 interesting rare rules indicate cardiovascular disease when the ST slope is ’flat’, in contrast to the corresponding frequent rules for healthy patients (without cardiovascular disease). This transition from frequent to rare rules signifies an increased likelihood of cardiovascular disease, indicating a critical shift in health status.

The flat ST Slope is particularly noteworthy because it reflects a significant modification in the underlying factors or conditions. A flat ST Slope during peak exercise typically indicates an abnormal response to physical stress, which can be a precursor to more serious cardiovascular issues. The occurrence of rarity along with specific rule attributes, such as a flat ST Slope, may act as a strong marker of cardiovascular risk. This requires further investigation of the causes and consequences of this transformation for health outcomes.

The top 10 interesting rules, displayed in Table  6 , illustrate how this factor determines the rules that lead to cardiovascular disease. These rules deviate from the norms as their support falls and are often missed during frequent pattern mining. For example, a rule might indicate that a patient with normal cholesterol and no other significant symptoms, when combined with a flat ST Slope, suddenly falls into a high-risk category for heart disease.

Our proposed model excels in identifying these critical deviations, uncovering hidden patterns that are not apparent through traditional analysis. The identification of a flat ST Slope as a significant risk factor for cardiovascular disease highlights the importance of this symptom in clinical assessments. By recognizing the importance of a flat ST Slope, healthcare professionals can better assess the risk of cardiovascular disease in patients who might otherwise appear healthy.

As a result, the presence of a flat ST Slope is a vital factor in our model for detecting rare but significant rules that indicate cardiovascular disease. This insight underscores the importance of considering ST Slope in comprehensive cardiovascular risk assessments. The exceptional rules identified by our model, as shown in Table  6 , provide a deeper understanding of the factors contributing to cardiovascular risk. These findings emphasize the necessity of thorough evaluations that include the ST Slope, enabling early detection and improved management of heart disease.

Type of chest pain: asymptomatic

A significant factor identified by our proposed model is the type of chest pain experienced by patients. Our experimental results show that the chest pain type plays a crucial role in the development of heart disease. Specifically, 23 of the 163 significant rare rules flag otherwise healthy patients as being at risk of heart disease when the chest pain type is ’asymptomatic’ (’asym’). This indicates that asymptomatic chest pain, when combined with other common health indicators, significantly alters the patient’s health status, suggesting a strong likelihood of heart disease.

The presence of asymptomatic chest pain is particularly concerning because it often goes unnoticed by patients, delaying diagnosis and treatment. Our findings underscore the importance of identifying these subtle yet critical symptoms. The transition from frequent to rare rules signifies a substantial shift in health status, where the addition of asymptomatic chest pain to otherwise benign conditions results in an increased risk of heart disease.

The top 10 interesting rules, shown in Table  7 , illustrate how this factor deviates from the norms and leads to rules that express patients with heart disease, despite their rarity. These rules highlight the critical nature of asymptomatic chest pain as a determinant in the onset of cardiovascular disease. For instance, a rule might indicate that a middle-aged individual with normal cholesterol and no other significant symptoms, when combined with asymptomatic chest pain, suddenly falls into a high-risk category for heart disease.

Our proposed model is effective in discovering these important rules that contribute to the development of heart disease. By focusing on the presence of asymptomatic chest pain, our model uncovers hidden patterns that are not evident in traditional analysis. This insight is invaluable for early detection and intervention, as it identifies patients who might otherwise be overlooked due to the absence of more obvious symptoms.

Therefore, the identification of asymptomatic chest pain as a significant risk factor for heart disease is a major finding of our study. The exceptional rules identified by our model, as shown in Table  7 , provide a deeper understanding of the factors contributing to cardiovascular risk. These rules emphasize the importance of thorough clinical assessments that include the evaluation of subtle symptoms like asymptomatic chest pain. By incorporating these insights, healthcare professionals can improve early diagnosis and treatment, ultimately reducing the incidence and severity of heart disease.

Max heart rate

Maximum heart rate is important in identifying unusual rules that diverge from the typical frequent rules, which serve as standard beliefs. Our findings reveal that 12 of the 163 significant rare rules link a low maximum heart rate (’hrlow’) to cardiovascular disease in otherwise healthy individuals. These 12 unusual rules deviate from the expected patterns and are indicative of cardiovascular disease in patients.

Our proposed model has successfully uncovered these unique rules, highlighting critical contributors to the onset of cardiovascular disease despite their rarity. It is essential to note that all these distinctive rules apply to women. Furthermore, six of these distinctive rules are relevant to elderly women, specifically rules 1, 2, 8, 9, 11, and 12 as shown in Table 8 . This suggests that elderly women with a low maximum heart rate are at a particularly heightened risk of developing cardiovascular disease, emphasizing the need for targeted interventions and monitoring in this demographic.

In contrast, when the maximum heart rate is high, our model does not identify any distinct rules directly linking this symptom to cardiovascular disease. However, it is crucial to emphasize that high maximum heart rate is associated with 59 interesting rare rules. These rules, although not directly caused by a high maximum heart rate, are linked to other significant factors that contribute to the development of cardiovascular disease. These factors include a high oldpeak, asymptomatic chest pain, and various other symptoms. This indicates that while a high maximum heart rate alone may not be a direct indicator, its presence alongside other risk factors can significantly increase the likelihood of cardiovascular disease.

This dual insight, which highlights the critical role of both low and high maximum heart rates in different contexts, demonstrates the robustness of our proposed model. It underscores the importance of considering maximum heart rate in comprehensive cardiovascular risk assessments. By identifying these rare but critical rules, our model provides valuable information that can aid in early diagnosis and targeted intervention, ultimately contributing to better patient outcomes and more personalized healthcare strategies.

In summary, the presence of a low maximum heart rate is a vital factor in our model for detecting rare but significant rules that indicate cardiovascular disease, particularly in women and elderly women. Conversely, a high maximum heart rate, while not directly causal, is associated with other risk factors that collectively indicate an increased risk of heart disease. These insights from our model emphasize the importance of a holistic approach in cardiovascular risk assessment, taking into account various interrelated factors to improve the accuracy and effectiveness of disease prediction and management.

Exercise-induced angina

Our research highlights the critical role of exercise-induced angina in identifying exceptional and atypical rules that deviate from established norms. Exercise-induced angina, indicated by a value of 1, is a condition where chest pain occurs during physical activity due to reduced blood flow to the heart. This factor has proven to be significant in our study.

Our findings indicate that when exercise-induced angina is present (’exangina1’), 13 out of 163 rare rules exhibit a strong association with cardiovascular disease in otherwise healthy individuals. These 13 unusual rules, detailed in Table 9, contrast sharply with conventional norms and are effective in identifying patients with cardiovascular disease. These rules indicate that the presence of exercise-induced angina, combined with other factors, significantly alters the patient’s health status, leading to a higher risk of cardiovascular disease.

The detailed analysis of these rules reveals that the presence of exercise-induced angina, when combined with certain other health indicators, serves as a critical marker for cardiovascular disease. This demonstrates the power of our innovative model in uncovering important health insights that might be missed by conventional analysis. Despite their rarity, these rules provide valuable information for early diagnosis and prevention of heart disease.

On the other hand, our study found that when exercise-induced angina is absent (denoted by a value of 0), no exceptional rare rules are generated. This absence indicates that the lack of exercise-induced angina does not contribute to significant deviations from the norm, thus not highlighting any unusual patterns or risk factors for cardiovascular disease.

Thus, the presence of exercise-induced angina is a vital factor in our model for detecting rare but critical rules that point to cardiovascular disease. This insight underscores the importance of considering exercise-induced angina in clinical assessments and highlights its role in the early detection and management of heart disease. The exceptional rules identified by our model, as shown in Table  9 , provide a deeper understanding of the factors contributing to cardiovascular risk.

Presence of fasting blood sugar

The experimental results reveal the crucial role of a positive fasting blood sugar indicator (’fbsugar1’) in identifying exceptional and unconventional rules that diverge from established norms, especially from the frequent rules without heart disease. Notably, all of these rare rules apply to women, as shown in the top 10 interesting rules in Table 10. Our findings demonstrate that when the fasting blood sugar indicator is positive (a value of 1), 23 out of 163 rare rules exhibit a considerable association with cardiovascular disease in otherwise healthy women.

The presence of fasting blood sugar at elevated levels often coincides with other risk factors, such as high cholesterol and angina, particularly in women. This correlation underscores the heightened risk of developing heart disease when these factors are present. For example, women with high cholesterol and positive fasting blood sugar tests are at a significantly increased risk, especially if they also experience symptoms like angina or exercise-induced angina. This highlights the multi-faceted nature of cardiovascular risk, where the interaction between multiple factors compounds the overall risk.

Our analysis shows that these rare rules deviate significantly from conventional norms, effectively identifying female patients at risk for cardiovascular disease. This deviation from frequent patterns signifies a substantial change in the health status, indicating a critical shift towards disease when fasting blood sugar levels are high. The presence of high fasting blood sugar, as highlighted by our novel model, emerges as a crucial determinant in uncovering pivotal rules that contribute to the onset of cardiovascular disease in women. Despite their rarity and deviation from conventional norms, these rules provide essential insights for early diagnosis and intervention.

Conversely, the absence of a positive fasting blood sugar test (denoted by a value of 0) does not generate any exceptional rare rules. This suggests that normal fasting blood sugar levels do not significantly contribute to deviations from the norm, thereby not highlighting any unusual patterns or risk factors for cardiovascular disease. The absence of this factor indicates a lower risk profile, aligning with conventional medical understanding.

To summarize, the presence of fasting blood sugar is a vital factor in our model for detecting rare but critical rules that point to cardiovascular disease in women. This insight underscores the importance of considering fasting blood sugar levels in clinical assessments and highlights their role in the early detection and management of heart disease. The exceptional rules identified by our model provide a deeper understanding of the factors contributing to cardiovascular risk, particularly in female patients.
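The rule counts reported for the individual factors in the subsections above can be put side by side as shares of the 163 interesting rules. The snippet below is a simple tabulation of the figures stated in the text; the dictionary labels are informal names chosen here. The reported counts add up to 168, so they are not treated as a strict partition of the 163 rules.

```python
TOTAL_INTERESTING_RULES = 163

# Number of interesting rare rules reported per added factor.
rules_per_factor = {
    "oldpeak high (peakhigh)": 69,
    "ST slope flat": 28,
    "chest pain asymptomatic (asym)": 23,
    "fasting blood sugar positive (fbsugar1)": 23,
    "exercise-induced angina (exangina1)": 13,
    "max heart rate low (hrlow)": 12,
}

# Share of the 163 interesting rules, in percent, rounded to one decimal.
shares = {
    factor: round(100 * count / TOTAL_INTERESTING_RULES, 1)
    for factor, count in rules_per_factor.items()
}

for factor, share in shares.items():
    print(f"{factor:<42}{share:>5.1f}%")
```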

Comparison with state-of-the-art methods

Our model, EPFHD-RARMING, is compared with two state-of-the-art methods used for heart disease prediction 8 , 43 . These methods represent the current advanced techniques in the field. The focus of our comparison is on identifying the factors contributing to heart disease, the interestingness and explainability of the results, and the comprehensiveness of the insights provided. This comparison aims to highlight the strengths and unique contributions of our approach in these specific areas.

Comparison with machine learning methods

To evaluate our model, we compared it with a recent paper 8 that used CatBoost and other machine learning methods to detect and diagnose cardiovascular disease. Although the CatBoost model achieved high accuracy (91%) and strong F1-scores for heart disease prediction, it operates as a “black box”, making it difficult for clinicians to understand the underlying reasons for predictions. This lack of transparency can hinder trust and practical adoption in medical settings, where understanding the rationale behind predictions is crucial for effective treatment planning. In contrast, our proposed model, EPFHD-RARMING, generates clear, interpretable association rules. These rules provide medical professionals with comprehensible insights into the factors leading to heart disease, enabling them to make informed decisions and communicate effectively with patients. This interpretability is not just beneficial but essential for clinical decision-making, fostering greater confidence in the predictive model’s outputs.

While traditional ML models focus on optimizing prediction accuracy using large datasets, they often overlook rare but significant patterns. These models detect patterns present in the training data, potentially missing subtle and infrequent combinations of symptoms that could indicate future heart disease. This oversight can result in missed opportunities for early intervention. Our EPFHD-RARMING model excels at identifying rare rules with low support that deviate from frequent patterns, offering valuable insights into potential future risks. By capturing these rare but meaningful patterns, our model provides a proactive approach to disease management, highlighting vulnerable patterns that might develop into heart disease. By examining both frequent and rare association rules, we are able to gain a more comprehensive understanding of the factors associated with heart disease, thereby supporting early intervention and tailored patient care. This comprehensive approach addresses the critical need for interpretable and thorough predictive models in healthcare, ultimately leading to better patient outcomes, more informed clinical decisions, and the ability to anticipate and mitigate future health risks.
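
To make the distinction between frequent and rare rules concrete, the following minimal Python sketch computes support and confidence on a handful of made-up records and labels a rule as frequent, rare, or weak. The records, the item names (such as fbs=1), the helper names, and the thresholds min_sup and min_conf are illustrative assumptions, not values or code taken from EPFHD-RARMING.

```python
from typing import FrozenSet, List

# Made-up patient records (NOT from the study's dataset); each record is a
# set of "attribute=value" items.
RECORDS: List[FrozenSet[str]] = [
    frozenset({"sex=F", "fbs=0", "angina=no", "disease=no"}),
    frozenset({"sex=F", "fbs=0", "angina=no", "disease=no"}),
    frozenset({"sex=F", "fbs=0", "angina=no", "disease=no"}),
    frozenset({"sex=F", "fbs=1", "angina=no", "disease=yes"}),
    frozenset({"sex=M", "fbs=0", "angina=yes", "disease=yes"}),
    frozenset({"sex=M", "fbs=0", "angina=no", "disease=no"}),
    frozenset({"sex=M", "fbs=0", "angina=yes", "disease=yes"}),
    frozenset({"sex=F", "fbs=0", "angina=no", "disease=no"}),
]


def support(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               records: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, records)
    if base == 0.0:
        return 0.0
    return support(antecedent | consequent, records) / base


def classify_rule(antecedent: FrozenSet[str], consequent: FrozenSet[str],
                  records: List[FrozenSet[str]],
                  min_sup: float = 0.3, min_conf: float = 0.75) -> str:
    """Label a rule 'frequent', 'rare', or 'weak' (thresholds are assumptions)."""
    sup = support(antecedent | consequent, records)
    conf = confidence(antecedent, consequent, records)
    if conf < min_conf:
        return "weak"  # not reliable enough to count as a rule at all
    return "frequent" if sup >= min_sup else "rare"


if __name__ == "__main__":
    common = (frozenset({"sex=F", "angina=no"}), frozenset({"disease=no"}))
    unusual = (frozenset({"sex=F", "fbs=1"}), frozenset({"disease=yes"}))
    for ante, cons in (common, unusual):
        print(sorted(ante), "->", sorted(cons), classify_rule(ante, cons, RECORDS))
```

With these toy records, the rule {sex=F, angina=no} → disease=no is labeled frequent, while {sex=F, fbs=1} → disease=yes holds with full confidence but low support and is therefore labeled rare. A model tuned only for overall accuracy has little incentive to surface the second kind of rule.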

We also compare our results with another recent work 43, which used a classification and regression tree (CART) algorithm for heart disease prediction, focusing primarily on model accuracy. While their approach identifies key risk factors through supervised learning, our method mines rare rules in an unsupervised manner, emphasizing interpretability and explainability. Unlike the supervised approach, our model uncovers detailed patterns and provides a broad view of the factors leading to heart disease, making the findings more actionable for healthcare professionals. Additionally, our model identifies patterns that may indicate future heart disease development, aiding in early detection and intervention. Because it is unsupervised, our method is also adaptable to other domains.

Identification and validation of key factors leading to heart disease

In our model, EPFHD-RARMING, we aim to uncover and highlight rare rules that contradict expectations, since these lead to the most notable discoveries. The method is effective because it can extract the interesting rules from hundreds of thousands of candidate rules. Within this model, we use well-established frequent rules as our ground truth; their high frequency of co-occurrence means they represent widely accepted beliefs. Notably, our model identifies specific factors that account for the transformation of common rules into rare ones, even with low support, as demonstrated in our experiments. Several factors play a crucial role in the development of heart disease: ST depression induced by exercise relative to rest (Oldpeak), the slope of the peak exercise ST segment (ST Slope), asymptomatic chest pain, low heart rate, the presence of exercise-induced angina, and fasting blood sugar. Figure 9 illustrates the factors that generate unexpected rules leading to heart disease.

Figure 9. Factors contributing to the generation of interesting rules with heart disease as an outcome.

The EPFHD-RARMING model not only unveils these previously unknown associations but also illuminates the interplay of these factors, providing valuable insights into the development of heart disease in otherwise healthy individuals. The factors identified by our model have been further validated by applying multiple feature selection algorithms, which consistently identify the same variables as critical contributors. This agreement across multiple approaches demonstrates the reliability and robustness of the factors identified by our model. According to our model, the factors contributing to heart disease include ‘oldpeak’, ‘ST slope’, ‘chest pain type’, ‘max heart rate’, ‘exercise angina’, and ‘fasting blood sugar’. These results align with the most prominent feature selection methods, as illustrated in Fig. 3, which highlights the key features that contribute to the prediction of heart disease as identified by those methods.
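
As an illustration of how a filter-style feature selection score can serve as such a cross-check, the sketch below ranks two made-up binary features by their mutual information with a disease label. Mutual information is only one common criterion and is used here as an assumption; this section does not state which feature selection algorithms were applied, and the feature names and values are invented for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(features: Dict[str, List[int]], labels: List[int]) -> List[str]:
    """Feature names sorted from most to least informative about `labels`."""
    return sorted(
        features,
        key=lambda name: mutual_information(features[name], labels),
        reverse=True,
    )


# Invented toy data: 1 = heart disease, 0 = healthy.
LABELS = [1, 1, 1, 0, 0, 0, 1, 0]
FEATURES = {
    # Constructed to be statistically independent of the label.
    "hypothetical_noise": [0, 1, 0, 1, 0, 1, 1, 0],
    # Constructed to coincide with the label, so it is maximally informative.
    "st_slope_flat": [1, 1, 1, 0, 0, 0, 1, 0],
}

if __name__ == "__main__":
    for name in rank_features(FEATURES, LABELS):
        print(name, round(mutual_information(FEATURES[name], LABELS), 3))
```

If the attributes that such a ranking places at the top coincide with the items that turn frequent rules into rare ones, the two views of the data support each other.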

Additionally, a recent study 43 has confirmed the significance of these factors, further attesting to their substantial impact on predictive modeling. This external corroboration serves as a strong endorsement of the accuracy and relevance of our proposed solution. Notably, our novel model uses rare association rule mining for this purpose, which serves as a robust option for future feature selection. By identifying these rare but significant patterns, our model provides a comprehensive tool for understanding and predicting heart disease, thereby facilitating early intervention and improved patient outcomes.

Our model, EPFHD-RARMING, uses an unsupervised method, Association Rule Mining (ARM). Because the approach is unsupervised, it uncovers patterns without the aid of predefined labels, which underscores the independence and objectivity of the findings. This also makes the model adaptable to domains beyond health, such as finance, marketing, and any field where identifying rare patterns is crucial. The alignment of our model’s findings with established feature selection algorithms, together with supporting evidence from recent studies, lends substantial support to the correctness of our proposed solution.

In contrast to conventional approaches in identifying factors that play a major role in prediction, our proposed model has effectively identified a diverse set of notable rules that can be summarized as follows:

Frequent rules relating to heart disease: These rules closely reflect those derived from traditional methodologies, representing well-established rules associated with patients affected by heart disease.

Frequent rules facilitating early detection of heart disease: Among the vast number of rules, our proposed model, EPFHD-RARMING, identified 163 interesting frequent rules that characterize healthy patients. These frequent rules turn into rare and interesting patterns when one of the critical factors (such as ‘oldpeak’, ‘ST slope’, ‘chest pain type’, ‘max heart rate’, ‘exercise angina’, and ‘fasting blood sugar’) occurs, which helps medical experts detect patients who may be at risk of developing heart disease. These vulnerable frequent patterns thus aid in the early determination of potential heart disease development.

Identifying risk factors: Our model has been successful in identifying risk factors that contribute to the onset of heart disease.

The results of our model should be further investigated by domain experts and tested on more datasets to fully ascertain its effectiveness and importance. The validation of this model will assist in determining its generalizability and potential for wider application. By doing so, we can ensure that the insights provided by our model are robust and reliable, paving the way for its application in real-world scenarios.

In summary, EPFHD-RARMING identifies and prioritizes rare rules that have low support, diverge from common rules, share similar antecedents with them, and contrast with them in outcome. The filtered rules provide valuable insights into the dataset’s patterns and deviations from the norm, enabling us to better comprehend exceptional cases and unforeseen associations. Conventional techniques often overlook critical risk indicators and fail to capture the intricate relationships between different factors. In contrast, our model, which builds on an unsupervised tool, ARM, analyzes the data thoroughly, providing new insights into cardiovascular health dynamics and yielding patterns that signify both health and potential risk. This approach offers a more complete understanding of the multifaceted mechanisms underlying heart disease.
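
The selection criteria described above (low support, an antecedent similar to that of a frequent rule, and a contrasting outcome) can be sketched as a simple pairing step. The Python code below is a minimal illustration under stated assumptions, not the EPFHD-RARMING implementation: "similar antecedent" is taken to mean that the rare antecedent equals the frequent one plus exactly one extra item, the support threshold min_sup is arbitrary, and the rules and item names are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: str
    support: float


def find_exceptions(rules: List[Rule],
                    min_sup: float = 0.3) -> List[Tuple[Rule, Rule, str]]:
    """Pair each frequent rule with rare rules that extend its antecedent by
    one item and reach a different outcome; also return that extra item."""
    frequent = [r for r in rules if r.support >= min_sup]
    rare = [r for r in rules if r.support < min_sup]
    found = []
    for f in frequent:
        for r in rare:
            extra = r.antecedent - f.antecedent
            if (f.antecedent < r.antecedent        # proper subset: shared context
                    and len(extra) == 1            # exactly one added factor
                    and r.consequent != f.consequent):  # contrasting outcome
                found.append((f, r, next(iter(extra))))
    return found


# Made-up rules for illustration only.
RULES = [
    Rule(frozenset({"sex=F", "angina=no"}), "disease=no", 0.50),
    Rule(frozenset({"sex=F", "angina=no", "fbs=1"}), "disease=yes", 0.125),
    Rule(frozenset({"sex=M", "angina=yes"}), "disease=yes", 0.25),
    Rule(frozenset({"sex=F", "angina=no", "chol=high"}), "disease=no", 0.10),
]

if __name__ == "__main__":
    for common, exception, factor in find_exceptions(RULES):
        print("factor", factor, "turns", common.consequent, "into", exception.consequent)
```

In this toy example only the rule containing fbs=1 is reported: it shares its context with the frequent "healthy" rule, has low support, and reverses the outcome. The other two rare rules are ignored because one has no frequent counterpart and the other does not change the outcome.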

Conclusion and future work

This article introduces EPFHD-RARMING, an unsupervised method developed to pinpoint crucial factors leading to heart disease. In contrast to conventional supervised techniques, our strategy employs rare association rule mining to improve efficiency and specificity in identifying predictive indicators. By defining frequent rules as fundamental beliefs, we were able to isolate rare yet significant rules that shed light on the onset of heart disease. Moreover, our method identifies sensitive frequent rules, which correspond to symptoms present in healthy individuals who may develop heart disease if the factors identified by our model are triggered. This predictive capability allows for early interventions, ultimately enhancing patient care.

EPFHD-RARMING effectively overcomes the limitations of conventional association rule mining, which often generates an excessive number of rules. Our approach narrows down this vast number to a more manageable 163 rules, focusing on rare, divergent factors with low support, and highlighting unique patterns and deviations within the dataset. This method confirms the value of our model in detecting key contributors to heart disease and enhances our understanding of exceptional and unforeseen cases within medical data.

The potential of the EPFHD-RARMING model extends beyond the present study. To further validate its versatility and effectiveness, we intend to apply it to a wide array of datasets. Additionally, we plan to adapt the model for big data to improve its performance and scalability. By examining irregular patterns and pinpointing their underlying causes in different populations, we anticipate gaining more comprehensive insights into the factors contributing to heart disease.

The EPFHD-RARMING model’s potential extends beyond healthcare, with applications in other sectors, such as climate change. By identifying factors and trends that precipitate severe events, it can provide crucial insights for devising effective mitigation strategies. Ultimately, our aim is to broaden the model’s applicability not only for predicting heart disease but also for offering actionable insights for prevention, treatment, and crisis management across various fields.

While EPFHD-RARMING demonstrates potential, it is crucial to address several limitations. The present study’s dataset is limited, which restricts the generalizability of our findings. Consequently, future research involving larger and more diverse datasets is needed to validate and refine the model. Furthermore, the current method can be computationally intensive, especially with larger datasets, necessitating algorithm optimization to enhance performance and scalability. Although our model is interpretative, integrating it into clinical workflows seamlessly requires further testing and validation in real-world healthcare settings. External validation with additional datasets and clinical trials is also necessary to ensure the model’s robustness and reliability. In the future, improvements could focus on enhancing the model’s computational efficiency, expanding its applicability to other medical conditions, and developing strategies for seamless integration into clinical practice. These efforts will help establish the model’s utility and effectiveness across diverse healthcare scenarios.

In summary, the EPFHD-RARMING model represents a promising advance in predictive modeling, providing a comprehensive, actionable understanding of heart disease. By harnessing rare association rule mining, our methodology addresses limitations of conventional methods and offers a tool for early intervention and improved patient outcomes. Future investigations should continue to explore and expand upon these techniques across diverse applications.

Data availability

The dataset 39 that supports the findings of this study is publicly available through IEEE DataPort. This heart disease dataset is a comprehensive collection compiled from five independent sources: Cleveland, Hungarian, Switzerland, Long Beach VA, and Statlog (Heart) Data. This integration has resulted in one of the most extensive heart disease datasets available.

References

1. World Health Organization. Cardiovascular diseases (2021).

2. Cook, C., Cole, G., Asaria, P., Jabbour, R. & Francis, D. P. The annual global economic burden of heart failure. Int. J. Cardiol. 171, 368–376 (2014).

3. Adhikary, D., Barman, S., Ranjan, R. & Stone, H. A systematic review of major cardiovascular risk factors: A growing global health concern. Cureus 14, 1–9 (2022).

4. Addressing Cardiovascular Disease - A Global Employer’s Approach to Non Communicable Diseases, vol. All Days of SPE International Conference and Exhibition on Health, Safety, Environment, and Sustainability. https://doi.org/10.2118/156849-MS.

5. Chen, Y., Xia, R., Yang, K. & Zou, K. Dnnam: Image inpainting algorithm via deep neural networks and attention mechanism. Appl. Soft Comput. 154, 111392 (2024).

6. Chen, Y., Xia, R., Yang, K. & Zou, K. Micu: Image super-resolution via multi-level information compensation and u-net. Expert Syst. Appl. 245, 123111 (2024).

7. Khourdifi, Y. & Baha, M. Heart disease prediction and classification using machine learning algorithms optimized by particle swarm optimization and ant colony optimization. Int. J. Intell. Eng. Syst. 12, 242–252 (2019).

8. Baghdadi, N. A. et al. Advanced machine learning techniques for cardiovascular disease early detection and diagnosis. J. Big Data 10, 144 (2023).

9. Narayanan, J. Implementation of efficient machine learning techniques for prediction of cardiac disease using smote. Procedia Comput. Sci. 233, 558–569. https://doi.org/10.1016/j.procs.2024.03.245 (2024).

10. Kumar, C. D. N., Raja, J. J., Manjutha, M. & Pradeep, T. Cardiovascular disease detection using machine learning technology. in Healthcare Applications in Computer Vision and Deep Learning Techniques, vol. 3 of IIP Series, 63–72. https://doi.org/10.58532/nbennurch233 (IIP Series, 2024).

11. Lisboa, P. J., Saralajew, S., Vellido, A., Fernández-Domenech, R. & Villmann, T. The coming of age of interpretable and explainable machine learning models. Neurocomputing 535, 25–39 (2023).

12. Tripathi, R. K. P. & Tiwari, S. Unravelling the enigma of machine learning model interpretability in enhancing disease prediction. in Machine Learning Algorithms Using Scikit and TensorFlow Environments, 125–153 (IGI Global, 2024).

13. Luna, J. M., Fournier-Viger, P. & Ventura, S. Frequent itemset mining: A 25 years review. Wiley Interdiscip. Rev. Data Mining Knowl. Discov. 9, e1329 (2019).

14. Brin, S., Motwani, R. & Silverstein, C. Beyond market baskets: Generalizing association rules to correlations. in Proceedings of the 1997 ACM SIGMOD international conference on Management of data, 265–276 (1997).

15. Adda, M., Wu, L. & Feng, Y. Rare itemset mining. in Sixth International Conference on Machine Learning and Applications (ICMLA 2007), 73–80 (IEEE, 2007).

16. Shrivastava, K. & Jotwani, V. Study to determine adverse diseases pattern using rare association rule mining. Int. J. Sci. Res. Comput. Sci. Eng. Inform. Technol. 6, 519–526 (2020).

17. Darrab, S., Broneske, D. & Saake, G. Modern applications and challenges for rare itemset mining. Int. J. Mach. Learn. Comput. 11, 208–218 (2021).

18. Darrab, S., Broneske, D. & Saake, G. Ucrp-miner: Mining patterns that matter. in 2022 5th International Conference on Data Science and Information Technology (DSIT), 1–7 (IEEE, 2022).

19. Chen, Y., Xia, R., Yang, K. & Zou, K. Micu: Image super-resolution via multi-level information compensation and u-net. Expert Syst. Appl. 245, 123111. https://doi.org/10.1016/j.eswa.2023.123111 (2024).

20. Chen, Y., Xia, R., Yang, K. & Zou, K. Dnnam: Image inpainting algorithm via deep neural networks and attention mechanism. Appl. Soft Comput. 154, 111392. https://doi.org/10.1016/j.asoc.2024.111392 (2024).

21. Agrawal, R., Imieliński, T. & Swami, A. Mining association rules between sets of items in large databases. in Proceedings of the 1993 ACM SIGMOD international conference on Management of data, 207–216 (1993).

22. Agrawal, R. et al. Fast discovery of association rules. Adv. Knowl. Discov. Data Mining 12, 307–328 (1996).

23. Darrab, S., Bhardwaj, P., Broneske, D. & Saake, G. Opecur: An enhanced clustering-based model for discovering unexpected rules. in International Conference on Advanced Data Mining and Applications, 29–41 (Springer, 2022).

24. Aggarwal, C. C. et al. Data mining: the textbook, vol. 1 (Springer, 2015).

25. Tew, C., Giraud-Carrier, C., Tanner, K. & Burton, S. Behavior-based clustering and analysis of interestingness measures for association rule mining. Data Mining Knowl. Discov. 28, 1004–1045 (2014).

26. Motarwar, P., Duraphe, A., Suganya, G. & Premalatha, M. Cognitive approach for heart disease prediction using machine learning. in 2020 international conference on emerging trends in information technology and engineering (ic-ETITE), 1–5 (IEEE, 2020).

27. Mohan, S., Thirumalai, C. & Srivastava, G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7, 81542–81554 (2019).

28. Katarya, R. & Meena, S. K. Machine learning techniques for heart disease prediction: A comparative study and analysis. Health Technol. 11, 87–97 (2021).

29. Marimuthu, M., Abinaya, M., Hariesh, K., Madhankumar, K. & Pavithra, V. A review on heart disease prediction using machine learning and data analytics approach. Int. J. Comput. Appl. 181, 20–25 (2018).

30. Jindal, H., Agrawal, S., Khera, R., Jain, R. & Nagrath, P. Heart disease prediction using machine learning algorithms. in IOP conference series: materials science and engineering, vol. 1022, 012072 (IOP Publishing, 2021).

31. Yang, H., Chen, Z., Yang, H. & Tian, M. Predicting coronary heart disease using an improved lightgbm model: Performance analysis and comparison. IEEE Access 11, 23366–23380. https://doi.org/10.1109/ACCESS.2023.3253885 (2023).

32. Yashudas, A. et al. Deep-cardio: Recommendation system for cardiovascular disease prediction using iot network. IEEE Sensors J. 24, 14539–14547. https://doi.org/10.1109/JSEN.2024.3373429 (2024).

33. Kapila, R., Ragunathan, T., Saleti, S., Lakshmi, T. J. & Ahmad, M. W. Heart disease prediction using novel quine Mccluskey binary classifier (qmbc). IEEE Access 11, 64324–64347. https://doi.org/10.1109/ACCESS.2023.3289584 (2023).

34. Khedr, A. M., Al Aghbari, Z., Al Ali, A. & Eljamil, M. An efficient association rule mining from distributed medical databases for predicting heart diseases. IEEE Access 9, 15320–15333 (2021).

35. Sonet, K. M. H., Rahman, M. M., Mazumder, P., Reza, A. & Rahman, R. M. Analyzing patterns of numerously occurring heart diseases using association rule mining. in 2017 twelfth international conference on digital information management (ICDIM), 38–45 (IEEE, 2017).

36. Lakshmi, K. P. & Reddy, C. Fast rule-based heart disease prediction using associative classification mining. in 2015 International conference on computer, communication and control (IC4), 1–5 (IEEE, 2015).

37. Yadav, C., Lade, S. & Suman, M. K. Predictive analysis for the diagnosis of coronary artery disease using association rule mining. Int. J. Comput. Appl. 87, 9–13 (2014).

38. Fournier-Viger, P. et al. A survey of itemset mining. Wiley Interdiscip. Rev. Data Mining Knowl. Discov. 7, e1207 (2017).

39. Siddhartha, M. Heart disease dataset (comprehensive). IEEE Dataport. Dataset (2020). https://doi.org/10.21227/dz4t-cm36

40. Darrab, S., Broneske, D. & Saake, G. Ucrp-miner: Mining patterns that matter. in 2022 5th International Conference on Data Science and Information Technology (DSIT), 1–7 (IEEE, 2022). https://doi.org/10.1109/DSIT55514.2022.9943880

41. Han, J., Pei, J. & Yin, Y. Mining frequent patterns without candidate generation. ACM Sigmod. Rec. 29, 1–12 (2000).

42. Darrab, S., Broneske, D. & Saake, G. Rpp algorithm: A method for discovering interesting rare itemsets. in Data Mining and Big Data: 5th International Conference, DMBD 2020, Belgrade, Serbia, July 14–20, 2020, Proceedings 5, 14–25 (Springer, 2020).

43. Ozcan, M. & Peker, S. A classification and regression tree algorithm for heart disease modeling and prediction. Healthc. Anal. 3, 100130 (2023).

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and affiliations.

Department of Computer Science, University of Magdeburg, 39106, Magdeburg, Germany

Sadeq Darrab & Gunter Saake

German Centre for Higher Education Research and Science Studies, 30195, Hannover, Germany

David Broneske

Contributions

The study was designed and implemented by Sadeq Darrab, who carried out the analysis and drafted the first draft of the paper. David Broneske and Gunter Saake participated in discussing the results and contributed to the revision and discussion of the manuscript from the first draft to the final version.

Corresponding author

Correspondence to Sadeq Darrab.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Darrab, S., Broneske, D. & Saake, G. Exploring the predictive factors of heart disease using rare association rule mining. Sci Rep 14, 18178 (2024). https://doi.org/10.1038/s41598-024-69071-6

Received: 26 January 2024

Accepted: 31 July 2024

Published: 06 August 2024

DOI: https://doi.org/10.1038/s41598-024-69071-6

  • Heart disease
  • Cardiovascular risk factors
  • Early detection
  • Frequent association rules
  • Rare association rules

Exposed: FDA fails to police food additives properly thanks to loophole, paper warns

By StudyFinds Staff

Reviewed by Steve Fink

Research led by Jennifer L. Pomeranz (JD, MPH), Emily M. Broad Leib (JD), and Dariush Mozaffarian (MD, DrPH)

Aug 09, 2024

Experts claim agency outsources much of its food safety oversight to the very companies it’s supposed to regulate

NEW YORK — In a shocking revelation, new research exposes an alleged gaping hole in the Food and Drug Administration’s (FDA) oversight of food ingredients. Thousands of potentially harmful substances may be lurking in our food supply without proper safety evaluation or public disclosure. This alarming situation stems from an apparent regulatory loophole that allows food companies to determine the safety of their own ingredients without FDA review.

An editorial published in the American Journal of Public Health reveals that while the FDA rigorously evaluates some food additives before they hit the market, it allows the food industry to self-regulate and classify many substances as “generally recognized as safe” (GRAS) based on undisclosed data. Even more concerning, the FDA lacks a formal process to systematically review the safety of additives and GRAS substances already in our food.

While this category was originally created for common ingredients like vinegar and spices, it has become a loophole big enough to drive a food truck through. The paper says that since 1997, the FDA has allowed food companies to determine on their own which new substances qualify as GRAS, without any requirement to notify the agency or share their safety data.

This means the ingredients in our food fall into a wide spectrum: from innocuous items like black pepper to substances harmful at high levels like salt. Then there are questionable chemicals like potassium bromate, a baked goods additive that may cause cancer, and unknown compounds that neither the FDA nor the public are aware of.

“Both the FDA and the public are unaware of how many of these ingredients, which are most commonly found in ultra-processed foods, are in our food supply,” says Jennifer Pomeranz, associate professor of public health policy and management at NYU School of Global Public Health and the editorial’s first author, in a statement.

The researchers argue that this regulatory gap leaves the FDA unable to fulfill its mission of ensuring a safe food supply. With diet-related diseases on the rise, addressing this oversight failure is crucial for public health.

How did this FDA oversight begin?

The problem stems from the 1958 Food Additives Amendment, which established two categories of food ingredients: food additives requiring FDA pre-market approval, and GRAS substances exempt from such scrutiny. While this exemption was initially intended for common ingredients like salt and pepper, the food industry has exploited it to introduce a wide array of new substances without FDA oversight.

In fact, the study cites research estimating that between 1990 and 2010, about 1,000 new ingredients entered the food supply without any report to the FDA. An additional 2,700 substances were deemed safe by industry panels, often comprised of experts with conflicts of interest.

“There are now hundreds, if not thousands, of substances added to our foods for which the true safety data are unknown to independent scientists, the government, and the public,” explains study senior author Dariush Mozaffarian, director of the Food is Medicine Institute at Tufts University.

The FDA’s hands-off approach was recently upheld in court, with a 2021 ruling affirming the agency’s voluntary notification system for GRAS substances. This decision essentially codified the existing regulatory gaps, leaving states to try filling the void. For instance, in October 2023, California banned four FDA-permitted food additives due to cancer and other health risks.

Even when clear evidence of harm emerges, the FDA’s ability to remove substances from the food supply is limited. The case of partially hydrogenated oils (trans fats) illustrates this challenge. Despite mounting evidence of their dangers since the 1950s, it took until 2015 for the FDA to revoke their GRAS status, with the ban not taking full effect until 2018.

Reforms needed to make food supply safer

The study’s authors argue that relying on post-market authority is an ineffective method for ensuring food safety, given the vast number of ingredients to review and the FDA’s lack of knowledge about self-GRAS substances. They call for a new framework to assess the safety of food ingredients, including mandatory pre-market notification, user fees to fund robust FDA reviews, and a system for regular post-market evaluation of substances already in the food supply.

“[The] FDA is only starting to utilize its post-market powers to review a tiny number of ingredients in the food supply, even though evidence of harm has been present for decades,” notes study co-author Emily Broad Leib, director of Harvard Law School Center for Health Law and Policy Innovation and founding director of the Harvard Law School Food Law and Policy Clinic.

“Both the FDA and Congress can do more to enable the FDA to meet its mission of ensuring a safe food supply,” adds Pomeranz.

Without such reforms, the American public may remain exposed to a food supply of uncertain safety, with potential long-term consequences for public health.

StudyFinds has reached out to the FDA for a response to the paper.

Paper Summary

Methodology

The researchers conducted a comprehensive review of existing FDA regulations, court decisions, and scientific literature related to food additives and GRAS substances. They analyzed the historical development of these regulations, examined key case studies, and evaluated the current state of FDA oversight.

Results

The study found significant gaps in the FDA’s regulation of food ingredients, particularly those classified as GRAS. Key findings include:

  • The FDA allows food companies to determine GRAS status without agency review or public disclosure.
  • An estimated 1,000 new ingredients entered the food supply between 1990-2010 without FDA notification.
  • Industry panels determining GRAS status often have conflicts of interest.
  • The FDA lacks a systematic process for reviewing the safety of additives and GRAS substances already in use.
  • Recent court decisions have upheld the FDA’s voluntary notification system, despite safety concerns.

Limitations

The study primarily relies on existing literature and regulatory analysis, rather than new empirical data. Additionally, the full extent of self-GRAS determinations by industry is unknown due to lack of reporting requirements. The authors acknowledge that some food companies may conduct thorough safety assessments, but the lack of transparency makes it impossible to evaluate the adequacy of these processes across the industry.

Discussion and Takeaways

The researchers argue that the current regulatory framework is inadequate to ensure food safety. They propose several policy recommendations, including:

  • Implementing a mandatory pre-market notification system for GRAS substances.
  • Establishing user fees to fund more robust FDA reviews.
  • Creating a framework for regular post-market review of food additives and GRAS substances.
  • Increasing congressional funding for FDA oversight activities.
  • Addressing conflicts of interest in industry GRAS determinations.

The authors emphasize that without significant reform, the FDA cannot fulfill its mission of protecting public health through food safety oversight. They call for urgent action by both the FDA and Congress to address these regulatory gaps.

Funding and Disclosures

The research was supported by the National Institutes of Health (NIH). The authors declared no conflicts of interest related to this study.


About StudyFinds Staff

StudyFinds sets out to find new research that speaks to mass audiences — without all the scientific jargon. The stories we publish are digestible, summarized versions of research that are intended to inform the reader as well as stir civil, educated debate. StudyFinds Staff articles are AI assisted, but always thoroughly reviewed and edited by a Study Finds staff member. Read our AI Policy for more information.



research paper of heart diseases

©2024 Study Finds. All rights reserved. Privacy Policy • Disclosure Policy • Do Not Sell My Personal Information


NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


Cardiovascular disease.

Edgardo Olvera Lopez; Brian D. Ballard; Arif Jan.

Last Update: August 22, 2023.

  • Continuing Education Activity

The cardiovascular system consists of the heart and its blood vessels. A wide array of problems can arise within the cardiovascular system, a few of which include endocarditis, rheumatic heart disease, and conduction system abnormalities. Cardiovascular disease, also known as heart disease, refers to the following 4 entities: coronary artery disease (CAD) which is also referred to as coronary heart disease (CHD), cerebrovascular disease, peripheral artery disease (PAD), and aortic atherosclerosis. CAD results from decreased myocardial perfusion that causes angina due to ischemia and can result in myocardial infarction (MI), and/or heart failure. It accounts for one-third to one-half of all cases of cardiovascular disease. Cerebrovascular disease is the entity associated with strokes, also termed cerebrovascular accidents, and transient ischemic attacks (TIAs). Peripheral arterial disease (PAD) is arterial disease predominantly involving the limbs that may result in claudication. Aortic atherosclerosis is the entity associated with thoracic and abdominal aneurysms. This activity reviews the evaluation and treatment of cardiovascular disease and the role of the medical team in evaluating and treating these conditions.

  • Review the cause of coronary artery disease.
  • Describe the pathophysiology of atherosclerosis.
  • Summarize the treatment options for heart disease.
  • Outline the evaluation and treatment of cardiovascular disease and the role of the medical team in evaluating and treating this condition.

  • Introduction

The cardiovascular system consists of the heart and blood vessels. [1] A wide array of problems may arise within the cardiovascular system, for example, endocarditis, rheumatic heart disease, and abnormalities of the conduction system, among others. Cardiovascular disease (CVD), or heart disease, refers to the following 4 entities, which are the focus of this article [2]:

  • Coronary artery disease (CAD): Sometimes referred to as coronary heart disease (CHD), CAD results from decreased myocardial perfusion that causes angina, myocardial infarction (MI), and/or heart failure. It accounts for one-third to one-half of the cases of CVD.
  • Cerebrovascular disease (CVD): Including stroke and transient ischemic attack (TIA)
  • Peripheral artery disease (PAD): Particularly arterial disease involving the limbs that may result in claudication
  • Aortic atherosclerosis:  Including thoracic and abdominal aneurysms

Although CVD may arise directly from different etiologies, such as emboli causing ischemic stroke in a patient with atrial fibrillation or rheumatic fever causing valvular heart disease, addressing the risk factors associated with the development of atherosclerosis is most important because atherosclerosis is a common denominator in the pathophysiology of CVD.

The industrialization of the economy with a resultant shift from physically demanding to sedentary jobs, along with the current consumerism and technology-driven culture that is related to longer work hours, longer commutes, and less leisure time for recreational activities, may explain the significant and steady increase in the rates of CVD during the last few decades. Specifically, physical inactivity, intake of a high-calorie diet, saturated fats, and sugars are associated with the development of atherosclerosis and other metabolic disturbances like metabolic syndrome, diabetes mellitus, and hypertension that are highly prevalent in people with CVD. [3] [2] [4] [5]

According to the INTERHEART study, which included subjects from 52 high-, middle-, and low-income countries, 9 modifiable risk factors accounted for 90% of the risk of having a first MI: smoking, dyslipidemia, hypertension, diabetes, abdominal obesity, psychosocial factors, low consumption of fruits and vegetables, regular alcohol consumption, and physical inactivity. Notably, 36% of the population-attributable risk of MI in this study was attributed to smoking. [6]

Other large cohort studies like the Framingham Heart Study [7] and the Third National Health and Nutrition Examination Survey (NHANES III) [5] have also found a strong association and predictive value of dyslipidemia, high blood pressure, smoking, and glucose intolerance. Sixty percent to 90% of CHD events occurred in subjects with at least one risk factor.

These findings have been translated into health promotion programs by the American Heart Association with emphasis on seven recommendations to decrease the risk of CVD: avoiding smoking, being physically active, eating healthy, and keeping normal blood pressure, body weight, glucose, and cholesterol levels. [8] [9]

On the other hand, non-modifiable factors such as family history, age, and gender have different implications. [4] [7] Family history, particularly premature atherosclerotic disease, defined as CVD or death from CVD in a first-degree relative before 55 years (in males) or 65 years (in females), is considered an independent risk factor. [10] There is also suggestive evidence that CVD risk factors may affect men and women differently. [4] [7] For instance, diabetes and smoking more than 20 cigarettes per day conferred a greater CVD risk in women than in men. [11] The prevalence of CVD increases significantly with each decade of life. [12]
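The definition of premature family history given above can be written as a small check. This is only an illustrative sketch, not a clinical tool: the `Relative` class, its field names, and the function name are hypothetical, and only the age cutoffs (55 years for males, 65 years for females) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Age cutoffs from the text: CVD (or death from CVD) in a first-degree
# relative before 55 years (males) or before 65 years (females).
PREMATURE_AGE_CUTOFF = {"male": 55, "female": 65}


@dataclass
class Relative:
    """Hypothetical record for one family member (illustration only)."""
    sex: str                          # "male" or "female"
    first_degree: bool                # parent, sibling, or child
    age_at_cvd_event: Optional[int]   # None if no CVD event is known


def has_premature_family_history(relatives: List[Relative]) -> bool:
    """Return True if any first-degree relative had CVD before the cutoff age."""
    for relative in relatives:
        if not relative.first_degree or relative.age_at_cvd_event is None:
            continue
        if relative.age_at_cvd_event < PREMATURE_AGE_CUTOFF[relative.sex]:
            return True
    return False
```

For example, a father with an MI at age 52 satisfies the definition, whereas a mother with an event at age 66 or an uncle with an event at age 40 does not.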

The presence of HIV (human immunodeficiency virus), [13] a history of mediastinal or chest wall radiation, [14] microalbuminuria, [15] and increased inflammatory markers [16] [17] have also been associated with an increased rate and incidence of CVD.

The relation of specific dietary factors, such as meat, fiber, and coffee consumption, to CVD remains controversial because of the significant bias and residual confounding encountered in epidemiological studies. [18] [19]

  • Epidemiology

Cardiovascular disease (CVD) has remained among the 2 leading causes of death in the United States since 1975. In 2015, heart disease was the leading cause of death, with 633,842 deaths (1 in every 4 deaths), followed by cancer with 595,930 deaths. [2] CVD is also the number 1 cause of death globally, with an estimated 17.7 million deaths in 2015, according to the World Health Organization (WHO). The burden of CVD extends further, as it is considered the most costly disease, even ahead of Alzheimer disease and diabetes, with calculated indirect costs of $237 billion per year and a projected increase to $368 billion by 2035. [20]

Although the age-adjusted rate and acute mortality from MI have been declining over time, reflecting the progress in diagnosis and treatment during the last couple of decades, the risk of heart disease remains high with a calculated 50% risk by age 45 in the general population. [7] [21]  The incidence significantly increases with age with some variations between genders as the incidence is higher in men at younger ages. [2]  The difference in incidence narrows progressively in the post-menopausal state. [2]

  • Pathophysiology

Atherosclerosis is the pathogenic process in the arteries and the aorta that can potentially cause disease as a consequence of decreased or absent blood flow from stenosis of the blood vessels. [22]

It involves multiple factors, including dyslipidemia, immunologic phenomena, inflammation, and endothelial dysfunction. These factors are believed to trigger the formation of the fatty streak, which is the hallmark in the development of the atherosclerotic plaque [23]; this is a progressive process that may begin as early as childhood. [24] The process comprises intimal thickening with subsequent accumulation of lipid-laden macrophages (foam cells) and extracellular matrix, followed by aggregation and proliferation of smooth muscle cells, constituting the formation of the atheroma plaque. [25] As these lesions continue to expand, apoptosis of the deep layers can occur, precipitating further macrophage recruitment; the lesions can become calcified and transition to atherosclerotic plaques. [26]

Other mechanisms, like arterial remodeling and intra-plaque hemorrhage, play an important role in delaying or accelerating the progression of atherosclerotic CVD but are beyond the scope of this article. [27]

  • History and Physical

The clinical presentation of cardiovascular diseases can range from asymptomatic (e.g., silent ischemia, angiographic evidence of coronary artery disease without symptoms, among others) to classic presentations, as when patients present with typical anginal chest pain consistent with myocardial infarction or with the sudden-onset focal neurological deficits of an acute CVA. [28] [29]

Historically, coronary artery disease typically presents with angina, a substernal pain described as crushing or pressure-like that may radiate to the medial aspect of the left upper extremity, the neck, or the jaw and that can be associated with nausea, vomiting, palpitations, diaphoresis, syncope, or even sudden death. [30] Physicians and other health care providers should be aware of possible variations in symptom presentation and maintain a high index of suspicion despite an atypical presentation (for example, dizziness and nausea as the only presenting symptoms in patients having an acute MI [31]), particularly in people with a known history of CAD/MI and in those with CVD risk factors. [32] [33] [34] Additional chest pain features suggestive of an ischemic etiology are exacerbation with exercise or activity and resolution with rest or nitroglycerin. [35]

Neurologic deficits are the hallmark of cerebrovascular disease, including TIA and stroke, where the key differentiating factor is the resolution of symptoms within 24 hours in patients with TIA. [36] Although the specific symptoms depend on the affected area of the brain, the sudden onset of extremity weakness, dysarthria, and facial droop are among the most commonly reported symptoms that raise concern for a diagnosis of stroke. [37] [38] Ataxia, nystagmus, and other subtle symptoms such as dizziness, headache, syncope, nausea, or vomiting are among the most commonly reported symptoms in people with posterior circulation strokes; they are challenging to correlate with stroke and require a high index of suspicion in patients with risk factors. [39]

Patients with PAD may present with claudication of the limbs, described as a cramp-like muscle pain precipitated by increased blood flow demand during exercise that typically subsides with rest. [40] Severe PAD might present with color changes of the skin and changes in temperature. [41]  

Most patients with thoracic aortic aneurysm are asymptomatic, but symptoms can develop as the aneurysm progresses, ranging from subtle symptoms caused by compression of surrounding tissues, such as cough, shortness of breath, or dysphonia, to the acute presentation of sudden crushing chest or back pain due to acute rupture. [42] The same is true for abdominal aortic aneurysms (AAA), which range from causing no symptoms in the early stages to the acute presentation of sudden-onset abdominal pain or syncope from acute rupture. [43]

A thorough physical examination is paramount for the diagnosis of CVD. It starts with a general inspection to look for signs of distress, as in patients with angina or decompensated heart failure, or for chronic skin changes from PAD. Examination of the neck with the patient in the supine position and the back at 30 degrees, including palpation and auscultation of the carotid pulses for bruits and evaluation of jugular venous pulsations, is essential. Precordial examination, starting with inspection and followed by palpation for chest wall tenderness, thrills, and identification of the point of maximal impulse, should then be performed before auscultating the precordium. Auscultation of heart sounds starts in the aortic area with the identification of the S1 and S2 sounds, followed by characterization of murmurs if present. Paying attention to changes with inspiration and maneuvers is encouraged to characterize heart murmurs correctly. Palpation of peripheral pulses, with bilateral examination and comparison when applicable, is an integral part of the CVD examination. [44]

A thorough clinical history and physical exam, directed but not limited to the cardiovascular system, are the hallmarks of the diagnosis of CVD. Specifically, a history compatible with obesity, angina, decreased exercise tolerance, orthopnea, paroxysmal nocturnal dyspnea, syncope or presyncope, and claudication should prompt the clinician to obtain a more detailed history and physical exam and, if pertinent, ancillary diagnostic tests according to the clinical scenario (e.g., electrocardiogram and cardiac enzymes for patients presenting with chest pain).

Besides a diagnosis prompted by clinical suspicion, most efforts should be oriented toward primary prevention by identifying people with risk factors and treating modifiable risk factors by all available means. All patients, starting at age 20, should be engaged in a discussion of CVD risk factors and lipid measurement. [9] Several calculators use LDL-cholesterol and HDL-cholesterol levels and the presence of other risk factors to compute a 10-year or 30-year CVD risk score, which helps determine whether additional therapies such as statins and aspirin are indicated for primary prevention; these are generally indicated if the risk is more than 10%. [10] Like other risk assessment tools, these calculators have limitations, and caution is recommended when assessing patients with diabetes or familial hypercholesterolemia, as their risk can be underestimated. Another limitation is that people older than 79 were usually excluded from the cohorts in which these calculators were derived; an individualized approach is recommended for this population, discussing the risks and benefits of adjunctive therapies with particular consideration of life expectancy. Some experts recommend reassessing CVD risk every 4 to 6 years. [9]
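The reasoning in the paragraph above can be summarized as a short sketch. It does not compute a risk score: the 10-year risk is assumed to come from an external calculator and is passed in as a fraction. The function name, parameter names, and note wording are hypothetical; only the thresholds (age 20, risk above 10%, age above 79) and the caveats for diabetes and familial hypercholesterolemia are taken from the text. It is an illustration of the logic, not clinical guidance.

```python
from typing import List


def primary_prevention_notes(
    age: int,
    ten_year_risk: float,
    diabetes: bool = False,
    familial_hypercholesterolemia: bool = False,
) -> List[str]:
    """Collect discussion points for a precomputed 10-year CVD risk (0.0 to 1.0)."""
    notes = []
    # The text recommends discussing risk factors and lipids from age 20.
    if age >= 20:
        notes.append("discuss CVD risk factors and lipid measurement")
    # Statins and aspirin are described as generally indicated above 10% risk.
    if ten_year_risk > 0.10:
        notes.append("10-year risk above 10%: statin and aspirin generally considered")
    # Calculators may underestimate risk in these groups.
    if diabetes or familial_hypercholesterolemia:
        notes.append("calculated risk may be underestimated")
    # People older than 79 were usually excluded from the derivation cohorts.
    if age > 79:
        notes.append("older than 79: individualize, weighing life expectancy")
    return notes
```

Returning a list of notes rather than a single yes/no answer mirrors the text, which frames the calculators as one input to an individualized discussion.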

Preventative measures like following healthy food habits, avoiding overweight and following an active lifestyle are pertinent in all patients, particularly for people with non-modifiable risk factors such as family history of premature CHD or post-menopause. [9] [8]

Inflammatory markers and other risk assessment methods, such as the coronary artery calcification (CAC) score, are under research and have limited applications, so their use should not replace the identification of people with known risk factors. Nonetheless, these resources remain promising tools for the future of primary prevention by detecting people with subclinical atherosclerosis at risk for CVD. [45]

  • Treatment / Management

Management of CVD is very extensive depending on the clinical situation (catheter-directed thrombolysis for acute ischemic stroke, angioplasty for peripheral vascular disease, coronary stenting for CHD); however, patients with known CVD should be strongly educated on the need for secondary prevention by risk factor and lifestyle modification. [9] [46]

  • Differential Diagnosis

  • Acute pericarditis
  • Angina pectoris
  • Atherosclerosis
  • Coronary artery vasospasm
  • Dilated cardiomyopathy
  • Giant cell arteritis
  • Hypertension
  • Hypertensive heart disease
  • Kawasaki disease
  • Myocarditis

  • Prognosis

The prognosis and burden of CVD are discussed in other sections.

  • Complications

The most feared complication of CVD is death and, as explained above, despite multiple discoveries in recent decades, CVD remains among the leading causes of death worldwide owing to its alarming prevalence in the population. [2] Other complications, such as the need for longer hospitalizations, physical disability, and increased costs of care, are significant and are a focus for health-care policymakers, as they are expected to continue to increase in the coming decades. [20]

For people with heart failure with reduced ejection fraction (HFrEF) of less than 35%, the risk of life-threatening arrhythmias is exceedingly high. Current guidelines therefore recommend an implantable cardioverter-defibrillator (ICD) for those with symptoms equivalent to New York Heart Association (NYHA) Class II-IV despite maximally tolerated medical therapy. [47]

Strokes can leave people with severely disabling sequelae such as dysarthria or aphasia, dysphagia, and focal or generalized muscle weakness or paresis. These deficits can be temporary or can cause permanent physical disability, which may lead to a completely bedbound state due to hemiplegia, with added complications secondary to immobility, such as a higher risk of urinary tract infections and thromboembolic events. [48] [49]

There is an increased risk of all-cause death for people with PAD compared to those without evidence of peripheral disease. [50]  Chronic wounds, physical limitation, and limb ischemia are among other complications from PAD. [51]

  • Consultations

An interprofessional approach that involves primary care doctors, nurses, dietitians, cardiologists, neurologists, and other specialists is likely to improve outcomes. This approach has been shown to be beneficial in patients with heart failure [52] and coronary disease, [53] and investigations to assess its impact on other forms of CVD are being planned and promise encouraging results.

  • Deterrence and Patient Education

Efforts should be directed toward primary prevention by leading a healthy lifestyle and following an appropriate diet, starting as early as possible, with the goal of delaying or avoiding the initiation of atherosclerosis as it relates to the future risk of CVD. The AHA developed the concept of "ideal cardiovascular health," defined by the presence of [8]:

  • Ideal health behaviors: Nonsmoking, body mass index less than 25 kg/m², physical activity at goal levels, and the pursuit of a diet consistent with current guideline recommendations
  • Ideal health factors: Untreated total cholesterol less than 200 mg/dL, untreated blood pressure less than 120/80 mm Hg, and fasting blood glucose less than 100 mg/dL. The goal is to improve the health of all Americans, with an expected 20% decrease in deaths from CVD.
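The criteria listed above can be expressed as a simple checklist. The following is a minimal sketch under stated assumptions: the `Profile` class, its field names, and the two function names are hypothetical, physical activity and diet "at goal" are reduced to yes/no flags because the text gives no numeric targets, and only the numeric thresholds come from the list. It is not a validated scoring tool.

```python
from dataclasses import dataclass
from typing import Dict


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


@dataclass
class Profile:
    """Hypothetical patient record (field names are illustrative)."""
    smoker: bool
    weight_kg: float
    height_m: float
    activity_at_goal: bool
    diet_at_goal: bool
    total_cholesterol_mg_dl: float
    on_lipid_treatment: bool
    systolic_mm_hg: float
    diastolic_mm_hg: float
    on_bp_treatment: bool
    fasting_glucose_mg_dl: float


def ideal_health_checks(p: Profile) -> Dict[str, bool]:
    """Evaluate each criterion from the list above separately."""
    return {
        "nonsmoking": not p.smoker,
        "bmi_below_25": body_mass_index(p.weight_kg, p.height_m) < 25,
        "activity_at_goal": p.activity_at_goal,
        "diet_at_goal": p.diet_at_goal,
        # "Untreated" values: the criterion fails if the value is reached on therapy.
        "cholesterol_below_200_untreated": (
            not p.on_lipid_treatment and p.total_cholesterol_mg_dl < 200
        ),
        "bp_below_120_80_untreated": (
            not p.on_bp_treatment
            and p.systolic_mm_hg < 120
            and p.diastolic_mm_hg < 80
        ),
        "fasting_glucose_below_100": p.fasting_glucose_mg_dl < 100,
    }


def meets_ideal_cardiovascular_health(p: Profile) -> bool:
    """True only when every behavior and factor criterion is satisfied."""
    return all(ideal_health_checks(p).values())
```

Keeping the per-criterion results in a dictionary makes it easy to see which specific behavior or factor falls short, which is usually more useful than the overall yes/no result.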

Specific attention should be paid to people at higher risk for CVD, such as people with diabetes, hypertension, or hyperlipidemia, smokers, and obese patients. Risk factor modification by controlling their medical conditions, avoiding smoking, taking appropriate measures to lose weight, and maintaining an active lifestyle is of extreme importance. [8] [9] [10] The recommendations on the use of statins and low-dose aspirin for primary and secondary prevention are discussed in other sections.

  • Pearls and Other Issues

Cardiovascular disease generally refers to 4 entities: coronary artery disease (CAD), cerebrovascular disease, peripheral artery disease (PAD), and aortic atherosclerosis.

CVD is the main cause of death globally.

Measures aimed at preventing the progression of atherosclerosis are the hallmark of primary prevention of CVD.

Risk factor and lifestyle modification are paramount in the prevention of CVD.

  • Enhancing Healthcare Team Outcomes

An interprofessional and patient-oriented approach can help to improve outcomes for people with cardiovascular disease as shown in patients with heart failure (HF) who had better outcomes when the interprofessional involvement of nurses, dietitians, pharmacists, and other health professionals was used (Class 1A). [52]

Similarly, positive results were obtained in people in an intervention group who were followed by an interprofessional team comprised of pharmacists, nurses and a team of different physicians. This group had a reduction in all-cause mortality associated with CAD by 76% compared to the control group. [53]  Healthcare workers should educate the public on lifestyle changes and reduce the modifiable risk factors for heart disease to a minimum.


Atherosclerosis as a result of coronary heart disease. Contributed by National Heart, Lung and Blood Institute (NIH)

Coronary Artery Disease Pathophysiology. Coronary artery disease is usually caused by an atherosclerotic plaque that blocks the lumen of a coronary artery, typically the left anterior descending artery. Contributed by S Bhimji, MD

Disclosure: Edgardo Olvera Lopez declares no relevant financial relationships with ineligible companies.

Disclosure: Brian Ballard declares no relevant financial relationships with ineligible companies.

Disclosure: Arif Jan declares no relevant financial relationships with ineligible companies.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ), which permits others to distribute the work, provided that the article is not altered or used commercially. You are not required to obtain permission to distribute this article, provided that you credit the author and journal.

  • Cite this Page Olvera Lopez E, Ballard BD, Jan A. Cardiovascular Disease. [Updated 2023 Aug 22]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.





COMMENTS

  1. Early and accurate detection and diagnosis of heart disease ...

    Heart disease is a fatal human disease, rapidly increases globally in both developed and undeveloped countries and consequently, causes death.

  2. Global Burden of Cardiovascular Diseases and Risk Factors, 1990-2019

    Cardiovascular diseases (CVDs), principally ischemic heart disease (IHD) and stroke, are the leading cause of global mortality and a major contributor to disability. This paper reviews the magnitude of total CVD burden, including 13 underlying causes ...

  3. A Systematic Review of Major Cardiovascular Risk Factors: A Growing

    This systematic review focuses on the prevalence of the potential risk factors of cardiovascular diseases, irrespective of age and gender, and its impacts on global public health. Furthermore, the study aims to draw attention to the need for health practitioners to ensure early interventions to prevent cardiovascular disease and its complications.

  4. 2024 Heart Disease and Stroke Statistics: A Report of US and Global

    BACKGROUND: The American Heart Association (AHA), in conjunction with the National Institutes of Health, annually reports the most up-to-date statistics related to heart disease, stroke, and cardiovascular risk factors, including core health behaviors (smoking, physical activity, nutrition, sleep, and obesity) and health factors (cholesterol, blood pressure, glucose control, and metabolic ...

  5. Heart Disease and Stroke Statistics—2022 Update: A Report From the

    Methods: The American Heart Association, through its Statistics Committee, continuously monitors and evaluates sources of data on heart disease and stroke in the United States to provide the most current information available in the annual Statistical Update. The 2022 Statistical Update is the product of a full year's worth of effort by dedicated volunteer clinicians and scientists ...

  6. Hot topics and trends in cardiovascular research

    Overarching, strongly growing topics in clinical and population sciences are evidence-based guidance for treatment, research on outcomes, prognosis, and risk factors. 'Hot' topics include novel treatments in valve disease and in coronary artery disease, and imaging.

  7. Machine learning prediction in cardiovascular diseases: a meta ...

    Controlled vocabulary, supplemented with keywords, was used to search for studies of ML algorithms and coronary heart disease, stroke, heart failure, and cardiac arrhythmias.

  8. A Brief Review of Cardiovascular Diseases, Associated Risk ...

    Learn about the causes, consequences and treatments of cardiovascular diseases, a global health challenge, from this comprehensive review article.

  9. Cardiovascular diseases

    Cardiovascular diseases articles from across Nature Portfolio. Cardiovascular diseases are pathological conditions affecting the heart and/or blood vessels, that is, the cardiovascular system.

  10. Cardiology

    V. Sanchorawala. N Engl J Med 2024;390:2295-2307. Amyloidosis, a systemic disease that manifests in various ways, should be in the differential diagnosis of unexplained proteinuria, restrictive ...

  11. of Cardiovascular Diseases and Risk Factors

    Cardiovascular diseases (CVDs), principally ischemic heart disease (IHD) and stroke, are the leading cause of global mortality and a major contributor to disability. This paper reviews the ...

  12. Coronary Heart Disease Research

    Heart disease, including coronary heart disease, remains the leading cause of death in the United States. However, the rate of heart disease deaths has declined by 70% over the past 50 years, thanks in part to NHLBI-funded research. Many current studies funded by the NHLBI focus on discovering genetic associations and finding new ways to ...

  13. Heart Disease and Stroke Statistics—2020 Update: A Report From the

    Methods: The American Heart Association, through its Statistics Committee, continuously monitors and evaluates sources of data on heart disease and stroke in the United States to provide the most current information available in the annual Statistical Update. The 2020 Statistical Update is the product of a full year's worth of effort by dedicated volunteer clinicians and scientists ...

  14. Heart Disease Prediction Using Machine Learning

    Cardiovascular disease refers to any critical condition that impacts the heart. Because heart diseases can be life-threatening, researchers are focusing on designing smart systems to accurately diagnose them based on electronic health data, with the aid of machine learning algorithms. This work presents several machine learning approaches for predicting heart diseases, using data of major ...

  15. Using Machine Learning for Heart Disease Prediction

    Our paper is part of the research on the detection and prediction of heart disease. It is based on the application of Machine Learning algorithms, of which we have chosen the 3 most used ...

  16. State of the Science: The Relevance of Symptoms in Cardiovascular

    Secondary aims include (1) describing symptom measurement methods in research and application in clinical practice and (2) describing the importance of cardiovascular disease symptoms in terms of clinical events and other patient-reported outcomes as applicable.

  17. Primary prevention of cardiovascular disease: A review of contemporary

    Cardiovascular disease (CVD) is an umbrella term for a number of linked pathologies, commonly defined as coronary heart disease (CHD), cerebrovascular disease, peripheral arterial disease, rheumatic and congenital heart diseases and venous thromboembolism. Globally CVD accounts for 31% of mortality, the majority of this in the form of CHD and ...

  18. Heart disease risk prediction using deep learning techniques with

    Cardiovascular diseases rank as one of the greatest risks of death for the general population. Late detection of heart diseases strongly affects patients' chances of survival. Age, sex, cholesterol level, sugar level, heart rate, among other factors, are known to have an influence on life-threatening heart problems, but, due to the high number of variables, it is often difficult for ...

  19. Coronary Artery Disease

    Genetics: Coronary Artery Disease. In the past decade, remarkable strides have been made in understanding the genetics of cardiovascular disease (CVD). McPherson [1] demonstrated that the genetic architecture of CAD is largely determined by the combined effects of multiple common genetic variants, each of which individually contributes little to disease risk. This is opposed to rare genetic variants ...

  20. Cardiovascular Research Topics

    Preventive Cardiology; Stem Cell and Regenerative Biology; Women and Heart Disease; Breakthrough Discoveries. Core Lab: The Johns Hopkins Core Lab provides access to Small Animal Cardiovascular Phenotyping and Model Core.

  21. COVID-19: Long-term effects

    Most people who get coronavirus disease 2019 (COVID-19) recover within a few weeks. ... Research suggests that between one month and one year after having COVID-19, ... The effects also could lead to the development of new conditions, such as diabetes or a heart or nervous system condition.

  22. Exploring the predictive factors of heart disease using rare ...

    The World Health Organization (WHO) identifies cardiovascular diseases (CVDs) as the foremost cause of mortality globally, presenting a significant global health challenge [1]. This issue extends ...

  23. Machine Learning Technology-Based Heart Disease Detection Models

    The present survey paper gives an overview of different machine learning-based heart disease detection methods. This research can be updated in the future by adding more attributes to the heart disease dataset and making it more interactive for the users.

  24. Contemporary Diagnosis and Management of Rheumatic Heart Disease

    The global burden of rheumatic heart disease continues to be significant although it is largely limited to poor and marginalized populations. In most endemic regions, affected patients present with heart failure. This statement will seek to examine the current state-of-the-art recommendations and to identify gaps in diagnosis and treatment globally that can inform strategies for reducing ...

  25. Exposed: FDA fails to police food additives properly thanks to loophole

    In a shocking revelation, new research exposes an alleged gaping hole in the Food and Drug Administration's (FDA) oversight of food ingredients.

  26. Cardiovascular Disease

    The cardiovascular system consists of the heart and blood vessels.[1] A wide array of problems may arise within the cardiovascular system, for example, endocarditis, rheumatic heart disease, and abnormalities in the conduction system, among others. Cardiovascular disease (CVD) or heart disease refers to the following 4 entities that are the focus of this article[2]:

  27. Health Economics of Cardiovascular Disease in the United States

    Cardiovascular disease (CVD), including coronary artery disease, heart failure, arrhythmias, and stroke, is the leading cause of morbidity and mortality in the United States, and accounts for >$400 billion per year in direct medical spending and indirect costs, such as lost productivity. 1 The ...