Using Artificial Intelligence in Predicting Ischemic Stroke Events After Percutaneous Coronary Intervention
Abstract
BACKGROUND. Ischemic stroke (IS) is an uncommon but severe complication in patients undergoing percutaneous coronary intervention (PCI). Despite the significant morbidity and economic cost associated with post-PCI IS, a validated risk prediction model is not currently available. AIMS. We aim to develop a machine learning model that predicts IS after PCI. METHODS. We analyzed data from the Mayo Clinic CathPCI registry from 2003 to 2018. Baseline clinical and demographic data, electrocardiography (ECG), intra/post-procedural data, and echocardiographic variables were abstracted. A random forest (RF) machine learning model and a logistic regression (LR) model were developed. Receiver operating characteristic (ROC) analysis was used to assess model performance in predicting IS at 6 months and at 1, 2, and 5 years post-PCI. RESULTS. A total of 17,356 patients were included in the final analysis. The mean age of this cohort was 66.9 ± 12.5 years, and 70.7% were male. Post-PCI IS was noted in 109 patients (.6%) at 6 months, 132 patients (.8%) at 1 year, 175 patients (1%) at 2 years, and 264 patients (1.5%) at 5 years. The area under the curve of the RF model was superior to that of the LR model in predicting ischemic stroke at 6 months and at 1, 2, and 5 years. Periprocedural stroke was the strongest predictor of IS post discharge. CONCLUSIONS. The RF model accurately predicts short- and long-term risk of IS and outperforms logistic regression analysis in patients undergoing PCI. Patients with periprocedural stroke may benefit from aggressive management to reduce the future risk of IS.
J INVASIVE CARDIOL 2023;35(6):E297-E311. doi: 10.25270/jic/23.00045
Key words: percutaneous coronary intervention (PCI), ischemic stroke, machine learning, outcome.
Introduction
Percutaneous coronary intervention (PCI) is the most commonly utilized therapeutic procedure for patients with coronary artery disease (CAD). Several risk-prediction models have been developed to predict short-term and long-term mortality post-PCI.1-4 Ischemic stroke (IS), whether of early or late onset after PCI, is a relatively uncommon but severe complication in this population.5,6 The incidence of peri-procedural IS in the United States has increased, primarily driven by the complexity of patients undergoing PCI.4 After PCI, the incidence of IS events in the chronic phase has been reported in the range of 1%-11%, but the associated risk factors have not been as well investigated.7-11 Despite the substantial morbidity and economic burden associated with IS after PCI,4,12,13 a valid risk prediction model for subsequent post-PCI IS events has not yet been developed.
While many risk prediction models based on conventional logistic regression were able to successfully guide clinical decision-making,14,15 these models are largely based on the assumption that linear correlations exist between different variables. Recent advances in the realms of artificial intelligence and machine learning have generated significant interest in the application of these techniques in clinical medicine. Unlike most traditional statistical methods, machine learning can adapt to conditional probabilities, where one variable is dependent on the other, and be applied to non-linear relationships. Most human biological systems demonstrate conditional, interdependent components, and non-linear relationships, which explains machine learning’s superior performance in predicting outcomes in medicine. In the field of cardiovascular diseases, machine learning techniques have been used to improve the accuracy of diagnostic and prognostic models.16-18
In this study, we aimed to develop and validate a novel machine learning model to predict the short-term and long-term risk of ischemic stroke in a large patient population that underwent a PCI procedure. We hypothesized that a machine learning-based model is superior to a conventional logistic regression model in predicting the risk of subsequent ischemic stroke in patients treated with PCI.
Methods
Study population and demographic/baseline variables. Retrospective data from the Mayo Clinic Health System across 4 sites (La Crosse, Wisconsin; Mankato, Minnesota; Rochester, Minnesota; Scottsdale, Arizona) were obtained for 21872 patients who underwent percutaneous coronary intervention between January 2006 and December 2018. Patients lost to follow-up or without an echocardiogram within 30 days of the PCI were excluded. The final cohort included 17356 patients. One hundred sixty-four variables encompassing clinical and patient demographics were recorded for each patient (Supplemental Table 1). The Institutional Review Board at Mayo Clinic approved the study protocol, and all patients provided research authorization for the use of their medical information.
Echocardiography variables. A total of 204 transthoracic echocardiogram (TTE) variables were collected for the study (Supplemental Table 2). The measurement of TTE variables was based on the current American Society of Echocardiography (ASE) guidelines.19 The calculation of selected important TTE variables is summarized below:
The left ventricular ejection fraction (LVEF) was derived using Simpson’s biplane method of disks. Mild, moderate, and severe regurgitation across the mitral, tricuspid, and aortic valves was assessed using standard diagnostic criteria. The right ventricular systolic pressure (RVSP) was derived using a simplified Bernoulli equation: 4V² + right atrial pressure, where V is the peak velocity of the tricuspid regurgitation (TR) jet. Right atrial pressure was estimated from the diameter of the inferior vena cava and its respirophasic changes. Left ventricular outflow tract (LVOT) diameter was measured in systole in the parasternal long-axis view between the bases of the aortic valve cusps. Cardiac output was measured using the formula: (LVOT diameter² × .785 × LVOT velocity time integral) in L/min.
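The two formulas above can be illustrated with a short worked example. This is a minimal sketch, not the authors' code: the function names and input values are hypothetical. Note that the product LVOT diameter² × .785 × VTI as written is a per-beat volume; the multiplication by heart rate to reach L/min is the standard convention and is an assumption here, since the text does not state that step.

```python
def rvsp_mmhg(tr_peak_velocity_m_s, right_atrial_pressure_mmhg):
    """Simplified Bernoulli estimate: 4 * V^2 + right atrial pressure."""
    return 4.0 * tr_peak_velocity_m_s ** 2 + right_atrial_pressure_mmhg


def lvot_stroke_volume_ml(lvot_diameter_cm, lvot_vti_cm):
    """LVOT cross-sectional area (0.785 * d^2) times velocity time integral."""
    return 0.785 * lvot_diameter_cm ** 2 * lvot_vti_cm


def cardiac_output_l_min(lvot_diameter_cm, lvot_vti_cm, heart_rate_bpm):
    """Assumption: per-beat volume times heart rate, converted from mL to L."""
    stroke_volume = lvot_stroke_volume_ml(lvot_diameter_cm, lvot_vti_cm)
    return stroke_volume * heart_rate_bpm / 1000.0


# Hypothetical inputs chosen only for illustration
rvsp = rvsp_mmhg(tr_peak_velocity_m_s=3.0, right_atrial_pressure_mmhg=10.0)
sv = lvot_stroke_volume_ml(lvot_diameter_cm=2.0, lvot_vti_cm=20.0)
co = cardiac_output_l_min(lvot_diameter_cm=2.0, lvot_vti_cm=20.0, heart_rate_bpm=70)
print(f"RVSP {rvsp:.1f} mmHg, stroke volume {sv:.1f} mL, cardiac output {co:.2f} L/min")
```

With these illustrative inputs, a TR jet of 3.0 m/s and a right atrial pressure of 10 mmHg give an RVSP of 46 mmHg.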
Primary clinical endpoints. The primary clinical endpoints were incidents of ischemic stroke at 6-months, 1-, 2-, and 5-years post-PCI. New ischemic stroke events were defined as events that occurred after the index hospitalization/visit of PCI procedure.
Model variables. The model was generated on 368 predictor variables, including 204 TTE variables and 164 baseline demographic and clinical variables (Supplemental Tables 1 and 2). Each of the 368 predictor variables was fed into the model for consideration. Missing data were handled internally by the random forest algorithm. The final model included data from all predictor variables, but final discrimination did not depend on all 368 variables.
Model generation. The statistical analyses were performed using several R packages under R (v3.5.1). First, the data were summarized with the package dlookr (v0.3.0), which reports the summary statistics and missing percentage of each variable. A random forest machine learning algorithm implemented in R was used to generate the risk prediction model. Model hyperparameters, the tunable settings that control the model’s learning process, were defined before the model was fitted. The study population was randomly divided into a training cohort (80%) and a validation cohort (20%). For each of the 4 outcomes, the random forest machine learning algorithm was developed using 5-fold cross-validation repeated 10 times. The optimal value for the hyperparameters was defined as the combination of values that resulted in the highest area under the receiver operating characteristic curve (AUC). The AUC is a numeric measure of how well the model can discriminate between patients with events and those without events. The model then averaged the predicted probabilities across all 10 repetitions for each patient. The average of these repetitions was then used to compute the final AUC. The process of model development is shown in Figure 1.
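The resampling scheme described above (5-fold cross-validation repeated 10 times, with per-patient predictions averaged across repetitions before the AUC is computed) can be sketched as follows. This is an illustrative sketch in Python rather than the R code used in the study. The cohort is synthetic, `toy_model` is a hypothetical stand-in for the random forest, and the fold stratification, the 80%/20% split, and the hyperparameter search are either assumed or omitted.

```python
import math
import random
from statistics import mean


def auc(labels, scores):
    """Rank-based AUC: chance that a random event case outscores a random
    non-event case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split indices into k folds with a similar event rate in each
    (stratification is an assumption; the text does not specify it)."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv(X, y, fit_predict, k=5, repeats=10, seed=0):
    """Collect out-of-fold predictions per repetition, average them per
    patient, and compute one AUC on the averaged predictions."""
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(repeats):
        preds = [0.0] * len(y)
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            fold_preds = fit_predict([X[i] for i in train],
                                     [y[i] for i in train],
                                     [X[i] for i in fold])
            for i, p in zip(fold, fold_preds):
                preds[i] = p
        per_repeat.append(preds)
    averaged = [mean(col) for col in zip(*per_repeat)]
    return auc(y, averaged), averaged


def toy_model(X_train, y_train, X_test):
    """Hypothetical stand-in for the random forest: a logistic score
    centered between the class means of a single feature."""
    m1 = mean(x[0] for x, lab in zip(X_train, y_train) if lab == 1)
    m0 = mean(x[0] for x, lab in zip(X_train, y_train) if lab == 0)
    mid, slope = (m0 + m1) / 2, (m1 - m0)
    return [1 / (1 + math.exp(-slope * (x[0] - mid))) for x in X_test]


# Synthetic cohort (not study data): 20 events, 180 non-events, one feature
data_rng = random.Random(1)
y = [1] * 20 + [0] * 180
X = [[data_rng.gauss(2.0 if lab else 0.0, 1.0)] for lab in y]
cv_auc, avg_prob = repeated_cv(X, y, toy_model)
print(f"cross-validated AUC on synthetic data: {cv_auc:.3f}")
```

In a tuning loop, `repeated_cv` would be called once per candidate hyperparameter combination and the combination with the highest returned AUC retained.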
Variable importance for each variable is determined by calculating its relative influence in the model. The variable importance plot takes the top-ranked variables and shows which have the most influence. A partial dependence analysis was performed for the top-ranked variables to determine how each variable contributed to the model prediction.
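A partial dependence curve of the kind described above can be sketched as follows: one variable is set to a fixed value for every patient, the model predictions are averaged, and the process is repeated over a grid of values. This is a minimal illustration, not the study's implementation. The risk function `toy_risk`, its coefficients, and the patient rows are hypothetical and carry no clinical meaning.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """For each grid value, overwrite `feature` in every row and average
    the model's predictions over the whole dataset."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        curve.append((value, mean(predict(r) for r in modified)))
    return curve


def toy_risk(row):
    """Hypothetical fitted model: higher risk with low hemoglobin and
    with age above 60 (coefficients invented for illustration)."""
    risk = 0.01
    if row["hemoglobin"] < 12:
        risk += 0.02
    risk += 0.001 * max(0, row["age"] - 60)
    return risk


# Made-up patients, not study data
patients = [
    {"age": 50, "hemoglobin": 13.5},
    {"age": 70, "hemoglobin": 11.0},
    {"age": 80, "hemoglobin": 14.2},
]
curve = partial_dependence(toy_risk, patients, "hemoglobin", [10, 12, 14])
for value, avg_risk in curve:
    print(f"hemoglobin={value}: mean predicted risk {avg_risk:.3f}")
```

In this toy setting the curve steps down once hemoglobin reaches 12, which is the kind of non-linear pattern a partial dependence plot is meant to expose.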
Model comparison. As there is no available risk-prediction model for post-PCI ischemic stroke, the machine learning (ML) model was compared to a logistic regression (LR) model based upon the known clinical risk factors for stroke (Supplemental Table 3). The receiver operating characteristic (ROC) curve of the LR model was compared with that of the ML model. The AUCs of both the LR model and the RF model were tested in pre-specified subgroups, which included age >75 years, female sex, STEMI, cardiogenic shock, NSTEMI, diabetes mellitus, peripheral arterial disease, GFR <60, BMI >35 kg/m², radial access, and femoral access.
Statistical computing. All statistical analyses were performed in R (Version 3.5.1, 7/2/2018). Categorical and ordinal variables were compared with Pearson’s χ² test and expressed as counts and percentages. Continuous variables were compared with the Student t-test and expressed as mean ± standard deviation (SD). Values of P<.05 were considered statistically significant.12
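The subgroup comparison described above amounts to computing each model's AUC on the subset of patients who meet a criterion. The sketch below shows this for one subgroup (age >75 years). All patient records and model scores are invented for illustration and are not study results. The field names `rf` and `lr` are hypothetical labels for the two models' predicted probabilities.

```python
def auc(labels, scores):
    """Rank-based AUC with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auc(patients, in_subgroup, score_key):
    """AUC of one model's scores restricted to patients in the subgroup."""
    subset = [p for p in patients if in_subgroup(p)]
    return auc([p["stroke"] for p in subset], [p[score_key] for p in subset])


# Invented records: outcome flag plus each model's predicted probability
patients = [
    {"age": 80, "stroke": 1, "rf": 0.90, "lr": 0.25},
    {"age": 78, "stroke": 0, "rf": 0.20, "lr": 0.50},
    {"age": 82, "stroke": 0, "rf": 0.30, "lr": 0.30},
    {"age": 60, "stroke": 1, "rf": 0.60, "lr": 0.70},
    {"age": 55, "stroke": 0, "rf": 0.10, "lr": 0.20},
    {"age": 79, "stroke": 1, "rf": 0.25, "lr": 0.60},
]


def over_75(p):
    return p["age"] > 75


rf_auc = subgroup_auc(patients, over_75, "rf")
lr_auc = subgroup_auc(patients, over_75, "lr")
print(f"age >75: RF AUC {rf_auc:.2f}, LR AUC {lr_auc:.2f}")
```

The sketch reports point estimates only. A formal comparison of two AUCs needs a statistical test, and the text does not state which one was used.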
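As an illustration of the categorical comparison described above, Pearson's chi-square statistic for a contingency table can be computed as shown below. This is a sketch on invented counts, not a reproduction of any analysis in the study. It applies no continuity correction, and the closed-form p-value used here is valid only for a 2 × 2 table (1 degree of freedom).

```python
import math


def pearson_chi_square(table):
    """Sum over cells of (observed - expected)^2 / expected, where the
    expected count is row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Invented 2 x 2 table: rows are two groups, columns are event / no event
table = [[10, 90],
         [20, 80]]
chi2 = pearson_chi_square(table)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.3f}, P = {p:.3f}")
```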
Results
Patient demographics and TTE measurements. Retrospective data were collected on patients who were treated with PCI between August 1996 and December 2017. Procedures were performed across 4 sites within the Mayo Clinic Health System (Rochester, La Crosse, Mankato, and Phoenix) and the data were recorded in an inter-site database. The database originally returned 21872 patients who underwent PCI during this time frame. A total of 17356 patients were included in the final analysis after applying the exclusion criteria. The mean age of this cohort was 66.9 ± 12.5 years, 70.7% were male (n=12271), and 19.7% (n=3423) had ST-elevation MI. In terms of past medical history, 25.5% (n=4424) had prior MI, 24.4% (n=4233) had prior PCI, 17.0% (n=2959) had prior CABG, 13.9% (n=1936) had prior cerebrovascular disease, 77.24% (n=13406) had hypertension, 29.5% (n=5122) were diabetic, and 19.63% (n=3407) were current smokers. Table 1 includes a summary of selected baseline variables. Detailed patient characteristic data are available in Supplemental Table 1 and detailed TTE data are summarized in Supplemental Table 2.
Primary clinical outcomes. Subsequent ischemic stroke occurred in 109 patients (.6%) at 6 months, 132 patients (.8%) at 1 year, 175 patients (1.0%) at 2 years, and 264 patients (1.5%) at 5 years. Supplemental Table 3 summarizes the odds ratio and the 95% confidence interval of selected variables included in the LR model in predicting IS events at the follow-up timepoints.
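The reported percentages can be checked directly from the event counts. The short calculation below assumes that the denominator at every timepoint is the full analysis cohort of 17356 patients, which is consistent with the rounded figures in the text.

```python
cohort = 17356    # patients in the final analysis
screened = 21872  # patients originally returned by the database

# Cumulative ischemic stroke counts at each follow-up timepoint
events = {"6 months": 109, "1 year": 132, "2 years": 175, "5 years": 264}

excluded = screened - cohort
rates_pct = {k: round(100 * n / cohort, 2) for k, n in events.items()}

print(f"excluded: {excluded}")
for timepoint, rate in rates_pct.items():
    print(f"{timepoint}: {rate}%")
```

To one decimal place these values round to 0.6%, 0.8%, 1.0%, and 1.5%, matching the figures reported above.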
Model variable weighting. The top 10 variables with most contributions in the RF model and the corresponding importance score for 4 primary clinical outcomes of interest are plotted in Figure 2. The leading variable in predicting IS outcome at 6-months, 1-, 2-, and 5-years was history of periprocedural ischemic stroke. Following in-hospital ischemic stroke, in-hospital hemorrhagic stroke is the second most important variable, except in predicting 5-year IS outcome. In addition to in-hospital IS, pre-procedural hemoglobin level, baseline QTc interval, troponin level, body mass index, systolic blood pressure, age, and glomerular filtration rate (GFR) also had significant weight in the model. The partial dependence analysis showed that lower QTc, lower hemoglobin level, higher troponin level, higher systolic blood pressure, higher body mass index, lower GFR, lower heart rate, higher creatinine level, longer QRS duration, higher age, and higher diastolic blood pressure were associated with prediction of an IS event.
Model comparison. The AUCs of the random forest model and the logistic regression model at 6 months and at 1, 2, and 5 years are summarized in Table 2. Figure 3 shows a comparison of the RF model and the LR model ROC curves at the 4 timepoints. The random forest model significantly outperformed the logistic regression model at all 4 timepoints in predicting post-PCI IS. In all the pre-defined subgroups, the RF model outperformed the LR model in predicting all 4 primary outcomes of interest (Table 3).
Discussion
Regardless of whether it occurs during the periprocedural phase or in the chronic phase, stroke after PCI can be catastrophic and is associated with considerable morbidity and mortality.3,4,20 Compared to predicting IS events in the chronic phase, the risk factors for peri-PCI procedure stroke have been well established. Previous studies have established that older age, female sex, chronic kidney disease, history of prior CVA, and pre-existing vascular disease are associated with an increased risk of periprocedural stroke after PCI.4,20 However, many of these studies have utilized traditional logistic regression analysis.
In the current study, application of a random forest machine learning algorithm outperformed traditional logistic regression analysis in predicting future incidence of ischemic stroke at all recorded timepoints (6 months and 1, 2, and 5 years) post-PCI. The top 10 variables with the highest relative importance include periprocedural ischemic/hemorrhagic stroke, pre-procedural hemoglobin level, pre-procedural troponin level, baseline QTc interval, body mass index, systolic blood pressure, age, and glomerular filtration rate (GFR). In a subgroup analysis, we demonstrated superiority of the RF model across several domains, including patients with age >75 years, female sex, STEMI, cardiogenic shock, NSTEMI, diabetes mellitus, peripheral arterial disease, GFR <60 ml/min, BMI >35 kg/m², radial access, and femoral access.
To our knowledge, this is the first study utilizing machine learning algorithms to predict short- and long-term risk of ischemic stroke after PCI. Machine learning algorithms have a significant advantage over traditional regression analysis for risk prediction due to the discretization of variables, which allows the identification of non-linear and non-monotonic responses to variables, and thereby minimizes the tedious task of identifying these complex relationships for researchers. Across all subgroups, the RF model showed superior accuracy when compared to logistic regression. This may relate to the limited performance of traditional risk prediction models in the presence of low event rates.
We illustrated that patients with a history of periprocedural stroke are at a significantly higher risk of developing future ischemic stroke than patients without periprocedural stroke. Patients with prior strokes often have concomitant conditions such as diabetes, hypertension, and vascular disease. In addition, the presence of a significant burden of atherosclerosis in the ascending aorta and arch vessels places these patients at future risk of ischemic stroke from disruption of aortic plaque with diagnostic and guide catheters. A prior study by Keeley demonstrated that disruption and stripping of aortic plaque occurs in greater than 50% of percutaneous procedures.21
Furthermore, an additional potential mechanism for periprocedural stroke is procedural hypotension resulting in cerebral hypoperfusion in patients with significant vascular disease of the arch vessels. This is consistent with previous studies. Hoffman et al have previously reported a significant increase in the risk of periprocedural stroke in patients over the age of 80 with a prior cerebrovascular event.20
The remaining variables with high relative weight in the RF model included pre-procedural hemoglobin levels, serum troponin levels, body mass index, baseline QTc interval, systolic blood pressure, age, and glomerular filtration rate (GFR). Many of these variables have been previously shown to be associated with periprocedural stroke post-PCI.4-6,22 The significance of these variables in predicting short- and long-term risk of stroke demonstrates the prolonged residual risk of these clinical parameters in patients undergoing PCI. While many of these risks are non-modifiable, it is paramount to incorporate these variables into each patient’s assessment prior to PCI.
Appropriate resource allocation in the current environment of rising costs and increased push for efficiency in the healthcare system makes it paramount to identify patients at high risk of developing stroke, especially given the high cost of in-hospital and post-stroke care. Risk prediction models with high accuracy can assist us in identifying these high-risk patients and developing strategies to mitigate the risk. Some of these measures include close outpatient monitoring and initiation of aggressive preventative therapies.23,24
Machine learning algorithms are proven to improve risk prediction by integrating massive amounts of patient level parameters. This is currently being achieved by data collection from the electronic medical record (EMR), allocation of significant personnel, and implementation of institution-specific protocols to create databases containing patient level information to successfully execute these exhaustive algorithms. However, newer EMR systems are designing automated processes to streamline data collection and reduce the time and resource requirements to create and maintain clinical datasets. Machine learning algorithms can be applied to these datasets to identify outcomes of clinical relevance.
Limitations. The patient population of this study is solely from the Mayo Clinic Health System and as a consequence may be subject to selection bias. Machine learning algorithms were developed based on the unique patient population treated at our quaternary referral centers. Therefore, one should be cautious when generalizing our results to a different population. Our database did not include information on history of atrial fibrillation or serum albumin levels, both of which have been associated with ischemic stroke events. The machine learning models developed here cannot yet be easily applied at the bedside. However, integration of ML models with the EMR could improve penetration and usability of the models and help guide clinicians in clinical care. Our study is a primary feasibility study. Prospective external validation is required in the future to determine its generalizability to other patient populations. Additionally, a simplified model that does not compromise performance could potentially be developed to facilitate future clinical application.
Conclusion
The present study demonstrates that the performance of the random forest model based on machine learning algorithms is superior to the traditional regression-based risk prediction model in predicting short- and long-term risk of ischemic stroke. This was further noted in all the major subgroups of patients undergoing PCI. Future studies must focus on validating our models on large population-based datasets to demonstrate external validity. Incorporation of risk prediction models based on machine learning algorithms into current electronic medical records will improve the utilization and penetration of these models in guiding patient care leading to enhanced patient outcomes and reduction of health care resource utilization.
Impact on daily practice. The RF model accurately predicts short- and long-term risk of IS and outperforms logistic regression analysis in patients undergoing PCI. Patients with periprocedural stroke may benefit from aggressive management to reduce the future risk of IS.
Affiliations and Disclosures
*contributed equally.
From the 1Department of Cardiovascular Medicine, Mayo Clinic Arizona, Scottsdale, Arizona; 2Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, Minnesota; 3Department of Neurology, Mayo Clinic Rochester, Rochester, Minnesota; 4Columbia University, Division of Cardiology at Mount Sinai Medical Center, Miami Beach, Florida.
Disclosure: The authors have completed and returned the ICMJE Form for Disclosure of Potential Conflicts of Interest. The authors report no conflicts of interest regarding the content herein.
Manuscript accepted April 26, 2023.
Address for correspondence: Reza Arsanjani, MD, Associate Professor of Medicine, Director, Echocardiography Lab, Department of Cardiovascular Diseases, Mayo Clinic Arizona, Scottsdale, Arizona. Email: arsanjani.reza@mayo.edu
References
1. Peterson ED, Dai D, DeLong ER, et al. Contemporary mortality risk prediction for percutaneous coronary intervention: results from 588,398 procedures in the National Cardiovascular Data Registry. J Am Coll Cardiol. 2010;55(18):1923-1932. doi:10.1016/j.jacc.2010.02.005
2. Spoon DB, Psaltis PJ, Singh M, et al. Trends in cause of death after percutaneous coronary intervention. Circulation. 2014;129(13):1286-1294. doi:10.1161/circulationaha.113.006518
3. Brener SJ, Tarantini G, Leon MB, et al. Cardiovascular and noncardiovascular death after percutaneous coronary intervention: insights from 32,882 patients enrolled in 21 randomized trials. Circ Cardiovasc Interv. 2018;11(10):e006488. doi:10.1161/circinterventions.118.006488
4. Alkhouli M, Alqahtani F, Tarabishy A, Sandhu G, Rihal CS. Incidence, predictors, and outcomes of acute ischemic stroke following percutaneous coronary intervention. JACC Cardiovasc Interv. 2019;12(15):1497-1506. doi:10.1016/j.jcin.2019.04.015
5. Shoji S, Kohsaka S, Kumamaru H, et al. Stroke after percutaneous coronary intervention in the era of transradial intervention. Circ Cardiovasc Interv. 2018;11(7):e006761. doi:10.1161/circinterventions.118.006761
6. Fuchs S, Stabile E, Kinnaird TD, et al. Stroke complicating percutaneous coronary interventions: incidence, predictors, and prognostic implications. Circulation. 2002;106(1):86-91. doi:10.1161/01.cir.0000020678.16325.e0
7. Muraoka S, Somiya D, Ebata A, Kumagai Y, Koketsu N. Future stroke risk in the chronic phase of post-percutaneous coronary intervention. PLoS One. 2021;16:e0251253. doi:10.1371/journal.pone.0251253
8. Varenne O, Cook S, Sideris G, et al. Drug-eluting stents in elderly patients with coronary artery disease (SENIOR): a randomised single-blind trial. Lancet. 2018;391(10115):41-50. doi:10.1016/s0140-6736(17)32713-7
9. Natsuaki M, Morimoto T, Yamaji K, et al. Prediction of thrombotic and bleeding events after percutaneous coronary intervention: CREDO-Kyoto thrombotic and bleeding risk scores. J Am Heart Assoc. 2018;7. doi:10.1161/jaha.118.008708
10. Mäkikallio T, Holm NR, Lindsay M, et al. Percutaneous coronary angioplasty versus coronary artery bypass grafting in treatment of unprotected left main stenosis (NOBLE): a prospective, randomised, open-label, non-inferiority trial. Lancet. 2016;388(10061):2743-2752. doi:10.1016/s0140-6736(16)32052-9
11. Bønaa KH, Mannsverk J, Wiseth R, et al. Drug-eluting or bare-metal stents for coronary artery disease. N Engl J Med. 2016;375(13):1242-1252. doi:10.1056/nejmoa1607991
12. Rajsic S, Gothe H, Borba HH, et al. Economic burden of stroke: a systematic review on post-stroke care. Eur J Health Econ. 2018;20(1):107-134. doi:10.1007/s10198-018-0984-0
13. Carlo AD. Human and economic burden of stroke. Age Ageing. 2008;38(1):4-5. doi:10.1093/ageing/afn282
14. Lip GYH, Nieuwlaat R, Pisters R, Lane DA, Crijns HJGM. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach. Chest. 2010;137(2):263-272. doi:10.1378/chest.09-1584
15. Gage BF, Waterman AD, Shannon W, Boechler M, Rich MW, Radford MJ. Validation of clinical classification schemes for predicting stroke. JAMA. 2001;285(22):2864. doi:10.1001/jama.285.22.2864
16. Loghmanpour NA, Kormos RL, Kanwar MK, Teuteberg JJ, Murali S, Antaki JF. A Bayesian model to predict right ventricular failure following left ventricular assist device therapy. JACC Heart Fail. 2016;4(9):711-721. doi:10.1016/j.jchf.2016.04.004
17. Arsanjani R, Xu Y, Dey D, et al. Improved accuracy of myocardial perfusion SPECT for detection of coronary artery disease by machine learning in a large population. J Nucl Cardiol. 2013;20(4):553-562. doi:10.1007/s12350-013-9706-2
18. Arsanjani R, Dey D, Khachatryan T, et al. Prediction of revascularization after myocardial perfusion SPECT by machine learning in a large population. J Nucl Cardiol. 2014;22(4):877-884. doi:10.1007/s12350-014-0027-x
19. Mitchell C, Rahko PS, Blauwet LA, et al. Guidelines for performing a comprehensive transthoracic echocardiographic examination in adults: recommendations from the American Society of Echocardiography. J Am Soc Echocardiogr. 2018;32(1):1-64. doi:10.1016/j.echo.2018.06.004
20. Hoffman SJ, Holmes DR, Rabinstein AA, et al. Trends, predictors, and outcomes of cerebrovascular events related to percutaneous coronary intervention. JACC Cardiovasc Interv. 2011;4(4):415-422. doi:10.1016/j.jcin.2010.11.010
21. Keeley E, Grines CL. Scraping of aortic debris by coronary guiding catheters. J Am Coll Cardiol. 1998;32(7):1861-1865. doi:10.1016/s0735-1097(98)00497-5
22. Soliman EZ, Howard G, Cushman M, et al. Prolongation of QTc and risk of stroke: The REGARDS (REasons for Geographic and Racial Differences in Stroke) Study. J Am Coll Cardiol. 2012;59(15):1460-1467. doi:10.1016/j.jacc.2012.01.025
23. Rayfield C, Agasthi P, Mookadam F, et al. Machine learning on high-dimensional data to predict bleeding post percutaneous coronary intervention. J Invasive Cardiol. 2020;32(12):E122-E129.
24. Agasthi P, Ashraf H, Pujari SH, et al. Artificial intelligence trumps TAVI2-SCORE and CoreValve Score in predicting 1-year mortality post Transcatheter Aortic Valve Replacement. Cardiovasc Revasc Med. 2020. doi:10.1016/j.carrev.2020.08.010