Multimodality AI: Examples of Implementation in Atrial Fibrillation
© 2024 HMP Global. All Rights Reserved.
Any views and opinions expressed are those of the author(s) and/or participants and do not necessarily reflect the views, policy, or position of EP Lab Digest or HMP Global, their employees, and affiliates.
Featured is the presentation entitled "Multimodality AI: Examples of Implementation in Atrial Fibrillation" by Tina Baykaner, MD, MPH, at the 2024 Western AF Symposium.
Video Transcript
Tina Baykaner, MD, MPH: Nassir, thank you so much for the invitation. This is my eighth Western AF. As you can see from these beautiful photos, this is the reason we are all banned from the slopes!
This year, I was given the task of talking about multimodal artificial intelligence (AI) and how it can be used in atrial fibrillation (AF). When you hear the term multimodal, the mode refers to the data type that is input into a network. A unimodal model is one in which the input and output are a single modality. The input could be text or an image, and the output is usually in the same modality as the input. A multimodal model is a network capable of handling different kinds of modalities, that is, different kinds of inputs to the model. An example of a unimodal network is ChatGPT 3.5, which is free to use. You input text. For example, on my slide, I used, “Who is Nassir Marrouche?” That is the text input, and the output would be the text that ChatGPT provides you. That is a unimodal network. An example of a multimodal network that you might be familiar with is Google Gemini, which is Google's version of ChatGPT. You can input a photo. On the slide, you can see I asked it to describe this photo, and it gave me an output. The output could be audio, text, an image, or a song, whatever it might be.
So, how is this relevant to AF? You heard in the first talk that multimodal models can be perfectly useful for screening, but they can also be useful for trying to tell us which patients benefit from AF ablation. These are the most recent AF trials. You see, there is still a good number of patients who have recurrence of AF after ablation. Why do we care? If I know perfectly well which patient is going to have absolutely no recurrence after an ablation, I might tailor my clinical follow-up. I might tailor my entire antiarrhythmic choice and, if there is enough data in the future, my decision about continuing oral anticoagulation in those patients. So, I think there is a role for predicting which patients might do well and which might have recurrent AF episodes after an ablation procedure.
Of course, we have tried our best with clinical prediction models. There is the APPLE score. The CHA2DS2-VASc score has received a beating over the 2 days of this meeting. I stole this slide from Dr Eric Prystowsky. Unfortunately, these scores are okay, but their predictive value is not much better than the flip of a coin when trying to predict the patients who will do well. That is where AI comes into the picture. We ideally want a model that is much more accurate than the existing clinical scores to help us predict the future.
So, I will go back to the basics. How do these multimodal machine learning models work? Ideally, you have a set of patients who have had AF ablation. Here is an example of a 40-year-old man with no scar on his MRI and a perfect BMI, who had an ablation, and we know his follow-up. That is what we present to the model, with all his multimodal input. We provide all this input and teach the network that he did well, and then we find similar patients who did not do well. We train the model with the same kinds of inputs and teach it that these patients do not do well after ablation. How do you test the model? How do we figure out if the model is working or not? We find patients that the model has not seen before. We know their real outcomes. We know these patients had recurrence and these patients did not have recurrence, but we give only the inputs to the model. We figure out how the model performs, and that is how you get an AUROC metric or all the other metrics that are, in a way, very similar to the standard statistical metrics we are familiar with, like positive predictive value or negative predictive value. That is the performance of a network.
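[Editor's note: To make the training and held-out testing step concrete, here is a minimal sketch in Python using scikit-learn. The patients, features, outcome labels, and logistic regression model are hypothetical stand-ins, not the actual pipeline described in the talk.]

# Minimal sketch of held-out evaluation as described above (hypothetical data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical tabular inputs: one row per ablation patient
# (e.g., age, BMI, scar burden), plus a known recurrence label.
X = rng.normal(size=(300, 8))
y = rng.integers(0, 2, size=300)

# Train on patients whose outcomes the model is allowed to see...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ...then test on patients the model has never seen, whose real outcomes
# we know, and summarize performance as an AUROC.
pred = model.predict_proba(X_test)[:, 1]
print("AUROC on held-out patients:", roc_auc_score(y_test, pred))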
Here is a real-life example of such a multimodal network that Sanjiv, in the earlier talk, referred to. These are 321 patients at Stanford who all had AF ablation. They had CT scans before the procedure, and we know who did well and who had recurrence of AF.
Because it is so hard to manually segment these CTs, we first created a model to automate the left atrial segmentation, and then we designed this multimodal model. We incorporated their demographic data into a mini neural network, their raw CT slices into a neural network, and their left atrial shape into a different network, and merged them all to interpret and integrate all this multimodal data to train the network. When we tested this network, we could see the individual contributions of everything to the multimodal network. The multimodal network performance is the purple curve up top, but you see the contribution of the clinical data, which is the blue curve; the left atrial geometry, which is the red curve; and the raw CT slices, which is the green curve, to the final performance of the network. Of course, these are a little better than the performance of the existing clinical scores.
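[Editor's note: As a rough illustration of the late-fusion idea described here, the following PyTorch sketch encodes clinical variables, raw CT slices, and a left atrial shape vector with separate branches and merges them into one prediction head. The layer sizes, input dimensions, and class name are illustrative assumptions, not the published Stanford model.]

# Hypothetical sketch of a late-fusion multimodal network for AF recurrence.
import torch
import torch.nn as nn

class MultimodalFusionNet(nn.Module):
    def __init__(self, n_clinical=10, n_shape=64):
        super().__init__()
        # Mini network for demographic/clinical variables
        self.clinical = nn.Sequential(nn.Linear(n_clinical, 16), nn.ReLU())
        # Small CNN branch for a raw CT slice (single channel, illustrative size)
        self.ct = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 32), nn.ReLU(),
        )
        # Encoder for a vector describing left atrial geometry
        self.shape = nn.Sequential(nn.Linear(n_shape, 16), nn.ReLU())
        # Fusion head: concatenate branch embeddings, predict recurrence
        self.head = nn.Sequential(nn.Linear(16 + 32 + 16, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, clinical, ct_slice, shape):
        z = torch.cat([self.clinical(clinical), self.ct(ct_slice), self.shape(shape)], dim=1)
        return self.head(z)  # raw logit; apply sigmoid for a probability

# Example forward pass with a made-up batch of 4 patients
net = MultimodalFusionNet()
logits = net(torch.randn(4, 10), torch.randn(4, 1, 64, 64), torch.randn(4, 64))
print(torch.sigmoid(logits).squeeze(1))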
Similar multimodal networks have been published. This is from the Cleveland Clinic group as well as the Hopkins group, which used CTs as well as MRIs and showed similarly reasonable performance in predicting the patients who do well and who do not do well after catheter ablation of AF.
How about what the Tulane group presented earlier? For patients who are already in the cath lab, now you have access to signals from the heart. You have access to so much more. Can we do more with this data to predict who does well?
Here is a smaller study and a smaller model we devised from 156 patients with AF who had ablation. We had access to their ECGs in sinus rhythm anywhere up to a year before the procedure, and we had access to their intracardiac electrograms during the procedure. Again, we devised a similar model that trained on the EGMs and the ECGs, incorporated clinical data, and we tested its accuracy. So, you see, even just a 12-lead ECG can add almost 12 points to the AUROC of the CHA2DS2-VASc score's predictive value, and when you create your multimodal network, you can add even 9 more points to the AUROC over the individual modalities.
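[Editor's note: The "points" here are differences in AUROC (×100) between models evaluated on the same held-out patients. The short Python sketch below shows how such a comparison might be computed; the outcomes, risk estimates, and resulting gains are simulated, not the study's actual results.]

# Illustrative comparison of AUROCs for a clinical score alone, an ECG-only
# model, and a multimodal model, on the same hypothetical held-out patients.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=150)  # recurrence vs no recurrence (simulated)

# Simulated risk estimates: each successive model tracks the outcome a bit
# more closely, standing in for CHA2DS2-VASc, ECG-only, and multimodal models.
noise = lambda s: rng.normal(0, s, size=y_true.shape)
p_chads = y_true * 0.2 + noise(0.5)
p_ecg   = y_true * 0.6 + noise(0.5)
p_multi = y_true * 0.9 + noise(0.5)

for name, p in [("CHA2DS2-VASc", p_chads), ("ECG only", p_ecg), ("Multimodal", p_multi)]:
    print(f"{name}: AUROC = {roc_auc_score(y_true, p):.2f}")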
Here is a similar graph showing the individual contributions. The green curves are the electrograms, or the biosignals, I should say: the 12-lead ECG and the intracardiac electrograms. The purple curve is the demographics, and the red curve is the performance of the multimodal network.
So, in summary, multimodal AI models are well suited to AF, where we have demographics, clinical risk factors, lots of imaging data, and signal modalities that we can all integrate into one model that might perform better. I can argue that for predicting the success of catheter ablation in a given patient, these models already work much better than the existing clinical scores. As the Tulane group presented, maybe incorporating smarter inputs, like your ablation sets, the scar, the voltage distribution, the activation, or the strain, can hopefully improve the performance of these networks. But I should be cautious, because yesterday I was chatting with the AI engineer from the Tulane group, and he educated me that the unimodal network they had for a specific task, just the ECG, worked the same as the ECG plus all the clinical factors. So, in a way, it is our job to test the performance of multiple models, unimodal and multimodal, to see what actually works for the specific task we are given. Of course, larger datasets and external validation are all in the future. Thank you so much.
The transcripts have been edited for clarity and length.